An Automated Machine Learning system for longitudinal classification built on GAMA and Scikit-Longitudinal
Auto-Sklong is an Automated Machine Learning (AutoML) library developed on top of the
General Machine Learning Assistant (GAMA) framework.
It introduces a dedicated search space
that combines Scikit-Longitudinal and
Scikit-learn models to tackle longitudinal classification tasks.
More specifically, Auto-Sklong jointly optimises the selection of algorithms
and the tuning of their hyperparameters for longitudinal machine learning.
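As a rough illustration of what searching jointly over algorithms and their hyperparameters means, here is a minimal sketch using plain random search. It is not Auto-Sklong's or GAMA's actual search procedure. The search space, the `toy_score` function, and every name in it are hypothetical stand-ins; a real system would train and cross-validate a model for each candidate.

```python
import random

# Hypothetical joint search space: each algorithm name maps to its own
# hyperparameter grid. Names and values are illustrative only.
SEARCH_SPACE = {
    "decision_tree": {"max_depth": [2, 4, 8]},
    "random_forest": {"n_estimators": [50, 100], "max_depth": [4, 8]},
}


def sample_configuration(rng):
    """Draw one (algorithm, hyperparameters) pair from the joint space."""
    algorithm = rng.choice(sorted(SEARCH_SPACE))
    grid = SEARCH_SPACE[algorithm]
    params = {name: rng.choice(values) for name, values in sorted(grid.items())}
    return algorithm, params


def toy_score(algorithm, params):
    """Stand-in for a validation score; no model is actually trained here."""
    base = {"decision_tree": 0.70, "random_forest": 0.75}[algorithm]
    return base + 0.01 * params["max_depth"] + 0.0002 * params.get("n_estimators", 0)


def random_search(n_trials, seed=0):
    """Return the best (score, algorithm, params) seen over n_trials samples."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        algorithm, params = sample_configuration(rng)
        score = toy_score(algorithm, params)
        if best is None or score > best[0]:
            best = (score, algorithm, params)
    return best


best_score, best_algorithm, best_params = random_search(n_trials=20)
print(best_algorithm, best_params, round(best_score, 4))
```

The point of the sketch is that the algorithm choice and its hyperparameters are sampled and scored together as one configuration, rather than tuned in separate stages.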
Wait, what is Longitudinal Data - In layman's terms?
Longitudinal data is a "time-lapse" view of the same subject, entity, or group tracked across several points in time, similar to checking in on patients to see how they change. For instance, doctors may monitor a patient's blood pressure, weight, and cholesterol every year for a decade to identify health trends or risk factors. Such data is often more useful for predicting future outcomes than a one-time survey because it captures how measurements evolve, along with patterns and possible cause-and-effect relationships over time.
Important
We are currently revamping the whole repository to support:
- Longitudinal-data-aware post-hoc ensembling
- Python 3.10+ via Scikit-Longitudinal 0.1.8+
- New documentation
Please bear with us while we work on this. The focus has primarily been on bringing Scikit-Longitudinal support to Python 3.10+, integrating new algorithms from the community, and finalising multi-class support, all of which are now very close to completion.
To install Auto-Sklong:
    pip install auto-sklong
To install a specific version:
    pip install auto-sklong==0.0.1
Tip
Want to use Jupyter Notebook, Marimo, Google Colab, or JupyterLab?
Head to the Getting Started section of the documentation.
Caution
Auto-Sklong is currently compatible with Python 3.9 only.
This limitation stems from the Deep Forest dependency.
Follow updates on this GitHub issue.
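Given the version restriction above, a small guard at the top of a script can catch an unsupported interpreter early. This is a minimal sketch, assuming "Python 3.9 only" means any 3.9.x release; `is_supported` and `SUPPORTED_VERSION` are hypothetical helper names, not part of Auto-Sklong.

```python
import sys

# Assumption: any 3.9.x interpreter counts as supported.
SUPPORTED_VERSION = (3, 9)


def is_supported(version_info=None):
    """Return True when the major.minor version matches SUPPORTED_VERSION."""
    info = sys.version_info if version_info is None else version_info
    major, minor = tuple(info)[:2]
    return (major, minor) == SUPPORTED_VERSION


if not is_supported():
    # Warn rather than exit, so the check stays non-intrusive.
    print(
        f"Warning: running Python {sys.version_info.major}.{sys.version_info.minor}; "
        "Auto-Sklong currently lists Python 3.9 as the only compatible version."
    )
```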
Here's how to run AutoML on longitudinal data with Auto-Sklong:
    from sklearn.metrics import classification_report
    from scikit_longitudinal.data_preparation import LongitudinalDataset
    from gama.GamaLongitudinalClassifier import GamaLongitudinalClassifier

    # Load your dataset (replace "stroke.csv" with your actual dataset path)
    dataset = LongitudinalDataset("./stroke.csv")

    # Set up the target column and split the data
    dataset.load_data_target_train_test_split(
        target_column="class_stroke_wave_4",
    )

    # Set up feature groups (temporal dependencies)
    dataset.setup_features_group(input_data="elsa")

    # Initialise the AutoML system
    automl = GamaLongitudinalClassifier(
        features_group=dataset.feature_groups(),
        non_longitudinal_features=dataset.non_longitudinal_features(),
        feature_list_names=dataset.data.columns.tolist(),
        max_total_time=3600,  # Time budget in seconds
    )

    # Fit the AutoML system
    automl.fit(dataset.X_train, dataset.y_train)

    # Make predictions
    y_pred = automl.predict(dataset.X_test)

    # Print the classification report
    print(classification_report(dataset.y_test, y_pred))
More detailed examples and tutorials can be found in the documentation.
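To make the idea of feature groups in the example above more concrete, the sketch below builds them by hand for a tiny wide-format table. It assumes that a feature group is a list of column indices holding the same attribute across waves, in temporal order, and that the remaining columns are treated as non-longitudinal; this is our reading of the Scikit-Longitudinal convention rather than something stated here. The column names, the `_w<wave>` naming scheme, and the `group_by_attribute` helper are all hypothetical.

```python
# Hypothetical wide-format column names following "<attribute>_w<wave>".
columns = ["bp_w1", "bp_w2", "bp_w3", "chol_w1", "chol_w2", "chol_w3", "sex"]


def group_by_attribute(column_names):
    """Split column indices into per-attribute wave groups and static columns."""
    groups = {}   # attribute name -> list of (wave number, column index)
    static = []   # indices of columns without a wave suffix
    for index, name in enumerate(column_names):
        attribute, separator, wave = name.rpartition("_w")
        if separator and wave.isdigit():
            groups.setdefault(attribute, []).append((int(wave), index))
        else:
            static.append(index)
    # Order each group by wave number so indices run from oldest to newest.
    feature_groups = [[i for _, i in sorted(pairs)] for pairs in groups.values()]
    return feature_groups, static


feature_groups, non_longitudinal = group_by_attribute(columns)
print(feature_groups)     # [[0, 1, 2], [3, 4, 5]]
print(non_longitudinal)   # [6]
```

In the example above, `dataset.setup_features_group(input_data="elsa")` takes care of this step, so a hand-built structure like this would only matter for data that does not follow a supported layout.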
If you use Auto-Sklong in your research, please cite our paper:
@INPROCEEDINGS{10821737,
author={Provost, Simon and Freitas, Alex A.},
booktitle={2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)},
title={Auto-Sklong: A New AutoML System for Longitudinal Classification},
year={2024},
volume={},
number={},
pages={2021-2028},
keywords={Pipelines;Optimization;Predictive models;Classification algorithms;Conferences;Bioinformatics;Biomedical computing;Automated Machine Learning;AutoML;Longitudinal Classification;Scikit-Longitudinal;GAMA},
doi={10.1109/BIBM62325.2024.10821737}}
Auto-Sklong is licensed under the MIT License.