Working with Autogluon
This guide walks through how to use this package together with AutoGluon, an automated machine learning and hyperparameter tuning package.
Load the train and test datasets
First, we’ll grab the train and test splits of the arcene dataset using the tabben package.
from tabben.datasets import OpenTabularDataset

# the train split is the default; the data is downloaded to ./data/ if not already present
train_ds = OpenTabularDataset('./data/', 'arcene')
test_ds = OpenTabularDataset('./data/', 'arcene', split='test')
This dataset has a large number of input attributes, some of which are intentionally meaningless. (The attributes are also not mapped to meaningful concepts.)
print(f'Number of Attributes: {train_ds.num_inputs}')
print(f'Attributes: {train_ds.input_attributes}')
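If you want to look at the raw data itself, the dataset can be converted to a pandas DataFrame using the same dataframe() method that we’ll use when fitting below:

# peek at the tabular data as a pandas DataFrame
df = train_ds.dataframe()
print(df.shape)
print(df.head())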
For this dataset, we can also get the metric functions that we should use when evaluating on the test set (for consistency across everyone’s runs). AutoGluon optimizes only a single metric (evaluated on its internal validation set), so we pick just one of them.
from tabben.evaluators.autogluon import get_metrics
eval_metrics = get_metrics(train_ds.task, classes=train_ds.num_classes)
print(eval_metrics)
Train the set of models
Now we can use AutoGluon to automatically train a large set of different models and evaluate all of them. We’ll use the TabularPredictor class from autogluon.
from autogluon.tabular import TabularPredictor
predictor = TabularPredictor(
    eval_metric=eval_metrics[0],  # the single metric AutoGluon optimizes on its validation set
    label=train_ds.output_attributes[0],
    path='ag-arcene')  # output directory for the trained models
predictor.fit(
    train_ds.dataframe().head(300),  # artificially reduce the size of the dataset for a faster demo
    presets='medium_quality_faster_train')
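Since we passed a path to the constructor, the fitted models are persisted to that directory. As a minimal sketch (assuming the 'ag-arcene' directory from above), the predictor can be restored in a later session with AutoGluon’s load method:

# reload the fitted predictor from disk in a new session
predictor = TabularPredictor.load('ag-arcene')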
We can check to make sure that AutoGluon inferred the correct task (binary classification for this dataset).
print(predictor.problem_type)
print(predictor.feature_metadata)
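It can also be useful to see which trained model AutoGluon selected as the best on its validation data; get_model_best is part of the TabularPredictor API:

# name of the model that AutoGluon will use by default when predicting
print(predictor.get_model_best())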
Evaluate the model
Now we’re ready to evaluate the trained models on the test set. We can use AutoGluon’s leaderboard method and supply the extra metrics that we want to compare by.
# predict on the test inputs (with the label column removed)
X_test = test_ds.dataframe().drop(columns=test_ds.output_attributes)
y_pred = predictor.predict(X_test)

# rank all trained models on the test set, including our extra metrics
predictor.leaderboard(test_ds.dataframe(), silent=True, extra_metrics=eval_metrics[1:])
(If you’re looking at the leaderboard in the notebook, the ‘score_test’ column corresponds to the AUROC metric that was passed to the TabularPredictor constructor.)
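The predictions from predict can also be scored directly against the true test labels. A minimal sketch using AutoGluon’s evaluate_predictions method (the y_test name here is ours, not part of the dataset API):

# true labels for the test split
y_test = test_ds.dataframe()[test_ds.output_attributes[0]]

# score the predictions with the predictor's eval metric (plus auxiliary metrics)
print(predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, silent=True))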
This code was last run with the following versions (if you’re viewing the web version without outputs, see the notebook in the repository for the versions):
from importlib.metadata import version

packages = ['autogluon', 'tabben']
for pkg in packages:
    print(f'{pkg}: {version(pkg)}')