Working with Autogluon
This guide walks through how to use this package together with AutoGluon, an automated machine learning and hyperparameter tuning package.
Load the train and test datasets
First, we’ll grab the train and test splits of the arcene dataset using the tabben package.
from tabben.datasets import OpenTabularDataset

# the train split is the default; the data is downloaded to ./data/ if not already present
train_ds = OpenTabularDataset('./data/', 'arcene')
test_ds = OpenTabularDataset('./data/', 'arcene', split='test')
This dataset has a large number of input attributes, some of which are intentionally meaningless. (The attributes are also not mapped to meaningful concepts.)
print(f'Number of Attributes: {train_ds.num_inputs}')
print(f'Attributes: {train_ds.input_attributes}')
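If you want to look at the raw data itself, the dataset can be converted to a pandas DataFrame using the same dataframe() method that we’ll use when fitting below:

# peek at the tabular data as a pandas DataFrame
df = train_ds.dataframe()
print(df.shape)
print(df.head())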
For this dataset, we can also get the metric functions that we should use when evaluating on the test set (for consistency across everyone’s runs). AutoGluon optimizes only a single metric (evaluated on its internal validation set), so we pick just one of them.
from tabben.evaluators.autogluon import get_metrics
eval_metrics = get_metrics(train_ds.task, classes=train_ds.num_classes)
print(eval_metrics)
Train the set of models
Now we can use AutoGluon to automatically train a large set of different models and evaluate all of them. We’ll use the TabularPredictor class from autogluon.
from autogluon.tabular import TabularPredictor
predictor = TabularPredictor(
    eval_metric=eval_metrics[0],  # the single metric AutoGluon optimizes on its validation set
    label=train_ds.output_attributes[0],
    path='ag-arcene')  # output directory for the trained models
predictor.fit(
    train_ds.dataframe().head(300),  # artificially reduce the size of the dataset for a faster demo
    presets='medium_quality_faster_train')
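Since we passed a path to the constructor, the fitted models are persisted to that directory. As a minimal sketch (assuming the 'ag-arcene' directory from above), the predictor can be restored in a later session with AutoGluon’s load method:

# reload the fitted predictor from disk in a new session
predictor = TabularPredictor.load('ag-arcene')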
We can check to make sure that AutoGluon inferred the correct task (binary classification for this dataset).
print(predictor.problem_type)
print(predictor.feature_metadata)
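It can also be useful to see which trained model AutoGluon selected as the best on its validation data; get_model_best is part of the TabularPredictor API:

# name of the model that AutoGluon will use by default when predicting
print(predictor.get_model_best())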
Evaluate the model
Now we’re ready to evaluate the trained models on the test set. We can use AutoGluon’s leaderboard method and supply the extra metrics that we want to compare by.
# predict on the test inputs (with the label column removed)
X_test = test_ds.dataframe().drop(columns=test_ds.output_attributes)
y_pred = predictor.predict(X_test)

# rank all trained models on the test set, including our extra metrics
predictor.leaderboard(test_ds.dataframe(), silent=True, extra_metrics=eval_metrics[1:])
(If you’re looking at the leaderboard in the notebook, the ‘score_test’ column corresponds to the AUROC metric that was passed to the TabularPredictor constructor.)
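The predictions from predict can also be scored directly against the true test labels. A minimal sketch using AutoGluon’s evaluate_predictions method (the y_test name here is ours, not part of the dataset API):

# true labels for the test split
y_test = test_ds.dataframe()[test_ds.output_attributes[0]]

# score the predictions with the predictor's eval metric (plus auxiliary metrics)
print(predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, silent=True))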
This code was last run with the following versions (if you’re viewing the web version without outputs, see the notebook in the repository for the versions):
from importlib.metadata import version

packages = ['autogluon', 'tabben']
for pkg in packages:
    print(f'{pkg}: {version(pkg)}')