Basic Package Usage
Accessing a Dataset
PyTorch Dataset objects are available from tabben.datasets. For example,
from tabben.datasets import OpenTabularDataset
from torch.utils.data import DataLoader
# load the arcene dataset (default is train split) and
# save the data to the current directory
ds = OpenTabularDataset('./', 'arcene')
for inputs, labels in DataLoader(ds, batch_size=4):
    # do stuff with inputs and labels
    pass
All the currently implemented datasets are accessible this way.
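To make the shape of each batch concrete without downloading anything, here is a minimal sketch that swaps in a synthetic `TensorDataset` for the benchmark dataset. The data, sizes, and variable names are illustrative assumptions, not part of tabben; only the `DataLoader` iteration pattern matches the example above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for a tabular dataset: 10 examples, 5 numeric features,
# binary labels. (Hypothetical data, not one of the benchmark datasets.)
torch.manual_seed(0)
features = torch.randn(10, 5)
targets = torch.randint(0, 2, (10,))
stand_in = TensorDataset(features, targets)

batch_shapes = []
for inputs, labels in DataLoader(stand_in, batch_size=4):
    # inputs has shape (batch, num_features); labels has shape (batch,)
    batch_shapes.append((tuple(inputs.shape), tuple(labels.shape)))

print(batch_shapes)
```

With 10 examples and a batch size of 4, the last batch holds only the 2 remaining examples; pass `drop_last=True` to `DataLoader` if a model needs fixed-size batches.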
We can also access these tabular datasets as either numpy arrays or pandas dataframes:
from tabben.datasets import OpenTabularDataset
# load the training set as numpy arrays (these are NOT copies)
ds = OpenTabularDataset('./', 'covertype') # defaults are numpy arrays of the training set
train_X, train_y = ds.numpy()
# load as a single pandas dataframe
df = ds.dataframe()
ds_inputs = df[ds.input_attributes]
ds_outputs = df[ds.output_attributes]
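The column-selection step above can be tried out on a small hand-built dataframe. This is only a sketch: the column names and values are made up, and the two lists stand in for `ds.input_attributes` and `ds.output_attributes`, which are assumed here to be sequences of column names (as the indexing above suggests).

```python
import pandas as pd

# Hypothetical stand-in for ds.dataframe(): two input columns, one output column.
df = pd.DataFrame({
    'elevation': [2596, 2590, 2804],
    'slope': [3, 2, 9],
    'cover_type': [5, 5, 2],
})
input_attributes = ['elevation', 'slope']  # stand-in for ds.input_attributes
output_attributes = ['cover_type']         # stand-in for ds.output_attributes

# Selecting with a list of column names returns a dataframe, not a series
inputs_df = df[input_attributes]
outputs_df = df[output_attributes]

# Convert to numpy arrays when a model expects arrays rather than dataframes
X = inputs_df.to_numpy()
y = outputs_df.to_numpy().ravel()  # flatten the single output column to 1-D

print(X.shape, y.shape)
```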
To list all the currently implemented datasets in the benchmark (except for CIFAR10), use the list_datasets function:
from tabben.datasets import list_datasets
print(list_datasets())
Evaluating the Results of a Model
Standard metrics are available (either from scikit-learn or compatible with AutoGluon). In most cases,
from tabben.evaluators import get_metrics
eval_metrics = get_metrics('classification', classes=2)
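How the returned metrics are applied is not shown above. The sketch below assumes they behave like scikit-learn-style callables that take `(y_true, y_pred)`; that is an assumption rather than something confirmed here, so check the `tabben.evaluators` documentation before relying on it. The two metric functions are hypothetical stand-ins written in plain Python, not tabben's own metrics.

```python
# Hypothetical stand-ins for metric callables; these are NOT tabben's metrics.
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def error_rate(y_true, y_pred):
    """Fraction of predictions that do not match the true labels."""
    return 1.0 - accuracy(y_true, y_pred)


# Assumed shape of the result of get_metrics: a collection of callables
stand_in_metrics = [accuracy, error_rate]

y_true = [0, 1, 1, 0]  # ground-truth binary labels
y_pred = [0, 1, 0, 0]  # model predictions (one mistake)

# Apply every metric to the same predictions and collect results by name
results = {metric.__name__: metric(y_true, y_pred) for metric in stand_in_metrics}
print(results)
```

Collecting results in a dictionary keyed by metric name keeps the output readable when several metrics are evaluated at once.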