Custom Datasets

You can add private or other datasets to the benchmark, at least locally. The data must be stored in an NPZ file in a particular format (see the developer’s documentation in the scripts directory). Once the file is ready, call the register_dataset function:

from tabben.datasets import register_dataset

register_dataset(
    'name-of-my-dataset',
    'classification',   # or 'regression'
    data_location='https://url.to/a/npz/file/hosted/somewhere.npz',
    outputs=1,
    classes=5,
)

If you want to contribute a dataset to the official benchmark, call register_dataset with the keyword argument persist=True, which saves the dataset to the data file, and then open a pull request against the main repository (this assumes you built the package from source). See the contributing guidelines for details.

Helpers

To check that an NPZ dataset file can be loaded by this package (after registering it as above), use the validate_dataset_file function, which checks the file’s contents:

from tabben.datasets import validate_dataset_file

validate_dataset_file('path/or/url/to/file.npz')
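
Because everything above depends on an NPZ file, it may help to see how such a file can be written and inspected with plain NumPy before it is registered or validated. The following is only a minimal sketch: the array names example_inputs and example_outputs, the toy data, and the file name are hypothetical placeholders, not the names this package requires. The actual required layout is described in the developer’s documentation in the scripts directory.

```python
import os
import tempfile

import numpy as np

# Toy data: 100 rows, 4 input features, and integer labels for 5 classes.
rng = np.random.default_rng(seed=0)
inputs = rng.normal(size=(100, 4))
labels = rng.integers(low=0, high=5, size=(100, 1))

# NOTE: 'example_inputs' and 'example_outputs' are hypothetical array names
# used only for illustration; they are NOT the keys this package expects.
path = os.path.join(tempfile.mkdtemp(), 'my-dataset.npz')
np.savez_compressed(path, example_inputs=inputs, example_outputs=labels)

# Reopen the archive and collect the name and shape of each stored array.
with np.load(path) as archive:
    summary = {name: archive[name].shape for name in archive.files}

print(summary)
```

Listing the stored array names and shapes like this is a quick way to compare a file against the documented format before passing its path to validate_dataset_file.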