# Custom Datasets

It's possible to add private or other datasets in the benchmark (at least locally). The data needs to be in a NPZ file in a particular format (see the developer's documentation in the [scripts](https://github.com/TabbenBenchmark/tabben/blob/main/scripts/README.md) directory), and then you can call the `register_dataset` function:
```python
from tabben.datasets import register_dataset

register_dataset(
        'name-of-my-dataset',
        'classification',   # or regression
        data_location='https://url.to/a/npz/file/hosted/somewhere.npz',
        outputs=1,
        classes=5,
)
```

If you want to contribute a dataset to the official benchmark, you could run this with the keyword argument `persist=True` to save this dataset to the data file and open a pull request to the main repository (assuming built from source). See the [contributing guidelines](https://github.com/TabbenBenchmark/tabben/blob/main/CONTRIBUTING.md) for details.

## Helpers

To check that a NPZ dataset file can be loaded by this package (after registering as above), there's a `validate_dataset_file` function to check the contents:
```python
from tabben.datasets import validate_dataset_file

validate_dataset_file('path/or/url/to/file.npz')
```