Arcene Dataset

The Arcene dataset was originally used in the 2003 NeurIPS (then NIPS) feature selection challenge. As a result, it has a very large number of opaque features relative to its small number of examples, and some of the features are intentionally uninformative.

A fuller description of the dataset is available on its UCI Machine Learning Repository page.

Data Preprocessing

For the train-test split, we use the original ‘train’ split as our full training data and the original ‘valid’ (validation) split as our ‘test’ split.

We don’t do any further preprocessing of the data.
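
The split described above can be sketched as follows. This is a minimal illustration, not the loading code actually used here: it assumes the files have been downloaded into a local directory under the names used in the UCI distribution (`arcene_train.data`, `arcene_train.labels`, `arcene_valid.data`, `arcene_valid.labels`), that each `.data` line holds whitespace-separated integer features, and that each `.labels` line holds a single integer label. The helper names `load_split` and `load_arcene` are hypothetical.

```python
from pathlib import Path


def load_split(data_path, labels_path):
    """Read one split: a feature row per line in the data file, one label per line."""
    features = [
        [int(value) for value in line.split()]
        for line in Path(data_path).read_text().splitlines()
        if line.strip()  # skip blank lines
    ]
    labels = [
        int(line)
        for line in Path(labels_path).read_text().splitlines()
        if line.strip()
    ]
    if len(features) != len(labels):
        raise ValueError("feature rows and labels differ in length")
    return features, labels


def load_arcene(data_dir):
    """Return (train, test), where the original 'valid' split serves as 'test'."""
    data_dir = Path(data_dir)
    # Assumed file names; adjust them if your copy of the dataset differs.
    train = load_split(data_dir / "arcene_train.data", data_dir / "arcene_train.labels")
    test = load_split(data_dir / "arcene_valid.data", data_dir / "arcene_valid.labels")
    return train, test
```

Calling `load_arcene("path/to/arcene")` returns two `(features, labels)` pairs. The values are returned exactly as read, in line with the absence of any further preprocessing.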