Preview: This is a draft of the public start-kit. Baseline numbers and dataset entries are placeholder values until the warm-up split opens on Jul 1, 2026.
Get started · 15 minutes from clone to leaderboard

Start-kit & baselines.

One pip install, one neuralbench command, one submission file. The start-kit ships the same evaluation harness the organizers use, every baseline reported on the leaderboard, and a deterministic toy split you can iterate on before the warm-up opens.

Python ≥ 3.10 · PyTorch ≥ 2.2 · CUDA optional · BIDS-first · MIT licensed
bash · install + smoke-test
# 1. Install
$ pip install neuralbench eegdash braindecode

# 2. Three-step CLI workflow
$ neuralbench eeg audiovisual_stimulus --download
$ neuralbench eeg audiovisual_stimulus --prepare
$ neuralbench eeg audiovisual_stimulus -m eegnet
Five steps · clone → submit

From a fresh clone to a leaderboard entry.

The path is identical for every track. Change the track flag, swap the model, and the rest of the pipeline stays the same.

First clone to first submission: ~ 15 minutes (toy split)
Full warm-up training run: 2 – 6 GPU-hours
Minimum hardware: CPU works · 8 GB GPU recommended
  1. Step 01 INSTALL

    Install neuralbench.

    ~ 60 seconds · pip

    pip install neuralbench pulls in the benchmark runner, task registry, and evaluator. Pair it with eegdash for streamed BIDS data and braindecode for the baseline model zoo.

    Python: ≥ 3.10
    PyTorch: ≥ 2.2 (CUDA optional)
    Disk: ~ 4 GB for cached tasks
  2. Step 02 DATA

    Pull a track's warm-up split.

    ~ 5 minutes · streamed BIDS

    EEGDash streams the BIDS layout; only the recordings you reference are downloaded. The warm-up split for every track lives under ~/.cache/eegdash/ after the first call.

    Track 1 (EEG-to-IMG): XX GB
    Track 2 (BCI): X.X GB
    Track 3 (Sleep): XX GB
    Track 4 (EMG): X GB
  3. Step 03 TRAIN

    Train a baseline.

    2 – 6 GPU-hours · warm-up split

    Every track ships at least one baseline you can train with a single neuralbench eeg <task> -m <model> call. Task and model names are documented in the NeuralBench registry.

    Default model: EEGNet · X.X M params
    Optimizer: AdamW · cosine schedule
    Seed: Fixed by config
  4. Step 04 SCORE

    Score against the held-out split.

    ~ 1 minute · local

    The evaluator is the exact code we run on Codabench. Run the task locally first with neuralbench eeg <task>; the run writes a predictions Parquet with hyperparameters and the git SHA of your model code for the audit phase.

    Metric (T1): Top-5 retrieval accuracy
    Metric (T2): Balanced accuracy
    Metric (T3): W-bMAE on onset (s)
    Metric (T4): Character error rate
  5. Step 05 SUBMIT

    Upload to Codabench.

    ~ 30 seconds · web UI or API

    Drop the submission.parquet on the Codabench page for your track. Five submissions per team per day during the sealed phase. Final ranking is the best of your last five submissions.

    Platform: Codabench
    Daily cap (final phase): 5 / team / day
    Audit at top-of-leaderboard: Reproducibility check
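The Track 2 and Track 4 metrics named in Step 04 are standard definitions, so they can be sanity-checked without the harness. The sketch below is a dependency-free illustration, not the NeuralBench evaluator: `balanced_accuracy` and `character_error_rate` are hypothetical helper names, and the official code may differ in details such as text normalization or how CER is aggregated across utterances.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def character_error_rate(reference, hypothesis):
    """Levenshtein distance between the strings, divided by len(reference)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free on a match)
            ))
        prev = curr
    return prev[-1] / len(reference)
```

A classifier that always predicts the majority class on labels `[0, 0, 0, 1]` reaches 0.75 plain accuracy but only 0.5 balanced accuracy, which is why the latter is the more informative number on imbalanced trial counts.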
End-to-end Python · BCI track

A baseline from scratch, in about ten lines.

The same pattern works for every track: load with EEGDash, train with Braindecode, score with NeuralBench. Swap the dataset id and the model; the rest of the pipeline stays the same.

python · train_baseline.py
from neuralbench.main import Experiment

# Modality + task + model identifiers from the NeuralBench task registry
exp = Experiment(
    modality="eeg",
    task="audiovisual_stimulus",
    model="eegnet",
)
exp.run()

# See: facebookresearch.github.io/neuroai/neuralbench/auto_examples/
bash · same flow, no Python file
# 1. Download the task's data
$ neuralbench eeg audiovisual_stimulus --download

# 2. (Optional) Warm the preprocessing cache
$ neuralbench eeg audiovisual_stimulus --prepare

# 3. Run the benchmark with the EEGNet baseline
$ neuralbench eeg audiovisual_stimulus -m eegnet

# Try a foundation baseline instead
$ neuralbench eeg audiovisual_stimulus -m reve
Baselines · what to beat · placeholder scores

Pre-computed baselines on representative public datasets.

Numbers below are NeuralBench replications on representative public datasets, mean ± std, taken from the competition proposal's Table 1. They are the warm-up targets to beat. The official warm-up split scores arrive Jul 1, 2026.

Baselines source: NeuralBench (Banville et al., 2026)
Hosted weights: HuggingFace · neural-interfaces26
License: MIT
| Track | Baseline model | Family | Params | Train (GPU-h) | Proposal baseline | Code |
|---|---|---|---|---|---|---|
| T1 · EEG-to-IMG | Chance | reference | 0 | 0 | Top-5 2.22 ± 0.31 | |
| T1 · EEG-to-IMG | EEGNet (Lawhern et al., 2018) | CNN | 0.04 M | 2 | Top-5 28.13 ± 0.14 | configs/img/eegnet.yaml |
| T1 · EEG-to-IMG | REVE (Elouahidi et al., 2025) | Foundation | 14 M | 0.5 (probe) | Top-5 84.75 ± 0.38 | configs/img/reve_frozen.yaml |
| T2 · BCI | Chance | reference | 0 | 0 | Bal. Acc 24.81 ± 1.03 | |
| T2 · BCI | EEGNet | CNN | 0.04 M | 4 | Bal. Acc 58.58 ± 0.34 | configs/bci/eegnet.yaml |
| T2 · BCI | REVE | Foundation | 14 M | 1 (probe) | Bal. Acc 68.04 ± 0.73 | configs/bci/reve.yaml |
| T3 · Sleep | Chance | reference | 0 | 0 | W-bMAE 205.42 ± 0.01 s | |
| T3 · Sleep | EEGNet-sleep | Sleep | 0.04 M | 4 | W-bMAE 143.30 ± 0.40 s | configs/sleep/eegnet.yaml |
| T3 · Sleep | REVE-sleep | Foundation | 14 M | 1 (probe) | W-bMAE 134.89 ± 2.02 s | configs/sleep/reve.yaml |
| T4 · EMG | Chance | reference | 0 | 0 | CER 96.71 ± 0.00 % | |
| T4 · EMG | EMG2QwertyNet (Sivakumar et al., 2024) | EMG | 5.3 M | 8 | CER 25.14 ± 2.30 % | configs/emg/qwerty.yaml |
All values mean ± std from NeuralBench (Banville et al., 2026), Table 1. Sleep metric is W-bMAE (weighted-binned MAE) per the proposal §1.4. See live leaderboard →
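Because two of the four metrics are "higher is better" (Top-5, balanced accuracy) and two are "lower is better" (W-bMAE, CER), a quick local comparison has to respect the direction per track. The sketch below hard-codes the strongest non-chance baseline from the table above; `BASELINES`, `beats_baseline`, and the one-standard-deviation margin are illustrative assumptions, not part of NeuralBench, and the numbers remain placeholders until the official warm-up scores are published.

```python
# Strongest non-chance baseline per track, copied from the table above:
# (metric, mean, std, higher_is_better)
BASELINES = {
    "T1": ("Top-5", 84.75, 0.38, True),
    "T2": ("Bal. Acc", 68.04, 0.73, True),
    "T3": ("W-bMAE (s)", 134.89, 2.02, False),
    "T4": ("CER (%)", 25.14, 2.30, False),
}


def beats_baseline(track, score, margin_std=1.0):
    """True if `score` improves on the baseline mean by more than margin_std * std.

    The margin is an assumed convenience threshold, not a competition rule.
    """
    _, mean, std, higher_is_better = BASELINES[track]
    gap = score - mean if higher_is_better else mean - score
    return gap > margin_std * std
```

For example, a Track 4 score of 24.0 % CER is nominally below the 25.14 % baseline but sits inside one standard deviation (2.30), so `beats_baseline("T4", 24.0)` returns `False`.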
Rules of the road

Compute, external data, and what counts as "your model".

Compute ALLOWED

Train on whatever you have.

Training compute is uncapped. The single constraint is inference: the scoring container must complete a full test pass in under 60 minutes on one H100 or H200 instance, so that audit cost stays bounded and per-team runtime stays comparable. AWS provides complimentary instance hours to finalists; details are listed on the awards page.

Inference budget: 60 min · H100 / H200
No cap: on training
Audit hardware: Codabench container
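A rough way to see how much headroom a model has against the 60-minute inference limit is to time a full pass locally. This is only a sketch: `timed_test_pass` and its arguments are hypothetical names, and wall-clock time on your own hardware is not the measurement the scoring container will take on an H100 or H200.

```python
import time

INFERENCE_BUDGET_S = 60 * 60  # 60 minutes, per the rule above


def timed_test_pass(predict_fn, batches, budget_s=INFERENCE_BUDGET_S):
    """Run predict_fn over every batch.

    Returns (predictions, elapsed_seconds, within_budget).
    """
    start = time.perf_counter()
    predictions = [predict_fn(batch) for batch in batches]
    elapsed = time.perf_counter() - start
    return predictions, elapsed, elapsed < budget_s
```

Here `predict_fn` stands in for whatever callable wraps your model's forward pass, and `batches` for any iterable over the test data.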
External data ALLOWED

Pretrain on anything public.

Public, redistributable datasets are fair game. Closed datasets and the sealed test split are not. Declare every external corpus you used in the method description that ships with your final submission; the audit cross-checks this against a re-run of neuralbench eeg <task> from your committed config.

OK: OpenNeuro · MOABB · public BIDS
Not OK: Sealed test · private clinical
Reproducibility AUDITED

The top-of-board has to replay.

Top-3 per track go through a reproducibility audit on Oct 1, 2026. We re-run your training pipeline from the committed config, then score the resulting weights against the sealed test split. If the score is within tolerance of your submission, you stay on the board. If not, you drop.

Tolerance: ±2 σ on metric
Audit lead: Arnaud Delorme (EEGLAB)
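The tolerance rule reduces to a single comparison. The sketch below assumes that σ is a standard deviation of the track metric (for instance across seeds) and that the band is symmetric around the submitted score; this page does not define either point, and `within_tolerance` is a hypothetical helper name.

```python
def within_tolerance(submitted, replayed, sigma, n_sigma=2.0):
    """True if the replayed score lies within n_sigma * sigma of the submitted one."""
    return abs(replayed - submitted) <= n_sigma * sigma
```

With a submitted balanced accuracy of 68.04 and σ = 0.73, a replay at 67.0 passes (difference 1.04, allowed 1.46), while a replay at 66.0 does not.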
Foundation track CONSTRAINT

One encoder, four tracks.

Track 5 requires a single set of encoder weights reused across all four tracks — EEG-to-IMG, BCI, Sleep, and EMG. Heads can be track-specific; the encoder cannot. Encoders fine-tuned on sealed evaluation data, or retrained per-track, are rejected at audit, because the comparison would otherwise reduce to four independent single-task models.

Encoder: Shared · frozen or fine-tuned
Heads: Per-track allowed
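One way to check the shared-encoder constraint before submitting is to confirm that every track points at the same encoder file. The sketch below compares SHA-256 digests of checkpoint files; the function names and the one-file-per-track layout are assumptions, and a byte-level comparison is stricter than necessary, since identical weights serialized differently would fail it. How the audit itself verifies the constraint is not described on this page.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def encoder_is_shared(encoder_paths):
    """encoder_paths maps track name -> encoder checkpoint path.

    Returns True only if every checkpoint is byte-identical.
    """
    return len({file_sha256(p) for p in encoder_paths.values()}) == 1
```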
Start-kit drops Jun 1, 2026 · Warm-up Jul 1

Ready to run your first baseline?

Clone the start-kit, run the toy split, and you're on the leaderboard in fifteen minutes. Everything is MIT-licensed: fork freely, but please cite the white paper and the 2025 challenge that this work continues.

bash · clone & smoke-test
$ git clone https://github.com/facebookresearch/neuroai
$ cd neuroai
$ pip install -e .

# 30-second sanity check
$ neuralbench eeg audiovisual_stimulus --debug