
Public leaderboard · live during final phase

Leaderboard.

One ranking per track. Refreshed every 15 minutes during the sealed phase. Top-3 per track go through a reproducibility audit at test-freeze — only audited entries appear in the final NeurIPS rankings.


Warm-up: Jul 1 – Jul 31 · Sealed: Aug 1 – Sep 1 · Audit: Oct 1 · Final: Nov 1

Submissions snapshot · placeholder refresh · 15 min

184 teams

1,247 submissions

5 tracks

Track 01 · EEG-to-IMG · placeholder rows

Evoked visual retrieval — top-5 accuracy.

Rank held-out candidate images from EEG epochs. Targets are frozen DINOv2-giant embeddings. Higher is better.

Metric: Top-5 retrieval accuracy

Tie-break: Top-1 accuracy

Sponsor: Alljoined

| # | Submission | Affiliation | Top-5 | Δ |
|---|---|---|---|---|
| 01 | REVE-pretrained | EPFL · Lausanne | 84.7 | – |
| 02 | BIOT-large (probe) | Yale + ETH Zurich | 79.3 | −5.4 |
| 03 | LaBraM-img | Tsinghua | 75.1 | −9.6 |
| 04 | REVE (frozen, baseline) | Organizers | 71.4 | −13.3 |
| 05 | EEGConformer-img | Donders | 52.8 | −31.9 |
| 06 | EEGNet-visual (baseline) | Organizers | 28.1 | −56.6 |

N = 6 of 47 submissions · last update 14 min ago
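For readers who want to sanity-check a Track 01 number locally, here is a minimal sketch of top-k retrieval accuracy. It assumes each EEG epoch comes with one similarity score per candidate image and the index of the true image; the function name and data layout are hypothetical and are not taken from the evaluator.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    targets: Sequence[int],
    k: int = 5,
) -> float:
    """Percentage of epochs whose true image is among the k best-scored candidates."""
    if len(scores) != len(targets) or not scores:
        raise ValueError("scores and targets must be non-empty and equally long")
    hits = 0
    for row, target in zip(scores, targets):
        # Candidate indices ordered from most to least similar.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(targets)


# Toy data (hypothetical): 2 epochs, 6 candidate images each.
scores = [
    [0.9, 0.1, 0.2, 0.3, 0.4, 0.5],  # true image 0 is ranked first
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.1],  # true image 5 is ranked last
]
print(top_k_accuracy(scores, [0, 5], k=5))  # 50.0
```

Setting k=1 on the same inputs gives the tie-break metric (top-1 accuracy).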

Track 02 · BCI decoding · placeholder rows

Calibration-stable command decoding.

Generalize motor imagery, mental math, and word association labels to later sessions without recalibration. Higher is better.

Metric: Balanced accuracy

Tie-break: Mean per-class F1

Sponsor: Meta FAIR Brain & AI

| # | Submission | Affiliation | Bal. Acc | Δ |
|---|---|---|---|---|
| 01 | REVE-BCI | EPFL · Lausanne | 68.0 | – |
| 02 | BIOT-bci | Yale + ETH Zurich | 65.7 | −2.3 |
| 03 | EEGConformer | Donders | 62.4 | −5.6 |
| 04 | SignalJEPA-bci | ETH Zurich | 60.1 | −7.9 |
| 05 | EEGNet-baseline | Inria Bordeaux | 58.6 | −9.4 |
| 06 | FBCSP-LDA (classical) | Organizers · MOABB | 54.2 | −13.8 |

N = 6 of 318 submissions · last update 9 min ago
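Balanced accuracy, the Track 02 metric, is the mean of per-class recall, so a model that always predicts the majority class gets no credit for it. The sketch below uses the standard definition; the function name and the toy labels are hypothetical, and the evaluator's own implementation may differ in details such as how it handles classes absent from a session.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean per-class recall, as a percentage."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    # Recall is computed only for classes that occur in y_true.
    recalls = [correct[label] / total[label] for label in total]
    return 100.0 * sum(recalls) / len(recalls)


# Toy data (hypothetical): 8 motor-imagery trials, 2 mental-math trials,
# and a degenerate decoder that always answers "imagery".
y_true = ["imagery"] * 8 + ["math"] * 2
y_pred = ["imagery"] * 10
print(balanced_accuracy(y_true, y_pred))  # 50.0, although plain accuracy is 80%
```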

Track 03 · Sleep onset · placeholder rows

Latency to stable N2 — wearable EEG.

Estimate seconds from recording start to stable sleep onset on consumer-grade EEG. Lower is better.

Metric: Mean absolute error (s)

Tie-break: Median absolute error

Sponsor: InteraXon

| # | Submission | Affiliation | MAE (s) | Δ |
|---|---|---|---|---|
| 01 | LaBraM-sleep | Tsinghua | 134.9 | – |
| 02 | REVE-sleep | EPFL · Lausanne | 138.7 | +3.8 |
| 03 | U-Sleep (transfer) | Copenhagen | 141.4 | +6.5 |
| 04 | EEGNet-sleep (baseline) | Inria | 143.3 | +8.4 |
| 05 | BIOT-sleep | Yale | 147.1 | +12.2 |
| 06 | YASA-rule (classical) | Organizers | 192.4 | +57.5 |

N = 6 of 91 submissions · last update 11 min ago

Track 04 · EMG-to-Text · placeholder rows

Wristband EMG to typed text.

Decode typed keystrokes from surface EMG across users, anatomy, and re-placement. Lower is better.

Metric: Character error rate (%)

Tie-break: Word error rate

Sponsor: Meta Reality Labs

| # | Submission | Affiliation | CER (%) | Δ |
|---|---|---|---|---|
| 01 | EMG2QwertyNet-v2 | Meta Reality Labs | 22.4 | – |
| 02 | EMG2QwertyNet (baseline) | Organizers | 25.1 | +2.7 |
| 03 | EMG-Conformer | Imperial College | 26.8 | +4.4 |
| 04 | wav2vec-EMG | CMU | 31.9 | +9.5 |
| 05 | CNN-CTC (baseline) | Organizers | 38.4 | +16.0 |

N = 5 of 64 submissions · last update 6 min ago
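Character error rate is conventionally the Levenshtein edit distance between the decoded and reference text, divided by the reference length. The sketch below follows that convention and pools edits over all sentences before dividing; the pooling choice and the function names are assumptions, not details confirmed by this page.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, ref_char in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hyp, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Pooled CER in percent: total edits over total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must be equally long")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_edits / total_chars


# Toy example (hypothetical): one substitution and one deletion in 11 characters.
print(round(character_error_rate(["hello world"], ["hallo wrld"]), 2))  # 18.18
```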

Track 05 · Foundation transfer · placeholder rows

One shared EEG encoder across three tracks.

Rank shared encoders by their per-track score averaged across EEG-to-IMG, BCI, and Sleep. Normalized so 100 = perfect on every track. Higher is better.

Metric: Mean rank score (0–100)

Constraint: Single shared encoder

Audit: Weights identity check

| # | Encoder | Affiliation | Mean rank | Δ |
|---|---|---|---|---|
| 01 | BIOT-large (shared) | Yale + ETH Zurich | 82.1 | – |
| 02 | REVE | EPFL · Lausanne | 79.8 | −2.3 |
| 03 | LaBraM | Tsinghua | 76.5 | −5.6 |
| 04 | SignalJEPA | ETH Zurich | 71.2 | −10.9 |
| 05 | CBraMod | Beijing IAR | 68.9 | −13.2 |
| 06 | EEGPT | SJTU | 66.1 | −16.0 |

N = 6 of 23 submissions · last update 18 min ago

Scoring & refresh policy

How a submission becomes a number on this page.

The scoring code is open-source and identical whether you run neuralbench score locally or submit to the Codabench server. The only thing the server adds is the sealed test split.

Refresh cadence 15 min

Live during the final phase.

Codabench evaluates each upload immediately. The leaderboard page on this site is regenerated every 15 minutes — there can be a short lag between submission and what you see here.

Final phase: Aug 1 – Sep 1, 2026

Daily cap: 5 / team / day

Aggregation BEST-OF-5

Final score = best of last five.

The public board shows your best-ever number. The final NeurIPS ranking, however, only considers your last five submissions. This rewards focused iteration over exhaustive lottery search.

Public board: Best ever

Final ranking: Best of last 5
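The difference between the two aggregation rules can be sketched as follows, assuming a team's scores are stored in chronological order and that the caller knows whether the track is higher-is-better. The function names are hypothetical.

```python
from typing import Sequence


def public_board_score(history: Sequence[float], higher_is_better: bool = True) -> float:
    """Best score the team has ever posted on the track."""
    if not history:
        raise ValueError("no submissions yet")
    return max(history) if higher_is_better else min(history)


def final_ranking_score(
    history: Sequence[float],
    higher_is_better: bool = True,
    window: int = 5,
) -> float:
    """Best score among the team's most recent `window` submissions."""
    if not history:
        raise ValueError("no submissions yet")
    recent = history[-window:]  # the whole history if it is shorter than the window
    return max(recent) if higher_is_better else min(recent)


# Toy history (hypothetical), oldest first, on a higher-is-better track.
history = [61.0, 66.3, 64.8, 65.1, 63.9, 65.7, 64.2]
print(public_board_score(history))   # 66.3
print(final_ranking_score(history))  # 65.7
```

In this example the early 66.3 still shows on the public board but no longer counts toward the final ranking, because it falls outside the last five submissions.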

Reproducibility audit OCT 1

Top-3 per track replay from config.

We re-run the committed training pipeline against the sealed split. If the replayed score lands within ±2σ of the submitted score, you stay on the board; outside that band, you drop. The audit is led by Arnaud Delorme (EEGLAB).

Tolerance: ±2 σ on metric

Audit window: Oct 1 – Nov 1
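A minimal sketch of the tolerance check follows. This page does not say how σ is estimated, so the sketch simply takes it as an input (for example, a bootstrap standard deviation of the metric); treating the boundary as a pass is also an assumption, and the function name is hypothetical.

```python
def passes_audit(
    submitted: float,
    replayed: float,
    sigma: float,
    tolerance: float = 2.0,
) -> bool:
    """True if the replayed score is within tolerance * sigma of the submitted one."""
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    return abs(replayed - submitted) <= tolerance * sigma


# Toy numbers (hypothetical): the band is 2 * 0.6 = 1.2 points wide on each side.
print(passes_audit(68.0, 67.1, sigma=0.6))  # True
print(passes_audit(68.0, 66.5, sigma=0.6))  # False
```

The check is symmetric, so a replay that scores well above the submitted number fails the same way as one that scores well below it.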

Anonymity OPT-IN

You choose when to reveal.

You can submit under an anonymized handle during the public phase. Affiliations only appear on the board after you opt in — usually right before the final ranking publishes on Nov 1. Useful for double-blind paper submissions.

Reveal deadline: Oct 25, 2026

Default: Anonymous handle

Formal definitions

Scoring math, mirrored from the proposal.

These are the equations the evaluator actually runs. They appear in §Error bars, §Test-set sizing, and §Overall ranking of the NeurIPS proposal and are reproduced here for participants who want to reason about score variance and ranking before submitting.

Error bars and significance

Higher-is-better tracks (EEG-to-IMG, BCI) maximise the primary metric; lower-is-better tracks (sleep onset, EMG-to-text) minimise the error. For pairwise significance within each track, each of \(B = 10{,}000\) paired bootstrap draws yields a difference between two teams' scores. Let \(\mathcal{S}^{(b)}_{\mathrm{team}_1}\) and \(\mathcal{S}^{(b)}_{\mathrm{team}_2}\) denote the two teams' scores at draw \(b\), with \(\Pr_b[\cdot]\) the empirical probability over those draws.

\[ \Delta^{(b)}_{\mathrm{team}_1, \mathrm{team}_2} = \mathcal{S}^{(b)}_{\mathrm{team}_1} - \mathcal{S}^{(b)}_{\mathrm{team}_2} \]

The two-sided bootstrap p-value \(p_{\mathrm{boot}} = 2\min(\Pr_b[\Delta^{(b)} \le 0],\; \Pr_b[\Delta^{(b)} \ge 0])\) is Holm-adjusted across the family of prize-relevant comparisons to control the family-wise error rate in the post-competition analysis. The public leaderboard ranks by point estimate and flags neighbours with unadjusted \(p_{\mathrm{boot}} > 0.05\) (equivalently, paired CI of \(\Delta\) contains zero) as statistically indistinguishable. Top-1, top-3, and top-5 rank stability (the bootstrap probability that a team holds those positions) accompanies each rank.
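The paired bootstrap p-value and the Holm adjustment described above can be sketched in a few lines. The sketch assumes each team's score is the mean of per-unit scores on the same held-out units, so that one set of resampled indices applies to both teams. All function names are hypothetical, and capping the p-value at 1 is a convention added here, not something stated in the proposal.

```python
import random
from typing import List, Sequence


def paired_bootstrap_p(
    scores_1: Sequence[float],
    scores_2: Sequence[float],
    n_draws: int = 10_000,
    seed: int = 0,
) -> float:
    """Two-sided p-value: 2 * min(Pr[delta <= 0], Pr[delta >= 0]), capped at 1."""
    if len(scores_1) != len(scores_2) or not scores_1:
        raise ValueError("need the same non-empty set of units for both teams")
    rng = random.Random(seed)
    n = len(scores_1)
    # Pairing: the same resampled unit contributes to both teams' scores,
    # so the difference of means equals the mean of per-unit differences.
    diffs = [a - b for a, b in zip(scores_1, scores_2)]
    n_le = n_ge = 0
    for _ in range(n_draws):
        delta = sum(diffs[rng.randrange(n)] for _ in range(n)) / n
        if delta <= 0:
            n_le += 1
        if delta >= 0:
            n_ge += 1
    return min(1.0, 2.0 * min(n_le, n_ge) / n_draws)


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


# Toy per-unit scores (hypothetical) for two neighbouring teams.
team_1 = [0.71, 0.64, 0.80, 0.58, 0.69, 0.75, 0.62, 0.77]
team_2 = [0.66, 0.65, 0.74, 0.55, 0.70, 0.69, 0.60, 0.71]
p_boot = paired_bootstrap_p(team_1, team_2)
print(f"unadjusted p_boot = {p_boot:.4f}")
print("Holm-adjusted family:", holm_adjust([p_boot, 0.20, 0.01]))
```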

Test-set sizing

The hidden-test size for each track is chosen so the expected half-width of the 95% interval falls below \(\nu_t\), the smallest practically meaningful difference for that track. \(\hat{\sigma}_t\) is the pilot standard deviation at the top-level bootstrap unit and \(n_{\mathrm{eff},t}\) is the number of independent held-out units. If a dataset cannot support this target, intervals widen and ties are reported rather than over-interpreting small margins.

\[ 1.96\,\hat{\sigma}_t / \sqrt{n_{\mathrm{eff},t}} \le \nu_t \]
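Solving the inequality for \(n_{\mathrm{eff},t}\) gives \(n_{\mathrm{eff},t} \ge (1.96\,\hat{\sigma}_t / \nu_t)^2\). The short sketch below evaluates that bound; the function names and the numeric inputs are hypothetical and chosen only to show the arithmetic.

```python
import math


def half_width(sigma_hat: float, n_eff: int, z: float = 1.96) -> float:
    """Expected half-width of the interval for n_eff independent units."""
    return z * sigma_hat / math.sqrt(n_eff)


def min_effective_units(sigma_hat: float, nu: float, z: float = 1.96) -> int:
    """Smallest n_eff whose half-width does not exceed nu."""
    if sigma_hat < 0 or nu <= 0:
        raise ValueError("sigma_hat must be >= 0 and nu must be > 0")
    return math.ceil((z * sigma_hat / nu) ** 2)


# Hypothetical pilot values: standard deviation 12.0, target difference 1.5.
n_needed = min_effective_units(sigma_hat=12.0, nu=1.5)
print(n_needed)  # 246
print(half_width(12.0, n_needed) <= 1.5)      # True
print(half_width(12.0, n_needed - 1) <= 1.5)  # False
```

Because the bound scales with the square of \(\hat{\sigma}_t / \nu_t\), halving the target difference quadruples the number of independent units required.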

Overall ranking

Each valid submission gets rank points \(P_{\mathrm{team},t}\) on its track (linearly interpolated against the field, so the top of the field scores 1 and the bottom scores 0). The submitted-track average summarises a team's record across the tracks it entered; the all-track score averages over all four task-specific tracks, padding missing tracks with zero so transfer is rewarded over single-track wins. \(r_{\mathrm{team},t}\) is the team's rank on track \(t\), \(N_t\) is the number of valid submissions on the track, \(T\) is the set of the four task-specific tracks, and \(T_{\mathrm{team}}\) is the set of tracks the team submitted.

\[ P_{\mathrm{team},t} = \begin{cases} 1-\dfrac{r_{\mathrm{team},t}-1}{N_t-1}, & N_t>1, \\[4pt] 1, & N_t=1, \end{cases} \qquad \mathcal{S}_{\mathrm{submitted}}(\mathrm{team}) = \frac{1}{|T_{\mathrm{team}}|}\sum_{t\in T_{\mathrm{team}}} P_{\mathrm{team},t} \]

\[ \mathcal{S}_{\mathrm{all}}(\mathrm{team}) = \frac{1}{4}\sum_{t\in T} P^{\star}_{\mathrm{team},t}, \qquad P^{\star}_{\mathrm{team},t} = \begin{cases} P_{\mathrm{team},t}, & t\in T_{\mathrm{team}}, \\ 0, & t\notin T_{\mathrm{team}}. \end{cases} \]
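A worked instance of these formulas is sketched below. Apart from bci, which appears in the CLI example on this page, the track identifiers are hypothetical, as are the ranks and field sizes; the sketch also assumes the team only entered task-specific tracks.

```python
from typing import Dict

# The four task-specific tracks averaged by the all-track score (identifiers assumed).
TASK_TRACKS = ("eeg-to-img", "bci", "sleep", "emg-to-text")


def rank_points(rank: int, n_valid: int) -> float:
    """P = 1 - (r - 1) / (N - 1) for N > 1, and 1 when the field has one entry."""
    if n_valid < 1 or not 1 <= rank <= n_valid:
        raise ValueError("rank must lie between 1 and n_valid")
    if n_valid == 1:
        return 1.0
    return 1.0 - (rank - 1) / (n_valid - 1)


def submitted_score(ranks: Dict[str, int], field_sizes: Dict[str, int]) -> float:
    """Average rank points over the tracks the team actually entered."""
    points = [rank_points(r, field_sizes[t]) for t, r in ranks.items()]
    return sum(points) / len(points)


def all_track_score(ranks: Dict[str, int], field_sizes: Dict[str, int]) -> float:
    """Average over all four task tracks, with zero points for tracks not entered."""
    total = sum(
        rank_points(ranks[t], field_sizes[t]) if t in ranks else 0.0
        for t in TASK_TRACKS
    )
    return total / len(TASK_TRACKS)


# Hypothetical team: 1st of 11 on bci, 3rd of 5 on sleep, nothing else entered.
ranks = {"bci": 1, "sleep": 3}
field_sizes = {"bci": 11, "sleep": 5}
print(submitted_score(ranks, field_sizes))  # 0.75
print(all_track_score(ranks, field_sizes))  # 0.375
```

The gap between 0.75 and 0.375 is the zero padding at work: the two tracks the team skipped pull its all-track score down by half.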

Warm-up opens Jul 1, 2026

Take a baseline and beat it.

Every track ships at least one fully-trained baseline. The start-kit walks you from clone to submission.parquet in fifteen minutes. From there, it's a leaderboard fight.


bash · score + upload

$ neuralbench score bci --out submission.parquet
$ neuralbench upload submission.parquet --track bci
  ↳ uploaded · queued #1248