Meta’s NeuralBench: A Unified Benchmark for EEG-Based NeuroAI Models
Introduction
The intersection of deep learning and neuroscience—often called NeuroAI—has grown rapidly. Researchers now adapt self-supervised learning from language and vision to build brain foundation models that can be fine-tuned for tasks like seizure detection or decoding visual perception. Yet evaluating these models has remained chaotic. Different teams use custom preprocessing, different datasets, and report results on narrow tasks, making fair comparisons nearly impossible. Meta AI’s new NeuralBench framework aims to bring order to this field.

The Fragmented State of NeuroAI Benchmarks
Existing benchmarking efforts are scattered. For instance, the MOABB benchmark covers over 148 brain-computer interface datasets but evaluates only five downstream tasks. Other tools like EEG-Bench, EEG-FM-Bench, and AdaBrain-Bench each have limited scope: some cover only a few datasets, others a single task family. For modalities like MEG and fMRI, no systematic benchmark exists at all. This fragmentation means that claims of "generalizable" or "foundational" models often rest on cherry-picked tasks with no common reference point.
What NeuralBench Offers
The first release, NeuralBench v1.0 (NeuralBench-EEG), is the largest open benchmark of its kind. It includes:
- 36 downstream tasks covering clinical, cognitive, and BCI domains
- 94 curated datasets from public repositories
- 9,478 subjects and 13,603 hours of EEG recordings
- 14 deep learning architectures evaluated under a unified interface
All models are tested using the same preprocessing pipelines, train/validation splits, and evaluation metrics, enabling direct comparison across tasks.
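Fixed splits matter in EEG work because recordings from the same subject are highly correlated; letting one person's data appear in both train and test inflates scores. The following sketch shows a subject-wise split of the kind such a pipeline would enforce (a minimal illustration with hypothetical names, not NeuralBench's actual split code):

```python
import random

def subject_wise_split(subject_ids, frac_train=0.8, frac_val=0.1, seed=0):
    """Partition unique subjects (not individual recordings) into
    train/val/test sets, so no subject appears in more than one split."""
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_train = int(len(subjects) * frac_train)
    n_val = int(len(subjects) * frac_val)
    return (set(subjects[:n_train]),
            set(subjects[n_train:n_train + n_val]),
            set(subjects[n_train + n_val:]))

# 20 subjects -> 16 train, 2 val, 2 test, with no overlap
train, val, test = subject_wise_split([f"sub-{i:02d}" for i in range(20)])
```

Because the split is keyed on subject IDs rather than on recordings, every epoch from a given person lands in exactly one partition.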
How NeuralBench Works
The framework is built on three modular Python packages, each handling a distinct stage of the pipeline.
NeuralFetch: Dataset Acquisition
This package handles downloading and curating data from public repositories like OpenNeuro, DANDI, and NEMAR. It ensures data is consistently formatted and versioned, removing the headache of manual collection.
NeuralSet: Data Preparation
Once raw data is fetched, NeuralSet prepares it as PyTorch-ready dataloaders. It wraps existing neuroscience tools like MNE-Python and nilearn for preprocessing, and integrates with Hugging Face to extract stimulus embeddings for tasks involving images, speech, or text.
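To give a flavor of the kind of preparation NeuralSet automates (the real package wraps MNE-Python; this standalone sketch uses plain NumPy and hypothetical parameter names), a common first step is slicing a continuous recording into fixed-length windows for the model:

```python
import numpy as np

def window_eeg(recording, sfreq, win_s=2.0, stride_s=1.0):
    """Slice a continuous (channels, samples) EEG array into
    overlapping fixed-length windows, the usual model input."""
    win = int(win_s * sfreq)
    stride = int(stride_s * sfreq)
    _, n_samp = recording.shape
    starts = range(0, n_samp - win + 1, stride)
    return np.stack([recording[:, s:s + win] for s in starts])

rng = np.random.default_rng(0)
eeg = rng.standard_normal((32, 10 * 256))  # 32 channels, 10 s at 256 Hz
epochs = window_eeg(eeg, sfreq=256)
# epochs.shape == (9, 32, 512): nine overlapping 2-second windows
```

Arrays of this shape drop straight into a PyTorch `Dataset`, which is the form NeuralSet hands to the training stage.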
NeuralTrain: Model Training and Evaluation
NeuralTrain provides modular training code built on PyTorch-Lightning, Pydantic, and the exca execution and caching library. It standardizes hyperparameters, training loops, and evaluation metrics across all models.
Using NeuralBench
After installation via `pip install neuralbench`, the framework is driven through a command-line interface (CLI). Running a task involves three steps: download the data, prepare the cache, and execute the run. Every task is configured via a lightweight YAML file that specifies:
- The data source
- Train/validation/test splits
- Preprocessing steps
- Target processing
- Training hyperparameters
- Evaluation metrics
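Putting those pieces together, a task file might look roughly like the following (field names and values are illustrative, not the framework's actual schema):

```yaml
# hypothetical task config; real NeuralBench keys may differ
dataset:
  source: openneuro
  name: ds000000            # placeholder dataset ID
splits:
  strategy: subject_wise
  fractions: [0.8, 0.1, 0.1]
preprocessing:
  sfreq: 256
  bandpass: [0.5, 45.0]
target:
  type: classification
  labels: [seizure, background]
training:
  lr: 3.0e-4
  batch_size: 64
  max_epochs: 50
evaluation:
  metrics: [balanced_accuracy, auroc]
```

Because every run is fully described by one small file, swapping a model or dataset means editing the config rather than the pipeline code.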
This standardization ensures reproducibility and lets researchers easily extend the benchmark with new models or datasets.
Impact and Future Directions
By providing a unified evaluation framework, NeuralBench enables fair comparisons that were previously impossible. It will help identify which architectures truly generalize across EEG tasks, accelerate progress in clinical applications, and guide the development of next-generation brain-computer interfaces. Meta AI has open-sourced the framework under a permissive license, inviting the community to contribute new tasks, datasets, and models. In future releases, the team plans to extend coverage to MEG and fMRI, broadening the scope of NeuroAI benchmarking.
For more details, see the Meta AI research publication.