Details
- Meta announced TRIBE v2 (Trimodal Brain Encoder), a foundation model that predicts how the human brain responds to images, sounds, videos, and text. It outputs high-resolution estimates of fMRI brain activity without requiring actual brain scans.
- The model was trained on brain imaging data from over 700 healthy volunteers exposed to diverse media, including podcasts, films, images, and written content. This is a dramatic expansion from the original TRIBE (Algonauts 2025 award-winner), which used data from just four individuals.
- TRIBE v2 delivers a 70-fold resolution increase over comparable systems and enables zero-shot prediction: it forecasts brain responses for new individuals, languages, and tasks without retraining, and consistently outperforms standard neuroscience modeling approaches.
- The model feeds pretrained audio, video, and text embeddings into a transformer architecture to build universal representations shared across stimuli, tasks, and individuals. It learns these representations from fMRI data, which tracks blood flow as a proxy for neural activity.
- Meta is releasing the research paper, model weights, and code under a CC BY-NC license, along with an interactive demo website, for non-commercial research use. The release is positioned to reduce reliance on human subjects for preliminary hypothesis testing in neuroscience experiments.
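
To make the architecture bullet above more concrete, here is a toy sketch of a trimodal encoder: per-modality embeddings are projected into a shared space, mixed across time by one self-attention layer, and read out as a predicted response per brain location. This is not Meta's implementation. Every name, dimension, and weight below (`predict_bold`, `params`, the random matrices standing in for pretrained embeddings and learned weights) is a hypothetical placeholder, and the real model is far larger and trained on fMRI recordings.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols, scale=0.1):
    """Random stand-in for pretrained embeddings or learned weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over the time axis."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scale = math.sqrt(len(q[0]))
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v)


def predict_bold(audio, video, text, params):
    """Map three time-aligned feature streams to (time x vertices) predictions."""
    fused = None
    for name, feats in (("audio", audio), ("video", video), ("text", text)):
        z = matmul(feats, params[name])  # project modality into the shared space
        fused = z if fused is None else [
            [a + b for a, b in zip(r1, r2)] for r1, r2 in zip(fused, z)
        ]
    mixed = self_attention(fused, params["wq"], params["wk"], params["wv"])
    hidden = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(fused, mixed)]  # residual
    return matmul(hidden, params["readout"])


# Hypothetical sizes: 5 time steps, shared width 4, 8 brain locations.
rng = random.Random(0)
T, D, V = 5, 4, 8
dims = {"audio": 6, "video": 7, "text": 3}
params = {name: random_matrix(rng, d, D) for name, d in dims.items()}
params.update({key: random_matrix(rng, D, D) for key in ("wq", "wk", "wv")})
params["readout"] = random_matrix(rng, D, V)

audio = random_matrix(rng, T, dims["audio"], 1.0)
video = random_matrix(rng, T, dims["video"], 1.0)
text = random_matrix(rng, T, dims["text"], 1.0)

pred = predict_bold(audio, video, text, params)
print(len(pred), len(pred[0]))  # time steps, brain locations
```

In this sketch the readout weights are shared by everyone, which is one simple way to think about zero-shot prediction for a new individual: no subject-specific parameters need to be fitted before calling `predict_bold`.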
Impact
TRIBE v2 could accelerate neuroscience research by enabling computational simulation of brain responses, potentially reducing the time and cost of clinical trials and treatment development for neurological disorders that affect hundreds of millions of people. The open release of a multimodal brain encoding model trained on neuroimaging data at unprecedented scale also gives the AI research community insight into how biological neural networks process information. Those insights could inform future artificial neural network architectures and Meta's broader brain-computer interface roadmap within Reality Labs.