BrainJanus: A Unified Model for Understanding and Generation across Brain, Vision, and Language
Authors: Haitao Wu, Qirui Zhang, Zhouheng Yao, Shangquan Sun, Qihao Zheng, Mianxin Liu, Chi Zhang, Wanli Ouyang, Chunfeng Song, Changqing Zhang, Jiamin Wu
Published: 2026-06-29
Abstract
BrainJanus represents the first unified brain model integrating brain, vision, and language through a shared Omni space, enabling bidirectional mapping between neural activity and sensory stimuli via a tokenized representation and autoregressive architecture.
Modeling the bidirectional correspondence between external sensory stimuli and internal neural activity has emerged as a critical frontier in neuroscience. However, existing approaches predominantly treat brain encoding and decoding as isolated tasks, relying heavily on unimodal alignment and external priors while overlooking the brain's intrinsic nature as a multimodal integration system. To address these limitations, we propose BrainJanus, the first unified brain model that integrates brain, vision, and language within a single framework. Specifically, we introduce a Unified Brain Tokenizer to quantize continuous neural dynamics into discrete tokens aligned with visual and linguistic representations in a shared Omni space. Building on this, we utilize an All-in-One autoregressive architecture that leverages next-token prediction to enable seamless any-to-any generation, which encompasses image-to-brain and text-to-brain encoding, and brain-to-image and brain-to-text decoding. Extensive experiments demonstrate that BrainJanus achieves superior performance across diverse benchmarks. Furthermore, our framework exhibits zero-shot generalization and preserves interpretable biological topography, highlighting its potential as a general-purpose brain modeling paradigm. The code is available at https://github.com/HaitaoWuTJU/BrainJanus.
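The abstract does not describe how the tokenizer or the token sequences are actually built, so the following is only a toy sketch of the general idea under stated assumptions: nearest-neighbour vector quantization stands in for the Unified Brain Tokenizer, and a single id space with per-modality offsets stands in for the shared Omni space. All names, vocabulary sizes, and boundary tags (`quantize`, `build_sequence`, `<brain>`, and so on) are hypothetical and are not taken from the BrainJanus code.

```python
"""Toy sketch: nearest-codebook quantization plus one shared token vocabulary.

Every name and size below is an illustrative assumption, not the BrainJanus API.
"""

# Hypothetical vocabulary layout: each modality owns a contiguous id range.
TEXT_VOCAB = 100    # assumed number of text tokens
IMAGE_VOCAB = 64    # assumed number of image codebook entries
BRAIN_VOCAB = 32    # assumed number of brain codebook entries
IMAGE_OFFSET = TEXT_VOCAB
BRAIN_OFFSET = TEXT_VOCAB + IMAGE_VOCAB
# Hypothetical boundary tokens, placed after all modality ranges.
SPECIAL = {
    name: BRAIN_OFFSET + BRAIN_VOCAB + i
    for i, name in enumerate(["<text>", "<image>", "<brain>"])
}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features, codebook):
    """Map each continuous feature vector to the index of its nearest code."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(vec, codebook[k]))
        for vec in features
    ]


def build_sequence(source_tag, source_ids, target_tag, target_ids=()):
    """Lay out one any-to-any example as a single flat token sequence.

    A next-token predictor would be trained on the tokens after the target
    tag; at inference time target_ids is left empty and generated instead.
    """
    return [SPECIAL[source_tag], *source_ids, SPECIAL[target_tag], *target_ids]


# Deterministic toy codebook: code k is the 2-D point (k, k mod 3).
brain_codebook = [[float(k), float(k % 3)] for k in range(BRAIN_VOCAB)]

# Three fake "neural" feature vectors: slightly shifted copies of codes 5, 17, 5.
signal = [[c + 0.1 for c in brain_codebook[k]] for k in (5, 17, 5)]
brain_ids = [BRAIN_OFFSET + k for k in quantize(signal, brain_codebook)]

# Brain-to-text decoding prompt: the model would continue after <text>.
decode_prompt = build_sequence("<brain>", brain_ids, "<text>")

# Image-to-brain encoding example with the brain tokens as the target.
image_ids = [IMAGE_OFFSET + k for k in (3, 9)]
encode_example = build_sequence("<image>", image_ids, "<brain>", brain_ids)

print(decode_prompt)
print(encode_example)
```

Giving each modality its own id range means one softmax over the combined vocabulary can emit brain, image, or text tokens, which is one common way to let a single autoregressive model cover all four directions listed in the abstract. Whether BrainJanus uses this exact layout is not stated in the source.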
Community
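This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
* Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Discrete Diffusion (2026), https://huggingface.co/papers/2605.29591
* Visualizing the Invisible: Generative Visual Grounding Empowers Universal EEG Understanding in MLLMs (2026), https://huggingface.co/papers/2605.18172
* Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification (2026), https://huggingface.co/papers/2606.18249
* From Pixels to Words -- Towards Native One-Vision Models at Scale (2026), https://huggingface.co/papers/2605.28820
* MindAlign: Decoding Inner Speech from fMRI Signals via Multimodal Embedding Alignment under Limited Data (2026), https://huggingface.co/papers/2606.20696
* MindAlign: Bridging EEG, Vision, and Language for Zero-Shot Visual Decoding (2026), https://huggingface.co/papers/2605.24523
* ARM: An AutoRegressive Large Multimodal Model with Unified Discrete Representations (2026), https://huggingface.co/papers/2606.11188
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers
You can ask Librarian Bot for paper recommendations directly by tagging it in a comment: @librarian-bot recommend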
Cite arxiv.org/abs/2606.30319 in a model, dataset, or Space README.md to link it from this page.