Hugging Face Daily Papers · 7 min read

BrainJanus: A Unified Model for Understanding and Generation across Brain, Vision, and Language

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.30319

Published on Jun 29, 2026 · Submitted by HaitaoWu on Jul 1, 2026
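Authors: Haitao Wu, Qirui Zhang, Zhouheng Yao, Shangquan Sun, Qihao Zheng, Mianxin Liu, Chi Zhang, Wanli Ouyang, Chunfeng Song, Changqing Zhang, Jiamin Wu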

Abstract

BrainJanus represents the first unified brain model integrating brain, vision, and language through a shared Omni space, enabling bidirectional mapping between neural activity and sensory stimuli via a tokenized representation and autoregressive architecture.

Modeling the bidirectional correspondence between external sensory stimuli and internal neural activity has emerged as a critical frontier in neuroscience. However, existing approaches predominantly treat brain encoding and decoding as isolated tasks, relying heavily on unimodal alignment and external priors while overlooking the brain's intrinsic nature as a multimodal integration system. To address these limitations, we propose BrainJanus, the first unified brain model that integrates brain, vision, and language within a single framework. Specifically, we introduce a Unified Brain Tokenizer to quantize continuous neural dynamics into discrete tokens aligned with visual and linguistic representations in a shared Omni space. Building on this, we utilize an All-in-One autoregressive architecture that leverages next-token prediction to enable seamless any-to-any generation, which encompasses image-to-brain and text-to-brain encoding, and brain-to-image and brain-to-text decoding. Extensive experiments demonstrate that BrainJanus achieves superior performance across diverse benchmarks. Furthermore, our framework exhibits zero-shot generalization and preserves interpretable biological topography, highlighting its potential as a general-purpose brain modeling paradigm. The code is available at https://github.com/HaitaoWuTJU/BrainJanus.
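
The two ideas named in the abstract, quantizing continuous signals into discrete tokens and training on next-token prediction over mixed-modality sequences, can be illustrated with a toy sketch. This is not the BrainJanus implementation: the nearest-neighbour codebook lookup stands in for the Unified Brain Tokenizer, and the function names, the `<brain-to-text>`, `<sep>`, and `<eos>` tags, the codebook, and the sample values are all hypothetical choices made for illustration.

```python
def quantize(frames, codebook):
    """Map each continuous feature vector to the index of its nearest codebook entry."""
    tokens = []
    for frame in frames:
        # Squared Euclidean distance from this frame to every codebook vector.
        dists = [sum((f - c) ** 2 for f, c in zip(frame, code)) for code in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens


def build_sequence(task, source_tokens, target_tokens):
    """Lay out one sequence: task tag, source tokens, separator, target tokens, end tag."""
    return [f"<{task}>"] + source_tokens + ["<sep>"] + target_tokens + ["<eos>"]


def next_token_pairs(sequence):
    """Return (context, next token) pairs, the supervision used in next-token prediction."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


# Toy data (made up): a 4-entry codebook over 2-D features and three signal frames.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frames = [[0.1, -0.2], [0.9, 0.8], [0.2, 1.1]]

brain_tokens = [f"brain_{i}" for i in quantize(frames, codebook)]
sequence = build_sequence("brain-to-text", brain_tokens, ["a", "dog"])
pairs = next_token_pairs(sequence)
```

Swapping the source and target roles, for example a `text-to-brain` tag with text tokens first and brain tokens after the separator, gives the encoding direction with the same sequence layout, which is the sense in which a single autoregressive model could cover any-to-any generation.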

Community

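This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

- Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Discrete Diffusion (2026), https://huggingface.co/papers/2605.29591
- Visualizing the Invisible: Generative Visual Grounding Empowers Universal EEG Understanding in MLLMs (2026), https://huggingface.co/papers/2605.18172
- Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification (2026), https://huggingface.co/papers/2606.18249
- From Pixels to Words -- Towards Native One-Vision Models at Scale (2026), https://huggingface.co/papers/2605.28820
- MindAlign: Decoding Inner Speech from fMRI Signals via Multimodal Embedding Alignment under Limited Data (2026), https://huggingface.co/papers/2606.20696
- MindAlign: Bridging EEG, Vision, and Language for Zero-Shot Visual Decoding (2026), https://huggingface.co/papers/2605.24523
- ARM: An AutoRegressive Large Multimodal Model with Unified Discrete Representations (2026), https://huggingface.co/papers/2606.11188

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend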

Get this paper in your agent:

hf papers read 2606.30319
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
