Hugging Face Daily Papers · 6 min read

Beyond IID: How General Are Tabular Foundation Models, Really?

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.30410


Published on Jun 29 · Submitted by Lennart Purucker on Jun 30
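Authors: Lennart Purucker, Andrej Tschalzev, Nick Erickson, Gioia Blayer, David Holzmüller, Alan Arazi, Alexander Pfefferle, Mustafa Tajjar, Gaël Varoquaux, Frank Hutter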

Abstract

AI-generated summary: Tabular foundation models perform unevenly across data conditions; on complex, large-scale datasets, traditional methods still outperform them.

Foundation models for predictive machine learning on tabular data have recently gained significant traction in academia and industry. Research communities across disciplines are increasingly evaluating tabular foundation models on diverse datasets and tasks. However, these task- and discipline-specific evaluations remain largely inaccessible to model researchers because benchmark software and evaluation protocols are fragmented. As a result, model researchers rely on standard benchmarks, which are mostly defined for tasks where tabular foundation models already excel. The most challenging scenarios are excluded, limiting meaningful progress in the field by focusing on marginal improvements on IID data rather than on broader, more demanding challenges. To overcome this, we introduce BeyondArena, the first unified holistic benchmark for tabular data that supports diverse task types (IID, temporal, grouped), across sample size and feature dimensionality scales, with diverse feature types (with text, with high cardinality) from a broad range of disciplines. To enable unified benchmarking beyond standard benchmarks, we introduce Data Foundry, a Python framework and metadata schema for curating tabular datasets for predictive machine learning. Our results across 11 models and 142 curated datasets show that existing tabular foundation models excel on tiny- to medium-sized IID data, while traditional tree-based and deep learning models still dominate on non-IID, large, and high-dimensional datasets. BeyondArena guides model research for the most demanding challenges in tabular data, enabling progress towards truly foundational tabular models.

Community


How General Are Tabular Foundation Models, Really? 🤨

Our new benchmark, BeyondArena, from our paper "Beyond IID: How General Are Tabular Foundation Models, Really?" shows across 142 curated datasets:

→ Tabular foundation models excel on tiny-to-medium IID data🥊
→ Tree-based and deep learning models still dominate on non-IID, large, and high-dimensional data❗

BeyondArena extends TabArena-v0.1, which covers only small-to-medium IID data, to the hard cases: temporal splits, grouped data, very large or very high-dimensional tables, free-text columns, and high-cardinality features. BeyondArena lets model developers tackle the challenges that would make a model truly foundational. It spans:

  • Diverse tabular task types: IID, non-IID temporal, and non-IID grouped
  • A wide range of dataset sizes and feature dimensionalities (from 100 to 1 million rows, from 3 to 22k features)
  • Hard feature types: free text and high-cardinality categorical
  • Datasets drawn from a broad range of disciplines
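The three task types differ mainly in how rows may be divided between training and test data. The plain-Python sketch below illustrates that difference; the function names (`iid_split`, `temporal_split`, `grouped_split`) and the 80/20 fraction are hypothetical, and this is not BeyondArena's or TabArena's own splitting code, whose protocol (fractions, repeats, folds) may differ.

```python
import random
from typing import Hashable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train row indices, test row indices)


def iid_split(n_rows: int, test_frac: float = 0.2, seed: int = 0) -> Split:
    """IID: rows are treated as exchangeable, so a random split is fine."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_test = int(round(test_frac * n_rows))
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def temporal_split(timestamps: Sequence[float], test_frac: float = 0.2) -> Split:
    """Non-IID temporal: train on the past, test on the most recent rows."""
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    cut = len(order) - int(round(test_frac * len(order)))
    return order[:cut], order[cut:]


def grouped_split(groups: Sequence[Hashable], test_frac: float = 0.2, seed: int = 0) -> Split:
    """Non-IID grouped: hold out whole groups so no group is in both train and test."""
    unique = sorted(set(groups), key=repr)  # deterministic order before shuffling
    random.Random(seed).shuffle(unique)
    n_test_groups = max(1, int(round(test_frac * len(unique))))
    held_out = set(unique[:n_test_groups])
    train = [i for i, g in enumerate(groups) if g not in held_out]
    test = [i for i, g in enumerate(groups) if g in held_out]
    return train, test
```

For example, `temporal_split([5, 1, 4, 2, 3, 0, 9, 8, 7, 6], 0.2)` puts the two rows with the largest timestamps (indices 7 and 6) in the test set. The grouped variant matters when rows from the same entity (for instance, one patient or one store) would otherwise land on both sides of a random split and let a model score well by memorizing group-specific patterns. In practice, scikit-learn's `TimeSeriesSplit`, `GroupKFold`, and `GroupShuffleSplit` provide cross-validation versions of the same ideas.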

To make this kind of benchmarking rigorous, we also release Data Foundry, a Python framework and metadata schema for curating tabular datasets for predictive ML.
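To make "metadata schema" concrete, here is a hypothetical sketch of what a per-dataset record might capture: the task type, the column that defines a non-IID split, and a rough label for each feature column, including the hard feature types (free text, high cardinality). It is not Data Foundry's actual schema or API (the Data Foundry repository documents those), and the detection thresholds `HIGH_CARDINALITY_MIN_UNIQUE` and `FREE_TEXT_MIN_AVG_TOKENS` are arbitrary assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Literal, Optional, Sequence

# Hypothetical task-type labels mirroring the three BeyondArena task types.
TaskType = Literal["iid", "temporal", "grouped"]

# Hypothetical thresholds, chosen only for illustration.
HIGH_CARDINALITY_MIN_UNIQUE = 100
FREE_TEXT_MIN_AVG_TOKENS = 5.0


def profile_column(values: Sequence[object]) -> str:
    """Heuristically label a column as empty, numeric, free_text, high_cardinality, or categorical."""
    present = [v for v in values if v is not None]
    if not present:
        return "empty"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numeric"
    as_text = [str(v) for v in present]
    avg_tokens = sum(len(s.split()) for s in as_text) / len(as_text)
    if avg_tokens >= FREE_TEXT_MIN_AVG_TOKENS:
        return "free_text"
    if len(set(as_text)) >= HIGH_CARDINALITY_MIN_UNIQUE:
        return "high_cardinality"
    return "categorical"


@dataclass
class DatasetMetadata:
    """Hypothetical per-dataset record; field names are illustrative assumptions."""
    name: str
    task_type: TaskType
    target: str
    n_rows: int
    column_types: Dict[str, str] = field(default_factory=dict)
    time_column: Optional[str] = None
    group_column: Optional[str] = None

    def __post_init__(self) -> None:
        # A non-IID task is only reproducible if the column defining its split is recorded.
        if self.task_type == "temporal" and self.time_column is None:
            raise ValueError("temporal tasks need a time_column")
        if self.task_type == "grouped" and self.group_column is None:
            raise ValueError("grouped tasks need a group_column")


def describe_table(
    name: str,
    columns: Dict[str, Sequence[object]],
    target: str,
    task_type: TaskType,
    time_column: Optional[str] = None,
    group_column: Optional[str] = None,
) -> DatasetMetadata:
    """Profile every non-target column and bundle the result into a metadata record."""
    n_rows = len(columns[target])
    column_types = {c: profile_column(v) for c, v in columns.items() if c != target}
    return DatasetMetadata(name, task_type, target, n_rows, column_types, time_column, group_column)
```

The design choice worth noting is that the split-defining column is validated at construction time: a temporal or grouped task recorded without it could silently fall back to a random IID split, which is exactly the evaluation gap the benchmark targets.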

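How to engage with BeyondArena?

  • Read the paper: https://arxiv.org/abs/2606.30410
  • Run a benchmark with our code: https://tabarena.ai/code
  • Curate a new dataset with Data Foundry: https://github.com/TabArena/data-foundry
  • Check out our dataset curation notes: https://tabarena.github.io/data-foundry/
  • Investigate our datasets: https://huggingface.co/datasets/TabArena/BeyondArena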

Relation to TabArena: BeyondArena is our research work towards TabArena-v0.2, the next generation of tabular benchmarking, and it is already fully integrated into the TabArena ecosystem! Once we have added more datasets and baselines, BeyondArena will become TabArena-v0.2. Our recommendation: first make sure your model works well on TabArena-v0.1, then try it on BeyondArena!

