Hugging Face Daily Papers · June 30, 2026 · 4 min read

How Good Can Linear Models Be for Time-Series Forecasting?

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.




arxiv:2606.27282


Published on Jun 25, 2026 · Submitted by Lang Huang on Jun 30 · Sakana AI


Authors: Lang Huang, Jinglue Xu, Luke Darlow

Abstract

Research demonstrates that preprocessing optimizations, particularly in context length, normalization, and regularization, can significantly improve time-series forecasting accuracy more effectively than scaling model architectures.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Time-series forecasting research has been moving steadily toward larger architectures, from specialized transformers to general-purpose foundation models, on the assumption that capacity is what unlocks accuracy. We take the opposite position: most of the gap can be closed at far lower cost by tuning preprocessing rather than scaling models. We use Ridge regression as the testbed, since it has a closed-form solution and interpretable weights, which let the optimal hyperparameters be read off the search directly. We search over context length, local normalization, regularization, and augmentation on eight standard benchmarks and find three patterns. (1) Optimal lookback is strongly series-specific and often non-monotonic in forecast horizon, with fitted power-law exponents ranging from +0.46 on ETTm2 to -0.19 on Exchange and Traffic, challenging the convention that longer horizons need longer history. (2) Normalizing over a learned trailing fraction of the context, rather than its entirety, is almost universally preferred. (3) Series within the same dataset often disagree on hyperparameters; the optimal degree of cross-series sharing varies from fully shared to fully per-series. The resulting models beat prior linear forecasters on most dataset-horizon entries and exceed Transformer, MLP, and CNN baselines on six of eight benchmarks. The optimized hyperparameters also serve as a diagnostic on the data itself, revealing structures that larger models absorb silently into their learned parameters.
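The recipe described in the abstract (a closed-form Ridge forecaster whose preprocessing hyperparameters are searched rather than learned) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' SearchCast implementation: it treats each series independently, interprets "local normalization" as subtracting the mean of the trailing ⌈r·L⌉ points of each context window (r = 1 normalizes over the whole window; a very small r approaches last-value subtraction), omits augmentation, and scores a small grid over lookback L, fraction r, and penalty α on a hold-out split. All function names are hypothetical.

```python
import numpy as np

def make_windows(series, L, H):
    """Slice a 1-D series into (context, target) pairs of lengths L and H."""
    n = len(series) - L - H + 1
    X = np.stack([series[i:i + L] for i in range(n)])
    Y = np.stack([series[i + L:i + L + H] for i in range(n)])
    return X, Y

def trailing_mean(X, r):
    """Mean of the last ceil(r * L) points of each window (assumed normalization)."""
    k = max(1, int(np.ceil(r * X.shape[1])))
    return X[:, -k:].mean(axis=1, keepdims=True)

def fit_ridge(X, Y, r, alpha):
    """Closed-form Ridge: W = (Xn^T Xn + alpha I)^-1 Xn^T Yn on normalized windows."""
    mu = trailing_mean(X, r)
    Xn, Yn = X - mu, Y - mu
    A = Xn.T @ Xn + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, Xn.T @ Yn)  # shape (L, H)

def predict(W, X, r):
    """Normalize each context, apply W, then add the trailing mean back."""
    mu = trailing_mean(X, r)
    return (X - mu) @ W + mu

def grid_search(series, H, Ls, rs, alphas, val_frac=0.2):
    """Return (val_mse, L, r, alpha) with the lowest hold-out MSE."""
    split = int(len(series) * (1 - val_frac))
    best = None
    for L in Ls:
        Xtr, Ytr = make_windows(series[:split], L, H)
        # Validation targets start at `split`; their contexts may reach back into train.
        Xva, Yva = make_windows(series[split - L:], L, H)
        for r in rs:
            for alpha in alphas:
                W = fit_ridge(Xtr, Ytr, r, alpha)
                mse = float(np.mean((predict(W, Xva, r) - Yva) ** 2))
                if best is None or mse < best[0]:
                    best = (mse, L, r, alpha)
    return best

# Toy usage on a noisy period-24 signal (synthetic data, not a benchmark).
rng = np.random.default_rng(0)
t = np.arange(2000)
y = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(t.size)
print(grid_search(y, H=24, Ls=[24, 48, 96], rs=[0.1, 0.5, 1.0], alphas=[0.1, 1.0, 10.0]))
```

Because each candidate costs a single L×L linear solve, even an exhaustive sweep over (L, r, α) stays cheap, which is what makes searching preprocessing instead of training a larger model practical here.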

arXiv page: arxiv.org/abs/2606.27282 · Project page: https://sakanaai.github.io/SearchCast/ · GitHub: https://github.com/SakanaAI/SearchCast

Community

LayneH (paper submitter), about 23 hours ago

Linear models aren't weak; their preprocessing is under-tuned.
Closed-form Ridge, with only the preprocessing tuned (lookback L, normalization fraction r, ridge penalty α, and augmentation), matches or beats Transformer, MLP, and CNN baselines on 6 of 8 long-horizon benchmarks at orders-of-magnitude lower cost. There is also no universal lookback: the optimal lookback scales as L* ∝ H^b with forecast horizon H, with exponents b ranging from −0.19 to +0.46 across datasets.
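The scaling claim L* ∝ H^b can be checked by regressing log L* on log H, because the slope of that line is the exponent b. Below is a minimal sketch, assuming the best lookback L* at each horizon H has already been found by a hyperparameter search. The horizons 96, 192, 336, and 720 are the usual long-horizon settings; the L* values are synthetic, generated with b = 0.3 so the fit can be verified, and are not results from the paper. `fit_lookback_power_law` is a hypothetical helper name.

```python
import numpy as np

def fit_lookback_power_law(horizons, best_lookbacks):
    """Fit L* = c * H**b by least squares in log-log space; return (c, b)."""
    # np.polyfit returns coefficients highest degree first: [slope, intercept].
    b, log_c = np.polyfit(np.log(horizons), np.log(best_lookbacks), deg=1)
    return float(np.exp(log_c)), float(b)

H = np.array([96, 192, 336, 720])
L_star = 30.0 * H ** 0.3  # synthetic lookbacks generated with c = 30, b = 0.3
c, b = fit_lookback_power_law(H, L_star)
print(f"c = {c:.2f}, b = {b:.2f}")  # c = 30.00, b = 0.30
```

A positive b (such as the +0.46 reported for ETTm2) means longer horizons favor longer context, while a negative b (such as the −0.19 reported for Exchange and Traffic) means longer horizons favor shorter context. Since L* is picked from a discrete grid in practice, real fits will be noisier than this exact example.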




Get this paper in your agent:

hf papers read 2606.27282

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 0
Datasets citing this paper: 0
Spaces citing this paper: 0

To link a model, dataset, or Space from this page, cite arxiv.org/abs/2606.27282 in its README.md.

Collections including this paper: 1

Discussion (0): no comments yet.

