How Good Can Linear Models Be for Time-Series Forecasting?
Authors: Lang Huang, Jinglue Xu, Luke Darlow
Organization: Sakana AI
Published: 2026-06-25 (arXiv: 2606.27282)
Project page: https://sakanaai.github.io/SearchCast/
Code: https://github.com/SakanaAI/SearchCast
Abstract
Tuning preprocessing, particularly context length, normalization, and regularization, can improve time-series forecasting accuracy at far lower cost than scaling model architectures.
Time-series forecasting research has been moving steadily toward larger architectures, from specialized transformers to general-purpose foundation models, on the assumption that capacity is what unlocks accuracy. We take the opposite position: most of the gap can be closed at far lower cost by tuning preprocessing rather than scaling models. We use Ridge regression as the testbed, since it has a closed-form solution and interpretable weights, which let the optimal hyperparameters be read off the search directly. We search over context length, local normalization, regularization, and augmentation on eight standard benchmarks and find three patterns. (1) Optimal lookback is strongly series-specific and often non-monotonic in forecast horizon, with fitted power-law exponents ranging from +0.46 on ETTm2 to -0.19 on Exchange and Traffic, challenging the convention that longer horizons need longer history. (2) Normalizing over a learned trailing fraction of the context, rather than its entirety, is almost universally preferred. (3) Series within the same dataset often disagree on hyperparameters; the optimal degree of cross-series sharing varies from fully shared to fully per-series. The resulting models beat prior linear forecasters on most dataset-horizon entries and exceed Transformer, MLP, and CNN baselines on six of eight benchmarks. The optimized hyperparameters also serve as a diagnostic on the data itself, revealing structures that larger models absorb silently into their learned parameters.
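The setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`make_windows`, `trailing_stats`, `fit_ridge`, `predict`) are hypothetical, the normalization is assumed to be a z-score using the mean and standard deviation of the last `r` fraction of each context window, and augmentation and cross-series sharing are left out. Here `L` is the lookback, `H` the forecast horizon, `r` the trailing fraction, and `alpha` the Ridge penalty.

```python
import numpy as np

def make_windows(series, L, H):
    """Slice a 1-D series into (context, target) pairs of lengths L and H."""
    n = len(series) - L - H + 1
    X = np.stack([series[i:i + L] for i in range(n)])
    Y = np.stack([series[i + L:i + L + H] for i in range(n)])
    return X, Y

def trailing_stats(X, r, eps=1e-8):
    """Mean and std over the last ceil(r * L) steps of each context row."""
    k = max(1, int(np.ceil(r * X.shape[1])))
    tail = X[:, -k:]
    mu = tail.mean(axis=1, keepdims=True)
    sigma = tail.std(axis=1, keepdims=True) + eps
    return mu, sigma

def fit_ridge(series, L, H, r, alpha):
    """Closed-form Ridge on locally normalized windows; returns W of shape (L, H)."""
    X, Y = make_windows(np.asarray(series, dtype=float), L, H)
    mu, sigma = trailing_stats(X, r)
    Xn, Yn = (X - mu) / sigma, (Y - mu) / sigma
    # W = (X^T X + alpha * I)^{-1} X^T Y, solved without forming the inverse.
    return np.linalg.solve(Xn.T @ Xn + alpha * np.eye(L), Xn.T @ Yn)

def predict(context, W, r):
    """Normalize one context window, apply W, and undo the normalization."""
    x = np.asarray(context, dtype=float)[None, :]
    mu, sigma = trailing_stats(x, r)
    return (((x - mu) / sigma) @ W * sigma + mu)[0]

# Synthetic usage example (assumed data: a sine wave with period 24).
t = np.arange(500)
series = np.sin(2 * np.pi * t / 24)
L, H, r, alpha = 48, 12, 0.5, 1e-3
W = fit_ridge(series[:400], L, H, r, alpha)
forecast = predict(series[400 - L:400], W, r)
truth = series[400:400 + H]
```

Because the fit is a single linear solve, a search over `L`, `r`, and `alpha` only needs to repeat this step per candidate setting and compare validation error, which is what makes an exhaustive preprocessing search affordable.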
Community

Linear models aren't weak; their preprocessing is under-tuned.
Closed-form Ridge, with only preprocessing tuned (lookback L, trailing normalization fraction r, regularization α, augmentation), matches or beats Transformer, MLP, and CNN baselines on 6/8 long-horizon benchmarks at orders-of-magnitude lower cost. There is also no universal lookback: the optimal L* ∝ H^b, with exponents ranging from −0.19 to +0.46 across datasets.
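One common way to estimate the exponent b in L* ∝ H^b is a least-squares line fit in log-log space. The sketch below assumes that approach; the source does not state how the exponents were fitted. The function name `fit_power_law`, the horizon values, and the lookback values are all illustrative: the lookbacks are generated synthetically from a known exponent and are not results from the paper.

```python
import numpy as np

def fit_power_law(horizons, best_lookbacks):
    """Fit log L* = log a + b * log H by least squares; returns (a, b)."""
    log_h = np.log(np.asarray(horizons, dtype=float))
    log_l = np.log(np.asarray(best_lookbacks, dtype=float))
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    b, log_a = np.polyfit(log_h, log_l, deg=1)
    return float(np.exp(log_a)), float(b)

# Synthetic check: lookbacks generated from an assumed law L* = 50 * H^0.46.
horizons = np.array([96, 192, 336, 720])
lookbacks = 50.0 * horizons ** 0.46
a, b = fit_power_law(horizons, lookbacks)
```

A negative fitted b means the best lookback shrinks as the horizon grows, which is the case the comment highlights for some datasets.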