Hugging Face Daily Papers · June 30, 2026 · 3 min read

One Model, Many Latencies: Universal Speech Enhancement for Diverse Real-Time Applications

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


Model: https://huggingface.co/nvidia/Real-time_RE-USE

HF Space interactive demo: https://huggingface.co/spaces/nvidia/Real-time_RE-USE

arxiv:2606.25621

Published on Jun 24

Submitted by Szu-Wei on Jun 30 · NVIDIA

Authors: Szu-Wei Fu, Rong Chao, Xuesong Yang, Sung-Feng Huang, Ante Jukić, Yu Tsao, Yu-Chiang Frank Wang

Abstract

A universal speech enhancement model with configurable algorithmic and computational latency controls using parallel convolutions and early-exit mechanisms.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Different real-time speech applications impose distinct latency budgets, often requiring separately trained enhancement models for each scenario. In this paper, we propose a one-for-all, real-time universal speech enhancement model that provides explicit control over both algorithmic and computational latency. Algorithmic latency is flexibly adjusted via configurable look-ahead frames. To avoid learning inefficiency caused by varying padding configurations, we introduce parallel convolutional layers corresponding to different look-ahead settings. Computational latency is controlled through an early-exit mechanism, enabling inference at different network depths. To narrow the performance gap between specialized and flexible models, we propose a two-stage training strategy with a shared-to-multiple decoder transition. Overall, the proposed framework enables a single model to be deployed across diverse latency budgets without retraining separate models.
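
The two latency controls described in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: the class `FlexibleLatencySketch`, the function `conv1d_lookahead`, the scalar "decoders" in `exit_gains`, and all kernel values are hypothetical, and the sketch assumes that look-ahead is applied only in the first layer while later layers are causal. It shows one way that a separate (parallel) kernel per look-ahead setting keeps the padding fixed for each kernel, and how an early exit truncates the layer stack.

```python
from typing import Dict, List, Sequence


def conv1d_lookahead(frames: Sequence[float], kernel: Sequence[float],
                     lookahead: int) -> List[float]:
    """Convolve over time; output frame t sees `lookahead` future frames."""
    k = len(kernel)
    if not 0 <= lookahead <= k - 1:
        raise ValueError("lookahead must be between 0 and kernel_size - 1")
    past = k - 1 - lookahead  # number of past frames each output can see
    # Zero-pad so the output has the same length as the input.
    padded = [0.0] * past + list(frames) + [0.0] * lookahead
    return [sum(w * padded[t + i] for i, w in enumerate(kernel))
            for t in range(len(frames))]


class FlexibleLatencySketch:
    """Toy model: parallel input kernels per look-ahead, plus early exits.

    Assumption (not from the paper): look-ahead is used only in the first
    layer, and each exit has its own scalar gain standing in for a decoder.
    """

    def __init__(self, input_kernels: Dict[int, List[float]],
                 causal_kernels: List[List[float]],
                 exit_gains: List[float]) -> None:
        if len(exit_gains) != len(causal_kernels) + 1:
            raise ValueError("need one exit gain per possible exit depth")
        self.input_kernels = input_kernels    # one kernel per look-ahead
        self.causal_kernels = causal_kernels  # shared causal layers
        self.exit_gains = exit_gains          # hypothetical per-exit decoder

    def forward(self, frames: Sequence[float], lookahead: int,
                exit_depth: int) -> List[float]:
        if lookahead not in self.input_kernels:
            raise ValueError("no branch for this look-ahead setting")
        if not 0 <= exit_depth <= len(self.causal_kernels):
            raise ValueError("exit_depth out of range")
        # Algorithmic latency: pick the branch matching the look-ahead.
        x = conv1d_lookahead(frames, self.input_kernels[lookahead], lookahead)
        # Computational latency: run only the first `exit_depth` layers.
        for kernel in self.causal_kernels[:exit_depth]:
            x = conv1d_lookahead(x, kernel, 0)
        return [self.exit_gains[exit_depth] * v for v in x]


model = FlexibleLatencySketch(
    input_kernels={0: [1.0, 1.0, 1.0], 1: [1.0, 1.0, 1.0]},
    causal_kernels=[[0.5, 0.5], [0.5, 0.5]],
    exit_gains=[1.0, 1.0, 1.0],
)
impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
causal_out = model.forward(impulse, lookahead=0, exit_depth=0)
lookahead_out = model.forward(impulse, lookahead=1, exit_depth=0)
print(causal_out)     # [0.0, 0.0, 1.0, 1.0, 1.0]
print(lookahead_out)  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

With an impulse at frame 2, the causal branch responds only from frame 2 onward, while the one-frame look-ahead branch already responds at frame 1. Raising `exit_depth` adds more layers and therefore more computation per frame without changing the look-ahead.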