Hugging Face Daily Papers · June 30, 2026 · 3 min read

LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Accepted by ECCV 2026, project page: https://live-edit.github.io/

arxiv:2606.26740

Published on Jun 25

Submitted by Jack Ma on Jun 30

#2 Paper of the day

Tsinghua University

Authors: Xinyu Wang, Chongbo Zhao, Fangneng Zhan, Yue Ma

Abstract

A novel streaming video editing framework enables causal, frame-by-frame editing with stable long-horizon preservation and real-time responsiveness through a three-stage distillation pipeline and AR-oriented mask cache.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct
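
In the summary above, "causal, frame-by-frame editing" means each frame may draw only on itself and on frames already seen, never on future ones. The toy sketch below illustrates why that matters for streaming. It is a generic illustration, not LiveEdit's architecture: it assumes scalar tokens, a single attention head, and a block-causal frame mask built by the hypothetical helpers `frame_mask` and `toy_attention`. It checks that with a causal mask the outputs for already-processed frames stay the same when a new frame arrives, whereas with a bidirectional mask they change, so a bidirectional model would have to revisit past frames.

```python
import math
from typing import List


def frame_mask(num_frames: int, tokens_per_frame: int, causal: bool) -> List[List[bool]]:
    """mask[q][k] is True when query token q may attend to key token k.

    Tokens are grouped frame by frame. Bidirectional: every token sees every token.
    Block-causal (assumed layout): a token sees its own frame and earlier frames only.
    """
    n = num_frames * tokens_per_frame
    return [
        [(not causal) or (k // tokens_per_frame <= q // tokens_per_frame) for k in range(n)]
        for q in range(n)
    ]


def toy_attention(tokens: List[float], mask: List[List[bool]]) -> List[float]:
    """Single-head attention on scalar tokens: score = q * k, value = k."""
    out = []
    for q, row in enumerate(mask):
        allowed = [k for k, ok in enumerate(row) if ok]
        scores = [tokens[q] * tokens[k] for k in allowed]
        m = max(scores)  # subtract the max score for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * tokens[k] for w, k in zip(weights, allowed)) / total)
    return out


def prefix_stable(causal: bool) -> bool:
    """Does appending a new frame leave the outputs of earlier frames unchanged?"""
    tpf = 2                                # tokens per frame
    past = [0.1, 0.5, -0.3, 0.8]           # two frames already processed
    extended = past + [1.2, -0.7]          # a third frame arrives
    before = toy_attention(past, frame_mask(2, tpf, causal))
    after = toy_attention(extended, frame_mask(3, tpf, causal))
    return all(math.isclose(a, b) for a, b in zip(before, after[: len(past)]))


print("causal prefix stable:", prefix_stable(causal=True))          # True
print("bidirectional prefix stable:", prefix_stable(causal=False))  # False
```

The block-causal layout (full attention within a frame, causal across frames) is an assumption chosen for clarity; the paper's exact attention pattern may differ.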

Streaming video editing has made rapid progress, yet practical deployment is still limited by two core issues: keeping backgrounds and non-edited regions stable over time, and achieving the low latency required for real-time interactive scenarios. Meanwhile, recent streaming video generation methods are designed mostly for synthesis and cannot be applied directly to editing, which imposes strict preservation requirements and demands region-specific control. In this work, we present a novel streaming video editing framework that performs causal, frame-by-frame editing with strong content preservation and real-time responsiveness. Our key design is a three-stage distillation pipeline that progressively transfers editing capability from a powerful bidirectional foundation model to an efficient unidirectional streaming editor, enabling stable long-horizon edits without sacrificing visual fidelity. To further support real-time deployment, we introduce an AR-oriented mask cache that reuses region-related computation across frames, substantially reducing redundant processing and accelerating inference. Finally, we establish a dedicated benchmark for streaming video editing. Extensive evaluations show that our method achieves state-of-the-art visual quality among streaming baselines while raising inference speed to 12.66 FPS, making it suitable for interactive and augmented reality applications.
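
The abstract does not spell out how the AR-oriented mask cache works. One plausible reading of "reuses region-related computation across frames" is sketched below, purely as an assumption: a hypothetical `RegionCache` keeps one feature per spatial tile, always recomputes tiles inside the edit mask, and reuses the previous feature for tiles outside the mask whose content is unchanged within a tolerance `tol`. The `double` function is a hypothetical stand-in for an expensive network pass.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Tile = Tuple[float, ...]


class RegionCache:
    """Hypothetical per-tile feature cache (illustrative assumption, not LiveEdit's code).

    Tiles inside the edit mask are always recomputed. Tiles outside it reuse the
    cached feature when their content matches the last computed input within
    `tol`, so a static background is processed only once.
    """

    def __init__(self, feature_fn: Callable[[Tile], Tile], tol: float = 1e-3) -> None:
        self.feature_fn = feature_fn
        self.tol = tol
        self.inputs: Dict[int, Tile] = {}    # last input each cached feature came from
        self.features: Dict[int, Tile] = {}  # cached feature per tile index
        self.computed = 0                    # number of expensive feature_fn calls
        self.reused = 0                      # number of cache hits

    def _unchanged(self, idx: int, tile: Tile) -> bool:
        prev = self.inputs.get(idx)
        return prev is not None and len(prev) == len(tile) and all(
            abs(a - b) <= self.tol for a, b in zip(prev, tile)
        )

    def process(self, frame: Sequence[Tile], edit_mask: Set[int]) -> List[Tile]:
        out = []
        for idx, tile in enumerate(frame):
            if idx not in edit_mask and self._unchanged(idx, tile):
                out.append(self.features[idx])   # reuse: background tile did not change
                self.reused += 1
            else:
                feat = self.feature_fn(tile)     # recompute: edited or changed tile
                self.inputs[idx] = tile
                self.features[idx] = feat
                self.computed += 1
                out.append(feat)
        return out


def double(tile: Tile) -> Tile:
    # Stand-in for an expensive network pass over one tile.
    return tuple(2.0 * x for x in tile)


cache = RegionCache(double)
frame1 = [(0.0, 0.1), (0.5, 0.5), (0.2, 0.2), (0.9, 0.8)]
frame2 = [(0.0, 0.1), (0.6, 0.4), (0.2, 0.2), (0.9, 0.8)]  # only tile 1 changed
edit_mask = {1}  # tile 1 is the region being edited

cache.process(frame1, edit_mask)          # cold cache: all 4 tiles computed
feats2 = cache.process(frame2, edit_mask)
print(cache.computed, cache.reused)       # 5 3
```

In this toy run the second frame needs one expensive call instead of four. For scale, the reported 12.66 FPS corresponds to a per-frame budget of 1000 / 12.66 ≈ 79 ms, the window any reuse scheme of this kind has to fit within.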