Hugging Face Daily Papers · June 30, 2026 · 3 min read

SAM2Matting: Generalized Image and Video Matting

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.




arxiv:2606.27339


Published on Jun 25 · Submitted by Henghui Ding (FudanCVL) on Jun 30


Authors: Ruiqi Shen, Guangquan Jie, Chang Liu, Henghui Ding

Abstract

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): SAM2Matting advances video matting by decoupling tracking from matting. Its tracker-to-matting framework augments a foundational tracker with a region-proposal bridge and dedicated matting heads.

Despite impressive advances in image matting, video matting remains challenging due to the inherent gap between high-level tracking, which requires frame-wise understanding, and low-level matting, which focuses on extremely fine-grained details. Existing methods attempt to bridge this gap with expensive, narrowly scoped video matting datasets, which may limit out-of-domain generalization and compromise tracking robustness. We rethink the paradigm with SAM2Matting, a tracker-to-matting framework that extends video object segmentation (VOS) trackers to high-fidelity video matting. Specifically, it decouples the task by enhancing a foundational tracker (e.g., SAM2, SAM3) with a region-proposal bridge and dedicated matting heads, so that the uncompromised tracker handles temporal consistency while the matting components resolve fine-grained details. Notably, despite being trained only on images, SAM2Matting establishes new state-of-the-art performance on video matting, supports diverse prompt types, maintains strong temporal consistency, and demonstrates robust generalization across both human-centric and in-the-wild scenarios.
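The decoupling described in the abstract can be illustrated with a toy sketch. This is not the SAM2Matting implementation: the abstract does not specify how the region-proposal bridge or the matting heads work, and in the paper they are learned components. Here the tracker output is assumed to be a hard binary mask per frame, the hypothetical `propose_uncertain_region` flags pixels on the mask boundary, and the hypothetical `neighborhood_mean_head` stands in for a matting head by averaging the mask over a small neighbourhood. All function and type names are invented for illustration.

```python
from typing import Callable, Iterator, List, Tuple

Mask = List[List[int]]      # binary tracker mask for one frame, 1 = foreground
Alpha = List[List[float]]   # alpha matte for one frame, values in [0, 1]
MattingHead = Callable[[Mask, int, int], float]  # hypothetical per-pixel head


def neighborhood(mask: Mask, r: int, c: int) -> Iterator[Tuple[int, int]]:
    """Yield pixel (r, c) and its in-bounds 4-neighbours."""
    h, w = len(mask), len(mask[0])
    for dr, dc in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < h and 0 <= cc < w:
            yield rr, cc


def propose_uncertain_region(mask: Mask) -> Mask:
    """Assumed stand-in for the bridge: flag pixels whose neighbourhood
    contains both foreground and background, i.e. the mask boundary."""
    return [
        [1 if len({mask[rr][cc] for rr, cc in neighborhood(mask, r, c)}) == 2 else 0
         for c in range(len(mask[0]))]
        for r in range(len(mask))
    ]


def neighborhood_mean_head(mask: Mask, r: int, c: int) -> float:
    """Assumed stand-in for a matting head: mean mask value around (r, c)."""
    values = [mask[rr][cc] for rr, cc in neighborhood(mask, r, c)]
    return sum(values) / len(values)


def matte_frame(mask: Mask, head: MattingHead) -> Alpha:
    """Keep the tracker's hard decision outside the proposed region and
    let the head decide alpha only inside it."""
    region = propose_uncertain_region(mask)
    return [
        [head(mask, r, c) if region[r][c] else float(mask[r][c])
         for c in range(len(mask[0]))]
        for r in range(len(mask))
    ]


def matte_video(tracked_masks: List[Mask], head: MattingHead) -> List[Alpha]:
    """Temporal association is assumed to come from the tracker's masks;
    each frame is then matted independently."""
    return [matte_frame(mask, head) for mask in tracked_masks]


# Two one-row "frames" standing in for masks produced by a VOS tracker.
tracked_masks = [[[0, 0, 1, 1, 1]], [[0, 1, 1, 1, 0]]]
alphas = matte_video(tracked_masks, neighborhood_mean_head)
```

The point of the sketch is the division of labour: the per-frame matting step never has to associate objects across frames, which is consistent with the abstract's claim that the matting components can be trained on images alone.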

arXiv: https://arxiv.org/abs/2606.27339 · Project page: https://henghuiding.com/SAM2Matting/ · GitHub: https://github.com/FudanCVL/SAM2Matting (27 stars)

Community

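HenghuiDing (paper submitter) · about 10 hours ago

Demo video: https://cdn-uploads.huggingface.co/production/uploads/67ff29ecbf6889a333c69c7a/HYwwqh0Vis7CrjhwONZhc.mp4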


Get this paper in your agent:

hf papers read 2606.27339

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
