Has anyone tried this approach with Fast Byte Latent Transformers? [R]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
Paper referred to: https://arxiv.org/pdf/2412.09871v1
Has anyone swapped the transformer in the entropy model here for a Mamba model? What would need to change?
Just an ML fresher asking a genuine question, since Mamba is popular and saves compute (O(n) in sequence length, versus quadratic for self-attention).
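To make the question concrete, here is a rough sketch of how I understand the entropy-based patching in the paper: a small byte-level model predicts the next byte, and a new patch starts wherever the entropy of that prediction exceeds a global threshold. The names `NextByteModel`, `patch_starts`, and `toy_model`, as well as the threshold value, are made up for illustration and are not from the paper's code. The point is that the patching step only sees a next-byte distribution, so in principle any causal byte-level model (a transformer or a Mamba-style SSM) could sit behind that interface.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface (an assumption, not the paper's API): given the
# bytes seen so far, return a distribution over the 256 possible next bytes.
NextByteModel = Callable[[bytes], Sequence[float]]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def patch_starts(data: bytes, model: NextByteModel, threshold: float) -> List[int]:
    """Return the indices where a new patch begins (global-threshold rule)."""
    starts = [0] if data else []
    for i in range(1, len(data)):
        # Entropy of the prediction for byte i, given bytes 0..i-1.
        # This sketch re-calls the model per prefix for clarity; a real
        # model would score the whole sequence in one causal pass.
        if entropy(model(data[:i])) > threshold:
            starts.append(i)
    return starts


def toy_model(prefix: bytes) -> Sequence[float]:
    """Stand-in entropy model: uncertain after a space, confident otherwise."""
    if prefix[-1:] == b" ":
        return [1.0 / 256] * 256  # uniform, so maximum entropy
    probs = [0.01 / 255] * 256
    probs[prefix[-1]] = 0.99  # sharply peaked, so low entropy
    return probs


data = b"ab cd ef"
starts = patch_starts(data, toy_model, threshold=1.0)
patches = [data[s:e] for s, e in zip(starts, starts[1:] + [len(data)])]
print(starts, patches)
```

If this reading is right, swapping in Mamba would only change how the next-byte distribution is computed, while the thresholding logic stays the same.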
Thanks in advance!