r/MachineLearning

Has anyone tried this approach with Fast Byte Latent Transformers? [R]

Mirrored from r/MachineLearning for archival readability.

Paper referred to: https://arxiv.org/pdf/2412.09871v1

Has anyone replaced the transformer in this paper's entropy model (the small byte-level model that decides where patch boundaries go) with a Mamba model? What changes might that bring?
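To make the question concrete, here is a minimal sketch of entropy-based patching in the spirit of the paper's global-threshold rule: a new patch starts at byte i when the entropy of the entropy model's predicted distribution for byte i exceeds a threshold. The "entropy model" here is a toy bigram counter standing in for the real small byte-level LM; the function names and the threshold value are hypothetical, chosen for illustration only.

```python
import math
from collections import Counter, defaultdict


def _entropy(counts: Counter) -> float:
    """Shannon entropy (nats) of the distribution implied by raw counts."""
    total = sum(counts.values())
    return sum((c / total) * -math.log(c / total) for c in counts.values())


def next_byte_entropies(data: bytes) -> list[float]:
    """Entropy of a toy model's prediction for each byte given the previous bytes.

    Hypothetical stand-in: a bigram model fit on `data` itself. Any causal model
    returning p(x_i | x_<i) (a transformer, or a Mamba model) could replace it.
    """
    if not data:
        return []
    following = defaultdict(Counter)
    for prev, nxt in zip(data, data[1:]):
        following[prev][nxt] += 1
    entropies = [_entropy(Counter(data))]  # position 0: unigram fallback
    for i in range(1, len(data)):
        entropies.append(_entropy(following[data[i - 1]]))
    return entropies


def entropy_patches(data: bytes, threshold: float) -> list[bytes]:
    """Split bytes into patches, starting a new patch where entropy > threshold."""
    if not data:
        return []
    ent = next_byte_entropies(data)
    starts = [0] + [i for i in range(1, len(data)) if ent[i] > threshold]
    ends = starts[1:] + [len(data)]
    return [data[s:e] for s, e in zip(starts, ends)]


print(entropy_patches(b"abac", 0.5))  # [b'a', b'ba', b'c']
```

The patching rule only consumes per-byte predicted distributions, so swapping the entropy model's architecture would change how (and how well, and at what cost) those distributions are produced, not the boundary rule itself.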

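Just an ML fresher asking a genuine question, since Mamba is increasingly popular and saves compute: its cost grows linearly (O(n)) with sequence length, versus quadratically for full attention.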
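The O(n) point can be illustrated with a toy one-dimensional selective state-space recurrence, loosely modeled on Mamba's discretized update (decay exp(delta * a), input-dependent step size delta). This is a simplification under stated assumptions, not Mamba's actual layer: real Mamba uses vector states, learned projections, and a hardware-aware parallel scan. The parameter names and defaults below are hypothetical.

```python
import math


def selective_scan(xs: list[float], w_delta: float = 1.0, a: float = -1.0,
                   b: float = 1.0, c: float = 1.0) -> list[float]:
    """Toy scalar selective recurrence (hypothetical parameters):

        delta_t = softplus(w_delta * x_t)          # input-dependent step size
        h_t     = exp(delta_t * a) * h_{t-1} + delta_t * b * x_t
        y_t     = c * h_t
    """
    h = 0.0
    ys = []
    for x in xs:  # constant work per position -> O(n) total
        delta = math.log1p(math.exp(w_delta * x))  # softplus, always > 0
        h = math.exp(delta * a) * h + delta * b * x
        ys.append(c * h)
    return ys


def causal_attention_pairs(n: int) -> int:
    """Number of query-key scores full causal attention computes: O(n^2)."""
    return n * (n + 1) // 2


print(causal_attention_pairs(1000))  # 500500 scores vs. 1000 recurrence steps
```

Because each output depends only on the running state, the recurrence is causal and can process bytes one at a time with fixed memory per step, which is the property that makes it a plausible drop-in for a next-byte entropy model.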

Thanks in advance!

submitted by /u/SoloLeveller07