r/LocalLLaMA · · 3 min read

SenseNova-U1-8b-MoT-Infographic-V2 (released yesterday) - An open source SOTA beast for infographic design and image editing.

I’m pretty jaded like most of y’all. I don’t really get excited by new models much anymore. Last few weeks have been kinda meh to be honest. Monday, I stumbled upon SenseNova’s Mixture of Transformers models and they seem kinda like a different animal than other typical image gen models.
I managed to get a couple of them running and I have to say that this series of models is impressing me when it comes to generating and editing dense infographics.

I haven’t seen anything except for Ideogram 4 get close to what these can make in terms of infographics. Ideogram 4 is great, but its license sucks, while SenseNova is Apache 2, so that puts SenseNova over the top when going head-to-head in my book.

Now I know, I know, the latest SenseNova-U1 version 2 is not in GGUF form yet, but that’s not a problem. What I did, and what you can do, is tell your favorite coding harness to “take the SenseNova model and wrap it in a FastAPI wrapper and serve it as both an OpenAI-compatible image generation endpoint and an image editing endpoint in a single Docker container”, let that cook for a while and boom, Bob’s your uncle. In a bit you’ll have an image generation API endpoint that you can point your favorite chat client to as an image generator / editor. This will let you skip all that ComfyUI spaghetti-looking interface bullshit. I’ve never been a fan of ComfyUI and don’t think I ever will be. Change my mind.
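For anyone wondering what “OpenAI-compatible” means for the wrapper, here is a minimal sketch of the response body a client expects back from a POST to /v1/images/generations. This is not the code my harness wrote. `run_model` is a hypothetical placeholder that returns dummy bytes instead of calling the real SenseNova inference code, and `parse_size` and `images_generations` are names made up for illustration. It only uses the standard library, so it runs without a GPU.

```python
import base64
import time


def parse_size(size: str) -> tuple[int, int]:
    """Turn an OpenAI-style size string such as "1024x1024" into (width, height)."""
    width, height = size.lower().split("x")
    return int(width), int(height)


def run_model(prompt: str, width: int, height: int) -> bytes:
    """Hypothetical placeholder: swap in the real SenseNova inference call here.

    It returns dummy bytes so the sketch runs without model weights.
    """
    return f"fake-png:{width}x{height}:{prompt}".encode("utf-8")


def images_generations(body: dict) -> dict:
    """Build the JSON body an OpenAI-style image generation endpoint returns."""
    width, height = parse_size(body.get("size", "1024x1024"))
    count = int(body.get("n", 1))
    data = []
    for _ in range(count):
        png_bytes = run_model(body["prompt"], width, height)
        # Clients that ask for base64 output read each image from "b64_json".
        data.append({"b64_json": base64.b64encode(png_bytes).decode("ascii")})
    return {"created": int(time.time()), "data": data}


response = images_generations(
    {"prompt": "infographic about bees", "n": 2, "size": "1024x768"}
)
print(len(response["data"]))
```

In a real wrapper, a FastAPI route would take the request JSON, call something like `images_generations`, and return the dict. The editing endpoint would work along the same lines, except it also receives the source image as an upload.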

There are several different versions of the SenseNova U1 models that you can try. If you want to.

Infographic V2 just came out a couple days ago and is the 50 Step base model. By the way, it can make pretty much any image, it’s just trained to do infographics really well.

https://huggingface.co/sensenova/SenseNova-U1-8B-MoT-Infographic-V2

Infographic V1 8 Step LoRA is like a lower-quality “flash” type model merge that is super speedy but not as high quality, obviously, because 8 steps is less than 50 (duh).

https://huggingface.co/sensenova/SenseNova-U1-8B-MoT-LoRAs/blob/main/SenseNova-U1-8B-MoT-Infographic-LoRA-8step-V1.0.safetensors

Infographic V1 50 Step base is also available but there is no reason to use it anymore unless you want to use it with the 8 Step LoRA for high speed generation.

https://huggingface.co/sensenova/SenseNova-U1-8B-MoT-Infographic

They also recently released an “Interleaved images” model which is really interesting.

https://huggingface.co/sensenova/SenseNova-U1-8B-MoT-Interleaved

The interleaved version will let you generate a series of related images, with consistent characters, fonts, colors, etc. Use cases for it include making slide decks with a consistent theme, making story books, etc. You have to serve the interleaved version differently because multiple images are not something a standard OpenAI-compatible image generation endpoint can handle yet, so you need to create a tool pipeline with emitter events to serve multiple images in a single chat. I’m sure your harness can figure out how to set it up for you, mine did.
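To give a rough idea of the emitter approach, here is a minimal sketch that pushes each image in a series to the chat as its own event. The event fields (`type`, `index`, `total`, `data_url`) and the `emit` callback are assumptions made up for illustration, not the format of any particular chat client or the pipeline my harness built, so adapt them to whatever your client expects.

```python
import asyncio
import base64


async def emit_image_series(images, emit):
    """Send each image as its own event instead of one single-image response."""
    total = len(images)
    for index, png_bytes in enumerate(images, start=1):
        encoded = base64.b64encode(png_bytes).decode("ascii")
        await emit({
            "type": "image",  # hypothetical event name
            "index": index,
            "total": total,
            "data_url": f"data:image/png;base64,{encoded}",
        })
    # Tell the client that the series is complete.
    await emit({"type": "done", "total": total})


async def demo():
    received = []

    async def collect(event):
        # Stand-in for the chat client's event emitter.
        received.append(event)

    # Dummy bytes stand in for the images the interleaved model would produce.
    await emit_image_series([b"page-1", b"page-2", b"page-3"], collect)
    return received


events = asyncio.run(demo())
print([event["type"] for event in events])
```

The point is that the client gets a stream of per-image events in one chat turn, rather than a single response with one image in it.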

Anyway I thought these models were interesting and fun to get running. You’ll probably need about 36 GB of VRAM for the full bf16, but there are some quants and different GGUFs available as well. I think the smallest one I saw needed like 16GB.

submitted by /u/Porespellar