Stable Diffusion 1.5 vs SDXL vs 3.5 vs Flux.1 vs OmniGen vs SANA
A comparison of free text-to-image models running on a local machine, covering Stable Diffusion (1.5, SDXL, 3.5), Flux.1 (Schnell, Dev), OmniGen, and the new, very fast SANA model published by NVIDIA, rating their image quality, prompt adherence, speed, and VRAM requirements.
I've tried to make the comparison as fair and unbiased as possible by using the Google Parti Prompts method. I created 1,391 images (107 per model) across 11 challenges and 12 categories and rated their quality. It is still a personal view, but hopefully it gives you some guidance for making your own decisions.
This paper describes the whole evaluation process, as well as the classification and rating of each model.
You can also watch my tutorial on this topic on YouTube: