Evaluation Report on the Image Generation Capabilities of AI Models

6 Mar 2025

Zhenhui (Jack) Jiang1, Zhengyu Wu1, Jiaxin Li1, Haozhe Xu2, Yifan Wu1, Yi Lu1 / 蒋镇辉1,武正昱1,李佳欣1,徐昊哲2,吴轶凡1,鲁艺1
1HKU Business School, 2School of Management, Xi'an Jiaotong University


Abstract

The frontier of AI models has moved beyond text processing: current models can both understand images and generate visual content from textual prompts. This study presents a systematic evaluation of the image generation capabilities of AI models, focusing on two core tasks: generating new images and revising existing images. Using carefully curated multidimensional test sets, we evaluated 22 AI models with image generation capabilities, comprising 15 text-to-image models and 7 multimodal large language models. The results show that ByteDance’s Dreamina and Doubao, as well as Baidu’s ERNIE Bot, perform strongly in both new image generation and image revision tasks. Overall, multimodal large language models outperform text-to-image models.


Complete Rankings