Research Summary

Evaluation Report on the Image Generation Capabilities of AI Models

Zhenhui (Jack) Jiang¹, Zhengyu Wu¹, Jiaxin Li¹, Haozhe Xu², Yifan Wu¹, Yi Lu¹ / 蒋镇辉¹，武正昱¹，李佳欣¹，徐昊哲²，吴轶凡¹，鲁艺¹
¹HKU Business School, ²School of Management, Xi'an Jiaotong University

Abstract

The frontier of AI models has moved beyond text processing: current models can both comprehend images and generate visual content from textual prompts. This study presents a systematic evaluation of the image generation capabilities of AI models, focusing on two core tasks: generating new images and revising existing images. Using carefully curated multidimensional test sets, we evaluated 22 AI models with image generation capabilities, comprising 15 text-to-image models and 7 multimodal large language models. The results show that ByteDance’s Dreamina and Doubao, as well as Baidu’s ERNIE Bot, perform strongly in both new image generation and image revision tasks. Overall, multimodal large language models outperform text-to-image models.

Complete Rankings

The full report can be accessed HERE.