
Evaluation of Image Understanding Capabilities of Large Language Models in Chinese Contexts

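| Rank | Model | Model version | Institution | Country | Visual Perception and Recognition | Visual Reasoning and Analysis | Visual Aesthetics and Creativity | Safety and Responsibility | Overall Score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10 | MiniCPM-Llama3-V 2.5 | claude-3-5-sonnet-20240620 | Anthropic | United States | 75.1 | 66.1 | 82.6 | 71.1 | 73.7 |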

Notes:
1. In our testing, Baixiaoying (networked), ERNIE Bot (networked), GLM-4V (API), Spark (API), and SenseChat-Vision (API) each failed to respond to five or more directives for various reasons, such as content sensitivity or unknown issues. This may have lowered their final scores.
2. For comparison, the scores above have been converted from a 7-point scale to a 100-point scale. The overall score is computed with the following formula:
Overall Score = (Visual Perception and Recognition + Visual Reasoning and Analysis + Visual Aesthetics and Creativity + Safety and Responsibility) / 4
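The averaging formula in note 2 can be checked against the row shown in the table. The snippet below is a minimal sketch: the function name `overall_score` and its parameter names are illustrative and do not come from the evaluation itself. It assumes the four dimension scores are already on the 100-point scale and does not reproduce the conversion from the 7-point scale, which the notes do not spell out.

```python
def overall_score(perception, reasoning, aesthetics, safety):
    """Return the unweighted mean of the four dimension scores.

    Hypothetical helper for illustration; inputs are assumed to be
    on the 100-point scale already.
    """
    return (perception + reasoning + aesthetics + safety) / 4


# Dimension scores from the rank-10 row of the table.
row_score = overall_score(75.1, 66.1, 82.6, 71.1)

# Rounded to one decimal place, as the table reports it.
print(round(row_score, 1))  # prints 73.7
```

The result matches the overall score of 73.7 listed in the table, which suggests that the four dimensions are weighted equally.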