Research Summary

Evaluation of Advanced AI Reasoning Capabilities in Chinese-Language Contexts

Zhenhui (Jack) Jiang¹, Yi Lu¹, Yifan Wu¹, Haozhe Xu², Zhengyu Wu¹, Jiaxin Li¹ / 蒋镇辉¹, 鲁艺¹, 吴轶凡¹, 徐昊哲², 武正昱¹, 李佳欣¹
¹HKU Business School, The University of Hong Kong, Hong Kong; ²School of Management, Xi'an Jiaotong University, P. R. China

Abstract

As large language models (LLMs) evolve from “able to chat” toward “able to think”, artificial intelligence technology has seen explosive growth in 2025. At the same time, the models' deficiencies in advanced reasoning have become more apparent. In response, the Artificial Intelligence Evaluation Laboratory (AIEL), led by Professor Jack Jiang at the University of Hong Kong, assessed 37 large language models from China and the US released up to September 2025, focusing on two capabilities: multimodal reasoning and Olympiad-level reasoning. In multimodal reasoning, the GPT series leads decisively, with Doubao 1.5 Pro (Thinking) as a competitive challenger. In Olympiad-level reasoning, GPT-5 (Thinking) and Gemini 2.5 Pro performed exceptionally well and top the leaderboard. Overall, US models hold a clear advantage in advanced reasoning. Although Chinese models have made notable progress in multimodal reasoning, they still show performance gaps on more complex reasoning tasks.

Complete Rankings (Multimodal)

The full report can be accessed HERE.

Complete Rankings (Olympiad)

The full report can be accessed HERE.