Sep 2025
Zhenhui (Jack) Jiang¹, Yi Lu¹, Yifan Wu¹, Haozhe Xu², Zhengyu Wu¹, Jiaxin Li¹ /
蒋镇辉¹, 鲁艺¹, 吴轶凡¹, 徐昊哲², 武正昱¹, 李佳欣¹
¹ HKU Business School, The University of Hong Kong, Hong Kong
² School of Management, Xi'an Jiaotong University, P. R. China
Abstract
As large language models (LLMs) evolve from “able to chat” toward “able to think”, artificial intelligence technology has grown explosively in 2025. At the same time, the models' deficiencies in advanced reasoning have become more apparent. In light of this, the Artificial Intelligence Evaluation Laboratory (AIEL), led by Professor Jack Jiang at the University of Hong Kong, assessed 37 large language models from China and the US released up to September 2025, focusing on two capabilities: multimodal reasoning and Olympiad-level reasoning. In multimodal reasoning, the GPT series leads decisively, with Doubao 1.5 Pro (Thinking) as a competitive challenger. In Olympiad-level reasoning, GPT-5 (Thinking) and Gemini 2.5 Pro performed exceptionally well and topped the leaderboard. Overall, US models hold a clear advantage in advanced reasoning. Although Chinese models have made notable progress in multimodal reasoning, they still show performance gaps in more complex reasoning tasks.