Zhenhui (Jack) Jiang1, Jiaxin Li1, Xiangyu Wang2, Yi Lu1, Yifan Wu1, Yisen Hong3, Haozhe Xu4, Zhengyu Wu1
/ 蒋镇辉1,李佳欣1,王祥雨2,鲁艺1,吴轶凡1,洪逸森3,徐昊哲4,武正昱1
1 HKU Business School, The University of Hong Kong, Hong Kong,
2 Department of Information Management, Peking University, P. R. China,
3 Department of Computer Science and Technology, Tsinghua University, P. R. China,
4 School of Management, Xi'an Jiaotong University, P. R. China.
Abstract
If GPT, Claude, Gemini, DeepSeek, Qwen, and other leading AI models were deployed into live financial markets at the same time, which one would prove to be the best trader? To find out, the Artificial Intelligence Evaluation Lab at HKU Business School, led by Professor Jack Jiang, launched Agentic Trader, a benchmarking platform designed to evaluate the autonomous trading capabilities of AI agents in live foreign exchange markets. The project places AI agents powered by leading large language models (GPT, Claude, Gemini, DeepSeek, Qwen, Grok, GLM, Kimi, MiniMax, and Seed (Doubao)) into the real-world foreign exchange market and allows them to trade autonomously under identical conditions. By tracking their performance over time, the benchmark assesses how effectively different models make decisions, manage risk, and adapt to changing market conditions. Over six weeks of live trading, meaningful performance gaps have begun to emerge. By the end of the current evaluation period, Qwen, Kimi, and Seed have generated the strongest cumulative returns, while GLM and GPT have remained broadly near break-even. DeepSeek, MiniMax, and Claude, by contrast, have recorded more substantial losses. The participating models have exhibited distinct styles in trading activity, risk-taking, and position management. The study has further found that higher trading frequency does not necessarily translate into higher returns. The evaluation remains ongoing, and future research will continue to examine how AI systems perform in real-world financial markets over longer time horizons.
The full report can be accessed HERE.