Research Summary

Who Is the Best AI FX Trader? — A Live-Market Evaluation of AI Agent Trading Capabilities

Zhenhui (Jack) Jiang¹, Jiaxin Li¹, Xiangyu Wang², Yi Lu¹, Yifan Wu¹, Yisen Hong³, Haozhe Xu⁴, Zhengyu Wu¹ / 蒋镇辉¹,李佳欣¹,王祥雨²,鲁艺¹,吴轶凡¹,洪逸森³,徐昊哲⁴,武正昱¹
¹HKU Business School, The University of Hong Kong, Hong Kong; ²Department of Information Management, Peking University, P. R. China; ³Department of Computer Science and Technology, Tsinghua University, P. R. China; ⁴School of Management, Xi'an Jiaotong University, P. R. China.

Abstract

If GPT, Claude, Gemini, DeepSeek, Qwen, and other leading AI models were deployed into live financial markets at the same time, which one would prove to be the best trader? To find out, the Artificial Intelligence Evaluation Lab at HKU Business School, led by Professor Jack Jiang, launched Agentic Trader, a benchmarking platform designed to evaluate the autonomous trading capabilities of AI agents in live foreign exchange markets. The project places AI agents powered by leading large language models (GPT, Claude, Gemini, DeepSeek, Qwen, Grok, GLM, Kimi, MiniMax, and Seed (Doubao)) into the live foreign exchange market and lets them trade autonomously under identical conditions. By tracking their performance over time, the benchmark assesses how effectively different models make decisions, manage risk, and adapt to changing market conditions. Over six weeks of live trading, meaningful performance gaps have begun to emerge. As of the end of the current evaluation period, Qwen, Kimi, and Seed have generated the strongest cumulative returns, while GLM and GPT have remained broadly near break-even. DeepSeek, MiniMax, and Claude, by contrast, have recorded more substantial losses. The participating models have exhibited distinct styles in trading activity, risk-taking, and position management. The study has also found that higher trading frequency did not necessarily translate into higher returns. The evaluation remains ongoing, and future research will continue to examine how AI systems perform in real-world financial markets over longer time horizons.

The full report can be accessed HERE.