{"id":240700,"date":"2025-08-25T18:10:12","date_gmt":"2025-08-25T10:10:12","guid":{"rendered":"https:\/\/www.hkubs.hku.hk\/media\/ai-reasoning-and-on-chinese-tasks-takes-centre-stage-hku-benchmarks-the-brains-behind-36-leading-llms\/"},"modified":"2025-08-25T18:13:01","modified_gmt":"2025-08-25T10:13:01","slug":"ai-reasoning-and-on-chinese-tasks-takes-centre-stage-hku-benchmarks-the-brains-behind-36-leading-llms","status":"publish","type":"hkubs-media","link":"https:\/\/www.hkubs.hku.hk\/tc\/media\/press-release\/ai-reasoning-and-on-chinese-tasks-takes-centre-stage-hku-benchmarks-the-brains-behind-36-leading-llms\/","title":{"rendered":"HKU Business School Releases Evaluation of the AI \u2018Strongest Brain\u2019 in the Chinese-Language Context, Revealing Reasoning Rankings of 36 Chinese and US Large Language Models"},"content":{"rendered":"<p>HKU Business School has published the <strong>Large Language Model Reasoning Capability Evaluation Report<\/strong>, which benchmarks the reasoning ability of 36 mainstream large language models (LLMs) in the Chinese-language context and sets out how the models differ in reasoning performance. According to the report, GPT-o3 leads the basic logic evaluation, while Gemini 2.5 Flash tops the contextual reasoning evaluation. In the composite ranking, \u8c46\u53051.5 Pro\uff08\u601d\u8003\u6a21\u5f0f\uff09, the thinking-mode variant of \u8c46\u53051.5 Pro, takes first place, followed closely by OpenAI's recently released GPT-5. 
Several LLMs from China, including \u8c46\u53051.5 Pro, \u901a\u7fa9\u5343\u554f3\uff08\u601d\u8003\u6a21\u5f0f\uff09 and DeepSeek-R1, also rank near the top, demonstrating the strong reasoning ability of Chinese LLMs in the Chinese-language context.<\/p>\n<p>From OpenAI o1, the first reasoning model to be released, to DeepSeek-R1, which drew public attention for its problem-solving ability, the LLM field keeps evolving and reasoning ability is becoming the new arena of competition. Against this backdrop, <strong>Professor Zhenhui Jack Jiang (\u8523\u93ae\u8f1d), Professor of Innovation and Information Management and Padma and Hari Harilela Professor in Strategic Information Management at HKU Business School<\/strong>, led researchers at the Artificial Intelligence Evaluation Laboratory (AIEL) (<a href=\"https:\/\/hkubs.hku.hk\/aimodelrankings\">https:\/\/hkubs.hku.hk\/aimodelrankings<\/a>) in building, for the first time, a comprehensive evaluation system that covers both basic logic and contextual reasoning, and in using test sets of varying difficulty to benchmark LLMs in the Chinese-language context. The 36 mainstream LLMs tested come from China and the United States and comprise 14 reasoning-specific models, 20 general-purpose models and 2 unified systems. 
The results show that the gap between reasoning-specific and general-purpose models is small on basic logic tasks, while the advantage of reasoning models becomes increasingly evident on contextual reasoning tasks. In addition, comparisons between models from the same company show that reasoning models perform better overall in contextual reasoning, which supports the view that model architectures designed for complex tasks are more competitive overall.<\/p>\n<p><strong>Professor Jiang<\/strong> said: \u201cThe reasoning ability of large language models is inextricably linked to their cultural and linguistic environment. With the reasoning ability of large models now drawing wide attention, we hope this evaluation system will help identify the \u2018strongest brain\u2019 in the Chinese-language context and push models to keep improving their reasoning, raising efficiency, lowering cost, and delivering value across a wider range of applications.\u201d<\/p>\n<p>&nbsp;<\/p>\n<h3>Evaluation Method<\/h3>\n<p>Ninety percent of the questions in this evaluation are original or substantially adapted; the remaining ten percent are drawn from mainland China's senior high school and university entrance examination papers (zhongkao and gaokao) and from well-known datasets, so that the test genuinely measures each model's independent reasoning.<\/p>\n<p>In terms of difficulty, 60 percent of the questions are simple and 40 percent are complex. 
The evaluation follows a chain of tasks whose logical complexity increases step by step, in order to map the limits of each model's reasoning ability precisely.<\/p>\n<p>Reasoning ability is scored on accuracy (correctness or reasonableness), logical coherence and conciseness of language.<\/p>\n<p>&nbsp;<\/p>\n<h3>Basic Logic Ability Ranking<\/h3>\n<p>In the basic logic evaluation, GPT-o3 takes first place, with \u8c46\u53051.5 Pro and \u8c46\u53051.5 Pro\uff08\u601d\u8003\u6a21\u5f0f\uff09 close behind in second and third. Some models, such as Llama 3.3 70B and 360\u667a\u81662-o1, show clear weaknesses in basic logic; the latter scores below 60 percent accuracy on multi-premise deduction.<\/p>\n<table width=\"386\">\n<tbody>\n<tr>\n<td width=\"47\"><strong>Rank<\/strong><\/td>\n<td width=\"183\"><strong>Model<\/strong><\/td>\n<td width=\"156\"><strong>Basic Logic Ability Weighted Score<\/strong><\/td>\n<\/tr>\n<tr>\n<td width=\"47\">1<\/td>\n<td width=\"183\">GPT-o3<\/td>\n<td width=\"156\">97<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">2<\/td>\n<td width=\"183\">\u8c46\u53051.5 Pro<\/td>\n<td width=\"156\">96<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">3<\/td>\n<td width=\"183\">\u8c46\u53051.5 Pro\uff08\u601d\u8003\u6a21\u5f0f\uff09<\/td>\n<td width=\"156\">95<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">4<\/td>\n<td width=\"183\">GPT-5<\/td>\n<td width=\"156\">94<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">5<\/td>\n<td width=\"183\">DeepSeek-R1<\/td>\n<td width=\"156\">92<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">6<\/td>\n<td 
width=\"183\">\u901a\u7fa9\u5343\u554f3\uff08\u601d\u8003\u6a21\u5f0f\uff09<\/td>\n<td width=\"156\">90<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">7<\/td>\n<td width=\"183\">Gemini 2.5 Pro<\/td>\n<td width=\"156\">88<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">7<\/td>\n<td width=\"183\">GPT-o4 mini<\/td>\n<td width=\"156\">88<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">7<\/td>\n<td width=\"183\">\u6df7\u5143-T1<\/td>\n<td width=\"156\">88<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">7<\/td>\n<td width=\"183\">\u6587\u5fc3\u4e00\u8a00 X1-Turbo<\/td>\n<td width=\"156\">88<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">11<\/td>\n<td width=\"183\">GPT-4.1<\/td>\n<td width=\"156\">87<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">11<\/td>\n<td width=\"183\">GPT-4o<\/td>\n<td width=\"156\">87<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">11<\/td>\n<td width=\"183\">\u901a\u7fa9\u5343\u554f3<\/td>\n<td width=\"156\">87<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">14<\/td>\n<td width=\"183\">DeepSeek-V3<\/td>\n<td width=\"156\">86<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">14<\/td>\n<td width=\"183\">Grok 3\uff08\u601d\u8003\u6a21\u5f0f\uff09<\/td>\n<td width=\"156\">86<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">14<\/td>\n<td width=\"183\">\u65e5\u65e5\u65b0 V6\u63a8\u7406<\/td>\n<td width=\"156\">86<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">17<\/td>\n<td width=\"183\">Claude 4 Opus<\/td>\n<td width=\"156\">85<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">17<\/td>\n<td width=\"183\">Claude 4 Opus \uff08\u601d\u8003\u6a21\u5f0f\uff09<\/td>\n<td width=\"156\">85<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">19<\/td>\n<td width=\"183\">Gemini 2.5 Flash<\/td>\n<td width=\"156\">84<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">20<\/td>\n<td width=\"183\">\u65e5\u65e5\u65b0 V6 Pro<\/td>\n<td width=\"156\">83<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">21<\/td>\n<td width=\"183\">\u6df7\u5143-TurboS<\/td>\n<td width=\"156\">81<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">22<\/td>\n<td width=\"183\">Baichuan4-Turbo<\/td>\n<td width=\"156\">80<\/td>\n<\/tr>\n<tr>\n<td 
width=\"47\">22<\/td>\n<td width=\"183\">Grok 3<\/td>\n<td width=\"156\">80<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">22<\/td>\n<td width=\"183\">Grok 4<\/td>\n<td width=\"156\">80<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">22<\/td>\n<td width=\"183\">Yi-Lightning<\/td>\n<td width=\"156\">80<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">26<\/td>\n<td width=\"183\">MiniMax-01<\/td>\n<td width=\"156\">79<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">27<\/td>\n<td width=\"183\">Spark 4.0 Ultra<\/td>\n<td width=\"156\">77<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">27<\/td>\n<td width=\"183\">Step R1-V-Mini<\/td>\n<td width=\"156\">77<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">29<\/td>\n<td width=\"183\">GLM-4-plus<\/td>\n<td width=\"156\">76<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">29<\/td>\n<td width=\"183\">GLM-Z1-Air<\/td>\n<td width=\"156\">76<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">29<\/td>\n<td width=\"183\">Kimi<\/td>\n<td width=\"156\">76<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">32<\/td>\n<td width=\"183\">\u6587\u5fc3\u4e00\u8a004.5-Turbo<\/td>\n<td width=\"156\">74<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">33<\/td>\n<td width=\"183\">Step 2<\/td>\n<td width=\"156\">73<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">34<\/td>\n<td width=\"183\">Kimi-k1.5<\/td>\n<td width=\"156\">72<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">35<\/td>\n<td width=\"183\">Llama 3.3 70B<\/td>\n<td width=\"156\">64<\/td>\n<\/tr>\n<tr>\n<td width=\"47\">36<\/td>\n<td width=\"183\">360\u667a\u81662-o1<\/td>\n<td width=\"156\">59<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>Figure 1: Basic Logic Ability Ranking<\/p>\n<p><strong>\u00a0<\/strong><\/p>\n<h3>Contextual Reasoning Ability Ranking<\/h3>\n<p>In the contextual reasoning ranking, Gemini 2.5 Flash takes first place on the strength of its performance across several areas, including common-sense and subject-matter reasoning. 
\u8c46\u53051.5 Pro\uff08\u601d\u8003\u6a21\u5f0f\uff09 excels in common-sense reasoning and Gemini 2.5 Pro in subject-matter and decision reasoning; the two tie for second place. Grok 3\uff08\u601d\u8003\u6a21\u5f0f\uff09 and models in the GPT, \u6587\u5fc3\u4e00\u8a00, DeepSeek, \u6df7\u5143 and \u901a\u7fa9\u5343\u554f series also perform strongly.<\/p>\n<table width=\"110%\">\n<tbody>\n<tr>\n<td width=\"46\"><strong>Rank<\/strong><\/td>\n<td width=\"150\"><strong>Model<\/strong><\/td>\n<td width=\"73\"><strong>Common-Sense Reasoning<\/strong><\/td>\n<td width=\"72\"><strong>Subject-Matter Reasoning<\/strong><\/td>\n<td width=\"101\"><strong>Decision Reasoning under Uncertainty<\/strong><\/td>\n<td width=\"89\"><strong>Moral and Ethical Reasoning<\/strong><\/td>\n<td width=\"98\"><strong>Final Weighted Score<\/strong><\/td>\n<\/tr>\n<tr>\n<td width=\"46\">1<\/td>\n<td width=\"150\">Gemini 2.5 Flash<\/td>\n<td width=\"73\">98<\/td>\n<td width=\"72\">93<\/td>\n<td width=\"101\">89<\/td>\n<td width=\"89\">87<\/td>\n<td width=\"98\">92<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">2<\/td>\n<td width=\"150\">\u8c46\u53051.5 Pro\uff08\u601d\u8003\u6a21\u5f0f\uff09<\/td>\n<td width=\"73\">97<\/td>\n<td width=\"72\">92<\/td>\n<td width=\"101\">88<\/td>\n<td width=\"89\">87<\/td>\n<td width=\"98\">91<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">2<\/td>\n<td width=\"150\">Gemini 2.5 Pro<\/td>\n<td width=\"73\">93<\/td>\n<td width=\"72\">94<\/td>\n<td width=\"101\">90<\/td>\n<td width=\"89\">87<\/td>\n<td width=\"98\">91<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">4<\/td>\n<td width=\"150\">Grok 3\uff08\u601d\u8003\u6a21\u5f0f\uff09<\/td>\n<td width=\"73\">96<\/td>\n<td width=\"72\">88<\/td>\n<td width=\"101\">89<\/td>\n<td width=\"89\">86<\/td>\n<td 
width=\"98\">90<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">5<\/td>\n<td width=\"150\">GPT-5<\/td>\n<td width=\"73\">88<\/td>\n<td width=\"72\">98<\/td>\n<td width=\"101\">88<\/td>\n<td width=\"89\">83<\/td>\n<td width=\"98\">89<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">5<\/td>\n<td width=\"150\">\u6df7\u5143-T1<\/td>\n<td width=\"73\">97<\/td>\n<td width=\"72\">95<\/td>\n<td width=\"101\">84<\/td>\n<td width=\"89\">81<\/td>\n<td width=\"98\">89<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">5<\/td>\n<td width=\"150\">\u901a\u7fa9\u5343\u554f3\uff08\u601d\u8003\u6a21\u5f0f\uff09<\/td>\n<td width=\"73\">96<\/td>\n<td width=\"72\">89<\/td>\n<td width=\"101\">86<\/td>\n<td width=\"89\">85<\/td>\n<td width=\"98\">89<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">5<\/td>\n<td width=\"150\">\u6587\u5fc3\u4e00\u8a00 X1-Turbo<\/td>\n<td width=\"73\">98<\/td>\n<td width=\"72\">85<\/td>\n<td width=\"101\">86<\/td>\n<td width=\"89\">86<\/td>\n<td width=\"98\">89<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">9<\/td>\n<td width=\"150\">DeepSeek-R1<\/td>\n<td width=\"73\">94<\/td>\n<td width=\"72\">93<\/td>\n<td width=\"101\">78<\/td>\n<td width=\"89\">82<\/td>\n<td width=\"98\">87<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">9<\/td>\n<td width=\"150\">\u901a\u7fa9\u5343\u554f3<\/td>\n<td width=\"73\">97<\/td>\n<td width=\"72\">79<\/td>\n<td width=\"101\">87<\/td>\n<td width=\"89\">86<\/td>\n<td width=\"98\">87<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">9<\/td>\n<td width=\"150\">\u6587\u5fc3\u4e00\u8a004.5-Turbo<\/td>\n<td width=\"73\">96<\/td>\n<td width=\"72\">76<\/td>\n<td width=\"101\">87<\/td>\n<td width=\"89\">87<\/td>\n<td width=\"98\">87<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">12<\/td>\n<td width=\"150\">\u6df7\u5143-TurboS<\/td>\n<td width=\"73\">96<\/td>\n<td width=\"72\">79<\/td>\n<td width=\"101\">83<\/td>\n<td width=\"89\">84<\/td>\n<td width=\"98\">86<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">13<\/td>\n<td width=\"150\">\u8c46\u53051.5 Pro<\/td>\n<td width=\"73\">97<\/td>\n<td width=\"72\">81<\/td>\n<td 
width=\"101\">86<\/td>\n<td width=\"89\">74<\/td>\n<td width=\"98\">85<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">13<\/td>\n<td width=\"150\">GPT-4.1<\/td>\n<td width=\"73\">97<\/td>\n<td width=\"72\">70<\/td>\n<td width=\"101\">87<\/td>\n<td width=\"89\">86<\/td>\n<td width=\"98\">85<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">13<\/td>\n<td width=\"150\">GPT-o3<\/td>\n<td width=\"73\">90<\/td>\n<td width=\"72\">95<\/td>\n<td width=\"101\">73<\/td>\n<td width=\"89\">80<\/td>\n<td width=\"98\">85<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">13<\/td>\n<td width=\"150\">Grok 3<\/td>\n<td width=\"73\">97<\/td>\n<td width=\"72\">69<\/td>\n<td width=\"101\">87<\/td>\n<td width=\"89\">86<\/td>\n<td width=\"98\">85<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">13<\/td>\n<td width=\"150\">Grok 4<\/td>\n<td width=\"73\">82<\/td>\n<td width=\"72\">87<\/td>\n<td width=\"101\">82<\/td>\n<td width=\"89\">87<\/td>\n<td width=\"98\">85<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">18<\/td>\n<td width=\"150\">DeepSeek-V3<\/td>\n<td width=\"73\">95<\/td>\n<td width=\"72\">81<\/td>\n<td width=\"101\">84<\/td>\n<td width=\"89\">77<\/td>\n<td width=\"98\">84<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">19<\/td>\n<td width=\"150\">GPT-4o<\/td>\n<td width=\"73\">98<\/td>\n<td width=\"72\">65<\/td>\n<td width=\"101\">87<\/td>\n<td width=\"89\">78<\/td>\n<td width=\"98\">82<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">19<\/td>\n<td width=\"150\">GPT-o4 mini<\/td>\n<td width=\"73\">91<\/td>\n<td width=\"72\">87<\/td>\n<td width=\"101\">72<\/td>\n<td width=\"89\">76<\/td>\n<td width=\"98\">82<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">21<\/td>\n<td width=\"150\">Claude 4 Opus\uff08\u601d\u8003\u6a21\u5f0f\uff09<\/td>\n<td width=\"73\">96<\/td>\n<td width=\"72\">84<\/td>\n<td width=\"101\">72<\/td>\n<td width=\"89\">71<\/td>\n<td width=\"98\">81<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">21<\/td>\n<td width=\"150\">MiniMax-01<\/td>\n<td width=\"73\">96<\/td>\n<td width=\"72\">69<\/td>\n<td width=\"101\">83<\/td>\n<td width=\"89\">75<\/td>\n<td 
width=\"98\">81<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">21<\/td>\n<td width=\"150\">360\u667a\u81662-o1<\/td>\n<td width=\"73\">93<\/td>\n<td width=\"72\">76<\/td>\n<td width=\"101\">81<\/td>\n<td width=\"89\">72<\/td>\n<td width=\"98\">81<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">24<\/td>\n<td width=\"150\">Claude 4 Opus<\/td>\n<td width=\"73\">95<\/td>\n<td width=\"72\">85<\/td>\n<td width=\"101\">70<\/td>\n<td width=\"89\">70<\/td>\n<td width=\"98\">80<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">24<\/td>\n<td width=\"150\">GLM-4-plus<\/td>\n<td width=\"73\">93<\/td>\n<td width=\"72\">71<\/td>\n<td width=\"101\">83<\/td>\n<td width=\"89\">73<\/td>\n<td width=\"98\">80<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">24<\/td>\n<td width=\"150\">Step 2<\/td>\n<td width=\"73\">97<\/td>\n<td width=\"72\">63<\/td>\n<td width=\"101\">82<\/td>\n<td width=\"89\">78<\/td>\n<td width=\"98\">80<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">27<\/td>\n<td width=\"150\">Yi-Lightning<\/td>\n<td width=\"73\">97<\/td>\n<td width=\"72\">59<\/td>\n<td width=\"101\">82<\/td>\n<td width=\"89\">79<\/td>\n<td width=\"98\">79<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">27<\/td>\n<td width=\"150\">Kimi<\/td>\n<td width=\"73\">94<\/td>\n<td width=\"72\">61<\/td>\n<td width=\"101\">79<\/td>\n<td width=\"89\">81<\/td>\n<td width=\"98\">79<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">29<\/td>\n<td width=\"150\">Spark 4.0 Ultra<\/td>\n<td width=\"73\">91<\/td>\n<td width=\"72\">71<\/td>\n<td width=\"101\">75<\/td>\n<td width=\"89\">76<\/td>\n<td width=\"98\">78<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">30<\/td>\n<td width=\"150\">\u65e5\u65e5\u65b0 V6 Pro<\/td>\n<td width=\"73\">86<\/td>\n<td width=\"72\">58<\/td>\n<td width=\"101\">84<\/td>\n<td width=\"89\">78<\/td>\n<td width=\"98\">77<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">31<\/td>\n<td width=\"150\">GLM-Z1-Air<\/td>\n<td width=\"73\">90<\/td>\n<td width=\"72\">76<\/td>\n<td width=\"101\">73<\/td>\n<td width=\"89\">64<\/td>\n<td width=\"98\">76<\/td>\n<\/tr>\n<tr>\n<td 
width=\"46\">32<\/td>\n<td width=\"150\">Llama 3.3 70B<\/td>\n<td width=\"73\">82<\/td>\n<td width=\"72\">52<\/td>\n<td width=\"101\">83<\/td>\n<td width=\"89\">81<\/td>\n<td width=\"98\">75<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">33<\/td>\n<td width=\"150\">\u65e5\u65e5\u65b0 V6\u63a8\u7406<\/td>\n<td width=\"73\">96<\/td>\n<td width=\"72\">63<\/td>\n<td width=\"101\">68<\/td>\n<td width=\"89\">70<\/td>\n<td width=\"98\">74<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">34<\/td>\n<td width=\"150\">Baichuan4-Turbo<\/td>\n<td width=\"73\">91<\/td>\n<td width=\"72\">48<\/td>\n<td width=\"101\">77<\/td>\n<td width=\"89\">69<\/td>\n<td width=\"98\">71<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">35<\/td>\n<td width=\"150\">Step R1-V-Mini<\/td>\n<td width=\"73\">96<\/td>\n<td width=\"72\">80<\/td>\n<td width=\"101\">37<\/td>\n<td width=\"89\">51<\/td>\n<td width=\"98\">66<\/td>\n<\/tr>\n<tr>\n<td width=\"46\">36<\/td>\n<td width=\"150\">Kimi-k1.5<\/td>\n<td width=\"73\">84<\/td>\n<td width=\"72\">79<\/td>\n<td width=\"101\">42<\/td>\n<td width=\"89\">58<\/td>\n<td width=\"98\">66<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><strong>\u00a0<\/strong><\/p>\n<p>Figure 2: Contextual Reasoning Ability Ranking<\/p>\n<p>&nbsp;<\/p>\n<h3>Composite Ability Ranking<\/h3>\n<p>In the composite ranking, the 36 models evaluated differ markedly in performance. \u8c46\u53051.5 Pro\uff08\u601d\u8003\u6a21\u5f0f\uff09 ranks first on the strength of its combined performance in basic logic and contextual reasoning, followed closely by GPT-5, with GPT-o3 and \u8c46\u53051.5 Pro in third and fourth place.<\/p>\n<table>\n<tbody>\n<tr>\n<td width=\"56\"><strong>Rank<\/strong><\/td>\n<td width=\"214\"><strong>Model<\/strong><\/td>\n<td 
width=\"85\"><strong>Composite Score<\/strong><\/td>\n<\/tr>\n<tr>\n<td width=\"56\">1<\/td>\n<td width=\"214\">\u8c46\u53051.5 Pro\uff08\u601d\u8003\u6a21\u5f0f\uff09<\/td>\n<td width=\"85\">93<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">2<\/td>\n<td width=\"214\">GPT-5<\/td>\n<td width=\"85\">91.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">3<\/td>\n<td width=\"214\">GPT-o3<\/td>\n<td width=\"85\">91<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">4<\/td>\n<td width=\"214\">\u8c46\u53051.5 Pro<\/td>\n<td width=\"85\">90.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">5<\/td>\n<td width=\"214\">DeepSeek-R1<\/td>\n<td width=\"85\">89.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">5<\/td>\n<td width=\"214\">Gemini 2.5 Pro<\/td>\n<td width=\"85\">89.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">5<\/td>\n<td width=\"214\">\u901a\u7fa9\u5343\u554f3\uff08\u601d\u8003\u6a21\u5f0f\uff09<\/td>\n<td width=\"85\">89.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">8<\/td>\n<td width=\"214\">\u6df7\u5143-T1<\/td>\n<td width=\"85\">88.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">8<\/td>\n<td width=\"214\">\u6587\u5fc3\u4e00\u8a00 X1-Turbo<\/td>\n<td width=\"85\">88.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">10<\/td>\n<td width=\"214\">Gemini 2.5 Flash<\/td>\n<td width=\"85\">88<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">10<\/td>\n<td width=\"214\">Grok 3\uff08\u601d\u8003\u6a21\u5f0f\uff09<\/td>\n<td width=\"85\">88<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">12<\/td>\n<td width=\"214\">\u901a\u7fa9\u5343\u554f3<\/td>\n<td width=\"85\">87<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">13<\/td>\n<td width=\"214\">GPT-4.1<\/td>\n<td width=\"85\">86<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">14<\/td>\n<td width=\"214\">DeepSeek-V3<\/td>\n<td width=\"85\">85<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">14<\/td>\n<td width=\"214\">GPT-o4 mini<\/td>\n<td width=\"85\">85<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">16<\/td>\n<td width=\"214\">GPT-4o<\/td>\n<td width=\"85\">84.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">17<\/td>\n<td 
width=\"214\">\u6df7\u5143-TurboS<\/td>\n<td width=\"85\">83.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">18<\/td>\n<td width=\"214\">Claude 4 Opus \uff08\u601d\u8003\u6a21\u5f0f\uff09<\/td>\n<td width=\"85\">83<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">19<\/td>\n<td width=\"214\">Claude 4 Opus<\/td>\n<td width=\"85\">82.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">19<\/td>\n<td width=\"214\">Grok 3<\/td>\n<td width=\"85\">82.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">19<\/td>\n<td width=\"214\">Grok 4<\/td>\n<td width=\"85\">82.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">22<\/td>\n<td width=\"214\">\u6587\u5fc3\u4e00\u8a004.5-Turbo<\/td>\n<td width=\"85\">80.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">23<\/td>\n<td width=\"214\">MiniMax-01<\/td>\n<td width=\"85\">80<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">23<\/td>\n<td width=\"214\">\u65e5\u65e5\u65b0 V6 Pro<\/td>\n<td width=\"85\">80<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">23<\/td>\n<td width=\"214\">\u65e5\u65e5\u65b0 V6\u63a8\u7406<\/td>\n<td width=\"85\">80<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">26<\/td>\n<td width=\"214\">Yi-Lightning<\/td>\n<td width=\"85\">79.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">27<\/td>\n<td width=\"214\">GLM-4-plus<\/td>\n<td width=\"85\">78<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">28<\/td>\n<td width=\"214\">Kimi<\/td>\n<td width=\"85\">77.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">28<\/td>\n<td width=\"214\">Spark 4.0 Ultra<\/td>\n<td width=\"85\">77.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">30<\/td>\n<td width=\"214\">Step 2<\/td>\n<td width=\"85\">76.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">31<\/td>\n<td width=\"214\">GLM-Z1-Air<\/td>\n<td width=\"85\">76<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">32<\/td>\n<td width=\"214\">Baichuan4-Turbo<\/td>\n<td width=\"85\">75.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">33<\/td>\n<td width=\"214\">Step R1-V-Mini<\/td>\n<td width=\"85\">71.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">34<\/td>\n<td width=\"214\">360\u667a\u81662-o1<\/td>\n<td width=\"85\">70<\/td>\n<\/tr>\n<tr>\n<td 
width=\"56\">35<\/td>\n<td width=\"214\">Llama 3.3 70B<\/td>\n<td width=\"85\">69.5<\/td>\n<\/tr>\n<tr>\n<td width=\"56\">36<\/td>\n<td width=\"214\">Kimi-k1.5<\/td>\n<td width=\"85\">69<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>Figure 3: Composite Ability Ranking<\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"https:\/\/hkubs.hku.hk\/aimodelrankings\/leaderboards\/reasoningCapabilities.html\">Click here<\/a> to view the leaderboard of the Large Language Model Reasoning Capability Evaluation Report.<\/p>\n<p>&nbsp;<\/p>\n<p>Taken together, the rankings show that many large language models from China perform strongly and are improving quickly, demonstrating the distinctive advantages and considerable potential of China's large-model industry in the Chinese-language context.<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"_lmt_disableupdate":"no","_lmt_disable":"","footnotes":""},"class_list":["post-240700","hkubs-media","type-hkubs-media","status-publish","hentry","media-categories-press-release-tc","media-topic-categories-research-tc"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.hkubs.hku.hk\/tc\/wp-json\/wp\/v2\/hkubs-media\/240700","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.hkubs.hku.hk\/tc\/wp-json\/wp\/v2\/hkubs-media"}],"about":[{"href":"https:\/\/www.hkubs.hku.hk\/tc\/wp-json\/wp\/v2\/types\/hkubs-media"}],"version-history":[{"count":1,"href":"https:\/\/www.hkubs.hku.hk\/tc\/wp-json\/wp\/v2\/hkubs-media\/240700\/revisions"}],"predecessor-version":[{"id":240701,"href":"https:\/\/www.hkubs.hku.hk\/tc\/wp-json\/wp\/v2\/hkubs-media\/240700\/revisions\/240701"}],"wp:attachment":[{"href":"https:\/\/www.hkubs.hku.hk\/tc\/wp-json\/wp\/v2\/media?parent=240700"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}