GLM 5.1 beats Claude Opus and GPT 5.4 on coding benchmarks with a single OpenClaw terminal command.

Coding 📅 2026/04/18
#developer #fully-automated #GitHub #low-risk #manual-trigger #reusable #code-repository #benchmark #LLM #report
Terminal screenshot showing GLM 5.1 running in OpenClaw, with a score comparison against Claude and GPT on SWE Bench Pro
GLM 5.1 just beat Claude Opus and GPT 5.4 on real coding benchmarks. One terminal command runs it in OpenClaw for free. No API key. No config.

Here are the numbers:

→ SWE Bench Pro: 58.4 (Claude: 57.3, GPT 5.4: 57.7)

→ CyberJim: 68.7 (Claude: 66.6)

→ Browse Comp: 68.0. Top score on the entire benchmark.

→ 198K context window. Feed it whole codebases.

→ Ran 600+ iterations on one task. 6,000 tool calls. Never stopped improving.

→ Went from 3,500 queries/sec to 21,500: roughly six times the throughput, just by not quitting.

The setup:

ollama launch openclaw --model glm5.1-cloud

That's it. One command. OpenClaw + GLM 5.1. Running.

It doesn't plateau. It gets better the longer it works.

Save this. Then give it a real problem.