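A question out of curiosity: how did you measure that there is a huge improvement over Sonnet 4.5?
What I am really curious about is that the gap between Sonnet 4.5 and Opus does not look that large. On benchmarks, the difference is only about 3 points.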




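- Codex with GPT-5.2-codex (xhigh) is the strongest. It most often finds logic problems in other models' code and in its own earlier code. It is like a top student who says little and hits hard.
- Claude Opus 4.5 tends to miss things in complex tasks, but once other models have reviewed its RFC documents and code and pointed out the problems, it readily recognizes its mistakes, fixes its own approach, and ends up with very good results. It is like a slightly careless, good-tempered top student. Claude Opus 4.5 is especially suited to "working out requirements with people", because it is the best at talking like a human and gives a good user experience.
- Gemini 3 Pro is strong at front-end tasks and world knowledge; in these two areas it is probably the best in the world. But its logical depth is mediocre, it tends to swallow things whole without digesting them, and its ability on long-horizon tasks is also questionable.
- Apart from these 3 models, the others tend to be full of holes, and you have to find all sorts of ways to compensate.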
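As mentioned earlier, we cannot test model capability with "junior high school math problems"; we need tasks that are highly comprehensive.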
Objective
Build a visually stunning, high-fidelity 3D voxel-style simulation of the Golden Gate Bridge in Three.js.
Prioritize complex visuals (not simple blocks), strong atmosphere depth, and smooth ~60FPS.

Visuals & Atmosphere
- Lighting: a Time-of-day slider (0–24h) that controls sun position, intensity, sky color, and fog tint.
- Fog: volumetric-feeling fog using lightweight sprite particles; slider 0–100 (0 = crystal clear, 100 = dense but not pure whiteout).
- Water: custom shader for waves + specular reflections; blend horizon with distance-based fog (exp2) so the far water merges naturally.
- Post: ACES filmic tone mapping + optimized bloom (night lights glow but keep performance).

Scene Details
- Bridge: recognizable art-deco towers, main span cables + suspenders, piers/anchors consistent with suspension bridge structure.
- Terrain: simple but convincing Marin Headlands + SF side peninsula silhouettes.
- Skyline: procedural/instanced city blocks on the SF side to suggest depth.
- Traffic: up to ~400 cars via InstancedMesh, properly aligned on the deck (avoid clipping). Headlights/taillights emissive at night.
- Ships: a few procedural cargo ships with navigation lights moving across the bay.
- Nature: a small flock of animated birds (lightweight flocking).

Night Mode
At night, enable city lights, bridge beacons, street lights, vehicle lights, ship nav lights.

Tech & Controls (Important)
- Output MUST be a single self-contained HTML file (e.g., golden_gate_bridge.html) that runs by opening in Chrome.
- No build tools (no Vite/Webpack). Pure HTML + JS.
- Import Three.js and addons via CDN using ES Modules + importmap.
- UI: nice-looking sliders for Time (0–24), Fog Density (0–100), Traffic Density (0–100), Camera Zoom.
- Optimization: use InstancedMesh for repeated items (cars/lights/birds), avoid heavy geometry, keep draw calls low.
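To give a feel for what the prompt's three main sliders ask of a model, here is a minimal sketch of the slider-to-scene mapping in plain JavaScript. Everything in it is an assumption made for illustration, not taken from any model's output: the helper names `sunState`, `fogDensity` and `carCount`, the fixed 6h sunrise and 18h sunset, the quadratic fog curve, and the `0.02` maximum density. The functions are deliberately free of Three.js so they can be run and checked on their own.

```javascript
// Hypothetical helpers: names, constants and curves are illustrative assumptions.

// Time slider (0-24 h) -> sun direction, light intensity, night flag.
// Simplified model: sunrise at 6h, noon at 12h, sunset at 18h.
function sunState(hour) {
  const h = ((hour % 24) + 24) % 24;          // wrap into [0, 24)
  const angle = ((h - 6) / 12) * Math.PI;     // 0 at sunrise, PI at sunset
  const elevation = Math.sin(angle);          // 1 at noon, -1 at midnight
  return {
    direction: { x: Math.cos(angle), y: elevation, z: 0.3 },
    intensity: Math.max(0, elevation),        // no direct sunlight below the horizon
    isNight: elevation < 0.1,                 // night lights come on around dusk
  };
}

// Fog slider (0-100) -> exponential-squared fog density.
// The quadratic curve keeps low slider values subtle; maxDensity is an assumed cap
// so that 100 is dense without becoming a pure whiteout.
function fogDensity(slider, maxDensity = 0.02) {
  const t = Math.min(Math.max(slider, 0), 100) / 100;
  return maxDensity * t * t;
}

// Traffic slider (0-100) -> number of car instances to draw (capped at maxCars).
function carCount(slider, maxCars = 400) {
  const t = Math.min(Math.max(slider, 0), 100) / 100;
  return Math.round(maxCars * t);
}
```

In a real scene, the results would typically be applied to Three.js objects: the value from `fogDensity` could be assigned to the `density` property of a `THREE.FogExp2` set as `scene.fog`, and the value from `carCount` to the `count` property of the cars' `THREE.InstancedMesh`, which limits how many instances are rendered without rebuilding the mesh.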
This video was made by GPT-5.1-Codex-Max. GPT-5.2-Codex and Gemini 3 Pro actually did better; I just did not record videos of them. By the way, Gemini 3 Flash also did pleasantly well.


