A Head-to-Head Comparison of AI Coding — Round Two
Introduction
A while back I pitted Opus 4.5 against Kimi 2.5 in a 3D maze runner build-off. This is the rematch, this time between Opus 4.8 and GPT-5.5. I had hoped to run this round with Fable, but I was a day too late, so Opus 4.8 took the seat instead. Each model ran in its native harness: Claude Code for Opus 4.8 and Codex for GPT-5.5. Same challenge as before: build a complete 3D maze runner from scratch.
Opus 4.8 (Claude Code)
- 00:30 — Claude Code opens by asking me questions about the framework and algorithm to use.
- 05:20 — Thinking done. It leaves plan mode and starts writing code.
- 08:51 — Build succeeds; installing Chromium.
- 09:53 — Running through the maze, it detects graphical issues.
- 11:04 — Graphical issues appear fixed; it moves on to check for graffiti on the walls.
- 12:31 — It finds that the graffiti “desaturate in dim corridors,” so it makes them glow more.
- 14:01 — Running final tests and cleanup; one last production build.
- 14:42 — Done.
It ran too fast for me to grab a screenshot, so I asked it to add a player mode that would let me capture one.
Verdict: Everything checks out. Graffiti is on the walls, though the Pac-Man avatar is a touch small.
GPT-5.5 (Codex)
- 11:52 — Done. It never asked a question or prompted me once. Watching it run, though, I noticed that the walls had no graffiti, which was the only deficiency I could spot.
I prompted once more: “I think you forgot to add graffiti, check the original request again.”
- 13:04 — Done. Adding the graffiti took just 1 minute and 12 seconds. Here too, I asked for a playable mode so I could take better screenshots.
Verdict: GPT-5.5 also did a fine job. The missing graffiti was easily fixed, and even with the extra round it finished at 13:04, about a minute and a half ahead of Opus 4.8's 14:42.
Visual Results
Opus 4.8 on the left, GPT-5.5 on the right. Click any image for the full-size version.
Welcome screens
| ![]() | ![]() |
|---|---|
| Opus 4.8: gradient title, glowing yellow button | GPT-5.5: flat white title, simple button |
Generated 50×50 maze
| ![]() | ![]() |
|---|---|
| Opus 4.8: rendered right on the welcome screen | GPT-5.5: its own crisp maze view |
3D navigation
| ![]() | ![]() |
|---|---|
| Opus 4.8: graffiti on the walls, WASD controls | GPT-5.5: GO / RUN / 50 graffiti, arrow-key controls |
3D navigation, deeper in
| ![]() | ![]() |
|---|---|
| Opus 4.8: abstract neon graffiti at 119 pts | GPT-5.5: dense A* / RUN / GO / 50 graffiti |
Victory
| ![]() | ![]() |
|---|---|
| Opus 4.8: “Maze Solved!” at 490 points | GPT-5.5: “Victory” at 514 points |
Overall
Both models — and their harnesses — completed the task without much hassle. GPT-5.5 needed an extra nudge on the graffiti, but it still finished quicker than Opus 4.8.
The quality of the graffiti is debatable. To be fair, I never specified what kind of graffiti I wanted. Opus produced the “nicer” looking result, while GPT populated more of the walls, making its graffiti more visible.
The player mode, which I asked both models for as an afterthought, was implemented far better by Opus. It understood that I wanted first-person playability, whereas GPT-5.5 settled for moving the Pac-Man directionally.
Comparing all of this to my original Opus 4.5 vs. Kimi 2.5 test, the progress is tremendous — we’ve basically cut the time in half, with noticeably better quality. Read the original post.
Final Comparison
| Metric | Opus 4.8 | GPT-5.5 |
|---|---|---|
| Completion time | 14:42 | 11:52 (13:04 with graffiti) |
| Asked clarifying questions | Yes | No |
| Wall graffiti (first pass) | Yes | No — needed a reminder |
| Graffiti quality | Nicer | More visible |
| Player mode | First-person (better) | Directional movement |
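
The timings in the table are easy to sanity-check. Here is a minimal Python sketch that converts the mm:ss stamps to seconds and computes the gaps. The helper names `to_seconds` and `fmt` are hypothetical, chosen only for illustration; the timestamps are copied from the table.

```python
def to_seconds(stamp: str) -> int:
    """Convert an 'mm:ss' timestamp to total seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def fmt(seconds: int) -> str:
    """Format a non-negative number of seconds as 'm:ss'."""
    return f"{seconds // 60}:{seconds % 60:02d}"


# Completion times copied from the comparison table.
opus = to_seconds("14:42")
gpt_first_pass = to_seconds("11:52")
gpt_with_graffiti = to_seconds("13:04")

# Time GPT-5.5 spent on the follow-up graffiti prompt.
graffiti_fix = gpt_with_graffiti - gpt_first_pass
# GPT-5.5's lead over Opus 4.8, even after the extra round.
gpt_lead = opus - gpt_with_graffiti

print(fmt(graffiti_fix))  # 1:12
print(fmt(gpt_lead))      # 1:38
```

So the graffiti fix cost 1:12, and GPT-5.5 still finished 1:38 ahead of Opus 4.8.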









