Opus 4.8 vs. GPT-5.5: The 3D Maze Runner Rematch

A Head-to-Head Comparison of AI Coding — Round Two

Introduction

A while back I pitted Opus 4.5 against Kimi 2.5 in a 3D maze runner build-off. This is the rematch. I had hoped to run this round with Fable, but I was a day too late — so Opus 4.8 took the seat instead. The harnesses were native to each model: Codex for GPT-5.5 and Claude Code for Opus 4.8. Same challenge as before: build a complete 3D maze runner from scratch.

Opus 4.8 (Claude Code)

00:30 — Claude Code opens by asking me questions about the framework and algorithm to use.
05:20 — Thinking done. It leaves plan mode and starts writing code.
08:51 — Build succeeds; installing Chromium.
09:53 — Running through the maze, it detects graphical issues.
11:04 — Graphical issues appear fixed; it moves on to check for graffiti on the walls.
12:31 — It decides the graffiti should “desaturate in dim corridors,” so it makes them glow more.
14:01 — Running final tests and cleanup; one last production build.
14:42 — Done.

It ran too fast for me to grab a screenshot, so I asked it to add a player mode so I could capture one.

Verdict: Everything checks out. Graffiti is on the walls, though the Pac-Man avatar is a touch small.

GPT-5.5 (Codex)

11:52 — Done. It never asked a question or prompted me once. Watching it run, though, the walls had no graffiti — the only deficiency I could spot.

I prompted once more: “I think you forgot to add graffiti, check the original request again.”

13:04 — Done. Adding the graffiti took just 1 minute and 12 seconds. Here too, I asked for a playable mode so I could take better screenshots.

Verdict: GPT-5.5 also did a fine job. The missing graffiti was easily fixed, and even with the extra round it finished at 13:04, ahead of Opus 4.8's 14:42.
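For anyone who wants to double-check the timing arithmetic, here is a minimal Python sketch. It assumes the timestamps in this post are elapsed stopwatch times in mm:ss from the start of each run; the helper names `to_seconds` and `fmt` are my own and purely illustrative.

```python
def to_seconds(stamp):
    """Convert an elapsed 'mm:ss' stamp into total seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def fmt(total):
    """Format a number of seconds back into 'm:ss'."""
    return f"{total // 60}:{total % 60:02d}"


# Timestamps taken from the run logs above (assumed to be elapsed time).
gpt_first_pass = to_seconds("11:52")
gpt_with_graffiti = to_seconds("13:04")
opus_total = to_seconds("14:42")

graffiti_fix = gpt_with_graffiti - gpt_first_pass   # time spent on the extra round
lead_over_opus = opus_total - gpt_with_graffiti     # GPT-5.5's margin after the fix

print(fmt(graffiti_fix))    # 1:12
print(fmt(lead_over_opus))  # 1:38
```

So the graffiti fix cost 1 minute 12 seconds, and GPT-5.5 still finished 1 minute 38 seconds ahead.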

Visual Results

Opus 4.8 on the left, GPT-5.5 on the right.

Welcome screens


Opus 4.8: gradient title, glowing yellow button	GPT-5.5: flat white title, simple button

Generated 50×50 maze


Opus 4.8: rendered right on the welcome screen	GPT-5.5: its own crisp maze view
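I did not record which generation algorithm either model picked, so the following is only an illustration of how a 50×50 maze like these can be produced. It is a minimal sketch of one common approach, a depth-first "recursive backtracker" written iteratively; the function name `generate_maze` and the passage-set representation are my own assumptions, not taken from either model's code.

```python
import random


def generate_maze(size=50, seed=None):
    """Carve a maze on a size x size grid with an iterative depth-first search.

    Returns a dict mapping each cell (row, col) to the set of neighbouring
    cells it has an open passage to. Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    passages = {(r, c): set() for r in range(size) for c in range(size)}
    visited = {(0, 0)}
    stack = [(0, 0)]
    while stack:
        r, c = stack[-1]
        unvisited = [
            (r + dr, c + dc)
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if (r + dr, c + dc) in passages and (r + dr, c + dc) not in visited
        ]
        if not unvisited:
            stack.pop()  # dead end: backtrack to the previous cell
            continue
        nxt = rng.choice(unvisited)
        # Knock down the wall between the current cell and the chosen neighbour.
        passages[(r, c)].add(nxt)
        passages[nxt].add((r, c))
        visited.add(nxt)
        stack.append(nxt)
    return passages


maze = generate_maze(50, seed=1)
```

Because every cell is visited exactly once, the result is a spanning tree of the grid: each pair of cells is connected by exactly one path, which guarantees the maze is solvable from any start to any exit.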

3D navigation


Opus 4.8: graffiti on the walls, WASD controls	GPT-5.5: GO / RUN / 50 graffiti, arrow-key controls

3D navigation, deeper in


Opus 4.8: abstract neon graffiti at 119 pts	GPT-5.5: dense A* / RUN / GO / 50 graffiti

Victory


Opus 4.8: “Maze Solved!” at 490 points	GPT-5.5: “Victory” at 514 points

Overall

Both models — and their harnesses — completed the task without much hassle. GPT-5.5 needed an extra nudge on the graffiti, but it still finished quicker than Opus 4.8.

The quality of the graffiti is debatable. To be fair, I never specified what kind of graffiti I wanted. Opus produced the “nicer” looking result, while GPT populated more of the walls, making its graffiti more visible.

Opus implemented the player mode, which I asked both models for as an afterthought, far better. It understood that I wanted first-person playability, whereas GPT-5.5 ended up moving the Pac-Man directionally.

Comparing all of this to my original Opus 4.5 vs. Kimi 2.5 test, the progress is tremendous: we’ve basically cut the time in half, with noticeably better quality.

Final Comparison

Metric	Opus 4.8	GPT-5.5
Completion time	14:42	11:52 (13:04 with graffiti)
Asked clarifying questions	Yes	No
Wall graffiti (first pass)	Yes	No — needed a reminder
Graffiti quality	Nicer	More visible
Player mode	First-person (better)	Directional movement

Digital Wire: Digital business, digital transformation, APIs, AI and payments
