
Opus 4.8 vs. GPT-5.5: The 3D Maze Runner Rematch

A Head-to-Head Comparison of AI Coding — Round Two

Introduction

A while back I pitted Opus 4.5 against Kimi 2.5 in a 3D maze runner build-off. This is the rematch. I had hoped to run this round with Fable, but I was a day too late, so Opus 4.8 took the seat instead. Each model ran in its native harness: Codex for GPT-5.5 and Claude Code for Opus 4.8. The challenge was the same as before: build a complete 3D maze runner from scratch.

Opus 4.8 (Claude Code)

  • 00:30 — Claude Code opens by asking me questions about the framework and algorithm to use.
  • 05:20 — Thinking done. It leaves plan mode and starts writing code.
  • 08:51 — Build succeeds; installing Chromium.
  • 09:53 — Running through the maze, it detects graphical issues.
  • 11:04 — Graphical issues appear fixed; it moves on to check for graffiti on the walls.
  • 12:31 — It concludes that the graffiti “desaturate in dim corridors,” so it makes them glow more.
  • 14:01 — Running final tests and cleanup; one last production build.
  • 14:42 — Done.

The run was too fast for me to grab a screenshot, so I asked Claude Code to add a player mode that would let me capture one.

Verdict: Everything checks out. Graffiti is on the walls, though the Pac-Man avatar is a touch small.

GPT-5.5 (Codex)

  • 11:52 — Done. It never asked a single question. Watching it run, though, I noticed the walls had no graffiti, the only deficiency I could spot.

I prompted once more: “I think you forgot to add graffiti, check the original request again.”

  • 13:04 — Done. Adding the graffiti took just 1 minute and 12 seconds. Here too, I asked for a player mode so I could take better screenshots.

Verdict: GPT-5.5 also did a fine job. The missing graffiti was easily fixed, and even with the extra round it finished at 13:04, still ahead of Opus 4.8’s 14:42.

Visual Results

Opus 4.8 on the left, GPT-5.5 on the right.

Welcome screens

Opus 4.8: gradient title, glowing yellow button
GPT-5.5: flat white title, simple button

Generated 50×50 maze

Opus 4.8: rendered right on the welcome screen
GPT-5.5: its own crisp maze view

3D navigation

Opus 4.8: graffiti on the walls, WASD controls
GPT-5.5: GO / RUN / 50 graffiti, arrow-key controls

3D navigation, deeper in

Opus 4.8: abstract neon graffiti at 119 pts
GPT-5.5: dense A* / RUN / GO / 50 graffiti

Victory

Opus 4.8: “Maze Solved!” at 490 points
GPT-5.5: “Victory” at 514 points

Overall

Both models — and their harnesses — completed the task without much hassle. GPT-5.5 needed an extra nudge on the graffiti, but it still finished quicker than Opus 4.8.

The quality of the graffiti is debatable. To be fair, I never specified what kind of graffiti I wanted. Opus produced the “nicer”-looking result, while GPT populated more of the walls, making its graffiti more visible.

Opus implemented the afterthought player mode far better. It understood that I wanted first-person playability, whereas GPT-5.5 just moved the Pac-Man around directionally.

Compared with my original Opus 4.5 vs. Kimi 2.5 test, the progress is tremendous: the build time is roughly cut in half, and the quality is noticeably better. Read the original post.

Final Comparison

Metric                     | Opus 4.8              | GPT-5.5
Completion time (mm:ss)    | 14:42                 | 11:52 (13:04 with graffiti)
Asked clarifying questions | Yes                   | No
Wall graffiti (first pass) | Yes                   | No (needed a reminder)
Graffiti quality           | Nicer                 | More visible
Player mode                | First-person (better) | Directional movement
