Fable 5 Joins the 3D Maze Runner Race

A Continuation of the Opus 4.8 vs. GPT-5.5 Rematch

In the last round I compared Opus 4.8 in Claude Code with GPT-5.5 in Codex on the same 3D maze runner challenge I had used before. Both did well. GPT-5.5 was faster, Opus 4.8 handled the player mode better, and both were already far ahead of the older Opus 4.5 vs. Kimi 2.5 run.

But the model I originally wanted to test in that rematch was Fable. I missed it by a day. So this is the follow-up: the same challenge, this time with Fable 5.

Fable 5

This run was slower than the previous two (18:45 to a first complete version, against 14:42 for Opus 4.8 and 11:52 for GPT-5.5), but also more methodical. Fable 5 spent more time up front planning and testing, and that became the defining difference.

Timeline:

  • 01:50 - Implementation plan ready.
  • 05:35 - First code and config files written.
  • 06:15 - npm dependencies installing.
  • 08:23 - Test cases being created.
  • 09:01 - Unit testing starts.
  • 09:12 - Unit tests complete.
  • 09:14 - Browser tests start.
  • 10:06 - Playwright browser window appears.
  • 11:11 - Code update, then Playwright again.
  • 12:23 - Another code update and Playwright run.
  • 18:45 - Finished.

The final report was unusually comprehensive. The important detail is that it did not simply fake the run by walking the optimal path. It kept an A* solver for reference, but the actual runner explored the maze physically and counted both forward and backward movement. In one witnessed run it scored 746 points against a 628-move optimal path. Other witnessed runs finished with 458 and 694 points.
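
To make the difference between a reference solver and a physical runner concrete, here is a minimal TypeScript sketch. It is not Fable 5's code: the grid format, the function names, and `demoMaze` are all assumptions made for illustration, and breadth-first search stands in for A* (on an unweighted grid both give the same shortest length). The explorer is a plain depth-first walk that pays one point for every step, forward or backward.

```typescript
// Hypothetical grid maze: '#' is a wall, '.' is open floor.
type Cell = { r: number; c: number };

// Neighbour order tried by both functions: up, right, down, left.
const DIRS: Cell[] = [
  { r: -1, c: 0 },
  { r: 0, c: 1 },
  { r: 1, c: 0 },
  { r: 0, c: -1 },
];

const cellKey = (p: Cell): string => `${p.r},${p.c}`;

function isOpen(maze: string[], r: number, c: number): boolean {
  const row = maze[r];
  return row !== undefined && c >= 0 && c < row.length && row[c] !== "#";
}

// Reference solver: fewest moves from start to goal, or -1 if unreachable.
// Breadth-first search is used here in place of A*.
function shortestPathLength(maze: string[], start: Cell, goal: Cell): number {
  const dist = new Map<string, number>([[cellKey(start), 0]]);
  const queue: Cell[] = [start];
  for (let head = 0; head < queue.length; head++) {
    const cur = queue[head]!;
    const d = dist.get(cellKey(cur))!;
    if (cur.r === goal.r && cur.c === goal.c) return d;
    for (const dir of DIRS) {
      const next = { r: cur.r + dir.r, c: cur.c + dir.c };
      if (isOpen(maze, next.r, next.c) && !dist.has(cellKey(next))) {
        dist.set(cellKey(next), d + 1);
        queue.push(next);
      }
    }
  }
  return -1;
}

// Physical explorer: depth-first walk that counts every step taken,
// including the steps spent backing out of dead ends.
function exploreAndCount(maze: string[], start: Cell, goal: Cell): number {
  const visited = new Set<string>();
  let moves = 0;
  // Returns true once the goal has been reached from cell p.
  const walk = (p: Cell): boolean => {
    visited.add(cellKey(p));
    if (p.r === goal.r && p.c === goal.c) return true;
    for (const dir of DIRS) {
      const next = { r: p.r + dir.r, c: p.c + dir.c };
      if (!isOpen(maze, next.r, next.c) || visited.has(cellKey(next))) continue;
      moves++; // forward step into an unvisited cell
      if (walk(next)) return true;
      moves++; // backward step out of a dead end
    }
    return false;
  };
  return walk(start) ? moves : -1;
}

// Small example maze with one dead-end corridor along the top row.
const demoMaze = ["...#", ".##.", "...."];
const demoStart: Cell = { r: 0, c: 0 };
const demoGoal: Cell = { r: 2, c: 3 };

console.log(shortestPathLength(demoMaze, demoStart, demoGoal)); // 5
console.log(exploreAndCount(demoMaze, demoStart, demoGoal)); // 9
```

In this toy maze the explorer wanders into the dead end first and pays four extra moves for it. The same kind of gap shows up in the witnessed run above: 746 points against a 628-move optimum is about 19% overhead.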

Verification was also stronger than in the earlier runs:

  • 13/13 unit tests passed.
  • 7/7 Playwright end-to-end tests passed.
  • Three complete maze solves were witnessed live.
  • Console errors were checked.
  • A favicon 404 was fixed (it is now a Pac-Man SVG).

That last point may sound small, but it matters. The earlier comparisons were mostly about whether the model could build the thing. Fable 5 behaved more like it was trying to ship the thing.

Visual Results

The welcome screen is simple and polished: dark background, yellow title, and a single "Create Maze" button.



Fable 5 welcome screen

The generated maze view is crisp and practical. After generation, it shows both "Start Maze Runner" and "Play Yourself" buttons.


Fable 5 generated maze

The 3D mode has the expected first-person corridor view, a minimap in the top-left corner, a digital point counter in the top-right corner, and visible graffiti on the wall. The graffiti is not just technically present; it is actually readable.


Fable 5 first-person maze view

The victory modal shows both total points and the optimal path length. In the screenshot below, the run completed with 534 points against a 500-move optimal path.


Fable 5 victory modal

The Player Mode Follow-Up

As with the previous test, the automated run finished too quickly to capture everything comfortably. So I asked Fable 5 for a manual player mode. Along with adding the mode, it expanded the test suite:

  • 18 unit tests passed.
  • 10 Playwright tests passed.
  • New tests covered WASD movement, wall blocking, and a full keyboard playthrough.
  • The full keyboard playthrough computed the shortest path and drove the player to the goal with roughly 900 real keypresses.
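
The idea behind the full keyboard playthrough, turning a computed path into key presses, can be sketched as follows. This is a guess at the shape of such a helper, not the test Fable 5 wrote. The control scheme is an assumption (`w` steps forward, `a` and `d` turn 90 degrees left and right), and `GridCell`, `Heading`, `headingBetween`, and `pathToKeys` are hypothetical names.

```typescript
// Hypothetical control scheme (an assumption, not taken from the generated app):
// "w" steps one cell forward, "a" turns 90 degrees left, "d" turns 90 degrees right.
type GridCell = { r: number; c: number };
type Heading = 0 | 1 | 2 | 3; // 0 = north, 1 = east, 2 = south, 3 = west

// Direction of travel between two orthogonally adjacent cells.
function headingBetween(from: GridCell, to: GridCell): Heading {
  const dr = to.r - from.r;
  const dc = to.c - from.c;
  if (dr === -1 && dc === 0) return 0;
  if (dr === 0 && dc === 1) return 1;
  if (dr === 1 && dc === 0) return 2;
  if (dr === 0 && dc === -1) return 3;
  throw new Error("path cells must be orthogonal neighbours");
}

// Convert a cell-by-cell path into the key presses a first-person player would need.
function pathToKeys(path: GridCell[], initial: Heading): string[] {
  const keys: string[] = [];
  let facing: Heading = initial;
  for (let i = 1; i < path.length; i++) {
    const needed = headingBetween(path[i - 1]!, path[i]!);
    const turn = (needed - facing + 4) % 4; // quarter turns clockwise
    if (turn === 1) keys.push("d");
    else if (turn === 2) keys.push("d", "d");
    else if (turn === 3) keys.push("a");
    keys.push("w");
    facing = needed;
  }
  return keys;
}

// Three moves (south, south, east) starting out facing north.
const demoPath: GridCell[] = [
  { r: 0, c: 0 },
  { r: 1, c: 0 },
  { r: 2, c: 0 },
  { r: 2, c: 1 },
];
console.log(pathToKeys(demoPath, 0).join("")); // ddwwaw
```

Under this scheme every turn costs extra presses, so the number of key presses can exceed the number of moves in the path. In a Playwright test, each key could then be sent to the page with `page.keyboard.press`.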

Fable 5 was not the fastest contestant in this round, but it produced the cleanest engineering process.

Overall

Compared with Opus 4.8 and GPT-5.5, Fable 5 took longer:

| Metric | Opus 4.8 | GPT-5.5 | Fable 5 |
|---|---|---|---|
| First complete version | 14:42 | 11:52 | 18:45 |
| With player mode | After follow-up | After follow-up | 36:59 total |
| Asked clarifying questions | Yes | No | No, but produced a detailed plan |
| Wall graffiti | Yes | Needed reminder | Yes |
| Graffiti quality | Nicer | More visible | Clear and readable |
| Player mode | First-person | Directional movement | First-person WASD |
| Unit tests | Not the highlight | Not the highlight | Extensive |
| Playwright verification | Some | Some | Extensive |

The headline is not that Fable 5 won on speed. It did not.

The headline is that the old benchmark is getting exhausted. The first versions of this test were good at exposing basic failures: incomplete builds, missing requirements, poor visuals, or agents that could not really verify their own work. Those failures are becoming rare, so the test reveals less. Fable 5 did more than build the app: it planned the architecture, wrote unit tests, ran browser tests, fixed the issues it found, added manual playability, and expanded the test suite for the new feature.

At this point I have outlived the prompt.

The models have become much more capable, and the 3D maze runner no longer separates them the way it used to. The next test needs to be harder: less toy-like, more stateful, more ambiguous, and probably closer to a real product workflow.

That, in itself, is the result.
