
Opus 4.5 vs. Kimi 2.5: The 3D Maze Runner Showdown

A Head-to-Head Comparison of AI Coding

Introduction

Over the weekend, I put two leading AI models to the test: Opus 4.5 (using Claude Code) and Kimi 2.5 (using Opencode). The challenge? Build a complete 3D maze runner game from scratch, including maze generation, A* pathfinding, Three.js visualization, and automated testing. Both systems received identical prompts and were timed from start to finish.

What followed was a fascinating race that revealed important differences in how these AI systems approach complex coding tasks, handle context windows, and recover from errors.

The Challenge

Both AI assistants were given the same comprehensive prompt requiring them to build a web-based 3D maze demo with maze generation, 3D visualization, minimap, A* pathfinding, and Playwright tests.

The Original Prompt (excerpt)

# 3D Maze Runner Demo - Coding Assistant Prompt

## Overview
Create a complete web-based 3D maze demo using Three.js as the
visualization layer. The application should have two main states:
maze creation and maze running with an automated pathfinding algorithm.

## Technical Requirements
### Framework & Setup
- Use Vite + vanilla JavaScript (or React if preferred) for the project scaffold
- Three.js must be the 3D visualization library
- The project must be fully runnable with `npm install && npm run dev`

## State 1: Welcome Screen / Maze Creation
- Clean welcome screen with a single prominent button labeled "Create Maze"
- Generate a random 50x50 point maze using a maze generation algorithm
- Display the generated maze in a 2D canvas/SVG view
- Show a second button labeled "Start Maze Runner"

## State 2: Maze Runner Mode
- Minimap (Upper-Left Corner): Small 2D representation of the full maze
- Main View (Center): 3D first-person perspective view using Three.js
- Point Counter (Upper-Right Corner): Stopwatch-style display
- Wall Rendering: Grey walls with random graffiti/images (10-15% coverage)
- Pathfinding: A* algorithm, ~100-200ms per move

## Success Criteria
- Maze generates successfully (50x50)
- 2D and 3D views display correctly
- Minimap shows position in real-time
- Graffiti/decorations visible on walls
- A* algorithm finds path to goal
- Victory modal displays with final point count
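To make the prompt's two core algorithms concrete, here is a minimal sketch of what a grid-based maze generator (a recursive backtracker) and an A* solver can look like. This is my own illustration, not code from either model's output; function names like `carveMaze` and `aStar` are mine, and a plain sorted array stands in for a real priority queue, which is fine at 50x50 scale.

```javascript
// Grid convention: maze[y][x] === 1 is a wall, 0 is a corridor.

// Recursive backtracker: tunnel to a random unvisited cell two steps
// away, knock down the wall between, backtrack when stuck.
function carveMaze(size) {
  const maze = Array.from({ length: size }, () => Array(size).fill(1));
  const stack = [[1, 1]];
  maze[1][1] = 0;
  while (stack.length > 0) {
    const [x, y] = stack[stack.length - 1];
    const neighbours = [[2, 0], [-2, 0], [0, 2], [0, -2]]
      .map(([dx, dy]) => [x + dx, y + dy])
      .filter(([nx, ny]) =>
        nx > 0 && ny > 0 && nx < size - 1 && ny < size - 1 && maze[ny][nx] === 1);
    if (neighbours.length === 0) {
      stack.pop(); // dead end: backtrack
      continue;
    }
    const [nx, ny] = neighbours[Math.floor(Math.random() * neighbours.length)];
    maze[(y + ny) / 2][(x + nx) / 2] = 0; // remove the wall between cells
    maze[ny][nx] = 0;
    stack.push([nx, ny]);
  }
  return maze;
}

// A* with a Manhattan-distance heuristic; expands the node with the
// lowest f = g + h until the goal is reached.
function aStar(maze, start, goal) {
  const key = ([x, y]) => `${x},${y}`;
  const h = ([x, y]) => Math.abs(x - goal[0]) + Math.abs(y - goal[1]);
  const open = [{ pos: start, g: 0, f: h(start), path: [start] }];
  const gBest = new Map([[key(start), 0]]);
  while (open.length > 0) {
    open.sort((a, b) => a.f - b.f);
    const node = open.shift(); // cheapest node first
    if (node.pos[0] === goal[0] && node.pos[1] === goal[1]) return node.path;
    for (const [dx, dy] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
      const next = [node.pos[0] + dx, node.pos[1] + dy];
      if (maze[next[1]]?.[next[0]] !== 0) continue; // wall or out of bounds
      const g = node.g + 1;
      if (g >= (gBest.get(key(next)) ?? Infinity)) continue;
      gBest.set(key(next), g);
      open.push({ pos: next, g, f: g + h(next), path: [...node.path, next] });
    }
  }
  return null; // unreachable; cannot happen in a fully carved maze
}
```

In the actual game loop, each step returned by `aStar` would then be animated at the prompt's ~100-200ms cadence while the minimap and point counter update.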

The Race: A Minute-by-Minute Breakdown

Both AI systems started simultaneously. Here's how the race unfolded:

  • 4:00 - Kimi shows first welcome screen in browser
  • 5:00 - Opus catches up with its welcome screen
  • 6:00 - Opus version looks slightly more polished visually
  • 10:00 - Opus has a working version while Kimi configures MCP server
  • 13:00 - Kimi running final tests
  • 14:00 - Both systems doing final verification runs
  • 16:50 - Kimi starts compacting (context window filled)
  • 18:10 - Kimi reports completion
  • 21:22 - Opus reports 1 test failing due to runner speed
  • 24:10 - Kimi: while testing, I noticed visible gaps in the walls, so I prompted Opencode to fix them
  • 26:10 - Opus reports all tests pass, demonstrates working maze, closes browser
  • 36:20 - Kimi hits its context limit and cannot recover (an Opencode scaffolding issue)
  • 37:00 - Kimi: I started a new Opencode session and noticed that the 3D view always faced the same direction instead of turning toward where the runner was heading; I prompted for a fix
  • 41:00 - Kimi: the fix worked (verified manually)

Visual Results

The screenshots below show the progression of both implementations side by side.

1. Welcome Screens

Opus: Orange 'Create Maze' button
Kimi: Purple gradient button

2. Generated Mazes

Opus: Orange border, cyan paths
Kimi: Purple styling, cleaner alignment

3. 3D Navigation

Opus: 'LOST' graffiti, 101 points
Kimi: Long corridor, 219 points

4. Victory Screens

Opus: 'Congratulations!' 389 points
Kimi: Trophy emojis, 498 points

Analysis

First Run Verdict

Kimi: Finished the maze successfully, but with gaps in the 3D wall rendering that didn't match the 2D view.

Opus: Finished the maze successfully with no visual errors, and included the wall graffiti.

Context Window Management

A critical difference emerged in how each system handled context limits. Despite Claude Opus 4.5 having a smaller context window (200K vs. Kimi's 262K), Claude Code's memory management proved superior. Kimi hit context limits at 16:50 and eventually couldn't recover at 36:20, while Opus maintained coherent execution throughout.

Visual Quality

This is subjective, but Kimi produced better UI alignment (see the maze/button positioning). However, Claude demonstrated more sophisticated 3D features including the wall graffiti requirement.
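The graffiti requirement is mostly a texturing job in Three.js, but the interesting part, hitting the prompt's 10-15% coverage band, is plain JavaScript. A minimal sketch of the selection step (the helper name and shape are my own illustration, not code from either run):

```javascript
// Pick which wall tiles get a graffiti texture so that coverage lands
// in the prompt's 10-15% band. Uses a Fisher-Yates shuffle and takes
// the first `coverage` fraction of the shuffled ids.
function pickGraffitiWalls(wallIds, coverage = 0.12) {
  const count = Math.round(wallIds.length * coverage);
  const shuffled = [...wallIds]; // leave the caller's array untouched
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return new Set(shuffled.slice(0, count));
}
```

Each selected wall would then swap its plain grey material for one with a `map` set to a graffiti texture (e.g. a `THREE.CanvasTexture` drawn at runtime), while unselected walls keep the shared grey material.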

Conclusion

Winner: Opus 4.5 (26 minutes). Running under Claude Code, it completed the full challenge, including all tests, wall graffiti, and correct 3D visualization.

Kimi 2.5 with Opencode showed promise and might have won with better scaffolding. Discounting the context window issues (which appear to be Opencode-related rather than Kimi-specific), Kimi's adjusted time would be roughly 30 minutes, though without the graffiti feature.

Key Takeaway: Claude Code's superior memory management and integration proved decisive. The scaffolding around an AI model matters as much as the model itself for complex, long-running coding tasks.

Final Comparison

Metric            Opus     Kimi
Completion Time   26 min   30 min*
Wall Graffiti     Yes      No
Visual Errors     None     Wall gaps
Context Issues    None     Hit limit
UI Alignment      Good     Better

* Adjusted time accounting for context window issues

I have dabbled a fair amount with all sorts of crypto currencies and their respective permissionless networks. In fact, I have been dabbling since 2012 which is by my last count a whopping 12 years. While I have always maintained that I do believe the general concept for digitization and programmability of assets is on the right path, its implementation, the user experience, the accessibility, the fraudulent activities, and the overall inefficiencies permissionless DLTs have, never made me into a true believer. I have stated that opinion on several occasions, here , here and here . There are still barriers to entry when it comes to digitization of assets: sustainable- and interoperable infrastructure. To illustrate this, I recently asked a notary public here in Zurich, why they can’t store the notarized documents as PDFs, the answer surprised me: because they must keep records for at least 70 years. Now, think about what would have happened if we stored these documents on floppy disks...