Posts

Showing posts from 2026

Your Agent Framework Doesn’t Matter - Your Data Boundary Does

I recently worked through a practical problem that many enterprise teams will run into when using LLMs: how do you use powerful frontier models without exposing proprietary data unnecessarily? I cannot describe the exact use case, because it involves internal and proprietary information. So the example below is invented. But the architectural issue is the same.

Imagine a financial-services workflow where a system receives an abbreviated security description such as: “UBS Grp 4.20% Call Sr Nts 30” and needs to resolve it into something more explicit: “UBS Group AG, 4.20% Callable Senior Notes, maturity 2030.”

The obvious way to solve this is to give the task to a frontier model. It will probably do a good job. But in a real enterprise setting, the question is not only whether the model can solve the task. The more important question is: what else does the model see while solving it? That is where the architecture matters. There are at least two common ways to build this kind...

Opus 4.8 vs. GPT-5.5: The 3D Maze Runner Rematch

A Head-to-Head Comparison of AI Coding - Round Two

Introduction

A while back I pitted Opus 4.5 against Kimi 2.5 in a 3D maze runner build-off. This is the rematch. I had hoped to run this round with Fable, but I was a day too late, so Opus 4.8 took the seat instead. The harnesses were native to each model: Codex for GPT-5.5 and Claude Code for Opus 4.8. Same challenge as before: build a complete 3D maze runner from scratch.

Opus 4.8 (Claude Code)

00:30 - Claude Code opens by asking me questions about the framework and algorithm to use.
05:20 - Thinking done. It leaves plan mode and starts writing code.
08:51 - Build succeeds; installing Chromium.
09:53 - Running through the maze, it detects graphical issues.
11:04 - Graphical issues appear fixed; it moves on to check for graffiti on the walls.
12:31 - It decides the graffiti should “desaturate in dim corridors,” so it makes them glow more.
14:01 - Running final tests and cleanup; one last production build.
14:42 ...

Opus 4.5 vs. Kimi 2.5: The 3D Maze Runner Showdown

A Head-to-Head Comparison of AI Coding

Introduction

Over the weekend, I put two leading AI models to the test: Opus 4.5 (using Claude Code) and Kimi 2.5 (using Opencode). The challenge? Build a complete 3D maze runner game from scratch, including maze generation, A* pathfinding, Three.js visualization, and automated testing. Both systems received identical prompts and were timed from start to finish. What followed was a fascinating race that revealed important differences in how these AI systems approach complex coding tasks, handle context windows, and recover from errors.

The Challenge

Both AI assistants were given the same comprehensive prompt requiring them to build a web-based 3D maze demo with maze generation, 3D visualization, minimap, A* pathfinding, and Playwright tests.

The Original Prompt (excerpt)

# 3D Maze Runner Demo - Coding Assistant Prompt

## Overview

Create a complete web-based 3D maze demo using Three.js as the visualization layer. The application shou...