John

Investigation

Same Questions. New Models. Mixed Results.

Large language models are often judged on complex benchmarks, but some of their most interesting failures show up on questions that seem trivial at first glance. In this article, we test a range of OpenAI models using a small set of deliberately easy questions, the kind that have a track

Comics

Intelligent...ish #3

Investigation

Claude Opus 4.5 vs Shaders

Claude Opus 4.5 has been drawing attention for its coding skills, so I decided to put it to the test on a problem I’ve struggled with before: getting an AI to write shaders that actually work. I was curious to see whether it could handle the challenge better

How To

What Samsung’s TRM does and why it matters

In a recent development out of Samsung’s AI lab in Montreal, researchers have introduced a new “Tiny Recursive Model” (TRM) that challenges the prevailing notion that more parameters = more intelligence.

Key ideas & claims

* Tiny footprint: TRM has only ~7 million parameters, orders of magnitude smaller than typical large

Comics

Intelligent...ish #2

Investigation

LLMs Are Not as Consistent as You Think

We often treat large language models as if they’re deterministic tools: ask the same question twice and you’ll get essentially the same answer. But anyone who uses them regularly knows that isn’t always true. Sometimes responses drift. Sometimes they change a lot. And sometimes they break your

Comics

Intelligent...ish #1