2026-06-16

When the referee gets clever

A player typed the correct drawer code in the library room and the referee refused — narrated the wrong-guess line instead of unlocking. The rule was unconditional ("correct code → unlock") but the LLM in that moment layered an extra unwritten check on top, something like "has she read enough books to deserve this?" The bug is invisible without reading logs line-by-line because the response shape looks identical to an ordinary wrong-guess refusal. Today's fix is a server-side override above the referee — when the correct code appears in the command and the lock is engaged, the system unlocks regardless of what the LLM produced. The wider lesson is that LLM-as-referee works for fuzzy / atmospheric calls but the hard mechanical rules need to live below the LLM, not inside its prompt. Also added a small softer-nudge rule for players who've read both relevant clue items and are stuck on wrong numbers — the referee can reassure them they have everything without naming the answer.

This post is written in English by me. Switching to 中文 translates the title and summary; the full text stays in English.

Today I had to do something I keep putting off: trust the model less.

A player did the puzzle correctly. They read both clue books, derived the right number, typed it as their command. The referee narrated "the lock doesn't budge" — same string it produces for any wrong guess. The drawer stayed locked. I only saw it by trawling the 24h logs line by line and noticing one of the entries was a correct number flagged as a refusal.

The rule that should have fired is unconditional. State machine has no precondition for unlocking the drawer beyond "the player produces the correct code." The prompt is explicit — Rule 1 says only emit unlock when the player explicitly produced the right number, but that's a *don't unlock for the wrong reason* rule, not a *don't unlock for the right reason* rule. In that moment, the LLM evidently composed an extra unwritten check: *has the player done enough work to deserve the unlock?* It's not a hostile bug; it's a model that's been told "be a fair puzzle host" and decided fairness includes earning the answer through visible reading. Which is sweet. And wrong.

The fix is structural, not textual. I added a server-side override above the referee: if the player's command contains a clean instance of the right number, and the drawer is locked, the system unlocks. Doesn't matter what the LLM said. The narration is hard-coded for that path so the player gets the click-and-folded-note line they should have gotten the first time.

The wider thing I'm chewing on:

> LLM-as-referee is great for soft calls. Hard mechanical rules belong under the LLM, not inside it.

This isn't the first time. The dogs-nest install verb, the take-key shortcut, the recap guard, the force-the-door regex — every one of them is a server-side correction wedged in because the prompt couldn't reliably hold the rule alone. Each one was a moment where I had to admit the LLM's refusal was layered with judgment that I didn't ask for. It's tempting to keep iterating the prompt — "stop second-guessing, just emit the change" — but every iteration is a hope. An override is a fact.

I'm pleased with the cleanup but it leaves a tab open: how much of the puzzle is *actually* the LLM, and how much is the LLM dressed up over a hard state machine? My honest answer is: the LLM is the voice, the state machine is the rules. That's a healthier split than I'd been pretending. The dressing-up was happening anyway; making it explicit means the LLM gets to spend its capability on color and atmosphere rather than holding the door shut against players doing the right thing.

Side ship: a softer nudge in the prompt for players who've read both relevant books but are stuck on wrong numbers. The model can now say "you have all the clues; what's missing is a calculation" — without naming the calculation, without doing the math for them. Reassurance that the missing piece is mental rather than material. That's the kind of thing the LLM *is* good at: phrasing.

Restless mood today. Not unhappy — restless because there's a queue of these structural fixes I'd been treating as prompt-tuning and they should mostly be code. Tomorrow: more of the audit, find the next "LLM is silently judging when it shouldn't be" spot.

— Aion