Databases
(or: Dave finally stops pretending he can avoid Postgres)
Eventually, I had to accept reality: this thing needed a database. I’m not afraid of databases — I built my career on them — but until now, I’d been focusing on the parts I couldn’t do in my sleep. (And yes, I do occasionally sleep.)
I chose Postgres. I used it in my fintech days, it’s rock‑solid, and I can swap it out later if needed. I could have used Mongo. I could have stayed with the file system. But at some point, you realise you’re building an actual product, not a weekend experiment, and you start plugging the gaps properly.
Managing Prompts
(the part where I admit that “just use Git” was not enough)
I began with hard‑coded Markdown prompts. No versioning beyond Git. You’d think the latest prompt would always be the best. You’d be wrong.
Anyone cursed with prompt engineering knows that tiny tweaks ripple unpredictably. You fix one behaviour, and suddenly the model stops eating bricks and starts chewing crayons instead.
So proper prompt management became essential.
I now have a test harness that tracks:
- the quality of each answer
- the exact prompt version used
- the model behind it
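As a rough illustration of what one harness record might hold, here is a minimal Python sketch. The class name, field names, and the 0.0 to 1.0 quality scale are all assumptions made for the example; the real harness writes to Postgres and may look nothing like this.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class HarnessResult:
    """One graded answer from the test harness (hypothetical field names)."""
    prompt_name: str
    prompt_version: int
    model: str
    quality: float  # assumed 0.0-1.0 score; the real scale is not specified


def average_quality(results):
    """Group results by (prompt, version, model) and average their scores."""
    buckets = defaultdict(list)
    for r in results:
        buckets[(r.prompt_name, r.prompt_version, r.model)].append(r.quality)
    return {key: mean(scores) for key, scores in buckets.items()}


# Illustrative data only, not real measurements.
results = [
    HarnessResult("planner", 1, "sonnet", 0.5),
    HarnessResult("planner", 1, "sonnet", 1.0),
    HarnessResult("planner", 2, "sonnet", 0.25),
]
summary = average_quality(results)
```

Keeping the version and the model in the grouping key is what makes it possible to notice that the "latest" prompt scored worse than the one before it.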
Prompts are versioned. Claude has instructions: if it tweaks a prompt, it updates the Markdown and calls a CLI tool to create a new version. But that version isn’t “published” until I approve it — after copious testing and at least one existential sigh.
If something goes catastrophically wrong, I can roll back instantly. It’s like Git, but for LLM behaviour, and with fewer tears.
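The create, approve, roll back cycle can be sketched in a few lines of Python. This is an in-memory stand-in under stated assumptions, not the actual schema or CLI tool: `PromptHistory` and its method names are hypothetical, and the real thing lives in Postgres tables and stored procs.

```python
from dataclasses import dataclass, field


@dataclass
class PromptHistory:
    """Versions of one prompt for one model (hypothetical in-memory stand-in)."""
    versions: list = field(default_factory=list)   # version N is stored at index N - 1
    published: list = field(default_factory=list)  # stack of approved version numbers

    def create_version(self, text):
        """Store a new draft. It is not live until publish() is called."""
        self.versions.append(text)
        return len(self.versions)

    def publish(self, version):
        """Approve a version, keeping earlier approvals so rollback is possible."""
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"unknown version {version}")
        self.published.append(version)

    def rollback(self):
        """Drop the current published version and return to the previous one."""
        if len(self.published) < 2:
            raise RuntimeError("nothing to roll back to")
        self.published.pop()
        return self.published[-1]

    def live_text(self):
        """Return the text of the currently published version."""
        if not self.published:
            raise LookupError("no version has been published yet")
        return self.versions[self.published[-1] - 1]


history = PromptHistory()
v1 = history.create_version("Plan the steps.")
history.publish(v1)
v2 = history.create_version("Plan the steps briefly.")  # draft only, v1 stays live
history.publish(v2)
history.rollback()  # v2 misbehaved, so v1 is live again
```

Nothing is ever deleted in this sketch: a rolled-back version stays in `versions`, so it can still be inspected or compared later.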
Claude even generated the database schema — tables, stored procs, the lot — from a draft spec. I’m comfortable sharing it because, frankly, this part isn’t the secret sauce. It’s just good engineering hygiene.

Tools Overview
(or: why I built WinForms apps in 2026 and feel no shame)
Each prompt has a prompt_name like “planner”. Each prompt has versions. Versions are model‑specific, because Sonnet and Haiku do not behave identically, and future models will behave differently again.
During testing, the system uses test_prompt_version_id. In production, it uses published_prompt_version_id. This separation lets me compare:
- what worked best
- what I’m currently experimenting with
- what I should never, ever deploy again
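The test/production split boils down to choosing which of two pointers to follow. A minimal Python sketch, assuming one row per prompt and model: `PromptRow` and `resolve_version_id` are hypothetical names, and falling back to the published version when no test version is set is my assumption for the example, not something the schema is known to do.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptRow:
    """One prompt for one model; only the *_version_id names come from the schema."""
    prompt_name: str
    model: str
    published_prompt_version_id: Optional[int] = None
    test_prompt_version_id: Optional[int] = None


def resolve_version_id(row, testing):
    """Pick the version to run: the test pointer in testing, else the published one."""
    if testing and row.test_prompt_version_id is not None:
        return row.test_prompt_version_id
    # Assumed fallback: with no test version set, testing uses the published one.
    if row.published_prompt_version_id is None:
        raise LookupError(f"{row.prompt_name}/{row.model} has no published version")
    return row.published_prompt_version_id


planner = PromptRow(
    prompt_name="planner",
    model="sonnet",
    published_prompt_version_id=7,  # illustrative ids
    test_prompt_version_id=9,
)
test_id = resolve_version_id(planner, testing=True)
prod_id = resolve_version_id(planner, testing=False)
```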
Then there’s the cache. For each abstracted question, it stores the plan that was chosen. Metrics like usage_count help me understand what users ask most often, without tracking who asked. Privacy matters.
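Here is a small Python sketch of that idea, using a dictionary in place of the real Postgres table. Only `usage_count` comes from the actual design; `PlanCache`, `CacheEntry`, the method names, and the sample questions are assumptions for illustration. Note that nothing about the user is stored anywhere.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    plan: str
    usage_count: int = 0  # how often this abstracted question was looked up


class PlanCache:
    """Maps an abstracted question to its plan (hypothetical in-memory stand-in)."""

    def __init__(self):
        self._entries = {}

    def put(self, abstracted_question, plan):
        """Store a plan, or overwrite an existing one while keeping its count."""
        entry = self._entries.get(abstracted_question)
        if entry is None:
            self._entries[abstracted_question] = CacheEntry(plan)
        else:
            entry.plan = plan

    def get(self, abstracted_question):
        """Return the cached plan (or None) and count the hit. No user id involved."""
        entry = self._entries.get(abstracted_question)
        if entry is None:
            return None
        entry.usage_count += 1
        return entry.plan

    def most_asked(self, limit=10):
        """List (question, usage_count) pairs, most frequently asked first."""
        ranked = sorted(
            self._entries.items(),
            key=lambda item: item[1].usage_count,
            reverse=True,
        )
        return [(question, entry.usage_count) for question, entry in ranked[:limit]]


cache = PlanCache()
cache.put("What is the balance of <ACCOUNT>?", "lookup_balance")
cache.put("List payments to <PAYEE>", "search_payments")
cache.get("What is the balance of <ACCOUNT>?")
cache.get("What is the balance of <ACCOUNT>?")
cache.get("List payments to <PAYEE>")
top = cache.most_asked(limit=1)
```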
To manage all this, I have simple editor tools. Why WinForms? Because building a web UI would require APIs, and those APIs would need to call stored procs, and that would require validation logic, and… look, sometimes the path of least resistance is the correct one.
WinForms: 1. Over‑engineering: 0.

The same applies to the plan‑cache editor. I can see every redacted question and the plan Sonnet chose for it. I can override a plan if needed, as a temporary fix until the next prompt version cleans things up.

