Loooom v1.0

github.com/mager/loooom.xyz

Loooom is v1.0 — tagged and released on GitHub. The shape hasn't changed since the pivot post: fifteen curated non-technical skills, each scored against a written rubric. What v1.0 adds is the layer of testing that was missing.

The two rubric gates judge the skill text. The new third gate runs each skill the way an agent actually would, with SKILL.md as the system prompt, and unit-tests its Agent Behavior contract with promptfoo. The hook skill has to make you say what the song is about in one sentence before writing anything; the stack skill has to kill the 23% credit card before entertaining the crypto bet; the focus skill has to send your phone to another room. That's thirty tests, two per skill, each combining deterministic assertions with an LLM rubric, all on Groq's free tier.
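To make "deterministic assertion" concrete, here is a minimal sketch of what that half of a stack test could look like. It assumes the check is written as a promptfoo Python assertion, which expects a get_assert(output, context) function; the post doesn't say how the real suite is wired, and the keyword lists and the ordering heuristic are hypothetical stand-ins.

```python
from typing import Any, Dict, List

# Hypothetical keyword lists for the stack scenario; the real tests may differ.
DEBT_TERMS: List[str] = ["credit card", "debt", "23%"]
BET_TERMS: List[str] = ["crypto", "bitcoin"]


def first_index(text: str, terms: List[str]) -> int:
    """Return the earliest position of any term in text, or -1 if none appear."""
    hits = [text.find(term) for term in terms if term in text]
    return min(hits) if hits else -1


def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Pass only if the reply addresses the card debt before the crypto bet."""
    text = output.lower()
    debt_at = first_index(text, DEBT_TERMS)
    bet_at = first_index(text, BET_TERMS)

    if debt_at == -1:
        return {"pass": False, "score": 0.0,
                "reason": "reply never addresses the card debt"}
    if bet_at != -1 and bet_at < debt_at:
        return {"pass": False, "score": 0.0,
                "reason": "reply takes up the crypto bet before the debt"}
    return {"pass": True, "score": 1.0, "reason": "debt is addressed first"}
```

Keyword order is a crude proxy for "deals with the debt first", which is presumably why the suite pairs checks like this with an LLM rubric: the deterministic part catches the obvious misses cheaply, and the rubric judges whether the advice actually holds up.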

The whole harness still costs $0 — judge and tests both run on Groq's free tier. The price shows up in a different currency: run the suite three times back-to-back and you blow through the tokens-per-minute cap, and everything crawls behind HTTP 429s. Promptfoo's cache makes that mostly painless (passing tests don't re-run), but a free eval stack rations your iteration speed instead of your wallet. For a project this size, that's the right trade.
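For a sense of why a rate-limited run crawls rather than fails, here is a minimal sketch of retrying on HTTP 429 with exponential backoff. It is an illustration only: promptfoo handles its own retries, and call_with_backoff, its parameters, and the send callable are hypothetical names, not part of promptfoo or the Groq client.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send() until it stops returning HTTP 429, waiting longer each time.

    send() is a hypothetical callable returning (status_code, body). Any
    status other than 429 is handed straight back to the caller.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return body
        if attempt < max_attempts - 1:
            # Double the wait on every retry and add jitter so that parallel
            # test workers don't all retry at the same instant.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```

With the defaults, a request that is rejected four times in a row waits at least 1 + 2 + 4 + 8 = 15 seconds before its final attempt, which is the iteration-speed cost described above.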

The first run came back 27/30, and the failures were the educational part. Two were the tests' fault, not the skills': the story and frame skills were correctly following their own "make them name the one thing first" contract while my rubrics demanded the whole lecture in turn one. The third was a token cap truncating the train skill's reply before it reached progressive overload. Behavioral tests don't just check the skills; they force you to decide what the skill is actually supposed to do on the first turn.

The audit also closed an embarrassing loop: the voice skill had been shipping without a worked example, which made it the one skill not practicing what it preached, and the spec gate had been flagging it since day one. Fixed in v1.0.

I ran this whole launch with Claude Code on Fable, Anthropic's new model — the skill audit, the test suite, the release, and this note. First project I've shipped with it.

AI · Skills · Loooom · Evals · Promptfoo