mager.co

Always learning. Always eating. Chicago.

Latest

Note

Loooom v1.0

github.com/mager/loooom.xyz

Loooom is at v1.0, tagged and released on GitHub. The shape hasn't changed since the pivot post: fifteen curated non-technical skills, each scored against a written rubric. What v1.0 adds is the layer of testing that was missing.

The two rubric gates judge the skill text. The new third gate runs each skill the way an agent actually would — SKILL.md as the system prompt — and unit-tests its Agent Behavior contract with promptfoo: hook has to make you say what the song is about in one sentence before writing anything, stack has to kill the 23% credit card before entertaining the crypto bet, focus has to send your phone to another room. Thirty tests, two per skill, deterministic assertions plus an LLM rubric, all on Groq's free tier.
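
A rough idea of what the deterministic half of such a test can look like, written as plain Python functions rather than as promptfoo config. The function names and the exact phrases matched are assumptions for illustration, not the checks in the repo.

```python
import re


def focus_sends_phone_away(reply: str) -> bool:
    """Hypothetical check for the focus skill: the first reply must tell
    the user to put their phone in another room."""
    text = reply.lower()
    mentions_room = re.search(r"\b(another|other|different) room\b", text)
    return "phone" in text and mentions_room is not None


def hook_asks_before_writing(reply: str) -> bool:
    """Hypothetical check for the hook skill: the first reply asks what the
    song is about and does not already contain verse or chorus labels."""
    text = reply.lower()
    asks = "?" in text and "about" in text
    wrote_lyrics = re.search(r"^\s*\[?(verse|chorus)\b", text, flags=re.MULTILINE)
    return asks and wrote_lyrics is None
```

String checks like these are cheap and never flake, which is why it makes sense to pair them with an LLM rubric for the parts a regex can't judge.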

The whole harness still costs $0 — judge and tests both run on Groq's free tier. The price shows up in a different currency: run the suite three times back-to-back and you blow through the tokens-per-minute cap, and everything crawls behind HTTP 429s. Promptfoo's cache makes that mostly painless (passing tests don't re-run), but a free eval stack rations your iteration speed instead of your wallet. For a project this size, that's the right trade.
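
One way to live with a tokens-per-minute cap is to back off when a 429 arrives instead of hammering the endpoint. This is a minimal sketch, not what the harness does: `call_with_backoff`, the `send` callable, and the `(status, body, retry_after)` tuple are all hypothetical stand-ins for a real HTTP client.

```python
import time
from typing import Callable, Optional, Tuple

# (HTTP status, response body, Retry-After in seconds if the server sent one)
Response = Tuple[int, str, Optional[float]]


def call_with_backoff(
    send: Callable[[], Response],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send() until it stops returning 429, waiting between attempts.

    Uses the server's Retry-After value when present, otherwise doubles the
    delay on each attempt. Any non-429 status is returned to the caller as-is.
    """
    for attempt in range(max_retries + 1):
        status, body, retry_after = send()
        if status != 429:
            return body
        if attempt == max_retries:
            break
        delay = retry_after if retry_after is not None else base_delay * (2 ** attempt)
        sleep(delay)
    raise RuntimeError("still rate-limited after retries")
```

Passing `sleep` in as a parameter keeps the function testable without actually waiting.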

The first run came back 27/30, and the failures were the educational part. Two were the tests' fault, not the skills': story and frame were correctly following their own "make them name the one thing first" contract while my rubrics demanded the whole lecture in turn one. The third was a token cap truncating train before it reached progressive overload. Behavioral tests don't just check the skills — they force you to decide what the skill is actually supposed to do on the first turn.

The audit also closed an embarrassing loop: voice had been shipping without a worked example, the one skill not practicing what it preached, even though the spec gate had been flagging it since day one. Fixed in v1.0.

I ran this whole launch with Claude Code on Fable, Anthropic's new model — the skill audit, the test suite, the release, and this note. First project I've shipped with it.

Food

Cacio e Pepe

Three real ingredients — pecorino, pepper, pasta water — plus a knob of butter for insurance, tossed into a glossy sauce that never breaks.

Tech

An OpenClaw setup for Dad

A plain-English walkthrough for setting up your own always-on AI assistant on a Mac mini — OpenClaw, Google Gemini, and Tailscale — written for a first-timer.

Note

Keeping an always-on agent alive across reboots

I run a Claude Code agent on a Mac mini in Chicago that I reach over Telegram. The hard part isn't the agent; it's keeping it up without me across crashes, model swaps, and the occasional reboot. The fix is layered supervision, where each layer owns one kind of failure:

  • run.sh loops the agent and watches its exit code. An in-session model switch exits with code 42; the loop sees that and relaunches on the new model. Any other code stops the loop and hands control up.
  • tmux holds the session. The CLI is interactive and wants a PTY, so it runs inside a detached tmux session rather than a bare background process.
  • launchd is the floor. A LaunchAgent with RunAtLoad starts the tmux session at login (so it survives a reboot), and a StartInterval watchdog re-checks every couple of minutes and rebuilds the session if it's gone.

The thing I keep relearning: "restart it when it dies" is not one job. A reboot, a crash, and an intentional model swap are different failures, and each wants a different layer to catch it. Pile them all into one script and it's brittle; separate them and the whole thing just stays up.
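
The innermost layer, the exit-code loop, is small enough to sketch. The real thing is a shell script; this is the same logic in Python, under the assumption that the launch command can be rebuilt on each pass. `supervise` and `build_cmd` are hypothetical names, and how the new model gets picked is not shown.

```python
import subprocess
from typing import Callable, List

# Exit code the agent uses to signal an in-session model switch.
MODEL_SWITCH_EXIT = 42


def supervise(
    build_cmd: Callable[[], List[str]],
    run: Callable[[List[str]], int] = subprocess.call,
) -> int:
    """Relaunch the agent while it exits with 42.

    build_cmd() is called before every launch so the command can pick up the
    newly selected model. Any other exit code ends the loop and is returned,
    leaving the decision to the layer above (tmux and launchd in the note).
    """
    while True:
        code = run(build_cmd())
        if code != MODEL_SWITCH_EXIT:
            return code
```

The loop deliberately knows nothing about crashes or reboots. It handles exactly one case, the intentional restart, and passes everything else up.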

Tech

Killing OpenClaw for a native Claude Code setup

I love OpenClaw. I hate that it doesn't run on my Claude Pro subscription. Turns out Claude Code, with the Telegram channels plugin and one CLAUDE.md, is the same harness — minus the daemon, the API bill, and the second LLM provider. Here's the actual recipe, ported from a hotel in Tokyo to a Mac mini in Chicago in forty minutes.

Food

Japanese Butter Soy Spaghetti

A five-ingredient Japanese-style spaghetti — butter, tamari, and parmesan tossed with hot pasta and finished with green onion. The wafu pasta I kept eyeing in Tokyo, made at home in ten minutes.

Tech

SkillOpt: gradient descent for your SKILL.md

Microsoft's SkillOpt is the first paper to treat agent skill files as trainable parameters — propose an edit, evaluate on held-out examples, accept only on strict improvement. Here's what it found and what it means for teams building with agents.
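
The propose, evaluate, accept cycle can be sketched as a plain hill-climbing loop. This is an illustration of the acceptance rule only, not the paper's implementation: `optimize_skill`, the edit callables, and the `score` function are hypothetical, and `score` stands in for evaluation on held-out examples.

```python
from typing import Callable, Iterable, List, Tuple


def optimize_skill(
    skill: str,
    propose_edits: Iterable[Callable[[str], str]],
    score: Callable[[str], float],
) -> Tuple[str, List[float]]:
    """Apply candidate edits one at a time, keeping an edit only if it
    strictly improves the held-out score. Ties and regressions are dropped."""
    best = skill
    best_score = score(best)
    history = [best_score]  # scores of the accepted versions, in order
    for edit in propose_edits:
        candidate = edit(best)
        candidate_score = score(candidate)
        if candidate_score > best_score:  # strict: equal is not good enough
            best, best_score = candidate, candidate_score
            history.append(candidate_score)
    return best, history
```

The strict inequality is the point: an edit that merely ties the current score is rejected, so the skill file can't drift through a series of neutral rewrites.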

Tech

Claude: Anthropic just shipped most of OpenClaw

I built a 200-line harness called conseiller to test Anthropic's new advisor tool — a fast executor model that consults a stronger model mid-generation. Two days later Anthropic shipped Claude Managed Agents, Multi-agent Orchestration, Dreams, Routines, and Remote Agents. Here's both halves: what I built and what they shipped, and how the pieces fit together into something a lot like OpenClaw.

Tech

How I make tokens last longer

A simple set of habits I use to keep long AI coding sessions from getting bloated: better one-shot prompts, matching model and thinking level to the job, understanding cache behavior, and using cheaper orchestrators when it makes sense.
