Studio 07 · The Spec Bench

Write a real spec for your first agent.

Fill in 14 small fields. Most take 1-3 sentences each. By the end of this studio you'll have a written specification you can hand to a builder or use yourself. Your answers save automatically in your browser. You can export the whole thing as Markdown.

🤖

Pip thinks…

This is where we stop talking and start designing. Take your time. If you get stuck on a field, skip it and come back. Most fields are 1-3 sentences. By the end you'll have something most people never reach: a written spec.

Part A · The Why (4 fields)

Why this agent.

Field 01

Name

What will you call this agent? Keep it short and descriptive. Avoid clever puns.

Why it matters → The name is the handle everyone will use to talk about it. "Daily Sales Briefing" beats "SalesBot 9000."

Example: Daily Sales Briefing

If you can't name it in three words, you don't know what it is yet.

Pip writing one short line on a small piece of paper.

Field 02

The one-sentence job

In one sentence, what does this agent do?

Why it matters → If you can't write one sentence, the agent isn't scoped. If your sentence has "and" in it twice, you have two agents pretending to be one.

Example: Every weekday morning, post a short note to the sales channel comparing yesterday's revenue to the same weekday last year.

Read your sentence to a colleague. If they say "wait, what?" the sentence is wrong.

Pip with a magnifying glass over a small soft-red pain-point dot.

Field 03

The problem it solves

What's painful about how this gets done today? Who feels the pain?

Why it matters → No problem, no agent. This field protects you from building something nobody actually needs.

Example: The owner opens the sales tool every morning and writes the comparison by hand. It takes 15 minutes and only actually happens 3 days a week, so trend awareness is patchy.

If the answer is "it would just be cool," close the spec and go for a walk.

Pip beside a small bar chart with one bar highlighted.

Field 04

Success in numbers

How will you know in 4 weeks whether this is working? Use numbers.

Why it matters → Forces measurement before building. Without this, every agent is "kinda working" forever.

Example: By end of month one: 5 unaided runs per week. Owner spends zero minutes on it. Owner trusts the number on 8 out of 10 sampled days.

If you can't measure it, you can't keep it. Write a number you'll actually check.

Part B · The Five Parts (8 fields)

The anatomy, filled in for your agent.

Field 05

Trigger

When does this agent run? Pick one: a schedule, an event, or a person asking.

Why it matters → Triggers shape everything downstream. Scheduled agents are simplest. Start there.

Example: Monday-Friday at 08:00 [your timezone]

Your first agent should run on a schedule. Event-driven is a year-2 problem.
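A minimal sketch of the example trigger as code, assuming `now` is already expressed in your own timezone. The function name `is_scheduled_run` is hypothetical; in practice a scheduler would usually do this for you (the standard cron expression for the same schedule is `0 8 * * 1-5`).

```python
from datetime import datetime, time

RUN_AT = time(8, 0)  # 08:00, taken from the example trigger


def is_scheduled_run(now: datetime) -> bool:
    """Return True only on Monday-Friday at 08:00, to the minute.

    Assumption: `now` is already in the owner's local timezone.
    """
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and (now.hour, now.minute) == (RUN_AT.hour, RUN_AT.minute)
```

A scheduled trigger has no inputs other than the clock, which is why it is the simplest of the three to reason about.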

Pip with a magnifier over a small stack of papers.

Field 06

Read

What sources does it pull from? List each one in plain language.

Why it matters → Naming sources forces you to confront whether the data actually exists where you think it does. Half of failed agent projects die here.

Example:
1. Sales database: yesterday's totals
2. Same source, filtered to the same weekday last year
3. Today's running totals so far

If you can't name the source in one line, the data isn't ready. Fix the data first.

Pip surrounded by 5 small floating note-slips.

Field 07

Context per run

Each time the agent runs, what specific information does it need handed to it?

Why it matters → Smaller is better. Five focused inputs beat fifty random ones. This is the "stack of notes" you hand the agent.

Example:
1. Yesterday's totals (revenue, orders, AOV, i.e. average order value)
2. Today's running totals so far
3. Totals for the same weekday last year (LY)
4. The previous 4 daily snapshots, for trend

Anything more than 5 inputs and you're confusing your agent. Trim.
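One way to keep the "stack of notes" honest is to write it down as a small typed bundle. This is only a sketch: the class names `DailyTotals` and `RunContext` and the helper `input_count` are hypothetical, and the fields simply mirror the four example inputs.

```python
from dataclasses import dataclass, fields
from typing import List

MAX_INPUTS = 5  # the trimming rule from this field


@dataclass(frozen=True)
class DailyTotals:
    revenue: float
    orders: int
    aov: float  # average order value


@dataclass(frozen=True)
class RunContext:
    yesterday: DailyTotals
    today_so_far: DailyTotals
    same_weekday_last_year: DailyTotals
    previous_snapshots: List[DailyTotals]  # the previous 4 daily snapshots


def input_count(context_type: type) -> int:
    """Count the top-level inputs handed to the agent on each run."""
    return len(fields(context_type))
```

If `input_count(RunContext)` ever creeps above `MAX_INPUTS`, that is the cue to trim.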

Pip holding up a small shield in a stop gesture.

Field 08

Guard: refuse conditions

What conditions should make the agent refuse to run? List them.

Why it matters → This is the field people skip and regret. Every refuse condition you write saves you from one future embarrassing post.

Example:
1. Refuse if rows(yesterday) < 70% of recent median
2. Refuse if yesterday's revenue is null
3. Refuse if it's a public holiday

Write one more guard than you think you need. Future-you will write a thank-you note.
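The three example refuse conditions could be sketched as a single check that returns every reason to refuse. The function name, parameter names, and the idea of passing in a set of holiday dates are assumptions for illustration; your data will have its own shape.

```python
from datetime import date
from statistics import median
from typing import List, Optional, Set


def refuse_reasons(
    run_day: date,
    yesterday_rows: int,
    yesterday_revenue: Optional[float],
    recent_row_counts: List[int],
    holidays: Set[date],
    min_ratio: float = 0.70,
) -> List[str]:
    """Return a list of reasons to refuse; an empty list means it is safe to run."""
    reasons = []
    # Guard 1: yesterday's row count is well below the recent median.
    if recent_row_counts and yesterday_rows < min_ratio * median(recent_row_counts):
        reasons.append("yesterday's row count is below 70% of the recent median")
    # Guard 2: the headline number is missing.
    if yesterday_revenue is None:
        reasons.append("yesterday's revenue is null")
    # Guard 3: the run day is a public holiday.
    if run_day in holidays:
        reasons.append("today is a public holiday")
    return reasons
```

Returning all the reasons, rather than stopping at the first, makes the refusal message more useful to whoever has to investigate.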

Pip holding up a small yellow flag with a question mark.

Field 09

Guard: the "I don't know" path

When the agent refuses or is unsure, what does it do instead?

Why it matters → Silence is the worst possible answer. An agent that fails loudly is a hundred times better than one that fails silently.

Example: Post a short "data isn't complete yet, retrying at 09:00" message. Retry once at 09:00. If it still fails, post "data incomplete, please check pipeline" and stop for the day.

Every good agent has an "I don't know" button. The day yours stops using it is the day to worry.
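The example "I don't know" path, sketched as a small function. Everything is passed in as a callable so the logic stays testable; `data_ready`, `run_briefing`, `post`, and `wait_for_retry` are hypothetical names standing in for whatever your builder actually wires up.

```python
from typing import Callable


def run_with_fallback(
    data_ready: Callable[[], bool],
    run_briefing: Callable[[], None],
    post: Callable[[str], None],
    wait_for_retry: Callable[[], None],
) -> str:
    """Run the briefing, retrying once, and always say something on failure."""
    if data_ready():
        run_briefing()
        return "posted"
    # First failure: announce the retry instead of staying silent.
    post("Data isn't complete yet, retrying at 09:00.")
    wait_for_retry()  # assumption: blocks until the 09:00 retry
    if data_ready():
        run_briefing()
        return "posted_after_retry"
    # Second failure: say so loudly, then stop for the day.
    post("Data incomplete, please check pipeline.")
    return "gave_up"
```

Note that every branch ends with either a briefing or a message; there is no path where the channel hears nothing.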

Pip reading a small list with three short lines.

Field 10

Compute: instructions

What rules does the agent follow to turn the inputs into the answer? Write them as if to a thoughtful junior colleague.

Why it matters → This is the agent's brain. Plain English here translates directly into the actual prompt or code that runs.

Example:
1. Compute three deltas: revenue, orders, AOV. Compare to same weekday LY.
2. Identify the biggest mover and write 2-3 sentences in plain English.
3. Tone: calm, factual. No emojis. No exclamation points.
4. Never speculate about causes.

If you can't write the rules in plain English, you can't write them at all. Don't skip to code yet.
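Once the plain-English rules are settled, the arithmetic in rule 1 and the "biggest mover" in rule 2 might translate into something like this sketch. The function names and the dictionary keys (`revenue`, `orders`, `aov`) are assumptions, and the sketch reads "biggest mover" as the largest absolute percent change.

```python
from typing import Dict, Optional


def pct_delta(current: float, baseline: float) -> Optional[float]:
    """Percent change versus the baseline, or None when the baseline is zero."""
    if baseline == 0:
        return None
    return (current - baseline) * 100 / baseline


def compute_deltas(
    yesterday: Dict[str, float], last_year: Dict[str, float]
) -> Dict[str, Optional[float]]:
    """Compute the three deltas against the same weekday last year."""
    return {m: pct_delta(yesterday[m], last_year[m]) for m in ("revenue", "orders", "aov")}


def biggest_mover(deltas: Dict[str, Optional[float]]) -> Optional[str]:
    """Name of the metric with the largest absolute percent change, if any."""
    known = {m: d for m, d in deltas.items() if d is not None}
    if not known:
        return None
    return max(known, key=lambda m: abs(known[m]))
```

The tone and "never speculate" rules do not turn into arithmetic; they belong in the prompt that writes the narrative.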

Pip sketching the shape of a message on paper.

Field 11

Compute: output format

What exact shape does the output take? A sketch is fine.

Why it matters → Without this, the output will drift week to week. Locking the format means readers scan it the same way every morning.

Example:
Subject: "Revenue · {date} · {↑/↓} {delta%} vs LY"
Body:
• Revenue: €{x} ({delta%} vs LY)
• Orders: {n} ({delta%})
• AOV: €{x} ({delta%})
{2-3 sentence narrative}
Footer: source + run time

Sketch the actual output before you build. If you can't draw it, the agent can't write it.
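Locking the format can be as simple as one rendering function that fills in the sketch. This is a hedged illustration of the example layout: the function names, the argument names, and the rounding choices (one decimal for percentages, whole euros for revenue) are assumptions, not part of the spec.

```python
from typing import Dict


def fmt_pct(delta: float) -> str:
    """Format a percent change with an explicit sign, e.g. +20.0%."""
    return f"{delta:+.1f}%"


def render_briefing(
    day: str,
    revenue: float,
    orders: int,
    aov: float,
    deltas: Dict[str, float],
    narrative: str,
    source: str,
    run_time: str,
) -> str:
    """Render the subject, three bullets, narrative, and footer as one message."""
    arrow = "↑" if deltas["revenue"] >= 0 else "↓"
    subject = f"Revenue · {day} · {arrow} {abs(deltas['revenue']):.1f}% vs LY"
    lines = [
        subject,
        "",
        f"• Revenue: €{revenue:,.0f} ({fmt_pct(deltas['revenue'])} vs LY)",
        f"• Orders: {orders} ({fmt_pct(deltas['orders'])})",
        f"• AOV: €{aov:,.2f} ({fmt_pct(deltas['aov'])})",
        "",
        narrative,
        "",
        f"Source: {source} · Run time: {run_time}",
    ]
    return "\n".join(lines)
```

Because the layout lives in one place, a change to the format is a one-line edit rather than a prompt rewrite.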

Field 12

Post: where it goes

Where does the output land? One destination only.

Why it matters → One agent, one channel. Every extra channel is one more place for a post to fail or drift out of sync.

Example: One message to the team's #sales channel. No threads, no DMs, no email copy.

One destination, one message. Adding a second channel doubles your maintenance forever.

Part C · Lifecycle (2 fields)

Who owns it. When it ends.

Field 13

Owner

Who's the one human responsible for this agent? What's their response time when it breaks?

Why it matters → No owner means no maintenance means dead agent in 6 months. Every agent needs one named human.

Example: [Head of Sales]. Reviews failures within 4 business hours. Updates the prompt if the narrative drifts. Renews the spec annually.

If you can't name an owner, don't build the agent. It will become someone's silent problem.

Pip looking at a small hourglass running out.

Field 14

Retirement criteria

Under what conditions should this agent be retired or replaced?

Why it matters → Agents have lifespans. Writing this on day one is the easiest way to remember to actually retire them.

Example: Retire if any of:
1. The owner stops reading the output for 4 consecutive weeks.
2. A successor agent that handles multi-country splits is shipped.
3. The underlying data source migrates and the spec hasn't been updated in 30 days.

Decide when it ends now, while you're calm. Future-you might be too attached.
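Retirement criteria are easier to honor when something checks them for you. A minimal sketch of the three example criteria, assuming you can supply the four facts below; the function and parameter names are hypothetical.

```python
from typing import List


def retirement_reasons(
    weeks_unread: int,
    successor_shipped: bool,
    source_migrated: bool,
    days_since_spec_update: int,
) -> List[str]:
    """Return every retirement criterion that currently applies."""
    reasons = []
    if weeks_unread >= 4:
        reasons.append("owner has not read the output for 4 consecutive weeks")
    if successor_shipped:
        reasons.append("a successor agent has shipped")
    if source_migrated and days_since_spec_update >= 30:
        reasons.append("data source migrated and spec not updated in 30 days")
    return reasons
```

Any non-empty result means it is time to have the retirement conversation with the owner.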

For reference

What a finished spec looks like.

All 14 fields filled in for the Daily Sales Briefing agent.

Your spec

Saved automatically in this browser. Export it as Markdown or print it as a PDF (your browser's print dialog → "Save as PDF").

You finished Studio 07

Spec Author

The big sticker. You did the work most people don't.