Specialized Agents That Work While You Sleep

Why a single smart agent is the wrong architecture for recurring background tasks.

The Problem With Universal Agents

The boss agent (Claude Opus) can do anything. Throw a task at it and it will execute.

But "can do anything" is a terrible architecture for recurring background work.

A morning briefing doesn't need email-sending tools; every unused tool definition wastes context. Meeting prep doesn't need Slack posting. Reply generation doesn't need the ability to create new GitHub issues. Multiply that overhead across every background run and cost explodes.

We were paying $15 per 1M input tokens for Opus to do work that could be done with Sonnet ($3 per 1M input).

The insight: Specialized agents for specialized tasks.

Eight different agents. Eight different tool sets. Two models, with the cheaper one used wherever it is good enough. Each agent optimized for a single job.

The Eight Agents

Agent	Trigger	What It Does	Model	Input Cost (per 1M tokens)
Morning Briefing	Daily pre-dawn	Calendar, emails, tasks summary	Sonnet 4.6	$3/M
Evening Briefing	Daily end-of-day	Completed items, pending, tomorrow's agenda	Sonnet 4.6	$3/M
Reply Generator	New email / Slack DM	Drafts reply matching tone	Sonnet + Opus	$3/M + $15/M
Meeting Prep	Cal sync + cron	Attendee context, relevant docs, talking points	Sonnet 4.6	$3/M
Action Items	Webhooks (Gmail, Linear, GitHub)	PR reviews, assigned issues, replies needed	Sonnet 4.6	$3/M
Onboarding	User signup	Web research, profile generation	Sonnet 4.6	$3/M
Skills	User approves pattern	Index workflow, store reusable pattern	Sonnet 4.6	$3/M
Begging Agent	Near credit limit	Nudge usage, suggest upgrade	Opus 4.6	$15/M

Every agent follows the same pattern:

NATS Event
  ↓
Consumer receives event
  ↓
Agent graph runs
  ↓
Terminal tool writes structured output to DB
  ↓
UI updates via Ably push
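
The flow above can be sketched as a single consumer function. This is a minimal illustration, not our production code: the Event shape, the subject name, and the three injected callables are hypothetical stand-ins for the agent graph, the database write, and the Ably push, so the sketch runs without any of that infrastructure.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    subject: str                      # hypothetical subject name, e.g. "briefing.morning"
    user_id: str
    payload: dict = field(default_factory=dict)


def handle_event(
    event: Event,
    run_agent: Callable[[Event], dict],
    write_output: Callable[[str, dict], str],
    push_update: Callable[[str, str], None],
) -> str:
    """Consumer body: run the agent, persist structured output, notify the UI."""
    output = run_agent(event)                      # agent graph runs
    row_id = write_output(event.user_id, output)   # terminal tool writes to the DB
    push_update(event.user_id, row_id)             # UI is told a new row exists
    return row_id


# In-memory stand-ins (assumptions for illustration only).
db: dict[str, dict] = {}
pushed: list[tuple[str, str]] = []


def fake_agent(event: Event) -> dict:
    return {"summary": f"briefing for {event.user_id}"}


def fake_write(user_id: str, output: dict) -> str:
    row_id = f"{user_id}:{len(db)}"
    db[row_id] = output
    return row_id


def fake_push(user_id: str, row_id: str) -> None:
    pushed.append((user_id, row_id))


row = handle_event(Event("briefing.morning", "u1"), fake_agent, fake_write, fake_push)
```

Because the three steps are injected, each agent can swap in its own graph and terminal tool while the consumer skeleton stays the same.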

Why Specialization Wins

Cheaper Models

A 100-line task-specific prompt is laser-focused. Sonnet is smart enough.

A 500-line general-purpose prompt dilutes attention. Needs Opus.

We use Sonnet ($3/M) for 7 agents. The Begging Agent (which needs nuance) runs entirely on Opus ($15/M), and the Reply Generator adds an Opus refinement pass on top of its Sonnet draft.

Focused Tool Sets

The morning briefing agent gets:

MEMORY_SEARCH (remember what happened)
CALENDAR_GET (read calendar)
EMAIL_LIST (read emails)
DUMP_MORNING_BRIEFING (write output)

That's it. Four tools. Minimal context.

It doesn't get:

SLACK_POST (can't post to Slack)
EMAIL_SEND (can't send emails)
NOTION_CREATE_PAGE (can't create pages)
DATABASE_DELETE (can't delete anything)

Safety through architecture. Not through prompt warnings.

A hallucinated "send this email" tool call returns tool_not_found. Doesn't execute.
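
A minimal sketch of how an allowlist can produce that behavior, assuming tools are registered per agent in a plain dictionary. The make_dispatcher helper, the lambda tool bodies, and the exact error shape are hypothetical; only the tool names and the tool_not_found result come from the description above.

```python
from typing import Callable


def make_dispatcher(tools: dict[str, Callable[..., dict]]):
    """Return a dispatch function that can only reach the registered tools."""

    def dispatch(name: str, **kwargs) -> dict:
        tool = tools.get(name)
        if tool is None:
            # The tool was never registered for this agent, so nothing executes.
            return {"error": "tool_not_found", "tool": name}
        return {"ok": True, "result": tool(**kwargs)}

    return dispatch


# The morning briefing agent's four tools, with placeholder implementations.
morning_briefing_tools = {
    "MEMORY_SEARCH": lambda query: {"hits": []},
    "CALENDAR_GET": lambda day: {"events": []},
    "EMAIL_LIST": lambda limit=10: {"emails": []},
    "DUMP_MORNING_BRIEFING": lambda briefing: {"stored": True},
}

dispatch = make_dispatcher(morning_briefing_tools)

blocked = dispatch("EMAIL_SEND", to="someone@example.com", body="hi")
allowed = dispatch("CALENDAR_GET", day="2025-03-10")
```

The point of the design is that the refusal lives in the registry lookup, not in the prompt: there is no code path from this agent to an email-sending function.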

Structured Output Schemas

Instead of returning unstructured text, each agent writes to a purpose-built database schema.

The meeting prep agent stores:

prep_notes (string)
attendees (array)
key_topics (array)
action_items (array)
source_docs (array with citations)

These schemas enable rich UIs that would be impossible with unstructured output. The app can show attendee bios inline, cite sources, and surface action items.
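
As a rough illustration, the meeting prep record could be modeled like this. The field names come from the list above; the element types (plain strings for attendees and topics, a title plus citation for each source doc) are assumptions, since the real schema is richer.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class SourceDoc:
    title: str
    citation: str  # assumed to be a URL or document ID


@dataclass
class MeetingPrep:
    prep_notes: str
    attendees: list[str] = field(default_factory=list)
    key_topics: list[str] = field(default_factory=list)
    action_items: list[str] = field(default_factory=list)
    source_docs: list[SourceDoc] = field(default_factory=list)


prep = MeetingPrep(
    prep_notes="Heads up: they mentioned response time concerns in their last email",
    attendees=["John", "Sarah"],
    key_topics=["Q2 roadmap"],
    action_items=["Confirm sprint blockers"],
    source_docs=[SourceDoc(title="Q2 roadmap", citation="doc-123")],
)

# asdict() recursively converts nested dataclasses, giving a row-shaped dict.
row = asdict(prep)
```

With separate fields, the UI can render attendees, topics, and citations independently instead of parsing them back out of a paragraph.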

Deep Dive: The Meeting Prep Agent

Meeting prep is one of our most-loved agents. Let me walk through how it works end-to-end.

The Pipeline

Stage 1 — Calendar Sync
User connects their calendar. A NATS consumer SyncCalendarConsumer fetches events via sync tokens, normalizes them, and publishes to NATS.

Stage 2 — Filter & Route
Consumer CalendarUpdateSuggestions filters to events in the next 2 days, separates cancelled from active, skips declined events, and routes each event to either "prep this" or "surface as action item."

Stage 3 — Deduplication & Deferral
Consumer MeetingPrepConsumer runs four checks:

Is there already a prep for this meeting? If not, create a skeleton.
Is the meeting today? If not, defer it (process later).
Has the event changed materially (time, attendees)? If yes, regenerate.
Is the change RSVP-only (nothing about the meeting itself changed)? If yes, skip.
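
These checks are plain branching logic, so they can be sketched without any LLM involved. The decide function, its parameter names, and the order of the checks are assumptions for illustration; skeleton creation is left out to keep the sketch short.

```python
from datetime import date
from enum import Enum


class Decision(Enum):
    SKIP = "skip"          # do nothing, no LLM call
    DEFER = "defer"        # keep the skeleton, process on the day of the meeting
    GENERATE = "generate"  # run the agent graph


def decide(
    meeting_day: date,
    today: date,
    *,
    prep_generated: bool,
    rsvp_only: bool,
    material_change: bool,
) -> Decision:
    if rsvp_only:
        return Decision.SKIP       # RSVP changes never trigger the agent
    if meeting_day != today:
        return Decision.DEFER      # not today: wait until it matters
    if prep_generated and not material_change:
        return Decision.SKIP       # existing prep is still valid
    return Decision.GENERATE       # first prep today, or a material change


today = date(2025, 3, 10)
later_meeting = decide(date(2025, 3, 12), today,
                       prep_generated=False, rsvp_only=False, material_change=False)
changed_today = decide(today, today,
                       prep_generated=True, rsvp_only=False, material_change=True)
rsvp_update = decide(today, today,
                     prep_generated=True, rsvp_only=True, material_change=False)
```

Only the GENERATE branch costs tokens. Every other outcome is decided with date comparisons and booleans.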

Stage 4 — Agent Execution
If the consumer decided "yes, prep this," the agent graph runs:

User: "Prep my 3pm meeting with John and Sarah"

Agent thinks:
├─ Is this a work meeting? (classification gate)
│  └─ Yes (colleague sync, not haircut)
├─ Which John? Which Sarah? (memory search)
│  └─ John = VP Eng, Sarah = designer
├─ What do I need to know? (research strategy)
│  ├─ Same domain: Linear sprint, blockers
│  ├─ Different domain: Gmail threads, relationship history
│  └─ Both: Call PEOPLE_SEARCH and WEB_SEARCH
├─ What docs are relevant? (hybrid search)
│  └─ Find Q2 roadmap (they're discussing it)
└─ Create sharp prep notes (dump to DB)

Four Trigger Paths

1. Calendar Sync — User connects calendar. Skeleton created, deferred if not today.

2. Webhook Update — Event changes (time, attendees). Content diff → regenerate only if material.

3. Morning Refresh — Daily pre-dawn. Refreshes stale preps generated before today. A prep from yesterday that was already generated shouldn't be re-run at midnight.

4. Pre-Meeting Cron — ~20 min before meeting. Always regenerates. Always sends email with fresh prep.

The key design: The consumer decides whether the agent runs at all. RSVP changes don't trigger the agent. Minor time shifts don't trigger it. If the consumer says "don't prep," the agent never runs.

This is why background agents are cheap. LLMs aren't invoked for decisions that can be made deterministically (time comparisons, content diffs, classification rules).
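
A content diff of this kind can be a few lines of ordinary code. The sketch below assumes events are dictionaries with a start datetime and an attendee list, and treats any shift under 15 minutes as minor; both the event shape and the threshold are hypothetical.

```python
from datetime import datetime, timedelta


def is_material_change(old: dict, new: dict,
                       min_shift: timedelta = timedelta(minutes=15)) -> bool:
    """Return True only when the change should trigger a regenerated prep."""
    # Any change to who is attending counts as material.
    if set(old["attendees"]) != set(new["attendees"]):
        return True
    # Time shifts count only when they reach the (assumed) threshold.
    return abs(new["start"] - old["start"]) >= min_shift


before = {"start": datetime(2025, 3, 10, 15, 0), "attendees": ["john", "sarah"]}
nudged = {"start": datetime(2025, 3, 10, 15, 5), "attendees": ["sarah", "john"]}
moved = {"start": datetime(2025, 3, 10, 16, 0), "attendees": ["john", "sarah"]}
new_guest = {"start": datetime(2025, 3, 10, 15, 0), "attendees": ["john", "sarah", "amy"]}

minor = is_material_change(before, nudged)        # 5-minute shift, same people
rescheduled = is_material_change(before, moved)   # 1-hour shift
expanded = is_material_change(before, new_guest)  # attendee added
```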

The Meeting Prep Agent's Prompt

The prompt is opinionated.

Good tone:
"Heads up: they mentioned response time concerns in their last email"

Bad tone:
"Key Notes: Enterprise prospect. High-stakes call."

The first is a colleague giving a heads-up. The second sounds like a system generating a report.

The prompt contains 100+ lines of tone guidance, classification heuristics, and examples. The boss agent's 500-line general prompt can't specialize like this.

Deep Dive: The Reply Generator

The reply generator demonstrates parallel pre-computation.

Expensive operations run before the agent's first LLM call. This saves ReAct iterations and latency.

START
  ├─ [Parallel] Memory Search
  │  └─ Finds relationship context (John's tone, history, response time)
  ├─ [Parallel] Attachment Preprocessing
  │  └─ Extracts content from PDFs, images (no raw file uploads to LLM)
  └─ [Parallel] Tone Skill Prefetch
     └─ Loads user's learned email/Slack voice profile

Agent sees pre-computed context (no MEMORY_SEARCH tool call needed)
  ↓
Agent drafts reply (Sonnet, $3/M)
  ↓
Opus refinement pass (matches user's voice, $15/M)
  ↓
Gmail draft created (not sent—user reviews)

This architecture keeps cost down. The expensive Opus pass happens only after research is done. It's polishing, not exploring.
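
The parallel pre-computation step can be sketched with asyncio.gather. The three coroutines are hypothetical placeholders that sleep briefly instead of calling real services; the shape of the returned context is also an assumption.

```python
import asyncio


async def memory_search(sender: str) -> dict:
    await asyncio.sleep(0.01)  # stands in for a memory-store lookup
    return {"sender": sender, "tone": "casual"}


async def preprocess_attachments(attachments: list[str]) -> list[str]:
    await asyncio.sleep(0.01)  # stands in for PDF/image text extraction
    return [f"text extracted from {name}" for name in attachments]


async def load_tone_profile(user_id: str) -> dict:
    await asyncio.sleep(0.01)  # stands in for loading the learned voice profile
    return {"user_id": user_id, "sign_off": "Thanks"}


async def build_context(user_id: str, sender: str, attachments: list[str]) -> dict:
    # All three lookups run concurrently; results come back in argument order.
    memory, files, tone = await asyncio.gather(
        memory_search(sender),
        preprocess_attachments(attachments),
        load_tone_profile(user_id),
    )
    return {"memory": memory, "attachments": files, "tone": tone}


context = asyncio.run(build_context("u1", "john@example.com", ["q2.pdf"]))
```

The agent's first LLM call then receives context directly, so it does not spend a ReAct iteration deciding to call a search tool.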

Design Principles I Learned

1. Each Agent is Isolated

Every background agent runs on its own NATS consumer.

A failed morning briefing doesn't affect meeting prep. A timed-out reply generator doesn't block action items.

They share infrastructure (NATS, Postgres, Zep, TurboPuffer) but never share state.

2. Read-Only by Default

Background agents observe and summarize. They never take actions with real-world consequences.

No emails sent on the user's behalf. No messages posted. No issues created.

The only exception: The reply generator creates a Gmail draft. This is deliberate. It's in the user's account. They see it before anything goes out.

3. Terminal Tools Enforce Schema

Every agent ends with a DUMP_* tool:

DUMP_MORNING_BRIEFING
DUMP_MEETING_PREP
DUMP_REPLY

The schema is the API. Output must match. You can't return free-form text and expect the UI to work.
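
One way to enforce this is to validate the payload inside the terminal tool itself and reject anything that does not match. The sketch below is an assumption about how such a check could look, using only type checks on top-level fields; the REQUIRED_FIELDS table and the in-memory store are hypothetical.

```python
REQUIRED_FIELDS = {
    "prep_notes": str,
    "attendees": list,
    "key_topics": list,
    "action_items": list,
    "source_docs": list,
}


def dump_meeting_prep(payload: dict, store: list) -> dict:
    """Terminal tool: write the payload only if it matches the schema."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            errors.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            errors.append(f"{name} must be {expected.__name__}")
    if errors:
        # Returned to the agent so it can retry with a corrected payload.
        return {"ok": False, "errors": errors}
    store.append(payload)
    return {"ok": True}


store: list[dict] = []

good = dump_meeting_prep(
    {"prep_notes": "Heads up", "attendees": ["John"], "key_topics": [],
     "action_items": [], "source_docs": []},
    store,
)
bad = dump_meeting_prep({"prep_notes": "Heads up", "attendees": "John"}, store)
```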

4. Deferred Execution Saves Cost

Meeting prep defers non-today events. The morning refresh picks up stale preps on the day they matter. Pre-meeting crons run about 20 minutes before the meeting.

This cascading deferral means most events are processed once, on the day they matter. Not on every calendar sync.

5. Consumer Logic, Not Agent Logic

The NATS consumer does the heavy lifting:

Content diffs
Time shift detection
RSVP filtering
Staleness checks

The agent is stateless. It receives context and produces output. The LLM is never invoked for decisions that can be made deterministically.

The Economics

Without specialization:

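8 background tasks × Opus ($15/M) = $120 (assuming each task consumes 1M input tokens)
8 tasks running simultaneously = context contention
Latency unpredictable (Opus slower than Sonnet)

With specialization:

7 tasks × Sonnet ($3/M) + 1 × Opus ($15/M) = $36 (same 1M-token assumption, not counting the Reply Generator's Opus refinement pass)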
No context contention (separate agents)
Latency predictable (Sonnet is faster)
Safety by design (read-only toolsets)

Cost went down 70%. Latency improved. Safety improved.
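
The arithmetic behind that figure, as a quick check. It assumes every task consumes the same 1M input tokens, and it ignores output tokens and the Reply Generator's Opus refinement pass, so treat it as a back-of-the-envelope comparison rather than a bill.

```python
OPUS_PER_M = 15.0    # dollars per 1M input tokens
SONNET_PER_M = 3.0   # dollars per 1M input tokens

# Assumption: each of the 8 tasks consumes 1M input tokens.
before = 8 * OPUS_PER_M                       # every task on Opus
after = 7 * SONNET_PER_M + 1 * OPUS_PER_M     # 7 on Sonnet, 1 on Opus

savings = (before - after) / before
summary = f"${before:.0f} -> ${after:.0f} ({savings:.0%} lower)"
```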

What Changed

Before specialization, background work happened ad-hoc. Morning briefing was slow. Reply generation took forever. Accuracy was inconsistent.

After specialization:

Morning briefing — Pre-dawn, every day, in 30 seconds
Reply generator — Instant draft, Sonnet + Opus, matches user voice
Meeting prep — Fresh before meeting, zero latency
Action items — Real-time, across 4+ integrations

Specialized agents are boring. They're not impressive. But they're reliable, fast, and cheap.

That's the actual job.

This powers Dimension's background intelligence layer—eight specialized agents handling recurring tasks for thousands of users daily.