Why a single smart agent is the wrong architecture for recurring background tasks.
The Problem With Universal Agents
The boss agent (Claude Opus) can do anything. Throw a task at it and it will execute.
But "can do anything" is a terrible architecture for recurring background work.
A morning briefing doesn't need email-sending tools. Meeting prep doesn't need Slack posting. Reply generation doesn't need the ability to create new GitHub issues. Every unused tool definition wastes context, and cost climbs with it.
We were paying $15 per 1M input tokens for Opus to do work that could be done with Sonnet ($3 per 1M input).
The insight: Specialized agents for specialized tasks.
Eight different agents. Eight different prompts. Eight different tool sets. Each one gets the cheapest model that can do its single job.
The Eight Agents
| Agent | Trigger | What It Does | Model | Input Cost (per 1M tokens) |
|---|---|---|---|---|
| Morning Briefing | Daily pre-dawn | Calendar, emails, tasks summary | Sonnet 4.6 | $3/M |
| Evening Briefing | Daily end-of-day | Completed items, pending, tomorrow's agenda | Sonnet 4.6 | $3/M |
| Reply Generator | New email / Slack DM | Drafts reply matching tone | Sonnet + Opus | $3/M + $15/M |
| Meeting Prep | Cal sync + cron | Attendee context, relevant docs, talking points | Sonnet 4.6 | $3/M |
| Action Items | Webhooks (Gmail, Linear, GitHub) | PR reviews, assigned issues, replies needed | Sonnet 4.6 | $3/M |
| Onboarding | User signup | Web research, profile generation | Sonnet 4.6 | $3/M |
| Skills | User approves pattern | Index workflow, store reusable pattern | Sonnet 4.6 | $3/M |
| Begging Agent | Near credit limit | Nudge usage, suggest upgrade | Opus 4.6 | $15/M |
Every agent follows the same pattern:
NATS Event
↓
Consumer receives event
↓
Agent graph runs
↓
Terminal tool writes structured output to DB
↓
UI updates via Ably push
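To make the shared pattern concrete, here is a minimal in-memory sketch in Python. It is not our production code: `Event`, `Pipeline`, and the subject name are hypothetical, and plain lists stand in for the database and the UI push so the example runs without NATS or Ably.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    subject: str   # hypothetical subject name, e.g. "briefing.morning"
    payload: dict


@dataclass
class Pipeline:
    should_run: Callable[[Event], bool]   # deterministic consumer check
    run_agent: Callable[[Event], dict]    # stands in for the agent graph
    db: list = field(default_factory=list)       # stands in for the database
    pushed: list = field(default_factory=list)   # stands in for the UI push

    def handle(self, event: Event) -> bool:
        """Process one event. Returns True if the agent ran."""
        if not self.should_run(event):
            return False                    # the agent is never invoked
        output = self.run_agent(event)      # agent graph runs
        self.db.append(output)              # terminal tool writes structured output
        self.pushed.append(event.subject)   # UI is notified
        return True


pipeline = Pipeline(
    should_run=lambda e: "user_id" in e.payload,
    run_agent=lambda e: {"user_id": e.payload["user_id"], "summary": "stub"},
)
pipeline.handle(Event("briefing.morning", {"user_id": "u1"}))
```

Every agent plugs a different `should_run` and `run_agent` into the same skeleton, so the eight agents differ in what they do but not in how they are wired.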
Why Specialization Wins
Cheaper Models
A 100-line task-specific prompt is laser-focused. Sonnet is smart enough.
A 500-line general-purpose prompt dilutes attention. Needs Opus.
We use Sonnet ($3/M) as the main model for 7 agents. Opus ($15/M) runs in only two places: the Begging Agent (which needs nuance) and the Reply Generator's final refinement pass.
Focused Tool Sets
The morning briefing agent gets:
- MEMORY_SEARCH (remember what happened)
- CALENDAR_GET (read calendar)
- EMAIL_LIST (read emails)
- DUMP_MORNING_BRIEFING (write output)
That's it. Four tools. Minimal context.
It doesn't get:
- SLACK_POST (can't post to Slack)
- EMAIL_SEND (can't send emails)
- NOTION_CREATE_PAGE (can't create pages)
- DATABASE_DELETE (can't delete anything)
Safety through architecture. Not through prompt warnings.
A hallucinated "send this email" tool call returns tool_not_found. Nothing executes.
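A minimal sketch of how a per-agent allowlist produces that behavior. The `dispatch` function and the stub tool bodies are hypothetical; only the tool names and the `tool_not_found` result come from the description above.

```python
from typing import Callable, Dict

Tool = Callable[[dict], dict]


# Stub implementations; real tools would call the calendar and email APIs.
def calendar_get(args: dict) -> dict:
    return {"events": []}


def email_list(args: dict) -> dict:
    return {"emails": []}


# The agent's entire world: anything not listed here does not exist for it.
MORNING_BRIEFING_TOOLS: Dict[str, Tool] = {
    "CALENDAR_GET": calendar_get,
    "EMAIL_LIST": email_list,
}


def dispatch(tools: Dict[str, Tool], name: str, args: dict) -> dict:
    """Run a tool call only if the tool is registered for this agent."""
    tool = tools.get(name)
    if tool is None:
        # A hallucinated call never reaches any implementation.
        return {"error": "tool_not_found", "tool": name}
    return {"result": tool(args)}


print(dispatch(MORNING_BRIEFING_TOOLS, "EMAIL_SEND", {"to": "john@example.com"}))
```

The point of the design is that the check is a dictionary lookup, not a line in the prompt asking the model to behave.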
Structured Output Schemas
Instead of returning unstructured text, each agent writes to a purpose-built database schema.
The meeting prep agent stores:
- prep_notes (string)
- attendees (array)
- key_topics (array)
- action_items (array)
- source_docs (array with citations)
These schemas enable rich UIs that would be impossible with unstructured output. The app can show attendee bios inline, cite sources, and surface action items.
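As an illustration, the meeting prep fields could be modeled like this. The field names come from the list above; the element types (plain strings, and a `SourceDoc` with `title` and `citation`) are assumptions made for the sketch, as is the `parse_meeting_prep` helper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceDoc:
    title: str
    citation: str  # assumed shape: a URL or document ID


@dataclass
class MeetingPrep:
    prep_notes: str
    attendees: List[str] = field(default_factory=list)
    key_topics: List[str] = field(default_factory=list)
    action_items: List[str] = field(default_factory=list)
    source_docs: List[SourceDoc] = field(default_factory=list)


LIST_FIELDS = ("attendees", "key_topics", "action_items", "source_docs")


def parse_meeting_prep(raw: object) -> MeetingPrep:
    """Turn agent output into a typed record, rejecting free-form text."""
    if not isinstance(raw, dict):
        raise ValueError("output must be a structured object")
    if not isinstance(raw.get("prep_notes"), str):
        raise ValueError("prep_notes must be a string")
    for key in LIST_FIELDS:
        if not isinstance(raw.get(key, []), list):
            raise ValueError(f"{key} must be a list")
    # A source doc missing "title" or "citation" raises KeyError here.
    docs = [SourceDoc(title=d["title"], citation=d["citation"])
            for d in raw.get("source_docs", [])]
    return MeetingPrep(
        prep_notes=raw["prep_notes"],
        attendees=list(raw.get("attendees", [])),
        key_topics=list(raw.get("key_topics", [])),
        action_items=list(raw.get("action_items", [])),
        source_docs=docs,
    )
```

Because each field is addressable on its own, the UI can render attendees, citations, and action items as separate components instead of parsing a blob of text.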
Deep Dive: The Meeting Prep Agent
Meeting prep is one of our most-loved agents. Let me walk through how it works end-to-end.
The Pipeline
Stage 1 — Calendar Sync
User connects their calendar. A NATS consumer SyncCalendarConsumer fetches events via sync tokens, normalizes them, and publishes to NATS.
Stage 2 — Filter & Route
Consumer CalendarUpdateSuggestions filters to next 2 days, separates cancelled from active, skips declined events, and routes to either "prep this" or "surface as action item."
Stage 3 — Deduplication & Deferral
Consumer MeetingPrepConsumer runs four checks:
- Is there already a prep for this meeting? If not, create a skeleton.
- Is the meeting today? If not, defer it (process later).
- Has the event changed materially (time, attendees)? If yes, regenerate.
- Is it RSVP-only (no actual meeting work)? If yes, skip.
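These checks are plain branching, which is why they live in the consumer and not in the agent. A sketch of the decision, where the `Decision` enum, the `MeetingEvent` fields, and the order of the checks are all assumptions, and skeleton bookkeeping is left out for brevity:

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Decision(Enum):
    SKIP = "skip"
    DEFER = "defer"
    GENERATE = "generate"
    REGENERATE = "regenerate"


@dataclass
class MeetingEvent:
    event_id: str
    day: date
    rsvp_only: bool = False           # only an RSVP changed
    changed_materially: bool = False  # time or attendees changed


def decide(event: MeetingEvent, has_generated_prep: bool, today: date) -> Decision:
    """Decide, without any LLM call, whether the agent should run."""
    if event.rsvp_only:
        return Decision.SKIP        # no actual meeting work
    if event.day != today:
        return Decision.DEFER       # process on the day it matters
    if not has_generated_prep:
        return Decision.GENERATE    # first prep for this meeting
    if event.changed_materially:
        return Decision.REGENERATE  # existing prep is out of date
    return Decision.SKIP            # existing prep is still good


today = date(2025, 1, 15)
print(decide(MeetingEvent("evt-1", date(2025, 1, 16)), False, today))
```

Only `GENERATE` and `REGENERATE` lead to an agent run. Everything else costs a few comparisons.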
Stage 4 — Agent Execution
If the consumer decided "yes, prep this," the agent graph runs:
User: "Prep my 3pm meeting with John and Sarah"
Agent thinks:
├─ Is this a work meeting? (classification gate)
│ └─ Yes (colleague sync, not haircut)
├─ Which John? Which Sarah? (memory search)
│ └─ John = VP Eng, Sarah = designer
├─ What do I need to know? (research strategy)
│ ├─ Same domain: Linear sprint, blockers
│ ├─ Different domain: Gmail threads, relationship history
│ └─ Both: Call PEOPLE_SEARCH and WEB_SEARCH
├─ What docs are relevant? (hybrid search)
│ └─ Find Q2 roadmap (they're discussing it)
└─ Create sharp prep notes (dump to DB)
Four Trigger Paths
1. Calendar Sync — User connects calendar. Skeleton created, deferred if not today.
2. Webhook Update — Event changes (time, attendees). Content diff → regenerate only if material.
3. Morning Refresh — Daily pre-dawn. Refreshes stale preps generated before today. A prep from yesterday that was already generated shouldn't be re-run at midnight.
4. Pre-Meeting Cron — ~20 min before meeting. Always regenerates. Always sends email with fresh prep.
The key design: The consumer decides whether the agent runs at all. RSVP changes don't trigger the agent. Minor time shifts don't trigger it. If the consumer says "don't prep," the agent never runs.
This is why background agents are cheap. LLMs aren't invoked for decisions that can be made deterministically (time comparisons, content diffs, classification rules).
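One way to express "material change" deterministically is sketched below. The event shape (a dict with `start` and `attendees`) and the 15-minute threshold are hypothetical; what matters is that RSVP status is never compared and small time shifts fall under the threshold.

```python
from datetime import datetime, timedelta

MINOR_SHIFT = timedelta(minutes=15)  # hypothetical threshold for a "minor" shift


def changed_materially(old: dict, new: dict) -> bool:
    """Return True only for changes that should trigger a regeneration."""
    # The attendee list changed: someone was added or removed.
    if set(old["attendees"]) != set(new["attendees"]):
        return True
    # The start time moved by more than the minor-shift threshold.
    return abs(new["start"] - old["start"]) > MINOR_SHIFT


before = {"start": datetime(2025, 1, 15, 15, 0), "attendees": ["john", "sarah"]}
nudged = {"start": datetime(2025, 1, 15, 15, 5), "attendees": ["sarah", "john"]}
moved = {"start": datetime(2025, 1, 15, 17, 0), "attendees": ["john", "sarah"]}

print(changed_materially(before, nudged))  # False: 5-minute shift, same people
print(changed_materially(before, moved))   # True: moved by two hours
```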
The Meeting Prep Agent's Prompt
The prompt is opinionated.
Good tone:
"Heads up: they mentioned response time concerns in their last email"
Bad tone:
"Key Notes: Enterprise prospect. High-stakes call."
The first is a colleague giving a heads-up. The second sounds like a system generating a report.
The prompt contains 100+ lines of tone guidance, classification heuristics, and examples. The boss agent's 500-line general prompt can't specialize like this.
Deep Dive: The Reply Generator
The reply generator demonstrates parallel pre-computation.
Expensive operations run before the agent's first LLM call. This saves ReAct iterations and cuts latency.
START
├─ [Parallel] Memory Search
│ └─ Finds relationship context (John's tone, history, response time)
├─ [Parallel] Attachment Preprocessing
│ └─ Extracts content from PDFs, images (no raw file uploads to LLM)
└─ [Parallel] Tone Skill Prefetch
└─ Loads user's learned email/Slack voice profile
Agent sees pre-computed context (no MEMORY_SEARCH tool call needed)
↓
Agent drafts reply (Sonnet, $3/M)
↓
Opus refinement pass (matches user's voice, $15/M)
↓
Gmail draft created (not sent—user reviews)
This ordering keeps cost down. The expensive Opus pass only happens after research is done. It's polishing, not exploring.
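The parallel stage maps naturally onto `asyncio.gather`. In this sketch the three coroutines are stubs with hypothetical names and return shapes; real versions would call the memory store, a document extractor, and the tone profile store.

```python
import asyncio


async def memory_search(sender: str) -> dict:
    await asyncio.sleep(0.01)  # simulated I/O
    return {"sender": sender, "history": []}


async def preprocess_attachments(attachments: list) -> list:
    await asyncio.sleep(0.01)  # simulated extraction
    return [f"text extracted from {name}" for name in attachments]


async def load_tone_profile(user_id: str) -> dict:
    await asyncio.sleep(0.01)  # simulated lookup
    return {"user_id": user_id, "voice": "stub"}


async def build_context(user_id: str, sender: str, attachments: list) -> dict:
    """Run all three lookups concurrently before the first LLM call."""
    memory, extracted, tone = await asyncio.gather(
        memory_search(sender),
        preprocess_attachments(attachments),
        load_tone_profile(user_id),
    )
    # The agent receives this as ready-made context, not as tools to call.
    return {"memory": memory, "attachments": extracted, "tone": tone}


context = asyncio.run(build_context("u1", "john@example.com", ["roadmap.pdf"]))
print(sorted(context))
```

The wall-clock cost of the stage is roughly the slowest lookup, not the sum of all three, and the agent skips the ReAct turns it would otherwise spend requesting each one.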
Design Principles I Learned
1. Each Agent is Isolated
Every background agent runs on its own NATS consumer.
A failed morning briefing doesn't affect meeting prep. A timed-out reply generator doesn't block action items.
They share infrastructure (NATS, Postgres, Zep, TurboPuffer) but never share state.
2. Read-Only by Default
Background agents observe and summarize. They never take actions with real-world consequences.
No emails sent. No messages posted. No issues created.
The only exception: The reply generator creates a Gmail draft. This is deliberate. It's in the user's account. They see it before anything goes out.
3. Terminal Tools Enforce Schema
Every agent ends with a DUMP_* tool:
- DUMP_MORNING_BRIEFING
- DUMP_MEETING_PREP
- DUMP_REPLY
The schema is the API. Output must match. You can't return free-form text and expect the UI to work.
4. Deferred Execution Saves Cost
Meeting prep defers non-today events. The morning briefing refreshes stale preps on the day they matter. Pre-meeting crons run 20 min before.
This cascading deferral means most events are processed once, on the day they matter. Not on every calendar sync.
5. Consumer Logic, Not Agent Logic
The NATS consumer does the heavy lifting:
- Content diffs
- Time shift detection
- RSVP filtering
- Staleness checks
The agent is stateless. It receives context and produces output. The LLM is never invoked for decisions that can be made deterministically.
The Economics
Without specialization:
8 background tasks × Opus ($15/M) = $120 per 1M input tokens consumed by each task
8 tasks running simultaneously = context contention
Latency unpredictable (Opus slower than Sonnet)
With specialization:
7 tasks × Sonnet ($3/M) + 1 × Opus ($15/M) = $36 for the same token volume (not counting the Reply Generator's Opus refinement pass)
No context contention (separate agents)
Latency predictable (Sonnet is faster)
Safety by design (read-only toolsets)
Input-token cost went down 70% ($120 → $36). Latency improved. Safety improved.
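The arithmetic behind that figure, spelled out. It assumes every task consumes the same 1M input tokens and counts input pricing only. Output tokens and the Reply Generator's Opus refinement pass are left out, so treat it as a rough comparison and not a bill.

```python
OPUS_PER_M = 15.0    # $ per 1M input tokens
SONNET_PER_M = 3.0   # $ per 1M input tokens
TOKENS_M = 1.0       # assumed: each task consumes 1M input tokens

all_opus = 8 * OPUS_PER_M * TOKENS_M
specialized = (7 * SONNET_PER_M + 1 * OPUS_PER_M) * TOKENS_M
savings = 1 - specialized / all_opus

print(f"all Opus:    ${all_opus:.0f}")     # $120
print(f"specialized: ${specialized:.0f}")  # $36
print(f"savings:     {savings:.0%}")       # 70%
```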
What Changed
Before specialization, background work happened ad-hoc. Morning briefing was slow. Reply generation took forever. Accuracy was inconsistent.
After specialization:
- Morning briefing — Pre-dawn, every day, in 30 seconds
- Reply generator — Instant draft, Sonnet + Opus, matches user voice
- Meeting prep — Fresh before meeting, zero latency
- Action items — Real-time, across 4+ integrations
Specialized agents are boring. They're not impressive. But they're reliable, fast, and cheap.
That's the actual job.
This powers Dimension's background intelligence layer—eight specialized agents handling recurring tasks for thousands of users daily.