Building with LLMs at Scale: Part 3 - Higher-Level Abstractions

In Part 1 I described the pain points of working with multiple LLM sessions. Part 2 covered the ergonomics layer that made individual sessions manageable.

But ergonomics alone isn’t enough when you’re running 5-10 parallel Claude sessions. You need coordination, quality enforcement, and shared context. This article covers the higher-level abstractions that make LLM teams actually work.

The Smoke Test Paradigm: Designing Software for Rapid Iteration

Here’s the key insight: software design principles that help human developers also help LLMs. The same things that trip up human coders—complex interfaces, tight coupling, unclear contracts—trip up LLMs too.

When building software that LLMs will write and modify, the classic principles still apply:

Modular code: Small, well-defined components
Simple interfaces: Clear inputs and outputs
Loose coupling: Changes in one area don’t cascade
Fast feedback: Know immediately when something breaks

The difference is velocity. LLMs can iterate 10x faster than humans—but only if the feedback loop is tight. That’s where smoke tests become critical.

Why Smoke Tests Over Unit Tests?

I tried comprehensive unit test suites. They worked, but the overhead was crushing:

Writing tests took longer than writing features
Tests became brittle as code evolved
Mocking and fixtures added complexity
False positives made me ignore failures

The problem: unit tests are designed for human-paced development. When Claude can refactor an entire module in 30 seconds, waiting 5 minutes for a full test suite kills momentum.

Instead, I adopted smoke tests: simple, end-to-end checks that verify the system works. Run in seconds. Clear pass/fail. No ambiguity.

Example from my flashcards project (test/smoke_test.sh):

#!/bin/bash
# Smoke test: Does the basic workflow work?

# Create a flashcard
./flashcards create \
    --question "What is 2+2?" \
    --answer "4" \
    --project "math"

# Get quiz items
./flashcards quiz --limit 1 | grep "What is 2+2?"

# Review it
./flashcards review <id> --confidence 5

# Check it's in the list
./flashcards list | grep "What is 2+2?"

echo "✅ Smoke test passed!"

That’s it. No mocking. No fixtures. No complex assertions. Just: Does it work end-to-end?

The Make Test Convention

Every project has a Makefile with a test target:

test:
	@echo "Running smoke tests..."
	@./test/smoke_test.sh
	@echo "✅ All tests passed"

Claude knows this convention. After every code change, it automatically runs make test. If tests fail, Claude must fix them before continuing.

This simple pattern has caught hundreds of regressions. Claude refactors a function? Tests catch it. Claude renames a variable? Tests catch it. Claude adds a feature? Tests verify it.

Why This Works

Smoke tests have unique advantages for LLM workflows:

Fast: Run in seconds, not minutes
Clear failures: “Command failed” is unambiguous
Self-documenting: Reading the test shows how the system should work
Easy to maintain: When features change, tests are obvious to update
Catches real issues: Integration problems that unit tests miss

The trade-off: you don’t get fine-grained coverage. But in my experience, that’s fine. I’d rather have 90% confidence in 5 seconds than 99% confidence after 5 minutes of test runs.

Memento: Shared Context Between Sessions

The core challenge of parallel LLM sessions: they don’t know about each other.

Session A refactors the authentication system. Session B adds a new feature that uses authentication. Session A’s changes break Session B’s code—but Session B has no idea until tests fail.

I needed a shared knowledge base. Enter memento.

What Is Memento?

Memento is my note-taking system built on org-roam, which implements the Zettelkasten method for networked thought. I expose it to Claude via MCP (Model Context Protocol).

Think of it as a shared brain for all Claude sessions—a personal knowledge graph where notes link to each other, concepts build on each other, and every LLM session can read and contribute to the collective knowledge.

Memento note system showing interconnected knowledge graph

Key features:

Public notes tagged with PUBLIC are accessible via MCP
Searchable with full-text search
Structured with org-mode properties and tags
Version controlled in git
Persistent across sessions

The Global Context Pattern

Every Claude session starts by reading the claude-global-context note:

;; Automatically loaded by Claude at session start
(mcp__memento__note_get :note_id "claude-global-context")

This note contains:

My coding preferences
Project structure
Common pitfalls
Tools available (memento, MCP servers, custom scripts)
Reminders (never access ~/.roam directly, always use MCP)

As I discover patterns, I add them to this note. Every future Claude session gets that knowledge automatically.

Example from my global context:

## 🧪 Testing Approach:
- Write tests for new features
- Rely on smoke tests for projects (trigger with `make test`)
- **Whenever all tests pass after a change, make a commit with a descriptive message**

## 🔧 ELISP DEVELOPMENT WITH DOOMSCRIPT:
See the note tagged `elisp` for patterns and testing approaches

Session-Specific Context

For complex projects, I create dedicated notes:

memento-clojure-patterns: Clojure idioms and anti-patterns
appdaemon-testing-guide: How to test Home Assistant automations
mcp-server-patterns: How to build reliable MCP servers

When Claude works on these projects, I explicitly reference the notes:

Read the note `mcp-server-patterns` and apply those patterns
to this new server implementation.

Claude reads the note, absorbs the context, and applies it. The next Claude session working on the same project does the same thing—they’re building on shared knowledge.

Coordination Patterns (Experimental)

I’m experimenting with explicit coordination notes for parallel sessions:

# working-on-memento-refactor

## Current State
- Session A: Refactoring CLI argument parsing (IN PROGRESS)
- Session B: Adding new `bulk-update` command (WAITING)
- Session C: Updating tests (COMPLETED)

## Decisions Made
- Use argparse instead of manual parsing (Session A, 2025-09-28)
- All commands must support JSON output (Session B, 2025-09-27)

## Upcoming Work
- [ ] Migrate all commands to new arg structure
- [ ] Add integration tests
- [ ] Update documentation

Each session reads this note before starting work. Session A updates its status when done. Session B sees that and can proceed safely.

This is informal right now—I’m still exploring better patterns. Some ideas:

Barrier functionality: Session B blocks until Session A completes
Lock mechanism: Only one session can modify a file at once
Dependency tracking: Session C depends on Session A and Session B

I’m considering building an MCP server specifically for project coordination. Something like:

# Hypothetical coordination MCP server
mcp_coordinator.claim_file("src/parser.py", session_id="A")
# Other sessions get an error if they try to edit it
mcp_coordinator.add_barrier("refactor-complete", required_sessions=["A", "B"])
mcp_coordinator.wait_for_barrier("refactor-complete")  # Blocks until A and B finish

The Supervisor Pattern: Orchestrating LLM Teams

When I need major changes, I run multiple Claude sessions in parallel:

Session A: Implements feature X
Session B: Writes tests for feature X
Session C: Updates documentation
Session D: Reviews changes from A, B, and C

This is the supervisor pattern—but instead of manually coordinating, I use an LLM to generate prompts for other LLMs.

The Meta-LLM Approach

Here’s the key insight: planning parallel work is itself an LLM task. So I have Claude generate the work breakdown and individual prompts:

I describe the goal to a planning session: “Implement feature X with tests and docs”
The planner LLM creates:
- A work plan broken into phases (represented as a DAG)
- Individual prompt files for each parallel task
- Memento-based coordination scheme
- A supervisor prompt for monitoring progress
I review and launch using my automation tools

This meta-approach scales much better than manual coordination. The planner understands dependencies, estimates complexity, and generates consistent prompt structures.

The Tooling: claude-parallel

I built claude-parallel to automate the workflow:

# Step 1: Generate the plan
claude-parallel plan -P myproject -p "requirements.txt"

# This launches a planning Claude session that:
# - Breaks work into phases and tasks
# - Creates prompt files in ~/.projects/myproject/prompts/
# - Generates plan.json with the dependency DAG
# - Creates a supervisor.txt prompt for monitoring

# Step 2: Dispatch work to parallel sessions
claude-parallel dispatch -p prompts/phase-1-task-auth.txt src/auth.py
claude-parallel dispatch -p prompts/phase-1-task-tests.txt tests/test_auth.py

The dispatch command automatically:

Creates a new tmux window
Changes to the file’s directory
Launches Claude with the prompt
Monitors completion via memento notes

Tmux Automation

For complex projects with many parallel sessions, I use generate_tmuxinator_config:

# Generate tmuxinator config from prompt files
generate_tmuxinator_config -n myproject prompts/*.txt > ~/.config/tmuxinator/myproject.yml

# Launch all sessions at once
tmuxinator start myproject

This creates a tmux session with:

One window per prompt file
Proper window naming for easy navigation
All sessions starting in the correct directory

How I Do It Today

Write high-level requirements in a text file
Run claude-parallel plan to generate work breakdown
Review the generated prompts (adjust if needed)
Launch sessions via claude-parallel dispatch or tmuxinator
Use memento for coordination (automatically set up by the planner):
- Sessions read/write status notes
- Sessions check phase completion before starting
- Blocker notes communicate issues
Rely on smoke tests to catch integration issues
Monitor via tmux status indicators (see Part 2) or run the supervisor prompt

Persona-Driven Architecture

Assigning roles to sessions improves output quality, but I use personas differently than you might expect.

I use Robert C. Martin (Uncle Bob) as the planner and architect. When breaking down a complex feature into parallel tasks, I ask the planner session:

You are Robert C. Martin (Uncle Bob). Review this feature request and break it
down into clean, well-separated tasks for parallel implementation. Focus on
SOLID principles and clear interfaces between components.

This gives me a work breakdown that follows clean architecture principles: small, focused components with clear responsibilities.

Then for the worker sessions (the ones actually implementing the tasks), I experiment with different prompts. Sometimes specific personas help:

“You are obsessed with performance and correctness” for algorithm-heavy code
“You are paranoid about edge cases and defensive programming” for input validation
“You value simplicity above all else, avoid any unnecessary complexity” for utility functions

Other times, I just use the task description from the planner without additional persona framing. I’m still experimenting with what works best for different types of work.

What’s Missing

Current gaps in my supervisor pattern:

No automatic conflict detection: I manually ensure sessions don’t edit the same files
No rollback mechanism: If Session A breaks tests, I manually revert
No progress tracking: I eyeball tmux windows instead of having a dashboard
No automatic merging: I manually integrate changes from parallel sessions

These are ripe for automation. The MCP coordination server would solve 1-3. Number 4 might need a specialized “merger” session that reads changes from all other sessions and integrates them.

Knowledge Accumulation Over Time

Traditional LLM conversations are ephemeral. Each session starts fresh. But with memento, knowledge compounds.

Example workflow:

Week 1: I discover that MCP servers should validate input strictly
I add to global context: “MCP servers must validate all inputs and return clear error messages”
Week 2: Claude builds a new MCP server, automatically applies that pattern
Week 3: I discover another pattern (connection pooling), add it to global context
Future sessions: Apply both patterns automatically

Over months, my global context evolved from 50 lines to 500+ lines of hard-won knowledge. New Claude sessions are more productive from day one.

The Memento Notes Index

To make knowledge discoverable, I maintain a memento-notes-index:

## Development & Technical Guides

- **mcp-server-patterns**: Patterns for building reliable MCP servers
- **smoke-test-paradigm**: Why smoke tests work better than unit tests
- **elisp-testing-guide**: Fast testing with doomscript
- **code-review-guide**: How to review code and log issues for AI

## Quick Lookup by Use Case

- Building MCP servers → `mcp-server-patterns`
- Emacs development → `elisp-testing-guide`
- Testing frameworks → `smoke-test-paradigm`

When Claude asks “How should I structure this?”, I can say: “Check the notes index for relevant guides.”

Key Learnings

After 6 months of parallel LLM workflows:

Smoke tests are a game-changer. They catch 90% of issues with 10% of the effort of comprehensive test suites.

Shared context is essential. Without memento, each session reinvents the wheel. With it, knowledge compounds.

Personas improve output quality. “Be Uncle Bob” consistently produces cleaner code than “refactor this.”

Informal coordination works at small scale. For 3-5 parallel sessions, a shared note is enough. Beyond that, I’ll need real tooling.

Every discovery should be captured. If I solve a problem once, I never want to solve it again. Write it down in memento.

What’s Next

The patterns in this article work but aren’t fully automated. I’m manually coordinating sessions, manually managing shared context, manually merging changes.

Part 4 covers experiments and works-in-progress: the project explorer tool, Emacs integration for code review, diff workflows, and ideas that didn’t quite work out.

Part 5 shifts to learning: using Claude to generate flashcards, worksheets, and annotated code for studying complex topics.

The memento system is open source at github.com/charignon/memento. The global context patterns are in my CLAUDE.md. The flashcards smoke tests are at github.com/charignon/flashcards.