Building Software Faster with LLMs: Part 1 - The Pain Points
Note: This article series was written by me, with LLMs helping to refine the style and structure.
Working with 10 parallel LLM coding sessions exposes problems that don’t appear at smaller scale. Managing multiple conversations, maintaining context across sessions, and ensuring quality all require different approaches than single-session work.
This series documents those problems and the solutions that emerged. The tools shown use Claude Code and Emacs, but the patterns apply broadly to any LLM workflow.
Series Navigation: Part 2: Ergonomics →
The Pain Points
The problems:
- Managing Multiple Conversations - 10 terminal windows, no visibility into which sessions need attention
- Lost Context - No audit trail of past sessions or decisions made
- Quality & Regressions - LLMs fix one thing, break another
- Language-Specific Edit Challenges - Parenthesis balance issues in Lisp
- Project Exploration Speed - 10+ minutes to load a 20-file project
- Context Switching Between Sessions - No shared knowledge between parallel sessions
- Review Without Full IDE Context - Reviewing diffs without syntax highlighting and jump-to-def
- No Long-Term Memory - Every session starts from scratch
- Parallelization Challenge - Coordinating multiple LLMs working simultaneously
- Safety and Access Control - Too easy to grant access to private data
Let’s dive into each of these.
Problem 1: Managing Multiple Conversations
Picture this: 10 terminal windows, each running a different LLM session. One is refactoring your note system, another is debugging a home automation script, a third is implementing a new feature. Zero visibility into which needs your attention.
The problem becomes clear when context switching:
- Which session is waiting for input?
- Which is still processing?
- Which finished 10 minutes ago and has been idle?
Without state tracking across sessions, every context switch means manually checking each window. You switch to a session only to find the LLM finished 10 minutes ago while you were focused elsewhere.
Problem 2: Lost Context
Open a project you worked on last week with an LLM. The code looks unfamiliar. You don’t remember writing it. Questions arise:
- What was the original prompt?
- Did I review this properly?
- What architectural decisions were made?
- Why this approach instead of alternatives?
Without an audit trail of past sessions, there’s no way to reconstruct the reasoning behind the code. You’re essentially trusting that past-you made good decisions—but you have no record of what those decisions were.
Automatic context compaction makes this worse. LLMs will drop older messages to fit within token limits, but I want explicit control over what gets retained from session to session, not an algorithm deciding what’s “important.”
Problem 3: Quality and Regressions
Whack-a-mole development: LLMs fix one issue and silently break another. The problem wasn’t the LLM’s capabilities—it was my process. I was treating LLM sessions like conversations with a developer I trusted to test their own code.
The first solution: treat every change like a pull request. Tests must pass.
# After every LLM change
make test # Must pass before continuing
This catches regressions but doesn't address architectural consistency. Code generated across dozens of separate sessions felt scattered, as if it had been designed by a committee whose members never talked to each other.
The second solution: persona-based prompts. Instead of “Refactor this code”:
You are Robert C. Martin (Uncle Bob). Review this code and refactor
it according to clean code principles.
The difference was striking. Suddenly: smaller functions, better separation of concerns, consistent naming conventions across the codebase.
You can use different personas for different needs. Want paranoid security review? “You are a security-minded, paranoid QA engineer who trusts nothing.” Need simplicity? “You are obsessed with reducing complexity and eliminating unnecessary abstractions.” The persona focuses the LLM’s attention on specific concerns.
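One way to keep personas consistent across many sessions is to store them as reusable prompt prefixes instead of retyping them. Here is a minimal sketch; the persona wording comes from the examples above, but the helper function and the example task are hypothetical, not my actual tooling.
```python
# Reusable persona prefixes. The point is that the prefix, not the task
# text, carries the review stance; swap the persona to change what the
# LLM pays attention to.
PERSONAS = {
    "clean-code": "You are Robert C. Martin (Uncle Bob). Review this code "
                  "and refactor it according to clean code principles.",
    "paranoid-qa": "You are a security-minded, paranoid QA engineer who "
                   "trusts nothing.",
    "simplifier": "You are obsessed with reducing complexity and "
                  "eliminating unnecessary abstractions.",
}

def build_prompt(persona: str, task: str) -> str:
    """Prepend the chosen persona to the actual task description."""
    return f"{PERSONAS[persona]}\n\n{task}"

print(build_prompt("clean-code", "Refactor the session manager module."))
```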
Problem 4: Language-Specific Edit Challenges
Lisp-based languages (Elisp, Clojure, Scheme) are harder for LLMs to edit because every change must keep the parentheses balanced.
The problem: Remove one closing paren and get “end-of-file during parsing” with no location. The error could be 200 lines away from the actual edit.
The feedback loop:
- LLM edits code
- Compile fails
- Hunt for unbalanced paren manually
- Fix and retry
This affects any language with nested structure spanning many lines: deeply nested JSON, XML, etc.
The solution: validation tooling that gives precise error locations. Without that, you’re debugging blind.
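Even a crude checker that tracks nesting depth and reports the first offending location beats hunting by hand. Here is a minimal Python sketch, not my actual tooling; it deliberately ignores strings, comments, and character literals, which real validation has to handle:
```python
# Minimal balance checker for Lisp-like source. Reports the line and
# column of the first unmatched ')' or of the '(' left open at EOF.
import sys

def check_balance(text):
    stack = []          # positions of '(' still waiting for a match
    line, col = 1, 0
    for ch in text:
        col += 1
        if ch == "\n":
            line, col = line + 1, 0
        elif ch == "(":
            stack.append((line, col))
        elif ch == ")":
            if not stack:
                return f"unmatched ')' at line {line}, column {col}"
            stack.pop()
    if stack:
        l, c = stack[-1]
        return f"'(' opened at line {l}, column {c} is never closed"
    return None

if __name__ == "__main__":
    error = check_balance(open(sys.argv[1]).read())
    print(error or "balanced")
    sys.exit(1 if error else 0)
```
The reported line number can go straight back into the LLM's next prompt, which closes the feedback loop instead of leaving you to hunt.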
Problem 5: Project Exploration Speed
New codebase? Get ready to spend 10+ minutes on initial exploration. A 20-file project means feeding files one by one to the LLM, waiting for API calls, managing context windows.
This creates a cold-start problem: every new project, and every switch between projects, means a lengthy ramp-up before the LLM has enough context to be productive.
The solution: a way to efficiently snapshot and load project context—not just individual files, but the structure, key patterns, and architectural decisions all at once.
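As a rough illustration of the idea (the real exploration tools show up in Part 4), a snapshot can be as simple as one pass over the tree that collects a header plus the first few dozen lines of each source file, so a new session can be primed with a single paste instead of twenty file reads. The extensions and line cutoff below are arbitrary choices:
```python
# Hypothetical project snapshot: dump the tree and the head of each
# source file into one string that can be fed to a fresh LLM session.
from pathlib import Path

SOURCE_EXTS = {".py", ".el", ".clj", ".md"}
HEAD_LINES = 40

def snapshot(root="."):
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in SOURCE_EXTS or not path.is_file():
            continue
        head = path.read_text(errors="replace").splitlines()[:HEAD_LINES]
        parts.append(f"=== {path} ===\n" + "\n".join(head))
    return "\n\n".join(parts)

if __name__ == "__main__":
    print(snapshot())
```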
Problem 6: Context Switching Between Sessions
I’d discover a great pattern in session A. Session B, working on a related problem, had no idea it existed.
Each LLM conversation was an island. Problems with this isolation:
- Can’t share knowledge between sessions
- Contradictory decisions across different LLM instances
- Manual copy-paste required to propagate learnings
- If I made an architectural decision in conversation A, conversation B would make a different one
The solution: a shared context system where different LLM sessions can coordinate and learn from each other.
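Even something as simple as an append-only decisions log, read by every session at startup and appended to whenever a session commits to a choice, goes a long way. A hypothetical sketch; the path, schema, and example decision are made up:
```python
# Hypothetical shared decisions log: an append-only JSONL file that every
# session loads at startup and appends to when it makes an architectural
# choice. Concurrent appends of short lines are usually safe, but real
# tooling would want file locking.
import json, time
from pathlib import Path

LOG = Path("~/.llm-sessions/decisions.jsonl").expanduser()

def record_decision(session_id: str, decision: str) -> None:
    LOG.parent.mkdir(parents=True, exist_ok=True)
    entry = {"ts": time.time(), "session": session_id, "decision": decision}
    with LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def load_decisions() -> list[dict]:
    if not LOG.exists():
        return []
    return [json.loads(line) for line in LOG.read_text().splitlines() if line]

record_decision("session-a", "Store notes as org files, not a database")
for d in load_decisions():
    print(d["session"], "->", d["decision"])
```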
Problem 7: Review Without Full IDE Context
Code review without your IDE is code review on hard mode.
The LLM generates a diff. You’re looking at it in a terminal or web interface. You’re missing:
- Syntax highlighting
- Jump-to-definition
- Project-wide search
- Static analysis
- Your configured linters
Example: The LLM renames process() to process_data(). Questions you can't answer:
- What calls this function?
- Is this part of a larger refactoring?
- Did it affect other functions that depend on it?
Tools like Cursor solve this with deep editor integration—the LLM changes happen natively in your IDE. But if you’re using terminal-based LLM tools or trying to integrate with Emacs/Vim, you need a workflow to bring LLM-generated changes into your full development environment.
Problem 8: No Long-Term Memory
Sessions had amnesia. Yesterday’s architectural decisions? Gone. Last week’s patterns? Forgotten.
Sure, I had a global CLAUDE.md file with preferences, but that was static. I couldn’t easily capture evolving patterns like:
- “When working on MCP servers, always check the umcp wrapper patterns”
- “The smoke test paradigm works better than unit tests for these projects”
- “Remember that the memento CLI should never be called directly—use MCP”
These insights lived in my head, not in a form the LLM could access and build upon. Each new session started from zero, unable to leverage the accumulated knowledge from previous sessions.
Problem 9: Parallelization Challenge
I wanted parallel LLM sessions building different parts of the same project. Chaos ensued.
The ideal workflow:
- Session A: implements a feature
- Session B: writes tests for that feature
- Session C: updates documentation
- Session D: reviews the changes from A, B, and C
But coordinating multiple LLM sessions is harder than coordinating humans. Problems:
- Sessions can’t see each other’s progress
- No natural communication channel between sessions
- They’ll happily work on the same file and create conflicts
- No way to express dependencies (Session B needs Session A to finish first)
The solution: orchestration patterns to divide tasks, prevent conflicts, and merge results without manual intervention.
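The core of such orchestration can be sketched as a task manifest where every task declares the files it will touch and the tasks it depends on, and a coordinator only hands out work that is unblocked and conflict-free. The task names and files below are illustrative, not a real scheduler:
```python
# Hypothetical task manifest for parallel sessions. A coordinator hands
# out only tasks whose dependencies are done and whose files aren't
# already claimed by a running session.
TASKS = {
    "feature": {"files": {"src/sync.py"},        "depends_on": set()},
    "tests":   {"files": {"tests/test_sync.py"}, "depends_on": {"feature"}},
    "docs":    {"files": {"README.md"},          "depends_on": {"feature"}},
    "review":  {"files": set(),                  "depends_on": {"feature", "tests", "docs"}},
}

def ready_tasks(done: set[str], in_progress: set[str]) -> list[str]:
    claimed = set()
    for t in in_progress:
        claimed |= TASKS[t]["files"]
    ready = []
    for name, task in TASKS.items():
        if name in done or name in in_progress:
            continue
        if not task["depends_on"] <= done:
            continue  # wait for dependencies (tests need the feature first)
        if task["files"] & claimed:
            continue  # another session already owns one of these files
        ready.append(name)
    return ready

print(ready_tasks(done=set(), in_progress=set()))            # ['feature']
print(ready_tasks(done={"feature"}, in_progress={"tests"}))  # ['docs']
```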
Problem 10: Safety and Access Control
When you’re in flow, you say ‘yes’ to everything. That’s how the LLM reads your private notes.
Claude Code prompts have become like cookie consent banners or Terms of Service pages. You’ve seen the prompt 50 times today. “Do you want to let Claude read this file?” Yes. “Run this command?” Yes. “Search this directory?” Yes. Decision fatigue sets in. You stop reading carefully. You just click yes to make the prompt go away and get back to work.
This is exactly how website designers exploit users with cookie banners—they know after the 10th website, you’ll just click “Accept All” without reading. The same psychological pattern applies to LLM tool use.
I discovered a serious problem when building my note management system. Despite explicit prompts telling the LLM “do NOT access private notes,” I’d occasionally review logs and find it had read private files anyway. This wasn’t malicious—the LLM was trying to be helpful, pattern-matched similar file paths, and I’d reflexively approved the request without carefully reading which specific file it wanted.
Risk areas where this becomes dangerous:
- Personal notes or journals
- Configuration files with API keys or tokens
- Any sensitive data mixed with development work
The fundamental tension:
- Speed vs Safety: Careful review of every action slows you down
- Context vs Control: The LLM needs broad context to be useful, but that increases risk
- Automation vs Oversight: You want automated workflows, but automation can bypass safety checks
The real solution isn’t better logging—it’s making the wrong thing impossible by design. Don’t rely on prompts or careful review. Build systems where sensitive data simply can’t be accessed.
For my note system, I mark notes as PUBLIC in org-mode by setting a property. Only PUBLIC notes are accessible to the LLM via MCP. The system enforces this at the API level—no amount of prompt engineering or reflexive approval can expose private notes.
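In code, that enforcement is nothing more than a filter the MCP server applies before returning anything. A simplified sketch; the exact org property syntax and the regex parsing are assumptions about my setup, not a spec:
```python
# Sketch of API-level enforcement: the note-reading tool refuses anything
# not explicitly marked PUBLIC. The ':PUBLIC: t' property syntax is an
# assumption; a real implementation would use a proper org parser.
import re
from pathlib import Path

PUBLIC_RE = re.compile(r"^\s*:PUBLIC:\s*t\s*$", re.IGNORECASE | re.MULTILINE)

def is_public(path: Path) -> bool:
    return PUBLIC_RE.search(path.read_text(errors="replace")) is not None

def read_note(path: Path) -> str:
    """Only PUBLIC notes ever reach the LLM, no matter what it asks for."""
    if not is_public(path):
        raise PermissionError(f"{path} is not marked PUBLIC")
    return path.read_text()

def list_notes(notes_dir: str) -> list[Path]:
    return [p for p in Path(notes_dir).glob("*.org") if is_public(p)]
```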
But this pattern doesn’t scale well to code. You can’t mark every file in a codebase as PUBLIC or PRIVATE.
A more scalable approach: leverage Unix file permissions. Make LLM tools run as a specific user or group with restricted permissions:
- Private files: chmod 600 (owner-only)
- Public files: chmod 644 (world-readable)
- LLM runs as different user/group: physically cannot read private files
This enforces access control at the OS level. The LLM tool literally can't open the file, regardless of prompts or approval. You could even use chattr +i on Linux to make sensitive files immutable.
The challenge: this requires discipline in setting permissions and may conflict with normal development workflows. But it’s the right direction—making violations impossible, not just logged.
Other needed patterns:
- Directory-level access control (allow ~/projects/blog, block ~/.ssh)
- Pattern-based restrictions (block *.env, *credentials*, *secrets*)
- API-level enforcement that tools can't bypass
- Audit trails that make violations obvious
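A hypothetical sketch of the first two items, a path gate a tool could call before every file read; the allowed roots and block patterns are illustrative defaults, not a real configuration:
```python
# Hypothetical path gate run before any file read: the path must sit
# under an allowed root AND must not match a blocked pattern.
import fnmatch
from pathlib import Path

ALLOWED_ROOTS = [Path("~/projects").expanduser().resolve()]
BLOCKED_PATTERNS = ["*.env", "*credentials*", "*secrets*", "*/.ssh/*"]

def can_read(path: str) -> bool:
    p = Path(path).expanduser().resolve()
    if not any(p.is_relative_to(root) for root in ALLOWED_ROOTS):
        return False  # outside every allowed directory (e.g. ~/.ssh)
    return not any(fnmatch.fnmatch(str(p), pat) for pat in BLOCKED_PATTERNS)

print(can_read("~/projects/blog/post.md"))  # True
print(can_read("~/.ssh/id_rsa"))            # False: outside allowed roots
print(can_read("~/projects/blog/.env"))     # False: matches *.env
```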
Until we solve this systematically, the onus is on us to be vigilant—and that’s exhausting when you’re trying to move fast.
The Solutions
- Ergonomics (Part 2): Terminal integration showing LLM state, telemetry tracking all sessions, logging every command
- Abstractions (Part 3): Shared context between sessions, smoke test paradigm, coordinating parallel LLMs
- Experiments (Part 4): Project exploration tools, diff review workflows, lessons from failures
- Learning (Part 5): Flashcard generation, annotated code worksheets, spaced repetition
The next articles show how each works.
What’s Next
Part 2: Ergonomics and Observability - Terminal integration for managing multiple LLM sessions, telemetry and logging infrastructure that makes everything auditable.
Part 3: Higher-Level Abstractions - Shared context systems for long-term memory, smoke tests as the foundation of quality, patterns for coordinating multiple LLM sessions.
Part 4: The Way We Build Software Is Rapidly Evolving - Tools that became obsolete, workflows that work, and the broader implications of AI-augmented development.
Part 5: Learning & Knowledge - Using LLMs to generate flashcards, worksheets, and heavily-annotated code for studying complex topics.
Continue Reading: Part 2: Ergonomics and Observability →