Building Software Faster with LLMs: Part 2 - Ergonomics and Observability
Note: This article series was written by me, with LLMs helping to refine the style and structure.
Part 1 described the problems of managing multiple LLM sessions. This article shows the ergonomic layer that solves them: visual indicators, session recording, logging, and telemetry.
Series Navigation: ← Part 1: Pain Points | Part 3: Abstractions →
The Complete Picture
Here’s my workflow using tmux to manage multiple LLM sessions.
Tmux is a terminal multiplexer—it lets you run multiple terminal sessions inside a single window and switch between them quickly. Think of it like having tabs in a browser, but for your terminal. You can have one tmux session with 10 different windows, each running a different LLM conversation, and easily switch between them with keyboard shortcuts.
Here’s how it works in practice:
I open a new tmux window (like opening a new tab) and start an LLM session—maybe Claude Code working on a bug fix. Metrics tracking begins automatically in the background. As the LLM works, my tmux status bar (the line at the bottom of the terminal) shows a 🤖 emoji next to that window’s name. I can glance at the status bar and instantly see that window 3 is busy with an LLM.
When the LLM finishes and waits for my input, the emoji changes to 💬. If I’m currently in a different window (say, window 5 where I’m reviewing code), I just press ` n (backtick followed by n) to jump directly to the waiting session. No manual cycling through windows, no remembering which number it was.
Every context switch gets recorded with a timestamp. A week later, when I need to understand what happened in that session—what prompts I gave, what the LLM suggested, what decisions were made—I can query the session history and replay the logs.
The Visual Layer: Terminal Session Management
Problem: 10 terminal windows (or in tmux terminology, 10 windows within one tmux session), each running a different LLM conversation. No visibility into which LLM needs attention.
Solution: emoji indicators showing window state in the tmux status bar.
```
💬 memento     # LLM waiting for input
🤖 appdaemon   # LLM actively working
📝 config      # Editor open
🐍 analyzer    # Python script running
⌨️ bash        # Shell waiting for command
```
Window Status Script
The tmux-window-status script analyzes each tmux pane (a pane is like a split section within a window) and adds contextual emojis. Here’s how it works:
- Capture recent output: Grab the last 100 lines of text from the pane
- Detect LLM patterns: Look for LLM-specific text like > prompts or dialog boxes asking “Do you want to…”
- Check the process: See what command is actually running in that pane
- Return the right emoji: Based on what we found, add the appropriate emoji to the window name
Here’s the key detection logic:
```bash
check_llm_waiting() {
    local pane_content="$2"
    local last_lines=$(echo "$pane_content" | tail -5)

    # Check for common LLM prompts
    if echo "$last_lines" | grep -qE "^>\s*$|^> "; then
        return 0  # LLM is waiting
    fi

    # Check for dialog boxes
    if echo "$last_lines" | grep -qE "Do you want to|❯.*Yes"; then
        return 0  # Waiting for decision
    fi

    return 1  # Not waiting
}
```
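The detection can be sanity-checked without tmux by feeding sample pane text to a standalone helper. The helper below mirrors the detection logic above (the sample transcripts are made up, and `[[:space:]]` is used as the strictly POSIX spelling of the whitespace class):

```shell
# is_waiting PANE_TEXT: mirrors the prompt/dialog patterns, standalone
is_waiting() {
    local last_lines
    last_lines=$(echo "$1" | tail -5)
    # A bare "> " prompt on one of the last lines means the LLM is idle
    echo "$last_lines" | grep -qE '^>[[:space:]]*$|^> ' && return 0
    # A confirmation dialog also counts as waiting
    echo "$last_lines" | grep -qE 'Do you want to|❯.*Yes' && return 0
    return 1
}

is_waiting "$(printf 'Edited 3 files.\n> ')" && echo "waiting"   # prompt detected
if ! is_waiting "Running test suite..."; then echo "busy"; fi    # no prompt
```

Running variations of this against captured pane text is a quick way to tune the patterns before wiring them into the status bar.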
Jump to Next Waiting Window
The tmux-next-waiting script cycles through windows where an LLM is waiting. It loops through all your tmux windows, checks which ones have the 💬 emoji (meaning an LLM is waiting for input), and jumps to the next one after your current window:
```bash
#!/bin/bash
# Find all windows with 💬 emoji (LLM waiting)
windows_waiting=""
while IFS='|' read -r window window_name pane_id; do
    formatted_name=$(~/bin/tmux-window-status "$window_name" "$pane_id")
    if echo "$formatted_name" | grep -q "💬"; then
        windows_waiting="$windows_waiting $window"
    fi
done < <(tmux list-windows -F "#{window_index}|#{window_name}|#{pane_id}")
# Jump to next waiting window after current
# (wraps around to first if at end)
```
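The wrap-around selection itself can be sketched as a small helper. This is a hypothetical next_waiting function, not the author's exact implementation; it takes the current window index followed by the waiting-window indices collected above:

```shell
# next_waiting CURRENT WINDOW...: print the first waiting window index
# greater than CURRENT, wrapping around to the first one in the list
next_waiting() {
    local current="$1" first="" next="" w
    shift
    for w in "$@"; do
        [ -z "$first" ] && first="$w" || true
        if [ -z "$next" ] && [ "$w" -gt "$current" ]; then
            next="$w"
        fi
    done
    echo "${next:-$first}"
}

# Example: windows 1, 4 and 7 are waiting; currently in window 3
next_waiting 3 1 4 7   # prints 4
next_waiting 7 1 4 7   # wraps around, prints 1
```

In the real script this would feed something like tmux select-window -t "$(next_waiting "$current" $windows_waiting)", where $current comes from tmux display-message -p "#{window_index}".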
To use this, bind it to a tmux key in your tmux configuration (~/.tmux.conf):

```bash
bind-key n run-shell "~/bin/tmux-next-waiting"
```
Now pressing ` n (assuming you’ve set ` as your tmux prefix key) jumps to the next LLM session that needs attention. The prefix key is like a “modifier” that tells tmux “the next key is a command for you.” With this setup, switching is fast: ` 1 goes to window 1, ` TAB toggles to your last window, and ` n finds the next waiting LLM.
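For completeness, the backtick prefix assumed here is a standard tmux setting in ~/.tmux.conf (adjust or skip if you prefer the default Ctrl-b):

```tmux
# Use backtick as the prefix instead of the default Ctrl-b
unbind C-b
set -g prefix '`'
bind-key '`' send-prefix   # press backtick twice to type a literal one
```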
The Logging Layer: Complete Auditability
Remember the problem from Part 1? Code written last week is unrecognizable without session history. You need to understand what the LLM did, what decisions were made, and why certain approaches were taken.
The solution: record everything. I use asciinema, a terminal session recorder, to capture complete LLM sessions. Unlike text logs (which just save the text), asciinema records the actual terminal output with timing information—think of it like a video recording of your terminal session. You can replay sessions later and see exactly what appeared on screen, when it appeared, and in what order.
For complex refactoring sessions or experiments, I use this wrapper script:
```bash
#!/usr/bin/env bash
# llm-record - Record LLM sessions with asciinema

RECORDING_NAME="${1:-llm-$(date '+%Y%m%d-%H%M%S')}"
RECORDINGS_DIR="${HOME}/llm-recordings"
RECORDING_FILE="${RECORDINGS_DIR}/${RECORDING_NAME}.cast"

mkdir -p "${RECORDINGS_DIR}"

asciinema rec \
    --title "LLM Session: ${RECORDING_NAME}" \
    --idle-time-limit 10 \
    "${RECORDING_FILE}"
```
The --idle-time-limit 10 flag compresses long waits (like when the LLM is thinking or making API calls) to 10 seconds in playback. This makes replaying sessions much faster—you’re not sitting through minutes of “Processing…” messages.
When Claude Code encounters bugs or issues, I can extract the exact terminal transcript with asciinema cat and share it. This works around a limitation in current LLM tools: they don’t have built-in access to session history, so providing a complete transcript helps them understand what went wrong.
The Telemetry Layer: Metrics and Patterns
Visual indicators solve the immediate “which window needs attention?” problem. But I wanted to understand deeper patterns: how many parallel sessions do I actually run? When am I most productive? Which projects consume the most time?
To answer these questions, I built a telemetry system using Prometheus—an open-source monitoring system originally built at SoundCloud. Prometheus collects metrics (numerical measurements) over time and lets you query them later. A background script runs every 15 seconds, collecting metrics about my tmux environment and LLM sessions.
The script tracks session-level metrics like total tmux sessions, windows per session, and which sessions are actively attached. It also captures LLM-specific data: the number of active LLM processes, memory usage per session, CPU usage, session duration, and the working directory for each session.
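The core of such a collector is straightforward. The sketch below shows one way to emit gauges in Prometheus text exposition format (the metric names are made up for illustration, not the author's actual metrics):

```shell
# emit_metric NAME SESSION VALUE: one line of Prometheus text format
emit_metric() {
    printf '%s{session="%s"} %s\n' "$1" "$2" "$3"
}

# Example lines a collector would produce every 15 seconds:
emit_metric tmux_session_windows memento 4
emit_metric tmux_llm_processes memento 2
# -> tmux_session_windows{session="memento"} 4
# -> tmux_llm_processes{session="memento"} 2
```

A real collector would loop over something like tmux list-sessions -F "#{session_name} #{session_windows}" and write the output to a file picked up by node_exporter's textfile collector, or serve it over HTTP for Prometheus to scrape directly.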
What This Reveals
With proper dashboarding, the metrics answer practical questions:
When are you most productive? You can see which times of day correlate with longer, more focused sessions. Which projects consume the most time? Resource usage aggregated by working directory shows exactly where hours go. Do you context-switch too much? Tracking window switches per hour reveals patterns you might not consciously notice.
The data also catches problems early. If session memory usage steadily climbs over time, you know something’s leaking. If you’re consistently running 8+ parallel sessions, maybe your workflow needs simplification.
Prometheus makes it easy to query historical patterns and correlate them with specific projects or time periods. The metrics themselves don’t make you productive, but they reveal patterns that inform better workflow decisions.
Key Learnings
- Visual indicators eliminate the “which window?” hunt
- Complete session history is invaluable for debugging
- Metrics reveal workflow patterns you don’t consciously notice
- Record complex sessions, not everything
- Automation is essential; manual logging fails
Are LLMs Making Us More Productive?
The tools in this article—tmux integration, session recording, telemetry—exist because I’m managing 10 parallel LLM coding sessions. But that raises the obvious question: are LLMs actually making me more productive at writing code?
I don’t believe that’s the case for everyone using them. Handing an LLM to a developer without workflow engineering is like giving someone a race car without teaching them to drive. They might go faster on straightaways, but they’ll crash on the first turn.
But if you know how to use them—if you build the right workflows, enforce quality with tests, coordinate multiple sessions, and maintain proper oversight—they’re a game changer. The productivity gains are real, but they’re not automatic. They come from deliberate workflow design.
The ergonomic layer in this article is what makes those gains possible. Without visibility into session state, without audit trails, without metrics to understand patterns, you’re flying blind. The tools don’t make LLMs productive—they make you productive when using LLMs.
What’s Next
The ergonomics layer makes individual sessions manageable. But coordinating multiple LLM sessions to work together without conflicts requires higher-level abstractions.
Part 3: Higher-Level Abstractions covers shared context systems for long-term memory, the smoke test paradigm for quality, and patterns for running a “team” of LLM instances on a single project.
Continue Reading: Part 3: Higher-Level Abstractions →