Building Software Faster with LLMs: Part 2 - Ergonomics and Observability
Note: This article series was written by me, with LLMs helping to refine the style and structure.
Part 1 described the problems of managing multiple LLM sessions. This article shows the ergonomic layer that solves them: visual indicators, session recording, logging, and telemetry.
Series Navigation: ← Part 1: Pain Points | Part 3: Abstractions →
The Complete Picture
Here’s my workflow using tmux to manage multiple LLM sessions.
Tmux is a terminal multiplexer—it lets you run multiple terminal sessions inside a single window and switch between them quickly. Think of it like having tabs in a browser, but for your terminal. You can have one tmux session with 10 different windows, each running a different LLM conversation, and easily switch between them with keyboard shortcuts.
Here’s how it works in practice:
I open a new tmux window (like opening a new tab) and start an LLM session—maybe Claude Code working on a bug fix. Metrics tracking begins automatically in the background. As the LLM works, my tmux status bar (the line at the bottom of the terminal) shows a 🤖 emoji next to that window’s name. I can glance at the status bar and instantly see that window 3 is busy with an LLM.
When the LLM finishes and waits for my input, the emoji changes to 💬. If I’m currently in a different window (say, window 5 where I’m reviewing code), I just press ` n (backtick followed by n) to jump directly to the waiting session. No manual cycling through windows, no remembering which number it was.
Every context switch gets recorded with a timestamp. A week later, when I need to understand what happened in that session—what prompts I gave, what the LLM suggested, what decisions were made—I can query the session history and replay the logs.
The Visual Layer: Terminal Session Management
Problem: 10 terminal windows (or in tmux terminology, 10 windows within one tmux session), each running a different LLM conversation. No visibility into which LLM needs attention.
Solution: emoji indicators showing window state in the tmux status bar.
```
💬 memento     # LLM waiting for input
🤖 appdaemon   # LLM actively working
📝 config      # Editor open
🐍 analyzer    # Python script running
⌨️ bash        # Shell waiting for command
```
Window Status Script
The tmux-window-status script analyzes each tmux pane (a pane is like a split section within a window) and adds contextual emojis. Here’s how it works:
- Capture recent output: Grab the last 100 lines of text from the pane
- Detect LLM patterns: Look for LLM-specific text like > prompts or dialog boxes asking “Do you want to…”
- Check the process: See what command is actually running in that pane
- Return the right emoji: Based on what we found, add the appropriate emoji to the window name
Here’s the key detection logic:
```bash
check_llm_waiting() {
    local pane_content="$2"
    local last_lines=$(echo "$pane_content" | tail -5)

    # Check for common LLM prompts
    if echo "$last_lines" | grep -qE "^>\s*$|^> "; then
        return 0  # LLM is waiting
    fi

    # Check for dialog boxes
    if echo "$last_lines" | grep -qE "Do you want to|❯.*Yes"; then
        return 0  # Waiting for decision
    fi

    return 1  # Not waiting
}
```
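The detection can be sanity-checked without tmux by feeding sample pane text to a standalone helper. The helper below mirrors the detection logic above (the sample transcripts are made up, and `[[:space:]]` is used as the strictly POSIX spelling of the whitespace class):

```shell
# is_waiting PANE_TEXT: mirrors the prompt/dialog patterns, standalone
is_waiting() {
    local last_lines
    last_lines=$(echo "$1" | tail -5)
    # A bare "> " prompt on one of the last lines means the LLM is idle
    echo "$last_lines" | grep -qE '^>[[:space:]]*$|^> ' && return 0
    # A confirmation dialog also counts as waiting
    echo "$last_lines" | grep -qE 'Do you want to|❯.*Yes' && return 0
    return 1
}

is_waiting "$(printf 'Edited 3 files.\n> ')" && echo "waiting"   # prompt detected
if ! is_waiting "Running test suite..."; then echo "busy"; fi    # no prompt
```

Running variations of this against captured pane text is a quick way to tune the patterns before wiring them into the status bar.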
Jump to Next Waiting Window
The tmux-next-waiting script cycles through windows where an LLM is waiting. It loops through all your tmux windows, checks which ones have the 💬 emoji (meaning an LLM is waiting for input), and jumps to the next one after your current window:
```bash
#!/bin/bash
# Find all windows with 💬 emoji (LLM waiting)
windows_waiting=""
while IFS='|' read -r window window_name pane_id; do
    formatted_name=$(~/bin/tmux-window-status "$window_name" "$pane_id")
    if echo "$formatted_name" | grep -q "💬"; then
        windows_waiting="$windows_waiting $window"
    fi
done < <(tmux list-windows -F "#{window_index}|#{window_name}|#{pane_id}")
# Jump to next waiting window after current
# (wraps around to first if at end)
```
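The wrap-around selection itself can be sketched as a small helper. This is a hypothetical next_waiting function, not the author's exact implementation; it takes the current window index followed by the waiting-window indices collected above:

```shell
# next_waiting CURRENT WINDOW...: print the first waiting window index
# greater than CURRENT, wrapping around to the first one in the list
next_waiting() {
    local current="$1" first="" next="" w
    shift
    for w in "$@"; do
        [ -z "$first" ] && first="$w" || true
        if [ -z "$next" ] && [ "$w" -gt "$current" ]; then
            next="$w"
        fi
    done
    echo "${next:-$first}"
}

# Example: windows 1, 4 and 7 are waiting; currently in window 3
next_waiting 3 1 4 7   # prints 4
next_waiting 7 1 4 7   # wraps around, prints 1
```

In the real script this would feed something like tmux select-window -t "$(next_waiting "$current" $windows_waiting)", where $current comes from tmux display-message -p "#{window_index}".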
To use this, bind it to a tmux key in your tmux configuration (~/.tmux.conf):

```bash
bind-key n run-shell "~/bin/tmux-next-waiting"
```
Now pressing ` n (assuming you’ve set ` as your tmux prefix key) jumps to the next LLM session that needs attention. The prefix key is like a “modifier” that tells tmux “the next key is a command for you.” With this setup, switching is fast: ` 1 goes to window 1, ` TAB toggles to your last window, and ` n finds the next waiting LLM.
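For completeness, the backtick prefix assumed here is a standard tmux setting in ~/.tmux.conf (adjust or skip if you prefer the default Ctrl-b):

```tmux
# Use backtick as the prefix instead of the default Ctrl-b
unbind C-b
set -g prefix '`'
bind-key '`' send-prefix   # press backtick twice to type a literal one
```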
The Logging Layer: Complete Auditability
Remember the problem from Part 1? Code written last week is unrecognizable without session history. You need to understand what the LLM did, what decisions were made, and why certain approaches were taken.
The solution: record everything. I use asciinema, a terminal session recorder, to capture complete LLM sessions. Unlike text logs (which just save the text), asciinema records the actual terminal output with timing information—think of it like a video recording of your terminal session. You can replay sessions later and see exactly what appeared on screen, when it appeared, and in what order.
For complex refactoring sessions or experiments, I use this wrapper script:
```bash
#!/usr/bin/env bash
# llm-record - Record LLM sessions with asciinema

RECORDING_NAME="${1:-llm-$(date '+%Y%m%d-%H%M%S')}"
RECORDINGS_DIR="${HOME}/llm-recordings"
RECORDING_FILE="${RECORDINGS_DIR}/${RECORDING_NAME}.cast"

mkdir -p "${RECORDINGS_DIR}"

asciinema rec \
    --title "LLM Session: ${RECORDING_NAME}" \
    --idle-time-limit 10 \
    "${RECORDING_FILE}"
```
The --idle-time-limit 10 flag compresses long waits (like when the LLM is thinking or making API calls) to 10 seconds in playback. This makes replaying sessions much faster—you’re not sitting through minutes of “Processing…” messages.
When Claude Code encounters bugs or issues, I can extract the exact terminal transcript with asciinema cat and share it. This works around a limitation in current LLM tools: they don’t have built-in access to session history, so providing a complete transcript helps them understand what went wrong.
The Telemetry Layer: Metrics and Patterns
Visual indicators solve the immediate “which window needs attention?” problem. But I wanted to understand deeper patterns: how many parallel sessions do I actually run? When am I most productive? Which projects consume the most time?
To answer these questions, I built a telemetry system using Prometheus—an open-source monitoring system originally built at SoundCloud. Prometheus collects metrics (numerical measurements) over time and lets you query them later. A background script runs every 15 seconds, collecting metrics about my tmux environment and LLM sessions.
The script tracks session-level metrics like total tmux sessions, windows per session, and which sessions are actively attached. It also captures LLM-specific data: the number of active LLM processes, memory usage per session, CPU usage, session duration, and the working directory for each session.
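The core of such a collector is straightforward. The sketch below shows one way to emit gauges in Prometheus text exposition format (the metric names are made up for illustration, not the author's actual metrics):

```shell
# emit_metric NAME SESSION VALUE: one line of Prometheus text format
emit_metric() {
    printf '%s{session="%s"} %s\n' "$1" "$2" "$3"
}

# Example lines a collector would produce every 15 seconds:
emit_metric tmux_session_windows memento 4
emit_metric tmux_llm_processes memento 2
# -> tmux_session_windows{session="memento"} 4
# -> tmux_llm_processes{session="memento"} 2
```

A real collector would loop over something like tmux list-sessions -F "#{session_name} #{session_windows}" and write the output to a file picked up by node_exporter's textfile collector, or serve it over HTTP for Prometheus to scrape directly.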
What This Reveals
With proper dashboarding, the metrics answer practical questions:
When are you most productive? You can see which times of day correlate with longer, more focused sessions. Which projects consume the most time? Resource usage aggregated by working directory shows exactly where hours go. Do you context-switch too much? Tracking window switches per hour reveals patterns you might not consciously notice.
The data also catches problems early. If session memory usage steadily climbs over time, you know something’s leaking. If you’re consistently running 8+ parallel sessions, maybe your workflow needs simplification.
Prometheus makes it easy to query historical patterns and correlate them with specific projects or time periods. The metrics themselves don’t make you productive, but they reveal patterns that inform better workflow decisions.
Key Learnings
- Visual indicators eliminate the “which window?” hunt
- Complete session history is invaluable for debugging
- Metrics reveal workflow patterns you don’t consciously notice
- Record complex sessions, not everything
- Automation is essential; manual logging fails
Are LLMs Making Us More Productive?
The tools in this article—tmux integration, session recording, telemetry—exist because I’m managing 10 parallel LLM coding sessions. But that raises the obvious question: are LLMs actually making me more productive at writing code?
I don’t believe that’s the case for everyone using them. Handing an LLM to a developer without workflow engineering is like giving someone a race car without teaching them to drive. They might go faster on straightaways, but they’ll crash on the first turn.
But if you know how to use them—if you build the right workflows, enforce quality with tests, coordinate multiple sessions, and maintain proper oversight—they’re a game changer. The productivity gains are real, but they’re not automatic. They come from deliberate workflow design.
The ergonomic layer in this article is what makes those gains possible. Without visibility into session state, without audit trails, without metrics to understand patterns, you’re flying blind. The tools don’t make LLMs productive—they make you productive when using LLMs.
What’s Next
The ergonomics layer makes individual sessions manageable. But coordinating multiple LLM sessions to work together without conflicts requires higher-level abstractions.
Part 3: Higher-Level Abstractions covers shared context systems for long-term memory, the smoke test paradigm for quality, and patterns for running a “team” of LLM instances on a single project.
Continue Reading: Part 3: Higher-Level Abstractions →