Building Software Faster with LLMs: Part 4 - The Way We Build Software Is Rapidly Evolving

Note: This article series was written by me, with LLMs helping to refine the style and structure.

Part 1 identified the problems. Part 2 covered ergonomics. Part 3 showed coordination patterns.

This article covers tools that became obsolete, workflows that didn’t pan out, and lessons learned from building at the edge of what works.

Series Navigation: ← Part 3: Abstractions | Part 5: Learning →

The Project Ingester: Solving Yesterday’s Problem

Before Sonnet 4.5, exploring a codebase was slow. Reading 20 files meant 20 sequential API calls and 10+ minutes of setup time.

I built project-ingest to solve this: it outputs a single markdown document with the project structure, key file contents, and a dependency graph. Claude could ingest that in one shot instead of reading files incrementally.
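
Conceptually, that is just a repository walk flattened into one document. Below is a minimal hypothetical sketch of the idea, not the real tool: the file-type filter and size cap are assumptions, and the dependency-graph part is omitted.

#!/usr/bin/env python3
"""Minimal sketch of a project-ingest-style tool: walk a repository and emit
one markdown document with the file tree and the contents of key files,
so an LLM can ingest the whole project in a single read."""
from pathlib import Path
import sys

INCLUDE = {".py", ".md", ".toml"}   # assumption: which file types count as "key files"
MAX_CHARS = 20_000                  # truncate very large files

def ingest(root: Path) -> str:
    files = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in INCLUDE and ".git" not in p.parts
    )
    doc = [f"# Project snapshot: {root.name}", "", "## Structure"]
    doc += [f"- {p.relative_to(root)}" for p in files]
    doc.append("")
    for p in files:
        text = p.read_text(errors="replace")
        if len(text) > MAX_CHARS:
            text = text[:MAX_CHARS] + "\n... (truncated)"
        doc += [f"## {p.relative_to(root)}", "", text.rstrip(), ""]
    return "\n".join(doc)

if __name__ == "__main__":
    print(ingest(Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")))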

Before (Sonnet 3.5, using project-ingest):

  • Run project-ingest → 15 seconds
  • Claude reads summary → 5 seconds
  • Total: 20 seconds

After (Sonnet 4.5, reading files directly):

  • Claude reads 20 files directly → 8 seconds
  • Total: 8 seconds

The tool became slower than the problem it solved.

When It’s Still Useful

I haven’t deleted it. It’s valuable for:

  1. Very large codebases (100+ files) - still faster for high-level view
  2. Project snapshots - capturing state at a point in time
  3. Documentation - overview for human readers
  4. Cross-project analysis - comparing architecture

But for everyday “help me understand this project” tasks? Obsolete.

The Lesson

Build for today’s constraints. The tool was perfect for its time. Model improvements made it unnecessary. That’s success, not failure.

Code Review Logger: Decoupling Discovery from Fixing

When LLMs generate code at scale, the output arrives too fast to carefully review every change in real time.

The problem: I’m reading through hundreds of lines of Claude-generated code. I spot issues—unclear function names, generic error handling, repeated patterns. But I’m in discovery mode, trying to understand the whole picture. Stopping to craft detailed prompts for each fix kills momentum.

What I need: a fast way to point at a problem, leave a hint that steers the fix in the right direction, and then batch the corrections for Claude to work through later.

The workflow:

  1. Mark issues at exact lines while browsing
  2. Keep reading without losing flow
  3. Later, batch all issues together and have Claude fix them

The Emacs Integration

I built code-review-logger.el:

;; While reviewing code in Emacs:
;; SPC r c - Log comment at current line
;; SPC r r - Log comment for selected region
;; SPC r o - Open review log

(defun code-review-log-comment (comment)
  "Log review COMMENT for the current file and line."
  (interactive "sReview comment: ")   ; prompt for the comment when invoked via SPC r c
  (let ((file (buffer-file-name))
        (line (line-number-at-pos)))
    ;; Helper (defined elsewhere in the package) appends an org TODO entry
    ;; with the file, line, and project metadata to ~/code_review.org.
    (code-review-format-entry comment file line "TODO")))

This creates entries in ~/code_review.org:

** TODO [[file:~/repos/memento/src/cli.py::127][cli.py:127]]
   :PROPERTIES:
   :PROJECT: memento
   :TIMESTAMP: [2025-09-30 Mon 14:23]
   :END:
   This error handling is too generic - catch specific exceptions

** TODO [[file:~/repos/memento/src/search.py::89][search.py:89]]
   :PROPERTIES:
   :PROJECT: memento
   :TIMESTAMP: [2025-09-30 Mon 14:25]
   :END:
   Add caching here - search is called repeatedly with same query

The Workflow

  1. Review code in Emacs (syntax highlighting, jump-to-def, all IDE features)
  2. Mark issues as I find them (SPC r c for quick comment)
  3. Trigger the automated fix process: Read code-review-llm-prompt-template and follow it
  4. Claude automatically:
    • Reads ~/code_review.org for all TODO items
    • Fixes each issue in the actual code
    • Runs make test after every change
    • Marks items as DONE only when tests pass
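
To make the batch step concrete, here is a hypothetical sketch of how the TODO entries in ~/code_review.org could be collected into file/line/comment tasks. The parsing code is only an illustration (the actual workflow is driven by a prompt template, described next); it assumes the exact entry format shown above.

"""Hypothetical sketch: collect open TODO review entries from ~/code_review.org."""
import re
from pathlib import Path

# Matches headings like:
# ** TODO [[file:~/repos/memento/src/cli.py::127][cli.py:127]]
HEADING = re.compile(r"^\*\* TODO \[\[file:(?P<path>[^:\]]+)::(?P<line>\d+)\]")

def collect_todos(org_file: Path = Path.home() / "code_review.org"):
    """Return a list of {path, line, comment} dicts, one per open TODO entry."""
    todos, current = [], None
    for raw in org_file.read_text().splitlines():
        match = HEADING.match(raw)
        if match:
            current = {"path": match["path"], "line": int(match["line"]), "comment": []}
            todos.append(current)
        elif raw.startswith("** "):            # any other heading (e.g. DONE) closes the entry
            current = None
        elif current and raw.strip() and not raw.strip().startswith(":"):
            current["comment"].append(raw.strip())   # skip the :PROPERTIES: drawer lines
    return todos

if __name__ == "__main__":
    for t in collect_todos():
        print(f"{t['path']}:{t['line']}  ->  {' '.join(t['comment'])}")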

The entire workflow is encoded in a memento note that Claude reads. It contains:

  • Review format specification
  • Priority order (correctness → architecture → security → performance)
  • Testing requirements (always run make test, never leave tests failing)
  • Complete fix-and-verify process

Why This Works

Batch processing is more efficient than interactive fixes:

  • Claude sees all issues at once, plans holistically
  • No back-and-forth during fixing
  • Tests run after every change
  • Clear audit trail

Emacs integration solves the “reviewing outside my editor” problem:

  • In my editor with all tools
  • Jump to definitions, search references, check blame
  • Clickable org links to code

Structured format means precise instructions:

  • Exact file paths and line numbers
  • Context about the issue
  • Project name for multi-repo workflows

Current State

The fix side is fully automated. I just say: Read code-review-llm-prompt-template and follow it

Claude then processes every TODO item, fixes the issues, runs the tests, and marks items DONE, never leaving the codebase with failing tests.
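
To make the “never leave failing tests” rule concrete, here is a small illustrative sketch of the verification gate. It is not code that exists in the workflow; the real fixes are applied by Claude following the prompt template, and apply_fix stands in for whatever edit is made.

"""Sketch of the fix-and-verify gate: a fix only counts as DONE if make test passes."""
import subprocess
from pathlib import Path

def resolve_item(todo: dict, repo: Path, apply_fix) -> str:
    """Apply a fix, then gate the status change on the full test suite."""
    apply_fix(todo, repo)                                # the edit itself is made by Claude
    result = subprocess.run(["make", "test"], cwd=repo)  # always run make test after a change
    return "DONE" if result.returncode == 0 else "TODO"  # only mark DONE when tests pass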

Key Learnings

Embrace obsolescence. If a tool becomes unnecessary because the problem disappeared, that’s progress.

Perfect is the enemy of done. The code review logger works even though the end-to-end process isn’t fully automated. Ship it.

Build for constraints, not aspirations. Don’t future-proof. Solve today’s problem with today’s constraints.

Fast feedback beats comprehensive coverage. Quick hints during review, batch fixes later. Speed of iteration matters more than perfection.

The Future: Moving Beyond Supervision

The pattern of tools becoming obsolete points to something bigger. Right now, I’m building tools to supervise one or more LLMs working together. But this is a transitional phase.

We’re moving toward a different form of collaboration—one where we identify the tasks that absolutely have to come from a human, and delegate everything else.

What must remain human:

  • Steering direction: What problem are we actually solving?
  • Final decisions on UX: How should this feel to users?
  • Architectural trade-offs: What complexity is worth accepting?
  • Quality standards: What level of polish matters for this?

What LLMs can increasingly handle:

  • Implementation details
  • Test coverage
  • Documentation
  • Refactoring
  • Performance optimization
  • Debugging

The project ingester became obsolete because models got faster at reading files. The code review logger works because it focuses my human effort on spotting issues, not fixing them. The pattern: human judgment for direction, LLM execution for implementation.

As models improve, the boundary shifts. Tasks that require human oversight today become fully automated tomorrow. The tools we build now are scaffolding—useful for this moment, likely obsolete soon.

The Societal Challenge

But we can’t ignore the larger implications. This shift isn’t just about productivity—it’s about what happens to the people whose expertise becomes less essential.

What does it mean when:

  • Junior developers find fewer entry-level opportunities because LLMs handle beginner tasks?
  • Mid-level engineers see their core skills automated away faster than they can adapt?
  • The gap between “steering direction” and “writing code” leaves fewer rungs on the career ladder?

The technical solutions—better tools, better workflows—are the easy part. The hard questions are political and societal:

How do we ensure the gains from AI-augmented productivity are distributed fairly? If a small group of people supervising LLMs can do the work that previously required large teams, who benefits from that efficiency? The workers who are displaced, or the companies and shareholders who capture the value?

What safety nets and retraining programs do we need? The pace of change is faster than traditional education and career transitions can handle. Moving people from “implementer” to “director” roles requires more than technical training—it requires fundamentally different skills and mindsets.

How do we preserve the learning paths that create expertise? If LLMs handle all junior-level work, how do people develop the judgment needed for senior roles? Expertise comes from doing, debugging, and making mistakes. When those learning opportunities disappear, where do the next generation of experts come from?

I don’t have answers to these questions. I’m optimizing my own workflow, building tools that make me more productive. But I recognize that scaling these patterns across the industry has consequences beyond individual efficiency gains.

For a deeper exploration of where this trajectory leads, I recommend reading The Race to AI Supremacy—it examines the broader societal and political implications of rapidly advancing AI capabilities.

The tools becoming obsolete is progress. But progress for whom, and at what cost? Those are questions we need to grapple with collectively, not just as individuals optimizing our workflows.

What’s Next

Part 5 covers using Claude as a learning tool: generating flashcards, creating annotated worksheets, and building a spaced-repetition system for technical concepts.


Continue Reading: Part 5: Learning and Knowledge Accumulation →


If you’re interested in any of the tools or patterns mentioned in this series, feel free to reach out. I’m happy to discuss what you find compelling and share more details.