My Hammerspoon Screenshot Manager: From Chaos to AI-Powered Organization
I take a lot of screenshots. Screenshots of bugs, UI mockups, terminal output, error messages, documentation - you name it. The problem? macOS dumps them all on my Desktop with cryptic names like Screenshot 2024-12-14 at 10.42.13.png, and within a week my Desktop looks like a digital landfill.
So I built a screenshot manager in Hammerspoon that intercepts every screenshot, organizes it automatically, and gives me an action picker with AI-powered features.
The Problem With Screenshots
Here’s what happens without intervention:
- Take screenshot with Cmd+Shift+4
- File lands on Desktop
- Forget about it
- Desktop accumulates 47 screenshots
- Eventually rage-delete them all
- Realize one of them was important
The real issue isn’t just organization - it’s that screenshots are incredibly useful for interacting with LLMs. I constantly want to share screenshots with Claude or GPT to ask “what’s this error?” or “how do I fix this UI?”. But getting the image into the conversation requires:
- Finding the file
- Copying the path or dragging it in
- Sometimes converting formats
My Solution: Intercept Everything
The screenshot manager watches my Desktop folder for new screenshots using Hammerspoon’s pathwatcher:
-- Keep the watcher in a long-lived variable so it isn't garbage-collected
watcher = hs.pathwatcher.new(DESKTOP_PATH, handleNewScreenshot)
watcher:start()
When a new screenshot appears, it:
- Moves it to an organized folder: ~/Pictures/Screenshots/2024-12/filename.png
- Pops up an action picker with several options
- Lets me choose what to do before it disappears into the void
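Here's a minimal sketch of that handler, using the ensureDestDir and isScreenshot helpers shown later in this post; showActionPicker is a placeholder name for the real canvas UI:
-- Minimal sketch of the pathwatcher callback; showActionPicker stands in
-- for the real canvas UI
local function handleNewScreenshot(paths)
    for _, path in ipairs(paths) do
        local filename = path:match("([^/]+)$")
        if filename and isScreenshot(filename) then
            local dest = ensureDestDir() .. "/" .. filename
            os.rename(path, dest)      -- move it off the Desktop
            showActionPicker(dest)     -- let me choose what happens next
        end
    end
end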
The Action Picker
When a screenshot is captured, I see a dark panel slide in from the right with these options:
Copy Path (for LLM) - This is my most-used action. It copies the full file path to clipboard, ready to paste into Claude Code or any terminal. The parenthetical “(for LLM)” reminds me why this exists.
Copy Image - Copies the actual image to clipboard, useful for pasting into Slack, documentation, or image editors. (Both clipboard actions are sketched after this list.)
Open in Aidoc - Uploads the screenshot to my local document analysis tool for OCR and text extraction.
Describe with LLM - This is the magic one.
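The two clipboard actions are essentially one-liners with Hammerspoon's standard hs.pasteboard and hs.image APIs; this sketch captures the idea:
-- Clipboard actions: plain-text path for LLM chats, image data for everything else
local function copyPath(path)
    hs.pasteboard.setContents(path)
    hs.alert.show("Path copied")
end

local function copyImage(path)
    local img = hs.image.imageFromPath(path)
    if img then hs.pasteboard.writeObjects(img) end
end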
AI-Powered Screenshot Description
The “Describe with LLM” feature sends the screenshot to my local LLM proxy and asks for a detailed analysis. But it’s not just a simple “what’s in this image?” prompt. I built it specifically for automation purposes:
local prompt = string.format([[Describe this screenshot in detail. The image is %dx%d pixels.
For EACH clickable/interactive UI element, provide:
- Description of the element
- **Pixel location**: approximate (x, y) coordinates of where to click
- Bounding box if possible: (x, y, width, height)
Include:
- What application or context is shown
- All visible UI elements (buttons, menus, text fields, toggles, icons)
- Any text content visible
- The layout and visual hierarchy
Format clickable elements like this:
- **[Button Name]** at (~X, ~Y) - description of what it does
]], imgWidth, imgHeight)
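The prompt and the image then go to the local proxy over HTTP. The endpoint and payload shape below are assumptions about llm-proxy, not its actual API; hs.http, hs.json, and hs.base64 are standard Hammerspoon modules:
-- Sketch only: the /describe endpoint and the JSON payload shape are
-- assumptions about the local llm-proxy, not its actual API
local function describeScreenshot(path, prompt, callback)
    local f = io.open(path, "rb")
    if not f then return end
    local b64 = hs.base64.encode(f:read("*a"))
    f:close()

    local body = hs.json.encode({ prompt = prompt, image_base64 = b64 })
    hs.http.asyncPost("http://localhost:8000/describe", body,
        { ["Content-Type"] = "application/json" },
        function(status, response)
            if status == 200 then callback(response) end
        end)
end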
Why pixel coordinates? Because I want to eventually use this for GUI automation. If the LLM can identify where buttons are, I can script clicks.
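Turning a "(~X, ~Y)" answer into a click is a single call. One caveat: on Retina displays the screenshot's pixel dimensions are roughly double the screen's point coordinates, so the values need scaling first. A sketch:
-- Convert image-pixel coordinates to screen points before clicking;
-- Retina screenshots are captured at 2x the screen's point resolution
local function clickAt(pixelX, pixelY)
    local scale = hs.screen.mainScreen():currentMode().scale
    hs.eventtap.leftClick({ x = pixelX / scale, y = pixelY / scale })
end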
Precision Levels
Different tasks need different quality levels. Describing a simple error dialog doesn’t need GPT-4 Vision - a local model works fine. But analyzing a complex dashboard? That might need more horsepower.
So the tool lets me pick:
- Low (Fast) - Quick local models, sub-second responses
- Medium - Better local models, a few seconds
- High - More capable models
- Very High (Best) - Top-tier vision models for complex analysis
The panel dynamically shows which precision levels are available based on what models are running locally.
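Internally this is just a table mapping each level to a model. The model names below are placeholders, since the real list depends on what's running:
-- Illustrative only: the model names depend on what's actually running
local PRECISION_LEVELS = {
    { id = "low",      label = "Low (Fast)",       model = "small-local-vision" },
    { id = "medium",   label = "Medium",           model = "large-local-vision" },
    { id = "high",     label = "High",             model = "cloud-vision"       },
    { id = "veryhigh", label = "Very High (Best)", model = "cloud-vision-best"  },
}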
The “Sensitive” Toggle
There’s a toggle at the top: “Sensitive (Local LLM)”. When enabled (the default), the screenshot only goes to local models - never to cloud APIs. This matters when screenshotting:
- Code with credentials visible
- Internal tools
- Personal information
- Anything I wouldn’t paste into a public chat
When disabled, it can use cloud models for better quality, but I consciously have to flip that switch.
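The routing decision then reduces to a guard. LOCAL_LLM_URL and endpointFor are hypothetical names here; the point is that "sensitive" short-circuits any cloud routing:
-- Placeholder names throughout; "sensitive" always wins
local function pickEndpoint(sensitive, level)
    if sensitive then
        return LOCAL_LLM_URL            -- never leaves the machine
    end
    return endpointFor(level.model)     -- hypothetical lookup; may be cloud
end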
The Progress Indicator
LLM vision requests take time. Rather than blocking with a loading spinner, I show a minimal pulsing indicator:
- Purple pulsing dot while processing (shows estimated time remaining)
- Green pulsing dot when ready (shows checkmark)
- Click the green dot to view the result
This lets me keep working while the description generates. When I see it turn green, I can click to view.
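The dot itself is a tiny hs.canvas with a timer toggling its alpha. Geometry and colors below are illustrative, and only the purple "processing" state is shown:
-- Tiny pulsing indicator: a circle whose alpha a timer toggles
statusDot = hs.canvas.new({ x = 20, y = 20, w = 16, h = 16 })
statusDot[1] = { type = "circle", action = "fill",
                 fillColor = { red = 0.6, green = 0.3, blue = 0.9, alpha = 1.0 } }
statusDot:show()

local bright = true
pulseTimer = hs.timer.doEvery(0.5, function()   -- keep a global reference
    bright = not bright
    statusDot[1].fillColor = { red = 0.6, green = 0.3, blue = 0.9,
                               alpha = bright and 1.0 or 0.4 }
end)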
The Description Panel
Results appear in a scrollable panel with proper markdown rendering:
- Headers in cyan with different sizes
- Bullet points properly formatted
- Bold text highlighted
- Scroll with arrow keys
There’s also a “Source” toggle to view the raw markdown if needed (useful for copying into other tools).
Technical Details
The whole thing is about 1,200 lines of Lua. Key components:
File organization:
local function ensureDestDir()
    local date = os.date("*t")
    local yearMonth = string.format("%04d-%02d", date.year, date.month)
    local destDir = SCREENSHOT_DEST .. "/" .. yearMonth
    os.execute('mkdir -p "' .. destDir .. '"')  -- create the month folder if needed
    return destDir
end
Screenshot detection (handles English and French filenames):
local function isScreenshot(filename)
    return filename:match("^Screenshot") or
           filename:match("^Screen Recording") or
           filename:match("^Capture d'écran") or
           (filename:match("%.png$") and filename:match("^Screen"))  -- catch-all for other "Screen..." names
end
Canvas-based UI: Hammerspoon's canvas API drives the dark, futuristic aesthetic that matches my other Hammerspoon tools.
What I’d Build Next
A few ideas I haven’t implemented yet:
- Auto-describe: Automatically run a quick description on every screenshot and store it as metadata
- Search: Search through past screenshots by their AI-generated descriptions
- OCR extraction: Pull all text from screenshots automatically
- Clipboard history: Keep the last N screenshots accessible via hotkey
Why Hammerspoon?
I could have built this as a standalone app, but Hammerspoon offers:
- Always running (it’s my automation backbone anyway)
- Lua is fast and easy to iterate on
- Direct access to macOS APIs
- Canvas system for custom UIs
- Hotkey binding for instant access
Plus, the auto-reload feature means I can edit the Lua file and see changes immediately.
Try It Yourself
If you use Hammerspoon, the core pattern is simple:
- Watch a directory with hs.pathwatcher
- Filter for files you care about
- Pop up a canvas-based UI for actions
- Integrate with whatever services you use
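A bare-bones version of that pattern fits in a dozen lines; the path and the filter are yours to change:
-- Bare-bones version of the pattern: watch, filter, act
local WATCH_DIR = os.getenv("HOME") .. "/Desktop"

myWatcher = hs.pathwatcher.new(WATCH_DIR, function(paths)
    for _, p in ipairs(paths) do
        if p:match("%.png$") then
            -- organize the file, pop a canvas UI, call a service, etc.
            hs.alert.show("New screenshot: " .. p)
        end
    end
end):start()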
The LLM integration assumes you have something like llm-proxy running locally, but the organization and basic actions work standalone.
Screenshots are too useful to let them rot on your Desktop. Intercept them, organize them, and make them work for you.