Flow State
macOS focus coach that watches screen context and gives voiced, character-driven feedback in real time.
Flow State desktop demo
Metrics / Signals
Problem
What this project solves
Task lists capture intentions, but they do not notice when someone drifts off-task or help them return to the session.
Architecture
How it is put together
Flow State runs as an Electron background app with a tray menu and a draggable overlay character. It captures the screen at a configurable cadence, uses Claude Vision to decide whether the user is on task, generates character dialogue with Claude text generation, and speaks it through ElevenLabs TTS. Settings persist through electron-store, and Jest tests cover the core modules.
Desktop agent
What It Does
Flow State is a macOS background focus coach that captures a screenshot at a configurable interval, asks Claude Vision whether the user is working on their stated task, and delivers voiced feedback through a selected character.
- Characters can praise on-task behavior or call out distraction.
- Session memory lets characters reference repeated behavior across the current focus session.
- Idle detection reacts when the user has been inactive for 30+ seconds.
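The session memory and idle rule above could be sketched roughly as follows. This is a minimal illustration, not the project's actual module: `Observation`, `SessionMemory`, `isIdle`, and `IDLE_THRESHOLD_SECONDS` are hypothetical names, and the idle time is assumed to arrive in seconds (in Electron it could come from `powerMonitor.getSystemIdleTime()`).

```typescript
// Hypothetical names throughout; the real module API is not shown in this write-up.
type Observation = { onTask: boolean; summary: string; at: number };

// Mirrors the "30+ seconds" idle rule described above.
const IDLE_THRESHOLD_SECONDS = 30;

// The idle time is passed in so the rule stays testable without Electron.
function isIdle(idleSeconds: number): boolean {
  return idleSeconds >= IDLE_THRESHOLD_SECONDS;
}

class SessionMemory {
  private observations: Observation[] = [];

  record(obs: Observation): void {
    this.observations.push(obs);
  }

  // Number of consecutive off-task observations at the end of the session so far.
  offTaskStreak(): number {
    let streak = 0;
    for (let i = this.observations.length - 1; i >= 0; i--) {
      if (this.observations[i].onTask) break;
      streak++;
    }
    return streak;
  }

  // How many times the same off-task summary has been seen this session.
  repeatCount(summary: string): number {
    return this.observations.filter((o) => !o.onTask && o.summary === summary).length;
  }

  // Memory covers only the current focus session, so it is cleared between sessions.
  reset(): void {
    this.observations = [];
  }
}
```

A character could use `offTaskStreak()` or `repeatCount()` to decide when to reference repeated behavior instead of reacting to each screenshot in isolation.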
Electron
Desktop Experience
- A tray menu opens settings and controls sessions.
- A small draggable character lives near the bottom-right of the screen.
- The transparent overlay uses click-through behavior with dynamic mouse-event toggling.
- electron-store persists API keys, settings, current task, and user preferences.
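The dynamic mouse-event toggling above can be illustrated with a small sketch. It assumes the overlay is an Electron `BrowserWindow` and relies on its `setIgnoreMouseEvents(ignore, { forward: true })` method; the `MouseIgnorable` interface and `setPointerOverCharacter` function are hypothetical names, and how the pointer state reaches this function (for example via IPC from the renderer) is left out.

```typescript
// Minimal structural type covering the one BrowserWindow method used here,
// so the logic can be exercised without launching Electron.
interface MouseIgnorable {
  setIgnoreMouseEvents(ignore: boolean, options?: { forward?: boolean }): void;
}

// Hypothetical helper: call it when the pointer enters or leaves the character.
function setPointerOverCharacter(win: MouseIgnorable, overCharacter: boolean): void {
  if (overCharacter) {
    // Accept mouse events so the character can be clicked and dragged.
    win.setIgnoreMouseEvents(false);
  } else {
    // Let clicks pass through the transparent area; forwarding keeps
    // mouse-move events flowing so the page can detect the next hover.
    win.setIgnoreMouseEvents(true, { forward: true });
  }
}
```

Keeping the window click-through by default means the overlay never blocks the apps underneath, and it only captures input while the pointer is on the character itself.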
Runtime
AI Loop
The app coordinates a sequential async loop across screenshot capture, visual task analysis, character dialogue generation, and ElevenLabs speech playback. The loop can skip vision calls on unchanged screens to reduce token usage.
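A minimal sketch of such a loop is shown below, under stated assumptions: `LoopDeps`, `createTick`, `runLoop`, and `sameBytes` are hypothetical names, the four dependencies stand in for the capture, vision, dialogue, and speech modules, and "unchanged screen" is approximated by an exact byte comparison. The real app may use a more tolerant heuristic.

```typescript
// Hypothetical interfaces standing in for the app's real modules.
interface Analysis {
  onTask: boolean;
  summary: string;
}

interface LoopDeps {
  capture(): Promise<Uint8Array>;
  analyze(shot: Uint8Array, task: string): Promise<Analysis>;
  writeLine(analysis: Analysis): Promise<string>;
  speak(line: string): Promise<void>;
}

type TickResult = "skipped" | "spoke";

// Exact byte equality: the simplest possible "unchanged screen" check.
function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Builds one iteration of the loop; it remembers the previous screenshot.
function createTick(deps: LoopDeps, task: string): () => Promise<TickResult> {
  let lastShot: Uint8Array | null = null;
  return async (): Promise<TickResult> => {
    const shot = await deps.capture();
    if (lastShot !== null && sameBytes(lastShot, shot)) {
      return "skipped"; // unchanged screen: no vision call, no tokens spent
    }
    lastShot = shot;
    const analysis = await deps.analyze(shot, task);
    const line = await deps.writeLine(analysis);
    await deps.speak(line);
    return "spoke";
  };
}

// Sequential scheduling: the next tick starts only after the previous one
// has finished, so slow API calls never overlap.
async function runLoop(
  tick: () => Promise<TickResult>,
  intervalMs: number,
  shouldStop: () => boolean
): Promise<void> {
  while (!shouldStop()) {
    await tick();
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Passing the modules in as dependencies keeps the loop itself free of Electron and network code, which makes it straightforward to exercise with fakes in a unit test.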
Product flavor
Character System
- Drill Sergeant: direct, military-style tough love.
- Disappointed Mom: affectionate but deeply let down.
- Anime Rival: competitive, dramatic, and motivational.
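One way to represent these characters is as plain data that feeds the dialogue prompt. The sketch below is illustrative only: the `Character` shape, the `CHARACTERS` table, the ids, and `buildDialoguePrompt` are hypothetical, and the persona strings are taken from the list above rather than from the app's real prompts.

```typescript
// Hypothetical ids and shape; only the names and personas come from the list above.
type CharacterId = "drill-sergeant" | "disappointed-mom" | "anime-rival";

interface Character {
  id: CharacterId;
  name: string;
  persona: string;
}

const CHARACTERS: Record<CharacterId, Character> = {
  "drill-sergeant": {
    id: "drill-sergeant",
    name: "Drill Sergeant",
    persona: "direct, military-style tough love",
  },
  "disappointed-mom": {
    id: "disappointed-mom",
    name: "Disappointed Mom",
    persona: "affectionate but deeply let down",
  },
  "anime-rival": {
    id: "anime-rival",
    name: "Anime Rival",
    persona: "competitive, dramatic, and motivational",
  },
};

// Builds an illustrative prompt for the text-generation step.
function buildDialoguePrompt(character: Character, task: string, onTask: boolean): string {
  const situation = onTask
    ? `The user is working on "${task}". Praise them briefly.`
    : `The user drifted away from "${task}". Call it out briefly.`;
  return `You are ${character.name}: ${character.persona}. ${situation}`;
}
```

Keeping characters as data means a new personality is a new table entry, while the capture, analysis, and speech steps stay unchanged.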
Decisions
Technical choices
- Used periodic screenshots rather than manual check-ins so the app can react to actual behavior.
- Separated character definitions, memory, screen capture, analysis, and speech into testable modules.
- Made the coach live as a small desktop overlay so feedback is ambient rather than buried in a dashboard.
Tradeoffs
Constraints and next choices
- Screen Recording permission adds setup friction, but it is necessary for accurate, context-aware feedback.
- Voiced reactions make the experience vivid, but they require careful handling of settings and API keys.
- Skipping vision calls on unchanged screens reduces token usage, but requires screen-change and idle heuristics.