muesli
Most voice-note apps are good at capture and bad at retrieval. You can record thoughts quickly, but finding the right moment later usually means scrubbing through long audio files.
I built Muesli to fix that. It records both microphone input and system audio, transcribes them live, stores the result as markdown, and makes the archive searchable with embeddings.
dual audio capture
The main feature is recording two streams at once:
- microphone audio for your own commentary
- system audio for meetings, tutorials, podcasts, or anything else playing through the computer
On macOS, system capture uses a small Swift binary built on ScreenCaptureKit. On Windows and Linux, the app falls back to getDisplayMedia with audio.
Both streams are transcribed separately and shown side by side, so your own notes stay distinct from the source material instead of blending into one transcript.
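To make the side-by-side idea concrete, here is a minimal sketch of how segments from the two streams could be laid out in separate columns. The `Segment` and `Row` types and the `toSideBySide` function are hypothetical names for illustration, not Muesli's actual data model.

```typescript
// Hypothetical types: the real app's data model is not shown in this post.
type Source = "mic" | "system";

interface Segment {
  source: Source;
  startMs: number; // offset from the start of the session
  text: string;
}

interface Row {
  startMs: number;
  mic: string;
  system: string;
}

// Build side-by-side rows: one row per segment, ordered by time.
// The column for the other stream stays empty, so the two never mix.
function toSideBySide(segments: Segment[]): Row[] {
  return [...segments]
    .sort((a, b) => a.startMs - b.startMs)
    .map((s) => ({
      startMs: s.startMs,
      mic: s.source === "mic" ? s.text : "",
      system: s.source === "system" ? s.text : "",
    }));
}
```

Each segment keeps its source tag from capture through rendering, which is what lets the UI show two columns instead of one merged transcript.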
transcription and search
Muesli uses DeepL's Voice API over WebSocket for streaming transcription. Audio is chunked, normalized, sent upstream, and rendered back into the UI as partial and final results arrive.
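The chunk-and-normalize step can be sketched as two small functions. This assumes the audio arrives as Web Audio float samples and that the upstream service wants 16-bit PCM. Both are assumptions for illustration, and `floatTo16BitPcm` and `chunkPcm` are hypothetical names rather than anything from the DeepL API.

```typescript
// Convert float samples in [-1, 1] to 16-bit signed PCM.
// Assumption: the streaming endpoint expects 16-bit PCM frames.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, samples[i] ?? 0));
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}

// Split PCM into fixed-size chunks; the last chunk may be shorter.
// subarray() returns views, so no sample data is copied.
function chunkPcm(pcm: Int16Array, chunkSize: number): Int16Array[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  const chunks: Int16Array[] = [];
  for (let i = 0; i < pcm.length; i += chunkSize) {
    chunks.push(pcm.subarray(i, i + chunkSize));
  }
  return chunks;
}
```

Each chunk would then be sent as one binary WebSocket message. The right chunk size depends on the sample rate and the latency you are willing to accept.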
The second layer is semantic search. Each note is embedded with an OpenAI embedding model and indexed in a local LibSQL database, so a query like "authentication discussion" can still find notes that talk about OAuth or login flows without sharing any exact words.
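The idea behind embedding search is easy to show in memory. The sketch below ranks notes by cosine similarity between a query vector and each note's vector. It is an illustration only: in the app the vectors come from an embedding model and the lookup runs inside the database index, and `IndexedNote`, `cosineSimilarity`, and `topK` are hypothetical names.

```typescript
interface IndexedNote {
  id: string;
  embedding: number[];
}

// Cosine similarity: 1 for same direction, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / Math.sqrt(normA * normB);
}

// Score every note against the query and keep the k best matches.
function topK(
  query: number[],
  notes: IndexedNote[],
  k: number
): { id: string; score: number }[] {
  return notes
    .map((n) => ({ id: n.id, score: cosineSimilarity(query, n.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

Because the comparison happens in vector space, a note about OAuth can rank highly for an "authentication" query as long as the embedding model places the two near each other.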
local-first storage
Every session is saved as a plain markdown file with frontmatter:
---
id: note-abc123
createdAt: 2024-09-15T10:30:00Z
updatedAt: 2024-09-15T11:45:00Z
---
# Meeting Notes: Product Sync
## User Transcript
[10:30:15] We need to prioritize the authentication flow
## System Transcript
[10:30:45] [Video audio] "Best practices for OAuth 2.0..."
I chose markdown because it keeps the data portable. Notes can be opened in any editor, versioned with Git, and will outlive the app itself. Because they are plain local files, they are also easy for coding agents like Claude Code to read.
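A file in the layout shown above can be produced by a very small serializer. This is a sketch under the assumption that a note carries the three frontmatter fields, a title, and two transcript lists. The `Note` and `TranscriptLine` types and the `renderNote` function are hypothetical.

```typescript
interface TranscriptLine {
  time: string; // wall-clock time, e.g. "10:30:15"
  text: string;
}

interface Note {
  id: string;
  createdAt: string; // ISO 8601 timestamp
  updatedAt: string; // ISO 8601 timestamp
  title: string;
  user: TranscriptLine[];
  system: TranscriptLine[];
}

// Render a note as markdown with a frontmatter header.
function renderNote(note: Note): string {
  const lines = (items: TranscriptLine[]): string[] =>
    items.map((l) => `[${l.time}] ${l.text}`);
  return (
    [
      "---",
      `id: ${note.id}`,
      `createdAt: ${note.createdAt}`,
      `updatedAt: ${note.updatedAt}`,
      "---",
      `# ${note.title}`,
      "## User Transcript",
      ...lines(note.user),
      "## System Transcript",
      ...lines(note.system),
    ].join("\n") + "\n"
  );
}
```

Since the output is plain text, saving a session is just a file write, and anything that reads markdown can consume it without going through the app.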
AI features around the notes
Once the transcription and storage were solid, I added a few higher-level tools:
- dictation, similar to Wispr Flow
- transcript summaries
- chat over the archive
- tool-calling that can trigger semantic search automatically
The prompts live as markdown files inside the app resources, so behavior can be changed without rewriting core logic.
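One way to keep prompts editable as files is to treat them as templates with named placeholders. The sketch below assumes a `{{name}}` placeholder syntax and a hypothetical `fillPrompt` helper. The post does not say how Muesli's prompt files are actually parameterized, so treat this purely as an illustration.

```typescript
// Replace {{name}} placeholders in a prompt template with values.
// Unknown placeholders are left in place so mistakes stay visible.
function fillPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) => {
    // Only accept keys set directly on vars, not inherited ones.
    const value = Object.prototype.hasOwnProperty.call(vars, name)
      ? vars[name]
      : undefined;
    return value ?? match;
  });
}
```

With this shape, changing how summaries are worded means editing a markdown file. The code that loads the file and calls the model stays the same.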
stack
Muesli is built with:
- Electron
- React
- DeepL Voice API
- OpenAI embeddings and chat
- Mastra RAG
The key product choice was to get the core loop working first: capture, transcribe, save, search. Everything else was added on top of that.