Dictate started as a simple question: what if editing a document felt like talking to a smart assistant?
## The core idea
Most AI writing tools work by generating text from a blank page. We wanted something different — a tool that helps you refine what you’ve already written, using your voice as the primary input.
The workflow is simple:
- You speak — the editor transcribes your voice continuously
- If you say a command (“make this shorter”, “add a paragraph about X”), the AI recognizes it
- The AI edits the relevant part of your document and shows you exactly what changed
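The command-recognition step above can be pictured as a classifier over each finished utterance. In Dictate the LLM itself makes this call; the sketch below is only a heuristic illustration of the idea, and every name in it is hypothetical rather than Dictate's real API.

```typescript
// Sketch: split each utterance into "dictation" (text to insert) vs
// "command" (an instruction for the AI). In practice the LLM decides;
// this prefix heuristic just illustrates the shape of the decision.
type VoiceInput =
  | { kind: "dictation"; text: string }
  | { kind: "command"; text: string };

// A few illustrative command openers ("make this shorter", "add a
// paragraph about X", ...). Hypothetical list, not Dictate's.
const COMMAND_PREFIXES = [
  "make this",
  "add a",
  "delete the",
  "rewrite",
  "shorten",
];

function classifyUtterance(transcript: string): VoiceInput {
  const t = transcript.trim().toLowerCase();
  const isCommand = COMMAND_PREFIXES.some((p) => t.startsWith(p));
  return { kind: isCommand ? "command" : "dictation", text: transcript.trim() };
}
```

Anything that doesn't look like a command simply flows into the document as dictated text, so the user never has to switch modes explicitly.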
## Technical choices
We built Dictate on three main technologies:
Web Speech API — built into Chrome and Edge, it handles continuous speech recognition in the browser, so we run no transcription backend of our own. No audio is sent to our servers.
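Wiring up continuous recognition looks roughly like the sketch below. The `SpeechRecognition` constructor, `continuous`/`interimResults` flags, and `onresult` event are real Web Speech API surface (Chrome exposes the constructor as `webkitSpeechRecognition`); the helper that extracts finalized text is our own simplification, not Dictate's actual code.

```typescript
// Shape of one recognition result: final or interim, with the top
// transcript hypothesis at index 0 (mirrors SpeechRecognitionResult).
interface ResultLike {
  isFinal: boolean;
  0: { transcript: string };
}

// Pure helper: join only the finalized (non-interim) transcripts.
function finalTranscript(results: ArrayLike<ResultLike>): string {
  return Array.from(results)
    .filter((r) => r.isFinal)
    .map((r) => r[0].transcript)
    .join(" ")
    .trim();
}

// Browser-only wiring (not executed outside the browser).
function startDictation(onText: (text: string) => void): void {
  const Ctor =
    (globalThis as any).SpeechRecognition ??
    (globalThis as any).webkitSpeechRecognition;
  const rec = new Ctor();
  rec.continuous = true; // keep listening across pauses
  rec.interimResults = true; // stream partial hypotheses for live feedback
  rec.onresult = (e: any) => onText(finalTranscript(e.results));
  rec.start();
}
```

Interim results let the editor show words as you speak, while only finalized segments are handed to the command classifier.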
OpenRouter — a unified API for LLMs. We use it to route editing commands to the best available model. Currently defaulting to Gemini Flash for speed and cost.
Unified.js — the markdown parsing ecosystem. Documents are stored as a list of “entities” (paragraphs, headings, lists) which makes it easy to edit individual blocks without disrupting the rest.
## The entity model
The key architectural insight was treating the document as a list of independent blocks rather than a single text string. Each block has an ID, a type (heading, paragraph, list), and content.
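The entity list can be pictured with the sketch below. A real implementation would run the document through unified/remark and walk the resulting mdast tree; this hand-rolled splitter only illustrates the id/type/content shape, and all names are ours, not the library's.

```typescript
// Sketch of the "document as a list of entities" model: split on blank
// lines, then tag each block. Real code would walk an mdast tree.
type EntityType = "heading" | "paragraph" | "list";

interface Entity {
  id: string;
  type: EntityType;
  content: string;
}

let nextId = 0;

function parseEntities(markdown: string): Entity[] {
  return markdown
    .split(/\n{2,}/) // blank lines separate blocks
    .map((block) => block.trim())
    .filter((block) => block.length > 0)
    .map((content) => ({
      id: `e${nextId++}`,
      type: content.startsWith("#")
        ? ("heading" as const)
        : /^[-*] /.test(content)
        ? ("list" as const)
        : ("paragraph" as const),
      content,
    }));
}
```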
When the AI edits a block, it returns a new version of just that block. The rest of the document stays intact. This makes edits fast, predictable, and easy to undo.
## What’s next
We’re working on:
- Multi-language voice support (already partially working)
- Better undo/redo with visual diff
- Export to more formats
Dictate is open to try — no account required, just bring an OpenRouter API key.