Dictate started as a simple question: what if editing a document felt like talking to a smart assistant?
## The core idea
Most AI writing tools work by generating text from a blank page. We wanted something different — a tool that helps you refine what you’ve already written, using your voice as the primary input.
The workflow is simple:
- You speak — the editor transcribes your voice continuously
- If you say a command (“make this shorter”, “add a paragraph about X”), the AI recognizes it
- The AI edits the relevant part of your document and shows you exactly what changed
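The command-recognition step above can be pictured as a classifier over each finished utterance. In Dictate the LLM itself makes this call; the sketch below is only a heuristic illustration of the idea, and every name in it is hypothetical rather than Dictate's real API.

```typescript
// Sketch: split each utterance into "dictation" (text to insert) vs
// "command" (an instruction for the AI). In practice the LLM decides;
// this prefix heuristic just illustrates the shape of the decision.
type VoiceInput =
  | { kind: "dictation"; text: string }
  | { kind: "command"; text: string };

// A few illustrative command openers ("make this shorter", "add a
// paragraph about X", ...). Hypothetical list, not Dictate's.
const COMMAND_PREFIXES = [
  "make this",
  "add a",
  "delete the",
  "rewrite",
  "shorten",
];

function classifyUtterance(transcript: string): VoiceInput {
  const t = transcript.trim().toLowerCase();
  const isCommand = COMMAND_PREFIXES.some((p) => t.startsWith(p));
  return { kind: isCommand ? "command" : "dictation", text: transcript.trim() };
}
```

Anything that doesn't look like a command simply flows into the document as dictated text, so the user never has to switch modes explicitly.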
## Technical choices
We built Dictate on three main technologies:
Web Speech API — built into Chrome and Edge, it handles continuous speech recognition in the browser, so we run no transcription backend of our own. No audio is sent to our servers.
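Wiring up continuous recognition looks roughly like the sketch below. The `SpeechRecognition` constructor, `continuous`/`interimResults` flags, and `onresult` event are real Web Speech API surface (Chrome exposes the constructor as `webkitSpeechRecognition`); the helper that extracts finalized text is our own simplification, not Dictate's actual code.

```typescript
// Shape of one recognition result: final or interim, with the top
// transcript hypothesis at index 0 (mirrors SpeechRecognitionResult).
interface ResultLike {
  isFinal: boolean;
  0: { transcript: string };
}

// Pure helper: join only the finalized (non-interim) transcripts.
function finalTranscript(results: ArrayLike<ResultLike>): string {
  return Array.from(results)
    .filter((r) => r.isFinal)
    .map((r) => r[0].transcript)
    .join(" ")
    .trim();
}

// Browser-only wiring (not executed outside the browser).
function startDictation(onText: (text: string) => void): void {
  const Ctor =
    (globalThis as any).SpeechRecognition ??
    (globalThis as any).webkitSpeechRecognition;
  const rec = new Ctor();
  rec.continuous = true; // keep listening across pauses
  rec.interimResults = true; // stream partial hypotheses for live feedback
  rec.onresult = (e: any) => onText(finalTranscript(e.results));
  rec.start();
}
```

Interim results let the editor show words as you speak, while only finalized segments are handed to the command classifier.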
OpenRouter — a unified API for LLMs. We use it to route editing commands to the best available model. Currently defaulting to Gemini Flash for speed and cost.
Unified.js — the markdown parsing ecosystem. Documents are stored as a list of “entities” (paragraphs, headings, lists) which makes it easy to edit individual blocks without disrupting the rest.
## The entity model
The key architectural insight was treating the document as a list of independent blocks rather than a single text string. Each block has an ID, a type (heading, paragraph, list), and content.
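The entity list can be pictured with the sketch below. A real implementation would run the document through unified/remark and walk the resulting mdast tree; this hand-rolled splitter only illustrates the id/type/content shape, and all names are ours, not the library's.

```typescript
// Sketch of the "document as a list of entities" model: split on blank
// lines, then tag each block. Real code would walk an mdast tree.
type EntityType = "heading" | "paragraph" | "list";

interface Entity {
  id: string;
  type: EntityType;
  content: string;
}

let nextId = 0;

function parseEntities(markdown: string): Entity[] {
  return markdown
    .split(/\n{2,}/) // blank lines separate blocks
    .map((block) => block.trim())
    .filter((block) => block.length > 0)
    .map((content) => ({
      id: `e${nextId++}`,
      type: content.startsWith("#")
        ? ("heading" as const)
        : /^[-*] /.test(content)
        ? ("list" as const)
        : ("paragraph" as const),
      content,
    }));
}
```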
When the AI edits a block, it returns a new version of just that block. The rest of the document stays intact. This makes edits fast, predictable, and easy to undo.
## What’s next
We’re working on:
- Multi-language voice support (already partially working)
- Better undo/redo with visual diff
- Export to more formats
Dictate is open to try — no account required, just bring an OpenRouter API key.