Speaker diarization splits a multi-speaker recording into labelled, timestamped turns (Speaker 1, Speaker 2, and so on). Rename the labels to real names and you get a citable, speaker-attributed transcript.
MP3 · WAV · M4A · MP4 · MOV · MKV · OGG · OPUS · FLAC · WEBM — up to 100 MB anonymously
YouTube · TikTok · Vimeo · Twitter · SoundCloud · Spotify · 50+ more
↓ Four voices, four labels
Drop a panel recording, an interview, or a Zoom export, and the model splits the voices into labelled turns with timestamps. Stereo files with each speaker on a separate track use channel-split; mono recordings where everyone shares one mic use pyannote.
The thing about diarization is everybody wants it to be one number, but it's really four separate problems stacked.
Detection, attribution, overlap handling, and label persistence across breaks. Different failure modes on each.
Right. So when we say '95% accuracy' on diarization, it depends on which
↓ This is the Speakers tab
Rename Speaker 1 → Mary Chen, Speaker 2 → David Park. The chip names propagate across the entire transcript, summary, and exports. Filter the transcript to one speaker — useful for journalists pulling quotes from a specific source.
Sample preview from a 4-speaker panel discussion about audio engineering tradeoffs. The Speakers tab lets you rename labels, filter the transcript to one voice, and export per-speaker SRT files, a workflow used by journalists, qualitative researchers, and podcast editors.
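As a rough illustration of the rename, filter, and per-speaker SRT steps described above, here is a minimal Python sketch. It is not the product's implementation: the `Turn` class, the function names, and the sample text are hypothetical, and only the speaker names come from the example above.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Turn:
    """One diarized turn (hypothetical structure, for illustration only)."""
    start: float  # seconds
    end: float    # seconds
    speaker: str
    text: str


def rename_speakers(turns: List[Turn], names: Dict[str, str]) -> List[Turn]:
    """Apply a label -> real-name mapping to every turn; unknown labels are kept."""
    return [replace(t, speaker=names.get(t.speaker, t.speaker)) for t in turns]


def filter_speaker(turns: List[Turn], speaker: str) -> List[Turn]:
    """Keep only the turns spoken by one person."""
    return [t for t in turns if t.speaker == speaker]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(turns: List[Turn]) -> str:
    """Render turns as numbered SRT cues, prefixing each cue with the speaker."""
    blocks = []
    for i, t in enumerate(turns, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(t.start)} --> {srt_timestamp(t.end)}\n"
            f"{t.speaker}: {t.text}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    turns = [
        Turn(0.0, 2.5, "Speaker 1", "Placeholder text."),
        Turn(2.5, 4.0, "Speaker 2", "Placeholder reply."),
    ]
    named = rename_speakers(turns, {"Speaker 1": "Mary Chen", "Speaker 2": "David Park"})
    print(to_srt(filter_speaker(named, "Mary Chen")))
```

Renaming before filtering means every downstream export already carries the real names, which matches the idea of labels propagating from one place.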
Three ways to separate speakers · honest comparison
Three real ways to get speaker labels in a transcript in 2026. Channel-split is exact but only works if you record in stereo with one speaker per channel. Integrated AI diarization handles mono recordings. Human labeling is the legal-grade fallback.
Channel-split: works only on stereo recordings where each speaker is on a separate audio channel, such as Zoom/Meet exports and two-mic studio setups.
AI integrated: stereo files use exact channel-split; mono files use pyannote-3.1 clustering. Same dashboard, same export, same speaker chips regardless of source.
Human: a person listens to the recording and types speaker labels by hand. Highest accuracy on overlap and label persistence across long files.
Sources: channel-split accuracy follows from first principles (separate channels are deterministic). AI diarization figures come from pyannote-3.1 published benchmarks. Human relabeling rates come from US/UK industry rate cards.
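To make the "separate channels are deterministic" point concrete, here is a minimal Python sketch of channel-split labelling. It assumes the audio is already decoded into two lists of float samples, one per channel, and uses a simple energy gate to find speech. The function names, frame length, and threshold are illustrative assumptions, not the product's actual pipeline.

```python
from typing import List, Tuple

Turn = Tuple[float, float, str]  # (start_sec, end_sec, speaker_label)


def active_turns(samples: List[float], sample_rate: int, label: str,
                 frame_sec: float = 0.5, threshold: float = 1e-4) -> List[Turn]:
    """Turn one channel into speech turns using a simple energy gate.

    frame_sec and threshold are illustrative values, not tuned settings.
    """
    size = max(int(sample_rate * frame_sec), 1)
    spans: List[List[int]] = []  # [start_index, end_index] in samples
    for start in range(0, len(samples), size):
        frame = samples[start:start + size]
        energy = sum(s * s for s in frame) / len(frame)
        if energy < threshold:
            continue  # silence on this channel
        end = start + len(frame)
        if spans and spans[-1][1] == start:
            spans[-1][1] = end  # contiguous with the previous frame: extend the turn
        else:
            spans.append([start, end])
    return [(a / sample_rate, b / sample_rate, label) for a, b in spans]


def channel_split(left: List[float], right: List[float], sample_rate: int) -> List[Turn]:
    """Label the left channel Speaker 1 and the right channel Speaker 2.

    Each channel is processed independently, so overlapping speech simply
    yields overlapping turns; no voice clustering is involved.
    """
    turns = (active_turns(left, sample_rate, "Speaker 1")
             + active_turns(right, sample_rate, "Speaker 2"))
    return sorted(turns)


if __name__ == "__main__":
    # Toy signal at 4 samples per second: left talks, then right, then left again.
    left = [0.5, 0.5, 0.0, 0.0, 0.5, 0.5, 0.5, 0.5]
    right = [0.0, 0.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
    for turn in channel_split(left, right, sample_rate=4):
        print(turn)
```

Because the speaker is decided by which channel carries the signal, the only tunable part is the silence gate. Mono recordings offer no such shortcut, which is where a clustering model comes in.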
Accuracy · real-world numbers
Speaker diarization accuracy depends mostly on speaker count and overlap frequency, not language or microphone (those affect transcription itself, not the speaker-label layer). Numbers below come from our internal QA on real customer recordings across 2025.
Each speaker is recorded on a separate audio channel, as in Zoom/Meet exports and two-mic studio podcasts. Diarization is deterministic in this case.
The sweet spot for most professional work. Distinct voices, low overlap, good microphone distance. Usable without a review pass.
Standard panel scenario. Plan a 1-minute rename pass on the speaker chips after the job finishes; accuracy bumps to 95%+ post-rename.
Conference panels, debate recordings, group brainstorms. Diarization still works but expect 2–3 chip merges that need a manual fix during review.
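For readers who want to sanity-check speaker labels on their own recordings, the sketch below computes a simple time-weighted label accuracy against a hand-labelled reference. This is an illustrative metric under stated assumptions, not necessarily the one behind the figures above: it assumes the hypothesis labels are already mapped to the reference names, and it ignores overlapping speech by taking the first matching segment. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start_ms, end_ms, speaker)


def speaker_at(segments: List[Segment], t_ms: int) -> Optional[str]:
    """Return the speaker active at time t_ms, or None during silence.

    Overlap is ignored: the first segment covering t_ms wins.
    """
    for start, end, speaker in segments:
        if start <= t_ms < end:
            return speaker
    return None


def label_accuracy(reference: List[Segment], hypothesis: List[Segment],
                   step_ms: int = 10) -> float:
    """Fraction of reference speech time where the hypothesis label matches."""
    if not reference:
        return 0.0
    end = max(e for _, e, _ in reference)
    scored = correct = 0
    for t in range(0, end, step_ms):
        ref = speaker_at(reference, t)
        if ref is None:
            continue  # only score time where the reference has speech
        scored += 1
        if speaker_at(hypothesis, t) == ref:
            correct += 1
    return correct / scored if scored else 0.0


if __name__ == "__main__":
    reference = [(0, 1000, "A"), (1000, 2000, "B")]
    hypothesis = [(0, 1200, "A"), (1200, 2000, "B")]  # speaker change detected 200 ms late
    print(label_accuracy(reference, hypothesis))
```

A single score like this hides which failure occurred (a late speaker change, a merged label, or missed speech), so it is best read alongside a quick look at the disagreeing segments.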
Common questions
Diarization is included on Pro ($19/mo) and Business ($49/mo). The Free plan transcribes the same audio without speaker labels, which is useful for evaluating raw transcription accuracy first.
Start free transcription