Interview transcription. Different recording, same result.

Phone memo, Zoom call, lavalier rig, or handheld field recorder — drop the interview recording and get speaker-labeled, timestamped text you can quote from.

Drop a file, or pick one

MP3 · WAV · M4A · MP4 · MOV · MKV · OGG · OPUS · FLAC · WEBM — up to 100 MB anonymously

Paste a link, we’ll fetch the audio

YouTube · TikTok · Vimeo · Twitter · SoundCloud · Spotify · 50+ more

Record straight from your browser

Sign-up takes 30 seconds; recording opens right after, in the dashboard.

No card required · ~90s per 60-min file · SRT, VTT, DOCX, TXT · Files auto-deleted in 24h

↓ Watch what comes out

Two voices in. Two voices out, labeled.

Most interviews are two people on one device — a phone on the table, a recorder between you. We separate the interview audio into reporter and source even from a single mono channel, then timestamp every turn for citation.

Field recorder · WAV
REC · 2 speakers · 38:42
auto-detected en-US · 48 kHz mono · 1411 kbps
~90s
Transcript · streaming · 94% accuracy
S1

Can you walk me through what you saw the morning of the eighteenth?

S2

I got there around six. The loading bay door was already open, which it shouldn't have been.

S1

And you'd reported the door issue before — to whom?

S2

To Diane Okafor in facilities, twice in March. I have the emails.

94% on field WAV · DOCX · TXT · SRT · JSON
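To make "speaker-labeled, timestamped text" concrete, here is a minimal Python sketch that renders diarized turns as SRT cues. The `Turn` structure, the `to_srt` helper, and the timings are illustrative assumptions, not the service's actual export code. Only the SRT cue layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text) is standard.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str   # label such as "S1" or "S2"
    start: float   # seconds from the start of the recording
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form used by SRT."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(turns: list[Turn]) -> str:
    """Render one numbered SRT cue per speaker turn."""
    cues = []
    for index, turn in enumerate(turns, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(turn.start)} --> {srt_timestamp(turn.end)}\n"
            f"{turn.speaker}: {turn.text}\n"
        )
    return "\n".join(cues)


# Hypothetical timings; the text is taken from the sample transcript above.
turns = [
    Turn("S1", 0.0, 4.2, "Can you walk me through what you saw the morning of the eighteenth?"),
    Turn("S2", 4.8, 11.5, "I got there around six. The loading bay door was already open, which it shouldn't have been."),
]
print(to_srt(turns))
```

Keeping the speaker label inside the cue text is one common convention; a citation workflow could just as well carry it in a separate column.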

↓ This is the dashboard

This is what loads when the job finishes.

Same layout as the real dashboard — Summary, full Transcript, Speakers tab, Exports. Key points and action items extracted automatically. Auto-tags on every job.

Try it on your own file — it's free

Three real options · honest comparison

Rev human. Otter or Trint. Or us.

Rev sends your audio to human transcribers — slow and pricey but high fidelity on hard audio. Otter and Trint are AI-first like us, tuned for journalists and researchers. Here's where each fits.

Option 01

Rev human transcription

Real people typing your interview. Best on hostile audio, but you wait and you pay.

Turnaround: 12–24 hours typical
Accuracy on clean audio: 99% (claimed)
Speaker labels: Manual, included
Languages: EN human · 30+ AI
Cost · per min: $1.50 human · $0.25 AI
Privacy: Audio sent to contractors
Best for: Court-bound or publication-critical interviews on bad audio where you need a human ear and have a day to wait.

Option 02

Transcription.Solutions

AI transcript, speaker-split, ready in minutes. Same engine for phone memo, Zoom, or field recorder.

Turnaround: ~3 min per hour of audio
Accuracy on clean audio: 94–96%
Speaker labels: Auto · rename in editor
Languages: 99, auto-detected
Cost · per min: $0.03
Privacy: Audio deleted in 24h · no training
Best for: Journalists, researchers, and producers doing multiple interviews a week who need fast, citable text without uploading to a contractor.

Option 03

Otter / Trint

AI transcription with a research-oriented editor. English-strong, locked to monthly plans.

Turnaround: Real-time to ~5 min
Accuracy on clean audio: ~90–93%
Speaker labels: Yes · EN-tuned
Languages: Otter EN-only · Trint 30+
Cost: $17–80/user/mo (subscription)
Privacy: Stored in account by default
Best for: Teams who want a hosted library of every interview ever recorded and don't mind a monthly seat fee per user.

Pricing and feature flags accurate as of 2026. Human Rev turnaround varies by queue depth and audio length.
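For a rough sense of how the per-minute rates compare with a flat seat fee, the arithmetic can be sketched in Python. The rates are copied from the three options above. The helper names are hypothetical, and the sketch assumes a single user and ignores free minutes, minimums, and taxes.

```python
# Per-minute rates in US cents, copied from the comparison above.
RATE_CENTS_PER_MIN = {
    "Rev human": 150,
    "Rev AI": 25,
    "Transcription.Solutions": 3,
}
# Low and high ends of the Otter / Trint seat price, in cents per user per month.
SUBSCRIPTION_CENTS = (1700, 8000)


def monthly_cost_usd(minutes: int) -> dict[str, float]:
    """Pay-per-minute cost in dollars for a month with `minutes` of audio."""
    return {name: rate * minutes / 100 for name, rate in RATE_CENTS_PER_MIN.items()}


def break_even_minutes(subscription_cents: int, rate_cents: int) -> int:
    """Smallest whole minute count at which per-minute billing reaches the flat fee."""
    return (subscription_cents + rate_cents - 1) // rate_cents  # integer ceiling


# Ten hours of interviews in a month.
for name, cost in monthly_cost_usd(600).items():
    print(f"{name}: ${cost:.2f}")

for fee in SUBSCRIPTION_CENTS:
    minutes = break_even_minutes(fee, RATE_CENTS_PER_MIN["Transcription.Solutions"])
    print(f"${fee / 100:.0f}/mo seat matches $0.03/min at about {minutes} minutes")
```

Working in integer cents avoids floating-point drift in the break-even calculation.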

Specific to interviews

Three things that bite people on generic transcription tools.

Interview audio is rarely clean. Flip these settings and the transcript holds up under quoting.

What goes wrong

  1. Cross-talk on a single channel. When your source gets emphatic and talks over your question, generic diarization merges both into one speaker block.
  2. Source names and places (Okafor, Tigray, Maranello) come back spelled phonetically, which makes the transcript useless for fact-checking.
  3. Off-the-record moments end up in the same transcript as quotable material, with no way to mark a region as redacted.

What to flip here

  1. If your field recorder writes a two-channel WAV (one mic per track), upload that file directly. We detect the separate channels, assign one speaker per channel, and skip acoustic diarization entirely.
  2. Paste your prep notes (source names, organizations, place names) into Custom vocabulary on the job form. The recognizer treats them as known proper nouns.
  3. After the transcript lands, mark a region as off-record in the editor. It exports as `[REDACTED 14:22–15:08]` in DOCX and TXT, with the source audio deleted in 24 hours regardless.
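Before uploading, you can check locally whether a recorder file really has two channels. The sketch below uses Python's standard `wave` module and mirrors the "per-channel if stereo, acoustic otherwise" rule described on this page. The function names are hypothetical and this is not the service's detection code. Note that `wave` reads uncompressed PCM WAV files only.

```python
import os
import tempfile
import wave


def channel_count(path: str) -> int:
    """Read the channel count from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels()


def diarization_mode(path: str) -> str:
    """Per-channel labeling for two-channel files, acoustic diarization otherwise."""
    return "per-channel" if channel_count(path) == 2 else "acoustic"


def write_silence(path: str, channels: int, seconds: float = 0.1, rate: int = 48_000) -> None:
    """Write a short silent 16-bit WAV so the example runs without a real recording."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))


with tempfile.TemporaryDirectory() as tmp:
    stereo = os.path.join(tmp, "two_mics.wav")
    mono = os.path.join(tmp, "phone_memo.wav")
    write_silence(stereo, channels=2)
    write_silence(mono, channels=1)
    print(stereo, "->", diarization_mode(stereo))
    print(mono, "->", diarization_mode(mono))
```

A two-channel file is not proof of one mic per speaker, since some recorders write the same signal to both tracks, so treat the channel count as a first check only.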

Recommended job settings for interviews

Drop an interview file and these flip on by default. Override per-job from the form.

Diarization: Per-channel if stereo · acoustic else
Speaker model: Interview · 2–4 speakers
Language: Auto-detect · code-switch on
Filler words: Kept (verbatim mode)
Summary: Key quotes + topic index
Export: DOCX with timestamps · plain TXT · JSON

Accuracy · real-world numbers

96% on a good lav. Still readable on a cafe recording.

Interview accuracy is bounded by what the mic actually heard. Close-mic stereo on each speaker is the ceiling; a phone sitting on a noisy table is the floor. Numbers below come from production interview files, not synthetic benchmarks.

96%
Dual lavalier · studio quiet

One mic per speaker, separate channels (Zoom H5/H6, Tascam DR-40). Diarization is trivial — error is text-only.

94%
Handheld recorder on table

Single condenser between two speakers, quiet room. Acoustic diarization separates voices reliably under 4 ft.

90%
Phone voice memo · close

iPhone or Pixel voice memo on the table. Names and numbers occasionally miss; cadence is fine for quoting.

84%
Field recording · cafe or street

Espresso machines, traffic, third voices nearby. Worst case in our data — usable for navigation, verify quotes against audio.
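To see what these percentages mean for quoting, the arithmetic can be sketched in a few lines of Python. It assumes the figures are word-level accuracy and that errors are spread evenly through the transcript. Real errors tend to cluster around names, numbers, and cross-talk, so read the output as an average, not a guarantee.

```python
def words_per_error(accuracy_pct: float) -> float:
    """Average number of words per misread word at a given word accuracy."""
    return 100 / (100 - accuracy_pct)


def expected_errors(accuracy_pct: float, word_count: int) -> float:
    """Expected number of wrong words in a passage of `word_count` words."""
    return word_count * (100 - accuracy_pct) / 100


# Accuracy tiers as listed above.
TIERS = {
    "Dual lavalier, studio quiet": 96,
    "Handheld recorder on table": 94,
    "Phone voice memo, close": 90,
    "Field recording, cafe or street": 84,
}

for name, pct in TIERS.items():
    print(
        f"{name}: about one word in {words_per_error(pct):.0f} wrong, "
        f"{expected_errors(pct, 30):.1f} expected errors in a 30-word quote"
    )
```

Even at the top tier, a 30-word quote carries more than one expected error on average, which is why direct quotes should be checked against the audio.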

Common questions

8 things people ask about interview transcription.

01. Can I use these transcripts in a published article without verifying against the audio?
For direct quotes — no, always verify against the audio. AI transcripts at 94% accuracy still misread one word in 17 on average, and the wrong word in a quote is a correction. The transcript is for navigation and drafting; the audio is the source of truth.
02. My recorder saved a stereo WAV with one mic per speaker. What do I do?
Upload that file directly; don't convert it to mono first. We detect the two channels and treat each one as its own speaker track, so no acoustic diarization is needed. This is the highest-accuracy path we have. Expect 96%+ in a quiet room.
03. What about interviews recorded over a phone call?
Phone audio is 8 kHz narrow-band, which caps accuracy around 88% even on a clean line. We still split the two parties using channel separation if your recorder app captured them separately (most do). VoIP calls over WhatsApp or Signal sound a bit better than PSTN.
04. Can I redact off-the-record sections before sharing the transcript?
Yes. In the editor, select the timestamp range and mark it `[REDACTED]`. The export replaces the text with a redaction marker but keeps the timestamps so the document still tracks the audio.
05. Do you train models on my interview recordings?
No. Source audio is deleted from our infrastructure within 24 hours of completion, and we don't use customer recordings for model training under any plan. The transcript text stays in your account until you delete it.
06. Three or four people on a panel interview: does diarization still work?
Up to about six distinct voices, yes, but accuracy on speaker assignment drops with each added person and gets worse when two speakers sound similar. Plan a 2–3 minute rename pass on the speaker chips after the transcript lands.
07. Can you transcribe interviews in languages other than English?
99 languages, auto-detected. Code-switching (English source slipping into Spanish mid-sentence) is handled in 12 language pairs. Accuracy varies by language — European languages match English; low-resource African and Central Asian languages run 5–10 points lower.
08. I record on a Zoom call. Should I use your Zoom page instead?
Same engine, same result. The Zoom page covers cloud-recording specifics (per-participant audio, dial-in degradation). If you're conducting one interview at a time over Zoom, either path works — drop the MP4 here and the speaker labels come out the same.
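The redaction behavior from question 04 can be illustrated with a short Python sketch: segments that overlap the marked range are replaced by a single marker that keeps the timestamps. The `Segment` structure, the whole-segment redaction rule, and the sample text are assumptions for illustration, not the editor's actual implementation. Only the marker format `[REDACTED 14:22–15:08]` comes from this page.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    end: float
    speaker: str
    text: str


def mmss(seconds: float) -> str:
    """Format seconds as MM:SS, truncating fractions of a second."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def redact(segments: list[Segment], start: float, end: float) -> list[str]:
    """Render transcript lines, replacing segments that overlap [start, end) with one marker."""
    lines = []
    marker_written = False
    for seg in segments:
        overlaps = seg.start < end and seg.end > start
        if overlaps:
            # Drop the whole segment, even on partial overlap (conservative choice).
            if not marker_written:
                lines.append(f"[REDACTED {mmss(start)}–{mmss(end)}]")
                marker_written = True
            continue
        lines.append(f"[{mmss(seg.start)}] {seg.speaker}: {seg.text}")
    return lines


# Hypothetical segments around an off-record stretch from 14:22 to 15:08.
segments = [
    Segment(850.0, 861.5, "S2", "That part is fine to quote."),
    Segment(862.0, 880.0, "S2", "Off the record, though, here is what happened."),
    Segment(880.5, 908.0, "S1", "Understood, I won't use that."),
    Segment(909.0, 915.0, "S1", "Back on the record now."),
]
print("\n".join(redact(segments, 862, 908)))
```

Dropping any segment that touches the range errs on the side of over-redacting, which is usually the safer failure mode for source protection.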

Drop your interview recording. See what comes out.

30 free minutes every month. No card. Speaker labels, 99 languages, all exports included.

Start free