Stay on Otter.ai
Works for English-only Zoom calls under 90 minutes if you upload fewer than 25 files in your account's lifetime.
Otter is engineered around English-only Zoom calls under 90 minutes. The moment your audio is in another language, longer than 90 minutes, or sitting in your downloads folder as a file — you start hitting walls. We don't have those walls. Drop a file, paste any URL, transcribe up to 10 hours in any of 99 languages on a single Pro plan.
MP3 · WAV · M4A · MP4 · MOV · MKV · OGG · OPUS · FLAC · WEBM — up to 100 MB anonymously
YouTube · TikTok · Vimeo · Twitter · SoundCloud · Spotify · 50+ more
Three real options for converting audio to text
Each of the three approaches below is a legitimate way to get text from audio. The middle card is what most teams who'd otherwise buy Otter actually need.
Works for English-only Zoom calls under 90 minutes if you upload fewer than 25 files in your account's lifetime.
Built around files and URLs first. 99 languages, no caps, 7 export formats, 24-hour source deletion. Free tier opens with 30 min/mo and no card.
Top-end quality on hard audio, but ~100× the per-minute cost of AI transcription. Use for legal certifications, not daily workflow.
Sources: Otter.ai pricing page (May 2026), Rev.com / Trint published rates, Transcription.Solutions plans config. Re-verified before publication.
Three things people believe that don't survive a real workflow
“My language is in their list of 6, so I'm fine.”
Otter's non-English coverage trails its English flagship by a noticeable margin. The English model is the one they iterate on; the others are downstream. We treat every supported language as a first-class target — same per-minute price, same speaker labelling, same export formats. On Mandarin and Portuguese specifically, recurring G2 / Capterra complaints flag Otter as needing a manual review pass.
“I rarely upload files. I just transcribe meetings.”
Until you have a recording from a different platform (Webex, GoTo, Slack Huddle), a colleague's mobile recording, a podcast guest's local track, or a recording made before you adopted Otter. The 25-file lifetime cap on Otter Pro isn't a per-month rolling cap — it's a permanent ceiling on your account. We don't have one.
“I'll handle long meetings by recording locally and uploading.”
Otter caps single files at 90 minutes on Pro and 4 hours on Business. A 5-hour deposition, an 8-hour content sprint recording, a long-form podcast — every one of these requires splitting before upload, then stitching transcripts back together. Our cap on Pro is 10 hours single-file, same on Business. Symmetric. No splitting.
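To make the splitting overhead concrete, here is a minimal Python sketch of the arithmetic, using only the per-file caps quoted above. The function name `chunks_needed` is made up for illustration and is not part of any product API.

```python
def chunks_needed(duration_min: int, cap_min: int) -> int:
    """Number of pieces a recording must be cut into to fit a per-file cap."""
    # Integer ceiling division: any remainder needs one extra piece.
    return -(-duration_min // cap_min)

# Per-file caps as quoted in the text, in minutes.
CAPS_MIN = {"90-minute cap": 90, "4-hour cap": 240, "10-hour cap": 600}

# The two example recordings from the text: a 5-hour and an 8-hour file.
RECORDINGS_MIN = {"5-hour deposition": 300, "8-hour sprint recording": 480}

for cap_label, cap in CAPS_MIN.items():
    for rec_label, duration in RECORDINGS_MIN.items():
        print(f"{rec_label} under a {cap_label}: {chunks_needed(duration, cap)} file(s)")
```

A result of 1 means the recording uploads as-is; anything higher means that many separate uploads and that many transcripts to stitch back together.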
Accuracy · real-world numbers
On clean English podcast audio, every modern ASR system plateaus at ~92%. The differentiation lives outside English.
Spanish, Portuguese, French, German, Mandarin, Japanese, Russian, Italian podcast / interview audio at 128 kbps+ lands in the same ~92% range we hit on English. Otter supports 6 languages total — and recurring user complaints flag noticeably weaker performance on Mandarin and Portuguese.
Diarization is the hard part of meetings — and for stereo recordings with one speaker per channel (which we ingest directly), the math is exact: left becomes speaker_0, right becomes speaker_1. Otter doesn't expose channel-aware ingest.
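As a rough illustration of why channel-based speaker labelling involves no guessing, the sketch below deinterleaves a stereo PCM WAV file with Python's standard `wave` module. It is not our ingest pipeline: the function name `split_stereo` is hypothetical, and the sketch assumes an uncompressed two-channel WAV as input.

```python
import wave

def split_stereo(path: str) -> dict[str, bytes]:
    """Split a 2-channel PCM WAV into per-speaker mono sample data.

    Hypothetical helper for illustration: left channel -> speaker_0,
    right channel -> speaker_1.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2:
            raise ValueError("expected a 2-channel (stereo) WAV file")
        width = wav.getsampwidth()            # bytes per sample, per channel
        frames = wav.readframes(wav.getnframes())

    frame_size = 2 * width                    # one left sample + one right sample
    left, right = bytearray(), bytearray()
    for i in range(0, len(frames), frame_size):
        left += frames[i:i + width]               # first sample in the frame is left
        right += frames[i + width:i + frame_size]  # second sample is right
    return {"speaker_0": bytes(left), "speaker_1": bytes(right)}
```

Because WAV stores stereo samples interleaved left-then-right, the mapping is purely positional. No voice model has to decide who is speaking, which is why this case is exact while single-channel diarization is not.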
Every cloud ASR loses accuracy on telephony audio: the high-frequency content that distinguishes f / s / th / sh falls outside the narrow telephone bandwidth. This is an industry-wide ceiling. We're not magic on phone audio either, but recording at 16 kHz instead of 8 kHz when possible recovers 6–8 accuracy points.
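The sample-rate point follows from the Nyquist limit: a recording can only represent frequencies up to half its sample rate. The Python sketch below works through that arithmetic for 8 kHz and 16 kHz. The 4 to 8 kHz band is an illustrative assumption, used as a rough stand-in for where much sibilant energy sits. It is not a measured figure from our data, and the helper names are hypothetical.

```python
def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2

def band_coverage(sample_rate_hz: int, band_hz: tuple[float, float]) -> float:
    """Fraction of a frequency band that lies below the Nyquist limit."""
    low, high = band_hz
    limit = nyquist_hz(sample_rate_hz)
    kept = max(0.0, min(high, limit) - low)
    return kept / (high - low)

# Illustrative assumption only: a rough band for sibilant energy.
SIBILANT_BAND_HZ = (4000.0, 8000.0)

for rate in (8000, 16000):
    share = band_coverage(rate, SIBILANT_BAND_HZ)
    print(f"{rate} Hz: Nyquist {nyquist_hz(rate):.0f} Hz, band kept {share:.0%}")
```

Under that assumption, an 8 kHz recording keeps none of the band and a 16 kHz recording keeps all of it. That is the intuition behind recording at the higher rate whenever the source allows it.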
Common questions
30 minutes a month, no card. Drop a Portuguese interview, a 90-minute podcast, a YouTube link, a stereo recording — exactly the shapes where Otter's caps kick in.
Start free