Transcribe long audio files. Up to 10 hours. No timeout.

Drop a long audio file — up to 10 hours, 5 GB on Business. We chunk in parallel, keep speaker IDs consistent end-to-end, and hand back one transcript instead of a numbered folder.

Drop a file, or pick one

MP3 · WAV · M4A · MP4 · MOV · MKV · OGG · OPUS · FLAC · WEBM — up to 100 MB anonymously

Paste a link, we’ll fetch the audio

YouTube · TikTok · Vimeo · Twitter · SoundCloud · Spotify · 50+ more

Record straight from your browser

Sign-up takes 30 seconds. Recording opens right after, in the dashboard.

No card required · ~90s per 60-min file · SRT, VTT, DOCX, TXT · Files auto-deleted in 24h

↓ A 5-hour file, mid-transcript

Hours in. One clean file out.

Most tools time out around the 90-minute mark or split your long recording into numbered partials you have to stitch. We chunk in 12-minute overlapping windows, process them in parallel, and reassemble with a global speaker pass.

Board strategy session · REC · 3 speakers · 5:14:22 · 3.1 GB
auto-detected en-GB · 44.1 kHz stereo · 192 kbps
~90s
Transcript · single file · 92% accuracy · t=3:14:08
S1

We're three hours in — let's circle back to the supply chain point from the morning session.

S2

Right, the Vietnam manufacturing pivot. I think we glossed over the lead-time risk.

S1

Lead times went from 14 to 31 days after the tariff change.

S3

And that's before we factor in port congestion at Long Beach.

92% across full 5h file · DOCX · SRT · TXT · JSON

↓ This is the dashboard

This is what loads when the job finishes.

Same layout as the real dashboard — Summary, full Transcript, Speakers tab, Exports. Key points and action items extracted automatically. Auto-tags on every job.

Try it on your own file — it's free

Three real options · honest comparison

Otter Pro. DIY Whisper chunking. Or us.

Consumer tools cap file length and silently truncate. Whisper API has a 25 MB per-request ceiling, so you build the chunker yourself. We accept the whole 10-hour file and return one transcript.

Option 01

Otter Pro

Caps long files at 4 hours per recording. Speaker labels drift past the 2-hour mark.

Max file length: 4 hours (Pro tier)
Max file size: ~1.5 GB upload
Speaker IDs end-to-end: Drifts past 2 hours
Long-file output: Single doc, truncated at cap
Cost: $16.99/user/mo
Resumable upload: No
Best for: Short meetings under 2 hours. Falls over on day-long recordings.
Option 02

Transcription.Solutions

10 hours per file. Parallel chunking, global speaker pass, one DOCX out.

Max file length: 10 hours (Pro & Business)
Max file size: 2 GB Pro · 5 GB Business
Speaker IDs end-to-end: Global embedding pass
Long-file output: Single file · DOCX/SRT/TXT
Cost · per min: $0.03 flat regardless of length
Resumable upload: Multipart, survives drops
Best for: Day-long workshops, depositions, board meetings, oral histories, and anything else past the 90-minute wall.
Option 03

Whisper API + DIY chunking

Cheapest per minute. You build the chunker, the speaker stitch, and the retry logic.

Max file length: 25 MB per request (~25 min)
Max file size: 25 MB hard cap
Speaker IDs end-to-end: None (no diarization)
Long-file output: Numbered partials, you stitch
Cost · per min: $0.006 (OpenAI Whisper)
Engineering time: Hours to days per pipeline
Best for: Engineers who want raw text per chunk and don't need speakers, summaries, or a single output.

Pricing and limits accurate as of May 2026. Otter Pro length cap last verified on their public pricing page.
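
To make the DIY row concrete, here is a back-of-envelope sketch of what a 25 MB per-request ceiling means for a 10-hour file. It assumes constant-bitrate audio and decimal megabytes, and the function names are illustrative only. The per-minute rates are the ones quoted in the comparison above.

```python
import math

# 25 MB per request, as quoted above. Decimal megabytes are an assumption.
WHISPER_MAX_BYTES = 25 * 1000 * 1000


def minutes_per_request(bitrate_kbps: float) -> float:
    """Minutes of constant-bitrate audio that fit in one 25 MB request."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return WHISPER_MAX_BYTES / bytes_per_second / 60


def requests_needed(duration_min: float, bitrate_kbps: float) -> int:
    """Lower bound on requests for a file, ignoring any overlap between chunks."""
    return math.ceil(duration_min / minutes_per_request(bitrate_kbps))


def cost_usd(duration_min: float, rate_per_min: float) -> float:
    """Flat per-minute pricing, rounded to cents."""
    return round(duration_min * rate_per_min, 2)


if __name__ == "__main__":
    ten_hours = 10 * 60  # minutes
    print(f"{minutes_per_request(128):.1f} min per request at 128 kbps")
    print(requests_needed(ten_hours, 128), "requests for a 10-hour file")
    print("Whisper API:", cost_usd(ten_hours, 0.006), "USD")
    print("Flat $0.03/min:", cost_usd(ten_hours, 0.03), "USD")
```

At 128 kbps one request holds about 26 minutes, so a 10-hour file needs at least 24 requests before any overlap is added. At the listed rates that file costs $3.60 through the Whisper API and $18 at $0.03 per minute.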

Specific to long files

Three ways generic tools die past the 90-minute mark.

Most pipelines were built for one-hour meetings. Long audio breaks them in predictable ways — here's what we do differently.

What goes wrong

  1. Silent timeout at 90 minutes. The job spins for an hour, then dies without a useful error. You're left with nothing to retry.
  2. Speaker IDs drift between chunks. Speaker 1 at hour 1 becomes Speaker 4 at hour 3 because each chunk gets diarized in isolation.
  3. Output is a numbered folder. `transcript_part_01.txt` through `transcript_part_24.txt` with timestamp resets at every chunk boundary. You stitch it yourself.

What we do differently

  1. Resumable multipart upload. Connection drops at hour 2 of upload? Resumes from the last completed part. No re-upload of 4 GB.
  2. Global speaker embedding pass. After per-chunk diarization, we cluster voices across the entire file so Speaker 3 is the same person at minute 12 and minute 487.
  3. Single DOCX with hour markers. One file, continuous timestamps, optional chapter break every 60 minutes. No stitching.
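
The global speaker pass can be pictured with a toy example. The sketch below is not the production algorithm: it assumes each chunk has already been diarized and that every chunk-local speaker comes with a voice embedding (a plain list of floats here), then greedily matches embeddings to running global clusters by cosine similarity. The function names and the 0.8 threshold are made up for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_global_ids(
    chunk_speakers: list[dict[str, list[float]]],
    threshold: float = 0.8,  # hypothetical similarity cutoff
) -> list[dict[str, str]]:
    """Map each chunk-local speaker label to a file-wide speaker label.

    chunk_speakers[i] maps a local label in chunk i to its embedding.
    Toy simplification: two local speakers in one chunk may land in the
    same global cluster; a real pipeline would forbid that.
    """
    cluster_sums: list[list[float]] = []  # running sum of embeddings per cluster
    result: list[dict[str, str]] = []
    for speakers in chunk_speakers:
        mapping: dict[str, str] = {}
        for local_label, emb in speakers.items():
            best_id, best_sim = None, threshold
            for gid, vec_sum in enumerate(cluster_sums):
                # Cosine is scale-invariant, so the sum stands in for the mean.
                sim = cosine(emb, vec_sum)
                if sim >= best_sim:
                    best_id, best_sim = gid, sim
            if best_id is None:
                cluster_sums.append(list(emb))  # new voice, new cluster
                best_id = len(cluster_sums) - 1
            else:
                cluster_sums[best_id] = [
                    s + e for s, e in zip(cluster_sums[best_id], emb)
                ]
            mapping[local_label] = f"Speaker {best_id + 1}"
        result.append(mapping)
    return result


if __name__ == "__main__":
    chunks = [
        {"A": [1.0, 0.0], "B": [0.0, 1.0]},
        {"A": [0.1, 0.9], "B": [0.95, 0.05]},  # local labels swapped in chunk 2
    ]
    print(assign_global_ids(chunks))
```

In the example, the second chunk's local labels are swapped relative to the first, and the pass still assigns each voice the same file-wide label.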

Recommended job settings for long files

Drop anything over 90 minutes and these flip on automatically. Override per-job from the form.

Chunk strategy: 12 min windows · 10s overlap
Diarization: Global pass across all chunks
Speaker model: Long-form · 2-20 speakers
Upload: Resumable multipart
Queue: Priority (Business plan)
Export: Single DOCX · hour markers on
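
For a sense of what the chunk strategy above implies, here is a small sketch that plans 12-minute windows with a 10-second overlap and maps a chunk-local time back onto one continuous timeline. Only the window and overlap values come from the settings. The `Window` type, the function names, and the choice to start each window 710 seconds after the previous one are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    index: int
    start_s: float
    end_s: float


def plan_windows(
    duration_s: float, window_s: float = 720.0, overlap_s: float = 10.0
) -> list[Window]:
    """Cover [0, duration_s] with fixed windows that overlap their neighbours."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if not 0 <= overlap_s < window_s:
        raise ValueError("overlap_s must be in [0, window_s)")
    stride = window_s - overlap_s
    windows: list[Window] = []
    start = 0.0
    while True:
        end = min(start + window_s, duration_s)  # last window is truncated
        windows.append(Window(len(windows), start, end))
        if end >= duration_s:
            break
        start += stride
    return windows


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS on the continuous, file-wide timeline."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def global_timestamp(window: Window, local_s: float) -> str:
    """Convert a time measured from the start of a window to file time."""
    return format_timestamp(window.start_s + local_s)


if __name__ == "__main__":
    demo = plan_windows(5 * 3600 + 14 * 60 + 22)  # the 5:14:22 demo file
    print(len(demo), "windows")
    print(global_timestamp(demo[16], 288))
```

For the 5:14:22 demo recording this yields 27 windows, and 288 seconds into window 16 maps to 03:14:08 on the continuous timeline.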

Accuracy · real-world numbers

92% holds across a 5-hour file. Quality stays flat hour-to-hour.

The hard part with long audio isn't the model — it's keeping accuracy flat from minute 1 to minute 600. Speaker drift and chunk-boundary errors are what kill most pipelines. Numbers below are measured across full-length customer files, not the first 10 minutes.

95%
Studio long-form, single speaker

Audiobook narration, solo podcast, dictated manuscript. 6-10 hours of clean voice with no room noise. No diarization needed.

92%
Boardroom, 2-6 speakers

Conference table, decent mic, 3-5 hours. Global speaker pass keeps IDs stable across the whole file.

88%
All-day workshop, lapel mics

7-9 hour training day with mic handoffs and audience Q&A. Names need a 5-minute pass on the speaker chips.

82%
Field roundtable, 8+ speakers

Long oral history, focus group, or panel with overlapping voices and ambient noise. Usable, but expect cleanup.

Common questions

8 things people ask about long audio transcription.

01. What's the actual file length and size limit?
10 hours per file on both Pro and Business. Pro caps file size at 2 GB, Business at 5 GB. If you have something longer than 10 hours, split it once at a natural break — we'll keep speaker IDs consistent if you upload them back-to-back on the same project.
02. Do I get one transcript or a folder of numbered partials?
One file. Always. DOCX, SRT, TXT, or JSON — your choice. Timestamps run continuously from 00:00:00 to the end of the recording, not reset at every chunk boundary.
03. How long does a 6-hour file take to come back?
Roughly 18-25 minutes on the Pro queue, 8-12 on Business priority. We process the 12-minute chunks in parallel, so wall-clock time scales sub-linearly with file length, not minute-for-minute.
04. Do speaker IDs stay consistent end-to-end?
Yes. After per-chunk diarization, a global embedding pass clusters voices across the whole file. Speaker 3 at minute 12 is the same Speaker 3 at minute 487. This is the main thing DIY Whisper pipelines get wrong.
05. What happens if my upload drops at hour 3 of a 4 GB file?
Resumable multipart upload picks back up from the last completed part. You don't re-upload the first 3 GB. Works on flaky hotel Wi-Fi and cellular tethering — we tested both.
06. Why does the Whisper API choke on long files?
OpenAI's Whisper endpoint has a 25 MB per-request hard cap — roughly 25 minutes of compressed audio. Anything longer needs you to chunk, transcribe in parallel, then stitch transcripts and align speakers yourself. We do all of that server-side.
07. Is the per-minute price the same on a 10-hour file as a 10-minute file?
Yes. $0.03 per minute flat, regardless of length, so a 10-hour file (600 minutes) costs $18. For comparison, human transcription at Rev's $1.50/min comes to $900 for the same 10 hours.
08. Can I get chapter markers or timestamps every hour?
Toggle 'Hour markers' on the job form and the DOCX exports with a heading break every 60 minutes. SRT keeps continuous timecode. JSON has both — chapter array plus word-level timestamps.
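
The resume behaviour from question 05 comes down to tracking which parts the server has acknowledged. A minimal sketch, assuming fixed-size parts numbered from 1 and a 64 MiB part size (both are assumptions, as is the function name; the actual part size is not stated here):

```python
PART_SIZE = 64 * 1024 * 1024  # hypothetical part size, 64 MiB


def remaining_parts(
    file_size: int, completed: set[int], part_size: int = PART_SIZE
) -> list[tuple[int, int, int]]:
    """Return (part_number, byte_offset, length) for every part still to send."""
    if file_size <= 0 or part_size <= 0:
        raise ValueError("file_size and part_size must be positive")
    total_parts = (file_size + part_size - 1) // part_size  # integer ceiling
    todo = []
    for number in range(1, total_parts + 1):
        if number in completed:
            continue  # already acknowledged, never re-sent
        offset = (number - 1) * part_size
        length = min(part_size, file_size - offset)  # final part may be short
        todo.append((number, offset, length))
    return todo


if __name__ == "__main__":
    four_gb = 4_000_000_000
    done = set(range(1, 46))  # first 45 parts acknowledged before the drop
    todo = remaining_parts(four_gb, done)
    print(len(todo), "parts left,", sum(p[2] for p in todo), "bytes")
```

With a 4 GB file and the first 45 parts (about 3 GB) already acknowledged, only the last 15 parts, roughly 0.98 GB, remain to be sent after a drop.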

Drop your long file. Get one transcript back.

30 free minutes every month. No card. Files up to 10 hours, speaker labels that stay consistent, single-file export.

Start free