Transcribe WAV files with speaker labels. Lossless quality.

Drop a WAV recording straight from your field rig, DAW bounce, or interview kit. We keep the 24-bit headroom intact, run diarization on the raw PCM, and return a timestamped transcript with SRT in minutes.

Drop a file, or pick one

MP3 · WAV · M4A · MP4 · MOV · MKV · OGG · OPUS · FLAC · WEBM — up to 100 MB anonymously

Paste a link, we’ll fetch the audio

YouTube · TikTok · Vimeo · Twitter · SoundCloud · Spotify · 50+ more

Record straight from your browser

Sign-up takes 30 seconds; recording opens right after, in the dashboard.

No card required
~90s per 60-min file
SRT · VTT · DOCX · TXT
Files auto-deleted in 24h

↓ Watch what comes out

Raw PCM in. Clean transcript out.

Lossless WAV means every sibilant, plosive, and quiet word survives intact — no MP3 smear on consonants. If the file is multi-track (one speaker per channel), we skip acoustic diarization entirely and split on the channel layout.

WAV · 48 kHz / 24-bit · REC 2 tracks · 1h 12m · 1.24 GB
auto-detected en-GB · stereo PCM · uncompressed
~90s
Transcript · streaming · 97% accuracy
S1

Take me back to that morning in seventy-eight — what time did the call come in?

S2

Quarter to five, give or take. Kettle was on, I remember that much.

S1

And from there you drove straight down to the harbour?

S2

Straight to the boatyard. Lights were still on when I pulled in.

97% on per-track WAV · SRT · DOCX · TXT · JSON
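To make the SRT export concrete, here is a minimal sketch of how speaker-labelled segments like the ones above could be rendered as SRT cues. The `Segment` class, its field names, and the timings in the demo are assumptions made for illustration; this is not the service's actual export code.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized utterance; the field names are illustrative."""
    speaker: str
    start: float  # seconds from the start of the recording
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form that SRT uses."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues with a speaker prefix."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


# Timings below are invented for the demo.
demo = [
    Segment("S1", 0.0, 4.2, "Take me back to that morning in seventy-eight."),
    Segment("S2", 4.6, 8.9, "Quarter to five, give or take."),
]
print(to_srt(demo))
```

Rounding to whole milliseconds before splitting into hours, minutes, and seconds avoids cues like `00:00:59,1000` when a float lands just under a second boundary.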

↓ This is the dashboard

This is what loads when the job finishes.

Same layout as the real dashboard — Summary, full Transcript, Speakers tab, Exports. Key points and action items extracted automatically. Auto-tags on every job.

Try it on your own file — it's free

Three real options · honest comparison

Adobe Audition. Descript. Or us.

Audition's Speech to Text is bundled with Creative Cloud and stays inside the timeline. Descript imports the WAV into its own editor. We take the file as-is, return standard exports, and don't ask you to move your project anywhere.

Option 01

Adobe Audition / Premiere

Transcript panel inside the Adobe timeline. Tied to Creative Cloud and the project file.

Requires: Creative Cloud subscription
Speaker diarization: Yes, mixed-down only
Multi-track WAV: Flattened before STT
Export: SRT · CSV · XML
Languages: 18, manual select
Cost: ~$23/mo (single app)
Best for: Editors already cutting in Premiere or Audition who want captions stitched to the timeline.

Option 02

Transcription.Solutions

Drop the WAV. Per-channel diarization if it's multi-track. Source deleted in 24h.

Requires: Nothing, just the file
Speaker diarization: Per-track or acoustic
Multi-track WAV: Up to 16 channels
Export: SRT · VTT · DOCX · TXT · JSON
Languages: 99, auto-detected
Cost per min: $0.03
Best for: Anyone holding a raw WAV, such as field recordists, podcasters bouncing from a DAW, oral history archivists, and researchers.

Option 03

Descript

Imports your WAV into Descript's editor. Powerful, but you have to work inside it.

Requires: Descript account + import
Speaker diarization: Acoustic, EN-tuned
Multi-track WAV: Import as separate clips
Export: TXT · SRT · DOCX
Languages: 23, accuracy varies
Cost: $16–24/user/mo
Best for: Podcast editors who want to edit the audio by editing the transcript, which is Descript's actual superpower.

Pricing accurate as of 2026. Adobe and Descript feature flags change frequently; check current docs before committing.

Specific to WAV

Three things that bite people on generic transcription tools.

Most uploaders silently downsample your WAV before sending it to a recognizer. We don't.

What goes wrong

  1. Multi-track WAV gets flattened. A 4-channel field recording from a Sound Devices MixPre gets mixed to mono before STT. The per-mic separation you paid for is thrown away.
  2. 32-bit float WAVs from Zoom F-series or MixPre get rejected outright, or are converted to 16-bit, which clips anything above 0 dBFS and throws away the headroom you would have used to recover it.
  3. 96 kHz / 24-bit interviews take forever to upload because the tool re-encodes to MP3 in the browser before sending.

What to flip here

  1. Upload the multi-track WAV as-is (up to 16 channels). We read the channel layout from the WAV header and assign one speaker per track, with no acoustic guessing.
  2. 32-bit float is accepted natively. We preserve the float headroom when normalising for the recognizer, so peaks above 0 dBFS don't clip.
  3. Direct binary upload, no transcode in the browser. A 2 GB WAV moves at your full bandwidth and starts processing the moment the last byte lands.
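The channel count and sample format mentioned above live in the `fmt ` chunk of the WAV header, so they can be read without touching the audio itself. The sketch below is a minimal reader under stated assumptions: it handles plain RIFF/WAVE only (not RF64), and the `speakers_for` routing rule is a hypothetical stand-in for whatever the service does internally.

```python
import io
import struct

# Format tags defined by the WAVE format.
FORMAT_NAMES = {1: "PCM integer", 3: "IEEE float"}
WAVE_FORMAT_EXTENSIBLE = 0xFFFE


def read_wav_format(stream):
    """Return basic format facts from the fmt chunk of a RIFF/WAVE stream."""
    riff, _size, wave_id = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file (RF64 is not handled here)")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            raise ValueError("no fmt chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id != b"fmt ":
            # Skip other chunks; chunk bodies are padded to an even length.
            stream.seek(chunk_size + (chunk_size & 1), io.SEEK_CUR)
            continue
        body = stream.read(chunk_size)
        if len(body) < 16:
            raise ValueError("truncated fmt chunk")
        tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
            "<HHIIHH", body[:16]
        )
        if tag == WAVE_FORMAT_EXTENSIBLE and len(body) >= 26:
            # The real format tag is the first two bytes of the SubFormat GUID.
            (tag,) = struct.unpack("<H", body[24:26])
        return {
            "channels": channels,
            "sample_rate": rate,
            "bits_per_sample": bits,
            "format": FORMAT_NAMES.get(tag, f"unknown (0x{tag:04X})"),
        }


def speakers_for(info):
    """Hypothetical routing rule: one speaker label per channel."""
    return [f"S{n}" for n in range(1, info["channels"] + 1)]


# Build a tiny in-memory header: 4 channels, 48 kHz, 32-bit float, no audio.
fmt = struct.pack("<HHIIHH", 3, 4, 48_000, 48_000 * 4 * 4, 4 * 4, 32)
wav = (
    b"RIFF" + struct.pack("<I", 4 + 8 + len(fmt) + 8) + b"WAVE"
    + b"fmt " + struct.pack("<I", len(fmt)) + fmt
    + b"data" + struct.pack("<I", 0)
)
info = read_wav_format(io.BytesIO(wav))
print(info, speakers_for(info))
```

Running it on a real recording only needs `read_wav_format(open(path, "rb"))`. A format of "IEEE float" with 32 bits per sample is what the 32-bit float recorders mentioned above write.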

Recommended job settings for WAV

Drop a WAV and these flip on by default. Override per-job from the form.

Sample rate: Native (no downsample)
Bit depth: 24-bit / 32-float preserved
Diarization: Per-channel if multi-track
Speaker model: Interview · 2-8 speakers
Filler words: Kept (toggle off if needed)
Export: DOCX · SRT · timestamped TXT

Accuracy · real-world numbers

97%+ on per-track WAV. WAV gives the recognizer the cleanest possible signal.

Because WAV stores raw PCM with no perceptual compression, consonants and sibilants aren't smeared the way MP3 smears them. The recognizer hears what the microphone heard. Numbers below come from real customer WAV jobs in production.

98%
Studio WAV · single speaker

48 kHz / 24-bit, large-diaphragm condenser, treated room. Narration, audiobook, voice-over bookings land here.

96%
Multi-track interview WAV

One channel per speaker (lavs or boundary mics). Diarization is just channel routing, so the remaining error is in the words, not in who said them.

92%
Handheld field recorder

Zoom H5, Tascam DR-40, similar. Stereo XY pickup, 2-3 speakers, some room reflection. Most podcast WAVs land here.

85%
Noisy environment field WAV

Outdoor, café, vehicle. Lossless capture helps — the noise is real, not codec artefact — but accuracy still drops on overlapping speech.
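The page does not state how these percentages are computed. A common convention is to report accuracy as 1 minus the word error rate (WER), where WER counts word substitutions, deletions, and insertions against a human reference transcript. The sketch below assumes that convention and uses naive whitespace tokenisation; real scoring would normalise punctuation and numerals first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # previous[j] is the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "straight to the boatyard"
hypothesis = "straight to the boat yard"
wer = word_error_rate(reference, hypothesis)
print(f"WER {wer:.2f}, accuracy {1 - wer:.2f}")
```

One split word costs two edits here (a substitution plus an insertion), which is why short test clips can swing the score far more than hour-long recordings do.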

Common questions

8 things people ask about WAV transcription.

01. What's the maximum WAV file size?
5 GB per file on the standard plan, which is roughly 4.8 hours of stereo 48 kHz / 24-bit, or 2.4 hours of stereo 96 kHz / 24-bit. Larger files are fine on the team plan; just contact us before the upload.
02. Do you support 32-bit float WAV from Zoom F-series or MixPre?
Yes, natively. We read the float samples without clipping at 0 dBFS, so loud transients you'd planned to pull down in post still get transcribed cleanly. Most generic uploaders silently down-cast to 16-bit first.
03. I have a 4-channel WAV from a field recorder, one mic per person. Will diarization use that?
It will. Upload the polyphonic WAV directly (don't bounce to stereo first). We parse the channel layout from the WAV header and assign one speaker per track — much more reliable than acoustic diarization on similar voices.
04. Will you downsample my 96 kHz WAV?
The file you upload is never downsampled or altered. The recognizer itself works on a 16 kHz copy internally, which covers frequencies up to 8 kHz, the band that carries most of the information needed to understand speech. We keep your original file untouched and use it for any post-processing like noise gating. Your exports reference the original timeline.
05. Is WAV actually more accurate than MP3 for transcription?
Marginally, yes — usually 1-2 points of WER on clean speech. The bigger gap shows up on sibilants and quiet passages, where MP3's psychoacoustic compression discards information the recognizer would have used. For archival or forensic work, WAV is the right call.
06. Are BWF metadata and timecode preserved?
We read BWF chunks (bext, iXML) and use the start timecode to align the transcript to your session timeline. The original WAV is never modified — we work on a copy that's deleted within 24h.
07. Can I drop a folder of WAV files from a DAW session export?
Yes. Batch upload accepts up to 50 files at once. Each WAV gets its own job and transcript. If they're stems from one session, you can also merge them into a single multi-track WAV before upload and we'll diarize per channel.
08. How long does a 1-hour stereo WAV actually take?
Upload time depends on your uplink: a 1-hour 48 kHz / 24-bit stereo WAV is about 1 GB, which takes roughly 3-5 minutes at 30-50 Mbit/s upstream. Once uploaded, transcription itself runs in roughly 4-6 minutes on the standard queue.
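File sizes and upload times for uncompressed WAV follow from simple arithmetic: sample rate × bytes per sample × channels × duration. The sketch below works through it; the function names are illustrative, sizes use decimal units (1 GB = 10^9 bytes), the small header overhead is ignored, and the 50 Mbit/s uplink is an assumed example figure.

```python
def wav_bytes_per_second(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Data rate of uncompressed PCM audio."""
    return sample_rate_hz * (bit_depth // 8) * channels


def wav_size_bytes(duration_s: float, sample_rate_hz: int, bit_depth: int,
                   channels: int) -> float:
    """Approximate file size, ignoring the header and metadata chunks."""
    return duration_s * wav_bytes_per_second(sample_rate_hz, bit_depth, channels)


def hours_that_fit(limit_bytes: float, sample_rate_hz: int, bit_depth: int,
                   channels: int) -> float:
    """How many hours of audio fit under a given size limit."""
    return limit_bytes / wav_bytes_per_second(sample_rate_hz, bit_depth, channels) / 3600


def upload_seconds(size_bytes: float, uplink_mbit_s: float) -> float:
    """Ideal transfer time at a sustained uplink speed, no protocol overhead."""
    return size_bytes * 8 / (uplink_mbit_s * 1_000_000)


one_hour = wav_size_bytes(3600, 48_000, 24, 2)
print(f"1 h stereo 48 kHz / 24-bit: {one_hour / 1e9:.2f} GB")
print(f"5 GB holds {hours_that_fit(5e9, 48_000, 24, 2):.1f} h at 48 kHz stereo")
print(f"5 GB holds {hours_that_fit(5e9, 96_000, 24, 2):.1f} h at 96 kHz stereo")
print(f"Upload at 50 Mbit/s: {upload_seconds(one_hour, 50) / 60:.1f} min")
```

Doubling the sample rate or the channel count doubles the size, which is why a 4-channel 96 kHz session fills a size limit four times faster than a stereo 48 kHz one.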

Drop your WAV. Keep the lossless quality. See what comes out.

30 free minutes every month. No card. Per-track diarization, 32-bit float supported, source audio deleted in 24h.

Start free