Voice memo transcription: iPhone .m4a to text in 90 seconds

How to transcribe iPhone Voice Memos accurately. The default 64 kbps AAC format, the Share Sheet workflow, and what dictation apps can't do.

Voice memo transcription on iPhone takes about 90 seconds end-to-end: tap Share in the Voice Memos app, pick the upload destination, and the .m4a file is transcribed at roughly 6× realtime, so a 10-minute memo completes in under 2 minutes. Accuracy lands around 92% on clear single-speaker dictation recorded in a quiet room, which is what iPhone Voice Memos is built for. Noisy rooms and the default 64 kbps AAC compression are the two things that drag accuracy down.
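As a back-of-envelope check on those numbers, here is a minimal sketch that estimates processing time from memo length. The 6× factor is the approximate figure quoted above, the function name is made up for illustration, and real throughput will vary.

```python
def estimated_processing_seconds(audio_seconds: float,
                                 realtime_factor: float = 6.0) -> float:
    """Estimate how long a recording takes to transcribe.

    realtime_factor=6.0 is the approximate speed quoted in this post
    (an assumption, not a guarantee): 6 seconds of audio are processed
    per second of wall-clock time.
    """
    if audio_seconds < 0 or realtime_factor <= 0:
        raise ValueError("duration must be >= 0 and factor must be > 0")
    return audio_seconds / realtime_factor


ten_minutes = 10 * 60
print(estimated_processing_seconds(ten_minutes))  # 100.0 seconds, under 2 minutes
```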

What format does iPhone Voice Memos record?

Apple's Voice Memos app records to .m4a — an MP4 container holding AAC audio at 64 kbps mono, 44.1 kHz. That's the default. If you toggle Lossless in Settings → Voice Memos → Audio Quality, it switches to ALAC (Apple Lossless) at the same sample rate, producing files roughly 10× larger.
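To get a feel for what 64 kbps means in file size, the sketch below converts bitrate and duration into megabytes. It ignores the small MP4 container overhead and assumes a constant bitrate, so treat the output as an estimate rather than what Voice Memos will report exactly.

```python
def estimated_size_mb(duration_seconds: float, bitrate_kbps: float = 64.0) -> float:
    """Approximate audio size in megabytes (1 MB = 1,000,000 bytes).

    Assumes constant bitrate and ignores container overhead.
    """
    bits = bitrate_kbps * 1000 * duration_seconds
    return bits / 8 / 1_000_000


print(round(estimated_size_mb(10 * 60), 1))  # 4.8  (10-minute memo)
print(round(estimated_size_mb(60 * 60), 1))  # 28.8 (1-hour memo)
```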

For transcription purposes, the default 64 kbps AAC is borderline. On clear dictation — phone held 15-30 cm from your mouth, quiet room, one speaker — it transcribes at ~92% accuracy, the same plateau as a 128 kbps podcast MP3. The format isn't the bottleneck there; the room is.

Where 64 kbps starts to show its seams: meetings recorded across a table, voice memos captured in a car or café, anything with two people talking. AAC compression smooths out low-level detail, which is exactly the detail a transcription model needs to separate speech from background noise.

The 90-second workflow

The fastest path from recording to searchable text, on iPhone alone:

  1. Open Voice Memos. Long-press the recording you want to transcribe.
  2. Tap Share. The iOS Share Sheet opens.
  3. Pick your browser, open the Transcription.Solutions upload page, and drop the file in. Or use any "Open in..." app that accepts .m4a.
  4. Wait 1-2 minutes for a 10-minute memo. The transcript appears with timestamps, plus a searchable text export in DOCX, SRT, VTT, TXT, or JSON.
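For a sense of what a timestamped export involves, here is a minimal sketch that turns transcript segments into SRT text. The segment shape (`start`, `end`, `text`, with times in seconds) is an assumption made for illustration, not the service's actual JSON schema; the SRT layout itself (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is the standard one.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments as SRT cues.

    Each segment is assumed (hypothetically) to be a dict with
    'start' and 'end' in seconds and a 'text' string.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


example = [
    {"start": 0.0, "end": 2.5, "text": "Note to self."},
    {"start": 2.5, "end": 5.0, "text": "Call the printer on Monday."},
]
print(segments_to_srt(example))
```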

If you'd rather work on a laptop — which most people editing long-form audio prefer — AirDrop the .m4a from your iPhone to your Mac. The file lands in Downloads. Drag it into the browser dropzone. Same result, slightly more screen real estate for cleaning up the transcript.

A third option: enable iCloud sync in Voice Memos and the recording shows up on your Mac inside the desktop Voice Memos app within a minute. Right-click → Show in Finder → drag to browser.

How accurate is voice memo transcription?

For a single speaker dictating in a quiet room at 64 kbps AAC, accuracy lands around 92%, so roughly 1 word in 12 will need a fix. That's good enough for note-taking, draft blog posts, journaling, and other thought-to-text workflows where you read the transcript anyway.
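To translate an accuracy figure into editing effort, the sketch below treats accuracy as word-level accuracy (one minus the word error rate). That reading is an assumption; the percentages in this post are approximate, so the results are ballpark numbers only.

```python
def words_per_error(accuracy: float) -> float:
    """Average number of words between errors, e.g. 0.92 -> about 12.5."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return 1.0 / (1.0 - accuracy)


def expected_corrections(word_count: int, accuracy: float) -> int:
    """Rough count of words that will need fixing in a transcript."""
    return round(word_count * (1.0 - accuracy))


print(round(words_per_error(0.92), 1))     # 12.5
print(expected_corrections(1500, 0.92))    # 120 fixes in a 1,500-word memo
```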

Here's how the same model performs across realistic voice memo scenarios:

Scenario                                   Approximate accuracy
Quiet room, phone close, single speaker    ~92%
Car interior, hands-free                   ~85-88%
Café or restaurant background              ~80-85%
Two people across a table                  ~85% (plus diarization needed)
Conference room speakerphone               ~82-88%

The format matters less than people assume. A 64 kbps AAC of a quiet dictation will transcribe better than a 320 kbps MP3 of a noisy café. Bitrate doesn't fix room acoustics — only the microphone and the room do.

When voice memos have two people

The Voice Memos app records mono. If you put the iPhone between two people on a table, both voices land in the same channel. Speaker diarization — the model that separates "who said what" — runs automatically on mono files using pyannote 3.1, and labels turns as speaker_0, speaker_1, and so on. You rename them in the dashboard by clicking the speaker chip.
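If you work with an exported transcript rather than the dashboard, the same renaming can be done in a few lines. This is a sketch under assumptions: the segment shape with a `speaker` key is hypothetical, and only the `speaker_0`, `speaker_1` label style comes from the description above.

```python
def rename_speakers(segments, names):
    """Return new segments with diarization labels replaced by real names.

    `segments` is assumed (hypothetically) to be a list of dicts with a
    'speaker' key such as 'speaker_0'. Labels missing from `names` are
    left unchanged, and the input list is not modified.
    """
    return [
        {**seg, "speaker": names.get(seg["speaker"], seg["speaker"])}
        for seg in segments
    ]


transcript = [
    {"speaker": "speaker_0", "text": "Thanks for making time."},
    {"speaker": "speaker_1", "text": "Happy to be here."},
]
named = rename_speakers(transcript, {"speaker_0": "Interviewer", "speaker_1": "Guest"})
for seg in named:
    print(f"{seg['speaker']}: {seg['text']}")
```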

Diarization on mono table recordings is the hardest case for any transcription system. Two voices similar in pitch, both in the same channel, with overlapping turns — expect occasional speaker swaps. For interviews where attribution matters, two phones recording separately and uploaded as two files gives cleaner results than one phone in the middle.

Why iPhone's built-in dictation isn't a substitute

iOS has Live Transcription in the Voice Memos app (iOS 18+) and system-wide dictation. They're convenient for short notes. They are not a substitute for a transcription tool once your recording is longer than about 5 minutes or has any of the following:

  • More than one speaker. Apple's transcription is single-voice; there's no speaker label, no way to filter to one person's turns.
  • Need for exports. Apple's transcript stays inside the Voice Memos app. There's no DOCX, no SRT, no JSON, no API. Copy-paste is the only export.
  • Edit history. If you fix a misheard word, there's no record. With a dashboard-based transcript, edits are timestamped and reversible.
  • Search across recordings. You can't search across all your voice memos at once — only inside one transcript at a time.
  • Subtitles for video. Voice memos are audio, but the same recordings sometimes get paired with video. iOS dictation doesn't produce SRT or VTT files.

Apple's dictation is built for capture. A transcription service is built for working with what you captured. Different tools.

Privacy: what happens to your voice memo after upload

Source audio is permanently deleted from our infrastructure within 24 hours of the job completing. Transcripts stay in your account until you delete them. We do not train models on your data. For sensitive recordings — interviews under embargo, legal notes, medical dictation — this matters more than the accuracy number.

FAQ

Can I transcribe an iPhone voice memo for free?

Yes. The free tier covers 30 minutes per month with no credit card. Most single voice memos fit well inside that, and a few short memos a week will likely never hit the limit. A daily standup or a weekly interview adds up faster, so keep an eye on the monthly total. Files up to 100 MB and 30 minutes long are accepted on free; longer recordings need the Pro plan.

What's the maximum length voice memo I can transcribe?

On the Pro plan ($19/month), individual files up to 10 hours and 2 GB are accepted, which covers any realistic Voice Memos recording. The Voice Memos app itself has no hard length cap, and file size is rarely the constraint: a 10-hour 64 kbps AAC file is around 290 MB, well under the 2 GB limit.
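The per-file limits mentioned in this FAQ can be checked before uploading. The sketch below encodes them as stated (free: 100 MB and 30 minutes; Pro: 2 GB and 10 hours). Treating MB and GB as decimal units is an assumption, and the monthly 30-minute free allowance is deliberately not modeled.

```python
# Per-file limits as stated in this post. Decimal units are an assumption.
PLAN_LIMITS = {
    "free": {"max_seconds": 30 * 60, "max_bytes": 100 * 1_000_000},
    "pro": {"max_seconds": 10 * 3600, "max_bytes": 2 * 1_000_000_000},
}


def smallest_plan(duration_seconds: float, size_bytes: int):
    """Return 'free', 'pro', or None if the file exceeds both per-file limits."""
    for plan in ("free", "pro"):
        limits = PLAN_LIMITS[plan]
        if (duration_seconds <= limits["max_seconds"]
                and size_bytes <= limits["max_bytes"]):
            return plan
    return None


print(smallest_plan(10 * 60, 4_800_000))        # free
print(smallest_plan(10 * 3600, 288_000_000))    # pro (10 hours at 64 kbps)
```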

Should I switch Voice Memos to Lossless before recording?

Only if you're also using the recording for audio production. For transcription, the lossless setting bumps file sizes ~10× and doesn't meaningfully improve accuracy on speech. Room acoustics and microphone position have far more impact than the compression setting. Save Lossless for when you'll be editing the audio itself — interview podcasts, music notes, ambient capture.

Can I transcribe a voice memo without sending it to a server?

Not through our service: we run cloud ASR. If on-device is a hard requirement (because of policy or because you're offline), Apple's built-in Voice Memos transcription works locally on iOS 18+. The trade-off is everything in the dictation section above: no diarization, no exports, no API, no edit history. For most use cases cloud transcription is the better fit; for some, local is the only option.

Why is the transcript wrong on a voice memo I recorded in a café?

Background noise is the main reason transcription accuracy drops on voice memos. Café audio at 64 kbps AAC typically lands around 80-85% — every clinking cup or background conversation forces the model to guess. If you record in noisy environments often, an external lavalier mic plugged into the iPhone via Lightning or USB-C makes a bigger difference than any settings change.

Does AirDrop change the audio file?

No. AirDrop transfers the .m4a bit-for-bit. The file you receive on your Mac is the same file the iPhone recorded — same bitrate, same duration, same metadata. Some sharing methods (Messages, WhatsApp) re-encode audio to lower quality; AirDrop, iCloud Drive, and email attachments preserve the original.
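If you want to confirm that two copies of a memo are identical, comparing SHA-256 checksums is a simple test. A minimal sketch follows; the file paths in the usage comment are placeholders.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so long recordings don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(path_a: str, path_b: str) -> bool:
    """True if both files are byte-for-byte identical."""
    return sha256_of(path_a) == sha256_of(path_b)


# Usage (placeholder paths):
# same_file_contents("airdrop/memo.m4a", "icloud/memo.m4a")
```

A copy that was re-encoded by a messaging app will produce a different checksum even if it sounds the same.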

Can I send voice memos to transcription automatically?

Yes — via the REST API. Build an iOS Shortcut that takes a Voice Memos file and POSTs it to our endpoint with your API key. The transcript comes back via webhook. A handful of users run this as a "long-press → Transcribe" shortcut so the workflow is one tap.
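The same upload can be scripted outside Shortcuts. The sketch below builds a multipart POST request with Python's standard library. Everything service-specific here is an assumption for illustration: the endpoint URL, the `file` field name, and Bearer-token authentication are placeholders, so check the API documentation for the real values.

```python
import urllib.request
import uuid

# Hypothetical endpoint: replace with the URL from the API documentation.
API_URL = "https://api.example.com/v1/transcriptions"


def build_upload_request(audio: bytes, filename: str, api_key: str,
                         url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for one audio file.

    The 'file' field name and the Bearer auth scheme are assumptions.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (f'Content-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\n').encode(),
        b"Content-Type: audio/mp4\r\n\r\n",
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Usage (not executed here):
# with open("memo.m4a", "rb") as fh:
#     request = build_upload_request(fh.read(), "memo.m4a", "YOUR_API_KEY")
# urllib.request.urlopen(request)
```

Since the transcript is delivered by webhook, the response to this request would only acknowledge the job; the result arrives later at whatever callback URL you have configured.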
