Twitter transcription. Spaces, videos, voice notes to text.

Drop the MP3 from a recorded Twitter Space — or a video, or a DM voice note. Get speaker labels, timestamps, and an SRT in 99 languages. No X Premium needed.

Drop a file, or pick one

MP3 · WAV · M4A · MP4 · MOV · MKV · OGG · OPUS · FLAC · WEBM — up to 100 MB anonymously

Paste a link, we’ll fetch the audio

YouTube · TikTok · Vimeo · Twitter · SoundCloud · Spotify · 50+ more

Record straight from your browser

Signing up takes 30 seconds, and recording opens right after in the dashboard.

No card required
~90s per 60-min file
SRT · VTT · DOCX · TXT
Files auto-deleted in 24h

↓ Watch what comes out

Space recording in. Labeled transcript out.

X exports a Space recording as a single mixed MP3, with every speaker on one channel. We use acoustic diarization tuned for 4-12 rotating mic holders, the usual Spaces shape.

X Space recording (MP3) · REC · 5 speakers · 1:14:22
Auto-detected en-US · 44.1 kHz mono · 96 kbps
~90s
Transcript · streaming · 92% accuracy
S1

Welcome back everyone — we've got about 600 listeners now. Jess, you wanted to jump in on the Solana point?

S2

Yeah, so the throughput numbers from last week are misleading without context on the validator set.

S3

Can I push back on that? Because the mainnet beta data tells a different story.

S1

Go ahead, Mike — keep it tight, we've got two more speakers in the queue.

92% on Spaces MP3 · SRT · DOCX · TXT · JSON

↓ This is the dashboard

This is what loads when the job finishes.

Same layout as the real dashboard: Summary, full Transcript, Speakers tab, and Exports. Key points and action items are extracted automatically, and every job is auto-tagged.

Try it on your own file — it's free

Three real options · honest comparison

X's own captions. Otter. Or us.

X added live closed captions to Spaces in 2023, but there's no transcript export. Otter requires you to mirror audio into a meeting. We take the MP3 you already downloaded from X and return a file.

Option 01

X live captions

Real-time captions inside the Spaces UI. Nothing to download, nothing to search.

Requires: Live attendance
Speaker labels: No
Languages: EN + a few others
Export: None (captions only)
Post-Space access: Lost when the Space ends
Cost: Free with an X account
Best for: Listeners who need accessibility in the moment and don't care about a transcript after.

Option 02

Transcription.Solutions

Drop the Space MP3 or paste the Space URL. Speaker labels, SRT, and summary are included on every plan.

Requires: MP3 download or Space URL
Speaker labels: Acoustic, 2-12 speakers
Languages: 99, auto-detected
Export: SRT · DOCX · TXT · JSON
AI summary: Key points + topic tags
Cost: $0.03 per minute
Best for: Hosts repurposing Spaces into blog posts, podcasts, or YouTube videos with burned-in captions.

Option 03

Otter / Fireflies

Calendar bots designed for Zoom. To capture a Space you have to route audio into a fake meeting.

Requires: Audio loopback rig
Speaker labels: Often collapses to one
Languages: EN-tuned, others degrade
Export: TXT, DOCX (paid)
AI summary: Paid tier
Cost: $17/user/mo
Best for: People already paying for Otter who want a rough live capture and don't mind setup friction.

Pricing and feature flags accurate as of May 2026. X Spaces caption rollout still varies by region and account type.

Specific to X / Twitter

Three things generic transcribers miss on Spaces.

Spaces have a shape: mono mix, rotating mic, crypto and tech jargon, lots of @handles. Tune for that.

What goes wrong

  1. Mono-only export. X doesn't give you per-speaker channels like Zoom; everyone is on one track. Tools tuned for multi-track meeting audio underperform.
  2. @handles, tickers, and slang (@balajis, $SOL, $ETH, gm, ngmi) get spelled phonetically. Generic AI doesn't recognize them and swaps in ordinary words.
  3. Host intro music and stingers trip word detection and add gibberish to the start of the transcript.

What to flip here

  1. Pick the Spaces / panel speaker model on the job form. It's tuned for 4-12 speakers on a mono mix with a rotating mic, and it handles voice merges better than a general model.
  2. Paste your guest list and ticker list into Custom vocabulary. We pass @handles, $TICKERS, and protocol names to the recognizer as hints.
  3. Turn on Skip non-speech intro. We trim leading music and start the transcript at the first detected voice, usually 20-40 seconds in.

Recommended job settings for X Spaces

Drop a Space MP3 and these flip on by default. Override per-job from the form.

Diarization
Acoustic · 4-12 speakers
Speaker model
Spaces / panel
Language
Auto-detect · multi-lingual on
Filler words
Kept (Spaces are conversational)
Summary
Key points + topic tags
Export
SRT · DOCX · timestamped TXT

Accuracy · real-world numbers

92% on clean Spaces. Lower when Bluetooth shows up.

X exports every Space as a single mixed mono MP3, so the ceiling depends on how each speaker connected. Wired mic in a quiet room is the best case. Bluetooth earbuds in a car is the worst. Numbers below come from actual Spaces files in production.

94%
2-3 speakers, studio mic

Small Space, hosts on USB or XLR mics. Diarization separates voices cleanly even in mono mix.

92%
4-8 speakers, mixed devices

Typical Space. Some on iPhone, some on laptop. Diarization holds; expect a 2-min cleanup pass on speaker chips.

87%
9-15 rotating speakers

Big Space with mic passed around. Acoustic model can merge similar voices when speakers swap quickly.

81%
Bluetooth or noisy line

AirPods in a coffee shop, AAC compression, wind. Text usable; numbers, names, and acronyms degrade first.
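The page doesn't say how these percentages are computed. A common convention in speech recognition is word accuracy = 1 - WER, where WER (word error rate) is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. Below is a minimal sketch in Python, assuming that convention; `word_error_rate` and `word_accuracy` are hypothetical helper names, not part of any product API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word dropped
                curr[j - 1] + 1,     # insertion: extra word in the output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    # Guard against an empty reference to avoid dividing by zero.
    return prev[len(hyp)] / max(len(ref), 1)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - WER, clamped at 0 because insertions can push WER above 1."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


if __name__ == "__main__":
    ref = "the mainnet beta data tells a different story"
    hyp = "the main net beta data tells a different story"
    print(word_error_rate(ref, hyp))  # 0.25
    print(word_accuracy(ref, hyp))    # 0.75
```

In the example, hearing "mainnet" as "main net" costs two edits (one substitution, one insertion) against eight reference words, so a single misheard jargon term takes that line to 75% word accuracy.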

Common questions

8 things people ask about Twitter transcription.

01 Can you transcribe a Space that's still live?
Not in real time; we work from the recording. Wait for the Space to end, download the MP3 from your X dashboard (Spaces → Recorded → Download audio), then drop the file. Most Spaces stay available for 30 days after they end.
02 What about a Space that wasn't recorded?
If the host didn't toggle recording on, X has no file and neither do we. Some third-party tools capture Spaces externally — if you have that MP3 or MP4, we'll take it.
03 Can you pull from a Space URL directly?
Yes, if the Space is still public on X and recording was enabled. Paste the URL on the job form. If X has expired or unlisted it, you'll need the downloaded MP3 instead.
04 Do you handle X video posts and Vine-style clips too?
Yes. Drop the MP4 or paste the post URL. Clips shorter than a minute are billed at our 1-minute minimum; longer videos transcribe at the standard $0.03/min.
05 What about voice DMs?
Voice notes from X DMs work: export the audio file from the conversation and drop it. They're usually 30-60 seconds with one speaker, so accuracy is high (94%+) and the cost is the 1-minute minimum ($0.03).
06 How do speaker labels work when 10 people are on mic?
We assign generic labels (Speaker 1, Speaker 2…) acoustically. After the transcript loads, you rename them once — usually a 2-3 minute pass against the Space's guest list. Renames apply throughout the file.
07 Does the AI summary catch crypto / Web3 terminology?
Mostly yes — protocol names, L1/L2, common tickers ($BTC, $ETH, $SOL) and slang (gm, wagmi) are in our vocabulary. For obscure projects or new launches, add them to Custom vocabulary before processing.
08 Can I get burned-in captions for repurposing a Space as a YouTube video?
We return SRT or VTT, which you import into your editor (Descript, Premiere, CapCut, DaVinci). We don't render burned-in MP4 ourselves — the SRT is the bridge to whatever video tool you already use.
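SRT works as that bridge because it is plain text: numbered cues, each with an `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range and one or more lines of caption text, separated by blank lines. Below is a minimal Python sketch of writing diarized segments out as SRT. It assumes each cue is prefixed with its speaker label, since the page doesn't say how labels appear in the exported file; `Segment`, `srt_timestamp`, and `to_srt` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # seconds from the start of the recording
    end: float     # seconds from the start of the recording
    speaker: str   # diarization label, e.g. "S1" (hypothetical format)
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"  # assumption: label inline in the cue text
        )
    return "\n".join(cues)


if __name__ == "__main__":
    print(to_srt([
        Segment(0.0, 4.2, "S1", "Welcome back everyone."),
        Segment(4.2, 9.75, "S2", "Yeah, so the throughput numbers from last week are misleading."),
    ]))
```

WebVTT differs mainly in a `WEBVTT` header line and a period instead of a comma before the milliseconds (`00:00:04.200`), which is why most editors accept either file.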

Drop your Space MP3. See what comes out.

30 free minutes every month. No card. Speaker labels, 99 languages, SRT and DOCX included.

Start free