TikTok transcription. Paste a link, get captions.

Drop a TikTok video URL. We pull the audio server-side and return timestamped text plus SRT and VTT caption files — ready to re-upload or burn in.

Drop a file, or pick one

MP3 · WAV · M4A · MP4 · MOV · MKV · OGG · OPUS · FLAC · WEBM — up to 100 MB anonymously

Paste a link, we’ll fetch the audio

YouTube · TikTok · Vimeo · Twitter · SoundCloud · Spotify · 50+ more

Record straight from your browser

Sign up takes 30 seconds — recording opens right after, in the dashboard.

No card required · ~90s per 60-min file · SRT · VTT · DOCX · TXT · Files auto-deleted in 24h

↓ Watch what comes out

Public URL in. Captions out.

Paste any public TikTok video link. We fetch the audio track, run language detection, and stream back captions, even when background music is playing under the voice.

TikTok video URL · REC · 1 voice · 0:47 · vertical 9:16
auto-detected en-US · 44.1 kHz · music bed -18 dB
~90s
Captions · streaming · 94% accuracy
S1

Okay so the secret to crispy tofu nobody tells you — press it for ten minutes, not two.

S1

Then cornstarch, not flour. Toss it, don't dust it.

S1

Air fryer at 400 for twelve minutes, flip halfway.

S1

Comment 'tofu' and I'll send the full sauce recipe.

94% on creator voice-over · SRT · VTT · TXT · DOCX · JSON

↓ This is the dashboard

This is what loads when the job finishes.

Same layout as the real dashboard — Summary, full Transcript, Speakers tab, Exports. Key points and action items extracted automatically. Auto-tags on every job.

Try it on your own file — it's free

Three real options · honest comparison

TikTok auto-captions. CapCut or Submagic. Or us.

TikTok ships auto-captions in the editor. CapCut and Submagic add styled, animated captions for re-upload. We give you the raw transcript plus clean SRT/VTT — bring your own editor.

Option 01

TikTok auto-captions

Built into the TikTok editor. Toggle on, captions appear. No file you can take elsewhere.

Requires: Upload through TikTok app
Language coverage: ~40 languages, EN strongest
Export: None (burned in only)
Edit before publish: In-app text editor
Music handling: Misses lyrics, garbles voice over loud beds
Cost: Free
Best for: Creators who only need captions inside TikTok and never repost to Reels or Shorts.

Option 02

Transcription.Solutions

Paste the public URL. Get a transcript file plus SRT/VTT you can drop into any editor or re-upload anywhere.

Requires: Public TikTok URL, no login
Language coverage: 100+ with auto-detect
Export: SRT · VTT · DOCX · TXT · JSON
Edit before publish: Web editor, then re-export
Music handling: Voice isolation on noisy beds
Cost per minute: $0.03
Best for: Creators cross-posting to Reels/Shorts/YouTube, agencies repurposing client TikToks, researchers archiving trends.

Option 03

CapCut / Submagic

Styled, animated captions tuned for short-form. Locked to their editor, English-first.

Requires: App install + paid plan for export
Language coverage: ~20 strong, others spotty
Export: MP4 with burn-in, SRT on paid
Edit before publish: Inside their timeline only
Music handling: EN-tuned, drops on accented voice
Cost: $10–24/mo (approximate, 2026)
Best for: Solo creators who want animated word-pop captions and never leave the CapCut/Submagic editor.

Pricing approximate as of May 2026. Language counts based on each vendor's published support pages.

Specific to TikTok

Three things that bite people on generic transcription tools.

TikTok audio isn't podcast audio. These are the failure modes to know about, and the settings worth flipping before you queue the job.

What goes wrong

  1. Background music gets transcribed as speech. Generic ASR hears lyrics and writes them out alongside the voice, so your caption file becomes unusable.
  2. Creator slang and handles (@username, 'rizz', 'fanum tax', product names) come back phonetically misspelled or auto-corrected to the wrong word.
  3. Fast hooks (the first three seconds, where creators stack 15 words to beat the swipe) get clipped or compressed because the ASR is still warming up.

What to flip here

  1. Turn on Voice isolation on the job form. We separate the voice stem from the music before transcribing, so trending audio doesn't pollute the captions.
  2. Paste handles, brand names, and creator-specific vocab into Custom vocabulary. We pass it as a recognizer hint, so case and spelling come back correct.
  3. Set the Caption format to short-form (max 3 words per line, 1.2 sec per cue). The SRT comes out pre-formatted for vertical video without manual line breaks.
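
To make the short-form preset concrete, here is a minimal Python sketch of how word-level timestamps could be grouped into cues of at most 3 words and at most 1.2 seconds, then written as SRT. It is an illustration only, not our pipeline's code: the `Word` type, the function names, and the reading of "3 words, 1.2 sec" as upper limits are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """One recognized word with start/end times in seconds (hypothetical shape)."""
    text: str
    start: float
    end: float


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def short_form_cues(words: List[Word], max_words: int = 3,
                    max_seconds: float = 1.2) -> List[List[Word]]:
    """Greedily group words; start a new cue when either limit would be exceeded."""
    cues: List[List[Word]] = []
    current: List[Word] = []
    for w in words:
        too_many = len(current) >= max_words
        too_long = bool(current) and (w.end - current[0].start) > max_seconds
        if current and (too_many or too_long):
            cues.append(current)
            current = []
        # A single word longer than max_seconds still gets its own cue.
        current.append(w)
    if current:
        cues.append(current)
    return cues


def to_srt(cues: List[List[Word]]) -> str:
    """Render grouped words as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        timing = f"{srt_time(cue[0].start)} --> {srt_time(cue[-1].end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)
```

Feeding it the four words of "Then cornstarch, not flour." yields two cues: the first three words together, then "flour." on its own.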

Recommended job settings for TikTok

Paste a TikTok URL and these flip on by default. Override per-job from the form.

Source: Public URL · audio extracted server-side
Voice isolation: On (music bed suppressed)
Language: Auto-detect · 100+ supported
Caption format: Short-form · 3 words/line · 1.2s cues
Filler words: Kept (creators rely on them)
Export: SRT · VTT · TXT · DOCX

Accuracy · real-world numbers

94% on clean voice-over. Music-heavy clips drop predictably.

The ceiling is set by how loud the music bed is and how fast the creator talks. Voice-over recorded separately and dropped over a quiet bed is the best case; lip-sync trends and duets are the worst. Numbers below come from real TikTok URLs run through our pipeline.

94%
Voice-over · quiet music bed

Creator recorded on mic, music sits 15-20 dB below voice. Talking-head educational and recipe content lands here.

91%
On-camera · phone mic · no music

Selfie-style talking head, no backing track. Phone mic and room reverb cost a few points versus voice-over.

85%
Loud trending audio under voice

Voice and music within 6 dB. Fast hooks and brand names take hits — expect a 1-minute clean-up pass.

78%
Duets, stitches, lip-sync clips

Two audio tracks overlapping or song lyrics being mouthed. We transcribe what's spoken; song lyrics are flagged, not retyped.
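
If you want to check numbers like these against your own clips, one common approach is word accuracy, defined as 1 minus word error rate (WER). The sketch below assumes that definition; this page does not state how the percentages above were measured, so treat it as a way to run your own comparison rather than a description of our scoring. The function name and the lowercase, whitespace-split tokenization are choices made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Standard Levenshtein distance over words, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy under the 1 - WER assumption; can go below 0 with many insertions."""
    return 1.0 - word_error_rate(reference, hypothesis)
```

Type out a short reference transcript by hand, pass it with the exported TXT as the hypothesis, and compare the result to the band your clip falls into.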

Common questions

8 things people ask about TikTok transcription.

01. Do I need to download the TikTok first?
No. Paste the public video URL (the share link from the TikTok app) and we extract the audio server-side. If the video is private or region-blocked, you'll need to download the MP4 yourself and upload it — we can't bypass TikTok's access rules.
02. Will you transcribe the song lyrics or just the creator's voice?
Just the spoken voice. Voice isolation suppresses the music bed before transcription, and trending-audio lyrics get flagged in the JSON output rather than written into the caption track. You can flip isolation off if you specifically want lyrics.
03. Can I get an SRT formatted for vertical short-form video?
Yes. The short-form caption preset breaks cues at roughly 3 words per line and 1.2 seconds per cue — the rhythm that fits the 9:16 safe zone without overlapping UI. Standard SRT (one sentence per cue) is also available.
04. What about duets and stitches with two voices?
Acoustic diarization separates the two voices and labels them Speaker 1 and Speaker 2. Accuracy drops 5-10 points when the audio tracks overlap heavily — that's the worst case in our data.
05. Does it handle non-English creators?
Yes — 100+ languages with auto-detect. Spanish, Portuguese, Indonesian, Vietnamese, and Arabic creators come back at roughly the same accuracy band as English. Code-switching (mixing two languages mid-sentence) is detected and labeled per segment.
06. How long until the transcript is ready?
Under five minutes for a standard 30-90 second TikTok, usually under two. Longer-form TikToks (3-10 minutes) finish in roughly 1/10 of real-time.
07. Can I bulk-process a creator's whole feed?
Yes, via the API or by pasting a list of URLs into the dashboard. We rate-limit the URL fetcher politely so TikTok doesn't block us — expect ~30 videos in the first batch, then steady throughput from there.
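
A burst of about 30 followed by steady throughput is the shape a token bucket produces, so if you script bulk jobs yourself you can pace submissions the same way on the client side. The Python sketch below is an assumption-laden illustration: it does not describe how our fetcher is implemented, `submit` is a placeholder for whatever call you use to queue one URL, and the capacity and refill rate are values you would pick yourself.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


class TokenBucket:
    """Allows `capacity` immediate calls, then about `refill_per_second` calls per second."""

    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check, capped at capacity.
        now = self._clock()
        elapsed = now - self._last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self._last = now

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        self._refill()
        while self.tokens < 1.0:
            self._sleep((1.0 - self.tokens) / self.refill_per_second)
            self._refill()
        self.tokens -= 1.0


def submit_all(urls: Iterable[str], submit: Callable[[str], T],
               bucket: TokenBucket) -> List[T]:
    """Call the hypothetical `submit` once per URL, paced by the bucket."""
    results: List[T] = []
    for url in urls:
        bucket.acquire()
        results.append(submit(url))
    return results
```

For example, `TokenBucket(capacity=30, refill_per_second=0.5)` would let the first 30 URLs through immediately and then release roughly one every two seconds; both numbers are illustrative, not documented limits.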
08. Is this allowed under TikTok's terms?
We only fetch public videos via their public share endpoints — the same way a browser preview does. We don't bypass private accounts or login walls. If you're transcribing someone else's content for commercial use, fair-use and platform rules are on you to check.

Paste a TikTok URL. See what comes out.

30 free minutes every month. No card. SRT, VTT, 100+ languages, all exports included.

Start free