Paste a Shorts URL or drop the MP4. Get SRT, VTT and clean text back in seconds, ready to repurpose the clip for Reels, TikTok or a blog post.
MP3 · WAV · M4A · MP4 · MOV · MKV · OGG · OPUS · FLAC · WEBM — up to 100 MB anonymously
YouTube · TikTok · Vimeo · Twitter · SoundCloud · Spotify · 50+ more
↓ Watch what comes out
We pull the audio from the Shorts video server-side, strip the music bed, and return timestamped text plus a frame-accurate SRT. No browser extension, no OBS capture, no scraping on your end.
Three iPhone settings nobody told you about — number one is hidden in Accessibility.
Go to Settings, Accessibility, Touch, then scroll down to Back Tap.
Set double-tap to screenshot. Now you can screenshot with one hand.
Save this before it gets buried in your feed.
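For readers who want to see what "timestamped text plus an SRT" means in practice, here is a minimal Python sketch of how timestamped segments map onto SRT cues. It is not our actual pipeline. The `Segment` type, the helper names and the timings are all hypothetical; only the caption text is taken from the sample above.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timestamped line of transcript (hypothetical shape)."""
    start: float  # seconds from the start of the clip
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text}\n"
        )
    return "\n".join(cues)


# Timings below are made up for illustration; the text is from the sample.
segments = [
    Segment(3.2, 7.5, "Go to Settings, Accessibility, Touch, then scroll down to Back Tap."),
    Segment(7.5, 11.0, "Set double-tap to screenshot. Now you can screenshot with one hand."),
]
print(to_srt(segments))
```

Each cue is an index, a start and end time joined by `-->`, and the caption text. VTT uses the same idea with a `WEBVTT` header and a period instead of a comma before the milliseconds.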
↓ This is the dashboard
Same layout as the real dashboard — Summary, full Transcript, Speakers tab, Exports. Key points and action items extracted automatically. Auto-tags on every job.
Sample preview from a founder interview about post-call workflow. Real transcripts look exactly like this — same tabs, same summary block, same key-points / action-items split, same auto-tag chips.
Three real options · honest comparison
YouTube generates captions for free inside Studio. Submagic and similar editors (CapCut, Veed) burn animated captions onto the video. We give you the raw transcript and clean subtitle files to take anywhere.
Free, baked into Studio. Stuck on YouTube, English-leaning, no real export.
Paste any public Shorts URL. Get clean SRT, VTT and text — yours to use anywhere.
Burned-in animated captions. Looks great on-screen, but the text lives inside the pixels.
Pricing and feature flags approximate as of 2026. YouTube caption language support varies by region.
Specific to Shorts
Shorts aren't tiny podcasts. The music bed, the speed, and the hashtag-heavy script all break tools that were built for meetings.
Paste a Shorts URL and these flip on by default. Override per-job from the form.
Accuracy · real-world numbers
Shorts are short, so a single bad word is visible. Vocal isolation against the music track is what we tune for. Numbers below are from real Shorts URLs we've processed, not synthetic clips.
Recorded into a mic, music added in post but mixed low. Cleanest case — error mostly on proper nouns and slang.
Phone or DSLR, music ducked under voice. Vocal isolation lifts the dialogue cleanly. Most Shorts land here.
Music sits at the same level as the voice. Words clip on hard consonants and on lyrics that overlap dialogue.
Wind, traffic, ambient crowd. Usable text but expect a 30-second cleanup pass on numbers, names and brand mentions.
Common questions
30 free minutes every month — dozens of Shorts. No card. SRT, VTT and word-level JSON included on every plan.
Start free