YouTube Shorts transcription. 60 seconds of video, 10 seconds to transcript.

Paste a Shorts URL or drop the MP4. Get an SRT, VTT and clean text back in seconds — ready to repurpose the Shorts clip to Reels, TikTok or a blog post.

Drop a file, or pick one

MP3 · WAV · M4A · MP4 · MOV · MKV · OGG · OPUS · FLAC · WEBM — up to 100 MB anonymously

Paste a link, we’ll fetch the audio

YouTube · TikTok · Vimeo · Twitter · SoundCloud · Spotify · 50+ more

Record straight from your browser

Sign-up takes 30 seconds. Recording opens right after, in the dashboard.

No card required
~90s per 60-min file
SRT · VTT · DOCX · TXT
Files auto-deleted in 24h

↓ Watch what comes out

Shorts URL in. Subtitles out.

We pull the audio from the Shorts video server-side, strip the music bed, and return timestamped text plus a frame-accurate SRT. No browser extension, no OBS capture, no scraping yourself.

youtube.com/shorts/aB3kQ… · REC · 1 speaker · 00:58
auto-detected en-US · 44.1 kHz · vocal track isolated
~90s
Transcript · streaming · 94% accuracy
S1

Three iPhone settings nobody told you about — number one is hidden in Accessibility.

S1

Go to Settings, Accessibility, Touch, then scroll down to Back Tap.

S1

Set double-tap to screenshot. Now you can screenshot with one hand.

S1

Save this before it gets buried in your feed.

94% on talking-head Shorts
SRT · VTT · DOCX · TXT · JSON

↓ This is the dashboard

This is what loads when the job finishes.

Same layout as the real dashboard — Summary, full Transcript, Speakers tab, Exports. Key points and action items extracted automatically. Auto-tags on every job.

Try it on your own file — it's free

Three real options · honest comparison

YouTube auto-captions. SubMagic. Or us.

YouTube generates captions for free inside Studio. SubMagic and similar tools (CapCut, Veed) burn animated captions onto the video. We give you the raw transcript and clean subtitle files to take anywhere.

Option 01

YouTube auto-captions

Free, baked into Studio. Stuck on YouTube, English-leaning, and export is limited to SBV / SRT from inside Studio.

Requires: Own the Shorts channel
Speaker labels: None
Languages: ~13 reliable
Export: SBV / SRT in Studio
Music handling: Often inserts [Music]
Cost: Free
Best for: Creators who only need captions on YouTube itself and don't repurpose the clip elsewhere.

Option 02

Transcription.Solutions

Paste any public Shorts URL. Get clean SRT, VTT and text — yours to use anywhere.

Requires: Public URL or MP4
Speaker labels: Diarization included
Languages: 99, auto-detected
Export: SRT · VTT · DOCX · TXT · JSON
Music handling: Vocal isolation on by default
Cost per minute: $0.03
Best for: Creators repurposing Shorts to TikTok and Reels, agencies running someone else's channel, anyone who wants the text outside Studio.

Option 03

SubMagic / CapCut

Burned-in animated captions. Looks great on-screen, but the text lives inside the pixels.

Requires: Upload source MP4
Speaker labels: Single speaker only
Languages: ~30, EN-tuned
Export: Video file (not text)
Music handling: Good, built for shorts
Cost: ~$10–25/mo
Best for: Creators who want pop-on word-level captions baked into the export and don't need the raw transcript.

Pricing and feature flags approximate as of 2026. YouTube caption language support varies by region.

Specific to Shorts

Three things that bite creators on generic transcription tools.

Shorts aren't tiny podcasts. The music bed, the speed, and the hashtag-heavy script all break tools that were built for meetings.

What goes wrong

  1. Music bed mixed hot. Generic ASR transcribes the song lyrics into the middle of your sentence. You get '[Music] go to settings [Music] tap on'.
  2. Brand names and hashtags (Notion, Arc, Linear, #buildinpublic) come back lowercased and phonetic. Captions look amateurish on re-upload.
  3. Fast-paced delivery. Shorts creators talk at 200+ WPM to fit a hook into 60 seconds. Tools tuned for meeting cadence drop word endings.

What to flip here

  1. Leave Vocal isolation on; it's the default. We run a music-suppression pass before recognition, so the lyrics don't leak into the transcript.
  2. Drop your brand list into Custom vocabulary. Channel name, product names, recurring hashtags. We pass them as bias hints to the recognizer.
  3. Pick the Short-form speaker model. It's tuned for one-speaker, fast-cadence delivery and weights word-boundary detection harder than the conversational model.

Recommended job settings for Shorts

Paste a Shorts URL and these flip on by default. Override per-job from the form.

Input: Public URL or MP4 upload
Speaker model: Short-form · 1-2 speakers
Vocal isolation: On (music suppression)
Filler words: Kept (creators want the exact wording)
Summary: Hook + payoff (Pro/Business)
Export: SRT · VTT · word-level JSON

Accuracy · real-world numbers

94% on a talking-head Short. The music bed sets the ceiling.

Shorts are short, so a single bad word is visible. Vocal isolation against the music track is what we tune for. Numbers below are from real Shorts URLs we've processed, not synthetic clips.

96%
Studio voiceover, no music

Recorded into a mic, music added in post but mixed low. Cleanest case — error mostly on proper nouns and slang.

94%
Talking head, light music bed

Phone or DSLR, music ducked under voice. Vocal isolation lifts the dialogue cleanly. Most Shorts land here.

87%
Loud trending-audio backing

Music sits at the same level as the voice. Words clip on hard consonants and on lyrics that overlap dialogue.

82%
Street, field or B-roll voiceover

Wind, traffic, ambient crowd. Usable text but expect a 30-second cleanup pass on numbers, names and brand mentions.
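
This page doesn't say how the percentages above are measured. If you want to check a transcript of your own clip, one common convention is word accuracy: 1 minus the word error rate (WER), where WER is the word-level edit distance between a hand-corrected reference and the tool's output, divided by the reference length. The sketch below assumes that convention; the function names are ours, and it may not match how the figures on this page were produced.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count.

    Assumption: words are compared case-insensitively and split on whitespace.
    Punctuation is NOT stripped, so "tap." and "tap" count as different words.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy as 1 - WER (can go negative on very noisy output)."""
    return 1.0 - word_error_rate(reference, hypothesis)
```

On a 60-second Short the reference is only 150 to 200 words, so each wrong word moves this number by roughly half a point.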

Common questions

8 things creators ask about Shorts transcription.

01. Can I just paste a youtube.com/shorts/ URL?
Yes — that's the main flow. Paste the URL, we fetch the public audio server-side and start transcribing. No browser extension, no OBS capture, no downloading the MP4 first.
02. Does it work on Shorts I don't own?
Yes, as long as the Short is public. We can't access unlisted or private videos because YouTube blocks anonymous fetches on those. For private Shorts, download the MP4 from Studio and upload it directly.
03. Will the SRT line up with the re-uploaded video on TikTok or Reels?
Yes. Timestamps reference the audio start, so as long as you don't trim the head of the clip on re-upload, the SRT drops in cleanly. Trim the front? Subtract that offset in any subtitle editor.
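
If you'd rather script the offset than open a subtitle editor, here is a minimal sketch. It assumes standard SRT timing lines (`HH:MM:SS,mmm --> HH:MM:SS,mmm`); `shift_srt` is a name we made up for the illustration, not part of any export.

```python
import re

# Matches one SRT timestamp such as 00:00:02,500
_TIMESTAMP = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(srt_text: str, trimmed_seconds: float) -> str:
    """Subtract trimmed_seconds from every cue time, clamping at zero.

    Cues that fall entirely inside the trimmed part are not removed;
    they collapse to 00:00:00,000 and should be deleted by hand.
    """
    offset_ms = round(trimmed_seconds * 1000)

    def _shift(match):
        h, m, s, ms = (int(g) for g in match.groups())
        total = (h * 3600 + m * 60 + s) * 1000 + ms - offset_ms
        total = max(total, 0)
        h, rem = divmod(total, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    shifted = []
    for line in srt_text.splitlines():
        # Only touch timing lines, never the caption text itself.
        if "-->" in line:
            line = _TIMESTAMP.sub(_shift, line)
        shifted.append(line)
    return "\n".join(shifted)
```

For example, if you cut 1.5 seconds off the front of the clip, `shift_srt(original_srt, 1.5)` moves a cue that started at 00:00:02,500 to 00:00:01,000.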
04. What happens to the music? Does it show up as [Music] like YouTube?
No. We run vocal isolation before recognition, so the music bed gets suppressed and we transcribe only the spoken voice. You won't see [Music] tags scattered through the transcript.
05. How many Shorts can I do on the free tier?
30 minutes a month. The average Short is 30-45 seconds, so that's roughly 40-60 Shorts a month before you hit Pro. Diarization and SRT export are included on free.
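
The 40-60 figure is plain division, and you can rerun it for your own average clip length. This sketch assumes usage is counted by exact audio duration with no per-job rounding, which this page doesn't spell out.

```python
FREE_MINUTES_PER_MONTH = 30  # free-tier allowance quoted in the answer above


def shorts_per_month(avg_short_seconds: float,
                     free_minutes: float = FREE_MINUTES_PER_MONTH) -> int:
    """Whole Shorts that fit in the monthly allowance (assumes no rounding per job)."""
    return int(free_minutes * 60 // avg_short_seconds)


print(shorts_per_month(45))  # 40
print(shorts_per_month(30))  # 60
```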
06. Do you handle word-level timestamps for animated captions?
Yes, on every plan. Pick word-level JSON in the export dropdown. You can feed it straight into CapCut, Premiere or a custom Remotion template to render pop-on captions.
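
As a rough illustration of what you can do with word-level timing, the sketch below groups words into short pop-on cues and writes them as SRT. The JSON shape is an assumption made for the example (a list of objects with `word`, `start` and `end` in seconds); check the real export for its actual field names before using it.

```python
def format_srt_time(seconds: float) -> str:
    """Seconds -> HH:MM:SS,mmm as used in SRT timing lines."""
    total_ms = round(seconds * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_words: int = 3) -> str:
    """Group word-level timestamps into cues of at most max_words words.

    Assumed (hypothetical) input shape:
        [{"word": "Set", "start": 0.0, "end": 0.25}, ...]
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"] for w in chunk)
        start = format_srt_time(chunk[0]["start"])
        end = format_srt_time(chunk[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}")
    return "\n\n".join(cues) + "\n" if cues else ""
```

Smaller `max_words` values give the punchier one-to-three-word style common on Shorts; larger values read more like standard subtitles.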
07. What about non-English Shorts?
99 languages supported, auto-detected from the audio. Spanish, Portuguese, Hindi, Tagalog, Arabic — all tested in production. Mixed-language Shorts (code-switching) work but accuracy dips 4-6 points.
08. Can I get a summary or title suggestions from the transcript?
Yes on Pro and Business. The summary returns a one-line hook, the payoff, and 3-5 suggested title variants based on the script. Free tier gets the transcript only.

Paste a Shorts URL. See what comes out.

30 free minutes every month — dozens of Shorts. No card. SRT, VTT and word-level JSON included on every plan.

Start free