Turn MP4 video into text. Audio extracted automatically.

Drop in any MP4 file. We extract the audio track server-side, produce a timestamped transcript, and generate an SRT that imports cleanly into YouTube, Vimeo, or your NLE.

Drop a file, or pick one

MP3 · WAV · M4A · MP4 · MOV · MKV · OGG · OPUS · FLAC · WEBM — up to 100 MB anonymously

Paste a link, we’ll fetch the audio

YouTube · TikTok · Vimeo · Twitter · SoundCloud · Spotify · 50+ more

Record straight from your browser

Sign up takes 30 seconds — recording opens right after, in the dashboard.

No card required · ~90s per 60-min file · SRT · VTT · DOCX · TXT · Files auto-deleted in 24h

↓ See it in action

MP4 in. Transcript + SRT out.

MP4 is a container: we read the audio track directly and never re-encode the video. Timestamps line up with the frames on your original timeline, so the SRT is in sync the moment you import it.
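As a rough illustration of what "lines up with the frames" means: a cue time in seconds maps to a frame index by multiplying by the frame rate. This is a minimal sketch assuming a constant frame rate; it uses Python's `fractions.Fraction` so NTSC rates like 30000/1001 stay exact, and the function names are hypothetical, not part of our API.

```python
from fractions import Fraction
import math

# NTSC "29.97" kept exact as a rational number.
NTSC_30 = Fraction(30000, 1001)

def frame_at(seconds: Fraction, fps: Fraction) -> int:
    """Index of the frame on screen at `seconds` (constant frame rate assumed)."""
    return math.floor(seconds * fps)

def frame_start(frame: int, fps: Fraction) -> Fraction:
    """Exact start time of `frame`, in seconds."""
    return Fraction(frame) / fps

def snap_to_frame(seconds: Fraction, fps: Fraction) -> Fraction:
    """Move a cue time back to the start of the frame it falls in."""
    return frame_start(frame_at(seconds, fps), fps)

if __name__ == "__main__":
    t = Fraction(1)  # a cue at exactly 1.000 s
    print(frame_at(t, NTSC_30))  # 29
    print(float(snap_to_frame(t, NTSC_30)))
```

Using exact fractions instead of floats avoids the tiny rounding errors that, over thousands of frames, can push a cue onto the neighbouring frame.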

training-module-04.mp4 · REC 1080p · 22:14 · 412 MB
auto-detected en-US · AAC 48 kHz stereo · 192 kbps
~90s
Transcript · streaming · 95% accuracy
S1

Hi, in this module we'll walk through the refund workflow from start to finish.

S2

Quick question before we start: does this cover partial refunds?

S1

Yes. Partial refunds use the same screens, just with a different flow.

S2

Got it. And the approval threshold is still $200?

95% on clean dialog · SRT · VTT · DOCX · TXT · JSON

↓ This is the dashboard

This is what loads when the job finishes.

Same layout as the real dashboard — Summary, full Transcript, Speakers tab, Exports. Key points and action items extracted automatically. Auto-tags on every job.

Try it on your own file — it's free

Three ways to do it · an honest comparison

Do it with ffmpeg. Use a video editor. Or use us.

You can extract the audio yourself and run Whisper on it. You can drop the MP4 into Descript or VEED and work inside their editor. Or you can drop the file here and get a transcript + SRT without touching an editor.

Option 01

ffmpeg + Whisper

Free and local, but the problems are yours. You own the pipeline and every bug in it.

Requires: CLI + large model (~10 GB VRAM) + GPU
Speaker diarization: Separate tool (pyannote)
SRT output: Yes, assembled by hand
Time for a 1-hour MP4: 20–90 min on CPU
Multi-track audio: You pick the track
Price: $0 + your hardware
Best for: Engineers who already run Whisper locally and don't mind wiring up diarization.
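For reference, the DIY route in Option 01 usually comes down to two commands. This sketch assumes the `ffmpeg` CLI and the `whisper` CLI from the `openai-whisper` package are installed; file names, model choice and stream index are placeholders. Diarization (pyannote) is deliberately left out, which is exactly the gap the table points at.

```python
import shlex

def extract_cmd(src: str, dst: str, audio_index: int = 0) -> list[str]:
    """ffmpeg: take one audio stream, drop video, downmix to 16 kHz mono WAV."""
    return [
        "ffmpeg", "-y",                 # overwrite dst if it exists
        "-i", src,
        "-map", f"0:a:{audio_index}",   # pick the N-th audio stream explicitly
        "-vn",                          # no video in the output
        "-ac", "1",                     # mono
        "-ar", "16000",                 # 16 kHz, the rate Whisper works at internally
        dst,
    ]

def whisper_cmd(wav: str, model: str = "medium", out_dir: str = ".") -> list[str]:
    """openai-whisper CLI: transcribe and write an .srt into out_dir."""
    return ["whisper", wav, "--model", model,
            "--output_format", "srt", "--output_dir", out_dir]

if __name__ == "__main__":
    # Print the commands rather than run them; pass each list to
    # subprocess.run(cmd, check=True) to actually execute.
    for cmd in (extract_cmd("talk.mp4", "talk.wav"), whisper_cmd("talk.wav")):
        print(shlex.join(cmd))
```

Note the explicit `-map 0:a:N`: without it, ffmpeg picks one audio stream on its own, which is how the "wrong track" problem described further down creeps in.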
Option 02

Transcription.Solutions

Drop the MP4. Audio extraction, diarization, SRT, summary: all in one step.

Requires: A browser, nothing else
Speaker diarization: Tuned and built in, on every job
SRT output: Frame-accurate out of the box
Time for a 1-hour MP4: ~4 min, streamed
Multi-track audio: We detect every track
Price: $0.03 per min
Best for: Anyone with an MP4 who wants text and an SRT without learning a video editor or a CLI.
Option 03

Descript / VEED

Import the MP4 into the editor. The transcript shows up as part of the timeline UI.

Requires: Account + editor learning curve
Speaker diarization: Yes, EN-tuned
SRT output: Export gated by plan
Upload cap: 5 GB (Descript free)
Multi-track audio: First track only
Price: $12–24/user/month
Best for: Editors who want to cut video and transcript together in one collaborative tool.

Pricing and feature caps approximate as of 2026. Descript and VEED tier names change frequently — check their site for current limits.

Specific to MP4

Three things that trip people up with generic transcription tools.

MP4 is a container, not a codec, yet most transcription tools treat it as a single audio blob. That's where things break.
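To see why "one audio blob" is the wrong model, ask ffprobe what the container actually holds. This is a sketch assuming the `ffprobe` CLI is on your PATH; `audio_streams` is a hypothetical helper that parses ffprobe's JSON output, and the sample below is hand-written for illustration, not output from a real file.

```python
import json
import subprocess

def probe(path: str) -> str:
    """Ask ffprobe for every audio stream in the container, as JSON."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "a",
        "-show_entries", "stream=index,codec_name,sample_rate,channels,bit_rate",
        "-of", "json",
        path,
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def audio_streams(ffprobe_json: str) -> list[dict]:
    """One dict per audio stream. ffprobe reports sample_rate and bit_rate as
    strings, and bit_rate can be missing for some codecs (treated as 0 here)."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return [
        {
            "index": s["index"],
            "codec": s.get("codec_name"),
            "sample_rate": int(s.get("sample_rate", 0)),
            "channels": s.get("channels"),
            "bit_rate": int(s.get("bit_rate", 0)),
        }
        for s in streams
    ]

# Hand-written example of a two-track export (boom + lav), illustration only.
SAMPLE = json.dumps({"streams": [
    {"index": 1, "codec_name": "aac", "sample_rate": "48000", "channels": 2, "bit_rate": "128000"},
    {"index": 2, "codec_name": "aac", "sample_rate": "48000", "channels": 1, "bit_rate": "192000"},
]})

if __name__ == "__main__":
    for s in audio_streams(SAMPLE):
        print(s)
```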

What breaks

  1. Multi-track MP4 with boom + lav. Typical tools pull track 1 and ignore the rest, so you lose the clean mic. Common in FCP and Premiere exports.
  2. Background music in vlogs and ads produces phantom words. The recognizer tries to find vocals in the music bed.
  3. SRT timestamps drift when a tool re-encodes the video on import. By minute 40, captions are a full second off.
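For a sense of scale, model the drift as a constant rate error ε (an illustrative assumption; the cause varies by tool). A caption meant for time t then lands at (1 + ε)t, so the offset grows linearly, and one second at minute 40 pins down ε:

```latex
\Delta(t) = (1+\varepsilon)\,t - t = \varepsilon\,t,
\qquad
\Delta(2400\ \mathrm{s}) = 1\ \mathrm{s}
\;\Longrightarrow\;
\varepsilon = \frac{1}{2400} \approx 4.2\times 10^{-4} \approx 0.04\%.
```

In other words, a timing mismatch of only a few hundredths of a percent is enough to make captions visibly late by the end of a long video.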

How we handle it

  1. Stream picker: we list every audio track and let you choose the one to transcribe. The default is the highest-bitrate track.
  2. Turn on Music suppression in the job form. We gate the recognizer with speech VAD so instrumental passages stay blank.
  3. We don't re-encode the video. Audio is extracted at its native sample rate, and timestamps reference the container's edit list, so the SRT stays frame-accurate.
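A minimal sketch of the default in step 1 and the no-re-encode extraction in step 3, using plain ffmpeg. It assumes stream metadata has already been parsed into dicts (for example from ffprobe's JSON); `pick_default` and `copy_audio_cmd` are hypothetical names, and this illustrates the idea rather than our server code.

```python
def pick_default(streams: list[dict]) -> int:
    """Position (0-based, among audio streams) of the highest-bitrate track."""
    if not streams:
        raise ValueError("no audio streams found")
    # A missing or None bit_rate counts as 0, so any known bitrate wins.
    # On ties, max() keeps the first track.
    return max(range(len(streams)), key=lambda i: streams[i].get("bit_rate") or 0)

def copy_audio_cmd(src: str, dst: str, audio_pos: int) -> list[str]:
    """ffmpeg command that copies one audio track out untouched.

    -map 0:a:N selects the N-th audio stream, -vn drops video, and
    -c:a copy skips decoding and encoding, so the sample rate stays native.
    """
    return ["ffmpeg", "-y", "-i", src, "-map", f"0:a:{audio_pos}",
            "-vn", "-c:a", "copy", dst]

if __name__ == "__main__":
    tracks = [{"bit_rate": 128000}, {"bit_rate": 192000}]  # e.g. boom, lav
    pos = pick_default(tracks)
    print(pos)  # 1
    print(" ".join(copy_audio_cmd("interview.mp4", "interview.m4a", pos)))
```

Stream copy is the design choice that matters here: nothing is resampled, so there is no new timing to drift against the original.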

Recommended job settings for MP4

Drop an MP4 and these are applied by default. Override them per job from the form.

Audio extraction
Native sample rate, no re-encoding
Track selection
Highest-bitrate stream
Diarization
Acoustic · 1-6 speakers
Music suppression
On for vlog/ad presets
SRT format
≤42 chars/line, 2 lines max
Export
SRT · VTT · DOCX · timestamped TXT
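The SRT format setting above (≤42 characters per line, at most 2 lines per cue) is easy to reason about in code. Here is a sketch of laying out cues under those limits. It uses Python's `textwrap` for line breaking, which is an assumption about approach rather than a description of our formatter, and the helper names are hypothetical.

```python
import textwrap

MAX_CHARS = 42  # characters per line
MAX_LINES = 2   # lines per cue

def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def split_cue(text: str) -> list[list[str]]:
    """Wrap text at MAX_CHARS and group the lines into cues of MAX_LINES each."""
    lines = textwrap.wrap(text, width=MAX_CHARS)
    return [lines[i:i + MAX_LINES] for i in range(0, len(lines), MAX_LINES)]

def srt_block(n: int, start: float, end: float, lines: list[str]) -> str:
    """One numbered SRT cue, followed by the blank line that separates cues."""
    return f"{n}\n{srt_time(start)} --> {srt_time(end)}\n" + "\n".join(lines) + "\n\n"

if __name__ == "__main__":
    text = "Partial refunds use the same screens, just with a different flow."
    cues = split_cue(text)  # this sentence fits in a single two-line cue
    print(srt_block(1, 12.5, 15.25, cues[0]), end="")
```

When `split_cue` returns more than one group, each group becomes its own numbered cue, and the caller decides how to divide the time span between them.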

Accuracy · real-world numbers

95% on a clean shoot. Real numbers for when the audio is messy.

MP4 accuracy is determined by the mic, not the codec. A lav mic on a quiet set is a different world from a 4K camera recording through its built-in mic. The numbers below come from real customer MP4s, grouped by what we found in the audio.
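The page doesn't say how accuracy is measured. A common convention in speech recognition is word accuracy = 1 − WER, where WER (word error rate) is the word-level edit distance between a reference transcript and the output, divided by the number of reference words. A sketch under that assumption:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Rolling row of the Levenshtein table, computed over words, not characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution, or free if the words match
            )
        prev = cur
    return prev[-1] / len(ref)

def accuracy(reference: str, hypothesis: str) -> float:
    """1 - WER. Can fall below 0 if the output has many extra words."""
    return 1.0 - wer(reference, hypothesis)

if __name__ == "__main__":
    ref = "the refund limit is 200 dollars"
    hyp = "the refund limit is 100 dollars"
    print(f"{accuracy(ref, hyp):.3f}")  # one substitution in six words
```

Under this definition, "84%" would mean roughly one word in six needs fixing, which is consistent with the "1-2 fixes per minute" guidance for phone footage only if speech is sparse, so treat the tiers as rough guides.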

96%+
Studio shoot, lav kapa shotgun mic

Lapel or boom into a recorder, 48 kHz AAC at 192+ kbps, treated room. Best case. Speaker labels are reliable on two-person shoots.

93%
DSLR e nang le shotgun e hokahileang ho camera

Shotgun mic 2-4 feet from the speakers. Some room tone, but speech is clear. Most YouTube creator footage lands here.

89%
Screen recording e nang le USB mic

OBS, Loom, Camtasia exports. The mic is close but the room is untreated, often with system audio bleed. Plenty good for tutorial transcripts.

84%
Phone-shot vlog, internal mic

Built-in phone mic, wind or handling noise, distance changes from shot to shot. The words are usable; expect 1-2 fixes per minute for publish-ready captions.

Frequently asked questions

The 8 things people ask most about MP4 transcription.

01 Do you re-encode my video?
No. We only read the audio track from the MP4 container. The video stream isn't touched, isn't re-encoded, and isn't stored after the job finishes; your original file stays exactly as it was.
02 Which codecs inside an MP4 are supported?
Standard H.264 + AAC is the easy case. We also handle HEVC/H.265, ProRes-in-MP4, and audio in MP3, Opus, ALAC, or PCM. If ffmpeg can decode it, we can transcribe it.
03 What's the file size cap?
10 GB per upload in the web uploader, 50 GB via the API with resumable chunks. A typical 1-hour 1080p MP4 is 1-3 GB, so most people never come close to the web limit.
04 Will the SRT line up with my original video?
Yes. Timestamps reference the MP4's edit list and native sample rate. We don't re-encode, so there's no drift. Load the SRT alongside the MP4 in any player or NLE and the captions sync on import.
05 Can you burn subtitles into the video?
Not on our end: we output the SRT and leave burn-in to your editor. An ffmpeg one-liner, HandBrake, Premiere, DaVinci, and Kapwing all accept the SRT we produce. We don't want to become an encoding tool too.
06 What about MOV, MKV, M4V, WebM?
All of them go through the same pipeline. MOV works fine: MP4 is derived from the QuickTime format, so the extraction path is the same. MKV with multiple audio tracks gets the same stream-picker UI as multi-track MP4.
07 Can I paste a YouTube or Vimeo URL?
Yes for YouTube: paste a public URL on the upload screen and we fetch the audio directly, no MP4 download needed. Vimeo needs the file itself or a direct or signed download link, because the player only streams it.
08 What happens if there's no speech, just music or B-roll?
VAD detects silent and music-only sections and skips them, so ambient footage doesn't produce junk text. The transcript marks those sections as `[music]` or `[no speech]` instead of hallucinating words.
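Roughly how speech gating like this works: a VAD model scores short frames for speech probability, frames above a threshold are merged into speech segments, and the rest becomes non-speech. This sketch assumes you already have per-frame probabilities from some VAD model; the 30 ms frame size, 0.5 threshold and 0.3 s minimum gap are illustrative assumptions, not our production settings. Telling `[music]` apart from plain `[no speech]` would need a separate audio classifier, which is out of scope here.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    speech: bool

def gate(probs: list[float], frame_s: float = 0.03,
         threshold: float = 0.5, min_gap_s: float = 0.3) -> list[Segment]:
    """Turn per-frame speech probabilities into speech / non-speech segments.

    Non-speech gaps shorter than min_gap_s that sit between two speech
    segments are absorbed, so short pauses between words don't split a sentence.
    """
    # Pass 1: group consecutive frames with the same speech/non-speech decision.
    runs: list[Segment] = []
    for i, p in enumerate(probs):
        is_speech = p >= threshold
        end = (i + 1) * frame_s
        if runs and runs[-1].speech == is_speech:
            runs[-1].end = end
        else:
            runs.append(Segment(i * frame_s, end, is_speech))
    # Pass 2: absorb short interior gaps into the surrounding speech.
    merged: list[Segment] = []
    for k, seg in enumerate(runs):
        short_gap = (not seg.speech and seg.end - seg.start < min_gap_s
                     and 0 < k < len(runs) - 1)
        if short_gap or (merged and merged[-1].speech and seg.speech):
            merged[-1].end = seg.end
        else:
            merged.append(Segment(seg.start, seg.end, seg.speech))
    return merged

if __name__ == "__main__":
    probs = [0.9, 0.9, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1, 0.8]
    for s in gate(probs, frame_s=0.25):
        tag = "speech" if s.speech else "[no speech]"
        print(f"{s.start:5.2f}-{s.end:5.2f}  {tag}")
```

Only the speech segments would be sent to the recognizer; non-speech spans are emitted as placeholder tags, which is why music beds stop producing phantom words.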

Drop in your MP4. Get the transcript and SRT back.

30 free minutes every month. No card. Server-side audio extraction, speaker labels, frame-accurate SRT: all included.

Start free