The producer's path: drop the post-production master, get a transcript with speaker labels, AI-generated chapter markers with timestamps, an editable show-notes draft, and the SRT for the YouTube version of the episode. The full sequence reads below.
Podcast transcription turns recorded podcast audio into a publishable text artifact: a transcript with speaker labels, an AI-generated show-notes draft with timestamped chapter markers, key-quote extraction for social, and a searchable archive of every past episode. Transcription.Solutions handles podcasts three ways — upload the post-production master (MP3 / WAV / FLAC / M4A), paste a SoundCloud or Bandcamp URL, or paste the YouTube link if your show also publishes there. The transcript becomes the source of truth for the episode page, the blog post, the social clips, and the back-catalogue search.
Most weekly producers settle into this exact sequence by episode 10. The tool fits between mastering and publish — the steps below describe what happens around it.
Bounce the post-produced episode out of your DAW (Reaper, Logic, Hindenburg, Adobe Audition) as 16-bit WAV or 320 kbps MP3. Music beds in, intros and outros baked in.
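Drag the master onto the upload area. Uploads go up to 2 GB on Business, 500 MB on Pro, and 100 MB on Free. 2 GB covers a 4-hour interview as 16-bit mono WAV at 44.1 kHz (about 1.3 GB); the same length in stereo runs about 2.5 GB, so export long stereo masters as MP3 or FLAC.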
Voices separated, transcript chunked and parallelised, chapter markers and key quotes extracted. A 60-minute episode lands in the dashboard around minute 10.
Click "Speaker 2" once, type the guest's name. The rest of the transcript updates. If you have a recurring co-host, save them in the speaker library; rename auto-applies next time.
AI summary suggests 5-9 chapters with timestamps. Drag to reorder, edit titles, drop the obvious ones. Most producers keep ~80% as-is.
Copy chapter syntax into Captivate / Buzzsprout / Transistor / Anchor (each format is slightly different, and we provide them all). DOCX for the show-notes blog post. SRT for the YouTube video version.
Episode out, transcript live on the show page, blog draft saved for the next workday, archive searchable for the listener-emails-asking-which-episode that come back six months later.
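Whether a WAV master fits under a plan's upload cap depends on its sample rate, bit depth, and channel count. A rough sketch of that arithmetic for uncompressed PCM, assuming decimal units (1 GB = 10^9 bytes, which the plan limits above do not specify) and ignoring the small file header:

```python
# Uncompressed PCM size = seconds * sample_rate * bytes_per_sample * channels.
# Caps are the plan limits quoted in the upload step; treating 1 GB as
# 10**9 bytes is an assumption.
PLAN_CAPS_BYTES = {
    "Free": 100 * 10**6,
    "Pro": 500 * 10**6,
    "Business": 2 * 10**9,
}


def wav_bytes(hours, sample_rate=44_100, bit_depth=16, channels=2):
    """Approximate size of a PCM WAV file in bytes, ignoring the header."""
    return int(hours * 3600 * sample_rate * (bit_depth // 8) * channels)


def fits(plan, size_bytes):
    """True if a file of size_bytes is within the plan's upload cap."""
    return size_bytes <= PLAN_CAPS_BYTES[plan]


if __name__ == "__main__":
    for channels in (1, 2):
        size = wav_bytes(4, channels=channels)
        print(f"{channels} ch, 4 h: {size / 10**9:.2f} GB, "
              f"fits Business: {fits('Business', size)}")
```

Under these assumptions a 4-hour 16-bit mono master fits the Business cap, while the same length in stereo does not. Exporting long stereo episodes as MP3 or FLAC keeps them under the limit.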
Captivate's RSS feed gives direct MP3 URLs per episode. Copy-pasted the 47 URLs into a CSV.
Hit /jobs/bulk with the 47 URLs and an Authorization header. One webhook URL to receive completion notices.
Total audio: 43.5 hours. Diarization on, summary on, DOCX + SRT exports. Pipeline parallelised the queue across workers.
Used Captivate's bulk-update API to add a "Transcript" tab to every old episode page. DOCX downloaded, converted to HTML, pasted as the tab content.
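The feed-to-bulk-job steps above can be sketched roughly as follows. The `/jobs/bulk` path and the Authorization header come from the description; the base URL `https://api.example.com`, the `Bearer` scheme, and every JSON field name are hypothetical placeholders, so check them against the real API reference before use. The request is only built here, not sent.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET


def enclosure_urls(rss_xml):
    """Return the enclosure URL of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    urls = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None and enclosure.get("url"):
            urls.append(enclosure.get("url"))
    return urls


def build_bulk_request(base_url, token, urls, webhook_url):
    """Build (but do not send) a POST request to /jobs/bulk.

    All payload field names are assumptions made for illustration.
    """
    payload = {
        "urls": urls,                 # hypothetical field
        "webhook_url": webhook_url,   # hypothetical field
        "diarization": True,          # hypothetical field
        "summary": True,              # hypothetical field
        "exports": ["docx", "srt"],   # hypothetical field
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/jobs/bulk",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # scheme is an assumption
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would submit it; the completion notices then arrive at the webhook URL rather than in the response.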
Auto-generated chapter markers with timestamps. Edit the topic names, drag the order, export as the description for your hosting service (Captivate, Buzzsprout, Transistor — they all accept timestamped chapters).
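As an illustration of what a timestamped chapter list for an episode description can look like, here is a minimal sketch. The "timestamp, space, title, one per line" layout is a common convention and an assumption here; the exact syntax each hosting service expects is not specified above.

```python
def format_timestamp(seconds):
    """Format seconds as MM:SS, or H:MM:SS once past the first hour."""
    seconds = int(seconds)
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def chapters_to_description(chapters):
    """Render (start_seconds, title) tuples as one chapter per line,
    sorted by start time."""
    ordered = sorted(chapters, key=lambda chapter: chapter[0])
    return "\n".join(
        f"{format_timestamp(start)} {title}" for start, title in ordered
    )


if __name__ == "__main__":
    print(chapters_to_description([
        (0, "Intro"),
        (3725, "Listener questions"),
        (95, "Meet the guest"),
    ]))
```

Sorting by start time means chapters can be edited or reordered in any sequence before export without producing an out-of-order list.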
Use the AI summary outline as the post structure, paste in the most-quotable transcript blocks, ship in 30 minutes. Major source of inbound search traffic for many podcast networks.
Listener emails asking "which episode did you talk about X?" become 5-second answers. Full-text search across hundreds of episodes with click-to-jump-to-moment.
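To make the idea concrete, here is a naive sketch of search with jump-to-moment over timestamped transcript segments. It is a linear scan for illustration only, not a description of how the platform indexes its archive, and the `Segment` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    episode: str          # hypothetical episode identifier
    start_seconds: float  # where this segment begins in the audio
    speaker: str
    text: str


def search(segments, query):
    """Return (episode, start_seconds) for every segment whose text
    contains all of the query words, case-insensitively."""
    words = query.lower().split()
    hits = []
    for segment in segments:
        haystack = segment.text.lower()
        if all(word in haystack for word in words):
            hits.append((segment.episode, segment.start_seconds))
    return hits
```

Each hit carries the segment's start time, which is what lets a player jump straight to the moment a topic came up.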
Solo monologues, two-person interviews, three-host roundtables — all separated. Rename once per episode. Useful for guest credit and for transcripts that go on the website with attribution.
Many shows now publish to YouTube as well as audio platforms. The same transcript produces SRT and VTT caption files; upload one in YouTube Studio to replace the auto-captions.
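For reference, an SRT file is a series of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time line followed by the caption text and a blank line. A minimal sketch of producing one from timestamped cues, assuming the input is a list of `(start_seconds, end_seconds, text)` tuples (a hypothetical shape, not the platform's export code):

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the timestamp form SRT uses."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Render (start_seconds, end_seconds, text) tuples as SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([
        (0, 2.5, "Welcome back."),
        (2.5, 5.04, "Thanks for having me."),
    ]))
```

WebVTT is close to this but starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.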
Webhook into your post-production flow: upload master, get transcript + show notes back. Saves the manual click for shows that publish weekly. JWT auth, per-key rate limits.
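On the receiving side, a completion handler could look something like the sketch below. The payload shape (`status`, `job_id`, `exports`) is entirely hypothetical, since the webhook format is not described here, and JWT verification is left out because it depends on signing details that are also not given.

```python
import json

WANTED_FORMATS = {"docx", "srt"}


def handle_completion(body):
    """Parse a completion notice and return {format: url} for the exports
    we want to fetch. Returns an empty dict for any other status.

    All field names are assumptions made for illustration.
    """
    event = json.loads(body)
    if event.get("status") != "completed":
        return {}
    exports = event.get("exports", {})
    return {fmt: url for fmt, url in exports.items() if fmt in WANTED_FORMATS}
```

In a real deployment this function would sit behind an HTTP endpoint that authenticates the request first, then hands the returned URLs to whatever step publishes the show notes.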
Most podcasters either upload the post-production master (best quality), or paste a public link (fast, no waiting for the upload). Both work; below are the actual options ordered by how often users pick them.
The post-production needs of a solo podcast, an interview show, and a narrative-edited podcast are different. Pick the closest match for workflow tips.
Studio-recorded podcasts are the easiest case for ASR — controlled environment, decent mics, low background noise. Field-recorded interviews and Zoom-recorded shows are harder. Honest expectations below.
Post-produced files with EQ, compression, and a music intro/outro. The conditions every successful weekly show creates by episode 10.
Interview podcasts where the guest joins via Zoom or Squadcast. The host's local audio is studio-grade; the guest's is internet-quality. Diarization handles it cleanly; word accuracy on the guest side typically lands here.
Some narrative podcasts (the This American Life style) layer music quietly under the speech for emotional pacing. We handle this OK if the music is significantly quieter than the voice. If the mix is roughly equal, the transcript drifts.
Diarization is excellent at two speakers, good at three, and may merge voices at four or more. For a panel show with five hosts, plan a manual speaker-correction pass.
Tier-1 languages have high regional-accent coverage (Glaswegian, Texan, Australian, and Brazilian Portuguese all work well). Heavy accents in tier-2 or tier-3 languages lower accuracy by 5–10 percentage points.
We don't pull from RSS feeds directly — paste an individual episode URL or upload the file. We don't transcribe DRM-protected content (Spotify-exclusive shows, Apple Podcasts subscriptions). For those, you'd need to record the playback and upload the audio.
Time for a weekly podcast producer to find which episode mentioned a topic, down from three hours of re-listening. The single most-cited reason producers stay on the platform after a back-catalogue migration.
60 free minutes per month. Drop your latest master or paste a SoundCloud URL — first transcript and AI show notes in about 10 minutes.