Episode → show notes → shipped
A long interview becomes a 5-line summary, four chapters, a transcript with speaker labels, and an SRT for short-form clips — one job, every output you actually ship.
VTT · DOCX · PDF
Speech-to-text & AI transcription software for audio and video. Convert MP3, MP4, or voice to text with speaker labels and an AI summary, usually faster than real time.
Free tier: 30 minutes per month, up to 30 min per file. No card required.
Tabs work. Action items toggle. This is exactly what loads in your account after a job finishes — same layout, same controls.
Founders need post-call content, not just transcripts. Tools force them to stitch 5 apps together.
Clean text dump · all plans
Timestamped subtitle · all plans
Speaker headers + summary · all plans
Public schema · for API workflows · all plans
HTML5 video player format · all plans
Speaker headers + timestamps · all plans
Print-ready · summary & speakers · all plans
Hover or tap any output to see what it actually looks like. Same 30-second podcast clip in the center, eight artifacts derived from it.
Three patterns we see weekly. The pipeline doesn't change; what you ship after it does.
A long interview becomes a 5-line summary, four chapters, a transcript with speaker labels, and an SRT for short-form clips — one job, every output you actually ship.
Three-hour Zoom recordings with two voices, end-to-end. Speaker diarization on Pro. Cite by timestamp from the DOCX export. No more "where did they say that…" scrubbing.
No auto-join, no calendar permissions, no "agent in your meeting." Drop the recording, share the transcript. Action items extracted, named, ready for triage.
Six ways in, working today. Each pill is a real ingest path that ships in production right now.
All plans include diarization-quality ASR. Higher tiers unlock larger files, queue priority, and AI summary.
For trying out, occasional one-offs, short clips.
For people running interviews, podcasts, or repeated long-form work.
For teams, agencies, and ops running on volume.
Annual billing saves 50% · Refund policy · No card required for Free
Same audio, same model. The difference is everything we do after the transcription finishes.
So what I keep hearing from founders is this gap between raw recordings and the content they can actually ship. Exactly, nobody wants another transcript, they want a show note, a clip, a blog draft, by the time the call ends. Right, and the tooling right now forces you to stitch five apps together to get there. One pipeline, one place. That's the bet. We've been seeing this pattern for months — the audio comes in clean, but the workflow downstream is held together with screenshots and copy-paste between Notion and Otter and Zapier and whatever else happens to be open in another tab when the call wraps and the deadline is in twenty minutes…
Next: paste somewhere, structure it, write the summary yourself, pull out action items by hand.
Founders don't need transcripts — they need post-processing. One pipeline beats stitching five apps.
Next: copy TL;DR into Slack, attach the DOCX to email, ship the clip. Done before the call notes get cold.
— Same audio · Same model · The difference is in the post-processing —
Unprompted reviews from signed-in users. We don't run review-incentive campaigns. Hover to pause.
Podcaster opens 5 tabs to ship one episode. One job in — show notes, transcript, clip-ready SRT out. That's it.
14 long-form interviews through diarization. DER 0.95 on clean audio is real. DOCX exports go straight into the paper draft.
26 voice memos. 3 TikTok URLs. Newsletter draft outline in 11 minutes. Try beating that with Otter — I'll wait.
Webhook + action-items extraction killed our weekly-recap-doc thing. Whole loop is 2 minutes now.
Deposition recordings → diarized transcript → cited PDF. Used to outsource this overseas. Now it's one upload.
Italian sales calls → English summaries. My team finally reads them. Tiny detail, huge impact.
Japanese auto-detect just works. The serif italic on this site is, however, an unrelated design crime I respect.
REST API + per-key rate-limit = our internal voice-memo pipeline. Took 30 minutes to wire. $19/mo for the whole team.
24h auto-delete is the feature I didn't know I wanted until I checked the privacy page of every competitor.
On clear audio with one or two speakers, accuracy reaches 95%+ in most major languages. Quality drops with background noise, heavy accents, or overlapping speech.
100+ languages with auto-detect. You can also force a specific language if auto-detect picks the wrong one. The UI is English-only for now; a multi-language interface is planned.
Source media (the audio/video you uploaded) is deleted from our infrastructure within 24 hours after transcription completes. The transcript and summary stay in your account until you delete them — or 30 days after you delete your account. Our speech-to-text providers (AssemblyAI primary, OpenAI fallback) process audio under their own retention policies — see /privacy for the full subprocessor list.
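The two retention windows above reduce to simple date arithmetic. The sketch below is illustrative only: the function and constant names are hypothetical, and it computes the latest deletion time implied by the policy text, not when deletion actually runs.

```python
from datetime import datetime, timedelta, timezone

# Windows taken from the policy text; names are hypothetical.
SOURCE_MEDIA_TTL = timedelta(hours=24)   # after transcription completes
ACCOUNT_WIPE_TTL = timedelta(days=30)    # after the account is deleted


def source_media_deadline(job_completed_at: datetime) -> datetime:
    """Latest time uploaded audio/video may still exist."""
    return job_completed_at + SOURCE_MEDIA_TTL


def transcript_deadline(account_deleted_at):
    """Latest time transcripts may still exist, or None while the account
    is active (they then stay until the user deletes them)."""
    if account_deleted_at is None:
        return None
    return account_deleted_at + ACCOUNT_WIPE_TTL
```

For a job that finishes at 12:00 UTC on 1 January, `source_media_deadline` returns 12:00 UTC on 2 January.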
No. Our upstream ASR provider has training opt-out by default for paid endpoints — we use those. We add nothing on top: no own models trained on your transcripts, no shadow analytics.
Your minutes are not deducted. Most failures (private URL, file too long, codec we don't support) come with a clear error message and retry guidance.
Yes — anytime in the Stripe customer portal. You keep your plan through the paid period, then drop to Free at the next renewal date.
Full refund within 7 days if you've used less than 10% of your plan minutes. After that, pro-rated refunds for the unused portion. Email [email protected].
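As a worked example of the refund rule, here is a small sketch under stated assumptions. It reads "after that" as "any request outside the full-refund window" (past day 7, or 10% or more of minutes used), and it pro-rates by the share of unused plan minutes. Both readings are assumptions, as are the function name and the 600-minute plan used in the usage note; the actual refund policy page is authoritative.

```python
def refund_amount(price_paid: float, plan_minutes: float,
                  minutes_used: float, days_since_purchase: int) -> float:
    """Estimate a refund under the assumed reading of the policy."""
    if plan_minutes <= 0:
        raise ValueError("plan_minutes must be positive")

    # Clamp usage into [0, plan_minutes] so the fraction stays in [0, 1].
    used = min(max(minutes_used, 0), plan_minutes)
    used_fraction = used / plan_minutes

    # Full refund: within 7 days AND less than 10% of plan minutes used.
    if days_since_purchase <= 7 and used_fraction < 0.10:
        return round(price_paid, 2)

    # Otherwise: pro-rated refund for the unused portion (assumption).
    return round(price_paid * (1 - used_fraction), 2)
```

With a hypothetical $20 plan of 600 minutes, using 30 minutes and asking on day 3 returns the full $20, while using 150 minutes returns $15 for the unused 75%.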
Yes. The REST API and webhooks are live. API key auth is on the next-up list. Rate limits vary by plan tier. Docs are at /docs/api once you have an account.
No SOC 2 sticker. If we haven't shipped a control yet, we don't put a badge on it.
Audio and video you upload disappear within 24 hours of the job finishing. Hard contract, not a setting.
Upstream ASR provider has training opt-out by default — we use those endpoints. We add nothing on top.
Encryption at rest and in transit, since day one. HSTS enforced.
EU access / deletion / portability rights honored. DPA on request.
Settings → Delete account. All data wiped within 30 days. No support ticket required.
Full vendor list with purpose at /privacy. No surprise vendors.
30 free minutes a month, up to 30 min per file. No credit card, no card-after-trial, no asterisks. Cancel any plan anytime in one click.