Transcribe voice recordings, audio and video, YouTube videos, audio files, video files, MP4 videos, Zoom meetings, Microsoft Teams, Google Meet, interviews, podcasts, lectures, TikTok videos, WhatsApp voice, voice memos, MP3 files, phone calls, sermons into text. In seconds

Speech-to-text & AI transcription software for audio and video. Convert MP3, MP4, or voice to text with speaker labels and AI summary, usually faster than realtime.

Drop your audio or video

MP3 · MP4 · WAV · M4A · MOV · up to 10 hours per file

Paste a link, we'll fetch the audio

YouTube · TikTok · Vimeo · Twitter · SoundCloud · Spotify · 50+ more

Record straight from your browser

Sign-up takes 30 seconds; recording opens right after, in the dashboard.

Free 30 min/mo · No card · 100+ languages · Speaker labels (Pro+) · Files auto-delete in 24h

Free tier: 30 minutes per month, up to 30 min per file. No card required.

100+ languages auto-detected. Auto-detect with manual override.
95%+ accuracy on clean audio. Most major languages, one or two speakers.
10 h max file length on Business. 10 h on Pro · 30 min on Free.
~30× faster than realtime. A 60-min file is typically back in 2–3 min.

This is the dashboard

Click around. It's the real thing

Tabs work. Action items toggle. This is exactly what loads in your account after a job finishes — same layout, same controls.

app.transcription.solutions / jobs / interview-ari-2026-04-26

Summary

auto-snapshot · saved
TL;DR

Founders need post-call content, not just transcripts. Tools force them to stitch 5 apps together.

318 words · 2 speakers · 58 / 42 · 5 topics

Key points 3

  • 01 Gap exists between raw recordings and shippable content
  • 02 Show notes, social clips, blog drafts are expected by call's end
  • 03 Current tooling fragmented across 5+ apps

Action items 2

  • Investigate single-pipeline approach to replace 5-app stitch
  • Mock how show-note draft would look from this transcript
Topics: founder workflow · post-call content · tooling fragmentation · show notes · single pipeline

Diarized transcript

4 lines · 2 speakers · 30s clip
00:12 Speaker A: So what I keep hearing from founders is this gap between raw recordings and content you can actually ship.
00:27 Speaker B: Exactly. Nobody wants another transcript; they want a show note, a clip, a blog draft, by the time the call ends.
00:41 Speaker A: Right, and the tooling right now forces you to stitch five apps together to get there.
00:54 Speaker B: One pipeline, one place. That's the bet.

Speaker analysis

Stereo channel-split · diarization on mono
Speaker A · 58% airtime · 2 turns · 14s talk time
"…this gap between raw recordings and content you can actually ship."
Speaker B · 42% airtime · 2 turns · 10s talk time
"One pipeline, one place. That's the bet."
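
The airtime figures above are consistent with each speaker's talk time divided by total talk time (14 s and 10 s out of 24 s). A minimal Python sketch of that arithmetic follows. It assumes diarized segments arrive as (speaker, start, end) tuples; the segment boundaries are invented so the totals match the demo, and `airtime` is an illustrative helper, not part of the product.

```python
from collections import defaultdict

def airtime(segments):
    """Return {speaker: (talk_seconds, turns, percent_of_total_talk_time)}."""
    talk = defaultdict(float)
    turns = defaultdict(int)
    for speaker, start, end in segments:
        talk[speaker] += end - start
        turns[speaker] += 1
    total = sum(talk.values())
    if total == 0:
        return {}
    return {s: (talk[s], turns[s], round(100 * talk[s] / total)) for s in talk}

# Hypothetical segment boundaries, chosen to total 14 s (A) and 10 s (B).
segments = [
    ("A", 12.0, 20.0),
    ("B", 27.0, 33.0),
    ("A", 41.0, 47.0),
    ("B", 54.0, 58.0),
]
stats = airtime(segments)
for speaker, (seconds, n_turns, pct) in stats.items():
    print(f"Speaker {speaker}: {pct}% airtime, {n_turns} turns, {seconds:.0f}s")
```

Percentages here are shares of spoken time only, so silence between turns does not count toward either speaker.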

Export formats

Every plan, every format · 7 outputs · no watermarks · TXT · SRT · MD · JSON · VTT · DOCX · PDF
TXT (Plain text): Clean text dump · all plans
SRT (SubRip subtitle): Timestamped subtitle · all plans
MD (Markdown): Speaker headers + summary · all plans
JSON (Structured JSON): Public schema · for API workflows · all plans
VTT (WebVTT subtitle): HTML5 video player format · all plans
DOCX (Word document): Speaker headers + timestamps · all plans
PDF (Branded PDF): Print-ready · summary & speakers · all plans
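
For readers unfamiliar with the SRT export, the sketch below shows how timestamped, speaker-labeled lines map onto SubRip cues: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the text, then a blank line. It is a rough Python illustration only. The cue end times are assumed (the demo shows start times only), and the service's real SRT output may split or label cues differently.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(cues):
    """Render (start, end, speaker, text) tuples as an SRT document."""
    blocks = []
    for index, (start, end, speaker, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{speaker}: {text}\n"
        )
    return "\n".join(blocks)

# Start times follow the demo transcript; end times are assumptions.
cues = [
    (12.0, 26.0, "Speaker A", "So what I keep hearing from founders is this gap."),
    (54.0, 58.0, "Speaker B", "One pipeline, one place. That's the bet."),
]
srt_text = to_srt(cues)
print(srt_text)
```

WebVTT uses the same cue layout with a `WEBVTT` header line and a period instead of a comma before the milliseconds.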

Sample output · 30 seconds of a podcast clip

One file. Eight things back

Hover or tap any output to see what it actually looks like. Same 30-second podcast clip in the center, eight artifacts derived from it.

Transcript

Punctuated · timestamped

00:12 Speaker A
So what I keep hearing from founders is this gap…
AI summary

TL;DR · key points

Founders need post-call content, not just transcripts. Tools force them to stitch 5 apps together.
Speakers

Diarization · Pro+

Stereo channel-split for two-person calls. Mono diarization for everything else.
100+ languages

Auto-detect

Research-grade ASR. Force a specific language if auto-detect picks the wrong one.
interview-ari-2026-04-26.mp3
30-second clip · 2 speakers
100+ langs · auto-detect · 95%+ accuracy
Transcript · 30s window
00:12 A: So what I keep hearing from founders is this gap.
00:14 A: The call ends, the actual work begins.
00:18 B: Right, post-call eats the day.
00:21 A: Tools assume the transcript is the deliverable.
00:24 A: It's the input.
00:27 B: So you stitch five apps together by hand.
AI summary
TL;DR: Founders need post-call content, not raw transcripts. Today's tools force a 5-app workflow.
Key points
  • Transcript is the input, not the deliverable
  • Action items beat raw text
  • One pipeline beats stitched-together SaaS
Diarization · 2 speakers detected (Speaker A, Speaker B) · timeline 0:00 to 0:30
Stereo channel split · 62% / 38% turn share
Language detection
English (en-US) · 99.2%
Other candidates
en-GB English (UK) · 0.6%
en-AU English (AU) · 0.2%
Detected on upload · override anytime · 100+ languages
Exports · 7 formats · no watermarks
TXT · interview-ari-2026-04-26.txt · 34 KB
SRT · interview-ari-2026-04-26.srt · 52 KB
VTT · interview-ari-2026-04-26.vtt · 51 KB
MD · interview-ari-2026-04-26.md · 38 KB
JSON · interview-ari-2026-04-26.json · 71 KB
DOCX · interview-ari-2026-04-26.docx · 91 KB
PDF · interview-ari-2026-04-26.pdf · 146 KB
URL ingest · 1500+ sites supported
youtube.com/watch?v=Hk8L4mD2pXv
Fetch metadata · 0.3s
Download audio · 4.2 MB
Extract speech · stereo · 44 kHz
Queue for ASR
REC 00:42 / 60:00
Safari on iPhone · Chrome on desktop
Auto-stops at 60 min — upload longer files
Live job status
Upload · 0:08
Audio extract · 0:02
ASR · AssemblyAI U-2 · 47%
Diarization · queued
AI summary · queued
Export render · queued
Status pushed step-by-step · no refresh needed
Exports

7 formats · no watermarks

TXT · SRT · MD · JSON · VTT · DOCX · PDF
URL ingest

YouTube · TikTok · Instagram

Paste any video link. We download once, transcribe, and discard the source.
Browser record

Mic in iPhone Safari · Chrome

Hit record, talk, hit stop. No app install. Up to 60 min per recording.
Real-time progress

WebSocket job status

Live status from upload → ASR → diarization → done. No polling, no waiting blind.
Who's using this

Transcription software built for the people who actually do the work

Three patterns we see weekly. The pipeline doesn't change — what you ship after it does.

01 Podcasters

Episode → show notes → shipped

A long interview becomes a 5-line summary, four chapters, a transcript with speaker labels, and an SRT for short-form clips — one job, every output you actually ship.

7 formats: TXT · SRT · MD · JSON · VTT · DOCX · PDF
02 Researchers

Long-form interviews, cited by timestamp

Three-hour Zoom recordings with two voices, end-to-end. Speaker diarization on Pro. Cite by timestamp from the DOCX export. No more "where did they say that…" scrubbing.

95%+ ASR accuracy on clean audio
03 Small teams

Recordings → action items → assignees

No auto-join, no calendar permissions, no "agent in your meeting." Drop the recording, share the transcript. Action items extracted, named, ready for triage.

2,500 minutes per month on Business plan
Inputs we accept

Drop a file, paste a link, or call our API

Six ways in, working today. Each pill is a real ingest path that ships in production right now.

YouTube · TikTok · Instagram · Direct media URL · Public REST API · Webhooks
Pricing

Plans that actually fit

All plans include diarization-quality ASR. Higher tiers unlock larger files, queue priority, and AI summary.

Monthly · Annual (−50%)
Free
$0 forever
No card · no trial expiry

For trying out, occasional one-offs, short clips.

  • 30 minutes per month
  • Up to 30 min per file
  • All 7 export formats · no watermarks
  • Low-priority queue
Start free →
Email verification required
Most popular
Pro
$19 / month
Cancel anytime · $0.04 / min overage

For people running interviews, podcasts, or repeated long-form work.

  • 600 minutes per month
  • Up to 10 hours per file
  • Speaker labels + AI summary
  • Action items + topic tags
  • “Make readable” paragraph polish
  • Translation · webhook delivery
  • Standard queue priority
Choose Pro →
Overage $0.04 / min · cancel anytime
Business
$49 / month
Cancel anytime · $0.02 / min overage

For teams, agencies, and ops running on volume.

  • 2,500 minutes per month
  • Up to 10 hours per file
  • Everything in Pro · 50 translations / mo
  • High-priority queue
  • Public REST API · per-key rate-limit tier
  • Priority email support
Choose Business →
Overage $0.02 / min · cancel anytime

Annual billing saves 50% · Refund policy · No card required for Free
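
The plan numbers above imply a simple monthly cost: the base price plus any minutes beyond the included quota at the plan's overage rate. A rough Python sketch of that arithmetic follows, using the listed monthly prices. It assumes overage is billed linearly per whole minute, which the page does not spell out; `PLANS` and `monthly_cost_cents` are illustrative names, and amounts are kept in integer cents to avoid floating-point rounding.

```python
# Listed monthly prices; the linear per-minute overage model is an assumption.
PLANS = {
    "Pro": {"base_cents": 1900, "included_min": 600, "overage_cents": 4},
    "Business": {"base_cents": 4900, "included_min": 2500, "overage_cents": 2},
}

def monthly_cost_cents(plan: str, minutes_used: int) -> int:
    """Base price plus overage for minutes beyond the included quota."""
    p = PLANS[plan]
    extra_minutes = max(0, minutes_used - p["included_min"])
    return p["base_cents"] + extra_minutes * p["overage_cents"]

for minutes in (600, 1000, 1350, 2500):
    pro = monthly_cost_cents("Pro", minutes) / 100
    business = monthly_cost_cents("Business", minutes) / 100
    print(f"{minutes:>5} min  Pro ${pro:.2f}  Business ${business:.2f}")
```

Under these assumptions, Pro with overage reaches the Business price at about 1,350 minutes per month.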

Same audio · two outputs

Free gives you words.
Pro ships deliverables.

Same audio, same model. The difference is everything we do after the transcription finishes.

Free output

So what I keep hearing from founders is this gap between raw recordings and the content they can actually ship. Exactly, nobody wants another transcript, they want a show note, a clip, a blog draft, by the time the call ends. Right, and the tooling right now forces you to stitch five apps together to get there. One pipeline, one place. That's the bet. We've been seeing this pattern for months — the audio comes in clean, but the workflow downstream is held together with screenshots and copy-paste between Notion and Otter and Zapier and whatever else happens to be open in another tab when the call wraps and the deadline is in twenty minutes…

Plain transcript · No speaker labels · No summary · All 7 formats

Next: paste somewhere, structure it, write the summary yourself, pull out action items by hand.

Pro output
TL;DR

Founders don't need transcripts — they need post-processing. One pipeline beats stitching five apps.

00:12 Speaker A: So what I keep hearing from founders is this gap between raw recordings and content you can actually ship.
00:27 Speaker B: Exactly. Nobody wants another transcript; they want a show note, a clip, a blog draft, by the time the call ends.
00:41 Speaker A: Right, and the tooling right now forces you to stitch five apps together to get there.
00:54 Speaker B: One pipeline, one place. That's the bet.
Action items · 2
  1. Try a unified pipeline — audio in, notes & exports out, one job.
  2. Replace the Otter + Notion + Zapier stack before the next call.
TL;DR (1 line) · Speakers (diarized) · Action items (2) · “Make readable” polish

Next: copy TL;DR into Slack, attach the DOCX to email, ship the clip. Done before the call notes get cold.

— Same audio · Same model · The difference is in the post-processing —

In the wild

What our users won't shut up about

Unprompted reviews from signed-in users. We don't run review-incentive campaigns.

Maya Reyes · @mayarcuts · podcaster

Podcaster opens 5 tabs to ship one episode. One job in — show notes, transcript, clip-ready SRT out. That's it.

Apr 18 · 1 job in

Dr. Diego Alarcón · @diegoalarcon · researcher

14 long-form interviews through diarization. DER 0.95 on clean audio is real. DOCX exports go straight into the paper draft.

Apr 22 · DER 0.95

Sora Okafor · @sorawrites · writer

26 voice memos. 3 TikTok URLs. Newsletter draft outline in 11 minutes. Try beating that with Otter — I'll wait.

Apr 19 · 11 min

Jules Verstappen · @julesverops · ops

Webhook + action-items extraction killed our weekly-recap-doc thing. Whole loop is 2 minutes now.

Apr 23 · 2 min loop

Rohan Kapoor · @rohan_legal · counsel

Deposition recordings → diarized transcript → cited PDF. Used to outsource this overseas. Now it's one upload.

Apr 24 · 1 upload

Elena Marchetti · @elenamarch · sales

Italian sales calls → English summaries. My team finally reads them. Tiny detail, huge impact.

Apr 27 · IT → EN

Tomi Nakamura · @tominaka · translator

Japanese auto-detect just works. The serif italic on this site is, however, an unrelated design crime I respect.

Apr 21 · auto-detect

Priya Lakshmi · @priyalbuilds · founder

REST API + per-key rate-limit = our internal voice-memo pipeline. Took 30 minutes to wire. $19/mo for the whole team.

Apr 25 · $19/mo

Fatima Al-Rashid · @fatima_writes · journalist

24h auto-delete is the feature I didn't know I wanted until I checked the privacy page of every competitor.

Apr 26 · 24h delete

FAQ

Questions people actually ask

How accurate is the transcription?

On clear audio with one or two speakers, accuracy reaches 95%+ in most major languages. Quality drops with background noise, heavy accents, or overlapping speech.

What languages?

100+ languages with auto-detect. You can also force a specific language if auto-detect picks the wrong one. UI is English-only — multi-language interface is on the planned list.

How long do you keep my files?

Source media (the audio/video you uploaded) is deleted from our infrastructure within 24 hours after transcription completes. The transcript and summary stay in your account until you delete them — or 30 days after you delete your account. Our speech-to-text providers (AssemblyAI primary, OpenAI fallback) process audio under their own retention policies — see /privacy for the full subprocessor list.

Do you train models on my recordings?

No. Our upstream ASR provider has training opt-out by default for paid endpoints — we use those. We add nothing on top: no own models trained on your transcripts, no shadow analytics.

What happens if a job fails?

Your minutes are not deducted. Most failures (private URL, file too long, codec we don't support) come with a clear error message and retry guidance.

Can I cancel?

Yes — anytime in the Stripe customer portal. You keep your plan through the paid period, then drop to Free at the next renewal date.

What's the refund policy?

Full refund within 7 days if you've used less than 10% of your plan minutes. After that, pro-rated refunds for the unused portion. Email [email protected].

Do you have an API?

Yes — REST API is live, webhooks too. API key auth is on the next-up list. Rate limits per plan tier. Docs at /docs/api once you have an account.

Security & privacy

The boring stuff, handled

No SOC 2 sticker. If we don't ship a control yet, we don't put a badge on it.

100% auto-deletion of source files within 24 hours, every time
0 trackers · ads · resale. Your audio is never used to train models
1× click to delete. Account + all data wiped within 30 days

Source files erased in 24h

Audio and video you upload disappear within 24 hours of the job finishing. Hard contract, not a setting.

No training on your data

Upstream ASR provider has training opt-out by default — we use those endpoints. We add nothing on top.

AES-256 + TLS 1.3

Encryption at rest and in transit, since day one. HSTS enforced.

GDPR-aligned

EU access / deletion / portability rights honored. DPA on request.

One-click deletion

Settings → Delete account. All data wiped within 30 days. No support ticket required.

Subprocessor list

Full vendor list with purpose at /privacy. No surprise vendors.

— READY WHEN YOU ARE

Drop a file. Get a transcript before your coffee gets cold

30 free minutes a month, up to 30 min per file. No credit card, no card-after-trial, no asterisks. Cancel any plan anytime in one click.

Free / month: 30 min · Languages: 100+ · Export formats: 7
MP3 · MP4 · WAV · M4A · MOV · MKV · WEBM · YouTube · TikTok · Instagram · Browser record