Yes. The REST API is live, with webhooks. Authentication is by API key, and rate limits are set per key by plan tier. Documentation is at /docs/api.
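For clients that run into a per-key rate limit, a small retry helper can be useful. The sketch below is an assumption-heavy illustration, not the service's documented behavior: it assumes a rate-limited request is answered with the standard HTTP 429 status and, optionally, a `Retry-After` header given in seconds. The function name `retry_delay` and its parameters are hypothetical; check /docs/api for the actual contract.

```python
import random
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Return how many seconds to wait before retrying, or None if no retry is needed.

    Assumption: the API signals rate limiting with HTTP 429 (not confirmed by the page).
    """
    if status != 429:
        return None
    # Honor a numeric Retry-After header if the server sent one (capped locally).
    retry_after = headers.get("Retry-After", "")
    if retry_after.isdigit():
        return min(float(retry_after), cap)
    # Otherwise fall back to exponential backoff with full jitter.
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

The header lookup here is case-sensitive because it takes a plain mapping; most HTTP client libraries expose case-insensitive header objects, which work unchanged.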

Transcribe
voice recordings, audio and video, YouTube videos, audio files, video files, MP4 videos, Zoom meetings, Microsoft Teams, Google Meet, interviews, podcasts, lectures, TikTok videos, WhatsApp voice, voice memos, MP3 files, phone calls, sermons
into text. In seconds

Speech-to-text & AI transcription software for audio and video. Convert MP3, MP4, or voice to text with speaker labels and AI summary, usually faster than realtime.

Drop your audio or video

MP3 · MP4 · WAV · M4A · MOV · up to 10 hours per file

Paste a link, we'll fetch the audio

YouTube · TikTok · Vimeo · Twitter · SoundCloud · Spotify · 50+ more

Record straight from your browser

Free 30 min/mo · No card · 100+ languages · Speaker labels (Pro+) · Files auto-delete in 24h

Free tier: 30 minutes per month, up to 30 min per file. No card required.

100+

Languages auto-detected

Auto-detect with manual override.

95%+

Accuracy on clean audio

Most major languages, one or two speakers.

10h

Max file length on Business

10 h on Pro · 30 min on Free.

~30×

Faster than realtime

A 60-min file typically back in 2–3 min.
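The turnaround figure follows from simple division. As a rough sketch, assuming processing time scales linearly with audio length (the function name and the default speedup are illustrative, not part of the product):

```python
def estimated_minutes(audio_minutes: float, speedup: float = 30.0) -> float:
    """Estimate processing time, assuming a constant speedup over realtime."""
    return audio_minutes / speedup


print(estimated_minutes(60))      # 2.0
print(estimated_minutes(60, 20))  # 3.0
```

A 60-minute file at 30× comes back in about 2 minutes; the quoted 2 to 3 minute range corresponds to an effective speedup of roughly 20× to 30×.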

This is the dashboard

Click around. It's the real thing

Tabs work. Action items toggle. This is exactly what loads in your account after a job finishes — same layout, same controls.

app.transcription.solutions / jobs / interview-ari-2026-04-26

Summary

auto-snapshot · saved

TL;DR

Founders need post-call content, not just transcripts. Tools force them to stitch 5 apps together.

318 words · 2 speakers · 58 / 42 · 5 topics

Key points 3

01 Gap exists between raw recordings and shippable content
02 Show notes, social clips, blog drafts: expected by call's end
03 Current tooling fragmented across 5+ apps

Action items 2

Investigate single-pipeline approach to replace 5-app stitch
Mock how show-note draft would look from this transcript

Topics: founder workflow · post-call content · tooling fragmentation · show notes · single pipeline

Diarized transcript

4 lines · 2 speakers · 30s clip

00:12 Speaker A
So what I keep hearing from founders is this gap between raw recordings and content you can actually ship.

00:27 Speaker B
Exactly. Nobody wants another transcript. They want a show note, a clip, a blog draft, by the time the call ends.

00:41 Speaker A
Right, and the tooling right now forces you to stitch five apps together to get there.

00:54 Speaker B
One pipeline, one place. That's the bet.

Speaker analysis

Stereo channel-split · diarization on mono

Speaker A

58% airtime

Turns

14s

Talk time

…this gap between raw recordings and content you can actually ship.

Speaker B

42% airtime

Turns

10s

Talk time

One pipeline, one place. That's the bet.
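The airtime percentages are consistent with the talk-time figures shown: 14s and 10s out of 24s total. A minimal sketch of that arithmetic, assuming airtime is simply each speaker's share of total talk time (the function name is illustrative; the product may compute it differently):

```python
def airtime_percent(talk_seconds: dict) -> dict:
    """Return each speaker's share of total talk time, rounded to whole percent."""
    total = sum(talk_seconds.values())
    if total == 0:
        return {name: 0 for name in talk_seconds}
    return {name: round(100 * secs / total) for name, secs in talk_seconds.items()}


print(airtime_percent({"Speaker A": 14, "Speaker B": 10}))
# {'Speaker A': 58, 'Speaker B': 42}
```

With more than two speakers, independently rounded shares will not always sum to exactly 100.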

Export formats

Every plan, every format · 7 outputs · no watermarks · TXT · SRT · MD · JSON · VTT · DOCX · PDF

TXT

Plain text

Clean text dump · all plans

SRT

SubRip subtitle

Timestamped subtitle · all plans

MD

Markdown

Speaker headers + summary · all plans

JSON

Structured JSON

Public schema · for API workflows · all plans

VTT

WebVTT subtitle

HTML5 video player format · all plans

DOCX

Word document

Speaker headers + timestamps · all plans

PDF

Branded PDF

Print-ready · summary & speakers · all plans
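To make the subtitle formats concrete, here is a rough sketch of how timestamped segments map onto SRT cues. It is not the service's exporter: the segment tuple shape, the function names, and the rule that each cue ends where the next one starts (with an assumed 5-second duration for the last cue) are all illustrative assumptions.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments, last_duration: float = 5.0) -> str:
    """Build SRT text from (start_seconds, speaker, text) tuples.

    Assumption: a cue ends when the next begins; the final cue
    lasts `last_duration` seconds.
    """
    cues = []
    for i, (start, speaker, text) in enumerate(segments):
        end = segments[i + 1][0] if i + 1 < len(segments) else start + last_duration
        cues.append(
            f"{i + 1}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{speaker}: {text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


print(to_srt([
    (41, "Speaker A", "Right, and the tooling right now forces you to stitch five apps together to get there."),
    (54, "Speaker B", "One pipeline, one place. That's the bet."),
]))
```

WebVTT is close to this layout: the file starts with a `WEBVTT` header line and timestamps use a period instead of a comma before the milliseconds.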


Sample output · 30 seconds of a podcast clip

One file. Eight things back

Hover or tap any output to see what it actually looks like. Same 30-second podcast clip in the center, eight artifacts derived from it.

Transcript

Punctuated · timestamped

00:12 Speaker A
So what I keep hearing from founders is this gap…

AI summary

TL;DR · key points

Founders need post-call content, not just transcripts. Tools force them to stitch 5 apps together.

Speakers

Diarization · Pro+

Stereo channel-split for two-person calls. Mono diarization for everything else.

100+ languages

Auto-detect

Research-grade ASR. Force a specific language if auto-detect picks the wrong one.

interview-ari-2026-04-26.mp3

30-second clip · 2 speakers

100+ langs · auto-detect · 95%+ accuracy

Transcript · 30s window