Drop an audio file in your browser and get a clean transcript with optional speaker labels and AI summary, usually faster than realtime. No app to install.
Audio-to-text conversion (also called speech-to-text or transcription) turns recorded human speech into searchable, editable text. Transcription.Solutions does this in your browser: upload an MP3, WAV, M4A, OGG, FLAC, OPUS, or WEBM file, and within minutes you receive a transcript with optional speaker labels, an optional AI summary, and exports in TXT, SRT, VTT, or DOCX. The source audio is permanently deleted from our infrastructure within 24 hours of completion.
Three steps from file to text. No queueing UI, no per-step prompts — paste or drag, walk away, come back to a finished transcript.
Drag a file into the browser or paste a public URL. Supported: MP3, WAV, M4A, OGG, FLAC, OPUS, WEBM. Up to 4 hours per file on Business, 60 minutes on Pro, and 30 minutes on Free.
We split long audio into chunks, run speaker diarization on Pro and Business plans, and assemble the final transcript with speaker labels. Language is auto-detected; you can also force a specific language if needed.
Read the transcript inline, copy individual speaker turns, run an AI summary, or export to TXT, SRT, VTT, or DOCX. Source audio is deleted within 24 hours; the transcript stays in your account until you delete it.
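The chunking step can be pictured with a short sketch. This is an illustration only: the chunk length (10 minutes) and the overlap (5 seconds) are assumptions chosen for the example, not documented values, and `plan_chunks` is a hypothetical helper rather than part of our API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    index: int
    start_s: float  # chunk start, in seconds from the beginning of the file
    end_s: float    # chunk end, in seconds from the beginning of the file


def plan_chunks(duration_s: float, chunk_s: float = 600.0,
                overlap_s: float = 5.0) -> list[Chunk]:
    """Cover [0, duration_s] with fixed-length chunks that overlap slightly.

    chunk_s and overlap_s are illustrative assumptions, not documented values.
    """
    if duration_s <= 0:
        return []
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap_s must be non-negative and smaller than chunk_s")

    chunks: list[Chunk] = []
    step = chunk_s - overlap_s  # each chunk starts this far after the previous one
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append(Chunk(len(chunks), start, end))
        if end >= duration_s:
            break  # the last chunk reached the end of the file
        start += step
    return chunks
```

A small overlap like this is one common way to avoid cutting a word in half at a chunk boundary; the overlapping text then has to be de-duplicated when the transcript is assembled.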
Sentence boundaries, capitalisation, and punctuation are inserted automatically. Filler words (um, uh) are kept by default and can be filtered out on export.
Two or more voices are separated and labelled (Speaker 1, Speaker 2). Available on Pro and Business plans. You can rename each speaker manually.
On Pro and Business plans we generate key points, decisions, and action items from the transcript. Useful for meeting recaps and interview prep.
Export SRT or VTT for video subtitles, plus a clean DOCX for reports and a plain TXT for downstream tooling.
Every transcript is full-text searchable inside your account. Find a quote across hundreds of files in seconds.
Available on Pro and Business plans (and on Free for evaluation, see /docs/api). Per-key rate limits, signed webhook callbacks, JWT auth.
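For the signed webhook callbacks, a receiver would typically verify the signature before trusting the payload. The sketch below assumes the signature is a hex-encoded HMAC-SHA256 of the raw request body, computed with a shared secret. That scheme is an assumption made for illustration, and `sign_payload` and `is_valid_signature` are hypothetical helpers; check /docs/api for the actual algorithm and header name.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, body: bytes, received_hex: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_payload(secret, body)
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(expected.encode("utf-8"),
                               received_hex.encode("utf-8"))
```

The check should run on the raw bytes of the request body, before any JSON parsing, because re-serialising the payload can change whitespace or key order and invalidate the signature.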
All formats below are accepted directly — no need to convert before upload. Maximum file size depends on plan: 100 MB on Free, 500 MB on Pro, 2 GB on Business.
The most common audio format on the web. Compressed, small files, supported everywhere. We accept any MP3 bitrate from 32 kbps to 320 kbps.
Uncompressed studio-quality audio. Often used for podcast masters and field recordings before publishing. Files are larger but transcription quality is identical to MP3.
Apple's default voice memo format and the audio track of MP4 video. Common on iPhone, iPad, and Mac. Works without conversion.
Open-format containers used by WhatsApp voice notes, Telegram audio messages, and modern web recorders. We accept both.
Lossless compression, popular for archival and audiophile recordings. Uploads are slightly slower because files are larger; transcription accuracy is the same as with a good-quality MP3 of the same recording.
Browser-recorded audio (e.g. from our own in-app recorder, Google Meet exports). Direct upload, no re-encoding required.
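If you upload from a script, a pre-flight check against the accepted formats and per-plan size limits listed above can save a failed upload. This is a minimal sketch: `check_upload` is a hypothetical helper, and it assumes decimal megabytes (1 MB = 1,000,000 bytes), which this page does not specify.

```python
from pathlib import Path

MB = 1_000_000  # assumption: decimal megabytes; the page does not say MB vs MiB
GB = 1_000 * MB

ACCEPTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".opus", ".webm"}
MAX_BYTES_BY_PLAN = {"free": 100 * MB, "pro": 500 * MB, "business": 2 * GB}


def check_upload(filename: str, size_bytes: int, plan: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    limit = MAX_BYTES_BY_PLAN.get(plan.lower())
    if limit is None:
        problems.append(f"unknown plan: {plan}")
    elif size_bytes > limit:
        problems.append(f"file is {size_bytes} bytes; the {plan} limit is {limit}")
    return problems
```

The check looks only at the file extension and size. It does not inspect the audio itself, so a mislabelled file would still pass.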
100+ languages with automatic detection. The user interface is English-only; transcripts are returned in the original spoken language. Mixed-language audio (e.g. Spanish-English code-switching) typically transcribes well in the dominant language, though passages in the second language may be rendered inconsistently. Force a specific language in advanced settings if auto-detect picks the wrong one.
English (US/UK), Spanish, German, French, Italian, Portuguese (BR/PT), Dutch, Polish, Russian, Ukrainian, Turkish, Greek, Czech, Swedish, Norwegian, Danish, Finnish, Hungarian, Romanian.
Mandarin Chinese, Cantonese, Japanese, Korean, Hindi, Bengali, Tamil, Telugu, Marathi, Thai, Vietnamese, Indonesian, Malay, Filipino/Tagalog, Urdu.
Arabic (multiple dialects), Hebrew, Persian/Farsi, Pashto, Swahili, Amharic, Afrikaans, Yoruba, Igbo, Hausa.
Welsh, Catalan, Basque, Galician, Esperanto, Latvian, Lithuanian, Estonian, Slovenian, Slovak, Serbian, Croatian, Bulgarian, Macedonian, Albanian, Maltese, Icelandic.
On clear audio with one or two speakers, accuracy reaches 95%+ in most major languages. Quality drops with background noise, heavy accents, overlapping speech, or low-bitrate phone-call audio. The transcript is a starting point — for legal or medical use, expect to do a human pass.
Studio-quality audio, single or two clear speakers, no background music, common languages (English, Spanish, French, German, Mandarin).
Conference calls, podcast interviews, meeting recordings with mid-quality microphones. Some words misheard, mostly trivial to fix on a single read.
Phone calls (8 kHz audio), heavy accents, overlapping speakers, background noise, less-common languages. Speaker diarization may merge or split voices.
Music, songs with vocals, multi-speaker debates with crosstalk, or audio recorded at very low volume. Output may need significant cleanup.
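To see which tier your own recordings fall into, you can correct a short sample by hand and compare it with the automatic transcript. This page does not state how the accuracy figure is measured; the sketch below uses word error rate (WER), a common metric for speech recognition, as an assumption. `word_error_rate` is a hypothetical helper, not part of our product.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # (all of the previous row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

If accuracy is read as 1 minus WER, a 95% figure corresponds to roughly one wrong word in twenty. The sketch splits on whitespace and does not strip punctuation, so a missing comma attached to a word counts as an error.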
The answers we give to the same questions every week. Anything missing? Email support@transcription.solutions.
MP3, WAV, M4A, OGG, FLAC, OPUS, and WEBM. Maximum file size is 100 MB on Free, 500 MB on Pro, and 2 GB on Business. Maximum file duration is 30 minutes on Free, 60 minutes on Pro, and 4 hours on Business.
On clear audio with one or two speakers, accuracy reaches 95%+ in most major languages. Quality drops with background noise, heavy accents, overlapping speech, or low-bitrate phone audio. For legal or medical work, plan a human review pass.
Usually faster than realtime. A 30-minute meeting typically completes in 3-5 minutes, a 1-hour podcast in 6-10 minutes. Long files are split into chunks and processed in parallel.
Yes, on Pro ($19/month) and Business ($49/month) plans. Two or more voices are separated and labelled Speaker 1, Speaker 2, etc. You can rename each speaker after the fact.
Yes — 100+ languages with automatic detection, including Spanish, French, German, Portuguese, Mandarin, Japanese, Hindi, Arabic, and many less-common languages. The user interface is English-only; transcripts are returned in the original spoken language.
Source audio is permanently deleted from our infrastructure within 24 hours after transcription completes. Transcripts and summaries stay in your account until you delete them. We do not train models on your data, and we access the upstream speech-to-text provider through paid endpoints that do not use customer audio for training.
Yes. REST API with webhooks, JWT authentication, and per-key rate limits. Available on Pro, Business, and Free for evaluation. See /docs/api for endpoint reference.
Plain text (TXT), subtitle formats (SRT and VTT for video), and Microsoft Word (DOCX) for reports. TXT and DOCX include or omit timestamps based on your selection; SRT and VTT always carry them, because subtitle cues need timing.
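For downstream tooling, it helps to know what a subtitle export looks like. The sketch below renders timed segments as SRT cues. The `(start, end, text)` segment shape and both function names are assumptions for illustration, not our export code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start_s, end_s, text) segments as numbered SRT cues."""
    cues = []
    for number, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # a blank line separates consecutive cues
    return "\n".join(cues)
```

VTT is closely related: the file starts with a `WEBVTT` header line and timestamps use a period instead of a comma before the milliseconds.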
60 free minutes per month, no card required. Upgrade only when you outgrow it.
Start free