Focus group transcription — transcribe a focus group with speaker labels for every participant

Focus group transcription. Every speaker labelled, every word.

Drop a focus group recording with 6, 8, even 10 voices. Get a verbatim transcript with each participant labelled, cross-talk tagged, and a DOCX that loads straight into NVivo.

Drop your audio or video

MP3 · WAV · M4A · MP4 · MOV · MKV · OGG · OPUS · FLAC · WEBM — up to 100 MB anonymously

Paste a link, we’ll fetch the audio

YouTube · TikTok · Vimeo · Twitter · SoundCloud · Spotify · 50+ more

Record straight from your browser

No card required · ~90s per 60-min file · SRT, VTT, DOCX, TXT · Files auto-delete in 24h

Eight participants in. Labelled verbatim out.

Focus groups are the hardest diarization case in our queue — similar demographics, similar voices, frequent cross-talk overlap. We tag the overlap inline instead of dropping it, then you rename Speaker 3 → 'Participant_F2' once and it propagates.

Focus group recording · REC · Moderator + 7 participants · 1:23:14

auto-detected en-US · 44 kHz boundary mic · WAV

~90s

Transcript · streaming · 91% accuracy · 8 speakers

So when you first opened the packaging — walk me through what you noticed.

Honestly? The first thing was the smell. Like a hospital, kind of clinical —

Yeah, same. I thought it was supposed to be the lavender one.

Right, and the label says lavender but it really doesn't —

91% on 8-speaker room mic · DOCX (QDA-ready) · SRT · TXT · JSON

This is what loads when the job finishes.

Same layout as the real dashboard — Summary, full Transcript, Speakers tab, Exports. Key points and action items extracted automatically. Auto-tags on every job.

app.transcription.solutions / interview-202.mp3 · Export

Summary 5 · Transcript 1,420 · Speakers 2 · Exports

interview-202.mp3 · 47:08 · 128 kbps CBR · 2 speakers · en-US auto-detected

Founders need post-call content, not just transcripts. Tools force them to stitch 5 apps together.

Sample preview from a founder interview about post-call workflow. Real transcripts look exactly like this — same tabs, same summary block, same key-points / action-items split, same auto-tag chips.

Key points

Gap exists between raw recordings and shippable content — tools stop at transcript.

Show notes, social clips, blog drafts all expected by call's end, not next-day.

Current tooling fragmented across 5 apps — no single pipeline.

Conversion-rate signal flipped a buyer-segment assumption at week 3.

40% of original hypothesis survived — the shape held, mechanics rebuilt.

Action items

Speaker 1: Investigate single-pipeline approach to replace 5-app stitch.

Speaker 2: Mock how show-notes draft could flow from the transcript.

Speaker 2: Pull conversion-rate by segment, Monday EOD.

Speaker 1: Map the 5-app stitch & list which steps actually need a human.

Auto-tagged: founder interview · post-call content · tooling fragmentation · single pipeline

Try it on your own file — it's free

Option 01

Rev human verbatim

A human types it. High accuracy, but turnaround runs up to 24 hours and the price scales linearly with recording length.

Accuracy: ~99% (human)

Turnaround: 12–24 hours typical

Cross-talk: Marked [crosstalk]

QDA export: DOCX, manual cleanup

Cost per min: $1.50 verbatim

90-min group: ~$135

Best for: Dissertation work or regulated research where every disfluency must be human-verified.

Option 02

Transcription.Solutions

Diarization tuned for 6-10 voices, cross-talk tagged inline, DOCX export formatted for NVivo, ATLAS.ti, and Dedoose.

Accuracy: 88–94% on group audio

Turnaround: ~1× realtime

Cross-talk: Tagged, not dropped

QDA export: DOCX with speaker turns

Cost per min: $0.03

90-min group: ~$2.70

Best for: Researchers running multiple groups who need a first-pass transcript in NVivo by tomorrow morning, not next week.

Option 03

Otter / Sonix

Generic AI built for meetings. Decent on 2-3 speakers, falls apart past 5 — and exports don't anticipate QDA software.

Accuracy: Drops past 5 speakers

Turnaround: Fast

Cross-talk: Often dropped

QDA export: No native NVivo format

Speaker cap: Soft limit ~6

Cost: $17–22/user/mo

Best for: Small interviews and 1-on-1s where the recording has 2-3 voices and lives in a calendar workflow.

Pricing accurate as of May 2026. Accuracy ranges come from our internal sample of customer focus group files, not synthetic benchmarks.

94% on lavalier-per-participant. Holds at 82% on a single room mic.

Focus group accuracy is bottlenecked by microphone topology, not the model. A lavalier on every participant gives us clean per-speaker channels — diarization becomes trivial. One boundary mic on a conference table with 8 voices is the hard case. Numbers below come from real research recordings in our pipeline.

8 things people ask about focus group transcription.

01. Can I rename Speaker 1 to a participant's actual name or ID?

Yes. Click any speaker chip in the editor, type the name or screener ID (e.g. 'P04_F_34'), and it propagates to every turn from that speaker in the transcript. The DOCX export uses the renamed labels.
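
The rename-once-and-propagate behaviour can be pictured as a single pass over the speaker turns. This is a minimal sketch, not the editor's actual code: the `Turn` type, the `rename_speaker` helper, and the sample turns are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Turn:
    """One speaker turn in a transcript (hypothetical structure)."""
    speaker: str
    text: str


def rename_speaker(turns, old, new):
    """Return a new list in which every turn labelled `old` is relabelled `new`."""
    return [replace(t, speaker=new) if t.speaker == old else t for t in turns]


# Illustrative turns; the speaker assignments are made up.
turns = [
    Turn("Speaker 1", "Walk me through what you noticed."),
    Turn("Speaker 3", "The first thing was the smell."),
    Turn("Speaker 2", "Yeah, same."),
    Turn("Speaker 3", "I thought it was supposed to be the lavender one."),
]

renamed = rename_speaker(turns, "Speaker 3", "P04_F_34")
for t in renamed:
    print(f"{t.speaker}: {t.text}")
```

Returning a new list rather than mutating in place keeps the original labels available, which is convenient if a rename needs to be undone.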

02. How do you handle cross-talk and overlapping speech?

We tag it inline with `[overlap]` markers and keep both speakers' utterances in the transcript. Generic tools usually pick one voice and drop the other — we don't, because the overlap moments are often where the actual focus group dynamics live.
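
One way to picture inline overlap tagging: if a segment starts before an earlier segment has ended, keep it and prefix it with `[overlap]` instead of discarding it. The sketch below assumes diarized segments carry start and end times in seconds; the `Segment` type, the `tag_overlaps` helper, and the timestamps are hypothetical and do not describe the production pipeline.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A diarized utterance with timing (hypothetical structure)."""
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def tag_overlaps(segments):
    """Render every segment in start order, marking those that begin
    before the latest end time seen so far. Nothing is dropped."""
    lines = []
    latest_end = float("-inf")
    for seg in sorted(segments, key=lambda s: s.start):
        marker = "[overlap] " if seg.start < latest_end else ""
        lines.append(f"{seg.speaker}: {marker}{seg.text}")
        latest_end = max(latest_end, seg.end)
    return lines


# Made-up speakers and timestamps for illustration.
segments = [
    Segment("Speaker 2", 12.0, 16.5, "Like a hospital, kind of clinical"),
    Segment("Speaker 3", 16.0, 19.0, "Yeah, same."),
    Segment("Speaker 4", 19.4, 22.0, "Right, and the label says lavender."),
]

lines = tag_overlaps(segments)
print("\n".join(lines))
```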

03. Does the DOCX really import cleanly into NVivo and ATLAS.ti?

Yes. We export with speaker labels as paragraph-style headings, which NVivo auto-codes during import and ATLAS.ti recognises as speaker turns. Dedoose accepts the same DOCX via its transcript import path.

04. How many speakers can you diarize in one file?

Soft ceiling around 12. Past that, acoustic clustering starts merging similar voices — which usually means a 10-15 minute rename pass on your end. Set 'Expected speakers' explicitly in the job form for best results.

05. Verbatim or cleaned-up: can I choose?

Both. Verbatim mode keeps every 'um', false start, and repeated word for discourse analysis. Cleaned strips disfluencies for readability. You pick per-job; the default for the research template is verbatim.
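
To make the difference between the two modes concrete, here is a deliberately naive sketch of a cleaned-up pass that strips a few filler words and collapses immediately repeated words. The filler list, the regular expressions, and the `clean` function are assumptions for illustration only; a real cleaned mode would need more care, since repeats such as "had had" can be legitimate.

```python
import re

# Assumed filler set: "um", "uh", "erm" and stretched variants, plus a trailing comma or period.
FILLERS = re.compile(r"\b(?:um+|uh+|er+m?)\b[,.]?\s*", re.IGNORECASE)
# A word followed one or more times by itself, e.g. "the the".
REPEATS = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def clean(text):
    """Strip fillers, collapse immediate word repeats, and tidy whitespace."""
    text = FILLERS.sub("", text)
    text = REPEATS.sub(r"\1", text)
    return re.sub(r"\s{2,}", " ", text).strip()


verbatim = "Um, I I think the the label says, uh, lavender"
cleaned = clean(verbatim)
print(cleaned)
```

Verbatim mode would simply skip this pass and keep the original string.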

06. What about IRB requirements and participant confidentiality?

Files are processed in our infrastructure, not sent to third-party APIs. We offer a per-job auto-delete-after-N-days flag for IRB protocols. We're SOC 2 Type II and GDPR-compliant; the DPA is on the legal page if your IRB needs it.

07. Should I record video or audio-only?

Audio-only is fine — we don't use video for diarization. If you have video for participant identification, keep it locally for your own coding; uploading just the audio track is faster and cheaper.

08. How does the cost compare to Rev human verbatim?

A 90-minute focus group runs about $2.70 here versus roughly $135 on Rev verbatim. Trade-off is accuracy: we land at 86-94% depending on mic setup, Rev's human transcribers hit ~99%. Most researchers use us for the first pass and only escalate specific groups to human if needed.
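
The two figures follow directly from the per-minute rates in the comparison above ($1.50 and $0.03). A short sketch of the arithmetic, using `Decimal` to avoid floating-point rounding on currency; the `job_cost` helper is a hypothetical name, not part of any API.

```python
from decimal import Decimal

# Per-minute rates as quoted in the comparison.
RATES_PER_MIN = {
    "Rev human verbatim": Decimal("1.50"),
    "Transcription.Solutions": Decimal("0.03"),
}


def job_cost(minutes, rate_per_min):
    """Cost of one recording at a flat per-minute rate, rounded to cents."""
    return (Decimal(minutes) * rate_per_min).quantize(Decimal("0.01"))


costs = {name: job_cost(90, rate) for name, rate in RATES_PER_MIN.items()}
for name, cost in costs.items():
    print(f"{name}: ${cost} for a 90-minute group")
```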

Rev human. Generic AI. Or us.

Three things that bite researchers on generic AI tools.

What goes wrong

What to flip here

Recommended job settings for focus groups

Drop a focus group recording. See the transcript in NVivo by tomorrow.