Focus group transcription. Every speaker labelled, every word.

Drop a focus group recording with 6, 8, even 10 voices. Get a verbatim transcript with each participant labelled, cross-talk tagged, and a DOCX that loads straight into NVivo.

Drop a file, or pick one

MP3 · WAV · M4A · MP4 · MOV · MKV · OGG · OPUS · FLAC · WEBM — up to 100 MB anonymously

Paste a link, we’ll fetch the audio

YouTube · TikTok · Vimeo · Twitter · SoundCloud · Spotify · 50+ more

Record straight from your browser

Sign-up takes 30 seconds. Recording opens right after, in the dashboard.

No card required · ~90s per 60-min file · SRT, VTT, DOCX, TXT · Files auto-deleted in 24h

↓ Watch what comes out

Eight participants in. Labelled verbatim out.

Focus groups are the hardest diarization case in our queue: similar demographics, similar voices, frequent cross-talk. We tag the overlap inline instead of dropping it. You rename Speaker 3 → 'Participant_F2' once, and the new label propagates to every turn.

Focus group recording · REC · Moderator + 7 participants · 1:23:14
auto-detected en-US · 44 kHz boundary mic · WAV
~90s
Transcript · streaming · 91% accuracy · 8 speakers
S1

So when you first opened the packaging — walk me through what you noticed.

S2

Honestly? The first thing was the smell. Like a hospital, kind of clinical —

S3

Yeah, same. I thought it was supposed to be the lavender one.

S2

Right, and the label says lavender but it really doesn't —

91% on 8-speaker room mic · DOCX (QDA-ready) · SRT · TXT · JSON

↓ This is the dashboard

This is what loads when the job finishes.

Same layout as the real dashboard — Summary, full Transcript, Speakers tab, Exports. Key points and action items extracted automatically. Auto-tags on every job.

Try it on your own file — it's free

Three real options · honest comparison

Rev human. Generic AI. Or us.

Researchers usually pick between paying a human transcriber (slow, accurate, expensive) or running the file through a generic AI tool that wasn't built for 8-voice rooms. We sit in between — AI speed, diarization tuned for research recordings, and a DOCX that drops into NVivo without surgery.

Option 01

Rev human verbatim

A human types it. High accuracy, but up to 24-hour turnaround, and the price scales linearly with hours of audio.

Accuracy: ~99% (human)
Turnaround: 12–24 hours typical
Cross-talk: Marked [crosstalk]
QDA export: DOCX, manual cleanup
Cost per min: $1.50 verbatim
90-min group: ~$135
Best for: Dissertation work or regulated research where every disfluency must be human-verified.

Option 02

Transcription.Solutions

Diarization tuned for 6-10 voices, cross-talk tagged inline, DOCX export sized for NVivo, ATLAS.ti, and Dedoose.

Accuracy: 88–94% on group audio
Turnaround: ~90s per 60-min file
Cross-talk: Tagged, not dropped
QDA export: DOCX with speaker turns
Cost per min: $0.03
90-min group: ~$2.70
Best for: Researchers running multiple groups who need a first-pass transcript in NVivo by tomorrow morning, not next week.

Option 03

Otter / Sonix

Generic AI built for meetings. Decent on 2-3 speakers, falls apart past 5 — and exports don't anticipate QDA software.

Accuracy: Drops past 5 speakers
Turnaround: Fast
Cross-talk: Often dropped
QDA export: No native NVivo format
Speaker cap: Soft limit ~6
Cost: $17–22/user/mo
Best for: Small interviews and 1-on-1s where the recording has 2-3 voices and lives in a calendar workflow.

Pricing accurate as of May 2026. Accuracy ranges come from our internal sample of customer focus group files, not synthetic benchmarks.
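
The per-minute figures above are simple linear arithmetic, so it is easy to rerun them for a whole study rather than a single 90-minute group. The sketch below uses only the two per-minute rates quoted in the comparison; the function names are illustrative, and Otter / Sonix is left out because it is priced per user per month, not per minute.

```python
# Per-minute rates quoted in the comparison above (May 2026).
RATE_PER_MIN = {
    "Rev human verbatim": 1.50,
    "Transcription.Solutions": 0.03,
}


def job_cost(minutes: float, rate_per_min: float) -> float:
    """Linear cost: minutes of audio times the per-minute rate, rounded to cents."""
    return round(minutes * rate_per_min, 2)


def study_costs(group_minutes: list[float]) -> dict[str, float]:
    """Total cost per option for a list of group recording lengths (in minutes)."""
    total_minutes = sum(group_minutes)
    return {name: job_cost(total_minutes, rate) for name, rate in RATE_PER_MIN.items()}


if __name__ == "__main__":
    # One 90-minute group, then a hypothetical four-group study.
    print(study_costs([90]))
    print(study_costs([90, 90, 75, 60]))
```

A single 90-minute group reproduces the ~$135 and ~$2.70 figures in the comparison. Longer studies scale both numbers by the same factor.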

Specific to focus groups

Three things that bite researchers on generic AI tools.

Flip the right settings up front and the transcript drops into NVivo without a cleanup weekend.

What goes wrong

  1. Cross-talk gets dropped. Most consumer tools pick one speaker during overlap and discard the rest. You lose exactly the moments where consensus or pushback happens.
  2. Speakers collapse to 3. Tools assume meeting-sized rooms and cap diarization clusters low. Your eight participants come back as 'Speaker 1' / 'Speaker 2' / 'Speaker 3'.
  3. Export is one wall of text. No paragraph breaks per speaker turn, no DOCX structure NVivo can auto-code on import.

What to flip here

  1. Turn on 'Tag overlapping speech' in the job form. Cross-talk gets inline `[overlap]` markers and both speakers retain their utterances.
  2. Set 'Expected speakers' to 8-12 explicitly. We size the diarization cluster count to match instead of guessing low.
  3. Choose DOCX (QDA-ready) export. Speaker turns become paragraphs prefixed with the label. NVivo, ATLAS.ti, and Dedoose all auto-detect this format on import.
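
To picture the layout those settings produce, here is a minimal sketch of one paragraph per speaker turn, prefixed with the label, with overlap tagged inline. Everything in it is an assumption made for illustration: the `Turn` structure, the `render_turns` helper, the `Label: text` separator, and the position of the `[overlap]` marker. It prints plain text instead of writing a DOCX, and the real export may differ in its details.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str                     # label such as "S2", or a renamed participant ID
    text: str                        # verbatim utterance
    overlaps_previous: bool = False  # True if this speech began over the previous speaker


def render_turns(turns: list[Turn]) -> str:
    """Render one paragraph per speaker turn, prefixed with the speaker label."""
    paragraphs = []
    for turn in turns:
        # Hypothetical marker placement: tag the turn that started during overlap.
        marker = "[overlap] " if turn.overlaps_previous else ""
        paragraphs.append(f"{turn.speaker}: {marker}{turn.text}")
    # A blank line between paragraphs keeps each turn a separate block.
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    demo = [
        Turn("S2", "Like a hospital, kind of clinical"),
        Turn("S3", "Yeah, same.", overlaps_previous=True),
    ]
    print(render_turns(demo))
```

The point of the structure is that both utterances survive the overlap, and every paragraph starts with a label a QDA tool can key on.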

Recommended job settings for focus groups

Drop a focus group file with the 'research' template and these flip on by default. Override per-job from the form.

Diarization: Acoustic · expected 6-10 speakers
Verbatim mode: Full (disfluencies kept)
Overlap handling: Tag inline [overlap]
Custom vocabulary: Product / brand names from screener
Speaker labels: Editable post-job, propagate-all
Export: DOCX (QDA-ready) · timestamped TXT

Accuracy · real-world numbers

94% on lavalier-per-participant. Holds at 86% on a single room mic.

Focus group accuracy is bottlenecked by microphone topology, not the model. A lavalier on every participant gives us clean per-speaker channels — diarization becomes trivial. One boundary mic on a conference table with 8 voices is the hard case. Numbers below come from real research recordings in our pipeline.

94%
Lavalier per participant

Each participant on their own track in a multitrack WAV. Diarization is skipped, so the remaining error is text-only: misheard words, not misattributed speakers. Best case for dissertation-grade work.

91%
Conference mic, 4-6 participants

Boundary mic centred on the table, moderate room treatment. Voices distinguishable, occasional confusion between same-gender participants of similar age.

86%
Single room mic, 7-10 participants

Cross-talk frequent, similar voices merge under acoustic diarization. Expect a 10-minute rename and merge pass on the speaker chips before analysis.

82%
Remote group on mono Zoom

Compressed mono mix, no per-channel split available. Words still usable for thematic coding, but disfluency-level verbatim claims weaken here.
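
The page does not say how these percentages are computed. One common convention is word accuracy, defined as 1 minus the word error rate (WER), where WER is the word-level edit distance between a reference transcript and the machine transcript, divided by the number of reference words. The sketch below implements that convention so you can spot-check a transcript against a hand-corrected excerpt. It is an assumption about the metric, not a description of how the numbers above were produced, and it ignores speaker attribution entirely.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Whitespace tokenisation; punctuation stays attached to words.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy as 1 - WER. Can go below zero if insertions dominate."""
    return 1.0 - word_error_rate(reference, hypothesis)
```

Correct two or three minutes of a group by hand, run both versions through `word_accuracy`, and you have a rough figure for your own room and mic setup.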

Common questions

8 things people ask about focus group transcription.

01 Can I rename Speaker 1 to a participant's actual name or ID?
Yes. Click any speaker chip in the editor, type the name or screener ID (e.g. 'P04_F_34'), and it propagates to every turn from that speaker in the transcript. The DOCX export uses the renamed labels.
02 How do you handle cross-talk and overlapping speech?
We tag it inline with `[overlap]` markers and keep both speakers' utterances in the transcript. Generic tools usually pick one voice and drop the other — we don't, because the overlap moments are often where the actual focus group dynamics live.
03 Does the DOCX really import cleanly into NVivo and ATLAS.ti?
Yes. We export with speaker labels as paragraph-style headings, which NVivo auto-codes during import and ATLAS.ti recognises as speaker turns. Dedoose accepts the same DOCX via its transcript import path.
04 How many speakers can you diarize in one file?
Soft ceiling around 12. Past that, acoustic clustering starts merging similar voices — which usually means a 10-15 minute rename pass on your end. Set 'Expected speakers' explicitly in the job form for best results.
05 Verbatim or cleaned-up: can I choose?
Both. Verbatim mode keeps every 'um', false start, and repeated word for discourse analysis. Cleaned strips disfluencies for readability. You pick per-job; the default for the research template is verbatim.
06 What about IRB requirements and participant confidentiality?
Files are processed in our infrastructure, not sent to third-party APIs. We offer a per-job auto-delete-after-N-days flag for IRB protocols. We're SOC 2 Type II and GDPR-compliant; the DPA is on the legal page if your IRB needs it.
07 Should I record video or audio-only?
Audio-only is fine — we don't use video for diarization. If you have video for participant identification, keep it locally for your own coding; uploading just the audio track is faster and cheaper.
08 How does the cost compare to Rev human verbatim?
A 90-minute focus group runs about $2.70 here versus roughly $135 on Rev verbatim. The trade-off is accuracy: we land at 86-94% depending on mic setup, while Rev's human transcribers hit ~99%. Most researchers use us for the first pass and only escalate specific groups to human if needed.
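
The rename-once behaviour described in question 01 can be pictured as a label mapping applied to every turn in the transcript. The sketch below is illustrative only: the `(label, text)` tuple representation and the `rename_speakers` helper are hypothetical, not the editor's actual code.

```python
def rename_speakers(
    turns: list[tuple[str, str]],
    mapping: dict[str, str],
) -> list[tuple[str, str]]:
    """Return new turns with each label replaced per `mapping`.

    Labels missing from the mapping are left unchanged, so a partial
    rename (one participant at a time) is safe.
    """
    return [(mapping.get(label, label), text) for label, text in turns]


if __name__ == "__main__":
    transcript = [
        ("Speaker 1", "Walk me through what you noticed."),
        ("Speaker 3", "Yeah, same."),
        ("Speaker 3", "I thought it was supposed to be the lavender one."),
    ]
    for label, text in rename_speakers(transcript, {"Speaker 3": "P04_F_34"}):
        print(f"{label}: {text}")
```

Because the mapping is applied to every turn, one rename covers every turn from that speaker, and the original transcript is left untouched in case a label needs to be reverted.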

Drop a focus group recording. See the transcript in NVivo by tomorrow.

30 free minutes every month. No card. Speaker labels, cross-talk tagging, QDA-ready DOCX export included on every plan.

Start free