PhD dissertation interview transcription: IRB, member-checking, and qualitative coding

Dissertation interviews are governed by IRB protocols and benefit from member-checking. How AI transcription saves weeks of manual typing without breaking the research method.

AI transcription for PhD dissertation interviews — without breaking your IRB protocol

If your dissertation method involves 20-60 semi-structured interviews and your IRB-approved consent form already discloses "audio recordings will be transcribed," AI transcription saves you 4-8 weeks of typing without changing your research design. The transcript still goes back to the participant for member-checking, still gets coded in NVivo or ATLAS.ti, and still belongs to you. What changes is the first pass: instead of three to four hours of foot-pedal typing per hour of audio, you get a draft in roughly the length of the recording, then spend 30-45 minutes per hour of audio cleaning it.

The seams worth knowing up front: AI transcripts are a draft, not the certified record. You still listen-and-correct against the audio before sending to participants. Diarization on a noisy café recording with three speakers will not be perfect. And if your protocol promised "audio will not leave an encrypted local drive," AI cloud transcription is off the table for this study — you'd need a local model like Whisper on your own hardware instead.

Why dissertation transcripts have been a hand-typed bottleneck

Qualitative researchers have typed their own interviews for roughly 50 years because the alternatives were worse. Paying a transcription service ran $1.50-$3.00 per audio minute, so a 30-interview corpus at 60 minutes each (1,800 audio minutes) cost $2,700-$5,400, which most stipends don't cover. Sending recordings to a human typist also meant a third party handling identifiable data, which created an IRB amendment problem whether you hired a service or an individual typist.

So most PhD candidates did it themselves with Express Scribe, a USB foot pedal, and a long winter. The standard ratio is 4:1 — four hours of typing per hour of clean audio, more if the recording is messy or multi-speaker. A 25-interview corpus at 45 minutes each is roughly 75 hours of typing before you've coded a single line.
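The arithmetic behind those figures is easy to rerun for your own corpus. A minimal Python sketch, using the ratios quoted in this post (4 hours of typing per audio hour, $1.50-$3.00 per audio minute, and 30-45 minutes of review per audio hour for an AI draft); swap in your own numbers:

```python
def audio_minutes(interviews: int, minutes_each: float) -> float:
    """Total recorded audio across the corpus, in minutes."""
    return interviews * minutes_each


def typing_hours(minutes: float, hours_per_audio_hour: float = 4.0) -> float:
    """Manual typing time at the standard 4:1 ratio (raise it for messy audio)."""
    return minutes / 60 * hours_per_audio_hour


def review_hours(minutes: float, review_minutes_per_audio_hour: float) -> float:
    """Time to listen-and-correct an AI draft, in hours."""
    return minutes / 60 * review_minutes_per_audio_hour / 60


def service_cost(minutes: float, dollars_per_minute: float) -> float:
    """Cost of a per-minute human transcription service, in dollars."""
    return minutes * dollars_per_minute


if __name__ == "__main__":
    corpus = audio_minutes(25, 45)                     # 1125 minutes of audio
    print(f"typing: {typing_hours(corpus):.1f} h")     # typing: 75.0 h
    print(f"review: {review_hours(corpus, 30):.1f}-{review_hours(corpus, 45):.1f} h")
    paid = audio_minutes(30, 60)                       # 1800 minutes of audio
    print(f"service: ${service_cost(paid, 1.50):,.0f}-${service_cost(paid, 3.00):,.0f}")
```

The review estimate assumes the 30-45 minute figure holds for your audio; the pilot described at the end of this post is where you check that.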

There is a methodological argument that hand-typing IS the first pass of analysis: you notice things while you type. That's real, and we won't pretend it isn't. But many supervisors will tell you the same noticing happens when you read and correct an AI draft against the audio, in a fraction of the time, with a cleaner audit trail.

What your IRB protocol likely already permits

Most approved consent forms from the last decade include language like "recordings will be transcribed and stored securely" or "transcription may be performed using software tools." If yours does, you are probably already covered. The IRB cares about three things in roughly this order:

  1. Identifiable data handling: where the audio lives, who can access it, how long it's retained, when it's destroyed.
  2. Third-party processors: whether the transcription vendor is named, has a data-processing agreement, and doesn't train models on your audio.
  3. Re-identification risk: whether transcripts get de-identified before being shared with committee members or archived.
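Item 3 is the one you can partly script. A minimal sketch of pseudonymizing a transcript with a researcher-maintained key, assuming Python 3.9+; the names in PSEUDONYMS are hypothetical, and plain string matching will miss nicknames, misspellings, and indirect identifiers (a job title, a small town), so it supplements a manual de-identification pass rather than replacing it:

```python
import re

# Hypothetical re-identification key. The key itself is identifiable data:
# store it wherever your IRB protocol says identifiable data lives, never
# alongside the de-identified transcripts.
PSEUDONYMS = {
    "Maria Lopez": "P04",
    "Maria": "P04",
    "Riverside Middle School": "[School A]",
}


def pseudonymize(text: str, mapping: dict[str, str]) -> str:
    """Replace each known identifier with its pseudonym (case-insensitive, whole words)."""
    # Longest names first, so "Maria Lopez" is replaced before the bare "Maria".
    for real in sorted(mapping, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(real) + r"\b", re.IGNORECASE)
        # A function replacement keeps pseudonyms like "[School A]" literal.
        text = pattern.sub(lambda _match, new=mapping[real]: new, text)
    return text


if __name__ == "__main__":
    raw = "Maria Lopez has taught at Riverside Middle School since 2015. Maria said no."
    print(pseudonymize(raw, PSEUDONYMS))
```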

If your protocol named a specific service ("transcripts will be produced by Rev.com") and you want to switch, you file a minor amendment — usually a one-page form, approved in 1-2 weeks at most US institutions. If your protocol said "the researcher will personally transcribe," you may need a more substantive amendment because you're adding a processor. If your protocol explicitly forbade cloud processing, you cannot use us — run a local Whisper instance instead.

We run AssemblyAI Universal-3 in production. Audio is processed on AssemblyAI's infrastructure and not used for model training (per their published policy). We expose an opt-out endpoint at /opt-out/{token} so a participant who later withdraws can have their data purged; IRB coordinators often want to see exactly that. We are not covered by a HIPAA Business Associate Agreement (BAA). That matters for clinical interviews involving PHI and much less for most social-science dissertations under a standard IRB.

The practical version: print the data-handling section of your protocol, read it next to the vendor's data policy, and send your IRB coordinator one email. They answer this question routinely.

Member-checking against an AI transcript

Member-checking — sending the transcript back to the participant for verification — works the same way with an AI draft as with a hand-typed one, with two adjustments.

First, correct the AI transcript against the audio before you send it. Names, technical jargon, institution names, negations, and numbers are the usual error sites. On a clean 16 kHz interview recording our word error rate sits around 7.88% (per AssemblyAI's published benchmark on Universal-3); on a phone-quality 8 kHz cellular recording it climbs to roughly 17.7%. That's a meaningful difference. A 60-minute interview contains roughly 6,500-9,000 spoken words at a typical conversational pace, so at 7.88% WER it has maybe 500-700 word-level errors. Most are trivial ("um" placement, "the" vs "a"), but ten or twenty will be substantive and need fixing before a participant sees the document. If a participant can use Zoom, Teams, or a local recorder instead of a phone call, the transcript needs less cleanup.
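The error estimate is simple multiplication: spoken words times WER. A minimal sketch, assuming a conversational pace of about 110-140 words per minute (an assumption, not a figure from the benchmark; you can measure your own by counting words in a corrected transcript):

```python
def expected_word_errors(minutes: float, words_per_minute: float, wer: float) -> float:
    """Approximate number of word-level errors in an uncorrected AI draft."""
    return minutes * words_per_minute * wer


if __name__ == "__main__":
    for label, wer in (("16 kHz interview", 0.0788), ("8 kHz phone call", 0.177)):
        low = expected_word_errors(60, 110, wer)
        high = expected_word_errors(60, 140, wer)
        # 16 kHz interview: about 520-662; 8 kHz phone call: about 1168-1487
        print(f"{label}: about {low:.0f}-{high:.0f} errors per hour")
```

The spread is why a phone recording roughly doubles the cleanup pass for the same interview length.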

Second, decide what version you're sending. There are two valid choices:

  • Verbatim with disfluencies: every "um," false start, and overlap preserved. Useful for conversation analysis, discourse analysis, narrative inquiry where pause and hesitation carry meaning.
  • Clean read: false starts removed, "ums" stripped, grammar lightly tidied. Useful for thematic analysis, IPA, grounded theory where content matters more than performance.
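To see what a clean read does to a passage before committing to it, here is a minimal Python sketch. It illustrates the convention and is not a description of how our cleaned export is produced; the filler list is an assumption to agree with your supervisor, and it deliberately leaves "like" and "you know" alone because those can carry meaning:

```python
import re

# Assumed filler list for a "clean read"; adjust to your transcription conventions.
FILLERS = ("um", "uh", "erm")

# Optional leading whitespace, a whole-word filler, and one trailing comma.
FILLER_RE = re.compile(r"\s*\b(?:" + "|".join(FILLERS) + r")\b,?", re.IGNORECASE)


def clean_read(verbatim: str) -> str:
    """Strip fillers from a verbatim passage and collapse leftover spaces."""
    text = FILLER_RE.sub("", verbatim)
    return re.sub(r"\s{2,}", " ", text).strip()


if __name__ == "__main__":
    print(clean_read("So, um, I started teaching, uh, in 2015."))
```

The stray comma it leaves ("teaching, in 2015") is a small example of why even a clean read still gets a human pass.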

We output verbatim by default and offer a cleaned version on export. Tell your participants which one they're reviewing — a participant who sees their own "um, like, you know" pattern transcribed honestly sometimes asks for edits that change the data. That's a methodological choice for you and your supervisor, not a technical one.

A reasonable member-check email goes out within two weeks of the interview, gives the participant 2-3 weeks to respond, and explicitly says "feel free to clarify, correct, or withdraw any passage." Save their reply alongside the transcript. Some participants edit heavily and most don't reply at all; both outcomes are documented in the qualitative literature.
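Across 20 or more participants, computing those dates per interview keeps the schedule auditable. A minimal sketch with Python's standard datetime module, using the two-week send window and the upper end of the 2-3 week response window described above:

```python
from datetime import date, timedelta

SEND_WITHIN = timedelta(weeks=2)      # send the transcript within two weeks
RESPONSE_WINDOW = timedelta(weeks=3)  # upper end of the 2-3 week reply window


def send_by(interview_day: date) -> date:
    """Latest date to email the member-check transcript."""
    return interview_day + SEND_WITHIN


def reply_by(sent_day: date) -> date:
    """Reply deadline, counted from the day the transcript was actually sent."""
    return sent_day + RESPONSE_WINDOW


if __name__ == "__main__":
    print(send_by(date(2026, 3, 2)), reply_by(date(2026, 3, 10)))
```

Counting the reply window from the actual send date, rather than from the deadline, keeps participants' time to respond the same even when you send early.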

Coding-ready output: speakers, timestamps, no fake interjections

A qualitative coding pass goes faster when the transcript has three things right: speaker labels that don't flip, timestamps you can use to jump back to the audio, and disfluencies handled consistently.

Speaker labels. If you recorded on Zoom or a two-channel field recorder with separate mics for you and the participant, we channel-split and the speaker labels are perfect — Speaker A is always you, Speaker B is always the participant. If you recorded mono on a single device (phone, single-mic recorder), we use pyannote-3.1 diarization. It's good for 2-4 speakers in a quiet room and degrades past 6 — so a one-on-one interview in a coffee shop is fine; a 5-person focus group with cross-talk will need manual cleanup.
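What "channel-split" means in practice: on a two-channel recording each mic lives in its own channel, so separating them yields one speaker per file. A minimal sketch using Python's standard wave module, assuming an uncompressed PCM .wav (compressed formats such as .m4a would need converting first); it illustrates the idea and is not our production pipeline:

```python
import wave


def split_interleaved(frames: bytes, sample_width: int) -> tuple[bytes, bytes]:
    """Split interleaved stereo PCM bytes into (left, right) mono byte strings."""
    frame_size = 2 * sample_width           # one left sample + one right sample
    n_frames = len(frames) // frame_size
    end = n_frames * frame_size
    left = bytearray(n_frames * sample_width)
    right = bytearray(n_frames * sample_width)
    # Copy byte k of every sample with one extended-slice assignment per channel.
    for k in range(sample_width):
        left[k::sample_width] = frames[k:end:frame_size]
        right[k::sample_width] = frames[sample_width + k:end:frame_size]
    return bytes(left), bytes(right)


def split_stereo_wav(path: str, left_path: str, right_path: str) -> None:
    """Write each channel of a two-channel PCM .wav to its own mono file."""
    with wave.open(path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError(f"{path}: expected 2 channels, got {src.getnchannels()}")
        width, rate = src.getsampwidth(), src.getframerate()
        frames = src.readframes(src.getnframes())
    for out_path, channel in zip((left_path, right_path), split_interleaved(frames, width)):
        with wave.open(out_path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(channel)
```

Check which mic was wired to which channel on your recorder before naming the output files; the code only knows left and right.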

Timestamps. Word-level timestamps export to all major CAQDAS tools, so a code applied at minute 23:14 of the transcript lets you click back to second 1394 of the audio. That matters during analysis, during your viva, and during any later challenge to your interpretation.
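The conversion is plain base-60 arithmetic, which helps when a coding tool shows "23:14" but an audio player or script wants seconds. A minimal Python sketch:

```python
def to_seconds(stamp: str) -> int:
    """Convert "MM:SS" or "HH:MM:SS" to whole seconds."""
    parts = stamp.split(":")
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"unexpected timestamp: {stamp!r}")
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + int(part)
    return seconds


def to_hms(seconds: int) -> str:
    """Format whole seconds as zero-padded HH:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    print(to_seconds("23:14"))   # 1394
    print(to_hms(1394))          # 00:23:14
```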

Disfluencies. "Uh" and "um" get transcribed when spoken, not invented out of background noise — a known failure mode in older ASR engines that produced phantom "uhs" during silence. False interjections corrupt pause-sensitive analyses, so we don't generate them.

Multilingual interviews. We support 99 languages at one price. ASR quality still depends on audio conditions, accent, code-switching, and specialized vocabulary — budget extra review time and, if you can, get a second reader who shares the participant's dialect.

For the broader pipeline — file formats, batch uploads, naming conventions — our research interview workflow and the researcher landing page cover the operational side.

Export to NVivo, ATLAS.ti, MAXQDA, Dedoose

Each major CAQDAS tool prefers a different import path. Here's what works as of May 2026:

  • NVivo (Windows/Mac): import as .docx with speaker labels as paragraph styles, or .txt with timestamps. NVivo 14 reads our .docx export directly and assigns speakers as cases.
  • ATLAS.ti (Windows/Mac/Web): .docx or .srt. ATLAS.ti Web handles .srt with timestamps cleanly and links them back to the audio file if you upload both.
  • MAXQDA: .docx with timestamps in #00:00:00# format. MAXQDA auto-detects the pattern on import.
  • Dedoose: .docx or .txt. Dedoose is lighter on timestamp handling, so a clean .docx with speaker labels is usually enough.
  • Taguette (free, open-source): .txt or .html. Strip the timestamps if you find them noisy in the highlight view.

We export .docx, .srt, .vtt, .txt, and .json. JSON is what you want if you're scripting anything custom — every word has start time, end time, speaker, and confidence score.
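As an example of the custom scripting the JSON export allows, here is a minimal sketch that groups words into speaker turns, prefixes each turn with a #hh:mm:ss# timestamp in the MAXQDA style mentioned above, and brackets low-confidence words for re-listening. The field names (words, text, start, end, speaker, confidence), millisecond times, and the 0.6 threshold are assumptions for illustration; check them against an actual export before relying on this:

```python
import json

LOW_CONFIDENCE = 0.6  # assumed threshold; tune it on your pilot interview


def hms(ms: int) -> str:
    """Milliseconds to HH:MM:SS."""
    s = ms // 1000
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"


def speaker_turns(words: list[dict]) -> list[str]:
    """Merge consecutive same-speaker words into one timestamped line each."""
    turns: list[str] = []
    speaker = None
    for word in words:
        token = word["text"]
        if word["confidence"] < LOW_CONFIDENCE:
            token = f"[{token}?]"  # mark for re-listening
        if word["speaker"] != speaker:
            speaker = word["speaker"]
            turns.append(f"#{hms(word['start'])}# {speaker}: {token}")
        else:
            turns[-1] += " " + token
    return turns


if __name__ == "__main__":
    export = json.loads("""{"words": [
        {"text": "So", "start": 0, "end": 240, "speaker": "A", "confidence": 0.98},
        {"text": "tell", "start": 240, "end": 480, "speaker": "A", "confidence": 0.97},
        {"text": "me", "start": 480, "end": 600, "speaker": "A", "confidence": 0.99},
        {"text": "Honestly", "start": 83000, "end": 83600, "speaker": "B", "confidence": 0.41}
    ]}""")
    for line in speaker_turns(export["words"]):
        print(line)
```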

Keep file names stable across the corpus, and don't rename audio after importing the transcript. A pattern like P04_interview_audio.wav paired with P04_transcript_reviewed.docx survives a committee handoff and a five-year archive.
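A short script can catch naming drift before a committee handoff does. A minimal sketch that checks every audio file has a matching reviewed transcript, assuming the two-digit P## pattern from the example above (adjust the regular expressions to your own convention):

```python
import re
from pathlib import Path

AUDIO = re.compile(r"^(P\d{2})_interview_audio\.wav$")
TRANSCRIPT = re.compile(r"^(P\d{2})_transcript_reviewed\.docx$")


def unpaired(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Return (IDs with audio but no transcript, IDs with a transcript but no audio)."""
    audio = {m.group(1) for name in filenames if (m := AUDIO.match(name))}
    transcripts = {m.group(1) for name in filenames if (m := TRANSCRIPT.match(name))}
    return sorted(audio - transcripts), sorted(transcripts - audio)


def check_folder(folder: str) -> tuple[list[str], list[str]]:
    """Run the pairing check on every file name in a corpus folder."""
    return unpaired([path.name for path in Path(folder).iterdir() if path.is_file()])


if __name__ == "__main__":
    names = ["P04_interview_audio.wav", "P04_transcript_reviewed.docx",
             "P05_interview_audio.wav", "P06_transcript_reviewed.docx"]
    print(unpaired(names))
```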

If you're undecided on a tool, the honest summary: NVivo is the institutional default at most R1 universities; ATLAS.ti has the cleanest interface for first-time coders; MAXQDA has the best mixed-methods features; Dedoose is browser-based and cheaper for self-funded students.

How to document AI transcription in your methods chapter

Don't bury this in a footnote. Your methods paragraph protects the study's credibility against any examiner who asks how the transcripts were made.

Name these specifics: recording method, transcription tool and underlying ASR engine, whether transcripts were AI-generated and then human-reviewed, who did the review, verbatim vs cleaned conventions, how speakers and timestamps were handled, whether member-checking was offered and how edits were incorporated, when de-identification happened, and where files lived under your IRB-approved data plan.

A workable template, to adapt — not paste:

Interviews were audio-recorded with participant consent and transcribed using AssemblyAI Universal-3 via Transcription.Solutions. The researcher reviewed each draft against the recording, corrected substantive errors and identifying details, returned transcripts to participants for member-checking within two weeks, and incorporated participant edits before importing the reviewed file into NVivo 14 for thematic coding. Audio and transcripts were stored on the university's encrypted research drive per IRB protocol #XXXX.

If you didn't do member-checking, don't imply you did. If you used strict verbatim conventions, say so explicitly.

What AI cannot do for this work

AI cannot decide whether your IRB protocol permits the upload. That's on you and your coordinator.

AI cannot certify that a published quote is correct. Re-listen to any passage you plan to quote in the dissertation — especially negations, numbers, names, and emotionally charged statements.

AI cannot replace analytic memoing. The transcript is the object you code, not the analysis itself.

And AI cannot know which disfluencies matter to your method. In one study "um" is noise; in another, hesitation is evidence. Your methodology decides, the model doesn't.

A one-interview pilot before you process the whole corpus

Don't upload 30 interviews on day one. Run one through the full pipeline and check every handoff.

  1. Pick your messiest interview — background noise, strongest accent, worst microphone. If the pipeline handles that one, the clean ones are fine.
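  2. Upload it on the Free plan: 30 audio-minutes/month, all exports unlocked. A recording of up to 30 minutes costs you nothing; a typical 45-minute interview exceeds the free allowance, so pilot on a 30-minute excerpt, ideally the messiest stretch.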
  3. Listen and correct against the audio — time yourself. Most researchers report 30-45 minutes of correction per hour of audio, versus 3-4 hours of cold typing.
  4. De-identify per your protocol before the transcript leaves your machine.
  5. Send to the participant using your standard member-check template. If their corrections cluster on AI errors, tighten the correction pass; if they cluster on substance, that's the data.
  6. Import into your CAQDAS tool and code the first 10 minutes. Confirm speaker labels, timestamps, and disfluency style match your method.
  7. Write the methods paragraph now, while the workflow is fresh.
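For step 3, one number worth recording alongside your correction time is the word error rate of the AI draft against your reviewed version. A minimal sketch of the standard definition, (substitutions + deletions + insertions) / reference words, computed with a word-level edit distance; the lowercasing and punctuation stripping are simplifying assumptions, and because the reference is your own corrected transcript, the result describes your audio and your corrections rather than a benchmark:

```python
import re


def words(text: str) -> list[str]:
    """Lowercase and keep word characters and apostrophes, dropping punctuation."""
    return re.findall(r"[\w']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, substitution)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reviewed = "We did not agree with the policy."
    ai_draft = "We did agree with a policy."
    print(f"{word_error_rate(reviewed, ai_draft):.1%}")
```

Both errors in that example count the same toward WER, yet the dropped "not" reverses the participant's meaning. WER tells you how much to correct, not which corrections matter.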

If the pilot works, the Pro plan at 1200 minutes/month covers roughly 20-25 interviews of typical length. A 30-interview corpus fits in one billing month plus an overage pack.
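The plan math is easy to redo for your own corpus. A minimal sketch using the Pro allowance of 1200 minutes/month; overage-pack sizes and prices aren't stated here, so the sketch only reports how many minutes fall outside the allowance:

```python
def interviews_per_month(plan_minutes: int, minutes_each: float) -> int:
    """Whole interviews that fit in one month's allowance."""
    return int(plan_minutes // minutes_each)


def overage(interviews: int, minutes_each: float, plan_minutes: int) -> float:
    """Audio minutes beyond the allowance (0 if the corpus fits)."""
    return max(0, interviews * minutes_each - plan_minutes)


if __name__ == "__main__":
    for length in (48, 60):
        print(length, "min interviews:", interviews_per_month(1200, length), "per month")
    print("30 x 45 min over by", overage(30, 45, 1200), "minutes")
    print("30 x 60 min over by", overage(30, 60, 1200), "minutes")
```

Interviews of 48-60 minutes give the 20-25 range quoted above; a 30-interview corpus runs 150-600 minutes over, depending on interview length.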

What next

  • Read your IRB-approved consent form's data-handling section and email your coordinator if "AI transcription" or "third-party transcription service" isn't explicitly covered.
  • Run one pilot interview through the Free plan — 30 minutes, no card required, all exports unlocked.
  • Decide with your supervisor whether your method calls for verbatim or cleaned transcripts before you send anything to participants.
  • If your dissertation involves protected health information under HIPAA, email us before uploading — we handle data securely but are not BAA-covered yet.