Mental health intake transcription: building progress note drafts that survive audit
A 90-minute psychiatric intake produces roughly 12,000–15,000 spoken words. AI transcription can turn that into a structured progress note draft (CC, HPI, PMH, family history, mental status observations) that you can finalize in roughly 10–15 minutes of clinician edit time. What it cannot do is sign the note, write the diagnostic impression, or assess risk. Those stay with you.
This is the honest version of where AI fits in mental health intake transcription, what insurance auditors actually look for, and what we ship that helps — and what we don't.
Why intake notes are the hardest note to write
Intake density is the problem. Follow-up sessions revisit known material; intakes establish it from scratch — chief complaint, history of present illness, past psychiatric history, medical history, family history, social history, substance use, trauma history, current medications, allergies, mental status. A single 90-minute session can touch all of them, often out of order, with the client circling back twice.
Clinicians who type during sessions either lose eye contact or lose detail. Clinicians who write after the session lose 30–60 minutes of post-session time per intake. Neither scales when you have four intakes a week.
This is where AI for intake session notes helps, in a narrow way: it captures the verbatim record so you can listen with both ears (tracking affect, thought process, and rapport), then reorganizes the raw transcript into the sections the payer expects. The attention you free up goes to the part of the work only a clinician can do.
What an AI draft can prep for you
The transcript is the substrate. On top of it, an LLM pass can sort utterances into the structured fields a psychiatric intake template requires. We're not claiming the AI writes the note — we're claiming it stages a draft.
Sections an AI pass can populate from a clean intake transcript:
- Chief complaint (CC): usually the client's own words from the first 10 minutes. Pull verbatim, in quotes.
- History of present illness (HPI): onset, duration, severity, triggers, prior episodes, prior treatment. The model can extract these when the clinician asked about them directly.
- Past psychiatric history (PPH): prior diagnoses, hospitalizations, medication trials, therapy history.
- Past medical history (PMH): chronic conditions, surgeries, current medications, allergies.
- Family history: psychiatric and medical, by relation.
- Social history: living situation, employment, relationships, substance use, legal history, trauma exposure if disclosed.
- Mental status observations: appearance, behavior, speech, mood (client-reported), affect (clinician-observed — see caveat below), thought process, thought content, cognition, insight, judgment.
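One way to hold that staged draft is a fixed set of descriptive sections in which every drafted sentence keeps a timestamp back to the transcript. The sketch below is a hypothetical data model of our own (`IntakeDraft`, `DraftSentence`, and the section keys are illustrative, not a product API). Interpretive sections such as the diagnostic impression are deliberately absent, so the draft has nowhere to put them.

```python
from dataclasses import dataclass, field

# Descriptive sections an AI pass may stage (hypothetical keys; match your template).
# Mood is client-reported; observed affect stays with the clinician.
DRAFTABLE_SECTIONS = (
    "chief_complaint",
    "hpi",
    "past_psychiatric_history",
    "past_medical_history",
    "family_history",
    "social_history",
    "mental_status_observations",
)


@dataclass
class DraftSentence:
    text: str
    start_s: float  # transcript time where the supporting speech begins
    end_s: float    # transcript time where it ends


@dataclass
class IntakeDraft:
    sections: dict[str, list[DraftSentence]] = field(
        default_factory=lambda: {name: [] for name in DRAFTABLE_SECTIONS}
    )

    def add(self, section: str, sentence: DraftSentence) -> None:
        # Refuse anything outside the descriptive sections (e.g. a diagnosis).
        if section not in self.sections:
            raise KeyError(f"{section!r} is not a draftable section")
        self.sections[section].append(sentence)

    def empty_sections(self) -> list[str]:
        """Sections with nothing drafted: ask-or-verify items for the clinician."""
        return [name for name, items in self.sections.items() if not items]


draft = IntakeDraft()
draft.add("chief_complaint", DraftSentence("\"I can't sleep and I can't stop worrying.\"", 42.0, 47.5))
print(draft.empty_sections())  # the six sections still waiting on content

try:
    draft.add("diagnostic_impression", DraftSentence("MDD, recurrent", 0.0, 0.0))
except KeyError as err:
    print("refused:", err)
```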
A psychiatric intake transcript with diarization separates clinician questions from client answers, which is what makes HPI extraction work at all. We use channel-split diarization for stereo recordings (clean separation, provided each speaker is on their own channel) and pyannote-3.1 for mono (good for 2–4 speakers, degrades beyond 6). For family intakes or interpreter sessions, speaker labels need more review.
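When each speaker really does sit on their own channel (for example, a telehealth recording with the local and remote tracks on left and right), channel-split diarization reduces to de-interleaving the samples. A minimal sketch using only Python's standard `wave` and `array` modules, assuming 16-bit PCM WAV input; which channel holds the clinician depends on your recording setup, and the file names in the usage line are hypothetical.

```python
import array
import wave


def split_stereo(src_path: str, left_path: str, right_path: str) -> None:
    """De-interleave a 16-bit stereo WAV into one mono WAV per channel (one speaker each)."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM WAV")
        rate = src.getframerate()
        raw = src.readframes(src.getnframes())

    # Treat the data as 2-byte units. Samples are only regrouped, never
    # interpreted, so the file's little-endian byte layout survives the write.
    samples = array.array("h")
    if samples.itemsize != 2:
        raise RuntimeError("platform 'h' typecode is not 2 bytes")
    samples.frombytes(raw)

    # Stereo frames interleave as L, R, L, R, ...
    for path, channel in ((left_path, samples[0::2]), (right_path, samples[1::2])):
        with wave.open(path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(rate)
            out.writeframes(channel.tobytes())
```

Usage: `split_stereo("intake_stereo.wav", "clinician.wav", "client.wav")`, then transcribe each mono file and label every segment by the channel it came from.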
Two numbers to keep in mind: on clean 16 kHz audio with a good mic, our primary engine (AssemblyAI Universal-3) runs at around 7.88% WER. On 8 kHz telephony (a phone-only telehealth fallback, for example), WER rises to roughly 17.7%. Clinical terms (SSRIs, SNRIs, specific drug names, DSM codes) sit in the long tail of any general ASR model, so proofread medication names and dosages every time.
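To put those rates in perspective: if the benchmark rate held across a 12,000–15,000-word intake, 7.88% would still mean roughly 950–1,200 word-level errors, and 17.7% roughly 2,100–2,700. WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A minimal sketch, assuming simple lowercase whitespace tokenization (real benchmarks normalize punctuation and numbers more carefully):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


print(word_error_rate("client denies suicidal ideation",
                      "client endorses suicidal ideation"))  # 0.25
```

The example is one substitution in four words, and it is the one word that reverses the clinical meaning. WER weights every word equally, so an average rate says nothing about whether the errors landed on "the" or on "denies".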
What the clinician still has to do
This is the part that does not move. An AI-prepared draft is not a clinical document until you make it one.
The clinician owns:
- Diagnostic impression: DSM-5-TR codes, differential, rule-outs. The model can surface candidate symptoms; it cannot diagnose.
- Risk assessment: suicidality, homicidality, self-harm, grave disability. Even if the client denied SI/HI on questioning, the documented risk assessment is a clinical judgment, not an extraction task. Write it yourself.
- Affect (observed): the client tells you their mood; you observe their affect. Transcripts don't capture affect: body language, eye contact, and psychomotor activity come from your own observation.
- Treatment plan: modality, frequency, goals, measurable objectives, target dates. Payers audit this section harder than any other.
- Medical decision-making rationale: why this diagnosis, why this medication, why this level of care. AI can summarize what was said; it cannot justify what you decided.
- Signature and credentials: the legal attestation that makes the document a medical record.
One rule to keep: the AI prepares the descriptive sections (what was said), the clinician owns the interpretive sections (what it means and what to do).
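That rule can be enforced mechanically before signing. A minimal sketch, assuming each note section records who wrote it (the `Section` type, the `author` values, and the section keys are hypothetical): sign-off is blocked while any interpretive section is empty or still carries AI-drafted text.

```python
from dataclasses import dataclass

# Clinician-owned sections (hypothetical keys mirroring the list above).
INTERPRETIVE_SECTIONS = (
    "diagnostic_impression",
    "risk_assessment",
    "affect_observed",
    "treatment_plan",
    "mdm_rationale",
)


@dataclass
class Section:
    text: str
    author: str  # "ai_draft" or "clinician"


def signoff_blockers(note: dict[str, Section]) -> list[str]:
    """Return the reasons the note cannot be signed yet; an empty list means no blockers."""
    blockers = []
    for name in INTERPRETIVE_SECTIONS:
        section = note.get(name)
        if section is None or not section.text.strip():
            blockers.append(f"{name}: missing")
        elif section.author != "clinician":
            blockers.append(f"{name}: written by {section.author}, needs clinician authorship")
    return blockers


note = {name: Section("Written by the clinician.", "clinician") for name in INTERPRETIVE_SECTIONS}
note["risk_assessment"] = Section("Denies SI/HI.", "ai_draft")  # an extraction, not an assessment
print(signoff_blockers(note))
```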
What AI gets wrong in intake notes — the failure modes that matter
The most dangerous transcription error is not a typo. It's a fluent, plausible sentence that the client did not say, or that reverses clinical meaning. Build review steps around the predictable failure modes:
- Negation flips: "no intent to act" transcribed as "intent to act", or "denies SI" as "endorses SI". Always verify SI/HI statements against audio.
- Medication name swaps: Lamictal vs. Lexapro, Klonopin vs. clonidine, sertraline vs. citalopram. Common-word collisions ("Celexa" → "Celeste") happen too.
- Diagnostic near-misses: bipolar II vs. bipolar I, MDD with anxious distress vs. GAD, PTSD vs. acute stress disorder.
- Speaker attribution in family or couples sessions: pyannote-3.1 will sometimes assign a client's disclosure to a partner or parent.
- Overlapping speech during conflict or crisis discussion — the highest-stakes moments are also the hardest to transcribe.
- Pronoun and referent drift: "the episode", "that time", "it" — the model loses the antecedent and the LLM downstream invents one.
- Overconfident summaries of trauma or risk: the LLM pass smooths jagged disclosures into clean paragraphs. Read the source.
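Part of that review can be routed automatically. The sketch below is a deliberately simple keyword pass, not a clinical NLP model: it flags drafted sentences that mention SI/HI or other risk terms, medication names, or doses, adds a `negation` marker when polarity words appear in a flagged sentence, and returns each hit with its transcript timestamp so you can jump to the audio. The word lists are hypothetical starters, and a keyword pass will miss paraphrases, so it narrows review rather than replacing it.

```python
import re

# Hypothetical starter lists; extend with your own formulary and risk vocabulary.
RISK_ABBREV = re.compile(r"\b(?:SI|HI)\b")  # case-sensitive so a greeting like "hi" doesn't match
RISK_WORDS = re.compile(r"\b(?:suicid\w*|homicid\w*|self-harm\w*|intent)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(?:no|not|denies|denied|never|without)\b", re.IGNORECASE)
DOSE = re.compile(r"\b\d+(?:\.\d+)?\s?mg\b", re.IGNORECASE)
MEDICATIONS = {"lamictal", "lexapro", "klonopin", "clonidine", "sertraline", "citalopram", "celexa"}


def flag_for_audio_review(sentences):
    """sentences: iterable of (text, start_seconds).
    Returns (start_seconds, reasons, text) tuples in transcript order."""
    flagged = []
    for text, start in sentences:
        reasons = []
        if RISK_ABBREV.search(text) or RISK_WORDS.search(text):
            reasons.append("risk")
        if MEDICATIONS & set(re.findall(r"[a-z]+", text.lower())):
            reasons.append("medication")
        if DOSE.search(text):
            reasons.append("dose")
        if reasons and NEGATION.search(text):
            reasons.append("negation")  # polarity words present: check for a flip against audio
        if reasons:
            flagged.append((start, reasons, text))
    return sorted(flagged, key=lambda item: item[0])


draft_sentences = [
    ("Client denies SI, no intent to act.", 1210.0),
    ("Works as a teacher.", 300.0),
    ("Started Lexapro 10 mg in March.", 845.5),
]
for start, reasons, text in flag_for_audio_review(draft_sentences):
    print(f"{start:>8.1f}s  {', '.join(reasons):<22} {text}")
```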
Treat AI-generated sections as draft text with timestamps back to the transcript. If a sentence affects diagnosis, risk, medication, mandated reporting, or level of care, verify it before you sign.
Insurance audit triggers — what payers actually check
Audits on mental health intakes (CPT 90791, 90792) look for a short list of failure modes. Most have nothing to do with transcript quality and everything to do with metadata around the note.
- Time-in-session match: 90791 requires a face-to-face diagnostic evaluation. Some payers want documented start/end times. If your note says "60-minute intake" and the recording is 38 minutes, that's a clawback.
- Clinician signature with credentials and date: unsigned notes or notes signed by the wrong credential level get denied. Co-signature requirements for trainees vary by payer.
- Medical necessity: the note must justify the diagnosis billed and the treatment recommended. A transcript dump is not medical necessity — your synthesis is.
- Required elements present: state Medicaid programs and major commercial payers publish element checklists. Missing family hx, missing substance use screening, missing risk assessment — all common denial reasons.
- Timeliness: many payers require notes signed within 24–72 hours of session. Drafts sitting unsigned for a week are an audit flag.
- Internal consistency: if the HPI mentions panic attacks but the diagnosis is MDD with no anxiety specifier and no rule-out, the auditor will ask why.
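Several of these checks are mechanical and can run before you sign. A minimal sketch, assuming you can pull the documented start and end times, the recording length, the signature timestamp, the signer's credentials, and which required elements are present (the `NoteMeta` type and element keys are hypothetical). The 72-hour window and 5-minute tolerance are placeholders; set them from your payers' actual rules and checklists.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Example subset of an element checklist; replace with your payers' published lists.
REQUIRED_ELEMENTS = {"family_history", "substance_use_screening", "risk_assessment"}


@dataclass
class NoteMeta:
    documented_start: datetime
    documented_end: datetime
    recording_minutes: float
    signed_at: Optional[datetime]
    signer_credentials: str
    elements: set[str]


def audit_flags(meta: NoteMeta, tolerance_min: float = 5.0,
                sign_window: timedelta = timedelta(hours=72)) -> list[str]:
    flags = []
    documented_min = (meta.documented_end - meta.documented_start).total_seconds() / 60
    if abs(documented_min - meta.recording_minutes) > tolerance_min:
        flags.append(f"time mismatch: note says {documented_min:.0f} min, "
                     f"recording is {meta.recording_minutes:.0f} min")
    if meta.signed_at is None:
        flags.append("unsigned")
    elif meta.signed_at - meta.documented_end > sign_window:
        flags.append("signed outside the payer window")
    if not meta.signer_credentials.strip():
        flags.append("signature missing credentials")
    missing = REQUIRED_ELEMENTS - meta.elements
    if missing:
        flags.append("missing elements: " + ", ".join(sorted(missing)))
    return flags


# The example from the list above: a "60-minute intake" backed by a 38-minute recording.
start = datetime(2026, 5, 4, 9, 0)
meta = NoteMeta(start, start + timedelta(minutes=60), 38.0,
                start + timedelta(hours=20), "LCSW", set(REQUIRED_ELEMENTS))
print(audit_flags(meta))
```

A time mismatch may just mean the recording started late; either way, reconcile it before you sign rather than after a payer asks.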
A transcript-backed note has one audit advantage: if there's ever a question about what was assessed, the recording exists. That's also why retention and access controls matter.
Compliance: the part we have to be specific about
We handle audio and transcripts with HIPAA-grade controls at rest and in transit. We are not a BAA-covered product yet. Per HHS HIPAA guidance, a vendor that creates, receives, maintains, or transmits PHI for a covered entity is a business associate; if your policy requires a signed BAA, we are not the right vendor today. Email us if you want to be in the pilot, but do not assume coverage.
For solo practitioners working with cash-pay clients who have consented to recording, or for clinics that de-identify before upload, the workflow below works. For BAA-required workflows, wait for the pilot or use a BAA-covered transcription vendor.
Two practical notes:
- Recording-consent rules (one-party vs. all-party) vary by state. Get written consent at intake, document it, and store the consent form with the record.
- Whatever you record is discoverable. Some clinicians keep the audio only until the note is signed, then delete. Others retain for the full record retention period (often 7 years for adults, longer for minors). Pick a policy and document it.
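Whichever policy you pick, deriving the delete-by date from it keeps behavior consistent across clients. A minimal sketch with hypothetical policy names: it counts the period from sign-off and takes the number of years as an input rather than encoding any state's rule, so check whether your state counts from the last date of service or sets a different period for minors.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; Feb 29 falls back to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def audio_delete_by(policy: str, signed_on: date, retention_years: int = 7) -> date:
    """Delete-by date for session audio under a documented retention policy."""
    if policy == "delete_after_signoff":
        return signed_on
    if policy == "retain_full_period":
        return add_years(signed_on, retention_years)
    raise ValueError(f"unknown retention policy: {policy!r}")


print(audio_delete_by("retain_full_period", date(2026, 5, 5)))  # 2033-05-05
```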
For more on the underlying pipeline, see our audio-to-text feature page and the clinicians vertical page.
The practical workflow: record → transcribe → edit in 10 minutes
Here is the workflow we see working for solo practitioners and small group practices.
- Get consent in writing at intake. A one-paragraph addendum to your standard paperwork covers it. Name the vendor, name the purpose (note preparation), name the retention period.
- Record on the device closest to the speakers. A laptop mic in a quiet office beats a phone mic across the room. For telehealth, record the platform's native track if possible — that gives you stereo with channel separation.
- Upload right after the session. A 90-minute intake transcribes in roughly 4–8 minutes of wall-clock time. Note that a full 90-minute file exceeds the Free plan's 30 audio-minutes/month allowance (as of May 2026); use a shorter test segment to pilot, or move to Pro for 600 audio-minutes/month.
- Run a structured-note prompt against the transcript. CC / HPI / PMH / family hx / social hx / MSE get populated. The output is a draft.
- Edit in a fixed order, not top-to-bottom. Wandering through the note is what blows the 10-minute budget. The pass we recommend:
  - 2 min: metadata (date, start/stop time, modality, participants).
  - 2 min: proper nouns and clinical terms (names, medications, dosages, hospitals, prior clinicians).
  - 2 min: tighten HPI, history, impairment, goals.
  - 2 min: write the clinician-owned sections (diagnostic impression, risk, safety plan, treatment plan).
  - 2 min: audit fields (medical necessity statement, signature, credentials, attestations).
- Verify SI/HI and medication lines against audio. Open the scrubber on any sentence that affects risk, diagnosis, or dosing.
- Set a retention timer on the audio. Keep per your retention policy, or delete after sign-off. Either is defensible; inconsistency is not.
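The structured-note prompt in the workflow above can carry the descriptive/interpretive split and the timestamp trail in its instructions. A minimal sketch that only builds the prompt text from diarized utterances; it assumes `(start_seconds, speaker, text)` tuples, makes no call to any particular LLM API, and the instruction wording is ours for illustration, not a tested product prompt.

```python
DESCRIPTIVE_SECTIONS = [
    "Chief complaint (client's own words, in quotes)",
    "History of present illness",
    "Past psychiatric history",
    "Past medical history",
    "Family history",
    "Social history",
    "Mental status observations (speech, thought process, thought content; mood as client-reported only)",
]

RULES = [
    "Use only what is said in the transcript; do not infer.",
    "End every sentence with the [h:mm:ss] timestamp of the utterance it comes from.",
    'If a section was not discussed, write "Not discussed."',
    "Quote any statement about suicidal or homicidal ideation verbatim.",
    "Do not write a diagnosis, risk assessment, observed affect, or treatment plan; the clinician writes those.",
]


def fmt_ts(seconds: float) -> str:
    """Format seconds as h:mm:ss to match the scrubber in the transcript view."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def build_intake_prompt(utterances: list[tuple[float, str, str]]) -> str:
    """utterances: (start_seconds, speaker_label, text) tuples from a diarized transcript."""
    transcript = "\n".join(f"[{fmt_ts(t)}] {speaker}: {text}" for t, speaker, text in utterances)
    sections = "\n".join(f"- {name}" for name in DESCRIPTIVE_SECTIONS)
    rules = "\n".join(f"- {rule}" for rule in RULES)
    return (
        "Draft the following sections of a psychiatric intake note.\n\n"
        f"Sections:\n{sections}\n\nRules:\n{rules}\n\nTranscript:\n{transcript}\n"
    )


print(build_intake_prompt([
    (65.0, "CLINICIAN", "What brings you in today?"),
    (70.2, "CLIENT", "I can't sleep, and the worrying won't stop."),
]))
```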
Total post-session clinician time: 10–15 minutes for a 90-minute intake, down from 45–60 minutes of typing or dictation cleanup. It stays in that range, rather than dropping further, because the interpretive work is irreducible.
A note on meeting bots for telehealth: we use Recall.ai under the hood for Google Meet, Zoom, and Microsoft Teams. The bot appears in the participant list under a configurable name, and two-party consent disclosure is posted via chat on join. We do not ship live captions inside the session — we transcribe the recording after.
Where we sit vs. other tools you've evaluated
You've probably looked at dictation-first tools (Dragon Medical) and ambient scribes (Abridge, Suki, DeepScribe, Augmedix), plus general transcription (Otter.ai, Rev, Descript). Honest read:
- Ambient clinical scribes: purpose-built for clinical encounters, BAA-covered, EHR-integrated. If you need EHR write-back and a signed BAA today, this is the category.
- Dictation tools (Dragon Medical): mature, fast, narrow. You speak the note; the tool types it. No client audio captured.
- General transcription (Otter, Rev, us): captures the full session, produces a transcript, you build the note from it. Cheaper, more flexible, no BAA today on our side.
We're in the third bucket. Our pitch: high-quality transcript, fast turnaround, flat per-minute pricing across 99 languages, no BAA today. If that matches your situation, the workflow above works. If you need BAA-covered ambient scribing with EHR integration today, the first bucket is what you want.
What next
- Try a single intake this week. Record one consented session, upload a segment on the Free plan (30 audio-minutes/month, as of May 2026), run the structured-note prompt, and time your edit against your current workflow.
- Read the audio-to-text pipeline page for model details (Universal-3, diarization, supported formats).
- If your practice needs BAA coverage before you can pilot, email us — we're building the list and will tell you honestly if you're a fit.
- Write your consent addendum before you record anything. One paragraph is enough: vendor named, purpose named, retention period named, client signature. That's the prerequisite.