SOAP note transcription from voice dictation: what works in 2026
If you're a solo or small-practice clinician dictating SOAP notes without an enterprise EHR macro suite, the working answer in 2026 is: dictate into a voice-to-text engine with a medical vocabulary boost, paste the transcript into your EHR, and review every line before signing. AI gets you to a 90%+ accurate draft on clean dictation in under a minute per visit. It does not get you to a signed note. And read this twice: most general transcription products, including ours, are not covered by a HIPAA Business Associate Agreement (BAA), so your compliance exposure depends on where the audio is stored and who handles it along the way.
That's the short version. The rest of this piece is what changes for SOAP specifically, where the seams are, and the trial workflow we'd actually run if we were billing E/M codes tomorrow.
What a SOAP note actually needs from a transcription tool
Generic dictation gets you text. A SOAP note needs structured text with four labelled sections — Subjective, Objective, Assessment, Plan — plus accurate handling of fields that change clinical meaning:
- Medical terminology: drug names ("metoprolol", "hydrochlorothiazide"), anatomy ("L4-L5"), eponyms ("McMurray's test"), abbreviations ("BID", "PRN", "SOB").
- Numerals and units: "BP 138 over 84", "5 milligrams PO", "A1c 6.8". ASR often writes these as words when you want digits, or splits "138/84" into "one thirty-eight eighty-four".
- ICD-10 codes: dictating "I-10" or "E-eleven point nine" needs to land as "I10" and "E11.9", not "I ten" or "eleven point nine".
- Negation and laterality: "no chest pain" vs "chest pain", "right knee" vs "left knee". A single dropped word here can flip the chart.
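Some of these fields can be checked mechanically after transcription. Below is a minimal Python sketch, assuming the engine already emits digits: it rewrites spoken blood-pressure phrasing into slash form and checks whether a string has the shape of an ICD-10-CM code. The helper names are ours, and the format check says nothing about whether a code is valid or billable.

```python
import re

# "BP 138 over 84" -> "BP 138/84". Only fires after "BP" or "blood pressure",
# so phrases like "10 over 20 minutes" are left alone. Assumes the engine
# already emitted digits; spelled-out numbers are not handled.
BP_SPOKEN = re.compile(
    r"\b(BP|blood pressure)\s+(\d{2,3})\s+over\s+(\d{2,3})\b", re.IGNORECASE
)

# Shape of an ICD-10-CM code: a letter, a digit, a letter or digit, then an
# optional dot and 1-4 more letters or digits. Format only: this does not
# check that the code exists in the current code set.
ICD10_SHAPE = re.compile(r"[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?")


def normalize_bp(text: str) -> str:
    """Rewrite spoken blood-pressure readings in slash form."""
    return BP_SPOKEN.sub(r"\1 \2/\3", text)


def looks_like_icd10(code: str) -> bool:
    """True if the string has the shape of an ICD-10-CM code."""
    return ICD10_SHAPE.fullmatch(code.strip().upper()) is not None
```

A code that fails the shape check ("I ten", "eleven point nine") is a reliable sign the dictated code needs a hand fix; one that passes still needs a human to confirm it is the right code.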
A SOAP-capable workflow has to either let you dictate the section headers ("Subjective colon...") and have them rendered correctly, or apply a template after the fact. The honest current state: AssemblyAI Universal-3, which is what we run for voice-to-text, handles the prose well (roughly 8-12% WER on clean 16 kHz medical English without vocabulary boosting), but you'll still hand-fix codes, dosages, and laterality on most notes.
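If you dictate the headers, applying the template afterward can be as simple as splitting on them. A minimal sketch in Python, assuming the engine transcribes a spoken "colon" either as the word or as ":"; `split_soap` is a hypothetical helper, not part of any vendor's API.

```python
import re

SECTIONS = ("subjective", "objective", "assessment", "plan")

# A spoken header: the section name followed by the word "colon" or by ":".
HEADER = re.compile(
    r"\b(subjective|objective|assessment|plan)\s*(?:colon\b|:)\s*",
    re.IGNORECASE,
)


def split_soap(transcript: str) -> dict[str, str]:
    """Split a dictated transcript into SOAP sections by spoken headers.

    Text before the first header is dropped, a section dictated twice is
    concatenated, and missing sections come back as empty strings.
    """
    sections = {name: "" for name in SECTIONS}
    matches = list(HEADER.finditer(transcript))
    for i, match in enumerate(matches):
        name = match.group(1).lower()
        # The section body runs until the next header or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(transcript)
        body = transcript[match.end():end].strip()
        sections[name] = f"{sections[name]} {body}".strip()
    return sections
```

Returning empty strings for missing sections is deliberate: a note with no Plan should be obvious at review, not quietly shorter. The anatomical word "colon" can still collide with this convention (for example, "will plan colon resection" inside the Assessment would start a new Plan section), so the split is a convenience for the reviewer, not a replacement for one.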
Takeaway: transcription gives you a draft. Structure, codes, and the words that change liability are a review step, not an automation step.
Why generic dictation apps mangle medical terms
The default vocabulary in consumer ASR is trained on broad internet English. "Lisinopril" is rare in that corpus, so fast dictation tends to come back as a common-English sound-alike such as "listening pro". Two things fix this:
Custom vocabulary / word boost. Hosted ASR APIs such as AssemblyAI and Deepgram accept a list of expected terms and let you tune how strongly they are boosted; Whisper has no boost parameter, so there you get a similar effect by fine-tuning it or by seeding its prompt with the terms. Loading your top 200 drug names, your specialty's procedures, and your local hospital names cuts substitution errors on those terms by a large margin: AssemblyAI's and Deepgram's published documentation reports word-level error on boosted terms dropping from 20-30% to under 5% on the same audio.
Specialty-specific language models. Dedicated medical ASR vendors (Nuance Dragon Medical One, 3M M*Modal Fluency Direct, Augnito) ship pre-trained on clinical corpora. Dragon Medical has a 25+ year head start on medical vocabulary and is still the benchmark for solo dictation accuracy on terminology. The trade-off is licensing cost ($99-199/user/month depending on reseller, as of May 2026) and a Windows-leaning desktop install.
General transcription products — ours included — sit in the middle. We don't ship a pre-built medical model, and custom vocabulary upload is on the roadmap but not yet exposed in our product. On clean lapel-mic dictation in a quiet exam room our underlying ASR (AssemblyAI Universal-3) benchmarks to roughly 8-12% WER on medical English without boosting — usable as a review draft, not a Dragon replacement. If you need word-boost today, AssemblyAI's direct API is the closest match.
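For reference, here is roughly what a word-boost request looks like. A minimal Python sketch, assuming AssemblyAI's v2 `/transcript` endpoint and its `word_boost` and `boost_param` fields as documented when we last checked; newer models may use a different key-terms parameter, so confirm against the current API reference. The helper names are hypothetical, and the audio URL should point to de-identified audio.

```python
import json
import urllib.request

TRANSCRIPT_URL = "https://api.assemblyai.com/v2/transcript"


def build_boost_list(*term_lists: list[str]) -> list[str]:
    """Merge term lists (drugs, procedures, local names), dropping blanks and
    case-insensitive duplicates while keeping first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for terms in term_lists:
        for term in terms:
            cleaned = term.strip()
            key = cleaned.lower()
            if cleaned and key not in seen:
                seen.add(key)
                merged.append(cleaned)
    return merged


def transcript_request_body(audio_url: str, boost: list[str]) -> dict:
    # Assumption: "word_boost" and "boost_param" are the v2 field names;
    # verify against AssemblyAI's current docs before relying on this.
    return {"audio_url": audio_url, "word_boost": boost, "boost_param": "high"}


def submit(body: dict, api_key: str) -> dict:
    """POST the job and return the parsed JSON response. Needs a real key."""
    req = urllib.request.Request(
        TRANSCRIPT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Deduplicating case-insensitively keeps a capitalized and a lowercase copy of the same drug from taking two slots; `submit` is defined but not called here, since it needs a real API key and network access.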
Takeaway: if you dictate 30+ notes a day and your terminology is dense (cardiology, oncology, ortho), buy a medical-specific ASR. If you dictate 5-15 notes a day and your vocabulary is general primary care or behavioral health, a general engine with custom vocabulary is enough.
Workflow patterns: dictate-during, dictate-between, dictate-end-of-day
Three patterns actually work. Each has different audio quality and different cognitive load.
Dictate during the visit
You speak the note out loud in front of the patient, often narrating the exam ("inspecting the right knee, no effusion, McMurray's negative"). Audio is captured by a lapel mic or your phone on the desk.
- Pros: note is done when the patient walks out. Patient hears what you're documenting — some clinicians find this builds trust.
- Cons: cross-talk from the patient ends up in the transcript. You need diarization that separates speakers. On mono audio with 2 speakers, pyannote-3.1 (what we use for mono) does this reasonably; on 4+ speakers it degrades.
- Best for: physical therapy, primary care follow-ups, behavioral health intakes where narration is natural.
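When you dictate during the visit, the useful step after diarization is attaching each transcribed word to a speaker so patient cross-talk can be set aside. A minimal Python sketch, assuming your ASR returns per-word timestamps and your diarizer returns speaker turns (pyannote's output, for instance, can be flattened into start/end/label turns with `itertracks(yield_label=True)`). The `Word` and `Turn` types are hypothetical stand-ins for whatever your tools emit.

```python
from dataclasses import dataclass


@dataclass
class Word:
    start: float  # seconds
    end: float
    text: str


@dataclass
class Turn:
    start: float
    end: float
    speaker: str  # e.g. "SPEAKER_00"; which label is you has to be checked


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Seconds of overlap between two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_words(words: list[Word], turns: list[Turn]) -> list[tuple[str, str]]:
    """Give each word the speaker whose turn overlaps it most.
    Words that no turn covers are labelled "UNKNOWN" for manual review."""
    labelled = []
    for w in words:
        best, best_overlap = "UNKNOWN", 0.0
        for t in turns:
            ov = overlap(w.start, w.end, t.start, t.end)
            if ov > best_overlap:
                best, best_overlap = t.speaker, ov
        labelled.append((best, w.text))
    return labelled


def clinician_text(words: list[Word], turns: list[Turn], clinician: str) -> str:
    """Join only the words attributed to the clinician's speaker label."""
    return " ".join(text for spk, text in label_words(words, turns) if spk == clinician)
```

Words that no turn covers come back as `UNKNOWN` rather than being guessed. Review what gets set aside: a patient's "it's the left one, actually" is cross-talk you may want in the note.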
Dictate between patients
You step into your office for 60-90 seconds between visits and dictate a structured note from memory. Audio is clean (closed door, close mic), no patient voice.
- Pros: highest transcription accuracy of the three patterns. Single speaker, controlled environment.
- Cons: 60-90 seconds × 20 patients = 20-30 minutes of dictation, plus review.
- Best for: most outpatient specialties. This is the pattern Dragon was designed for and where general transcription competes well.
Dictate end-of-day
You batch-dictate all your notes after clinic closes, working from a paper jot sheet.
- Pros: deep work, no context switching during clinic.
- Cons: recall decay. Notes dictated 6 hours after a visit miss details — especially laterality, dose changes, and return precautions. CMS audit risk if your timestamps and your memory diverge.
- Best for: low-volume specialties, locum work, or as a fallback when the day blows up.
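If you fall back to end-of-day dictation, it helps to know which notes were dictated long after the visit. A minimal Python sketch, assuming you can export visit end times and dictation timestamps keyed by a non-identifying visit code. The 4-hour threshold is a placeholder of ours, not a CMS rule; use whatever your compliance policy sets.

```python
from datetime import datetime, timedelta

# Placeholder threshold, not a regulatory number.
MAX_LAG = timedelta(hours=4)


def late_notes(visits: dict[str, datetime], dictated: dict[str, datetime],
               max_lag: timedelta = MAX_LAG) -> list[str]:
    """Return visit codes, in visit order, whose dictation came more than
    max_lag after the visit ended or that have no dictation at all."""
    flagged = []
    for visit_id, ended in sorted(visits.items(), key=lambda kv: kv[1]):
        when = dictated.get(visit_id)
        if when is None or when - ended > max_lag:
            flagged.append(visit_id)
    return flagged
```

Flagged notes are the ones to re-check for laterality, dose changes, and return precautions before signing.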
Takeaway: dictate-between-patients wins on accuracy and compliance. Pick that as your default and use the others as exceptions.
HIPAA considerations — what we ship and what we don't
This is the section where we have to be precise, because clinicians get hurt by vague vendor claims.
HIPAA is not a checkbox. It's a Privacy Rule, a Security Rule, and a Breach Notification Rule, and the operational requirement for a vendor handling PHI is a signed Business Associate Agreement (BAA) plus technical, administrative, and physical safeguards. Per HHS guidance, a vendor that creates, receives, maintains, or transmits PHI for a covered entity generally needs a BAA. Encryption at rest is an expected safeguard (the Security Rule treats it as "addressable", meaning you implement it or document an equivalent alternative), but it does not replace a BAA.
What we ship today:
- Encryption in transit (TLS 1.2+) and at rest (AES-256).
- Access controls and audit logging on internal data access.
- Per-user opt-out and deletion endpoints.
What we do not ship today:
- A signed BAA. Our handling follows the safeguards listed above, but without a BAA we are not a business associate you can hand PHI to. If you are recording PHI and need a covered processor, we are not it yet. Email us; we're piloting.
- A native iPhone app. Dictation works in mobile Safari but there's no offline capture.
- On-device-only processing. Audio is processed in our cloud (AssemblyAI Universal-3 inference runs server-side).
Don't forget the recorder. The HIPAA review has to cover the full chain — recorder, storage, transcription vendor, EHR handoff. iOS Voice Memos syncs to iCloud by default. Android recorders often sync to Google Drive. If you dictate into a phone app that auto-uploads before you ever touch the transcription vendor, your PHI has already left the building. Turn cloud sync off on the recording device or use a recorder that writes locally only.
What this means in practice: if you want to use general transcription for SOAP notes today and stay clean on HIPAA, you have three real options:
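- De-identify before upload: dictate with no names, no dates more specific than the year (including DOB and visit date), and no record numbers. Under the Safe Harbor method, medical record numbers are themselves one of the 18 identifiers to remove, so if you need to match the note back to the chart, use a code your practice assigns that isn't derived from the MRN or other patient details. A transcript with none of the 18 identifiers, and nothing else you know could identify the patient, is no longer PHI under Safe Harbor. This is what most solo practitioners we hear from actually do.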
- Use a BAA-covered vendor: Nuance DAX Copilot, Abridge, DeepScribe, Suki, Augmedix, and Microsoft Dragon Medical all sign BAAs. Cost is higher but the legal posture is clean.
- Self-host Whisper on-premise: Whisper Large-v3 on a local GPU keeps audio on your machine. Accuracy is lower than Universal-3 on clinical English (Whisper hallucinates more on silence and accents), but the data never leaves.
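For the self-hosted route, a minimal Python sketch using the open-source `openai-whisper` package (`whisper.load_model` and `model.transcribe`). It assumes a local GPU that can hold large-v3. Whisper has no word-boost parameter, so this nudges vocabulary through `initial_prompt` instead, a weaker lever than a true boost list; `build_initial_prompt` and the 50-term cap are our own illustration.

```python
def build_initial_prompt(terms: list[str], max_terms: int = 50) -> str:
    """Bias Whisper toward expected vocabulary via its initial prompt.
    Long prompts get truncated by the model, so keep the list short."""
    return "Clinical dictation. Terms: " + ", ".join(terms[:max_terms]) + "."


def transcribe_locally(path: str, terms: list[str]) -> str:
    """Transcribe an audio file on this machine; nothing is uploaded."""
    import whisper  # openai-whisper; imported here so the helper above works without it

    model = whisper.load_model("large-v3")
    result = model.transcribe(
        path,
        language="en",
        initial_prompt=build_initial_prompt(terms),
        # Per the whisper docs, turning this off makes the model less prone to
        # getting stuck in repetition loops, at some cost in consistency.
        condition_on_previous_text=False,
    )
    return result["text"]
```

Keeping the `whisper` import inside the function lets you test the prompt helper on a machine without the model installed.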
Takeaway: if your dictation contains identified PHI, use a vendor with a BAA. If you can de-identify at the microphone — and turn off device-level cloud sync — a general transcription product is workable.
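A cheap safety net for the de-identify route is to scan each returned transcript for identifier shapes that slipped into the dictation. A minimal Python sketch with hypothetical patterns: it catches a few Safe Harbor identifier shapes (dates, phone numbers, SSNs, email addresses, long MRN-like numbers) and misses names, addresses, and much else. It also runs after the vendor already has the audio, so a hit means deleting the job and the recording, not undoing the disclosure.

```python
import re

# Crude shape checks, not a de-identification method.
PATTERNS = {
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "month_day": re.compile(
        r"\b(?:january|february|march|april|may|june|july|august|september"
        r"|october|november|december)\s+\d{1,2}(?:st|nd|rd|th)?\b",
        re.IGNORECASE,
    ),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "long_number": re.compile(r"\b\d{6,}\b"),  # MRN- or account-like runs
}


def identifier_hits(transcript: str) -> list[tuple[str, str]]:
    """Return (kind, matched text) pairs for likely identifiers."""
    hits = []
    for kind, pattern in PATTERNS.items():
        hits.extend((kind, m.group(0)) for m in pattern.finditer(transcript))
    return hits
```

Expect some false positives ("may 5" in ordinary prose); a false alarm costs a second look, a miss costs a disclosure.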
AI scribe products vs general transcription as a SOAP source
This is the comparison readers usually want. The honest framing: AI scribes and general transcription solve different parts of the problem.
AI scribes — Abridge, Nuance DAX Copilot, DeepScribe, Suki, Nabla, Sunoh — record the full clinician-patient conversation and use an LLM to generate a structured SOAP note from the dialogue. You don't dictate; you talk to the patient normally. The product writes the note.
- Strengths: zero dictation overhead. Note is drafted from natural conversation. Most sign a BAA and ship EHR integration (Epic, athenahealth, eClinicalWorks) and ICD-10 suggestions.
- Seams: the LLM can hallucinate plausible-sounding clinical content that wasn't said. Review burden shifts from typing to fact-checking. Pricing is $200-600/clinician/month (as of May 2026, varies by EHR and volume).
- Best for: high-volume primary care, urgent care, any setting where conversation is the natural data source.
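Because scribe output can include content nobody said, one way to triage the fact-checking is to rank note sentences by how much of their vocabulary appears in the visit transcript. A minimal Python sketch of that heuristic, assuming you can get both the generated note and the raw transcript as text; the stopword list and the 0.5 threshold are arbitrary choices of ours. It is purely lexical, so it misses a flipped negation ("reports chest pain" against a transcript that says "no chest pain") and can flag a faithful paraphrase.

```python
import re

# Arbitrary illustrative stopwords; tune for your notes.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "with",
             "for", "is", "was", "are", "be", "patient", "reports", "states"}


def content_words(text: str) -> set[str]:
    """Lowercased words and numbers, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS}


def unsupported_sentences(note: str, transcript: str, threshold: float = 0.5) -> list[str]:
    """Flag note sentences where fewer than `threshold` of the content words
    appear anywhere in the transcript. Ranks review; does not replace it."""
    spoken = content_words(transcript)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", note.strip()):
        words = content_words(sentence)
        if words and len(words & spoken) / len(words) < threshold:
            flagged.append(sentence)
    return flagged
```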
General transcription with dictation — us, Otter, Rev, Sonix, Descript — captures what you say and returns text. You drive the structure.
- Strengths: literal. The transcript reflects what was spoken; errors are misheard or dropped words, not a generated assessment. Cheaper ($10-30/month tiers).
- Seams: you have to actually dictate. No EHR-native integration. ICD-10 coding is on you.
- Best for: clinicians who already dictate, or whose specialty doesn't fit conversational scribing (psychiatry notes with mental status exams, procedural notes, PT/OT progress notes).
A useful test: if you currently dictate or want to, general transcription keeps you in control of the record. If you currently type notes after clinic and want them written for you, an AI scribe is the better fit — pay the higher price for the higher leverage. Our clinician workflow page lays out where we fit in that split.
Takeaway: AI scribes replace your typing. Transcription replaces your typist. Pick based on whether you want a draft from a conversation or a draft from your own narration.
Where EHR dictation fits
If your EHR ships native dictation (Epic Haiku, athenaOne mobile, Practice Fusion's built-in capture), use it for non-PHI-sensitive workflows, or for any workflow when it's already covered by the BAA in your EHR contract. The text lands directly in the right field, no copy-paste step. The trade-off is accuracy: most native EHR dictation uses an older ASR engine, often without per-user vocabulary tuning, and WER is noticeably higher than current-gen cloud ASR on the same audio.
A hybrid that works: dictate into a strong external engine (Dragon Medical, or general transcription with a vocabulary list), copy the structured text into the EHR's note field, sign. You lose the "one-tap" feel and gain 5-10 percentage points of accuracy. For solo practices without enterprise EHR macro budgets, this is usually the right trade.
What to test before you trust the workflow
Don't evaluate with a clean paragraph someone read off a script. Use the worst audio you'll realistically record in clinic. Five test notes cover the failure modes:
- A routine follow-up — baseline accuracy.
- A medication change — drug names, doses, frequency.
- A normal exam — long strings of negatives where one dropped "no" matters.
- An abnormal exam with laterality — "right" vs "left" handling.
- A visit with safety language or return precautions — the words that show up in malpractice exhibits.
For each transcript, review these fields first, in this order: medications and doses, allergies, negation, laterality, numeric values, ICD-10 codes, follow-up interval, referral destination, return precautions. Anything that could change patient instructions or liability gets eyes before sign-off.
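Negation and laterality errors are cheap to surface once you have a hand-corrected reference for each test note. A minimal Python sketch that compares counts of meaning-flipping words between your reference and an engine's transcript; the word list is an illustrative starting point, not a clinical standard. Counting catches dropped or added words, but not a "no" that survived and attached to the wrong finding, so it narrows the review rather than replacing it.

```python
import re
from collections import Counter

# Words whose loss or swap changes clinical meaning; extend for your specialty.
CRITICAL = {"no", "not", "denies", "without", "negative",
            "right", "left", "bilateral"}


def critical_counts(text: str) -> Counter:
    """Count occurrences of each critical word in the text."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w in CRITICAL)


def critical_diff(reference: str, transcript: str) -> dict[str, int]:
    """Per critical word: transcript count minus reference count.
    Negative means the transcript dropped it; positive means it added one."""
    ref, hyp = critical_counts(reference), critical_counts(transcript)
    return {w: hyp[w] - ref[w] for w in sorted(set(ref) | set(hyp)) if hyp[w] != ref[w]}
```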
Then measure edit time per note, not just WER. A two-minute dictation that takes eight minutes to clean is not working. A two-minute dictation that takes one to two minutes to review and paste beats typing for most practices.
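WER is still the number vendors quote, so it helps to see what it counts. A minimal Python sketch of the standard definition, (substitutions + deletions + insertions) divided by the number of reference words, computed as word-level edit distance; real scoring tools also normalize punctuation and number formats, which this skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.
    Lowercases and splits on whitespace only."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

A dropped "no" in "no chest pain" is one error: 33% WER on that phrase, but about 1% across a 100-word note, even though it flips the finding. That is why edit time and the field-by-field review carry more weight than an overall WER figure.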
What next
A 1-week trial we'd actually run:
- Record real dictation from a few clinic sessions: between-patients style, lapel or phone mic recorded locally (cloud sync off), no patient identifiers. Aim for ~30 minutes of total audio, enough to cover each of the five test-note types above more than once.
- Run it through two engines in parallel: a medical-specialized one (Dragon Medical One trial, or Augnito) and a general one with a custom vocabulary list. Our Free plan gives you 30 minutes/month of general transcription with exports unlocked (as of May 2026), but it doesn't accept a custom vocabulary yet; to test word boost, call AssemblyAI's API directly.
- Score the five-note matrix above: error rate on drug names, ICD-10 codes, and section headers, plus edit time per note. This takes 30 minutes and tells you more than any vendor benchmark.
- Decide your HIPAA posture before you scale: if your dictation will contain PHI, shortlist only BAA-covered vendors. If you can de-identify at the mic, the field is wider.
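The scoring step can be a few lines. A minimal Python sketch, assuming you keep, for each test note, the list of terms that must survive (drug names, ICD-10 codes, section headers) and a stopwatch time for each cleanup; the helper names are hypothetical. A term counts as missed unless it appears verbatim as a whole word, which is stricter than WER and closer to what matters at sign-off.

```python
import re
from statistics import median


def missed_terms(terms: list[str], transcript: str) -> list[str]:
    """Expected terms that do not appear verbatim (case-insensitive,
    whole-term match) in the transcript."""
    missed = []
    for term in terms:
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if not re.search(pattern, transcript, re.IGNORECASE):
            missed.append(term)
    return missed


def term_error_rate(terms: list[str], transcript: str) -> float:
    """Fraction of expected terms the engine got wrong or dropped."""
    if not terms:
        raise ValueError("need at least one expected term")
    return len(missed_terms(terms, transcript)) / len(terms)


def median_edit_minutes(edit_seconds: list[float]) -> float:
    """Median cleanup time per note, in minutes."""
    return median(edit_seconds) / 60
```

Run the same term lists and timings for both engines; the one with the lower median edit time on your own five notes is the one to keep.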
If you need BAA today, we're not it yet, and we'd rather say so than waste your trial. Email us if you want to be on the pilot list when it opens.