Recruiter intake call transcription for ATS notes and candidate scorecards
If you run 6-10 intake and screen calls a day, the bottleneck isn't the talking — it's the 15 minutes after each call where you reconstruct what the hiring manager actually said about must-haves versus nice-to-haves, and whether the candidate's answer on the comp question was "flexible" or "firm at $185k". AI transcription turns that 15 minutes into about 5, and gives you a searchable record the whole pod can audit. It does not replace the scorecard — the human still owns the evaluation. It feeds it.
We transcribe a steady stream of recruiter calls from Zoom, Google Meet, and Teams on AssemblyAI Universal-3. Here's what changes for your workflow when the transcript is sitting next to the ATS tab.
The recruiter's day — three calls, three problems
A typical full-cycle recruiter runs three flavors of call:
- Discovery / intake with the hiring manager: 30-60 minutes, dense with role context, comp band, deal-breakers, prior failed hires. Almost never recorded. The intake doc lives in the recruiter's head until they write it up that night.
- Phone or video screen with the candidate: 20-45 minutes, structured-ish. Notes get pasted into Greenhouse, Lever, Ashby or Workday. Quotes get paraphrased. Tone gets lost.
- Debrief with the interview panel: 15-30 minutes, often rushed. Disagreements that should be on the record get smoothed into a thumbs-up.
The nuance matters more than the headline. A hiring manager who says "we need someone senior" three times and then describes a Series A IC role is telling you something the title field won't capture. Without a transcript, that signal evaporates by Friday.
There's a second-order benefit on the screen call: you stop typing. You look at the candidate, ask the question, listen, and let the engine handle the stenography. That's a better interview, not just a better record.
Fixing the intake first
The intake is the most expensive call to get wrong. If you misread the requirements here, you spend three weeks sourcing the wrong shape of candidate. Hiring managers talk in stream-of-consciousness — listing ten skills, then dropping the casual aside that only three actually matter.
A transcript lets you do one thing that handwritten notes never could: send it back to the manager and ask them to mark the non-negotiables in the text itself. Now the calibration is on paper, in their words, before you've opened LinkedIn. It also kills the argument three weeks later when they reject your whole slate for missing a skill that was never mentioned out loud.
What the ATS actually wants
Greenhouse, Lever, Ashby and Workday all expose two surfaces per candidate: structured fields (stage, source, comp expectation, location, work authorization) and free-text notes (the scorecard, recruiter notes, attachments).
The structured fields are easy. You can fill those from memory or a 30-second skim of the transcript. The free-text notes are where AI transcription earns its keep — because the scorecard rubric usually asks for evidence, not opinion.
A bad ATS note says: "Good communicator, knows Python." A good one says: "Built a Python service that cut p95 latency by 40 ms — transcript 14:22." The second one survives a debrief; the first one collapses the moment anyone asks why.
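A minimal sketch of that "claim plus quote plus timestamp" shape, assuming your transcript export gives each utterance a start offset in seconds. The function names and the sample quote are illustrative, not part of any ATS or transcription API.

```python
def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as MM:SS, or H:MM:SS past the first hour."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def evidence_line(claim: str, quote: str, start_seconds: float) -> str:
    """Pair a rubric claim with the candidate's own words and where to find them."""
    return f'{claim}: "{quote}" (transcript {format_timestamp(start_seconds)})'


# Hypothetical utterance that starts 862 seconds into the call.
line = evidence_line(
    "Python, production impact",
    "I built a Python service that cut p95 latency by 40 ms",
    862,
)
print(line)
```

Keeping the timestamp in the note means anyone in the debrief can jump to the exact moment instead of taking the paraphrase on trust.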
The accuracy you actually get
We run AssemblyAI Universal-3 as the primary model, with Whisper Large-v3 as a transient-error fallback. On clean Zoom or Meet audio with both speakers on decent headsets, expect roughly 92% accuracy (WER ~7.88%) on 16 kHz English. On dial-in legs where someone joined by phone, it drops to about 82% (WER ~17.7%) on 8 kHz telephony.
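For readers who want to see what a WER figure measures: it is the word-level edit distance between a trusted reference transcript and the model's output, divided by the number of reference words. The accuracy numbers above are simply 100% minus WER. The sketch below is a generic textbook calculation with made-up sample sentences, not the benchmark harness behind the figures quoted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Dynamic-programming edit distance, keeping only the previous row.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Illustrative pair: one substituted word and one dropped word out of eight.
reference = "the candidate is firm at one eighty five"
hypothesis = "the candidate is from at one eighty"
wer = word_error_rate(reference, hypothesis)
print(f"WER {wer:.2%}, accuracy {1 - wer:.2%}")
```

To sanity-check the numbers on your own audio, hand-correct two or three minutes of a real call and use that as the reference.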
In practice that means:
- Names get mangled. "Kubernetes" survives; "Aoife" and "Soren" often don't. Fix names once at the top of the transcript and search-replace.
- Numbers are usually reliable — except when speakers overlap on the comp question. If two people are talking over each other when "$185k" gets said, verify the figure against the audio before pasting it.
- Tone doesn't survive. Sarcasm, long pauses, audible discomfort — the text is flat. Note these manually in the moment if they matter.
- Speaker labels work well for two-person calls. Stereo recordings (separate channel per participant — Zoom's "record a separate audio file" setting) give perfect diarization. Mono falls back to pyannote-3.1, which is solid for 2-4 speakers and degrades past six.
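The "fix names once and search-replace" step can be scripted if you work from exported text. A small sketch, assuming you keep a per-call dictionary of mishearings. The misspellings shown are hypothetical examples, not known model outputs.

```python
import re


def fix_names(transcript: str, corrections: dict[str, str]) -> str:
    """Replace each mangled name with the correct spelling, whole words only."""
    for wrong, right in corrections.items():
        pattern = re.compile(rf"\b{re.escape(wrong)}\b", flags=re.IGNORECASE)
        # A function replacement keeps any backslashes in `right` literal.
        transcript = pattern.sub(lambda _match, name=right: name, transcript)
    return transcript


raw = "Thanks Efa, I'll loop in Soarin. Efa's team runs Kubernetes."
fixed = fix_names(raw, {"Efa": "Aoife", "Soarin": "Soren"})
print(fixed)
```

The word-boundary match is the point of using a regex here: a plain replace would also rewrite the same letters inside longer words.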
For panel debriefs with five or more people on one mic, expect manual cleanup on who-said-what. The biggest single accuracy fix is behavioral: ask the panel not to talk over each other. That helps the transcript more than any model change.
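Overlapping speech is also where comp figures go wrong, so it can help to flag those moments instead of re-listening to the whole call. The sketch below assumes a transcript export with per-segment speaker, start, and end times in seconds. The `Segment` shape and the sample data are assumptions for illustration, not a documented export format.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str
    start: float  # seconds from the start of the call
    end: float
    text: str


# Matches figures such as $185k, $170,000 or $92.5k.
MONEY = re.compile(r"\$\s?\d[\d,]*(?:\.\d+)?\s?[kK]?")


def figures_to_verify(segments: list[Segment]) -> list[Segment]:
    """Return segments that mention money while another speaker is talking."""
    flagged = []
    for seg in segments:
        if not MONEY.search(seg.text):
            continue
        overlaps = any(
            other.speaker != seg.speaker
            and other.start < seg.end
            and seg.start < other.end
            for other in segments
        )
        if overlaps:
            flagged.append(seg)
    return flagged


call = [
    Segment("Recruiter", 600.0, 606.0, "What are you targeting on base?"),
    Segment("Candidate", 606.5, 611.0, "I'm firm at $185k on base."),
    Segment("Recruiter", 610.2, 612.0, "Okay, and equity?"),
    Segment("Candidate", 700.0, 704.0, "My current base is $170k."),
]
for seg in figures_to_verify(call):
    print(f"Check audio at {seg.start:.0f}s: {seg.text}")
```

Only the first figure is flagged, because the recruiter starts talking before the candidate finishes. The second figure sits in a clean segment and can be pasted as-is.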
The bias audit nobody runs but should
Here's the underrated reason to transcribe screens: the same transcript, read cold a day later, often disagrees with the scorecard you wrote five minutes after the call.
You rated the candidate 3/5 on "structured thinking" because they paused a lot. Reading the transcript, the pauses are gone — what's left is three crisp, well-organized answers. Or the opposite: you rated them 4/5 because they were warm and articulate, and the transcript shows they never actually answered the system-design question.
For recruiting ops, this scales into a weekly calibration. Pull five to ten transcripts from the same role and ask four questions:
- Are interviewers asking the same core questions?
- Are scores tied to evidence from the conversation, or to credentials and vibes?
- Is the language symmetric? "Career break" treated as risk on one candidate but "ambitious pivot" on another is the kind of drift that only shows up side-by-side.
- Are concerns written as follow-up questions or as unsupported conclusions?
We're not making a fairness claim about the transcript itself — the model has its own gaps, and accented English on 8 kHz audio is the worst case. But a written record reviewed twice beats a memory reviewed once.
One discipline that pairs with this: keep ATS notes job-related. Protected-class details, medical status, family situation, religion, politics, casual personal observations — none of it belongs in the note unless your legal team has a specific reason and process. If it doesn't help evaluate against the rubric, leave it out.
Consent and recording disclosure
The rules vary by jurisdiction, and what counts is where each person on the call is sitting, so do your homework before you hit record.
United States: federal law is one-party consent, but eleven states require all-party consent (California, Florida, Illinois, Maryland, Massachusetts, Montana, Nevada, New Hampshire, Pennsylvania, Washington — and Connecticut for in-person). If the candidate is in any of those, you need their explicit consent on the record, not just a calendar footer.
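If you schedule calls from a spreadsheet or script, the state list above can be turned into a pre-call check. This is a sketch only: the sets mirror the list in this article and should be confirmed with whoever owns compliance. Treating an unknown location as "ask for consent" is a design choice in this example, not a legal rule.

```python
from typing import Optional

# Copied from the list in this article; verify before relying on it.
ALL_PARTY_STATES = {"CA", "FL", "IL", "MD", "MA", "MT", "NV", "NH", "PA", "WA"}
IN_PERSON_ONLY_STATES = {"CT"}


def needs_all_party_consent(state_code: Optional[str], in_person: bool = False) -> bool:
    """True when the script should prompt for explicit on-record consent."""
    if not state_code:
        # Unknown location: assume the stricter rule applies.
        return True
    code = state_code.strip().upper()
    if code in ALL_PARTY_STATES:
        return True
    return in_person and code in IN_PERSON_ONLY_STATES


for state in ["ca", "TX", "CT", None]:
    print(state, needs_all_party_consent(state))
```

In practice, asking the one-sentence consent question on every call makes the lookup a safety net and not the primary control.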
EU / UK: GDPR treats the recording as personal data. You need a lawful basis, a retention policy, and a way for the candidate to request deletion. UK's ICO expects proportionality — don't hoard audio for two years if the transcript is enough.
A disclosure that works: one sentence at the start. "I'm recording this so I can write accurate notes for the hiring manager — is that okay with you?" Wait for the yes. The yes is now on the recording. If they say no, you don't record. That's the protocol.
For meeting bots that join Zoom, Meet, or Teams, our bot — Recall.ai under the hood — posts a consent message in chat on join and shows up in the participant list under whatever name you configure ("Recruiting Notetaker" is a clearer choice than something cute). There's also an opt-out endpoint at /opt-out/{token} if a candidate wants their data purged after the fact.
One limit worth naming: we do HIPAA-grade data handling at rest, but we are not a BAA-covered product. That matters less for general recruiting than for clinical work — but if you're screening candidates for clinical roles where PHI might come up, don't use us for those specific calls until we ship the BAA.
The workflow — Zoom recording to ATS note in 5 minutes
This is the version we see working for full-cycle recruiters running 30+ calls a week:
- Record on Zoom or Meet. Use cloud recording with "record separate audio files for each participant" turned on. That gets you stereo, which gets you perfect speaker separation.
- Upload the file or drop the meeting URL. A 45-minute Zoom call processes in roughly 3-4 minutes. You'll get a timestamped, speaker-labeled transcript.
- Skim with Ctrl-F. Search the words on your scorecard rubric — "ownership", "scope", "comp", "notice period", "remote". Each hit is a pre-built quote.
- Write the ATS note. Two paragraphs of narrative, three to five direct quotes with timestamps, structured-field updates. About 5 minutes.
- Attach or link the transcript in Greenhouse / Lever / Ashby. Now the panel can read it before the debrief.
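The Ctrl-F step in the workflow above can be sketched as a small script when you have many calls to skim. It assumes the transcript is available as a list of speaker-labeled segments with a start time in seconds. The `Segment` shape, the keywords, and the sample lines are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str
    start: float  # seconds from the start of the call
    text: str


def rubric_hits(segments: list[Segment], keywords: list[str]) -> dict[str, list[str]]:
    """Map each rubric keyword to the timestamped lines that mention it."""
    hits: dict[str, list[str]] = {keyword: [] for keyword in keywords}
    for seg in segments:
        lowered = seg.text.lower()
        stamp = f"{int(seg.start) // 60:02d}:{int(seg.start) % 60:02d}"
        for keyword in keywords:
            # Plain substring match, the same behavior as Ctrl-F.
            if keyword.lower() in lowered:
                hits[keyword].append(f"[{stamp}] {seg.speaker}: {seg.text}")
    return hits


screen = [
    Segment("Candidate", 312.0, "I had full ownership of the billing service."),
    Segment("Candidate", 1204.5, "My notice period is four weeks."),
    Segment("Recruiter", 1500.0, "Is remote a requirement for you?"),
]
found = rubric_hits(screen, ["ownership", "notice period", "remote", "comp"])
for keyword, lines in found.items():
    print(keyword, "->", lines or "no mention")
```

An empty list is useful too: a rubric keyword with no hits is a question that never got asked.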
What we don't ship: automatic posting into your ATS. There's no native Greenhouse or Lever connector yet — if you want it automated, you wire a webhook into the ATS API yourself. Most teams just copy-paste, which takes less time than configuring the integration.
We also don't do live in-meeting captions. We transcribe the recording after the call finishes. If you need realtime captions during the call, Otter.ai and the native Zoom/Meet captions are better fits.
Where each tool earns its slot
- Otter.ai: strong on live captions and collaborative in-meeting highlights. Works well if you live inside Otter's UI. The seams show when you want a clean export into a different system or non-English audio. See our Otter alternatives comparison for the detail.
- Rev.com: human transcription is the gold standard for hard audio (heavy accents, bad mics) or any record you might need to defend formally. Slower and more expensive — usually overkill for a screen you'll skim once.
- Fireflies / Fathom / Gong: built around the sales-call workflow with CRM logging baked in. Useful if recruiting is run like a sales pipeline. Less useful if you want raw transcripts you control.
- Us: best fit when you want the audio-to-text pipeline cheap, multilingual (99 languages, one price), and explicitly not a black box. You own the transcript, the diarization, the export. Our interview transcription workflow maps closely to the recruiting operating model.
A 1-week pilot on screen calls
Don't roll this out to the whole team. Pilot it on yourself.
- Pick one week. Record every candidate screen — with consent — on Zoom cloud, stereo.
- Transcribe each one. The Free plan covers 30 minutes a month; for a real pilot you want the Pro plan at 600 audio-minutes (as of May 2026).
- Write your ATS notes from the transcript instead of from memory. Time yourself.
- At the end of the week, re-read three transcripts cold and compare to your scorecard. Note any rating you'd revise.
If the time-per-note drops by 5+ minutes and you revise even one scorecard, the math works. If it doesn't, you've lost a week and learned something specific about your process.
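The pilot math is simple enough to do on paper, but here is a sketch you can rerun with your own numbers. The 600 audio-minute quota and the 5-minute threshold come from this article. The call counts and durations in the example are hypothetical inputs, and the block only compares one week of audio against the quota figure.

```python
def pilot_summary(
    calls: int,
    avg_call_minutes: float,
    note_minutes_before: float,
    note_minutes_after: float,
    plan_audio_minutes: float = 600,
) -> dict:
    """Summarize audio needed and note-writing time saved for one pilot week."""
    audio_needed = calls * avg_call_minutes
    saved_per_note = note_minutes_before - note_minutes_after
    return {
        "audio_minutes_needed": audio_needed,
        "fits_plan": audio_needed <= plan_audio_minutes,
        "minutes_saved_per_note": saved_per_note,
        "minutes_saved_per_week": calls * saved_per_note,
        "meets_threshold": saved_per_note >= 5,
    }


# Hypothetical week: 15 screens of 30 minutes, notes drop from 15 to 5 minutes.
summary = pilot_summary(
    calls=15, avg_call_minutes=30, note_minutes_before=15, note_minutes_after=5
)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With these inputs the week needs 450 audio-minutes and saves 150 minutes of note writing. Swap in your own timings from the pilot before drawing conclusions.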
What next
- Upload one real 30-minute screen recording on the Free plan and check the diarization on your actual audio.
- Send your next intake transcript back to the hiring manager and ask them to mark the non-negotiables in the text.
- If you're in an all-party-consent state or running EU candidates, write the one-sentence consent script before your next call. That's the part that doesn't get faster with AI.