Getting a usable Microsoft Teams transcript from your meeting recording
To get a Microsoft Teams transcript outside of Teams itself, download the meeting's MP4 recording from OneDrive or SharePoint and upload it to a transcription service. You get speaker-labelled text in about 9-11 minutes for a 60-minute file, exportable as TXT, SRT, VTT, DOCX, MD, JSON, or PDF. What you lose: chat messages, reactions, and any text shown only on screen share. Those live outside the audio track and need to be captured separately.
Where Teams puts the recording and how to grab it
When a meeting is recorded, Teams saves an MP4 to the organiser's OneDrive (for non-channel meetings) or to the channel's SharePoint folder (for channel meetings). The file appears in the meeting chat as a thumbnail within a few minutes of ending the call. Right-click → Download gives you the raw MP4.
That MP4 is what you upload. There is no need to extract audio first: video file transcription accepts MP4, MOV, MKV, WEBM, and AVI directly, up to 2 GB on Pro and 5 GB on Business. Files larger than that (for example, a multi-hour all-hands recorded at a high bitrate) are best re-encoded to 720p or audio-only M4A before upload.
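One way to do that re-encode is with ffmpeg. The sketch below only builds the command lines. The helper names, the 64 kbit/s audio bitrate, and the CRF value are illustrative assumptions, not settings the service requires.

```python
import subprocess
from pathlib import Path


def audio_only_cmd(src: str) -> list[str]:
    """ffmpeg command that drops the video stream and writes AAC audio to .m4a."""
    dst = str(Path(src).with_suffix(".m4a"))
    # -vn: no video; -c:a aac: encode audio as AAC; -b:a 64k: assumed bitrate for speech
    return ["ffmpeg", "-i", src, "-vn", "-c:a", "aac", "-b:a", "64k", dst]


def downscale_720p_cmd(src: str) -> list[str]:
    """ffmpeg command that rescales video to 720 px high and copies the audio as-is."""
    p = Path(src)
    dst = str(p.with_name(p.stem + "_720p.mp4"))
    # scale=-2:720 keeps the aspect ratio and forces an even width, which H.264 needs
    return ["ffmpeg", "-i", src, "-vf", "scale=-2:720",
            "-c:v", "libx264", "-crf", "28", "-c:a", "copy", dst]


def run(cmd: list[str]) -> None:
    """Run the command; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(cmd, check=True)
```

Calling `run(audio_only_cmd("all-hands.mp4"))` would write `all-hands.m4a` next to the source. Audio-only output is usually the smaller of the two options, since the transcript depends only on the audio track.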
If your org has SharePoint sharing restrictions and you can't download the file, paste the share link instead — yt-dlp pulls from SharePoint-hosted media the same way it pulls from 1,500+ other sources.
What transcribes vs what's lost
The transcript captures everything in the audio track. That covers spoken dialogue from every participant, plus audio from any screen-shared video. It does not capture:
- Chat messages sent in the meeting sidebar — those live in the Teams chat panel, not the recording.
- Reactions (the hand-raise, the thumbs-up bursts) — visual only.
- Text on screen share — slides, dashboards, code. If nobody reads the text aloud, it doesn't enter the transcript.
- Polls and whiteboard content — separate data sources inside Teams.
For a complete record, you need two exports. Export the chat from Teams (channel meetings: open the channel → ... → Open in SharePoint → meeting folder; 1:1 or group meetings: copy-paste from the chat panel — there's no native bulk export). Then merge it with the audio transcript by timestamp.
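The merge step can be sketched in a few lines of Python. The `Entry` shape is a hypothetical one chosen for illustration. It assumes you have already converted chat timestamps (wall-clock times in Teams) into offsets from the start of the recording, so both lists share one time base.

```python
from dataclasses import dataclass
from heapq import merge


@dataclass
class Entry:
    seconds: float   # offset from the start of the recording
    source: str      # "audio" or "chat"
    who: str
    text: str


def merge_by_timestamp(turns: list[Entry], chat: list[Entry]) -> list[Entry]:
    """Interleave transcript turns and chat messages into one timeline."""
    key = lambda e: e.seconds
    # heapq.merge expects each input to be sorted already, so sort defensively
    return list(merge(sorted(turns, key=key), sorted(chat, key=key), key=key))


turns = [
    Entry(0.0, "audio", "speaker_0", "Let's start."),
    Entry(42.5, "audio", "speaker_1", "The link is in the chat."),
]
chat = [Entry(40.0, "chat", "Dana", "https://example.com/spec")]

for e in merge_by_timestamp(turns, chat):
    print(f"[{e.seconds:7.1f}s] ({e.source}) {e.who}: {e.text}")
```

When a chat message and a spoken turn share the same offset, the spoken turn comes first, because `heapq.merge` favours the earlier input on ties.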
How diarization handles Teams' single-track audio
Teams recordings are mono — every participant mixed into one channel. That matters because it determines which diarization method runs.
Diarization is the process of labelling who spoke when. On stereo files, Transcription.Solutions uses a channel split (left = speaker 0, right = speaker 1, 100% confidence). On Teams' mono mix, it routes to pyannote/speaker-diarization-3.1, a neural model that splits the single channel into per-speaker turns by voice characteristics rather than by channel.
This works well for 2-5 distinct voices. With 6 or more speakers, or when two people sound similar (two men in the same age range, same accent), expect occasional swaps that need a manual fix. You can rename and re-assign speakers by clicking the coloured chip in the transcript view. The popover opens with rename, filter, copy, and jump-to-first-turn actions.
A practical note: if your Teams meeting has 12 participants but only 4 actually talk, diarization labels 4 speakers, not 12. The model segments by voice, not by Teams' participant list.
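If you want to check in advance which path a file will take, you can read its channel count with ffprobe. This is a rough sketch: the function names and the returned labels are hypothetical, and the handling of files with more than two channels is an assumption, since the routing described here only covers mono and stereo.

```python
import subprocess


def ffprobe_channels_cmd(path: str) -> list[str]:
    """ffprobe command that prints the channel count of the first audio stream."""
    return ["ffprobe", "-v", "error", "-select_streams", "a:0",
            "-show_entries", "stream=channels", "-of", "csv=p=0", path]


def pick_diarization(channels: int) -> str:
    """Stereo -> channel split; anything else -> voice-based model (assumed)."""
    return "channel_split" if channels == 2 else "voice_model"


def route(path: str) -> str:
    """Probe the file, then pick a method. Requires ffprobe on PATH."""
    out = subprocess.run(ffprobe_channels_cmd(path),
                         capture_output=True, text=True, check=True)
    return pick_diarization(int(out.stdout.strip()))
```

A standard Teams recording should report one channel, which lands on the voice-based path.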
Accuracy vs Teams' built-in transcript
Teams has its own live transcription feature, enabled per-meeting. It's free and fast. The trade-offs:
| | Teams built-in | Transcription.Solutions |
|---|---|---|
| Languages | 41 (live transcription) | 99 |
| Cost | Included with M365 | $19/mo Pro; overage $0.04/min on Pro, $0.02/min on Business |
| Diarization | By Teams account identity | By voice (anonymous) |
| Export formats | DOCX, VTT | TXT, SRT, VTT, DOCX, MD, JSON, PDF |
| Accuracy on real-world meeting audio | No published figure | ~88% (11.46% WER on AAI meeting benchmark) |
| Works on uploaded files | No — live only | Yes |
Teams' built-in transcript is convenient when you remembered to turn it on. It struggles with the same things every meeting ASR struggles with: cross-talk, heavily accented English, and second-language speakers. The accuracy gap on cleaner audio is small. The real differences are language coverage, the ability to re-process a recording months later, and SRT/VTT export for video editing or YouTube uploads.
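For readers unfamiliar with the metric in the table: word error rate (WER) is the number of word substitutions, deletions, and insertions needed to turn the transcript into a reference text, divided by the number of reference words. Accuracy is 1 minus WER, so 11.46% WER corresponds to roughly 88.5% accuracy. The sketch below is the generic textbook calculation, not the benchmark's own scoring script, and it skips the text normalisation (casing, punctuation) that real benchmarks apply.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and the first j hyp words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion: ref word missing from hypothesis
                curr[j - 1] + 1,         # insertion: extra word in hypothesis
                prev[j - 1] + (r != h),  # substitution, or free match
            ))
        prev = curr
    return prev[-1] / len(ref)


print(wer("the quarterly numbers look good", "the quarterly number look good"))
```

One wrong word out of five gives a WER of 0.2, which is why short test clips produce noisy figures and benchmarks use hours of audio.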
When to use which
Use Teams' built-in transcript if: the meeting is in English (or one of Teams' supported languages), you remembered to enable it, and you only need the text inside Teams.
Upload the MP4 to a separate service if: the meeting language isn't in Teams' list, you forgot to turn live transcription on, you need SRT/VTT for a recap video on YouTube, you need JSON for downstream tooling, or you want a transcript you can clean and share without giving access to the Teams meeting itself.
For recurring workflows such as a weekly standup or a sales call review pipeline, send the MP4 to the REST API with a webhook callback. Processing runs about 6× faster than realtime, and the transcript and AI summary arrive in your system with no manual upload step.
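As a rough illustration of that flow, the sketch below builds (but does not send) a job request. Everything about the API shape here is a placeholder: the base URL, the `/jobs` path, the JSON field names, the bearer-token header, and the idea of submitting a media URL rather than uploading the file are all assumptions. Check the REST API reference for the real endpoint and parameters.

```python
import json
import urllib.request

API_BASE = "https://api.example.com/v1"   # placeholder, not the real endpoint


def build_job_request(media_url: str, webhook_url: str,
                      api_key: str) -> urllib.request.Request:
    """Build a POST asking for a transcript, with a webhook to call when done."""
    body = json.dumps({
        "media_url": media_url,       # hypothetical field name
        "webhook_url": webhook_url,   # hypothetical field name
        "diarize": True,              # hypothetical field name
    }).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/jobs",           # hypothetical path
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",   # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


# To actually submit the job:
#   with urllib.request.urlopen(build_job_request(...)) as resp:
#       job = json.load(resp)
```

The webhook receiver on your side would then store the transcript when the callback arrives, so nothing in the pipeline has to poll for completion.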
Privacy and retention
Uploaded files are permanently deleted from our infrastructure within 24 hours of job completion. Transcripts stay in your account until you delete them. We do not train models on your data. This matters for Teams recordings because they often contain client names, internal financials, and HR conversations that your IT policy treats as restricted. The 24-hour source deletion lets you keep audit logs of the transcript without keeping a second copy of the raw audio.
FAQ
Can I transcribe a Teams meeting without recording it?
No — there has to be a recording to transcribe. Teams' own live transcription is the only way to capture a meeting without saving an MP4. For any external service, including this one, you upload the recording file (or paste the SharePoint/OneDrive share link). If the meeting wasn't recorded, the audio is gone. Enable recording at the start of the call as a default habit.
Does the transcript include the meeting chat?
No. The chat panel is a separate data stream from the audio track in the MP4. The transcript covers spoken words only. To preserve the chat, export it from Teams separately — for channel meetings, the chat is accessible via SharePoint; for ad-hoc meetings, you currently have to copy-paste from the chat panel. Merge by timestamp afterwards if you need a unified record.
Will it identify speakers by their Teams account names?
No — the output uses anonymous labels (speaker_0, speaker_1, etc.) because the audio file doesn't carry Teams identity metadata. You rename them in the transcript view by clicking the coloured chip and typing the real name. The rename propagates to every turn. For recurring participants, this takes about 30 seconds per meeting.
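If you work from the JSON export instead of the transcript view, the same rename is a small mapping step. The turn structure below (a list of dicts with `speaker` and `text` keys) is an assumed shape for illustration; the real export may use different field names.

```python
def rename_speakers(turns: list[dict], names: dict[str, str]) -> list[dict]:
    """Return copies of the turns with anonymous labels replaced by real names.

    Labels missing from `names` are left unchanged.
    """
    return [{**t, "speaker": names.get(t["speaker"], t["speaker"])} for t in turns]


turns = [
    {"speaker": "speaker_0", "text": "Shall we start?"},
    {"speaker": "speaker_1", "text": "Yes, go ahead."},
    {"speaker": "speaker_0", "text": "First item: budget."},
]
for t in rename_speakers(turns, {"speaker_0": "Priya"}):
    print(f'{t["speaker"]}: {t["text"]}')
```

Keeping the mapping in a small file per recurring meeting means the next transcript only needs a check that the labels were assigned in the same order.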
What's the maximum Teams recording length I can upload?
10 hours per file on Pro and Business. File size caps: 2 GB on Pro, 5 GB on Business. A 1080p Teams recording at default settings runs around 200-400 MB per hour. At that rate the 2 GB Pro cap is reached somewhere between 5 and 10 hours, so on Pro the size cap can hit before the duration cap; on Business, a 10-hour file stays under 5 GB, so the duration cap comes first. If you have a 5-hour all-hands that's over the size limit, re-encode to 720p or extract audio-only M4A before uploading. The transcript is identical.
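The arithmetic behind those limits, as a quick estimator. The caps and the MB-per-hour range come from the answer above; the function names are illustrative, and checking the real file size on disk is more reliable than estimating from duration.

```python
CAP_GB = {"pro": 2.0, "business": 5.0}   # file size caps quoted above
MAX_HOURS = 10                            # duration cap quoted above


def estimate_gb(hours: float, mb_per_hour: float) -> float:
    """Estimated file size, treating 1 GB as 1000 MB."""
    return hours * mb_per_hour / 1000


def needs_reencode(hours: float, mb_per_hour: float, plan: str) -> bool:
    """True if the estimated size is over the plan's cap."""
    return estimate_gb(hours, mb_per_hour) > CAP_GB[plan]


for rate in (200, 400):
    hours_to_pro_cap = CAP_GB["pro"] * 1000 / rate
    print(f"{rate} MB/h: Pro cap reached after {hours_to_pro_cap:.0f} h, "
          f"a {MAX_HOURS} h file is {estimate_gb(MAX_HOURS, rate):.0f} GB")
```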
Does it work with Teams Live Events or webinar recordings?
Yes. Live Events and Teams webinar recordings download as standard MP4 files from the Stream/SharePoint location and process the same way. Diarization works on the presenter audio. Q&A submitted via the panel is a separate data stream (like chat) and isn't in the audio track.
Can I transcribe a Teams meeting in a language Teams doesn't support live?
Yes. This is one of the main reasons to use an external service. The language is auto-detected from the first 30 seconds of the file (99 languages are supported), with manual override for mixed-language meetings. A Polish-Ukrainian sales call that Teams can't transcribe live will process here at one flat per-minute rate, same as English.
How long does a typical Teams meeting take to transcribe?
Approximately 6× faster than realtime: a 60-minute meeting completes in 9-11 minutes, and a 30-minute standup is done in about 5 minutes. The bottleneck is usually the OneDrive download, not the transcription itself. On Pro you can run 20 concurrent jobs, so a backlog of up to 20 of last week's meetings clears in roughly the processing time of the longest single file.
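A back-of-the-envelope estimator for those numbers. The 6× speed-up and the 20-job concurrency are taken from the answer above. The scheduling model (longest job first, each job into the earliest free slot, every job running at full speed) is a simplifying assumption, and download and upload time are ignored.

```python
import heapq

SPEEDUP = 6        # "about 6x faster than realtime", from the text
CONCURRENCY = 20   # concurrent jobs on Pro, from the text


def job_minutes(audio_minutes: float) -> float:
    """Estimated processing time for one file."""
    return audio_minutes / SPEEDUP


def backlog_minutes(audio_minutes: list[float], slots: int = CONCURRENCY) -> float:
    """Estimated wall-clock time to clear a backlog with `slots` parallel jobs."""
    free_at = [0.0] * slots          # time at which each slot becomes free
    heapq.heapify(free_at)
    finish = 0.0
    for minutes in sorted(audio_minutes, reverse=True):
        start = heapq.heappop(free_at)           # earliest free slot
        end = start + job_minutes(minutes)
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


print(backlog_minutes([60, 30, 45, 60, 15]))
```

With five meetings and 20 slots, the estimate equals the processing time of the longest file (10 minutes for a 60-minute meeting). Only a backlog of more than 20 files starts to queue.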
Related reading
- Interview transcription workflow — speaker diarization, renaming, and exporting clean quotes
- Podcast transcription and show notes — what the AI summary covers and where it falls short
- Video to text export formats — SRT, VTT, and how subtitles work outside Teams
- REST API reference — automating the upload-and-transcribe loop for recurring meetings