Twenty years of broadcast, zero search results
A community radio station with 20 years of weekly programming has roughly 1,000 hours of audio sitting on a NAS or a stack of MiniDiscs. None of it is indexed by Google. None of it is quotable in a citation. None of it shows up when a listener searches for the guest who appeared on a Tuesday night talk show in 2014.
Transcribing that archive with AI turns each episode into a searchable, citable web page, and at current rates the whole job costs less than two months of a part-time intern's wages. The catch: you have to decide on site architecture, speaker-label cleanup, and an editorial review policy before you upload, or you'll ship 1,000 pages that read like soup.
This piece is for station managers, archivists, and volunteer producers at community and college stations weighing whether radio show transcription is worth the engineering lift. Short answer: yes, but pilot one season first.
The archive problem
Most community stations have three kinds of audio sitting unused:
- Reel-to-reel or DAT tapes from before 2005, often un-digitised.
- MP3/WAV files on station servers from 2005–2015, organised by date folder with no metadata beyond a filename like MorningShow_10-12-2012.mp3.
- A current podcast feed from 2015 onwards, where the RSS shows the episode but the audio is opaque to search engines.
Google indexes the RSS title and show notes. It does not listen to audio. A 58-minute interview with a local author, a city councillor, a touring musician — none of it surfaces unless someone wrote a transcript by hand. Almost nobody did.
The result: stations that have hosted thousands of significant local conversations have a public web footprint smaller than a single Substack newsletter.
Cost math: 1,000 hours at AI vs human rates
Human transcription at professional rates runs $1.00–$1.50 per audio minute for clean single-speaker audio, more for multi-speaker broadcast. For 1,000 hours (60,000 minutes), that's $60,000–$90,000. That number ends most archive projects before they start.
AI transcription changes the math from capital expenditure to a monthly operational line. Our Business plan includes 5,000 audio-minutes per month (as of May 2026), and overage packs at $39 top up additional volume.
Rough sketch:
- 60,000 minutes ÷ 5,000/month = 12 months on Business with no overage.
- Buy overage packs and finish in 2–3 months instead.
- Either path lands in the low four figures — roughly 2–4% of the human-transcription quote.
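The arithmetic in this section can be checked with a few lines of Python. This is a rough sketch: the per-minute human rates and the 5,000 minutes/month figure come from the text above, while the function names are illustrative and the monthly quota is a parameter so you can plug in whatever your plan actually includes.

```python
import math

ARCHIVE_HOURS = 1_000
ARCHIVE_MINUTES = ARCHIVE_HOURS * 60  # 60,000 audio minutes


def human_quote(minutes, low_rate=1.00, high_rate=1.50):
    """Return the (low, high) human-transcription quote in dollars."""
    return minutes * low_rate, minutes * high_rate


def months_on_plan(minutes, monthly_quota):
    """Months needed to clear the archive on plan quota alone, no overage packs."""
    return math.ceil(minutes / monthly_quota)


def share_of_human_quote(ai_total, minutes):
    """Express an AI budget as a (low, high) percentage of the human quote."""
    low, high = human_quote(minutes)
    return ai_total / high * 100, ai_total / low * 100


low, high = human_quote(ARCHIVE_MINUTES)
print(f"Human quote: ${low:,.0f} to ${high:,.0f}")
print(f"Months at 5,000 min/month: {months_on_plan(ARCHIVE_MINUTES, 5_000)}")
```

As an illustration only, a total AI spend of $2,400 (a placeholder, not a quoted price) passed to share_of_human_quote comes out at roughly 2.7% to 4% of the human quote.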
You give up some accuracy — we'll get to that — but for an archive that is currently zero percent indexable, getting to ~92% accurate text on clean studio audio is a step-change, not a compromise.
What ~92% accuracy actually means for radio
We run AssemblyAI Universal-3 in production. On clean podcast or studio-radio English at 16 kHz or higher, that's around 7.88% WER. On telephone call-ins at 8 kHz, it jumps to ~17.7% WER. Most community radio sits between the two — studio mics for the host are clean, phone callers and remote guests over Skype/Zoom are noisy.
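WER (word error rate) is the number of substituted, deleted, and inserted words divided by the number of words in a human reference transcript, so 7.88% WER means roughly 8 wrong words per 100 spoken. If you hand-correct a few minutes of a pilot file, you can measure it on your own audio. The function below is a minimal sketch using word-level edit distance; it only lowercases and splits on whitespace, so punctuation differences count as errors unless you strip them first.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate(
    "the mayor spoke on tuesday night",
    "the mare spoke tuesday night",
)
print(f"WER: {wer:.1%}")
```

Here one word is substituted and one is dropped against a six-word reference, so the result is 2/6.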
Practical implications for a typical talk-radio hour:
- Host monologue and in-studio guest dialogue: very usable straight out of the model.
- Phone-in callers: names, place names, and acronyms will need spot-fixing.
- Music beds under spoken dialogue: the model occasionally drops words or hallucinates lyric fragments. Tag music-only sections rather than transcribe them.
- Station IDs, jingles, and underwriting reads: transcribed verbatim — useful for search, ugly in reading view.
Be straight with your audience about this: do not market AI-transcribed archives as verbatim certified text. Market them as searchable. The distinction matters if you have legal or journalistic use cases.
Speaker labels at radio fidelity
Diarization — knowing who said what — is the second hard problem. Radio is friendlier than a six-person podcast because the host structure is consistent: one or two regular hosts, one to three guests per segment.
How our diarization behaves:
- Stereo recordings with host on one channel and guests on another (common in older broadcast setups): channel-split diarization, effectively perfect.
- Mono recordings (most archive files): pyannote-3.1, good for 2–4 distinct voices, degrades past 6.
- Phone callers mixed into the studio feed: usually caught as a separate speaker, but call-in shows with rapid caller rotation will produce Speaker 4, Speaker 5, Speaker 6 labels that all need a human pass to rename.
Diarization does not know real names. It separates Speaker A from Speaker B; it cannot know Speaker B is "Professor Maria Alvarez" unless the host says so on air or you supply the metadata. For a one-host, one-guest interview show, this resolves in two minutes per episode. For a four-person panel with callers, expect 5–10 minutes of relabelling.
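The relabelling pass can be as simple as a lookup table per episode. The sketch below assumes each transcript segment is a dict with "speaker", "start", and "text" keys; that shape is hypothetical and chosen for illustration, not a description of our export schema.

```python
def relabel_speakers(segments, names):
    """Return a copy of the segments with generic labels swapped for real names.

    Labels missing from `names` are left unchanged so they stay visible
    for the human pass.
    """
    return [
        {**seg, "speaker": names.get(seg["speaker"], seg["speaker"])}
        for seg in segments
    ]


segments = [
    {"speaker": "Speaker A", "start": 0.0, "text": "Welcome back to the show."},
    {"speaker": "Speaker B", "start": 4.2, "text": "Thanks for having me."},
    {"speaker": "Speaker C", "start": 9.8, "text": "Hi, long-time listener."},
]
names = {"Speaker A": "Host", "Speaker B": "Professor Maria Alvarez"}

relabelled = relabel_speakers(segments, names)
# Anything still carrying a generic label needs a person to identify it.
unresolved = sorted(
    {s["speaker"] for s in relabelled if s["speaker"].startswith("Speaker ")}
)
print(unresolved)
```

Keeping unmapped labels intact, rather than guessing, makes it easy to list which callers or panelists still need a name.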
The SEO win: each episode is a 9,000-word page
A 60-minute interview at normal conversational pace transcribes to roughly 8,000–10,000 words. That's a long-form indexable page per episode — without anyone writing a word.
What it ranks for:
- Guest names. A local author who appeared on your show in 2017 likely has thin web presence beyond their book listing. Your transcript page becomes the primary source for what they actually said.
- Topics discussed. Long-tail queries like "community land trust Asheville 2019" surface transcripts that mention the phrase.
- Quoted phrases. When a listener half-remembers a quote, Google's exact-phrase match pulls them to your page.
- Local-news context. City council members, school board candidates, mutual-aid organisers — your archive is often the only on-record audio of their public statements.
Google's entity-recognition mapping does the rest. The crawler reads the transcript and learns who appeared with whom, on what topic, in what year. After 200 episodes you have 200 long pages of unique content, internally linked by guest, topic, and date. That's a content moat a marketing team would spend $100k building from scratch.
One price, every language
Community radio often broadcasts in two or three languages to serve local populations — Spanish public-affairs programming alongside English news, Hmong elder interviews, Haitian Creole call-in shows. We transcribe 99 languages at the same price, with automatic language detection per file. You can batch a mixed-language archive folder without pre-sorting, and the non-English transcripts capture organic search traffic from non-English queries in your local market.
Podcast vs radio transcription — a real difference
The workflows look similar but the inputs differ:
- Podcasts are usually clean stereo or multi-track, recorded over Riverside or Zencastr, mastered at -16 LUFS. AI transcription gets near-best-case accuracy.
- Radio archives are mono airchecks, broadcast-compressed, often analog-to-digital transfers with tape hiss or AM-band artifacts. Expect accuracy 2–5 percentage points below clean podcast audio.
- Radio also has structural noise podcasts don't: traffic reports, weather, station IDs, news network top-of-hour cutaways, FCC-required underwriting reads. These transcribe correctly but clutter the page.
One thing AI transcription does not solve: rights. If the archive includes syndicated NPR feeds, commercial music, or third-party recordings, confirm publication rights before putting full transcript text on the open web. The audio's licence does not automatically cover a published text version.
For the upload pipeline, our audio-to-text flow handles WAV, MP3, FLAC, and most legacy formats. For the editorial workflow — naming speakers, fixing proper nouns, marking music segments — the podcast transcription use case is the closest match, even though the source is broadcast.
Site architecture: per-episode pages plus index pages
Per-episode pages. Always per-episode for an archive of this scale.
Reasons:
- Google rewards topical pages. One mega-page with 1,000 episodes glued together cannibalises every keyword and breaks mobile browsers.
- Internal linking. A per-episode page lets you connect "guest X also appeared on Show Y in 2019" — that builds the internal graph search engines reward.
- User experience. Nobody scrolls a 9-million-word page.
Recommended URL pattern:
yourstation.org/shows/[show-slug]/[YYYY-MM-DD]-[episode-slug]
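One way to generate paths in that pattern from episode metadata is sketched below. The helper names slugify and episode_path are illustrative, and the show and episode titles are made up. Note that this slugify drops characters it cannot reduce to ASCII, so titles in non-Latin scripts would need a different approach.

```python
import re
import unicodedata
from datetime import date


def slugify(text):
    """Lowercase, strip accents, and join alphanumeric runs with hyphens."""
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def episode_path(show, air_date, title):
    """Build /shows/[show-slug]/[YYYY-MM-DD]-[episode-slug]."""
    return f"/shows/{slugify(show)}/{air_date.isoformat()}-{slugify(title)}"


print(episode_path("Morning Show", date(2017, 3, 14), "Community Land Trusts, Explained"))
```

Putting the ISO date first in the final segment keeps episodes of one show sorted chronologically in any file listing or sitemap.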
On each episode page, ship:
- Episode title, show name, original air date.
- Embedded audio player synced to the transcript via timestamp anchors (semantic HTML <time> tags help Google associate text to audio position).
- Speaker-labelled transcript with a fixed "Host: [Name]" mapping at the top.
- A 100-word editorial summary that names guest and primary topic — this is where you capture the entity.
- Internal links to other episodes featuring the same guest or topic.
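The timestamp anchors in the list above could be rendered along these lines. This is a sketch under assumptions: the segment dict shape ("speaker", "start" in seconds, "text") and the t<seconds> id scheme are hypothetical, and wiring the anchors to the audio player needs a little JavaScript that is not shown here.

```python
from html import escape


def timestamp(seconds):
    """Format seconds as H:MM:SS for display."""
    total = int(seconds)
    return f"{total // 3600}:{total % 3600 // 60:02d}:{total % 60:02d}"


def render_segment(seg):
    """Render one transcript segment as a paragraph with a linkable anchor."""
    start = int(seg["start"])
    return (
        f'<p id="t{start}">'
        f'<a href="#t{start}"><time datetime="PT{start}S">{timestamp(start)}</time></a> '
        f'<strong>{escape(seg["speaker"])}:</strong> {escape(seg["text"])}</p>'
    )


print(render_segment({"speaker": "Host", "start": 754.3, "text": "Tom & Ann join us."}))
```

Escaping the speaker name and text matters because transcripts routinely contain ampersands and angle brackets that would otherwise break the page.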
Then build index pages — by show, by year, by guest, by topic. These are navigation surfaces and internal-link distributors. A guest index page that lists every appearance of one person passes authority to each individual episode page. A topic page connects interviews across years.
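A guest index is a simple inversion of the episode metadata. The sketch below assumes each episode record carries an ISO "air_date" string, a "path", and a list of "guests"; those field names and the sample records are illustrative.

```python
from collections import defaultdict


def build_guest_index(episodes):
    """Map each guest name to their episode paths, oldest appearance first."""
    index = defaultdict(list)
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    for ep in sorted(episodes, key=lambda e: e["air_date"]):
        for guest in ep["guests"]:
            index[guest].append(ep["path"])
    return dict(index)


episodes = [
    {"air_date": "2019-06-04", "path": "/shows/city-desk/2019-06-04-budget",
     "guests": ["Maria Alvarez", "Sam Lee"]},
    {"air_date": "2017-03-14", "path": "/shows/morning-show/2017-03-14-land-trusts",
     "guests": ["Maria Alvarez"]},
]
guest_index = build_guest_index(episodes)
print(guest_index["Maria Alvarez"])
```

The same inversion works for topic and year indexes by swapping the key you group on.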
If the same episode also ships as a podcast on Apple/Spotify and a YouTube upload, set canonical URLs deliberately. The station's transcript page should be canonical — that's where the indexable text lives. Don't let three versions of the same episode fight each other in search.
Editorial workflow: tiered, not all-or-nothing
You will not hand-edit 60,000 minutes of transcript before publishing. That defeats the project.
A workable tier:
- Deep archive (5+ years old): publish the raw AI output with a visible disclaimer that the text was machine-generated and may contain errors in phone segments or music. Add an email for listeners to flag specific errors.
- Recent or flagship episodes: assign staff or an intern to spot-fix proper nouns, caller names, and the 8 kHz phone segments. Budget 5–10 minutes per episode.
- Legally sensitive episodes (election coverage, named accusations, public-meeting transcripts): consider a human transcription service like Rev.com for the segments that matter. An AI-plus-human hybrid is a sensible approach for archives that mix routine programming with high-stakes content.
This keeps the volume moving while reserving human time for content that earns it.
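If episode metadata lives in a spreadsheet or database, the tiers can be assigned automatically. The sketch below is one possible encoding: the five-year cutoff comes from the tier list above, treating anything newer as "recent" is our assumption, and the flagship and legally_sensitive flags are hypothetical fields someone would have to set by hand.

```python
from datetime import date


def editorial_tier(air_date, today, flagship=False, legally_sensitive=False):
    """Pick a review tier for one episode."""
    # Legal sensitivity overrides everything else.
    if legally_sensitive:
        return "human-transcription"
    age_years = (today - air_date).days / 365.25
    if flagship or age_years < 5:
        return "spot-fix"
    return "raw-with-disclaimer"


today = date(2026, 5, 1)
print(editorial_tier(date(2014, 3, 4), today))
print(editorial_tier(date(2024, 9, 10), today))
print(editorial_tier(date(2014, 3, 4), today, legally_sensitive=True))
```

Checking the legal flag first means an old election-night broadcast is never published raw just because of its age.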
What we don't ship
- We don't run a CMS. The transcript export (JSON, SRT, VTT, plain text, DOCX) comes out of our tool; you load it into WordPress, Ghost, Hugo, or whatever the station runs.
- We don't auto-generate guest bio blocks. The model can summarise, but accurate biography for a local guest needs a human pass.
- We don't ship a music-vs-speech segmenter as a first-class feature yet. You can manually tag music-heavy episodes for partial transcription.
- We don't currently offer a HIPAA BAA — irrelevant for broadcast archives, but worth saying because some stations carry health programming with caller identifiers.
For comparison: Otter.ai handles speaker labels well for meetings but isn't optimised for bulk archive ingestion. Rev.com offers human transcription if accuracy matters more than budget — useful for flagship episodes. Descript bundles transcription with audio editing, which can be useful if you're also remastering archival audio.
What next
Don't process the full archive in week one. Pilot first.
- Pick one show and inventory it: episode count, total minutes, file formats, whether sources are studio, phone, or off-air aircheck.
- Choose 10–20 representative pilot files that span the real mix — clean studio interview, remote guest by phone, call-in segment, music-heavy episode, panel with 4+ voices, the oldest file in the archive, the single most historically important episode.
- Run them through a 60-minute Free plan upload and measure: average transcript quality by source type, staff cleanup minutes per episode, which page template reads best.
- Publish three pilot episodes as per-episode pages with full transcripts and check what they rank for after 4–6 weeks of indexing. If guest names and topic queries land, scale to the full archive on Pro or Business.
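To compare pilot results by source type, a small summary script is enough. The sketch below assumes you record one row per pilot file with a "source" label, a measured "wer", and the staff "cleanup_min"; the field names are illustrative and the sample rows are placeholder numbers, not measurements.

```python
from collections import defaultdict
from statistics import mean


def summarise_pilot(results):
    """Average WER and cleanup minutes per source type."""
    grouped = defaultdict(list)
    for row in results:
        grouped[row["source"]].append(row)
    return {
        source: {
            "files": len(rows),
            "mean_wer": round(mean(r["wer"] for r in rows), 3),
            "mean_cleanup_min": round(mean(r["cleanup_min"] for r in rows), 1),
        }
        for source, rows in grouped.items()
    }


# Placeholder rows for illustration only.
pilot = [
    {"source": "studio", "wer": 0.08, "cleanup_min": 2},
    {"source": "studio", "wer": 0.10, "cleanup_min": 4},
    {"source": "phone", "wer": 0.18, "cleanup_min": 9},
]
summary = summarise_pilot(pilot)
for source, stats in summary.items():
    print(source, stats)
```

Multiplying the mean cleanup minutes for each source type by its share of the archive gives a first estimate of the staff time the full ingest would need.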
The archive has been sitting there for 20 years. Spending six weeks on a pilot before the full ingest is cheap insurance.