Open archive of 240,000 hours’ worth of talk radio

A group of MIT Media Lab researchers has published Radiotalk, a massive corpus of machine-generated transcripts of talk radio audio, totaling 240,000 hours’ worth of speech and marked up with machine-readable metadata.
 
The audio was captured from streaming radio services between October 2018 and March 2019, and the transcripts run to 2.8 billion words, sampled from almost 300 stations across the US.