Audio Intelligence with Raoul Wedel: a monthly update on the state of all things AI and audio
A technological revolution is unfolding, and according to Google CEO Sundar Pichai, AI is “more important than fire or electricity.”
OpenAI CEO Sam Altman believes it has the potential to cure diseases, address climate change, and provide private education to everyone worldwide. At the core of this revolution is audio technology, which is set to transform the broadcast radio and podcasting industries and redefine the way professionals work in these fields.
In the late 1980s came what was arguably the biggest innovation in the history of broadcast radio: the first radio automation systems, capable of playing 32 kHz audio files stored on a 20 MB hard drive. Since then, tech giants like Google, Microsoft, and Facebook have dabbled in audio technology, but none have truly dominated the space. Apple, however, has remained active in the music industry through the iPod, iTunes, and Apple Music. Meanwhile, other media and technology companies like Spotify, Sirius, and Pandora have flourished.
The year 2023 has been a turning point, with an unprecedented acceleration in audio technology innovations that will enhance the broadcast radio and podcasting experience. Industry giants like Facebook, Google, OpenAI, and Microsoft are investing heavily in advanced audio technologies, driving rapid progress in the field. The ultimate goal is multimodal systems capable of processing text, video, and audio as both input and output, which will revolutionise content production and distribution for radio and podcasting.
Last year, OpenAI unveiled Whisper, a groundbreaking speech-to-text system that can recognize speech despite background noise and handle a wide range of accents. In 2023, Google introduced SoundStorm, a model for fast, high-quality audio generation, while Meta released Massively Multilingual Speech (MMS), which offers speech-to-text and text-to-speech in more than 1,100 languages. Open-source tools such as Whisper and MMS can be used to transcribe radio broadcasts and podcasts, making them more accessible and searchable for listeners. The foundation for AI voice technology was laid by DeepMind's WaveNet (DeepMind is owned by Google), which has inspired many current AI voice technologies like Respeecher, Resemble.AI, and ElevenLabs. These advances in AI voice technology could enhance the quality of radio broadcasts and podcasts, as well as introduce new synthesized voices for various applications.
The impact of these technologies on professionals working in the broadcast radio and podcasting industries will be multifaceted. For content creators, the ability to clone voices and generate AI-driven sound effects will enable them to develop richer, more immersive audio experiences. Radio hosts and podcasters can create diverse, engaging content without relying on costly equipment or a large team of collaborators.
Journalists and reporters will be able to take advantage of speech-to-text technologies to rapidly transcribe interviews and other audio content, allowing for easier archiving, referencing, and sharing of information. In addition, AI-generated translations will enable content to reach new audiences, breaking language barriers and promoting cross-cultural exchange.
For radio stations and podcast networks, AI-driven audio editing and source separation tools will streamline the production process and reduce the need for manual intervention. This will allow producers to focus on higher-level creative tasks, such as storytelling and audience engagement. Furthermore, AI-generated music and soundscapes will provide a near-endless supply of background scores, reducing the need to license tracks from external sources and cutting production costs.
Accessibility will also be greatly improved, as AI-powered transcription and translation services make radio broadcasts and podcasts available to a wider audience, including those with hearing impairments and non-native speakers. This will lead to more inclusive and diverse content, fostering a global community of listeners and content creators.
However, the rapid acceleration of AI and audio technology in 2023 also raises concerns regarding job displacement and the potential loss of human touch in content creation. Radio and podcast professionals will need to adapt to these new technologies and focus on developing skills that complement AI-driven tools, such as storytelling, audience engagement, and creative problem-solving.
Like it or not, the rapid acceleration of AI and audio technology in 2023 is reshaping the broadcast radio and podcasting industries, bringing forth innovative content and enhanced listening experiences for audiences worldwide. As the race for dominance in this new frontier intensifies, the potential for transformation and growth in these industries is immense. Professionals in these fields must adapt and evolve, harnessing the power of AI to create more engaging, immersive, and accessible content for listeners around the globe.
Main Photo: Maurizio Pesce from Milan, Italy
About the Author
With a career in the radio industry spanning more than 30 years, Raoul Wedel is CEO of Wedel Software, a leading international provider of broadcast software solutions. In 2021 he launched the Adthos Ad Platform, bringing broadcast-quality AI and synthetic voice technology to the audio advertising industry for the first time. The platform continues to deliver more market firsts, including the option of creating 100% AI-generated audio ads.
I could write a thesis on the many topics this article raises.
The cost of digitising, storing and networking data, particularly for audio and video editing, has come down significantly.
In the early 1980s, the Fairlight "computer", costing $50,000, could sample sounds and record and notate music. In the mid-to-late 1980s, the Commodore Amiga could achieve the same for $1,300.
Around the same time, in the late 1980s, Sound Blaster sound cards could be installed in a PC's ISA expansion slot (later models used the PCI bus and, eventually, USB). Sound could be recorded, stored and edited.
Similarly, in the early 1990s, the ABC developed the D-Cart system for recording, editing, storing and networking audio. The D-Cart system was adopted by broadcasters including the BBC and CBC.
During the 1990s, as the cost of making and selling audio editing, storage and networking systems fell, broadcasters adopted other systems such as AudioVault and Pro Tools, to name a few.
In short, barriers to entry in manufacturing and selling digital audio equipment fall over time. A pioneer enjoys only a brief monopoly on the technology unless it continues to innovate and add value.
A similar story applies to the Quantel Paintbox, a graphics machine used to produce TV graphics, which cost $250,000.
Newer entrants, including image-editing software such as Adobe Photoshop running on cheaper and more powerful PCs with GPUs, as well as Macs and Android devices, made systems like the Paintbox redundant.
Similarly, the RodeCaster lets more people make audio and video productions for broadcast and podcast purposes for less than $1,000.
As a result, more people are empowered to use and apply digital technology.
Thus the production of audio and video content is no longer an esoteric and exclusive club.
We can also see, thanks to Google, that people are empowered to build their own AI programs using open-source software.
It is creating a world with more teenage moguls, more moguls who are young at heart, and more people reinventing themselves to contribute something to the world.
There will be more Bill and Wilhelmina Gates in the world.
When you think about it, as faster and more powerful CPUs and GPUs analyse audio, video, medical research and weather data, AI is all about "number crunching", just as number crunching has been used for less intensive calculations in engineering, science and accounting for decades.
It is exciting.
Thank you.
Anthony, I'm so excited and I just can't hide it, Strathfield South, in the land of the Wangal and Darug Peoples of the Eora Nation.