Taking control of news on smart speakers

Steve Ahern examines the issues around disaggregating radio news content for delivery over smart speakers.


How we connect to news and information is changing.

Smart speakers are not game changers in themselves, but when combined with the power of search, voice recognition and artificial intelligence, we are facing another pivot point for content delivery.

It will require radio companies, especially radio newsrooms, to rethink how they are delivering their content. Even for those already playing in this sandbox, there is a long way further to go, and there are several possible paths to take.

Google News has been criticised for its plans to segment audio from radio companies and deliver it in a disaggregated manner, tailored to how the listener wants to hear it.

While the way Google is going about implementing this new development is (rightly, in my opinion) not acceptable to established radio businesses, who see it as another way for Google to exploit their content without returning them any revenue, there is no doubt that the principles behind the new delivery methodology need serious study by managers, programmers and news editors, so they know what is coming next.

Google’s method may not be acceptable to the radio industry, but the thinking behind it is the way of the future, so radio businesses should understand it and find their own ways to implement this new thinking.


The Google Assistant is growing rapidly: according to Steve Henn of Google USA, speaking at the recent Radiodays Europe conference, it has four times as many daily active users as it did a year ago. Amazon Alexa is also growing fast and developing new innovations, and Spotify is moving into podcasts and tailored content.

Google, Alexa, Spotify, TuneIn and others are conduits to delivering your content to smart speakers.

A smart speaker needs an engine to drive content into it; it doesn’t just happen. You will know this if you have ever connected one up: you can’t just turn it on and expect it to work properly, you must download an app and set it up through your phone.

The problem is, of course, that if your content is delivered through these gateways, they have control, not you.

In Australia, we are ahead of the game thanks to CRA’s RadioApp and, to a lesser extent, the ABC Listen app, which have been developed for compatibility with smart speaker content delivery and artificial intelligence systems. Whooshkaa has also just launched new smart speaker delivery skills.




Let’s get back to the criticism of Google News for a moment.

One important thing to remember is that Google’s technology can achieve what it wants to do with or without the cooperation of radio companies. Google can already ingest radio news bulletin audio (or any other content, for that matter) and use speech-to-text recognition to segment the stories, index the key words and deliver the audio, synced to the text, direct to someone’s smart speaker.
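To make that pipeline concrete, here is a minimal Python sketch of the segment-and-index step, assuming a timestamped transcript has already come back from a speech-to-text engine (the transcript format, cue phrases and keyword list are my own invented examples, not Google’s actual method):

```python
# Sketch: segmenting a timestamped bulletin transcript into stories and
# indexing keywords -- roughly the kind of processing a third party could
# apply to any radio audio feed. Formats and cue words are illustrative.

from dataclasses import dataclass, field

@dataclass
class Segment:
    start: float                 # seconds into the bulletin audio
    end: float
    text: str
    keywords: list = field(default_factory=list)

STORY_CUES = ("in other news", "meanwhile", "turning to", "finally")

def split_into_stories(lines):
    """lines: (start_sec, end_sec, sentence) tuples from speech-to-text."""
    stories, current = [], None
    for start, end, sentence in lines:
        lowered = sentence.lower()
        if current is None or any(lowered.startswith(cue) for cue in STORY_CUES):
            current = Segment(start, end, sentence)
            stories.append(current)
        else:
            current.end = end
            current.text += " " + sentence
    return stories

def index_keywords(stories, vocabulary):
    """Tag each story with any vocabulary words it mentions."""
    for story in stories:
        lowered = story.text.lower()
        story.keywords = [w for w in vocabulary if w in lowered]
    return stories

transcript = [
    (0.0, 6.0, "The treasurer has handed down the federal budget."),
    (6.0, 12.0, "Analysts say the deficit will shrink next year."),
    (12.0, 18.0, "In other news, heavy rain is forecast for Sydney."),
]
stories = index_keywords(split_into_stories(transcript),
                         ["budget", "rain", "treasurer"])
```

Once the stories are split and indexed like this, each one can be served individually in response to a voice query, with or without the originating station’s involvement.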

If an audio stream is available, there is no technological way to prevent this happening. That is why the BBC is playing hardball with Google and the Australian commercial radio industry is looking at other ways of enforcing its rights to control its owned content, through laws and copyright actions. Whether these actions will work or not remains to be seen.

We are in the middle of a transition in the way news stories are delivered. It’s not just the radio industry: newspapers are facing the same technological changes and are excitedly embracing smart news audio, which wasn’t available to their text-based medium in the past.

At Radiodays, there were several examples of newspapers using simple text-to-speech engines to provide radio news bulletins as flash briefings to smart speakers, but what was more interesting were the newspapers deploying radio-trained content makers combined with artificial intelligence and speech engines to deliver hybrid human/robot tailored bulletins and documentaries.

This new tsunami of digital innovation is about to come crashing onto the shores of the radio industry, and we had better understand it… fast.

It’s about much more than smart speakers.


As more people develop the habit of asking their smart speakers, apps or car dashboards for the latest news, they will expect personalised bulletins that are as long as their commute to work and hyper-targeted to their interests. A three- or five-minute networked bulletin may no longer suffice for these early adopters. We will need to track what they do and learn about their requirements, then figure out the SSML commands that will be needed to deliver the content to them.

SSML stands for Speech Synthesis Markup Language… hypertext for smart speaker audio.
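For a flavour of what that markup looks like, here is a minimal SSML fragment. The elements (`<speak>`, `<p>`, `<break>`, `<emphasis>`, `<audio>` with fallback text) come from the W3C SSML specification as supported by the major smart speaker platforms; the wording and the audio URL are invented for illustration:

```xml
<speak>
  <p>Good morning. Here is your news briefing.</p>
  <break time="500ms"/>
  <p><emphasis level="moderate">Top story:</emphasis>
     the treasurer has handed down the federal budget.</p>
  <audio src="https://example.com/grabs/treasurer.mp3">
    Sorry, that audio grab is unavailable.
  </audio>
</speak>
```

The `<audio>` tag is the interesting one for radio: it lets a synthesised voice hand over to real recorded audio, such as a newsreader’s bulletin or an interview grab, mid-response.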

There’s more than one way to skin a cat. Google’s method is not the only way to segment news content; speech recognition is not always reliable, and it is a convoluted way to achieve the aim.

Since we already own our content and create and store it in our newsroom systems, there are other, potentially better ways to do what Google wants to do from within our own newsrooms.

If we understand these new developments we will be able to keep control and deliver content in a personalised way for our audiences.

Aggregated metadata and object based media encoding will be some of the pieces in this puzzle, allowing the embedding of complementary data in the audio recording of the news bulletin. What… wait… how will that work?

Here’s how.

A newsreader goes into a studio now, turns on the news computer screen, turns on the mic, reads the words on the screen and plays the audio grabs from the news system through a fader. Multiple functions with multiple sources.

What if all these functions from different pieces of equipment (news computer, mixing desk, microphone, audio replay software) were tracked, and the associated data captured within the audio recording, so that each individual element was controllable in many different ways? If we understand what is happening under the hood, we will be able to retain the power of this new technology in the hands of our radio businesses, rather than letting our content be mined by others who currently understand the potential of the technology better than we do.

Imagine this. A news bulletin is read and recorded with metadata and object based media included. My newsroom system’s workflow is programmed and knows what to do when an audio grab is played to enable flexible delivery to the listener at home. The news editor tags stories with appropriate subjects (just as they do now), and includes cross reference links to associated content and external sources.

After the bulletin is read on air, the combined data is saved with the audio and made ready for voice interaction from smart speaker users at home.
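One possible shape for that saved bundle is sketched below in Python: the bulletin audio plus story-level objects carrying the editor’s tags, grab references and cross-links. Every field name here is my own invention for illustration, not any newsroom vendor’s actual schema:

```python
# Sketch: a bulletin saved as audio plus story-level objects, each carrying
# the subject tags and cross-references the news editor entered.
# All field and file names are illustrative.

from dataclasses import dataclass, field

@dataclass
class Story:
    slug: str
    start: float                 # offset into the bulletin audio, seconds
    end: float
    topics: list                 # editor-applied subject tags
    grab: str = ""               # separately addressable audio object
    related: list = field(default_factory=list)  # links to associated content

@dataclass
class Bulletin:
    audio_file: str
    stories: list

    def stories_about(self, topic):
        """Return the stories a voice query on this topic should surface."""
        return [s for s in self.stories if topic in s.topics]

bulletin = Bulletin(
    audio_file="0700_news.wav",
    stories=[
        Story("budget", 3.0, 48.0, ["politics", "economy"],
              grab="grabs/treasurer.wav",
              related=["commentary/jones_budget", "abc/pm_budget_special"]),
        Story("weather", 48.0, 70.0, ["weather"]),
    ],
)
```

Because each story knows its own offsets, grab and related links, a smart speaker request can jump straight to the right slice of audio instead of replaying the whole bulletin.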

At home, I wake up and say ‘Alexa, good morning.’

I have previously programmed Alexa to give me time, weather and traffic, then play me my preferred station’s news bulletin based on my topics of interest.

Alexa pulls down the previous hour’s news bulletin.

Theme, then the first story; let’s say it is about the federal budget.

Then Alexa stops and asks me: do you want to hear what the treasurer said? If yes, it plays the audio grab; if no, it asks me some more questions.

Do you want to hear Alan Jones’ commentary on the budget? Do you want to listen to the ABC Radio PM budget analysis special from last night? Do you never want to hear about the budget again?

I’m a news junkie, so now I’m ready to go down the rabbit hole and listen to commentary and analysis for the next few minutes. I may return to the rest of the bulletin later, or I may not.
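The branching described above is essentially a decision tree over those tagged story objects. Here is a toy Python sketch of that flow; the prompts and canned answers are invented, and a real skill would use the platform’s dialogue APIs rather than a list of scripted replies:

```python
# Toy sketch of branching bulletin playback: at each node the assistant
# plays a segment, then offers optional follow-ups. Segment names and
# questions are invented for illustration.

def play(segment):
    print(f"[playing {segment}]")

def run_node(node, answers):
    """node: (segment, [(question, follow_up_node), ...]).
    answers: iterator of yes/no replies, standing in for the listener."""
    segment, choices = node
    play(segment)
    played = [segment]
    for question, follow_up in choices:
        if next(answers, "no") == "yes":
            played += run_node(follow_up, answers)
    return played

bulletin = ("budget_story", [
    ("Do you want to hear what the treasurer said?",
     ("treasurer_grab", [])),
    ("Do you want to hear Alan Jones' commentary on the budget?",
     ("jones_commentary", [])),
])

# A listener who wants the grab but skips the commentary:
heard = run_node(bulletin, iter(["yes", "no"]))
```

Each branch point in a tree like this is also a natural slot for the tailored advertising discussed below.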

At each decision point the station can insert an appropriate advertisement, but please, not just a standard 30-second advert from the schedule; tailor it to the new environment.

There are various ways to make a scenario like this happen. It can be done externally by a third party using station audio feeds, speech recognition and search aggregation, or it can be done by smart systems within our radio stations, drawing on our existing content and the power of our news computers and logging systems to deliver the segmented content that powers a voice-enabled AI engine for the listener.

The technology is out there now and others are experimenting with it.

What are we, in the radio industry, doing?

Are we experimenting too, so that we can develop and own the new ways to deliver our content? Or are we sitting on our hands wondering what is happening?

I would rather be on the front foot. I would rather be asking the provider of my news software system what they are doing to put the power of AI, voice recognition, search and content disaggregation into my hands, so I can use and monetise it to my advantage.

I would also want to be training smart speakers to respond to real human speech patterns, rather than trying to train people to ask for things in a way that pleases the robots… but that will be the subject of another article. For now, let’s just take back control of our news content by understanding what’s going on, strategising what we can do with it, and working with our equipment suppliers to keep control of our valuable content.

By the way, this is not just relevant to radio: television news is facing exactly the same issues and should be strategising about them in the same way.

Every new news system that is installed should have the potential to do this. Every old system should be working on upgrades ready for the day when you say to your supplier ‘I want to do this.’

It is an important topic that we will continue to explore here on radioinfo as our industry works out the best ways forward.



About the Author

Steve is the founding editor of this website.

He is a former broadcaster, programmer, senior executive and trainer who now runs his own company Ahern Media & Training Pty Ltd.

He is a regular writer and speaker about trends in media. More info here.





Main image: Shutterstock.


