OpenAI targets customer service with new audio models


OpenAI introduced a new suite of audio models that power voice agents in specific enterprise settings, such as customer service.

The models include speech-to-text and text-to-speech audio models in OpenAI’s Realtime API.

The AI vendor also introduced gpt-4o-transcribe and gpt-4o-mini-transcribe. The gpt-4o-transcribe model has a lower word error rate than OpenAI’s open source speech-to-text model, OpenAI said.

The new models capture nuances of speech, reduce misrecognitions and increase transcription reliability.
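Word error rate, the metric cited above, is conventionally the number of word substitutions, deletions and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below illustrates that standard definition in Python. It is not OpenAI's evaluation code; the function name and the simple lowercase, whitespace-based tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Illustrative only: tokenization here is plain lowercase whitespace splitting,
    which is an assumption, not how any vendor normalizes transcripts.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (word-level Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of four reference words.
    print(word_error_rate("please reset my password", "please reset the password"))
```

A lower value means fewer misrecognized words; a perfect transcript scores 0.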

OpenAI also introduced gpt-4o-mini-tts, a text-to-speech model that allows developers to “instruct” the model not only on what to say but also on how to say it.

The models build on GPT-4o and GPT-4o-mini architectures.

Tone and audience

According to OpenAI, developers can instruct the models to speak in a specific way. For example, users can tell the models to speak like a “sympathetic customer service agent.”
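As a rough illustration of what such an instruction might look like in practice, the Python sketch below assembles a request body for a text-to-speech call. It makes no network request. The helper name build_speech_request is hypothetical, and the field names ("model", "voice", "input", "instructions") and the "coral" voice are assumptions modeled on OpenAI's public speech API; check the current API reference before relying on them.

```python
import json


def build_speech_request(text: str, tone: str, voice: str = "coral") -> dict:
    """Assemble a text-to-speech request body (hypothetical helper, no API call).

    Field names and the default voice are assumptions based on OpenAI's
    public speech endpoint, not details confirmed by this article.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "model": "gpt-4o-mini-tts",               # model named in the announcement
        "voice": voice,                           # assumed voice identifier
        "input": text,                            # what the model should say
        "instructions": f"Speak like a {tone}.",  # how it should say it
    }


if __name__ == "__main__":
    payload = build_speech_request(
        "I'm sorry about the delay. Let me look into your order right now.",
        "sympathetic customer service agent",
    )
    print(json.dumps(payload, indent=2))
```

The point of the sketch is the separation between the text to be spoken and the tone instruction: the same sentence could be sent with a different instruction to get a different delivery.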

The new audio models target both OpenAI’s consumer audience and a small portion of the enterprise market, said Gartner analyst Arun Chandrasekaran.

Many consumers use ChatGPT, so those audiences would be interested in some of the tones introduced in the audio API, such as Medieval Knight, True Crime Buff and Bedtime Story, he said.

At the same time, tones like Professional and Calm will be useful in customer service settings in which the agent is dealing with an angry customer, Chandrasekaran said.

“Customer service is one of the fastest growing use cases we are starting to see in the enterprise, and I’m not very surprised that all of these companies are trying to gravitate toward where the money is,” he said.

The new models will reduce the number of human agents needed to handle every interaction and allow for more automated interactive voice response systems, said Forrester Research analyst William McKeon-White.

“We’ve been seeing these already actually coming online, working with several other second-order consumers of these services who are vendors themselves,” he said. “They’ve already been seeing strong successes with these capabilities.”

McKeon-White said users should benefit from OpenAI’s voice models because of the level of automation and delivery that the vendor provides.

“The fact that it’s just natively part of what OpenAI is providing now is quite helpful to a lot of enterprises who are seeing a lot of different models at this point,” he said.

OpenAI’s breakdown of the error rate of the new models shows that the models are effective across widely used languages like French and Spanish.

Some challenges

However, McKeon-White said it would be good to see how well the models handle acronyms since speech models find them challenging.

Moreover, because of the competitiveness of customer service applications, OpenAI faces some challenges.

One is that OpenAI competes with vendors that approach customer service from a narrow perspective. For example, Sierra AI is a startup that focuses solely on customer service.

Chandrasekaran said this differs from OpenAI, which has multiple models and multiple applications for its models.

Another challenge is that many contact center vendors such as Genesys are already embedding AI technology into their products.

“They’re all starting to embed AI into it and, of course, are competitive to what OpenAI is doing,” Chandrasekaran continued.

Moreover, while the APIs are helpful for teams looking to build applications, they are of little use to organizations that lack such development teams, McKeon-White said.

“Most organizations we talk with are not ready just to go consume raw APIs to go and build out a net new system,” he said. “It needs business logic, it needs business understanding, and it needs like business integrations to make everything work.”

Esther Shittu is an Informa TechTarget news writer and podcast host covering artificial intelligence software and systems.


