AT&T announced this week that it is launching speech recognition and transcription application program interfaces (APIs) that can be used by developers for apps in smartphones, tablets, TVs or other devices. The APIs draw on the company's voice recognition services, collectively called Watson.
Not to be confused with IBM's Jeopardy-winning Watson computer, AT&T's Watson focuses on seven different contexts. These include Web search, business search, voicemail to text, SMS, question and answer, TV, and generic capabilities utilizing technologies covered by 600 patents.
Web Search, TV, Q&A
The APIs, originally announced in April, are being used in a new AT&T Translator app for Android and iOS platforms, which translates spoken or written language into another language. Other speech APIs are on their way, optimized for gaming and social media apps.
The Web search capability is designed to recognize several million mobile queries, and business search can search tens of millions of local business entries. Voicemail to text was "trained" from a large set of data acquired from call centers, and the question-answer capability was built around 10 million Q&A sets.
The TV context allows a user to search show titles, movies, or actors using the AT&T U-verse program guide, and the generic context can recognize and process either English or Spanish.
AT&T Watson has also been designed to learn different accents, speaker variations, background environments, platform variations, dialects and speech patterns, with continual improvement over time. Speech SDKs for native and HTML5 apps are being released.
'How Great Is This Technology?'
We asked Al Hilwa, director of Application Development Software Research at IDC, if the AT&T APIs could fill a need for the developer.
"If the technology is sufficiently compelling, it could," he told us. The key question, Hilwa said, is "how great is this technology?" Additionally, he said, developers would need to understand how they can be "looped in to the Watson ecosystem."
AT&T spokeswoman Jan Rasmussen told us that "because the capabilities within the Speech API can perform independently in plug-and-play fashion, AT&T's offering allows for better speed and accuracy and lower latency." As a result, she said, transcription, translation, and analysis happen "in nearly real time," and the company can single out any of the capabilities to enable better performance.
Watson is based on a variety of research and commercial applications developed by AT&T over the years. These include Speech Mashup, a speech API the company released for researchers more than three years ago.
The technology also draws on AT&T's interactive voice response system, which has been used for more than two decades in commercial voice-recognition applications for nationwide operator-assisted service, as well as on the company's previous products in speech translation, mobile voice search of multimedia data, and local business search.