Who is talking? Automatic recognition of different speakers supports customer contact

At the international CCW 2019 trade fair in Berlin, EML European Media Laboratory GmbH is set to reveal how speech technology can be used to increase efficiency and quality control in call centers via automatic transcription, the recognition of different speakers, and a keyword search in real time. To this end, the Heidelberg-based company makes use of various techniques, including artificial intelligence.

Berlin/Heidelberg. Heidelberg IT company EML European Media Laboratory GmbH will once again participate in the CCW, the trade fair for telecommunications, with its own stand (Hall 2, Stand C19). The fair will take place from 19–21 February in Berlin and is expected to welcome more than 8,000 visitors. At the CCW, the Heidelberg-based speech technologists will present applications for telephone-based customer contact that convert incoming calls to text in real time. The resulting text can then be processed quickly and easily. Thanks to automatic speaker recognition, the generated text can also be assigned automatically to the individual speakers involved in the conversation. The solutions that EML will present are ideally suited to meeting data-protection standards because they work "on-premise," i.e., without a permanent data connection to an outside source.

Latest speech-analysis technology

Advantages in the call center: Automatic speaker recognition in customer dialogue. Photo: EML

The "EML Transcription Server" converts incoming calls to machine-searchable text entirely automatically and separately for each speaker. The "EML Speech Mining Server" then classifies the results according to predefined subject categories, the frequency of specific terms, and keywords. This server also identifies new terms that occur frequently and links the results together. Real-time speech recognition additionally makes it easy to verify compliance with a call center's policies. "The topic of 'artificial intelligence' – which has been widespread in the media lately – has been a major part of our lives for quite some time now," explains EML Managing Director Prof. Dr. Andreas Reuter. "We are able to achieve the highest detection rates via neural networks and 'deep learning.'" The system works in several languages: In addition to German and English, EML also offers speech analysis in Arabic and Chinese.
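The classification step described above can be illustrated with a minimal Python sketch. It is not EML's implementation: the category names, the keyword sets, and the `classify` function are hypothetical, and simple keyword counting stands in for whatever models the Speech Mining Server actually uses.

```python
import re
from collections import Counter

# Hypothetical subject categories with illustrative keywords.
CATEGORIES = {
    "billing": {"invoice", "payment", "refund"},
    "contract": {"cancel", "contract", "renewal"},
    "technical": {"router", "outage", "restart"},
}

def classify(transcript, categories=CATEGORIES):
    """Return the category whose keywords occur most often, or None."""
    # Count every word of the transcript once, ignoring case.
    words = Counter(re.findall(r"\w+", transcript.lower()))
    # Score each category by the total frequency of its keywords.
    scores = {
        name: sum(words[k] for k in keywords)
        for name, keywords in categories.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(classify("The invoice is wrong and I want a refund"))
```

A production system would presumably weight terms and handle inflected forms; the sketch only shows how predefined categories and term frequencies can be combined.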

Efficiency and quality assurance: Automatic speaker recognition

EML's speech technology automatically recognizes different speakers and attributes each passage of the transcribed text to the person who said it. This applies not only to telephone calls in a call center but also to all types of meetings: In the end, it has to be clear who said what. This is especially important during consultations: Banks and insurance companies are required by law to record these conversations, and EML's automatic speaker recognition makes this recording possible. A speaker's position can be determined by using so-called "beamforming," so that words can be correctly assigned to each individual. This process relies on several microphones that are combined in an array. The resulting transcription provides an exact, comprehensible record of meetings in which statements are traceable to each individual speaker.
This process is notable as it does not require a permanent data connection to the "outside" since the speech-recognition software is installed on the device itself. The voice data remain "on-premise," i.e., "in-house." "Of course, we also apply the high standards of data protection to the domain of the call center," explains EML development manager Markus Klehr. "Detection and model customization occur on-location, thereby enabling us to comply with stringent European privacy guidelines."
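The principle behind locating a speaker with a microphone array can be sketched in a few lines of Python. This is a simplified time-delay estimate between two microphones, the basic idea that beamforming builds on, and not EML's beamforming itself. The function names, the two-microphone setup, and the sample values are assumptions made for illustration.

```python
def estimate_delay(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`.

    A positive lag means the sound reached the right microphone later,
    i.e. the speaker sits closer to the left microphone.
    """
    def correlation(lag):
        # Cross-correlation of the two signals at the given lag.
        return sum(
            left[i] * right[i + lag]
            for i in range(len(left))
            if 0 <= i + lag < len(right)
        )
    return max(range(-max_lag, max_lag + 1), key=correlation)

def side_of_speaker(left, right, max_lag=5):
    """Map the estimated delay to a coarse speaker position."""
    lag = estimate_delay(left, right, max_lag)
    if lag > 0:
        return "left"
    if lag < 0:
        return "right"
    return "center"

# Illustrative signals: the same pulse arrives two samples later on the right.
pulse = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
delayed = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]
print(side_of_speaker(pulse, delayed))
```

With more microphones, the same delays can be used to steer the array toward one speaker, which is what allows each utterance to be attributed to a position in the room.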

Automatic search for keywords

The automatic transcription of calls enables an efficient, advanced search for specific keywords. Conventional systems need to re-process and search all calls for each new search term. "Advanced keyword spotting," on the other hand, finds the keywords in the already-transcribed texts as quickly as a search engine. "We also support traditional keyword spotting (KWS)," says Markus Klehr. "In our view, advanced keyword spotting offers much more: It's a way to relate keywords to one another."
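Why searching transcribed text is fast can be shown with a small inverted index in Python. This is a generic search-engine technique used here as an illustration, not a description of EML's product; the call IDs, transcripts, and function names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def build_index(transcripts):
    """Map each word to the set of call IDs whose transcript contains it."""
    index = defaultdict(set)
    for call_id, text in transcripts.items():
        for word in tokenize(text):
            index[word].add(call_id)
    return index

def search(index, *keywords):
    """Return the IDs of calls that contain every given keyword."""
    hits = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*hits) if hits else set()

# Hypothetical transcripts, indexed once after transcription.
transcripts = {
    "call-001": "I would like to cancel my contract",
    "call-002": "My invoice is wrong, please check the contract",
    "call-003": "When will the new router be delivered",
}
index = build_index(transcripts)
print(sorted(search(index, "contract", "invoice")))
```

The index is built once, so a new search term is a dictionary lookup rather than a second pass over the audio. Querying several keywords together is one simple way to relate keywords to one another.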

"Your customer's voice": A customized marketing tool

A call-center operator can adapt the speech recognition itself to his or her own domain. For example, when a new product is introduced, the product description and new words can be immediately added to the language model. Current problems as well as emerging trends can thereby be quickly recognized.
Speech analytics can effectively determine how many callers mention specific product names. This information may be of interest to a company's strategy and marketing departments: The calls represent authentic customer statements, which allow products, services, and activities to be adapted faster and more directly. "The call-center employees thereby have the opportunity to make communication more customer-friendly and efficient," says Markus Klehr.
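Counting how many callers mention a product could look like the following Python sketch. It assumes transcripts that are already split by speaker, as described earlier. The product names, the speaker labels, and the function name are invented for illustration, and plain substring matching stands in for a real analytics engine.

```python
from collections import Counter

def count_product_mentions(calls, products):
    """Count, per product, the calls in which the customer names it."""
    counts = Counter()
    for segments in calls.values():
        # Only the customer's own words count as a customer statement.
        customer_text = " ".join(
            text.lower() for speaker, text in segments if speaker == "customer"
        )
        for product in products:
            if product.lower() in customer_text:
                counts[product] += 1
    return counts

# Hypothetical speaker-labeled transcripts.
calls = {
    "call-001": [("agent", "How can I help?"),
                 ("customer", "My SpeedBox keeps restarting")],
    "call-002": [("agent", "Do you mean the SpeedBox?"),
                 ("customer", "No, the TV stick")],
    "call-003": [("customer", "The SpeedBox is great, the TV Stick less so")],
}
counts = count_product_mentions(calls, ["SpeedBox", "TV Stick"])
print(counts)
```

Because the text is attributed to speakers, the agent's mention of a product in the second call is not counted as a customer statement.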

19–21 February 2019: EML at the CCW, Berlin, Estrel Convention Center: Hall 2, Stand C19.

EML European Media Laboratory GmbH
EML European Media Laboratory GmbH was set up as a private IT company by SAP co-founder Klaus Tschira (1940–2015). EML develops software and technologies for automatic speech processing, focusing on the automatic conversion of speech into text (transcription) as used in telephony (speech analytics, voice mail) and mobile applications (voice texting, voice search). www.eml.org