Microsoft: The Dawn of the Cognitive Services Era



Speech

Billions of dollars and countless man-years of research and development have led to several cognitive service skills being ready for prime-time exploitation by any Microsoft customer. Speech recognition has progressed to the point where digital assistants and switchboard IVR replacements can understand realistic human speech. Applications using cognitive services understand naturally spoken language: context, meaning, slang, jargon, and even local accents. Microsoft researchers achieved the lowest word error rate, 5.9 percent, when tested against the industry-standard Switchboard speech recognition task. This is human parity: the same level of understanding as humans listening to the same conversation.
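For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that general definition only. It is not Microsoft's scoring code, the function name `word_error_rate` is made up for this example, and real benchmark scoring applies text normalization rules that are omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: lowercases and splits on whitespace, with none of
    the normalization (punctuation, contractions, fillers) that formal
    benchmark scoring would apply.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word was missed
                curr[j - 1] + 1,     # insertion: extra word was recognized
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    spoken = "please call the service desk tomorrow morning"
    recognized = "please call the surface desk tomorrow"
    # One substitution plus one deletion over seven reference words.
    print(f"WER: {word_error_rate(spoken, recognized):.1%}")
```

By this measure, a 5.9 percent word error rate means roughly six word-level mistakes for every hundred words actually spoken.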

Language

Text can be converted to speech for automatic verbal responses. Speech can be transcribed as text and piped as input to the APIs. The Language Understanding Intelligent Service (LUIS) allows models for specific populations and subject matters to be trained in contextual intent recognition. It can understand the relevance of a conversation to a specific subject domain, return the topic of the conversation and the key phrases in an exchange, and gauge the sentiment of the speaker.

Vision

Image Recognition can interpret real-time images in multiple formats. An image can be recognized as a collection of objects and actions. Image Recognition can identify and assess the collective emotions and ages of people in an image. It can recognize the environment of the image and name the elements displayed. For example, a landscape shot could return: “outdoor, nature, forest, tree, river, ravine, and gully.” It can detect text in the image, such as a sign, and understand it. Image Captioning brings this all together. It allows a simple and complete story to be told about the image.

The Combined Power

The greatest potential of Cognitive Services comes from integrating and combining these APIs in novel mashups. A dramatic mashup in development at Microsoft is already easing the lives of Microsoft’s blind employees. The project is Seeing AI. Computer vision and natural language processing are combined to describe a person’s surroundings, read text out loud, answer spoken questions, and identify facial emotions during conversations. For example, a visually impaired person can take a picture with Pivothead smart glasses, and the Microsoft CaptionBot will describe what it sees in the image, identifying objects, people, emotions, and actions: “a happy young girl is throwing an orange Frisbee.” Seeing AI can also be accessed from a cell phone. For example, it can give verbal instructions for positioning and snapping a picture of a document. It will recognize the document type as, for example, a menu. Then it can respond to commands to read the appetizers aloud.

Science Non-fiction

One of the biggest marketing obstacles for Cognitive Services is its big brother, Artificial Intelligence (AI). The perceived endgame of AI research, creating actual machine intelligence, is a far point on the distant horizon. AI, via science fiction, is seen as a future, not present, technology. Fictional AIs, like Samantha, the transcendent companion in the film Her, or Ava, the diabolical nemesis of Ex Machina, have emotions for dramatic tension. In today’s reality, cognitive services simply recognize emotions in humans to better understand the context of a situation. Alternative meaning in speech is often keyed to emotions. Accuracy in transcribing speech and translating from culture to culture is aided by the correct interpretation of the emotional intent of the speaker.

Star Trek seeded many ideas for current innovations. One was the 25th-century Star Trek communicator, a working replica of which you can buy today as a Bluetooth pin for your tunic. Coupled with this futuristic communicator was the Universal Translator, which, after some deft training by the language savant Uhura, interacted with the communicator to translate between two different cultures in real time. Surprise: this capability is in use today. Your developers can access the APIs and, like Uhura, train a service to translate for your business needs. Our mobile telecom partner Tele2 did just that.