You may have seen Google’s video promoting its Google Now assistant, but that’s just the tip of the iceberg. Several applications of speech recognition are emerging as new ways to interact with computers and devices, but the next breakthrough in voice-based human-computer interaction will be driven by advances in speech understanding.
Recent advances in automatic speech recognition, boosted especially by contributions from deep learning, have raised recognition rates to levels that make the technology practical in several scenarios.
In this context, some of the major global players in the computer industry are racing to develop and improve intelligent personal assistants. Apple with Siri, Samsung with S Voice, Google with Google Now, and soon Microsoft with Cortana are some of the most conspicuous examples of this trend. All of these apps aim to make user interfaces more natural through voice input and natural language processing, and use intelligent agent technology to add personalization and context awareness.
Main Intelligent Personal Assistants launch dates
An important challenge facing intelligent personal assistants is overcoming errors in the speech recognition module. Automatic speech recognition performance is far from ideal and, for some complex cases such as spontaneous speech in noisy environments, is nowhere near human-level performance. In practice, two things reduce ambiguity and increase recognition performance: restricting input variability (constraints on the types of interaction, the vocabulary, the number of speakers and the noise level) and using additional external knowledge such as user profiles and context information. However, this improvement comes at a price. It sacrifices naturalness, because users must adapt to the interaction modes the system supports, or flexibility, because the scenarios in which the system can be used are restricted.
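The trade-off between accuracy and flexibility can be illustrated with a toy example. The sketch below is not how any of the assistants above work; it assumes a hypothetical four-command vocabulary and uses plain string similarity from Python’s standard library as a stand-in for the constraints a real recognizer would apply inside its language model. The names `COMMANDS` and `snap_to_vocabulary` are invented for illustration.

```python
from difflib import get_close_matches

# Hypothetical restricted vocabulary: the only requests the system accepts.
COMMANDS = ["call mom", "play music", "set alarm", "send message"]


def snap_to_vocabulary(hypothesis, vocabulary=COMMANDS, cutoff=0.6):
    """Map a possibly misrecognized transcript to the closest allowed command.

    Returns None when no command is similar enough, i.e. when the request
    falls outside the restricted vocabulary.
    """
    matches = get_close_matches(hypothesis.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# A small recognition error is absorbed by the restriction...
print(snap_to_vocabulary("cal mom"))
# ...but an open-ended request cannot be served at all.
print(snap_to_vocabulary("what is the meaning of life"))
```

The first call recovers the intended command despite the error, while the second is rejected: the same restriction that removes ambiguity also limits what the user is allowed to say.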
How do we see this field evolving? In the short term, the trend toward integrating speech recognition with ubiquitous mobile devices is likely to continue. Industry will exploit the fusion of information from multiple distributed devices, knowledge of specific patterns and interests learned from users, and the intelligent detection of user gestures.
In the mid-term, to move beyond restricted applications to more natural and general human-computer interfaces, there must be real progress in the field of speech understanding. Some applications seem to be moving in that direction. For example, by integrating Google Now with the Google Knowledge Graph, Google can assemble entity meanings and connections during information retrieval, which may be a first step towards semantic integration. However, that intelligent assistant can use only a small set of interaction templates.
Compared with speech recognition, the field of speech understanding is far less explored and less mature. The possibility of developing more accurate and natural human-computer interactions, however, will undoubtedly drive the development of that field and open the doors to a whole new world of opportunities.