Artificial Intelligence Has Learned to Pick Out Individual Voices in a Crowd

The latest development from the American IT giant Google looks like a dual-use technology. On the one hand, it is a gift to a spy, who could identify and listen in on a speaker from a distance, even one hiding in a crowd. On the other hand, a breakthrough in the analysis of voice data will help many people with hearing impairments and make Google's services more efficient. So how does it work?

Face recognition has been a popular topic in technology for some time, and recognizing a person's speech, even in the presence of interference, is by now fairly simple. The hard part is identifying who the voice belongs to. Google's developers simply paired the microphone with a video camera and an algorithm that reacts to a person's facial movements. The system tracks the movements of the speaker's face, in effect reading their lips, and analyzes the sound in parallel. If the two results match, the AI isolates that person and can follow only their speech against the general cacophony in the background.
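The matching step can be illustrated with a toy example. Google's system is a trained neural network whose internals the article does not describe, so the sketch below swaps in a far simpler heuristic: it correlates how far each visible mouth is open, frame by frame, with the loudness of the audio, and picks the face that tracks the sound best. The function names, the `threshold` value and the sample data are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def pick_speaker(mouth_openings, loudness, threshold=0.5):
    """Return the id of the face whose mouth movement best tracks the loudness.

    Returns None when no face correlates above `threshold` (an assumed cutoff).
    """
    best_id, best_score = None, threshold
    for face_id, openings in mouth_openings.items():
        score = pearson(openings, loudness)
        if score > best_score:
            best_id, best_score = face_id, score
    return best_id


# Invented sample data: one loudness value and one mouth-opening value per video frame.
loudness = [0.1, 0.8, 0.9, 0.2, 0.7, 0.1]
faces = {
    "face_a": [0.0, 0.7, 1.0, 0.1, 0.6, 0.0],  # opens and closes with the sound
    "face_b": [0.5, 0.4, 0.5, 0.6, 0.5, 0.4],  # movement unrelated to the sound
}
print(pick_speaker(faces, loudness))  # prints: face_a
```

A real system would have to cope with several people talking at once, which a single loudness curve cannot capture. The sketch only shows the idea of confirming a speaker by checking that picture and sound agree.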

The neural network was first taught to read lips, then to tell people who are speaking from those who are merely laughing, and to recognize facial movements during speech even when the face is partly hidden by a beard or a microphone. A sorting mechanism was then added to the system: once a speaker has been identified, their audio is routed into a separate acoustic profile. Thanks to this, the AI can tell apart the words of different people, even if they deliberately try to confuse it by speaking or singing in unison.
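The sorting step described above can be pictured as plain bookkeeping once each stretch of speech has been attributed to a speaker. The sketch below is a minimal illustration, not Google's implementation: the `Segment` type, the speaker ids and the sample sentences are all hypothetical, and the speaker attribution is assumed to have been done already.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float   # seconds from the start of the recording
    end: float
    speaker: str   # hypothetical id assigned by the audio-visual matching step
    text: str


def build_profiles(segments):
    """Group segments into one list per speaker, each ordered by start time."""
    profiles = defaultdict(list)
    for seg in sorted(segments, key=lambda s: s.start):
        profiles[seg.speaker].append(seg)
    return dict(profiles)


def transcript(profiles, speaker):
    """Join everything one speaker said; empty string for an unknown speaker."""
    return " ".join(seg.text for seg in profiles.get(speaker, []))


# Invented sample data; the first two segments overlap in time.
segments = [
    Segment(0.0, 1.2, "anna", "Shall we start?"),
    Segment(0.5, 1.6, "ben", "One moment."),
    Segment(1.8, 3.0, "anna", "Here is the agenda."),
]
profiles = build_profiles(segments)
print(transcript(profiles, "anna"))  # prints: Shall we start? Here is the agenda.
```

Keeping one profile per speaker is what lets overlapping speech be followed separately: each person's words stay in their own stream even when the segments overlap in time.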

Being able to follow one particular person's speech is useful not only to spies. For example, a hearing aid could relay exactly what the wearer's conversation partner is saying and filter out other voices as noise. The technology could also expand the functionality of video chat services such as Hangouts and Duo. It also opens up new possibilities for voice control systems, and voice-based protection can no longer be cracked with a fake audio recording alone.