

These AI headphones let you listen to a single person in a crowd or noisy area

With these headphones, all it takes is a brief glance at the desired speaker to isolate their voice.

Mihai Andrei
June 3, 2024 @ 9:19 pm


In the din of a bustling café or a crowded conference, discerning one voice amidst the noise often feels like a superpower. Now, thanks to a groundbreaking innovation by the University of Washington, we may all have that superpower. Leveraging advanced artificial intelligence, researchers have developed headphones that allow users to focus on a single speaker in a sea of sound. All it takes is a brief glance at the desired speaker to isolate their voice, effectively silencing all other background noise.

Headphones have come a long way. They were first invented in the 1880s, out of a need to free up a person’s hands when operating the telephone. Modern headphones do essentially the same thing, but are much more sophisticated. They can be wireless, adjust sound levels, and even apply noise cancellation. A team of researchers wanted to take this to the next level — using AI.

The idea is to identify the desired source of sound and then use AI to keep only that source of sound audible. The headphone wearer turns towards whoever they want to listen to and the headphone “locks on”, continuing to play that voice or sound even if the wearer moves around.

“We tend to think of AI now as web-based chatbots that answer questions,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “But in this project, we develop AI to modify the auditory perception of anyone wearing headphones, given their preferences. With our devices you can now hear a single speaker clearly even if you are in a noisy environment with lots of other people talking.”

Machine learning vocal patterns

The new approach builds on the team’s previous “semantic hearing” research, which allowed users to select specific sound classes that they wanted to cancel. This previous work detected sounds such as birds or specific voices and cancelled them, while leaving others unaffected.

The system works as a sort of real-time training algorithm. The headphones carry an on-board mini-computer running machine learning software. The wearer turns towards the sound source and the headphones lock onto it (within a 16-degree margin of error). After an enrollment period of just a few seconds, the “target speech hearing” mode kicks in and plays only the targeted speaker’s voice, even as the listener moves around. Performance also improves over time, as the system gathers more training data from the speaker’s voice.
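To make the enroll-then-filter idea concrete, here is a toy sketch in Python. Everything in it is illustrative, not the researchers’ actual method: the real system uses learned neural speaker embeddings and a trained speech-separation network running on embedded hardware, whereas this sketch stands in a crude spectral “fingerprint” and a cosine-similarity gate just to show the two-phase flow (enroll on a few seconds of the target voice, then pass only audio that matches it).

```python
import numpy as np

def embed(frames):
    # Toy speaker "embedding": the average magnitude spectrum of the frames.
    # A real system would use a learned neural speaker-embedding model here.
    spectra = np.abs(np.fft.rfft(frames, axis=-1))
    return spectra.mean(axis=0)

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

class TargetSpeechHearing:
    """Toy sketch of the two-phase flow: enroll on a short clip of the
    target voice, then play only frames that resemble that voice."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.profile = None  # set during enrollment ("lock-on")

    def enroll(self, frames):
        # "Lock on": build the target profile from the glance-window audio.
        self.profile = embed(frames)

    def filter(self, frame):
        # Pass the frame only if it matches the enrolled voice; else silence.
        if self.profile is None:
            return frame
        score = cosine(embed(frame[None, :]), self.profile)
        return frame if score >= self.threshold else np.zeros_like(frame)
```

In this sketch, anything below the similarity threshold is simply zeroed out; the actual system instead separates the target voice from overlapping speech, which is a far harder problem than gating whole frames.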

The team tested the system on 21 subjects, who were asked to rate how well they could hear the voice before and after filtering. All reported major improvements: on average, the clarity of the targeted speaker was rated nearly twice as high as in the unfiltered audio.

“Our user studies demonstrate generalization to real-world static and mobile speakers in previously unseen indoor and outdoor multipath environments. Finally, our enrollment interface for noisy examples does not cause performance degradation compared to clean examples, while being convenient and user-friendly. Taking a step back, this paper takes an important step towards enhancing the human auditory perception with artificial intelligence,” the researchers conclude.

Some limitations to work out

The system has applications in various fields. For individuals with hearing impairment, these AI-powered headphones could offer a significant improvement in their ability to communicate and engage in social settings. In professional environments, where clear communication is crucial, such technology could enhance productivity and reduce misunderstandings. Moreover, for anyone who has struggled to hold a conversation in a noisy café or during a bustling conference, these headphones represent a transformative leap in auditory technology.

But there are still some things to sort out.

The system is promising, but it can only work with one speaker at a time. With multiple speakers, and especially several in the same direction, it can struggle to lock on. The user can run another enrollment to try to improve clarity, but there are still cases where it won’t work properly. The team is also working to integrate the system into a less bulky form factor, such as earbuds or hearing aids.

The team also released the code for the proof-of-concept device so that others can build on it. The system is not commercially available yet, but the open code should make it much easier for other teams to contribute.

The team presented its findings May 14 in Honolulu at the ACM CHI Conference on Human Factors in Computing Systems.
