MIT makes an AI that can predict what the future looks and sounds like

Artificial intelligence is learning in seconds what took humans a lifetime to master.

Tibi Puiu
June 21, 2016 @ 6:48 pm


As humans, we’re pretty good at anticipating things, but that’s much harder for robots. If you walk down the street and see two people meeting in front of a café, you know they’ll shake hands or even hug a few seconds later, depending on how close their relationship is. We humans sense this sort of thing very easily, and now some robots can too.

Left: still frame given to the algorithm, which had to predict what happened next. Right: next frames from the video. Credit: MIT CSAIL

MIT used deep-learning algorithms, neural networks that find patterns on their own in oceans of data, to build an artificial intelligence that can predict what action will occur next, starting from nothing but a still frame. Each of these networks was trained to classify an action as a hug, handshake, high-five, or kiss, and their outputs were then merged into a single prediction of what happens next. The demonstration video below is telling.
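To make that merge step concrete, here is a minimal Python/PyTorch sketch of the idea the paragraph describes: several networks each score a frame against the four actions, and their predictions are averaged. The feature dimension, layer sizes, and averaging rule are illustrative assumptions, not CSAIL's published architecture.

import torch
import torch.nn as nn

ACTIONS = ["hug", "handshake", "high-five", "kiss"]

class ActionClassifier(nn.Module):
    """One network: maps a frame's feature vector to scores over the four actions."""
    def __init__(self, feature_dim: int = 2048):  # feature_dim is an assumption
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feature_dim, 512),
            nn.ReLU(),
            nn.Linear(512, len(ACTIONS)),
        )

    def forward(self, frame_features: torch.Tensor) -> torch.Tensor:
        return self.head(frame_features)

def predict_next_action(frame_features, networks):
    """Merge the networks by averaging their softmaxed scores."""
    probs = torch.stack(
        [net(frame_features).softmax(dim=-1) for net in networks]
    ).mean(dim=0)
    return ACTIONS[int(probs.argmax())]

# Usage: three hypothetical networks voting on one still frame's features.
nets = [ActionClassifier() for _ in range(3)]
frame = torch.randn(2048)  # stand-in for CNN features of a still frame
print(predict_next_action(frame, nets))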

To train the algorithm, the researchers fed it more than 600 hours of video footage from YouTube, along with the complete runs of “The Office” and “Desperate Housewives.” In testing, it correctly predicted the action that would happen one second later 43 percent of the time.

In a second test, the algorithm was shown a frame from a video and asked to predict what object would appear five seconds later. For instance, a frame might show a person about to open a microwave, which might contain last night’s leftover pasta, a coffee mug, and so on.

The algorithm correctly predicted the object 11 percent of the time. That might not sound impressive, but it’s 30 percent better than the baseline measurement, and some of its guesses beat those of individual human subjects, even though people given the same test were right 71 percent of the time overall.
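As a back-of-the-envelope check of those figures, assuming “30 percent better” means relative improvement over the baseline’s accuracy (the article doesn’t state the baseline outright), a few lines of Python:

# Reported figures from the article; the baseline accuracy is implied, not stated.
algorithm_acc = 0.11
human_acc = 0.71
baseline_acc = algorithm_acc / 1.30  # about 8.5%, if "30 percent better" is relative

print(f"implied baseline accuracy: {baseline_acc:.1%}")
print(f"gap to human accuracy: {human_acc - algorithm_acc:.0%}")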

“There’s a lot of subtlety to understanding and forecasting human interactions,” says Carl Vondrick, a PhD student at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL). “We hope to be able to work off of this example to be able to soon predict even more complex tasks.”

“I’m excited to see how much better the algorithms get if we can feed them a lifetime’s worth of videos,” Vondrick added. “We might see some significant improvements that would get us closer to using predictive-vision in real-world situations.”

For now, there’s no practical application in mind for this deep-learning algorithm, but the researchers say their present work might one day lead to household robots that anticipate what people will do and so cooperate with them less awkwardly.

Credit: MIT CSAIL

From the same lab, the Computer Science and Artificial Intelligence Laboratory (CSAIL), came another predictive artificial intelligence, this time for sound: shown a silent video of an object being hit, it produces the sound the impact would make.
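One plausible pipeline for this, sketched below in Python, is to predict a sound-feature vector from the video frames and then play back the recorded impact sound whose features match best. This is a hedged illustration of the general idea, not the paper’s exact method; the pooling "model" and the sound_library structure are stand-in assumptions.

import numpy as np

def predict_sound_features(frames: np.ndarray) -> np.ndarray:
    # Stand-in for a learned video-to-sound-feature model; a real system
    # would use a trained network rather than simple pixel pooling.
    return frames.reshape(len(frames), -1).mean(axis=0)

def synthesize_impact_sound(frames, sound_library):
    # sound_library: list of (feature_vector, waveform) pairs of recorded hits.
    target = predict_sound_features(frames)
    distances = [np.linalg.norm(feat - target) for feat, _ in sound_library]
    best = int(np.argmin(distances))
    return sound_library[best][1]  # the closest-matching recorded waveform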

The synthesized sounds are realistic enough to fool a human viewer. When asked whether a sound was real or fake, human participants picked the algorithm’s fake sound over the real recording twice as often as they picked a baseline algorithm’s.

“When you run your finger across a wine glass, the sound it makes reflects how much liquid is in it,” says CSAIL PhD student Andrew Owens, who was lead author of the paper describing MIT’s CSAIL sound algorithm. “An algorithm that simulates such sounds can reveal key information about objects’ shapes and material types, as well as the force and motion of their interactions with the world.”

In other words, both algorithms are now learning in seconds what most humans have to learn in a lifetime: how people react, social norms, even what sound a drumstick makes when it hits a puddle of water.

