

New AI training approach could finally allow computers to have imaginations

Just you wait until we figure out how to code ADHD.

Alexandru Micu
July 19, 2021 @ 6:21 pm


Researchers at the University of Southern California (USC) are trying to teach a computer not how to love, but how to imagine.

Image credits Bruno Marie.

People generally don’t have any issue imagining things. We’re pretty good at starting from scratch, and we’re even better at using our experience to imagine completely new things. For example, all of you reading this could probably imagine the Great Wall of China but made from spaghetti and meatballs, or a cat in a pirate hat.

Computers, however, are notoriously bad at this. It’s not their fault: we’ve built them to be fast, accurate, and precise, not to waste their time daydreaming; that’s our job. But giving computers the ability to imagine — to envision an object with different attributes, or to create concepts from scratch — could definitely be useful. Machine learning experts have been grappling with this problem for a while now, but we’ve made precious little progress so far.

However, a new AI developed at USC mimics the same processes our brains use to fuel imagination, and it can create entirely new objects with a wide range of attributes.

Creative computing

“We were inspired by human visual generalization capabilities to try to simulate human imagination in machines,” said lead author Yunhao Ge, a computer science PhD student working under the supervision of Laurent Itti, a computer science professor.

“Humans can separate their learned knowledge by attributes — for instance, shape, pose, position, color — and then recombine them to imagine a new object. Our paper attempts to simulate this process using neural networks.”

In other words, as humans, it’s easy to envision an object with different attributes. But, despite advances in deep neural networks that match or surpass human performance in certain tasks, computers still struggle with the very human skill of “imagination.”

One of the largest hurdles we’ve faced in teaching computers how to imagine is that, generally speaking, they’re quite limited in what they recognize.

Let’s say we want to make an AI that can design buildings. We train such systems today by feeding them a lot of data. In our case, this would be a bunch of pictures of buildings. By looking at them, the theory goes, the AI can understand what makes a building a building, and the proper way to design one. In other words, it understands its attributes, which can then be replicated or checked against. With these in hand, it should be able to extrapolate — create virtually endless examples of new buildings.

The issue is that our AIs are, for the most part, still trained to understand features rather than attributes. That means things like certain patterns of pixels, or which words are most likely to follow a given word. A simple but imperfect way to put it is that a properly-trained AI today can recognize a building as a building, but it has no idea what a building actually is, what it’s used for, or how. It can check whether a picture looks like a picture of a wall, and that’s about it. For most practical purposes today, this type of training is sufficient.

Still, in order to push beyond this point, the team used a process called disentanglement. This is the sort of process used to create deepfakes, for example, by ‘disentangling’ or separating a person’s facial movements from their identity. Using this process, one person’s appearance can be replaced with another’s, while maintaining the former’s movements and speech.

The team took groups of sample images and fed them into the AI, instead of using one picture at a time as traditional training approaches do. They then tasked the program with identifying the similarities between them, a step called “controllable disentangled representation learning”. The information gleaned here was then recombined through “controllable novel image synthesis,” which is programmer speak for ‘imagining things’.
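To make the mechanism a little more concrete, here is a minimal, hypothetical sketch of the idea in Python (using PyTorch). It assumes an autoencoder whose latent code is split into named attribute slots; two images from the same group that share an attribute have that slot swapped between them, and the reconstructions must survive the swap, which forces the shared attribute (and only that attribute) into its slot. The slot names, sizes, and network shapes below are illustrative assumptions, not the authors’ actual implementation.

```python
# A minimal sketch of swap-based disentanglement (not the authors' actual code).
import torch
import torch.nn as nn

# Hypothetical attribute slots and their sizes within the latent code.
ATTRIBUTE_SLOTS = {"shape": 16, "pose": 8, "color": 8, "background": 32}
LATENT_DIM = sum(ATTRIBUTE_SLOTS.values())

class SlotAutoencoder(nn.Module):
    def __init__(self, image_dim: int = 3 * 64 * 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(image_dim, 256), nn.ReLU(), nn.Linear(256, LATENT_DIM)
        )
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 256), nn.ReLU(), nn.Linear(256, image_dim)
        )

    def encode(self, x: torch.Tensor) -> dict:
        """Encode an image and split the latent code into named attribute slots."""
        z = self.encoder(x.flatten(1))
        slots, start = {}, 0
        for name, size in ATTRIBUTE_SLOTS.items():
            slots[name] = z[:, start:start + size]
            start += size
        return slots

    def decode(self, slots: dict) -> torch.Tensor:
        """Reassemble slots (possibly taken from different images) into an image."""
        z = torch.cat([slots[name] for name in ATTRIBUTE_SLOTS], dim=1)
        return self.decoder(z)

def swap_reconstruction_loss(model, img_a, img_b, shared_attr: str):
    """img_a and img_b come from a group that shares one attribute (e.g. the same pose).
    Swapping that slot between them should leave both reconstructions intact."""
    slots_a, slots_b = model.encode(img_a), model.encode(img_b)
    slots_a[shared_attr], slots_b[shared_attr] = slots_b[shared_attr], slots_a[shared_attr]
    recon_a, recon_b = model.decode(slots_a), model.decode(slots_b)
    return (nn.functional.mse_loss(recon_a, img_a.flatten(1))
            + nn.functional.mse_loss(recon_b, img_b.flatten(1)))
```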

It’s still much cruder than what we’re able to do with our brains, but as far as the underlying mechanisms go, the two processes aren’t very different at all.

“For instance, take the Transformer movie as an example,” said Ge. “It can take the shape of Megatron car, the color and pose of a yellow Bumblebee car, and the background of New York’s Times Square. The result will be a Bumblebee-colored Megatron car driving in Times Square, even if this sample was not witnessed during the training session.”
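Continuing the sketch above (with the same caveat that this is an illustration rather than the paper’s code), ‘imagining’ that Bumblebee-colored Megatron in Times Square amounts to re-mixing the learned slots from three different source images:

```python
# Hypothetical usage: mix attribute slots from three sources into one new image.
model = SlotAutoencoder()
img_megatron, img_bumblebee, img_times_square = (torch.rand(1, 3 * 64 * 64) for _ in range(3))

slots = model.encode(img_megatron)                                   # Megatron's shape
slots["color"] = model.encode(img_bumblebee)["color"]                # Bumblebee's color...
slots["pose"] = model.encode(img_bumblebee)["pose"]                  # ...and pose
slots["background"] = model.encode(img_times_square)["background"]   # Times Square backdrop

novel_image = model.decode(slots)  # a combination never seen during training
```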

Using this approach, the AI generated a dataset of 1.56 million new images from the data it was trained on, the team adds.

Artificial imagination would be a huge boon in research especially, for example in efforts to discover new drugs. Movies often give us the idea that once a computer becomes smart enough, it can effortlessly take over the world and the human race. Definitely thrilling stuff. But the fact of the matter remains that all the processing power in the world can’t devise a new medicine without the ability to first imagine one. Processing power can check (with the right code) how certain molecules interact, but somebody first has to think of putting those molecules together — and that’s handled by imagination.

“Deep learning has already demonstrated unsurpassed performance and promise in many domains, but all too often this has happened through shallow mimicry, and without a deeper understanding of the separate attributes that make each object unique,” said Itti.

“This new disentanglement approach, for the first time, truly unleashes a new sense of imagination in A.I. systems, bringing them closer to humans’ understanding of the world.”

The paper “Zero-shot Synthesis with Group-Supervised Learning” was presented at the 2021 International Conference on Learning Representations and is available here.
