Knowing for the sake of knowing: algorithm developed to hardwire curiosity into robots

To better flesh out artificial intelligence (AI), computer scientists have put together an algorithm that makes machines curious to explore and learn simply for the sake of learning. In the long run, such programs could even take bots out of the factories and put them side-by-side with researchers.

Sage advice.
Image credits Gerd Altmann.

The concepts of intelligence and curiosity feel so deeply entwined that it’s almost impossible to imagine one going very far without the other. And yet even the most powerful machine brains we’ve built so far have had to make do without any kind of curiosity, computing and returning an answer when instructed to and going to the screensaver in the absence of input.

It’s not like we’re only figuring this out now. Scientists have been working on various ways to imbue our silicon friends with curiosity for quite some time, but their efforts have always fallen far short of the benchmark set by our innate inquisitiveness. One important limitation, for example, is that most curiosity algorithms can’t determine in advance whether something will be interesting or not because, unlike us, they can’t survey all the data the machine already has in store to spot potential gaps in its knowledge. By comparison, you could tell with fairly high confidence whether a book will be interesting without reading it first.

Judging books by their cover

But Todd Hester, a computer scientist currently working with Google DeepMind in London, thinks that robots should actually be able to go against this morsel of folk wisdom. To that end, he teamed up with Peter Stone, a computer scientist at the University of Texas at Austin, to create the Targeted Exploration with Variance-And-Novelty-Intrinsic-Rewards (TEXPLORE-VENIR) algorithm.

“I was looking for ways to make computers learn more intelligently, and explore as a human would,” he says. “Don’t explore everything, and don’t explore randomly, but try to do something a little smarter.”

To do so, they based TEXPLORE-VENIR on a technique called reinforcement learning. It’s also one of the main ways humans learn, and it works through small increments towards an end goal. Basically, the machine or human in question tries something, and if the outcome brings it closer to a certain goal (such as clearing the whole board in Minesweeper), it receives a reward (for us, it’s dopamine) that promotes that action or behavior in the future.
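To make that trial-and-error loop a bit more concrete, here is a minimal sketch of textbook Q-learning, one common form of reinforcement learning. It is not TEXPLORE-VENIR and not the authors’ code; the corridor task, the function names, and the parameter values (alpha, gamma, epsilon) are hypothetical choices made for illustration. The agent is only rewarded for reaching the end of a short corridor, and the update rule gradually makes the actions that led there more attractive.

```python
import random

# Hypothetical toy task: a corridor of N_STATES cells. The agent starts in
# cell 0 and is rewarded only when it reaches the last cell (the goal).
N_STATES = 6
ACTIONS = (-1, +1)  # index 0 = step left, index 1 = step right

def step(state, action):
    """Move within the corridor and return (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == N_STATES - 1:
        return nxt, 1.0, True  # reward for reaching the goal
    return nxt, 0.0, False

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # q[s][a] estimates the future reward of taking action a in state s.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # occasionally try something at random
            else:
                # Otherwise pick the best-known action, breaking ties randomly.
                best = max(q[state])
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            nxt, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted value of what follows,
            # so actions that lead to reward become more likely to be repeated.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q

q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
print(policy)
```

After a couple hundred practice runs, the learned policy should say “right” in every cell, i.e. head straight for the goal. Note that this agent explores by occasionally picking an action at random, which is exactly the kind of undirected exploration that curiosity algorithms try to improve on.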

Reinforcement learning works for us (it makes stuff like eating feel good so we don’t forget to eat), and it works for machines, too: it’s reinforcement learning that allowed DeepMind to master Atari games and Go, for example. But those programs relied on random experimentation to explore, and furthermore, they were explicitly instructed to learn the game. TEXPLORE-VENIR, on the other hand, acts similarly to the reward circuits in our brains by giving the program an internal reward for understanding something new, even if that knowledge doesn’t bring it closer to the ultimate goal.

Image credits Troy Straszheim / Wikimedia.

As the machine learns about the world around it, TEXPLORE-VENIR rewards it for uncovering new information that’s unlike anything it has seen before, such as exploring a novel patch of forest or finding a new way to perform a certain task. But it also rewards the machine for reducing uncertainty, i.e., for getting a deeper understanding of things it already ‘knows’. So overall, the algorithm behaves more like what we understand as curiosity than previous programs did.
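One way to picture these two kinds of bonus is the rough sketch below. The article doesn’t spell out the formulas, so everything here is an assumption: novelty is taken to be the distance from a new state to the closest state the robot has already seen, uncertainty is taken to be how much an ensemble of the robot’s predictive models disagree (the variance of their predictions), and the function names and weights w_var and w_nov are hypothetical. Treat it as an illustration of the idea, not as the published TEXPLORE-VENIR reward.

```python
import math
import statistics

# Assumption-laden sketch of a "variance + novelty" intrinsic reward.
# The formulas, names, and weights below are hypothetical, not from the paper.

def novelty(state, visited):
    """Distance from `state` to the closest previously visited state.
    Large for states unlike anything seen before; 0 for an exact repeat.
    Assumes `visited` is non-empty."""
    return min(math.dist(state, seen) for seen in visited)

def uncertainty(predictions):
    """Variance of several models' predictions for the same outcome.
    High variance means the models disagree, i.e. the agent is unsure."""
    return statistics.pvariance(predictions)

def intrinsic_reward(state, visited, predictions, w_var=1.0, w_nov=1.0):
    """Hypothetical weighted sum of the two curiosity bonuses."""
    return w_var * uncertainty(predictions) + w_nov * novelty(state, visited)

visited = [(0, 0), (1, 0)]
print(intrinsic_reward((5, 0), visited, [1.0, 3.0]))       # 5.0: new place, models disagree
print(intrinsic_reward((1, 0), visited, [2.0, 2.0, 2.0]))  # 0.0: familiar place, models agree
```

A state identical to one already visited, about which all the models agree, earns nothing; a far-off state that the models can’t agree on earns a large bonus.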

“They’re fundamentally different types of learning and exploration,” says Konidaris. “Balancing them is really important. And I like that this paper did both of those.”

Testing points

The researchers put TEXPLORE-VENIR to the test in two different scenarios. First, the program was presented with a virtual maze made up of four rooms connected by locked doors. Its task was to find a key, pick it up, and then use this key to unlock a door. To score the algorithm’s efficiency, the simulated bot earned 10 points each time it passed through a door and had a cap of 3,000 steps in which to achieve the highest score possible. Before that, the bot was allowed a 1,000-step exploration phase to familiarize itself with the maze.

When this warm-up period was done under the direction of TEXPLORE-VENIR, the bot averaged 55 points in the 3,000-step phase. With other curiosity algorithms, it averaged anywhere between 0 and 35 points, with the exception of R-Max, a program which also scored 55 points. When the program had to explore and pass through doors simultaneously, TEXPLORE-VENIR averaged around 70 points and R-Max around 35, while the others clocked in at under 5 points, the researchers report.

The second round of testing was performed with a physical robot, the Nao. It included three separate tasks, during which the machine earned points for hitting a cymbal, for holding a piece of pink tape (fixed on its hand) in front of its eyes, and finally for pressing a button on its foot. For each task, it was allowed 200 steps to earn points but was first given a 400-step period to explore, either randomly or using TEXPLORE-VENIR.

Each method of exploration was used 13 times. Overall, Nao found the pink tape on its hand much faster using TEXPLORE-VENIR than with the random approach. It pressed the button in 7 of the 13 trials after using TEXPLORE-VENIR, compared to zero times after exploring randomly. Lastly, it hit the cymbal in one of five trials after using TEXPLORE-VENIR, but not once after exploring randomly. TEXPLORE-VENIR allowed the robot to better understand the basics of how its body, the environment, and the task at hand worked, so it was well prepared for the trials after the exploration period.

As the team notes, striking a balance between internal and external rewards is the most important thing when it comes to learning. Too much curiosity could actually impede the robot. If the intrinsic reward for learning something is too great, the robot may ignore extrinsic rewards (i.e., those from performing its given tasks) altogether. R-Max, for example, scored fewer points in the simultaneous exploration and door-unlocking phase because its curiosity distracted it from its task, which I guess you could chalk up to AI ADHD. Too little curiosity, on the other hand, can diminish the bot’s capacity for learning. We’ve probably all had that one test where the grade mattered more than actually learning anything: you memorize, take the test, and then your mind wipes everything clean.
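The balancing act can be illustrated with a toy calculation. In this hypothetical sketch, the robot adds a curiosity bonus, scaled by a made-up curiosity_weight, to the task reward and picks whichever option scores higher. The 10 points for a door echo the maze test, but the curiosity values and option names are invented purely for illustration; this is not how the researchers weighted their rewards.

```python
# Hypothetical illustration: blending a task (extrinsic) reward with a
# curiosity (intrinsic) bonus. The intrinsic values are made up.
OPTIONS = {
    "walk through door": {"extrinsic": 10.0, "intrinsic": 0.5},
    "poke at unexplored wall": {"extrinsic": 0.0, "intrinsic": 4.0},
}

def total_reward(extrinsic, intrinsic, curiosity_weight):
    """Task reward plus a weighted curiosity bonus."""
    return extrinsic + curiosity_weight * intrinsic

def choose(curiosity_weight):
    """Return the option with the highest blended reward."""
    return max(
        OPTIONS,
        key=lambda name: total_reward(
            OPTIONS[name]["extrinsic"], OPTIONS[name]["intrinsic"], curiosity_weight
        ),
    )

for weight in (0.0, 1.0, 5.0):
    print(weight, choose(weight))
```

With no or modest curiosity, the robot heads through the door; once the weight gets large enough, it wanders off to poke at the wall and forgoes the points. This one-shot choice only shows the “too much curiosity” failure. The cost of too little curiosity, a robot that never learns enough about its world, only shows up over many steps of learning and isn’t modeled here.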

Hester says the next step in their research is to model the algorithm more closely on our brain architecture and use deep neural networks to make bots “learn like a child would.”

The full paper “Intrinsically motivated model learning for developing curious robots” has been published in the journal Artificial Intelligence.