The big tech companies in Silicon Valley are betting big on artificial intelligence, and for good reason. Previously, ZME Science told you all about how Google's amazing dream-like neural networks make sense of images. Later, I had a lot of fun toying with a similar neural network from Microsoft, also available to the public, which interprets emotions based on facial expressions. Now, Microsoft is showcasing another powerful tool, one that writes captions for images.
The captions are very basic: essentially descriptions of what the neural network algorithm "sees". Sometimes the CaptionBot does a pretty good job.
Pretty good. It seems to handle celebrities and portraits nicely, but it gets really confused when several things are in focus at once.
In fact, after I gave it a couple more spins, it started to sound like a half-blind uncle!
Then it really showed its (hilarious) limitations.
The CaptionBot uses two neural networks: one to parse the image and another to generate the caption. Images are identified using Microsoft's Computer Vision API, combined with data from the Bing Image Search API, while the Emotion API detects facial expressions. As it gets fed more photos, including the ones I played with, it should naturally get better and better. You can try it on your own here. Post some of your results in the comments. This should be fun!
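If you'd rather poke at the underlying service than use the web toy, the Computer Vision API exposes a "describe" endpoint that returns machine-generated captions. Here's a minimal Python sketch of what such a request looks like; the endpoint host, subscription key, and image URL below are placeholders you would replace with your own, and the request is only built, not sent.

```python
import json
import urllib.request

# Placeholder values -- substitute your own Azure Cognitive Services
# resource endpoint and subscription key.
ENDPOINT = "https://YOUR_RESOURCE.cognitiveservices.azure.com"
API_KEY = "YOUR_SUBSCRIPTION_KEY"


def build_caption_request(image_url: str) -> urllib.request.Request:
    """Build (but don't send) a POST request to the Computer Vision
    'describe' endpoint, which captions the image at image_url."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        f"{ENDPOINT}/vision/v3.2/describe?maxCandidates=1",
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": API_KEY,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_caption_request("https://example.com/photo.jpg")
print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) returns JSON whose `description.captions` list holds the candidate captions with confidence scores, which is roughly what the CaptionBot surfaces as its one-line answer.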