Researchers from the University of Washington and the Allen Institute for Artificial Intelligence (AI2) have developed software that scored 49% on high-school geometry SAT questions – an average score for a human, but an excellent one for current AI.
Considering how computers work, you’d think they should ace any test – especially a math test – but the key difference here is how the test was presented. It wasn’t fed to the AI in binary or in some other machine-friendly format; it was given as plain text, exactly as a regular student would receive it. This means the system had to understand not only the written questions, but also the accompanying diagrams and charts.
“Unlike the Turing Test, standardized tests such as the SAT provide us today with a way to measure a machine’s ability to reason and to compare its abilities with that of a human,” said Oren Etzioni, CEO of AI2. “Much of what we understand from text and graphics is not explicitly stated, and requires far more knowledge than we appreciate. Creating a system to successfully take these tests is challenging, and we are proud to achieve these unprecedented results.”
The AI is called GeoS, and its breakthrough is significant. While programmers have no problem encoding information in a form that software can understand and crunch, they struggle far more when they have to make a computer understand information the way a human does.
GeoS works by reading and interpreting the question’s text and diagram, translating them into possible logical interpretations, and feeding those through its geometry solver. It then compares its solution against the multiple-choice options given on the test.
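To make that pipeline concrete, here is a minimal sketch of the general idea in Python. It is not AI2’s actual code: the function names, the toy constraint representation, and the Pythagorean-theorem “solver” are all illustrative assumptions, standing in for the real system’s text interpreter, diagram parser, and geometry solver.

```python
# Hypothetical sketch of a GeoS-style pipeline: interpret text and diagram into
# logical constraints, run a geometry solver, then pick the closest answer choice.
# All names and the toy solver below are assumptions, not AI2's implementation.

from dataclasses import dataclass


@dataclass
class Constraint:
    """A toy logical predicate extracted from the question text or diagram."""
    predicate: str   # e.g. "equals", "isRightTriangle"
    arguments: tuple


def interpret_text(question: str) -> list[Constraint]:
    # Placeholder: a real system maps natural language to formal predicates.
    return [Constraint("equals", ("side_b", 3)),
            Constraint("equals", ("side_c", 4))]


def interpret_diagram(diagram) -> list[Constraint]:
    # Placeholder: a real system extracts relations (right angles, lengths) from the image.
    return [Constraint("isRightTriangle", ("A", "B", "C"))]


def solve(constraints: list[Constraint]) -> float:
    # Toy "geometry solver": given a right triangle with two known legs,
    # return the hypotenuse via the Pythagorean theorem.
    values = {c.arguments[0]: c.arguments[1]
              for c in constraints if c.predicate == "equals"}
    if any(c.predicate == "isRightTriangle" for c in constraints):
        return (values["side_b"] ** 2 + values["side_c"] ** 2) ** 0.5
    raise ValueError("unsupported problem")


def answer(question: str, diagram, choices: list[float]) -> float:
    constraints = interpret_text(question) + interpret_diagram(diagram)
    solution = solve(constraints)
    # Pick the multiple-choice option closest to the computed solution.
    return min(choices, key=lambda c: abs(c - solution))


if __name__ == "__main__":
    q = "In right triangle ABC, the legs measure 3 and 4. What is the hypotenuse?"
    print(answer(q, diagram=None, choices=[4, 5, 6, 7]))  # -> 5
```

The hard part, as the researchers note below, is the first two steps: turning free-form text and a diagram into constraints a solver can use, which is where this sketch simply hard-codes the answer.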
“We’re excited about GeoS’ performance on real-world tasks,” said Ali Farhadi, a senior research manager at AI2. “Our biggest challenge was converting the question to a computer-understandable language. One needs to go beyond standard pattern matching approaches for problems like solving geometry questions that require in-depth understanding of text, diagram and reasoning.”
GeoS is just one of many projects currently trying to tackle human exams. The Allen Institute’s Project Aristo is trying to master fourth-grade science, while Fujitsu and IBM are working on passing the University of Tokyo entrance exam.