Software bugs are a tale as old as time, which, in the case of programming, means about 75 years. In 1947, programmer Grace Murray Hopper was working on the Mark II computer at Harvard University when she noticed a moth stuck in a relay, preventing the computer program from running. It is often cited as the first literal computer “bug”, and countless others have followed since then.
In the history of programming, bugs have ranged from harmless to absolutely catastrophic. In 1986 and 1987, several patients were killed after a Therac-25 radiation therapy device malfunctioned due to an error by an inexperienced programmer, and a software bug might have also triggered one of the largest non-nuclear explosions in history, at a Soviet trans-Siberian gas pipeline.
While events such as these are rare, it’s safe to say that software bugs can do a lot of damage and waste a lot of time and resources. According to a recent analysis, the average programmer produces 70 bugs per 1,000 lines of code, and each bug takes 30 times longer to fix than the code took to write in the first place. In the US alone, an estimated $113 billion is spent identifying and fixing code bugs.
That might soon change.
Microsoft recently announced the creation of a machine learning model that can accurately identify high-priority bugs 97% of the time. The model has an even higher rate of success (99%) in distinguishing between security and non-security bugs.
In a recent report, Scott Christiansen, a senior security program manager at Microsoft, praised the algorithm, adding that Microsoft’s ultimate goal was to design a bug-detection system that is “as close as possible” to the accuracy of a security expert.
“We discovered that by pairing machine learning models with security experts, we can significantly improve the identification and classification of security bugs.”
The bug detection system uses two statistical techniques. The term frequency-inverse document frequency (TF-IDF) algorithm examines the text of a bug report for keywords and weighs their relevance, and a logistic regression model then calculates the probability that the report belongs to a specific class.
Then, the program classifies security and non-security bugs and ranks them as “critical”, “important”, or “low-impact”.
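To make the two techniques concrete, here is a minimal sketch in plain Python of TF-IDF weighting feeding a logistic regression classifier. It is not Microsoft's implementation: the training titles, the function names, and the training settings (`epochs`, `lr`) are all hypothetical and chosen only for illustration, and the severity ranking step is left out.

```python
import math
from collections import Counter

# Hypothetical toy bug-report titles (not real data); 1 = security, 0 = non-security.
TRAIN = [
    ("buffer overflow in image parser", 1),
    ("sql injection in login form", 1),
    ("password leak in debug log", 1),
    ("privilege escalation through installer", 1),
    ("button misaligned on settings page", 0),
    ("typo in welcome message", 0),
    ("slow scrolling in file list", 0),
    ("wrong icon on settings page", 0),
]

def tokenize(text):
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency: terms found in few titles weigh more."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(text, idf):
    """Sparse, L2-normalised TF-IDF vector; terms unseen in training are ignored."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def predict_proba(vec, weights, bias):
    """Logistic regression: squash a weighted sum of features into a probability."""
    z = bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, idf, epochs=500, lr=0.5):
    """Fit weights with plain stochastic gradient descent on the log loss."""
    weights, bias = {}, 0.0
    vectors = [(tfidf(text, idf), label) for text, label in samples]
    for _ in range(epochs):
        for vec, label in vectors:
            error = predict_proba(vec, weights, bias) - label
            bias -= lr * error
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * error * v
    return weights, bias

IDF = fit_idf([text for text, _ in TRAIN])
WEIGHTS, BIAS = train(TRAIN, IDF)

def security_probability(title):
    """Estimated probability that a bug-report title describes a security bug."""
    return predict_proba(tfidf(title, IDF), WEIGHTS, BIAS)

print(round(security_probability("sql injection in admin form"), 3))
print(round(security_probability("typo on settings page"), 3))
```

With eight made-up titles the numbers mean little, but the shape is the same as in a real system: the text is turned into weighted keyword features, and the classifier turns those features into a probability that can be thresholded or ranked.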
The algorithm is still a work in progress, but Microsoft has announced that it will make its findings open source on GitHub, which could end up saving a lot of time and energy for coders all around the world.
In the meantime, you can read the published academic paper, “Identifying Security Bug Reports Based Solely on Report Titles and Noisy Data”, for more details.
“Every day, software developers stare down a long list of features and bugs that need to be addressed,” Christiansen said. “Security professionals try to help by using automated tools to prioritize security bugs, but too often, engineers waste time on false positives or miss a critical security vulnerability that has been misclassified. To tackle this problem, data science and security teams came together to explore how machine learning could help.”