The internet has become indispensable for modern society, connecting billions of users across a vast and diverse network of websites. Yet for many people, particularly those with disabilities, navigating this digital maze can be daunting. A team of researchers at Ohio State University aims to change that by developing an artificial intelligence agent that can complete complex tasks on any website using simple language commands.
“For some people, especially those with disabilities, it’s not easy for them to browse the internet,” said Yu Su, assistant professor of computer science and engineering at Ohio State and co-author of the study. “We rely more and more on the computing world in our daily life and work, but there are increasingly a lot of barriers to that access, which, to some degree, widens the disparity.”
The study, due to be presented at the NeurIPS conference in December, outlines how AI can function as a digital assistant, simplifying the web for all users. The technology uses large language models, allowing it to mimic human-like browsing behavior across websites it has never encountered before.
From Barriers to Access
Imagine a user with limited motor control who struggles to navigate a flight booking website filled with dropdown menus, tiny buttons, and multi-step forms. With this new AI agent, they could simply type a command like, “Book a round-trip flight from New York to Paris leaving December 15th and returning December 22nd,” and the agent would ideally handle every step, from selecting dates to filling in passenger information.
Or consider a visually impaired user trying to access information on a government website with an overwhelming number of links and forms. Instead of relying on screen readers to painstakingly describe every element, they could tell the AI agent, “Find the application form for a passport renewal.” The agent would locate the correct page, navigate through the necessary steps, and provide the user with clear instructions or even complete parts of the process for them, significantly reducing the time and effort required to accomplish the task.
The same approach could extend to online website builders, which can be especially hard to use for some people with disabilities. An AI agent could guide users through creating and customizing a site, making a once-overwhelming task far more manageable. By reducing the need for precise navigation and offering streamlined assistance, the technology could give web users both better access and more autonomy.
How It Works: Training the Web’s New Guide
The researchers’ solution is built on Mind2Web, which they describe as the first dataset designed for training AI agents to handle real-world websites. Unlike earlier efforts that relied on simplified, simulated environments, Mind2Web exposes agents to the sprawling complexity of modern sites. The researchers compiled over 2,000 tasks from 137 websites, ranging from booking international flights to browsing Netflix catalogs.
These tasks aren’t simple. Booking a flight, for example, can require 14 separate actions, a sequence that can overwhelm users unfamiliar with a site’s layout or navigation quirks. Yet the AI model, trained on these diverse challenges, performed well by reading each website’s structure and predicting the next step, much as a human might.
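To make the idea of a task as a sequence of actions concrete, here is a minimal Python sketch. The class names, the three operation types, and the sample steps are illustrative assumptions, not the dataset's actual schema, and the example is far shorter than the 14-step booking described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Action:
    """One step on a page (hypothetical structure, for illustration only)."""
    operation: str                # assumed operation types: "click", "type", "select"
    target: str                   # plain-language description of the page element
    value: Optional[str] = None   # text to type or option to pick, if any


@dataclass
class Task:
    """A plain-language instruction paired with the steps that complete it."""
    instruction: str
    actions: List[Action]

    def next_action(self, steps_done: int) -> Optional[Action]:
        # Return the step that follows the first `steps_done` steps,
        # or None once the task is finished.
        if 0 <= steps_done < len(self.actions):
            return self.actions[steps_done]
        return None


# Made-up example task, not taken from the dataset.
task = Task(
    instruction="Book a one-way flight from New York to Paris on December 15th",
    actions=[
        Action("click", "one-way trip option"),
        Action("type", "origin field", "New York"),
        Action("type", "destination field", "Paris"),
        Action("select", "departure date picker", "December 15"),
        Action("click", "search button"),
    ],
)
```

An agent trained on data shaped like this learns, in effect, to predict what `next_action` would return given the instruction and the current page.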
“It’s only become possible to do something like this because of the recent development of large language models like ChatGPT,” said Su. Since OpenAI’s chatbot burst onto the scene in late 2022, these models have been applied to tasks ranging from creative writing to answering medical questions. The Ohio State project now extends this potential to website interaction.
The researchers also introduced MindAct, a framework that pairs a smaller language model with a larger one. This two-stage approach lets the AI process only the most relevant parts of a page, avoiding the inefficiency of analyzing thousands of raw HTML elements every time. The result: an agent that outperforms existing tools while remaining computationally efficient.
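One way to picture this two-stage design is the Python sketch below. It is an assumption-laden illustration, not the researchers' implementation: simple word overlap stands in for the smaller model's filtering, and `large_model` is a hypothetical callable standing in for the larger model, which is shown a short lettered list of candidates instead of the whole page.

```python
import re
from typing import Callable, List


def tokenize(text: str) -> set:
    # Lowercase the text and split it into alphanumeric words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_candidates(task: str, elements: List[str], k: int = 3) -> List[str]:
    # Stage 1 (stand-in for the smaller model): score each element by how
    # many words it shares with the task and keep the top k. The sort is
    # stable, so ties keep their original page order.
    task_words = tokenize(task)
    ranked = sorted(
        elements,
        key=lambda el: len(task_words & tokenize(el)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(task: str, candidates: List[str]) -> str:
    # Present the shortlist as lettered choices (A, B, C, ...).
    lines = [f"Task: {task}", "Which element should be acted on next?"]
    for i, el in enumerate(candidates):
        lines.append(f"{chr(ord('A') + i)}. {el}")
    return "\n".join(lines)


def choose_element(
    task: str,
    elements: List[str],
    large_model: Callable[[str], str],  # hypothetical: prompt in, letter out
    k: int = 3,
) -> str:
    # Stage 2: only the shortlist reaches the larger model.
    candidates = rank_candidates(task, elements, k)
    answer = large_model(build_prompt(task, candidates)).strip().upper()
    if not answer:
        raise ValueError("model returned an empty answer")
    index = ord(answer[0]) - ord("A")
    if not 0 <= index < len(candidates):
        raise ValueError(f"answer {answer!r} is not one of the choices")
    return candidates[index]


# Made-up page elements for illustration.
elements = [
    '<a href="/help">Help center</a>',
    '<button id="search-flights">Search flights</button>',
    '<input name="email" placeholder="Newsletter email">',
    '<a href="/careers">Careers</a>',
    '<input name="from" placeholder="Flying from">',
]
task = "Search flights from New York to Paris"
shortlist = rank_candidates(task, elements, k=2)
```

Here the shortlist contains only the search button and the "Flying from" field, so the larger model would see two options instead of five. On a real page with thousands of elements, that reduction is where the efficiency gain would come from.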
A Tool of Opportunity—and Caution
While the technology holds immense promise, it also raises serious ethical questions. Flexible AI agents could empower users by streamlining online tasks, but their capabilities could be misused.
“On the one hand, we have great potential to improve our efficiency and to allow us to focus on the most creative part of our work,” Su said. “But on the other hand, there’s tremendous potential for harm.” He highlighted risks such as financial fraud or the spread of misinformation if AI agents are left unchecked.
Researchers are calling for careful regulation and oversight as these tools develop. Even so, Su and his team remain optimistic about the technology’s potential to make the web more inclusive.
“Throughout my career, my goal has always been trying to bridge the gap between human users and the computing world,” said Su. “That said, the real value of this tool is that it will really save people time and make the impossible possible.”
As AI-powered agents become more advanced and accessible, they could redefine how our society interacts with the internet. For now, researchers like Su are working to ensure that this evolution benefits everyone, leaving no user behind.