

Is AI Moderation a Useful Tool or Another Failed Social Media Fix?

A new study suggests that an optimized AI model could detect harmful social media comments with 87% accuracy.

Mihai Andrei
March 19, 2025 @ 4:17 pm


AI-generated image.

Social media promised to connect the world like never before. Instead, it has turned into a gargantuan double-edged sword. Sure, on one hand, it lets us keep up with friends and interests more easily than ever. On the other, it has quickly become a massive source of disinformation and mental health problems.

From hate speech to cyberbullying, harmful online discourse is on the rise. It’s also extremely difficult to stop. Due to the sheer volume of content, any efforts to curb this issue turn into a game of whack-a-mole. By the time you stop one toxic profile, three more pop up. Then among all this, there’s also AI.

AI enters social media

AI makes it easier than ever to create content, whether that’s helpful or toxic content. In a new study, however, researchers showed that AI can also help address this problem. The new algorithm is 87% accurate in classifying toxic and non-toxic text without relying on manual identification.

Researchers from East West University in Bangladesh and the University of South Australia developed an optimized Support Vector Machine (SVM) model to detect toxic comments in both Bangla (a language spoken by over 250 million people) and English. The models were trained on 9,061 comments collected from Facebook, YouTube, WhatsApp, and Instagram; 4,538 of the comments were in Bangla and the rest in English.

SVMs have been used before to categorize social media content. Although the approach is typically fast and relatively straightforward, it is often not accurate enough. In this case, the baseline SVM was almost 70% accurate in categorizing comments. The researchers then built a second classifier, trained with Stochastic Gradient Descent (SGD). This was more accurate, reaching around 80%, but it flagged more harmless comments as toxic and was much slower than the SVM.
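The paper's code and exact features aren't reproduced here, but a baseline along these lines can be sketched in Python with scikit-learn. Everything in this snippet — the toy comments, the TF-IDF features, and the default hyperparameters — is an illustrative assumption, not the authors' actual setup:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny illustrative stand-in for the study's 9,061 labelled comments.
comments = [
    "you are a wonderful person",
    "thanks for sharing, this helped a lot",
    "great video, keep it up",
    "what a lovely community",
    "you are an idiot and everyone hates you",
    "shut up, nobody wants you here",
    "go away, you worthless troll",
    "this person is a disgusting loser",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = non-toxic, 1 = toxic

# Fast baseline: TF-IDF features feeding a linear SVM.
svm = make_pipeline(TfidfVectorizer(), LinearSVC())
svm.fit(comments, labels)

# The SGD variant: the same kind of linear model, but trained with
# stochastic gradient descent on the hinge loss.
sgd = make_pipeline(TfidfVectorizer(), SGDClassifier(loss="hinge", random_state=0))
sgd.fit(comments, labels)

# Classify two unseen comments (returns an array of 0/1 labels).
print(svm.predict(["you are a worthless idiot", "thanks for the lovely video"]))
```

On a real dataset, the two pipelines would be compared on held-out comments, which is where the roughly 70% vs. 80% accuracy gap reported in the study would show up.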

Then, the researchers fine-tuned and combined these approaches into a single model, which they call an optimized SVM. This model was both fast and 87% accurate.
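The study doesn't spell out the optimization procedure here, but a common way to "optimize" such a pipeline is a cross-validated hyperparameter search. A minimal sketch, again with assumed toy data and an assumed parameter grid rather than the authors' actual configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Tiny illustrative dataset; the real study used 9,061 Bangla and
# English comments collected from four platforms.
comments = [
    "you are a wonderful person",
    "thanks for sharing, this helped a lot",
    "great video, keep it up",
    "what a lovely community",
    "you are an idiot and everyone hates you",
    "shut up, nobody wants you here",
    "go away, you worthless troll",
    "this person is a disgusting loser",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = non-toxic, 1 = toxic

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("svm", LinearSVC())])

# Candidate hyperparameters -- an assumed grid, not the paper's values.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],  # unigrams vs. uni+bigrams
    "svm__C": [0.1, 1.0, 10.0],              # regularization strength
}

# 2-fold cross-validated grid search; the best setting is refit on all data.
search = GridSearchCV(pipe, param_grid, cv=2)
search.fit(comments, labels)
print(search.best_params_, search.best_score_)
```

Whichever combination scores best in cross-validation becomes the "optimized" model, and its accuracy on a held-out test set is the figure you would report.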

“Our optimized SVM model was the most reliable and effective among all three, making it the preferred choice for deployment in real-world scenarios where accurate classification of toxic comments is critical,” says Abdullahi Chowdhury, study author and AI researcher at the University of South Australia.

It’s useful, but not perfect

Image credits: Thomas Park.

Toxicity in social media is a growing issue. We’re drowning in a sea of negativity and mistrust, and AI can be both a solution and a problem. It is, much like social media itself, a double-edged sword.

The model appears to work just as well across different languages, so it could be used to tackle online toxicity globally. Social media companies have repeatedly shown that they are unwilling or unable to truly tackle this issue.

“Despite efforts by social media platforms to limit toxic content, manually identifying harmful comments is impractical due to the sheer volume of online interactions, with 5.56 billion internet users in the world today,” she says. “Removing toxic comments from online network platforms is vital to curbing the escalating abuse and ensuring respectful interactions in the social media space.”

More advanced AI techniques like deep learning could improve accuracy even further. While more research is needed, this could enable real-time deployment on social media platforms, automatically flagging harmful comments as they are posted.

Could AI moderation truly be the solution to social media toxicity, or is it just another pseudo-techno-fix destined to backfire? The answer isn’t straightforward. AI has already shown immense potential in automating tasks, detecting patterns, and filtering harmful content faster than any human moderation team could. However, past attempts at AI-driven moderation have been far from perfect.

Moreover, AI lacks the nuance of human judgment. A sarcastic joke, a political discussion, or a cultural reference can easily be misclassified as toxic. At the same time, genuinely harmful comments can sometimes slip through the cracks, either because the AI was not trained on a diverse enough dataset or because bad actors find ways to game the system.

The real challenge isn’t just building better AI — it’s ensuring that these systems serve the public good rather than becoming another layer of digital dysfunction.

Journal Reference: Afia Ahsan et al, Unmasking Harmful Comments: An Approach to Text Toxicity Classification Using Machine Learning in Native Language, 2024 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT) (2025). DOI: 10.1109/3ict64318.2024.10824367
