Stochastic parrot? New study suggests ChatGPT plagiarizes beyond just "copy" and "paste"

In the few months since ChatGPT was introduced publicly, it’s taken the world by storm. It can produce all sorts of text-based content, and has even passed exams that are challenging for humans. Naturally, students have taken notice. You can use ChatGPT to help you with essays and all sorts of homework and assignments, especially since the content it outputs isn’t plagiarized. Or is it?

According to a new study, language models like ChatGPT can plagiarize on multiple levels. Even if they don’t always copy text verbatim from other sources, they can rephrase or paraphrase ideas without changing the meaning at all, which still counts as plagiarism.

“Plagiarism comes in different flavors,” said Dongwon Lee, professor of information sciences and technology at Penn State and co-author of the new study. “We wanted to see if language models not only copy and paste but resort to more sophisticated forms of plagiarism without realizing it.” Lo and behold, they do.

Being a university student nowadays can be pretty challenging. Plenty has changed since the pandemic lockdowns: universities face staff shortages and mental health problems, and there’s much more online work to do, which can be challenging in multiple ways. In addition to technical hurdles, like needing to own a laptop or computer with a stable enough internet connection, students have had to develop a complementary set of skills, particularly in terms of computer literacy. More and more, you need to know how to manage the online course management system, navigate through lectures and recordings, and edit and submit assignments and essays strictly digitally. A few years ago, you may have gotten away without using things such as Google Drive or a PDF editor, but nowadays that just doesn’t fly.

Understandably, students jumped at the opportunity of having an AI assistant do the work for them. At first glance, it seems safe to do because despite being trained on existing data, the AI produces new text which cannot be accused of plagiarism. Or so it would seem.

Lee and colleagues focused on identifying three forms of plagiarism:

verbatim, or direct copying;
paraphrase, or rewording and restructuring content without citing the original source;
idea, or using the main idea of a text without proper attribution.

All these are, in essence, plagiarism.

Because the researchers couldn’t construct a pipeline for ChatGPT, they worked with GPT-2, a previous iteration of the language model. They used 210,000 generated texts to test for plagiarism “in pre-trained language models and fine-tuned language models, or models trained further to focus on specific topic areas.” Overall, the team found that the AI engages in all three forms of plagiarism, and the larger the dataset the model was trained on, the more often the plagiarism occurred. This suggests that larger models would be even more predisposed to it.
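To make the simplest of the three forms concrete, here is a minimal Python sketch of how verbatim copying could be flagged by looking for word sequences that two texts share. This is not the researchers' pipeline: the function names (`word_ngrams`, `shared_ngrams`), the five-word window, and the sample sentences are all assumptions chosen purely for illustration, and catching paraphrase or idea plagiarism would take far more than this.

```python
import re


def word_ngrams(text, n=5):
    """Return the set of n-word sequences in text (lowercased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngrams(generated, source, n=5):
    """Return the n-word sequences found in both texts (hypothetical helper)."""
    return word_ngrams(generated, n) & word_ngrams(source, n)


# Made-up example texts, not data from the study.
source = "The quick brown fox jumps over the lazy dog near the river bank."
generated = "Witnesses said the quick brown fox jumps over a fence."

overlap = shared_ngrams(generated, source)
for gram in sorted(overlap):
    print(" ".join(gram))
```

Any sequence printed here is a run of five consecutive words that appears in both texts. A shorter window would flag common phrases that prove nothing, while a longer one would miss brief copied fragments, so the window size is a trade-off rather than a fixed rule.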

“People pursue large language models because the larger the model gets, generation abilities increase,” said lead author Jooyoung Lee, doctoral student in the College of Information Sciences and Technology at Penn State. “At the same time, they are jeopardizing the originality and creativity of the content within the training corpus. This is an important finding.”

It’s not the first time something like this has been suggested. A paper that came out just over a year ago and has already been cited over 1,300 times claims that this type of AI is a “stochastic parrot”: it simply parrots existing information, without truly producing anything new.

It’s still early days for this type of technology, and much more research is required to understand problems such as this one, but companies seem eager to release it into the wild before such issues are understood. According to the study authors, their findings highlight the need for a closer look at the ethical conundrums that text generators pose.

“Even though the output may be appealing, and language models may be fun to use and seem productive for certain tasks, it doesn’t mean they are practical,” said Thai Le, assistant professor of computer and information science at the University of Mississippi who began working on the project as a doctoral candidate at Penn State. “In practice, we need to take care of the ethical and copyright issues that text generators pose.”

In the meantime, AI text generators are set to trigger an arms race. Plagiarism detectors are all over this — being able to detect ChatGPT shenanigans (or shenanigans from any generative AI) is valuable to ensure academic integrity. But whether or not they will actually succeed remains to be seen. For now, current tools don’t seem to do a good enough job.

Meanwhile, university students (and not only students) will continue to use ChatGPT for their assignments if they can get away with it. A new dawn of plagiarism may be upon us, and it’s not so easy to tackle.

The researchers will present their findings at the 2023 ACM Web Conference, which takes place April 30-May 4 in Austin, Texas.
