Stochastic parrot? New study suggests ChatGPT plagiarizes beyond just "copy" and "paste"

If you're a student, you may want to think twice before using ChatGPT.

Mihai Andrei
March 17, 2023 @ 1:20 pm

In the few months since ChatGPT was introduced publicly, it’s taken the world by storm. It can produce all sorts of text-based content, and has even passed exams that are challenging for humans. Naturally, students have started taking notice. You can use ChatGPT to help you with essays and all sorts of homework and assignments, especially since the content it outputs isn’t plagiarized. Or is it?

According to a new study, language models like ChatGPT can plagiarize on multiple levels. Even when they don’t copy text verbatim from other sources, they can rephrase or paraphrase ideas without changing the meaning at all, which is still not acceptable.

“Plagiarism comes in different flavors,” said Dongwon Lee, professor of information sciences and technology at Penn State and co-author of the new study. “We wanted to see if language models not only copy and paste but resort to more sophisticated forms of plagiarism without realizing it.” Lo and behold, they did.

Image credits: Nick Morrison.

Being a university student nowadays can be pretty challenging. Plenty of things have changed since the pandemic lockdowns: universities face staff shortages and mental health problems, and there’s much more online work to do, which can be challenging in multiple ways. In addition to technical hurdles, like needing to own a laptop or computer with a stable enough internet connection, students have had to develop a complementary set of skills, particularly in terms of computer literacy. More and more, you need to know how to manage the online course management system, navigate through lectures and recordings, and edit and submit assignments and essays strictly digitally. A few years ago, you may have gotten away without using things such as Google Drive or a PDF editor, but nowadays that just doesn’t fly.

Understandably, students jumped at the opportunity of having an AI assistant do the work for them. At first glance, it seems safe to do because despite being trained on existing data, the AI produces new text which cannot be accused of plagiarism. Or so it would seem.

Lee and colleagues focused on identifying three forms of plagiarism:

  • verbatim, or directly copying and pasting content;
  • paraphrase, or rewording and restructuring content without citing the original source;
  • idea, or using the main idea from a text without proper attribution.

All these are, in essence, plagiarism.
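
To make the distinction concrete, here is a minimal Python sketch of how the two surface-level forms, direct copying and close rewording, might be flagged. It is not the pipeline used in the study. The function names and the thresholds `min_run` and `min_overlap` are hypothetical, chosen only for illustration. Reuse that shares no wording with the source is out of reach of simple word matching like this.

```python
import re


def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def longest_shared_run(generated, source):
    """Length, in words, of the longest word sequence found in both texts."""
    a, b = tokens(generated), tokens(source)
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous word pair.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def word_overlap(generated, source):
    """Jaccard similarity between the two texts' vocabularies (0.0 to 1.0)."""
    a, b = set(tokens(generated)), set(tokens(source))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(generated, source, min_run=6, min_overlap=0.5):
    """Rough label; both thresholds are illustrative assumptions."""
    if longest_shared_run(generated, source) >= min_run:
        return "verbatim"
    if word_overlap(generated, source) >= min_overlap:
        return "possible paraphrase"
    return "no match"


source = "The quick brown fox jumps over the lazy dog near the river bank"
copied = "He wrote that the quick brown fox jumps over the lazy dog yesterday"
reworded = "The brown fox, quick and near the river bank, jumps over the dog"

print(classify(copied, source))
print(classify(reworded, source))
print(classify("Cats sleep all day", source))
```

With these sample sentences, the copied text shares a nine-word run with the source and is labeled verbatim, while the reworded one shares most of its vocabulary but no long run, so it is labeled a possible paraphrase. A real detector would need far more than word overlap, since unrelated texts on the same topic also share vocabulary.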

Because the researchers couldn’t construct a pipeline for ChatGPT, they worked with GPT-2, a previous iteration of the language model. They used 210,000 generated texts to test for plagiarism “in pre-trained language models and fine-tuned language models, or models trained further to focus on specific topic areas.” Overall, the team found that the AI engages in all three forms of plagiarism, and the larger the dataset the model was trained on, the more often the plagiarism occurred. This suggests that larger models would be even more predisposed to it.

“People pursue large language models because the larger the model gets, generation abilities increase,” said lead author Jooyoung Lee, doctoral student in the College of Information Sciences and Technology at Penn State. “At the same time, they are jeopardizing the originality and creativity of the content within the training corpus. This is an important finding.”

It’s not the first time something like this has been suggested. A paper that came out just over a year ago, and has already been cited over 1,300 times, claims that this type of AI is a “stochastic parrot”: it simply parrots existing information without truly producing anything new.

It’s still early days for this type of technology, and much more research is required to understand problems such as this one, but companies seem eager to release it into the wild before this kind of issue can be understood. According to the study authors, the findings highlight the need for further work on the ethical conundrums that text generators pose.

“Even though the output may be appealing, and language models may be fun to use and seem productive for certain tasks, it doesn’t mean they are practical,” said Thai Le, assistant professor of computer and information science at the University of Mississippi who began working on the project as a doctoral candidate at Penn State. “In practice, we need to take care of the ethical and copyright issues that text generators pose.”

In the meantime, AI text generators are set to trigger an arms race. Makers of plagiarism detectors are all over this: being able to detect ChatGPT shenanigans (or shenanigans from any generative AI) is valuable for ensuring academic integrity. But whether they will actually succeed remains to be seen. For now, current tools don’t seem to do a good enough job.

Meanwhile, university students, and others, will continue to use ChatGPT for their assignments if they can get away with it. A new dawn of plagiarism may be upon us, and it’s not so easy to tackle.

The researchers will present their findings at the 2023 ACM Web Conference, which takes place April 30-May 4 in Austin, Texas.
