How LLMs Distort Our Written Language: 5 Key Insights from the 2026 Study

According to a 2026 study, LLMs can significantly change the tone, style, and creativity of writing (even when prompted to focus on grammar). Learn the key findings and implications from the study here.

In a March 2026 arXiv paper titled How LLMs Distort Our Written Language, researchers from Google DeepMind, UC Berkeley, the University of Washington, and other institutions explored how large language models (LLMs) such as ChatGPT, Claude, and Gemini can affect the meaning of human writing.

Their overall finding? LLMs can significantly change what we write.

Here’s a closer look at what they found.

Key Findings (TL;DR)

  • Extensive LLM use made argumentative essays more neutral.
  • Heavy LLM users found their essays to be less creative and less in their own voice.
  • Even when asked to make only minimal or grammar edits, LLMs frequently changed the meaning or overall conclusion of an essay.
  • LLMs often made writing more formal (less personal), emotional, and analytical than human writing.
  • AI-written peer reviews not only gave higher scores to papers on average, but they also emphasized different strengths and weaknesses than humans did.

Study Overview

In this study, researchers used three datasets:

  1. Human study
  2. ArgRewrite-v2
  3. Peer reviews

Let’s take a closer look at each dataset below.

1. Human study to investigate how people interact with LLMs

First, the researchers examined how people use LLMs when writing argumentative essays. 

They recruited 100 native English-speaking participants in the US and asked them to write an argumentative essay that answered the question, “Does money lead to happiness?” 

They divided the 100 participants into two groups:

  • A control group of 45 that couldn’t use AI for their essay
  • An AI-assisted group of 55 that could use an LLM (gpt-4o-mini) while writing

Note: The researchers didn’t want to force a specific workflow. So, before beginning the study, they created two additional categories for participants: LLM-influenced, for participants who self-reported that 40% or less of their essay text was AI-generated (and whose transcripts supported that claim), and LLM, for participants who relied on AI more heavily.

That distinction reflects something important about AI-assisted writing: it doesn’t necessarily look the same for everyone.
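
That grouping rule can be sketched in a few lines. The 40% threshold comes from the study’s description; the function name and exact label strings below are my own illustration, not the researchers’ code:

```python
def usage_group(ai_generated_share: float) -> str:
    """Classify a participant in the AI-assisted condition by the
    self-reported share of their essay text that the LLM generated.
    Threshold follows the study's description; labels are illustrative."""
    return "LLM-influenced" if ai_generated_share <= 0.40 else "LLM"

print(usage_group(0.25))  # light reliance -> "LLM-influenced"
print(usage_group(0.80))  # heavy reliance -> "LLM"
```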

2. ArgRewrite-v2 to compare LLM vs. human-written edits

Next, the researchers compared how LLMs and humans edited the same essays.

They started with a dataset of 86 argumentative essays written by university students in 2021 (the dataset was released in 2022, before the launch of ChatGPT).

For the original human-written dataset, each participant:

  1. Wrote the initial essay arguing for or against self-driving cars (D1)
  2. Received expert human feedback about how to improve the essay
  3. Revised the initial draft to create the second draft (D2)

The researchers then built their own dataset of LLM-generated (gpt-5-mini, gemini-2.5-flash, and claude-haiku) D2 drafts using D1 essays.

The researchers also prompted the LLMs to perform different revisions:

  1. General revisions to improve the essay as a whole
  2. Minimal edits for essential corrections
  3. Grammar edits
  4. Completion revisions for unfinished essays
  5. Expansion revisions to expand on the text’s original ideas

This setup allowed the researchers to compare LLM-revised essays to human revisions.
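
The study used its own metrics for this comparison; as a minimal illustration of the underlying idea, a word-level similarity ratio can show how much of a draft survives a revision. All example strings below are made up:

```python
from difflib import SequenceMatcher

def revision_similarity(draft1: str, draft2: str) -> float:
    """Return a 0-1 similarity ratio between two drafts,
    computed over word sequences (1.0 = identical)."""
    return SequenceMatcher(None, draft1.split(), draft2.split()).ratio()

# A light grammar pass keeps most of the original; a heavy rewrite does not.
original = "self driving cars will definately reshape our citys"
grammar_fix = "self driving cars will definitely reshape our cities"
rewrite = "autonomous vehicles are poised to transform urban life"

print(revision_similarity(original, grammar_fix))  # high (most words kept)
print(revision_similarity(original, rewrite))      # low (little overlap)
```

A score near 1.0 indicates a light edit, while a score near 0.0 indicates the kind of wholesale rewording the study observed in LLM revisions.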

3. Peer reviews to assess LLM use “in the wild” 

To better understand LLM use in a real-world setting, the researchers initially looked at 75,000 peer reviews from the International Conference on Learning Representations (ICLR) 2026. 

Since the ICLR review process doesn’t allow AI, reviewers would likely write original, high-quality reviews, or, as the study acknowledged, at least hide any LLM use as much as possible.

Of the 75,000 reviews: 

  • 21% were probably generated with LLMs
  • 39% likely had LLMs edit or write sections of the text

The researchers analyzed 18,000 of the ICLR 2026 peer reviews across 9,000 papers using an LLM-as-a-judge classifier to compare the scores and strength/weakness evaluations of AI reviews vs. human reviews.

The researchers made a point of using only papers with one review entirely written by humans and one LLM-generated review to help control for potential bias.
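
That paired design can be sketched with made-up numbers (these are illustrative scores, not the study’s data): for each paper, take one fully human review and one LLM-generated review, then compare scores within pairs:

```python
from statistics import mean

# Hypothetical per-paper score pairs: (human_review_score, llm_review_score).
# Real ICLR scores run roughly 1-10; these values are illustrative only.
pairs = [(5, 6), (6, 6), (4, 5), (7, 8), (6, 7)]

human_mean = mean(h for h, _ in pairs)
llm_mean = mean(l for _, l in pairs)
mean_gap = mean(l - h for h, l in pairs)  # positive = LLM scores higher

print(f"human mean: {human_mean:.2f}, LLM mean: {llm_mean:.2f}, gap: {mean_gap:+.2f}")
```

Pairing reviews per paper keeps paper quality constant, so any systematic gap reflects the reviewer type rather than the papers being reviewed.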

Finding #1: Extensive LLM Use Made Writing More Neutral

Heavy LLM users showed a much higher proportion of neutral essays than human and LLM-influenced users - source

Participants who heavily relied on LLMs were most likely to produce neutral essays.

Essays produced by heavy LLM users were much less likely to take a strong position for or against the topic than those written entirely by humans. 

Further, the researchers found that even when an LLM was prompted to make only minimal edits, such as grammar fixes (using the second dataset), it often ended up changing the conclusion or argumentative claim of the essay (more on that below in finding #3).

Finding #2: Heavy LLM Users Felt the Writing Was Less Creative and Less Like Them

Heavy LLM users reported their essays were less creative and less in their own voice than those in the human and LLM-influenced groups

However, they still reported comparable (or higher) satisfaction with the final essay.

This suggests that people recognize extensive LLM use can result in a loss of voice and creativity in their writing, yet they still find AI support beneficial for navigating writing struggles, such as organization.

Finding #3: LLMs Changed the Meaning (Even During Grammar Edits)

With the ArgRewrite-v2 dataset, researchers noticed a significant difference between human and LLM revisions.

While humans tend to make small shifts while editing, LLMs tend to change the essay’s meaning.

“The uniformity of these shifts across different LLMs also suggests a convergence toward LLM-preferred linguistic patterns… that may not reflect the original intent or voice of human writers.” - Study

So, LLMs were actually shifting the diverse human writing styles of the essays and pushing them toward a “shared semantic mode.” As a result, the pieces were less reflective of the writer’s voice.

This happened even when the LLM was prompted to make only grammar edits.

Finding #4: LLMs Changed Vocabulary, Style, and Tone

LLMs used fewer pronouns, resulting in more formal writing - Image Source

Perhaps unsurprisingly, LLM edits didn’t just change the meaning of essays. They also changed how the essays were written at multiple levels.

The researchers found that LLMs:

  • Made more extensive vocabulary changes than humans when revising essays. LLMs replaced more of the original wording than humans, altering the individual voice and style in the writing overall.
  • Shifted writing toward a less personal, more formal style. LLM-edited essays tended to use fewer pronouns and more nouns, making them seem less personal.
  • Increased emotional language in revised essays. LLM edits increased overall emotional wording and even shifted to positive language when the original essay may have been more critical.
    • “For essays about self-driving cars (the ArgRewrite-v2 topic), this could mean downplaying concerns about safety, job displacement, or ethical issues in favor of enthusiasm about technological progress.” - Study
  • Produced more analytical, statistical writing. Essays from the heavy LLM user group were also generally more analytical than human-written essays, which incorporated personal experience.
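
The first bullet, more extensive vocabulary replacement, can be illustrated with a simple retention metric (my own sketch, not the study’s measure): the fraction of the original draft’s words that survive into the revision.

```python
def vocabulary_retention(original: str, revised: str) -> float:
    """Fraction of the original draft's unique words that still
    appear in the revised draft (case-insensitive)."""
    orig_vocab = set(original.lower().split())
    revised_vocab = set(revised.lower().split())
    if not orig_vocab:
        return 1.0
    return len(orig_vocab & revised_vocab) / len(orig_vocab)

# Hypothetical drafts: a heavy rewrite keeps almost none of the writer's words.
draft = "i think money helps but it never buys real happiness"
heavy_rewrite = "financial resources contribute to wellbeing yet cannot purchase genuine contentment"
print(vocabulary_retention(draft, heavy_rewrite))  # low retention
```

Low retention means the reviser swapped out the writer’s own word choices, which is exactly the voice-flattening effect the study describes.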

These findings suggest that LLM use changed both what essays said and how they said it. Often, it made the writing more standard and less reflective of individual voice or experience.

Finding #5: AI Peer Reviews May Impact Science Institutions

AI peer reviews were more likely to comment on the scalability and reproducibility of research, while human reviews were more likely to discuss clarity and relevance. - Image Source

With the ICLR 2026 dataset, the researchers found that LLM use in peer reviews can shift the final decisions, arguments, and even scores.

Not only do LLMs tend to give slightly higher scores than humans (“LLMs assign scores 10% higher than humans”), but their reviews also tend to comment on different strengths and weaknesses in scientific papers. 

While humans are more likely to comment on the relevance of the research and the clarity of the paper, LLMs are much more likely to mention scalability and reproducibility as strengths and weaknesses.
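
A crude sketch of how such a content shift could be surfaced is keyword counting over review text. The study used an LLM-as-a-judge classifier rather than this approach, and the review snippets below are invented:

```python
from collections import Counter

# Theme -> word stem to search for ("reproducib" matches both
# "reproducible" and "reproducibility"; "relevan" matches "relevant"/"relevance").
THEMES = {
    "scalability": "scalability",
    "reproducibility": "reproducib",
    "clarity": "clarity",
    "relevance": "relevan",
}

def theme_counts(reviews: list) -> Counter:
    """Count how many reviews mention each theme at least once."""
    counts = Counter()
    for review in reviews:
        text = review.lower()
        for theme, stem in THEMES.items():
            if stem in text:
                counts[theme] += 1
    return counts

# Hypothetical snippets, illustrative only.
llm_reviews = ["The method's scalability is a strength, but reproducibility is a concern."]
human_reviews = ["The paper is relevant, though the clarity of Section 3 suffers."]

print(theme_counts(llm_reviews))
print(theme_counts(human_reviews))
```

Comparing the two counters over many reviews would reveal the kind of divergence the study reports: AI reviews cluster on scalability and reproducibility, human reviews on clarity and relevance.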

Why These Findings Matter

The biggest takeaway from this study is that heavy LLM use may not just change how we write, but also what our writing says. 

That has implications for just about anyone who may choose to use AI to help with their written work, including writers, marketers, educators, students, and researchers.

  • Writers need to be aware that LLMs can do more than just polish prose; they can decrease creativity in writing and change voice/tone.
  • Marketers already know brand voice is tied to positioning, trust, and the unique way your company presents its point of view. When LLMs standardize wording, it can take away from what makes your brand sound specific and memorable.
  • For teachers and students, if heavy LLM use can shift stance, reduce originality, and make writing feel less personal, it can significantly change both the essays themselves and the process of learning critical writing and argumentation skills.
  • For researchers, AI-assisted evaluation isn’t always neutral; an LLM’s higher scores and different evaluation criteria could affect how research is judged.

Final Thoughts

Sure, AI can make writing easier and faster; however, if that convenience comes at the cost of more standardized language, softer or shifted stances on issues, and less individual voice and creativity, then the tradeoffs may not be something that every writer is comfortable with.

Maintain transparency in what you’re writing with Originality.ai’s AI Checker.

Then, try out Deep Scan to find out how you can ethically make writing sound more human. 

Sherice Jacob

Sherice Jacob is a seasoned copywriter and content professional fluent in English, Spanish, and Catalan, with over 25 years of experience crafting high-converting copy. Passionate about AI, she enjoys exploring the new innovations and possibilities it brings to the world of content creation.

