
Study finds popular LLMs make content more neutral in sentiment

We analyzed 100 articles for their sentiment, or how positive or negative they were, and then had them rewritten by three Large Language Models (LLMs): OpenAI’s ChatGPT, Anthropic’s Claude2, and Meta AI’s Llama2. The new texts’ sentiment scores were then analyzed for any changes.

Jonathan Gillham


Summary of Key Findings:

  1. LLM rewrites moved sentiment scores toward the neutral middle of the scale.
  2. Sentiment scores differed by LLM, with Llama2 having the most positive scores and Claude2 having the most negative.
  3. Rewritten articles were shorter than the originals, which could partly explain why sentiment scores changed.

Study Method and Data:

  1. 100 articles from popular websites were rated by Sapien.IO’s Sentiment Analysis on a 1-5 scale for how positive, neutral, or negative each was.
  2. Three LLMs (ChatGPT, Claude2, and Llama2) each paraphrased every article, and the sentiment of each new text was rated and compared with the original article’s rating.
  3. The rewritten articles’ sentiment ratings and word counts were analyzed for any relationship between them.
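The comparison steps above can be sketched as a small pipeline. This is a minimal illustration, not the study’s code: `score` and the entries in `models` are hypothetical callables standing in for Sapien.IO’s Sentiment Analysis and the three LLMs, whose actual APIs are not described here, and word counts are assumed to be whitespace-separated tokens.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins: a scorer returns a 1-5 sentiment rating,
# a rewriter returns a paraphrased version of the text.
Scorer = Callable[[str], float]
Rewriter = Callable[[str], str]


@dataclass
class Result:
    model: str
    original_score: float
    rewrite_score: float
    original_words: int
    rewrite_words: int


def word_count(text: str) -> int:
    """Count whitespace-separated words (an assumed definition)."""
    return len(text.split())


def run_study(articles: list[str], models: dict[str, Rewriter],
              score: Scorer) -> list[Result]:
    """Score each original, paraphrase it with every model, score each rewrite."""
    results = []
    for article in articles:
        original_score = score(article)
        for name, rewrite in models.items():
            new_text = rewrite(article)
            results.append(Result(
                model=name,
                original_score=original_score,
                rewrite_score=score(new_text),
                original_words=word_count(article),
                rewrite_words=word_count(new_text),
            ))
    return results
```

Passing the scorer and rewriters in as plain functions keeps the comparison logic separate from any particular vendor API.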

Study Findings on AI Paraphrased Content

Sentiment analysis is the process of analyzing and categorizing texts as positive, neutral, or negative, and to what degree. It is often used to assess the opinions and feelings expressed in reviews or in responses to open-ended survey questions.

Many of the stories studied here became more neutral in sentiment after generative AI rewrote them. On the Sentiment Analysis scale, 1 is highly negative, 3 is neutral, and 5 is highly positive. The LLMs tended to move a story’s sentiment closer to 3, whether the original writing was more negative or more positive. In aggregate, the rewrites flattened the articles’ sentiment.

Overall, the differences were modest: the original articles’ average Sentiment Analysis score was 2.54 (slightly more negative than neutral), while the LLMs’ rewrites averaged 2.72 (Claude2), 2.95 (ChatGPT), and 3.08 (Llama2), a shift of about half a point or less. However, the differences became pronounced for articles that originally scored 1 or 5. In those cases, the rewrites’ averages moved by more than a point toward a neutral 3: when the original scored a 1, the rewrites averaged 2.35 (1.35 points higher), and when the original was a 5, the rewrites averaged 3.56 (1.44 points lower).
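The arithmetic behind these comparisons can be checked directly from the reported averages. This is a minimal sketch that assumes “closer to neutral” means a smaller absolute distance from 3; the helper names are hypothetical.

```python
NEUTRAL = 3.0  # midpoint of the 1-5 Sentiment Analysis scale


def shift(original: float, rewrite: float) -> float:
    """Signed change in average score, rounded to two decimals."""
    return round(rewrite - original, 2)


def moved_toward_neutral(original: float, rewrite: float) -> bool:
    """True if the rewrite sits closer to 3 than the original did."""
    return abs(rewrite - NEUTRAL) < abs(original - NEUTRAL)


# Averages reported in the study.
ORIGINAL_AVERAGE = 2.54
REWRITE_AVERAGES = {"Claude2": 2.72, "ChatGPT": 2.95, "Llama2": 3.08}
EXTREMES = {1.0: 2.35, 5.0: 3.56}  # original score -> average rewrite score

overall_shifts = {m: shift(ORIGINAL_AVERAGE, a) for m, a in REWRITE_AVERAGES.items()}
extreme_shifts = {o: shift(o, a) for o, a in EXTREMES.items()}

print(overall_shifts)  # {'Claude2': 0.18, 'ChatGPT': 0.41, 'Llama2': 0.54}
print(extreme_shifts)  # {1.0: 1.35, 5.0: -1.44}
print(all(moved_toward_neutral(o, a) for o, a in EXTREMES.items()))  # True
```

Llama2’s 3.08 lands slightly past neutral, yet it still counts as moving toward 3 because its distance from neutral (0.08) is smaller than the original’s (0.46).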

Fewer Words in LLM Rewrites

One possible explanation for the neutralization in sentiment is that all three LLMs shortened the articles they rewrote. Claude2 cut word count by a notable 43.5%, compared to 13.5% for ChatGPT and 15.6% for Llama2. While shortening an article can be desirable for some purposes, the reduction might eliminate details or potent phrases that signal how negative or positive a story is. Losing those details or descriptive words could account for part of the movement toward a neutral 3 for the stories with the most positive or most negative sentiment.
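A word-count reduction is a percentage change relative to the original length. The study does not say whether its figures average per-article reductions or pool total word counts, so this sketch shows both; the function names and the word counts in the example are hypothetical, not data from the study.

```python
def percent_reduction(original_words: int, rewrite_words: int) -> float:
    """Percentage of the original word count removed by a rewrite."""
    if original_words <= 0:
        raise ValueError("original word count must be positive")
    return 100.0 * (original_words - rewrite_words) / original_words


def mean_per_article_reduction(pairs: list[tuple[int, int]]) -> float:
    """Average of each article's own percentage reduction."""
    return sum(percent_reduction(o, r) for o, r in pairs) / len(pairs)


def pooled_reduction(pairs: list[tuple[int, int]]) -> float:
    """Reduction computed from total words before and after rewriting."""
    total_original = sum(o for o, _ in pairs)
    total_rewrite = sum(r for _, r in pairs)
    return percent_reduction(total_original, total_rewrite)


# Hypothetical (original, rewrite) word counts, not data from the study.
pairs = [(1000, 500), (200, 180)]
print(mean_per_article_reduction(pairs))  # 30.0
print(round(pooled_reduction(pairs), 1))  # 43.3
```

Because heavily trimmed long articles dominate the pooled figure, the two methods can give noticeably different percentages for the same set of rewrites.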

This study was small, but the data suggest a slightly positive correlation between sentiment scores and word counts, with longer texts receiving higher scores. The trend stood out most when the three LLMs were compared with each other. Across all levels of sentiment in the original articles, Claude2 consistently had both the lowest sentiment scores and the lowest word count, while Llama2 had the highest sentiment scores and the highest word count.
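A “slightly positive correlation” between sentiment score and word count is the kind of relationship a Pearson correlation coefficient measures. The study does not name the statistic it used, so treat this as an assumption: the sketch implements Pearson’s r from its definition and applies it to hypothetical values, not to the study’s data. Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance numerator and each sample's spread around its mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# Hypothetical rewrite word counts and sentiment scores (1-5 scale).
word_counts = [400, 550, 600, 800, 900]
scores = [2.0, 3.0, 2.5, 3.5, 3.0]
print(round(pearson(word_counts, scores), 2))
```

A positive r only shows that length and score tend to rise together; on its own it does not show that shortening a text is what pulls its sentiment toward neutral.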

Summary

Using LLMs to rewrite or paraphrase text can make content production faster and easier, but it comes with caveats. There may be sound reasons for coverage of a news event to carry highly negative or positive sentiment, and dampening that sentiment could keep readers from perceiving how troubling or heartening an event is. Outside of news, publishers may want a piece to convey a particular sentiment to evoke feelings in readers, and a neutral-scoring story may struggle to do so. On the other hand, there could be uses for text with more neutral sentiment that reads more like “just the facts.” Publishers may want to weigh the tone and purpose of a piece, knowing that LLMs can modify text in ways that affect those goals.

Jonathan Gillham

Founder / CEO of Originality.AI. I have been involved in the SEO and content marketing world for over a decade. My career started with a portfolio of content sites; recently, I sold two content marketing agencies, and I am the co-founder of MotionInvest.com, the leading place to buy and sell content websites. Through these experiences, I understand what web publishers need when it comes to verifying that content is original. I am not for or against AI content; I think it has a place in everyone’s content strategy. However, I believe you as the publisher should be the one deciding when to use AI content. Our Originality checking tool has been built with serious web publishers in mind!
