Does AI Translation Impact AI Detection? Originality.ai Study

Does AI translation impact AI detection? Get insight into whether translated content is still detectable by the Originality.ai AI detector!

Many writers, students, publishers, and SEO teams wonder whether translating AI-generated text makes it harder for AI detectors to identify, or whether translating human-written content makes it look more AI-generated.

Here at Originality.ai, we took a dataset of 498 human-written and 498 GPT-4o-written samples, all originally created in English, and translated them into Spanish and Portuguese. We then translated each version back into English (a round trip).

Afterward, we analyzed each iteration of the samples with Originality.ai AI detection to measure the false positive rate (human-written text incorrectly detected as AI) and the true positive rate (GPT-4o-written text correctly detected as AI).

Quick Study Overview (TL;DR)

For context, prior research has studied multilingual AI detection and back-translation as a detector bypass method. This study focuses on a practical localization workflow:

  1. English content translated into Spanish and Portuguese with Google Translate
  2. Then translated back into English

Top questions answered by this study:

  • Does translating AI-generated content make it harder to detect?
  • Does round-trip translation back into English bypass AI detection?
  • Does translating human-written content increase the rate of false positives?

Key Study Findings:

  • Finding 1 - Human-Written Content & Direct Translation: After translation, false positives remained low. 
    • The original English samples had a false positive rate of 0.40%. 
    • After direct translation, the false positive rate remained low: 2.21% for Spanish and 0.40% for Portuguese. 
  • Finding 2 - Human-Written Content & Round-trip Translation: Translation adds some noise to human text. 
    • When human-written English samples were translated into Spanish or Portuguese and then translated back into English, the false positive rate increased to 28.02% for the Spanish round-trip workflow and 27.84% for the Portuguese round-trip workflow. 
    • This suggests that machine translation introduced wording changes, sentence restructuring, or stylistic smoothing.
  • Finding 3 - AI-Written Content & Direct/Round-trip Translation: Translation did not reduce AI detection accuracy.
    • The original English samples (GPT-4o) were identified as AI-written at a 100% true positive rate, and the translated Spanish and Portuguese versions were also identified at a 100% true positive rate. The same was true after round-trip translation back into English.

Why This Matters

Translation is a critical part of many business workflows. It helps with website localization, SEO content expansion, publisher/editorial review of translated submissions, and more. 

All of these scenarios share a goal: using automated translation to reach a large multilingual audience in languages the content owners are not proficient in.

At the same time, the use of AI content raises questions of its own. Content teams, educators, and publishers need to know whether:

  • Translated human-written content may be incorrectly flagged as AI-generated
  • Translation can be used to disguise AI-written content (and potentially bypass detection)

Methodology

The English human-written and GPT-4o-written content came from a portion of the samples in an existing dataset, published with a study called A Comprehensive Dataset for Human vs. AI Generated Text Detection.

Of the original dataset, 498 samples of human-written content and 498 samples of GPT-4o-written content were collected and then processed through our translation/AI scan pipeline.

These initial datasets were translated using Google Sheets’ Google Translate feature, from English to Portuguese and Spanish, and then back to English. 
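
As a rough illustration of the translation steps described above, the sketch below builds the five text conditions (original English, two direct translations, two round trips) from a list of samples. The study itself used Google Sheets' Google Translate feature; the `translate` callable, the `build_conditions` helper, and the condition names here are assumptions made for illustration, not the study's actual pipeline.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: translate(text, source_lang, target_lang) -> str.
# Any real translation tool would need to be wrapped to fit this shape.
Translator = Callable[[str, str, str], str]


def build_conditions(
    samples: Sequence[str],
    translate: Translator,
    targets: Sequence[str] = ("es", "pt"),
) -> Dict[str, List[str]]:
    """Return the original, direct-translation, and round-trip versions of samples."""
    conditions: Dict[str, List[str]] = {"en_original": list(samples)}
    for lang in targets:
        # Step 1: English -> target language (direct translation).
        direct = [translate(text, "en", lang) for text in samples]
        conditions[f"{lang}_direct"] = direct
        # Step 2: target language -> English (round trip).
        conditions[f"{lang}_roundtrip_en"] = [
            translate(text, lang, "en") for text in direct
        ]
    return conditions


def fake_translate(text: str, source: str, target: str) -> str:
    """Stand-in for demonstration only: tags the text instead of translating it."""
    return f"[{source}->{target}] {text}"


conditions = build_conditions(["Hello world."], fake_translate)
print(sorted(conditions))
```

Passing the translator in as a function keeps the sketch independent of any particular translation service, so the same steps could be repeated with another tool or another target language.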

An AI detection rate was then calculated for each iteration of the dataset.
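
A minimal sketch of that calculation, assuming each scan has already been reduced to a yes/no "flagged as AI" decision: the detection rate is the share of flagged samples. On human-written samples that share is the false positive rate; on GPT-4o-written samples it is the true positive rate. The function name and the example flags are hypothetical and are not the study's data.

```python
from typing import Sequence


def detection_rate(flagged_as_ai: Sequence[bool]) -> float:
    """Percentage of samples in one condition that the detector flagged as AI."""
    if not flagged_as_ai:
        raise ValueError("need at least one sample")
    return 100.0 * sum(flagged_as_ai) / len(flagged_as_ai)


# Hypothetical flags for illustration only (not the study's results).
human_flags = [True] * 3 + [False] * 197   # human-written samples
ai_flags = [True] * 200                    # AI-written samples

false_positive_rate = detection_rate(human_flags)  # human text flagged as AI
true_positive_rate = detection_rate(ai_flags)      # AI text flagged as AI

print(f"False positive rate: {false_positive_rate:.2f}%")
print(f"True positive rate: {true_positive_rate:.2f}%")
```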

English samples were scanned for AI using our Lite 1.0.2 model, and non-English results were scanned for AI using our multi-language model.
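
The routing rule above can be sketched as a small lookup. The model identifier strings and the `pick_model` helper are assumptions for illustration; the article names the models (Lite 1.0.2 and the multi-language model) but does not give API identifiers.

```python
# Hypothetical identifiers; the article does not specify how models are selected in code.
ENGLISH_MODEL = "lite-1.0.2"
MULTI_LANGUAGE_MODEL = "multi-language"


def pick_model(language_code: str) -> str:
    """Choose a detector model from a language code such as 'en', 'es', or 'pt-BR'."""
    primary = language_code.lower().split("-")[0]
    return ENGLISH_MODEL if primary == "en" else MULTI_LANGUAGE_MODEL


for code in ("en", "es", "pt"):
    print(code, "->", pick_model(code))
```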

The Results

The results showed two clear patterns. 

  • First, translating GPT-4o-written content did not reduce AI detection performance in this test. 
  • Second, human-written content generally remained identifiable as human after direct translation into Spanish and Portuguese, but round-trip translation back into English increased the false positive rate.

Human-written content results

For human-written content, the original English samples had a false positive rate of 0.40%.

After direct translation, the false positive rate remained low: 2.21% for Spanish and 0.40% for Portuguese. This suggests that direct translation into these languages did not, on its own, cause a major increase in human-written content being misclassified as AI-generated.

The more noticeable change happened after round-trip translation. 

When the human-written English samples were translated into Spanish or Portuguese and then translated back into English, the false positive rate increased to 28.02% for the Spanish round-trip workflow and 27.84% for the Portuguese round-trip workflow. This suggests that repeated automated translation can introduce wording changes, sentence restructuring, or stylistic smoothing that makes some human-written text appear more machine-like to an AI detector.

This distinction is important. A direct translation workflow, such as translating an English website page into Spanish or Portuguese, is different from a round-trip workflow where the text is translated and then translated back into English. 

In this study, the direct translation workflow had a low false positive rate, while the round-trip workflow added more noise to the text.
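
To make the size of the change concrete, the short sketch below takes the false positive rates reported in this section and expresses each one as a percentage-point difference from the original English baseline. The variable names are illustrative. Note that the direct-translation scans used the multi-language model while the English scans used Lite 1.0.2, so the direct-translation differences are not a like-for-like model comparison.

```python
BASELINE_FPR = 0.40  # original English human-written samples, in percent

# False positive rates reported in this study, in percent.
reported_fpr = {
    "Spanish (direct)": 2.21,
    "Portuguese (direct)": 0.40,
    "Spanish round-trip to English": 28.02,
    "Portuguese round-trip to English": 27.84,
}

# Difference from the baseline, in percentage points.
deltas = {
    name: round(rate - BASELINE_FPR, 2) for name, rate in reported_fpr.items()
}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f} percentage points vs. original English")
```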

GPT-4o-written content results

For GPT-4o-written content, the results were consistent across every tested translation path.

The original English samples were identified as AI-written at a 100% true positive rate, and the translated Spanish and Portuguese versions were also identified at a 100% true positive rate.

The same was true after the samples were translated back into English. In other words, translation did NOT function as an effective bypass method for GPT-4o-written content in this study.

Study Findings: Summarized

The charts below summarize these findings. 

Human-written false positives remained low after direct translation but increased after round-trip translation.

By contrast, GPT-4o-written true positives remained consistent across the original English, direct translation, and round-trip English conditions.

Overall, the results suggest that AI translation and AI generation should be interpreted differently. 

Translating human-written content can affect AI detection scores, especially after multiple automated translation steps, but direct translation into Spanish and Portuguese remained low-risk in this test. 

At the same time, translating GPT-4o-written content did not make it harder for Originality.ai’s models to identify it as AI-written.

Considerations for Users

The biggest takeaway from this study is that translation did not function as an effective AI detection bypass for GPT-4o-written content. 

Across the tested Spanish and Portuguese workflows, AI-written samples remained consistently flagged as AI-written, including after direct translation and round-trip translation back into English.

For companies using translation as part of normal content workflows, the results are also encouraging. 

  • Human-written content that was translated directly into Spanish or Portuguese generally remained identifiable as human-written. 
  • This suggests that website localization, SEO content expansion, and other translation workflows can be used without automatically assuming that translated human content will be misclassified as AI.

However, the round-trip results are important. 

  • When human-written English content was translated into Spanish or Portuguese and then translated back into English, the false positive rate increased. 
  • This suggests that repeated automated translation can introduce wording changes, sentence smoothing, or other translation artifacts that make the text appear more machine-like.

The practical distinction is this: AI translation is not the same as AI authorship. 

A human-written article translated into another language has a different content history than an article generated from scratch by an AI model. AI detection scores on translated content should be interpreted with that context in mind, especially when the content has gone through multiple rounds of automated translation.

For publishers, educators, businesses, and content teams, the best approach is to treat AI detection as one signal in a broader review process. 

Directly translated human content should not be assumed to be AI-generated simply because a translation tool was used. At the same time, translation should not be assumed to hide AI-written content, as the GPT-4o-written samples in this study remained detectable throughout the tested translation workflows.

Final Thoughts

Our study found that translating GPT-4o-written content into Spanish or Portuguese did not reduce AI detection performance. AI-written samples remained consistently flagged as AI-written after direct translation and after round-trip translation back into English.

For human-written content, the initial translation into Spanish and Portuguese produced low false positive rates. 

However, round-trip translation back into English increased false positives, suggesting that repeated automated translation can add noise or machine-like phrases to otherwise human-written text.

The main takeaway is that AI translation and AI authorship should be treated differently. 

Translating human-written content can change the style of the text, especially after multiple translation steps, but direct translation is not the same as generating new content with AI. 

For content teams, publishers, educators, and businesses, translated content should be reviewed with context, while AI-written content should not be expected to bypass detection simply because it has been translated. 

Additional Reading: Prior Research

Previous research has already looked at the relationship between translation and AI detection, especially in the context of bypassing detectors. 

ESPERANTO Study

One paper, ESPERANTO, tested whether back-translation could make AI-generated text harder to detect. 

Back-translation means translating text into another language, or multiple languages, and then translating it back into the original language. 

The researchers found that this type of translation-based rewriting could reduce detection performance for several detectors tested at the time. However, those results should be read in context: the study did not test Originality.ai’s current models, and detector performance can change significantly as models are updated.

Further, when our research team ran an extension of the study on Originality.ai’s models, the AI detector continued to show robust and resilient performance.

MULTITuDE Study

Other research has focused on the broader challenge of multilingual AI detection. The MULTITuDE benchmark was created to test AI-generated text detection across multiple languages, including Spanish and Portuguese. 

Its main relevance for this study is that AI detection should not be assumed to work the same way in every language. 

Multilingual detection needs to be tested directly, especially as both generators and detectors continue to improve. Read more about Originality.ai’s multilingual accuracy.

HERO Study

A related area of research also separates fully AI-generated text from other types of machine-assisted writing, such as machine-polished or machine-translated text.

For example, the HERO paper approached machine-translated human writing as a different category than text generated from scratch by an AI model. 

That distinction matters for publishers, educators, and businesses reviewing translated content. 

A human-written article translated with a tool like Google Translate has a different content history than an article created directly by GPT-4o.

Considerations From Previous Studies in the Context of This Study

Taken together, previous research suggests that translation can affect AI detection results, but the impact depends on the detector, language, translation process, and model being tested. 

In contrast to the risks identified in earlier research, this Originality.ai study found consistent results across the tested translation workflows: 

  • GPT-4o-written content remained flagged as AI-written after translation. 
  • Human-written content generally remained identifiable as human during the initial translation into Spanish and Portuguese. 
  • The initial direct-translation false positive rate remained low for Spanish and Portuguese. 
  • Human-written samples did receive higher AI scores after additional translation iterations, especially after round-trip translation.

Study Limitations

This study focused specifically on English content translated into Spanish and Portuguese using Google Translate. Results may differ with other languages, other translation tools, or more complex translation workflows.

The AI-written samples in this study were generated with GPT-4o. Other AI models may produce different writing patterns, and future models may behave differently as generation quality continues to improve.

The study also used different Originality.ai models depending on the language being scanned. English samples were scanned with Lite 1.0.2, while Spanish and Portuguese samples were scanned with the multi-language model. This reflects a practical real-world usage pattern, but it also means the English and non-English scans are not a one-to-one comparison of the same detector model.

Another limitation is that the dataset was based on a specific type of written content. Results may vary for short-form content, student essays, technical documentation, marketing copy, product reviews, or highly edited publisher content.

Finally, round-trip translation is not the same as a typical localization workflow. Most businesses translating website content from English into Spanish or Portuguese would not translate that content back into English before publishing. For that reason, the round-trip results are most useful for understanding how repeated automated translation can change text, rather than as a direct reflection of normal website translation.

Jonathan Gillham

Founder / CEO of Originality.ai. I have been involved in the SEO and content marketing world for over a decade. My career started with a portfolio of content sites. Recently I sold two content marketing agencies, and I am the co-founder of MotionInvest.com, the leading place to buy and sell content websites. Through these experiences I understand what web publishers need when it comes to verifying that content is original. I am not for or against AI content; I think it has a place in everyone's content strategy. However, I believe you, as the publisher, should be the one deciding when to use AI content. Our originality checking tool has been built with serious web publishers in mind!
