Translation Memory Data Cleaning

Leveraging the power of your translation memory (TM) in new projects requires a clean dataset. The challenge, however, is to ensure data quality in project after project without sacrificing time and cost efficiencies.

With our new AI-driven offering, Argos brings automation speed to translation memory data cleaning. The cleaned data amplifies the benefits of machine learning (ML) in localization: consistent high quality, lower costs, and faster time to market across projects, products, and services.

Cleaning TM data

Translation memories (TMs) have been the unsung heroes of the localization industry for almost 30 years. They’ve stored billions of words and helped save millions in costs. They’ve improved content quality and project consistency.

But TMs age over time. They need maintenance. They need care. Without these, their high value gradually decreases. And project outcomes suffer.

Argos is committed to delivering high-quality translation content that brings value to your products and services. Improving linguistic assets – by cleaning translation memory data – can mean consistent results, lower production costs, and faster time-to-market results.

The symptoms of aging TMs

In the past, cleaning TMs has been a long, tedious, and manual process. It was even more difficult when using multiple vendors. Which is why it was done rarely (if ever).

Sidestepping TM cleaning often results in:

  • reused 100% matches and context matches without review, introducing TM errors
  • multiple translations for the same source segment
  • inconsistencies between corporate terminology and the TM terms
  • very old (and often obsolete) entries in TMs

The Argos TM cleaning solution

We can now use AI to automate TM cleaning – restoring data quality, consistency, and cost efficiencies to the localization process.

Argos AI engineers have developed a process that works with:

  • TM content distribution
  • language detection models
  • targeted regular expressions

All at a fraction of the cost and time needed for manual review.

Our multiple AI networks work in harmony. Layer 1 addresses TM quality distribution by categorizing the TM segments. The Layer 2 AI is trained to analyze the target and source to determine if a segment should be translated or not.

These are augmented by other TM cleaning checks:

  • Consistency: Is one source text translated in multiple ways? (This excludes inconsistencies that are intentional or context dependent.)
  • Numbers: Are there numerical differences between source and target?
  • Glossary: Does it adhere to terminology? (For large TMs, we recommend 100 or fewer terms.)
  • Regular expressions: Are there differences in units of measurement, spelled out numbers, or symbols?
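
As a rough illustration of what the consistency and number checks look for, here is a minimal Python sketch. It is not the Argos implementation: the function names, the (source, target) tuple format, and the sample segments are all assumptions made for the example, and real TMs are usually stored in formats such as TMX. The number check compares digit sequences only, so it tolerates "2.5" versus "2,5" but would wrongly flag space-separated thousands such as "1 500".

```python
import re
from collections import defaultdict

# Matches numbers with optional "." or "," separators, e.g. 30, 2.5, 1,500.
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def normalize(text):
    """Collapse whitespace and ignore case so trivial variants compare equal."""
    return " ".join(text.split()).casefold()


def find_inconsistencies(segments):
    """Return each normalized source that has more than one distinct target."""
    targets = defaultdict(set)
    for source, target in segments:
        targets[normalize(source)].add(normalize(target))
    return {src: sorted(tgts) for src, tgts in targets.items() if len(tgts) > 1}


def digits_only(text):
    """Extract numbers and strip separators so 2.5 and 2,5 compare equal."""
    return sorted(re.sub(r"\D", "", match) for match in NUMBER_RE.findall(text))


def find_number_mismatches(segments):
    """Return segments whose source and target contain different numbers."""
    return [(s, t) for s, t in segments if digits_only(s) != digits_only(t)]


# Hypothetical sample data, for illustration only.
sample_segments = [
    ("Press the button.", "Appuyez sur le bouton."),
    ("Press the button.", "Appuyer sur le bouton."),
    ("Length: 2.5 mm", "Longueur : 2,5 mm"),
    ("Wait 30 seconds", "Attendez 20 secondes"),
]

inconsistent = find_inconsistencies(sample_segments)
mismatched = find_number_mismatches(sample_segments)
```

A sketch like this cannot tell intentional or context-dependent variants from genuine errors, so anything it flags still needs linguistic review.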

Argos carries out the process over four phases.

The Four-Phase Process

Phase 1: Scoping
We help our clients define the scope of content for review and auxiliary services. The resulting QA report for auxiliary services and AI services shows the overall health and weaknesses of the TMs.

Phase 2: Quoting
We use our AI-driven system to process the TMs and generate a TM distribution report. The quote is based on the total number of issues.

Phase 3: Implementation
The automated TM cleaning is examined by Argos linguists. The removal of duplicate TM segments is followed by consistency checks and a final spot-check.
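
For illustration only, removing exact duplicates could look like the Python sketch below. It assumes segments are simple (source, target) pairs and that two segments count as duplicates when they differ only in whitespace; the function name and this duplicate criterion are assumptions for the example, not a description of the Argos process.

```python
def remove_duplicates(segments):
    """Keep the first occurrence of each (source, target) pair.

    Pairs are compared after collapsing whitespace, so "Save " and "Save"
    are treated as the same text. Different translations of the same source
    are NOT duplicates; those belong to the consistency check.
    """
    seen = set()
    unique = []
    for source, target in segments:
        key = (" ".join(source.split()), " ".join(target.split()))
        if key not in seen:
            seen.add(key)
            unique.append((source, target))
    return unique


# Hypothetical sample data, for illustration only.
cleaned = remove_duplicates([
    ("Save", "Enregistrer"),
    ("Save ", "Enregistrer"),
    ("Save", "Sauvegarder"),
])
```

Here the second pair is dropped as a duplicate of the first, while the third is kept because its target differs.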

Phase 4: Finalization
We issue a final, easy-to-understand report comparing before and after cleaning, detailing the number of errors fixed and TM distribution.

Benefits of TM cleaning

The goals of automated TM cleaning are multiple. We expect to improve linguistic quality in all projects that use the AI-cleaned TM. Clients can count on reduced costs as context matches and 100% matches will no longer need to be manually reviewed. And linguists can enjoy a more streamlined process – no more wasted time on reviewing context matches and editing fuzzy matches, because automation quickly and accurately identifies the areas requiring linguistic review.