The Fundamental Reason Why Machine Translations Can Never Replace Translators

There is a lot of speculation about how machine learning and artificial intelligence could take creative jobs away from people.

Considering how quickly technology evolves and what machine learning is capable of, concerns about job security among English-to-Japanese translators are understandable.

In this article, I am going to explain why you can never rely on machine translation for your website and marketing materials, and why, no matter how much technology evolves, machine translation can never replace adept translators.

Machines are not smart enough to learn a language

Machines are simply not smart enough to perform the intricate process of language learning:

An intelligent being cannot treat every object it sees as a unique entity, unlike anything else in the universe. One has to put objects in categories to be able to apply hard-won knowledge about similar objects, encountered in the past, to the object at hand. But whenever one tries to utilize a set of criteria to derive categories and subcategories, the category disintegrates. (Pinker, Steven. How the Mind Works. 1997)

Machine translation is all about rules. Before machine learning, machine translation systems used colossal bilingual dictionaries and hard-coded rules to calculate the word order in the final output, which, as you can imagine, does not translate well.
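To make the dictionary-plus-rules approach concrete, here is a minimal sketch of such a system. It is a toy and not how any real product worked: the five-entry vocabulary, the romanized Japanese, and the single word-order rule are all assumptions chosen for illustration.

```python
# Toy rule-based translator: a bilingual dictionary plus one hard-coded
# word-order rule. The vocabulary and the rule are illustrative assumptions.

# Hypothetical English -> Japanese (romanized) dictionary.
DICTIONARY = {
    "i": "watashi wa",
    "eat": "taberu",
    "sushi": "sushi o",
    "read": "yomu",
    "books": "hon o",
}


def translate(sentence: str) -> str:
    """Translate a three-word subject-verb-object sentence word by word."""
    words = sentence.lower().rstrip(".").split()
    if len(words) != 3:
        raise ValueError("this toy rule only handles subject-verb-object")
    subject, verb, obj = words
    # Hard-coded rule: English is subject-verb-object, Japanese is
    # subject-object-verb, so the verb moves to the end.
    reordered = [subject, obj, verb]
    # Words missing from the dictionary are passed through unchanged.
    return " ".join(DICTIONARY.get(word, word) for word in reordered)


print(translate("I eat sushi."))
```

Any sentence that is not exactly three words long raises an error, and any word outside the dictionary passes through untranslated. Every such gap would need another hand-written rule or entry.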

This is how things were in the 1950s. By the 21st century, computers had acquired more powerful computation abilities and a larger storage capacity, which enabled higher quality machine translations. However, the basic mechanism hasn’t changed: dictionaries and rigid rules, with every single rule of translation having to be taught to the computer.

A bachelor, of course, is simply an adult human male who has never been married. But now imagine that a friend asks you to invite some bachelors to her party. What would happen if you used the same definition to decide which people to invite?

  • Arthur has been living happily with Alice for the last five years. They have a two-year-old daughter and have never been officially married.
  • Charlie is 17 years old. He lives at home with his parents and is in high school.
  • David is 27 years old. He left home at 13, started a small business, and is now a successful young entrepreneur leading a playboy lifestyle in his penthouse apartment.

Merriam-Webster defines “bachelor” as an unmarried man, but we all know only David is a bachelor.
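The gap between the definition and common sense shows up as soon as the definition is written as a rule. The sketch below applies “adult male who has never been married” to the three men above. Arthur’s age and the adult threshold of 18 are not given in the text and are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int
    is_male: bool
    ever_married: bool


def is_bachelor(person: Person, adult_age: int = 18) -> bool:
    """Literal rule: an adult male who has never been married."""
    return person.is_male and person.age >= adult_age and not person.ever_married


guests = [
    # Arthur's age is an assumption; the text only says he lives with Alice.
    Person("Arthur", age=35, is_male=True, ever_married=False),
    Person("Charlie", age=17, is_male=True, ever_married=False),
    Person("David", age=27, is_male=True, ever_married=False),
]

invited = [person.name for person in guests if is_bachelor(person)]
print(invited)
```

The rule invites Arthur as well as David, because nothing in the definition captures that Arthur lives with a partner and a child. Patching the rule for Arthur only moves the problem to the next unanticipated case.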

The straightforward definition of ‘bachelor’ does not adequately represent all the possibilities of who fits into the category. Knowing who a bachelor is is just common sense, but there’s nothing common about common sense. Somehow a human or robot brain must develop it. Moreover, common sense is not simply an almanac about life that can be dictated by a teacher or downloaded like an enormous database. No database could list all the facts we tacitly know, and no one ever taught them to us. (Pinker, Steven. How the Mind Works. 1997)

In How the Mind Works, Steven Pinker explains how complex and elegant the human brain’s language learning process is.

Before machine learning, teaching a machine to recognize cats was impossible. This is because you had to explain what a cat looks like. For example, “if an object has 4 legs, big eyes, little pointy ears, and a hairy body, then it is a cat.” Of course, many things have 4 legs and big eyes, so computers would confuse cats and dogs. Also, it was necessary to explain what legs and eyes were, etc. Defining cats and creating a corresponding algorithm was nearly impossible.
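Written as code, the hand-made cat definition quoted above might look like the following sketch. The feature names are hypothetical, and the sketch assumes that something else has already detected legs, eyes, and ears, which is the hard part the paragraph describes.

```python
def looks_like_cat(features: dict) -> bool:
    """Hand-written rule: 4 legs, big eyes, little pointy ears, hairy body."""
    return (
        features.get("legs") == 4
        and features.get("big_eyes", False)
        and features.get("pointy_ears", False)
        and features.get("hairy", False)
    )


# Hypothetical feature sets for two animals.
cat = {"legs": 4, "big_eyes": True, "pointy_ears": True, "hairy": True}
dog = {"legs": 4, "big_eyes": True, "pointy_ears": True, "hairy": True}

print(looks_like_cat(cat), looks_like_cat(dog))
```

Both animals satisfy the rule, so the definition cannot tell a cat from a pointy-eared dog.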

Machine learning takes a different approach: people tag objects such as cats, dogs, humans, churches, and specific people in millions of pictures, and those pictures are fed to the computer. The computer then creates its own algorithms to recognize cats and other objects, as Google’s image recognition does.

Google Image recognition

Learning a language is more difficult than that because language learning includes learning what a feeling, a metaphor, or a lie is. A picture of a cat is always just that, a picture of a cat, but the word “bachelor” can mean many different things. Additionally, the definition of some words can be different from one place to another, and the meaning of a word can always change in the future.

Translation is way too irregular for computers to create reliable algorithms

Translation is much more complex than learning isolated words in a language. From one language to another, the grammar, order of words, and metaphors are different.

The Japanese language has three scripts: hiragana, katakana, and kanji. “こころ”, “ココロ” and “心” are one word, “KOKORO,” written in hiragana, katakana, and kanji. It means something like mind, heart, or soul.

“こころ,” which is written in hiragana, gives a warm impression to readers. “ココロ,” which is written in katakana, sounds cold and artificial because katakana is often used for foreign words, and many engineering terms are foreign words. Sci-Fi comics or novels would use ココロ for the minds of robots and こころ for the minds of humans. The kanji “心” often has a more masculine or formal connotation.

Teaching those differences to a machine is a lot more difficult than teaching a machine to recognize cats. Maybe you can have machines learn it by tagging “こころ” in a book as “warm” or “non-robotic” like you would tag “cat” in images. However, unlike images of cats, nuances of words can be different in different contexts or even with the use of different fonts and colors. There are too many variables for machines to learn.

Creating machines that can learn to find the closest match in a pair of languages is next to impossible.

Statistically, a machine won’t be able to perform a creative translation.

Machines can’t learn the subtleties of translation. Feed them 1,000 Sci-Fi books in one language and their translations into Japanese, and they will figure out statistically that “mind” is translated to “ココロ”, not “こころ” or “心” when the word is used in texts related to technology. Machines won’t know what those words mean, but they can make the right choice of words through this kind of process.
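The statistical choice described above can be sketched with simple counting. The aligned examples below are invented for illustration and are not real corpus data. Labeling each text with a single topic is also a simplifying assumption.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (topic of the source text, how a human
# translator rendered "mind" in that text).
ALIGNED_EXAMPLES = [
    ("technology", "ココロ"),
    ("technology", "ココロ"),
    ("technology", "心"),
    ("romance", "こころ"),
    ("romance", "こころ"),
    ("romance", "心"),
    ("business", "心"),
]


def train(examples):
    """Count how often each rendering appears under each topic."""
    counts = defaultdict(Counter)
    for topic, rendering in examples:
        counts[topic][rendering] += 1
    return counts


def choose_rendering(counts, topic):
    """Pick the most frequent rendering for a topic, with a global fallback."""
    if topic in counts:
        return counts[topic].most_common(1)[0][0]
    # Unseen topic: fall back to the most frequent rendering overall.
    overall = Counter()
    for per_topic in counts.values():
        overall.update(per_topic)
    return overall.most_common(1)[0][0]


counts = train(ALIGNED_EXAMPLES)
print(choose_rendering(counts, "technology"))
print(choose_rendering(counts, "romance"))
```

The program picks a script purely from frequencies. It has no notion that one rendering feels warm and another feels artificial.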

Statistical machine translation became widespread at the beginning of the 21st century. With statistical translation, rather than coding knowledge into software and creating a lexicon and grammatical rules for the translation of one specific language to another, large amounts of text in both languages are fed to the software, and the computer is programmed to analyze this data. Google is already working on developing this technology by asking users of Google Translate to proofread machine-translated texts.

Statistical translation would enable machines to learn complex grammar and appropriate syntax by themselves.

However, this is still not good enough. Why is that? Marketing translation and creative translation are not merely translation. They consist of a process known as transcreation.

Machine translation cannot translate context

A language is always linked to its culture, so no matter how perfect a translation is, if the cultural elements of the language aren’t taken into account, the translated text will fail to communicate to the readers in the same manner as the writer intended.

Westerners’ confidence often appears rude or arrogant to Japanese people. Japanese modesty often appears as a lack of confidence to Westerners.

References to Game of Thrones or the Bible are very common in the West but not in Japan.

Intuitively, we all know what good writing and bad writing are. If you are given two books, one written by a professional writer and another written by a high school student, you will most likely be able to guess which is written by the pro, but will not be able to explain why the pro’s writing is better.

Good writing is a very subjective concept, so I cannot tell you exactly what makes good transcreation, but we know a machine won’t be creative enough to add an explanation of a biblical reference in the translated version or to change a Game of Thrones reference to a Naruto reference.

Good writing often looks like really bad writing to computers.

Machine learning works by giving the machine data, letting people tag the data, and allowing the machine to generate a sorting algorithm. The machine creates the rules.

Good writing often breaks the rules.

Jack Kerouac is one of the best writers in history, but his writing is often grammatically incorrect and very unorthodox. If a machine were to judge the quality of his writing, it would rate it very poorly.

Very well-known slogans also look bad to computers:

“Got milk?” “Think outside of the box.” “I’m loving it.” “Just do it.”

Computers will never come up with this type of copywriting. Taken literally, these slogans have nothing to do with computers, food, or sports.

Good writing is more than just writing. It’s a very complex and unquantifiable thinking process that machines cannot copy unless we invent machines that can pass a very high-level Turing test. For now, that is really just science fiction. If it ever happens, people will likely start claiming that robots have souls and should have human rights.


“What a piece of work is a man! How noble in reason, how infinite in faculty! In form and moving how express and admirable!” (Hamlet)

1 thought on “The Fundamental Reason Why Machine Translations Can Never Replace Translators”

  • Masaharu: I loved this article! Very good insights and summary of the history of machine translation. I liked, in particular, this phrase: “…there’s nothing common about common sense.” Did you follow the controversy that occurred in Spain recently when one of the government ministries commissioned a machine translation of a web page? One result, in particular, was hysterical, and poignantly (+comically) denounced by a Spanish translator. (This also got quite a bit of press in Spain, and, to its credit, once the hilarious error of the machine translation was made known, the government “rectified” the web page). One of the areas that interests me in this debate is: which language pairs are the most difficult for machine translation,i.e. French or Spanish to English, Japanese or Chinese to English, etc. I suspect there’s quite a bit of “room for discussion” when this variable is introduced.

    Regards,
    Colleen

    Colleen Roach, Ph.D.
    French & Spanish to English Translations
