Guest blog by Yves Savourel, Vice President of R&D at Argos Multilingual
Machine translation (MT) has become a very important topic in the world of languages and translations. More and more companies have begun to apply MT as it can benefit their translation projects. But what exactly is machine translation and which different types exist? These are the points I’m going to look at more closely in the following post.
By definition, MT is a branch of computational linguistics and language engineering that uses software to translate text or speech from one language to another. At its most basic level, the machine translation process substitutes words in the source language with words in the target language. Don’t confuse MT with computer-aided translation (CAT), in which a human translator uses computer software to assist with the translation process; CAT tools don’t automatically translate content.
There are several types of machine translation engines, and each analyzes and processes content differently. The most common types are rule-based machine translation and statistical machine translation.
Rule-Based Machine Translation (RBMT)
Rule-based engines use a large number of grammatical and linguistic rules to analyze the content and break down the text. By applying these rules, the engine transfers the grammatical structure of the source language into the target language. Bilingual dictionaries are also used for each language pair, and customized terminology lists can be added to fine-tune the engine. Adding terminology specific to a certain topic or industry produces more reliable translations for that topic. Rule-based engines do not require bilingual corpora (large, structured sets of texts in two languages) to create a translation system.
Rule-based engines produce predictable and very consistent output because the translations are based on a fixed set of grammatical rules and dictionaries. Because the rules are explicit, an error can be corrected with a targeted rule. Thus, by adding more rules and more dictionaries or terminology, the translations can be improved.
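To make the idea concrete, here is a minimal sketch of a rule-based approach in Python. It is a toy illustration only, not how any production engine is built: the five-word English-to-Spanish dictionary, the single reordering rule, and the function name translate_rule_based are all assumptions invented for this example.

```python
# Toy bilingual dictionary (an assumption for this example):
# English word -> (Spanish word, part of speech).
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "car": ("coche", "NOUN"),
    "is": ("es", "VERB"),
    "fast": ("rápido", "ADJ"),
}


def translate_rule_based(sentence, terminology=None):
    """Translate word by word, then apply one grammatical transfer rule."""
    lexicon = dict(LEXICON)
    if terminology:
        # A customized terminology list overrides the general dictionary.
        lexicon.update(terminology)

    # Analysis: look up each word; unknown words pass through unchanged.
    tagged = [lexicon.get(word, (word, "UNK")) for word in sentence.lower().split()]

    # Transfer rule: English "adjective noun" becomes Spanish "noun adjective".
    output = []
    i = 0
    while i < len(tagged):
        is_adj_noun = (
            i + 1 < len(tagged)
            and tagged[i][1] == "ADJ"
            and tagged[i + 1][1] == "NOUN"
        )
        if is_adj_noun:
            output.extend([tagged[i + 1], tagged[i]])
            i += 2
        else:
            output.append(tagged[i])
            i += 1
    return " ".join(word for word, _ in output)


print(translate_rule_based("the red car is fast"))
# Same sentence with a hypothetical customer term list applied.
print(translate_rule_based("the red car is fast", {"car": ("automóvil", "NOUN")}))
```

The output is the same every time for the same input, which is the predictability described above. Fixing an error means editing a dictionary entry or adding a rule, not retraining anything.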
Statistical Machine Translation (SMT)
Unlike RBMT, statistical machine translation does not analyze texts based on language rules. Instead, this type of engine “learns” how to translate texts. To do so, it analyzes huge amounts of data for a language pair and then uses the resulting statistical translation model to translate the source content. The model is built by analyzing bilingual corpora, so a sufficient volume of bilingual content is required. With SMT, an engine can also be focused on a specific topic or industry by providing more data relevant to that topic.
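As a rough illustration of how an engine can “learn” from bilingual data, the sketch below estimates word-translation probabilities from three sentence pairs using expectation-maximization, in the style of the classic IBM Model 1 word-alignment model. The tiny English-German corpus and the function names are assumptions made for this example; real SMT systems train on millions of sentence pairs and combine several models.

```python
from collections import defaultdict

# Tiny bilingual corpus (an assumption for this example): (English, German).
CORPUS = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a book", "ein buch"),
]


def train_word_model(corpus, iterations=10):
    """Estimate P(target word | source word) from sentence pairs."""
    pairs = [(src.split(), tgt.split()) for src, tgt in corpus]
    target_vocab = {t for _, tgt in pairs for t in tgt}

    # Start with no knowledge: every target word is equally likely.
    prob = defaultdict(lambda: 1.0 / len(target_vocab))

    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # Expectation step: share each target word among the source words
        # in the same sentence, in proportion to the current probabilities.
        for src, tgt in pairs:
            for t in tgt:
                norm = sum(prob[(t, s)] for s in src)
                for s in src:
                    share = prob[(t, s)] / norm
                    count[(t, s)] += share
                    total[s] += share
        # Maximization step: re-estimate probabilities from the shared counts.
        for (t, s), c in count.items():
            prob[(t, s)] = c / total[s]

    return dict(prob)


def best_translation(prob, source_word, candidates):
    """Pick the candidate with the highest learned probability."""
    return max(candidates, key=lambda t: prob.get((t, source_word), 0.0))


model = train_word_model(CORPUS)
german_words = ["das", "haus", "buch", "ein"]
for english_word in ["the", "house", "book", "a"]:
    print(english_word, "->", best_translation(model, english_word, german_words))
```

No grammar rule or dictionary entry is supplied here. The pairing of “house” with “haus” emerges only because “the” and “das” also appear together in another sentence, which is why the volume and relevance of the bilingual data matter so much for SMT.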
Machine Translation & Neural Networks
Machine translation is evolving. Since about 2013, Internet giants like Google and Microsoft have been exploring the possibility of using neural networks. Neural networks are statistical learning models that were first used in speech and image recognition technology. Using them in machine translation enables engines to train themselves to translate texts by recognizing patterns and structures, a process loosely similar to the way a human brain works. This process is called “deep learning,” and it builds on principles established in big data analytics.
Although neural machine translation (NMT) is a new approach, it is seen as a great breakthrough and has already become very popular among MT researchers, since it is clear that it improves the translation in most cases, offering an output that looks more fluid and more human.
Proponents say that NMT creates more fluent translations and can reduce post-editing effort by up to 25%. Some language professionals no longer doubt that neural machine translation performs better than rule-based or statistical machine translation. NMT systems recognize similarities between words, consider entire sentences, and learn complex relationships between languages (Source: 3 Reasons Why Neural Translation is a Breakthrough).
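One common way to express “similarity of words” numerically is to represent each word as a vector of numbers and compare the vectors. The sketch below shows the comparison step with cosine similarity. The three-dimensional vectors are invented for this illustration; in a real NMT system such vectors are learned during training and have hundreds of dimensions.

```python
import math

# Hand-made word vectors (an assumption for this example, not learned values).
EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Related words point in nearly the same direction; unrelated words do not.
print(cosine_similarity(EMBEDDINGS["car"], EMBEDDINGS["automobile"]))
print(cosine_similarity(EMBEDDINGS["car"], EMBEDDINGS["banana"]))
```

With these made-up values, “car” and “automobile” score close to 1 while “car” and “banana” score close to 0.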
As each engine processes and generates data differently, the engine chosen for a project depends on the target languages and the availability of reference materials for the given source files. In general, machine translation works best with content that is repetitive and simple, where the same words are being reused and synonyms are minimized. There is no doubt about the advantages of MT: it increases productivity, reduces time-to-market and improves terminology consistency.
Argos Multilingual is a leading language service provider. At Argos our mission is to provide our customers with high-quality innovative language solutions as a respected business partner in the localization industry. The company provides a full range of language translation services that cover all needs. Machine Translation is one of Argos Multilingual’s most important Consulting Services.
Please see the Machine Translation section of this blog for more info about MT and PEMT (MTPE).