Pangeanic was the first translation company in the world to make commercial use of the statistical machine translation system Moses, as reported at the Association for Machine Translation in the Americas (AMTA) in 2010 and by the European Union project Euromatrixplus. Today, Pangeanic’s neural machine translation engines are best-in-class and have been chosen by US government agencies, the European Union and its Member States, as well as many translation companies.
Dozens of corporations, businesses and language service providers have benefited from a flexible, user-centric approach that gives users the highest levels of control, customization and ownership.
PangeaMT is Pangeanic’s own independent translation technology division, with a clear focus on customized, domain-specific machine translation (MT).
Neural Machine Translation
It is generally agreed that Neural Machine Translation (NMT) has surpassed Statistical Machine Translation (SMT) in fluency and adequacy, as judged by humans reading the output. NMT uses a large artificial neural network, loosely modeled on the way neurons in the human brain are linked by thousands of connections. One of the main advantages of NMT is that it works with a much longer translation context than SMT, which translates at phrase level. Currently, developers mostly use sequence-to-sequence approaches in which the full context of the sentence is taken into account, and this improves both the accuracy and the fluency of the translations. Other advantages of NMT over SMT are that it requires only a fraction of the memory SMT needs, and that all parts of an NMT model are trained jointly (an end-to-end approach) to maximize translation performance in the target language. Pangeanic is at the forefront of research and development of translation technologies that incorporate NMT, embedding it in different processes.
Pangeanic has developed and used machine translation for many applications. It has reported successful use cases for many of its clients at industry events such as Localization World Barcelona 2011, Localization World Paris 2012 and Localization World London 2013, as well as numerous TAUS summits in the United States, Europe and Japan, META Forum Berlin 2013 and the Japan Translation Federation.
Pangeanic was also one of the largest donors of training data to TAUS, which in turn provided access to millions of words of training corpus. This enhanced the PangeaMT platform and gave our team the opportunity to experiment further with millions of aligned sentences. Machine translation has been part of the company culture since 2009. Since then, machine translation services for corporations, and even for other translation companies, have become part of Pangeanic’s range of services. From 2012 to 2016, Pangeanic was a member of the EU’s Marie Curie action EXPERT Project, advancing the state of the art with young and experienced researchers.
History of our Machine Translation Solutions
As a forward-thinking and technology-savvy translation company, Pangeanic won a contract in 2007 to post-edit MT output for the European Commission. It was at this time that we became acquainted with institutional user needs and re-evaluated several commercial MT products we had been using. Soon afterwards, we decided to develop our own machine translation technology. Pangeanic was cited as the first language service provider to make commercial use of Moses in the EU’s Framework development program euromatrixplus.net (the second, more refined release of Moses). Since then, many presentations, awards and implementations have followed, and Pangeanic has made a name for itself as a leading machine translation implementation company. It also markets its machine translation services in areas beyond the translation industry and has been heavily involved in two more EU machine translation R&D programs, EXPERT and Casmacat (User Group).
Pangeanic won the European Commission’s biggest contract for machine translation infrastructure (2017) with its iADAATPA project. Neural machine translation technology has been integrated into Pangeanic’s workflow to give clients faster translation turnarounds. Our neural network-based engines also serve EU projects, US government agencies and international companies, both in the cloud and on-premises.
Language Pairs and Combinations
PangeaMT was originally built on a large statistical framework, with quality estimation and re-training.
Statistical methods worked very well for several related languages (Romance languages and English, German and Scandinavian languages). However, our links to Japanese industry soon brought requests to add Japanese and Chinese to our service portfolio. In 2011, Pangeanic developed hybrid machine translation services, which were included as part of the system features. In 2017, all our systems were migrated to a new framework based on neural networks.
Features – No lock-in machine translation services
Despite our initial statistical roots, we were able to overcome many of Moses’s shortcomings to fit the needs of the translation industry: our solutions go beyond text-based MT and can take input and produce output in industry-standard formats such as TMX and XLIFF. PangeaMT provides API access to CAT tools, so you do not need to change your translation environment, and you can benefit from adding your future translations to a virtuous re-training cycle.
Our solutions keep you from being locked in to expensive upgrades year after year.
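As an illustration of what working with an industry-standard format involves, the following minimal sketch reads aligned source and target segments from a TMX file using only the Python standard library. It is not PangeaMT code: the function name `read_tmx_pairs` and the sample content are hypothetical, and inline markup inside a segment is simply flattened to text here.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample data, for illustration only.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Press the power button.</seg></tuv>
      <tuv xml:lang="es"><seg>Pulse el botón de encendido.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>This unit has no Spanish version.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs for the requested languages."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG, "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline elements.
                segments[lang] = "".join(seg.itertext()).strip()
        # Keep only translation units that contain both languages.
        if src_lang in segments and tgt_lang in segments:
            pairs.append((segments[src_lang], segments[tgt_lang]))
    return pairs


pairs = read_tmx_pairs(SAMPLE_TMX, "en", "es")
```

Pairs extracted this way are the kind of aligned data that could feed a re-training cycle; a production reader would also need to handle language variants such as `en-GB` and preserve inline codes rather than flatten them.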
Another PangeaMT breakthrough is our in-line mark-up parser, which handles tags extremely efficiently. Statistical machine translation systems, as they come from open-source releases, usually produce plain text output because plain text is also the format they process. However, we wanted PangeaMT solutions to be usable under the most demanding language industry requirements, so we focused our effort on developing SMT engines capable of handling the in-line coding typical of the content formats used in localization production environments. Thanks to this parser, PangeaMT can identify in-line codes without attempting to translate them, and it places them back in the resulting text. An in-line placeholder first copies and transfers all XML and code information to a separate module. The translation engine then does its work, and the in-line codes are placed back into the translated segment. At the time of its release, our in-line parser was an innovation well above the level of maturity of the well-known SMT systems of the day.
We keep learning and improving with every development commissioned by an existing or new client and for every new language combination. We therefore remain open to applying new hybridization techniques, even ad hoc rules, that we research and implement ourselves or co-develop with our clients. We are aware that some language combinations will require linguistically informed techniques as part of the pre- or post-processing phases. Correct word and phrase reordering in the MT output is not easy to achieve, especially when the languages involved do not belong to closely related families, or when one of the two languages has a very flexible, and therefore MT-challenging, word order (WO). Some language-specific fixing procedures may come in handy. In other cases, it may be useful to use one language as a pivot to train engines for languages that are not close.
These and other techniques may be used, or taken as a basis, to expand the range of PangeaMT solutions. Please visit our machine translation division website to learn more about PangeaMT.
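The in-line placeholder approach described above can be sketched in a few lines of Python. This is only an illustration of the general technique under simple assumptions, not PangeaMT’s actual parser: the names `mask_inline_tags`, `restore_inline_tags` and `toy_translate` are hypothetical, tags are assumed to be simple XML-style tags, and the "engine" is a small word-substitution stand-in.

```python
import re

# Assumption: in-line codes are XML-style tags such as <b>, </b> or <ph id="1"/>.
TAG_PATTERN = re.compile(r"<[^<>]+>")
PLACEHOLDER_PATTERN = re.compile(r"__TAG(\d+)__")


def mask_inline_tags(segment):
    """Swap each tag for a numbered placeholder and store the tags separately."""
    tags = []

    def _store(match):
        tags.append(match.group(0))
        return f"__TAG{len(tags) - 1}__"

    return TAG_PATTERN.sub(_store, segment), tags


def restore_inline_tags(translated, tags):
    """Put the stored tags back where their placeholders ended up."""
    return PLACEHOLDER_PATTERN.sub(lambda m: tags[int(m.group(1))], translated)


def toy_translate(text):
    """Stand-in for a real MT engine: replaces a few English words with Spanish."""
    glossary = {
        "Click": "Haga clic en",
        "Save": "Guardar",
        "to": "para",
        "continue": "continuar",
    }
    # Only alphabetic runs are looked up, so placeholders pass through untouched.
    return re.sub(r"[A-Za-z]+", lambda m: glossary.get(m.group(0), m.group(0)), text)


source = "Click <b>Save</b> to continue"
masked, tags = mask_inline_tags(source)   # the engine never sees the mark-up
translated = toy_translate(masked)
result = restore_inline_tags(translated, tags)
# result == "Haga clic en <b>Guardar</b> para continuar"
```

Numbering the placeholders lets the tags be restored correctly even if the engine reorders words around them. A real implementation would also need to guard against source text that already contains the placeholder pattern and against an engine that drops or alters a placeholder.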