Pangeanic was the first translation company in the world to make commercial use of Moses, as reported at the Association for Machine Translation in the Americas (AMTA) conference in 2010 and by the European Union project Euromatrixplus.
Dozens of corporations, businesses and language service providers have benefited from a flexible, user-centric approach that gives users the highest levels of control, customization and ownership. Pangeanic has developed and used machine translation for many applications. It has reported successful use cases for many of its clients at industry events such as Localization World Barcelona 2011, Localization World Paris 2012 and Localization World London 2013, as well as at numerous TAUS summits in the United States, Europe and Japan, META Forum Berlin 2013 and the Japan Translation Federation.
Pangeanic was also one of the largest donors of training data to TAUS, which in turn provided access to millions of words as training corpus. This enhanced the PangeaMT platform and gave our team the opportunity to experiment further with millions of aligned sentences. Machine translation has been part of the company culture since 2009. Since then, machine translation services to corporations and even other translation companies have become part of Pangeanic’s range of services. From 2012 to 2016, Pangeanic was a member of the EU’s Marie Curie action EXPERT Project, advancing the state of the art with young and experienced researchers. PangeaMT is Pangeanic’s own independent translation technology division, with a clear focus on customized, domain-specific Machine Translation (MT). The current version of the platform is v3.
As a forward-thinking and technology-savvy translation company, Pangeanic won a contract in 2007 to work for the European Commission as MT output post-editors. It was at this time that we became acquainted with institutional user needs and re-evaluated several commercial MT products we had been using. Soon we decided to develop our own machine translation technology. Pangeanic was quoted as the first language service provider to make commercial use of Moses in the EU’s Framework development program euromatrixplus.net (the second, more refined release of Moses). Since then, many presentations, awards and implementations have followed, and Pangeanic has made a name for itself as a leading machine translation implementation company. It also markets its machine translation services in areas beyond the translation industry and is heavily involved in two more EU machine translation R&D programs, EXPERT and Casmacat (User Group).
We began as keen followers of the statistically driven paradigm of machine translation. This worked very well between several related languages (Romance languages and English, German and Scandinavian languages). However, our links to Japanese industry soon brought requests to add Japanese and Chinese to our service portfolio. In 2011, Pangeanic developed hybrid machine translation services, which were included as part of the system features.
Despite our Moses bias, we have been able to overcome many of Moses’s shortcomings in order to fit the needs of the translation industry: our solutions go beyond text-based MT and are capable of taking input and producing output in industry-standard formats such as TMX and XLIFF. PangeaMT provides API access to other translation platforms, so you do not need to change your translation environment, and you can benefit from adding your future translations in a virtuous re-training cycle. Using open standards means that you will never have to buy expensive TM software again. Our solutions keep you from being locked in by expensive upgrades year after year.
Another PangeaMT breakthrough is our inline mark-up parser, which handles tags extremely efficiently. Statistical machine translation systems, as they come from open-source releases, usually produce plain text output because this is also the format they process. However, we are keen to see PangeaMT solutions in use and adapted to the most demanding language industry requirements, so we focused our effort on developing SMT engines capable of handling the in-line coding typical of other content formats used in localization production environments. Thanks to this parser, PangeaMT can identify in-lines without attempting to translate them, and it places them back in the resulting text. An in-line placeholder first copies and transfers all XML and code information to a separate module. The translation engine does its work, and the in-lines are then placed back into the translated segment. At the time of its release, our in-line parser constituted an innovation well above the level of maturity of well-known SMT systems of the day.
We keep learning and improving with every development commissioned by an existing or new client and with every new language combination. We therefore remain open to applying new hybridization techniques, even ad-hoc rules, that we research and implement ourselves or co-develop in conjunction with our clients.
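The in-line placeholder idea described above can be illustrated with a minimal Python sketch. This is not PangeaMT's actual parser: the function names, the `__TAGn__` placeholder format, the simple tag pattern and the toy glossary "engine" are all assumptions made for illustration. It only shows the general technique of setting tags aside, translating the masked text, and putting the tags back.

```python
import re
from typing import Callable, List, Tuple

# Assumption: in-lines look like simple XML-style tags, e.g. <b>, </b>, <ph id="1"/>.
TAG_PATTERN = re.compile(r"<[^<>]+>")
# Assumption: placeholder format chosen for this sketch only.
PLACEHOLDER_PATTERN = re.compile(r"__TAG(\d+)__")


def mask_inlines(segment: str) -> Tuple[str, List[str]]:
    """Replace each in-line tag with a numbered placeholder and store the tags."""
    tags: List[str] = []

    def _stash(match: "re.Match[str]") -> str:
        tags.append(match.group(0))
        return f"__TAG{len(tags) - 1}__"

    return TAG_PATTERN.sub(_stash, segment), tags


def restore_inlines(translated: str, tags: List[str]) -> str:
    """Put the stored tags back where their placeholders ended up."""
    return PLACEHOLDER_PATTERN.sub(lambda m: tags[int(m.group(1))], translated)


def translate_with_inlines(segment: str, engine: Callable[[str], str]) -> str:
    """Mask tags, run the engine on the masked text, then restore the tags."""
    masked, tags = mask_inlines(segment)
    return restore_inlines(engine(masked), tags)


def toy_engine(text: str) -> str:
    """Hypothetical stand-in for an MT engine: word-by-word glossary lookup."""
    glossary = {"Press": "Pulse", "Start": "Inicio", "now": "ahora"}
    return re.sub(r"[A-Za-z]+", lambda m: glossary.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    print(translate_with_inlines("Press <b>Start</b> now", toy_engine))
```

In this sketch the engine only ever sees `Press __TAG0__Start__TAG1__ now`, so it cannot alter the markup. A production system would also need to handle placeholders that the engine drops, duplicates or reorders, which this sketch does not attempt.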
We are aware that for some language combinations it will be necessary to resort to linguistically informed techniques as part of the pre- or post-processing phases. Correct word and phrase reordering in the MT output is not an easy goal to achieve, especially when the languages involved are not closely related, or when one of the two languages has a very flexible, and therefore MT-challenging, word order (WO). Some language-specific fixing procedures may come in handy. In other cases, it may be useful to use one language as a pivot to train engines for languages that are not close. These and other techniques may be used or taken as a basis for expanding our PangeaMT solution palette. Please visit our machine translation division website to learn more about PangeaMT.