Implementation of Machine Translation: Pangeanic case study at TMS Inspiration Days

Pangeanic has been invited as one of the 3 guest speakers at TMS Inspiration Days (19-20 April 2012) to showcase its transition from Language Service Provider to machine translation software vendor. Manuel Herranz’s talk will cover the company’s initial application of statistical models as a solution to increasing demand from automotive clients, and how this internal project changed the company’s DNA. He will present PangeaMT, Pangeanic’s feature-rich DIY SMT solution, as a tool developed with the specific needs of the localization industry in mind, and explain how language companies are freeing themselves from “the TM syndrome” and embracing machine translation as a tool with which to offer new services and develop new business areas.

The DIY SMT solution has been revolutionizing the localization industry by providing data cleaning tools, data preparation, engine creation and the freedom to retrain and update engines at will.

TMS Inspiration Days is an international conference focusing on the business and technology aspects of the translation industry. This 3rd edition of the event will be held from 19th to 20th April 2012 in Krakow (Poland) under the banner of “Technology for business”. The conference agenda already includes two other presentations: “Keynote: Overview of translation technology” by CSA’s Ben Sargent and “Selling in America” by Renato Beninatto.

The conference will open with the lecture “Keynote: Overview of translation technology”, given by Ben Sargent of Common Sense Advisory. Ben has been involved in the translation industry since 1989. At CSA, he focuses on technology-related areas, particularly CMS and TMS tools. The lecture will cover the major technological trends and solutions available on the translation market and their impact on how an enterprise operates.

PangeaMT Syntax-Based Hybrid presentation at Japan Translation Federation

Pangeanic’s CEO Manuel Herranz will take part in the 21st Japan Translation Festival, Japan’s largest LSP exhibition, in Tokyo on Tuesday 29th November at 15:15. A full program with bios can be found at http://www.jtf.jp/jp/festival/festival_program.html

Manuel’s presentation at JTF will cover the application of machine translation technologies in EU languages for localization, as well as our new development for EN –> JP. It is geared towards LSPs and corporations which need to deal with sensitive data and to create and maintain their own MT technologies in-house. With more and more security breaches, the ability to offer in-house not only TM but also machine translation pre-translation and a post-editing environment is increasingly becoming a must for many companies which deal with sensitive data.

The exhibition will take place at 4-2-25, Kudan-Kita, Chiyoda-ku, Tokyo.

Machine translation has been applied very successfully at Pangeanic for many EU languages and within specific fields. We build engines for particular domains like automotive, bio/medicine and of course electronics. Basically, any field which has a “controlled input” is ideal for automation. What is more, our latest offering, the famous “DIY SMT”, provides the ability to re-train engines and create new domains in-house, with no data transfer. This gives full ownership and control to LSPs and corporations alike.

This year’s presentation follows Pangeanic’s first introduction of SMT techniques for LSPs in 2010 (pictures below).

Iwanaga-san explaining the benefits of SMT for the localization industry

Manuel at BIJ - Pangeanic's booth

Pangeanic's Manuel and BIJ's Christina at the booth

BIJ's Iwanaga-san at JTF

International Workshop: Using Linguistic Information for Hybrid Machine Translation

Pangeanic’s development team gathered important input from academia on real advances in MT at the International Workshop on Using Linguistic Information for Hybrid Machine Translation (LIHMT-2011) and the practical Saturday session ML4HMT (META-NET WP2), held in conjunction with DFKI. These sessions were geared towards development personnel and those with a personal interest in using the best of MT research, in its different flavours, to improve current state-of-the-art systems. Attendees and presenters were academics from the US, the European Union and Japan involved in different MT areas. The theme was state-of-the-art developments in system combination (often involving Moses) and the hybridisation of rule-based approaches with statistics. Sessions dealt with combined approaches using syntax, grammatical information, rules and statistical systems.

As different research teams are facing the same problems worldwide, some similar approaches and some new and imaginative ones are beginning to emerge, for example:

Alon Lavie and Manuel Herranz exchanging views on hybridisation

    • Lemmatisation and annotation for morphologically rich languages such as Czech and Basque, the latter having even fewer resources available.
    • Syntax-based approaches and word re-ordering for unrelated languages (such as Asian or Semitic languages into and out of European languages)
    • Web-based annotation tools
    • Hybridisation of techniques, with analysis proceeding from the morphological layer (m-layer) through the analytical layer (a-layer) to the tectogrammatical layer (t-layer), followed by transfer and then synthesis back through the t-layer, a-layer and m-layer.
    • Word disambiguation
    • Mixture of rule-based and statistical approaches to improve predictability.
    • Post-editing effort estimation for MT systems, using systems both with and without linguistic features. Linguistic features are useful for direct error detection and for automatic post-editing, but for sentence-level confidence estimation (CE) there are issues with sparsity and representation.
    • New metrics like VERTa, using linguistic knowledge organised in different levels (lexical, morphological, syntactic information and sentence semantics)
