<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Pangeanic, language service provider, translation services, machine-translation services</title>
	<atom:link href="http://www.pangeanic.com/feed/?lang=en" rel="self" type="application/rss+xml" />
	<link>http://www.pangeanic.com</link>
	<description>Pangeanic, language service provider, translation services, machine-translation services</description>
	<lastBuildDate>Wed, 30 Jun 2010 18:59:07 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Pangeanic welcomes Sony Europe Localization Director</title>
		<link>http://www.pangeanic.com/2010/06/30/pangeanic-welcomes-sony-europe-localization-director/?lang=en</link>
		<comments>http://www.pangeanic.com/2010/06/30/pangeanic-welcomes-sony-europe-localization-director/?lang=en#comments</comments>
		<pubDate>Wed, 30 Jun 2010 18:59:07 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.pangeanic.com/?p=556</guid>
		<description><![CDATA[While we were attending the recent EAMT Conference (see the dedicated post in our News section), Pangeanic also welcomed Salomé López-Lavado, Localization Director at Sony Professional Europe. Pangeanic has been a reliable language vendor and technology consultant to Sony for several years.
The visit highlighted and strengthened our relationship even further by focusing on the expansion and deployment of our PangeaMT [...]]]></description>
			<content:encoded><![CDATA[<p>While we were attending the recent <a href="http://www.eamt2010.org/">EAMT Conference</a> (see the dedicated <a href="http://www.pangea.com.mt/?p=495">post</a> in our News section), Pangeanic also welcomed Salomé López-Lavado, Localization Director at Sony Professional Europe. Pangeanic has been a reliable language vendor and technology consultant to Sony for several years.</p>
<p>The visit highlighted and strengthened our relationship even further by focusing on the expansion and deployment of our PangeaMT customized MT solution for Sony Europe in several languages and exploring the integration of our technologies in tailor-made, corporate globalization management environments.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pangeanic.com/2010/06/30/pangeanic-welcomes-sony-europe-localization-director/?lang=en/feed/&amp;lang=en</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pangeanic, first LSP to create SMT division deploying TDA data, TAUS Data Association partner highlights</title>
		<link>http://www.pangeanic.com/2010/06/29/pangeanic-first-lsp-to-create-smt-division-deploying-tda-data-taus-data-association-partner-highlights/?lang=en</link>
		<comments>http://www.pangeanic.com/2010/06/29/pangeanic-first-lsp-to-create-smt-division-deploying-tda-data-taus-data-association-partner-highlights/?lang=en#comments</comments>
		<pubDate>Tue, 29 Jun 2010 16:02:58 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.pangeanic.com/?p=552</guid>
		<description><![CDATA[TAUS Data Association (TDA) highlights in a new online report that Pangeanic is the first example of an LSP that, making extensive use* of TDA data, has succeeded in creating a new Statistical Machine Translation division, PangeaMT. In so doing, Pangeanic has evolved from a well-established language service provider into [...]]]></description>
			<content:encoded><![CDATA[<p>TAUS Data Association (TDA) highlights in a new <a href="http://www.tausdata.org/index.php/news/news/113">online report</a> that Pangeanic is the first example of an LSP that, making extensive use* of TDA data, has succeeded in creating a new Statistical Machine Translation division, PangeaMT. In so doing, Pangeanic has evolved from a well-established language service provider into an innovative language technology solution provider that supports and benefits from data-sharing initiatives in the globalization industry, such as TAUS&#8217;s TDA.</p>
<p>PangeaMT provides industry-specific statistical machine translation (SMT) engines for the automotive, consumer electronics and industrial sectors. The service was launched in 2009 at a TAUS User Conference, with an offer to train engines free of charge for companies seriously considering deploying open-source MT with a TMX workflow. If you would like to know more about PangeaMT&#8217;s current Spring campaign, please contact us.</p>
<p>* It is worth pointing out that Pangeanic leads the list of TDA data downloaders, well ahead of companies such as Lionbridge, Oracle and WeLocalize. Downloaded data: 302,334,953 words. Information collated by TAUS Data Association and distributed to its partners at the end of March 2010.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pangeanic.com/2010/06/29/pangeanic-first-lsp-to-create-smt-division-deploying-tda-data-taus-data-association-partner-highlights/?lang=en/feed/&amp;lang=en</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pangeanic CEO to speak at Localization World 2010</title>
		<link>http://www.pangeanic.com/2010/06/01/pangeanic-ceo-to-speak-in-localization-world-2010/?lang=en</link>
		<comments>http://www.pangeanic.com/2010/06/01/pangeanic-ceo-to-speak-in-localization-world-2010/?lang=en#comments</comments>
		<pubDate>Tue, 01 Jun 2010 08:23:31 +0000</pubDate>
		<dc:creator>Pangeanic</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.pangeanic.com/?p=539</guid>
		<description><![CDATA[Mr M Herranz, Pangeanic&#8217;s CEO, will take part in the MT in the Real World discussion panel at the Localization World 2010 conference in Berlin on Wednesday, 9 June. As its title suggests, this session focuses on real-world MT implementations and practices, moving away from mere sales talk towards verifiable facts and informative experiences. From our own standpoint, Manuel [...]]]></description>
			<content:encoded><![CDATA[<div style="background-color: #ffffff;margin: 0px;font: 13px/19px Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;padding: 0.6em">Mr <a href="http://www.localizationworld.com/lwber2010/speakers.php#mHerranz">M Herranz</a>, Pangeanic&#8217;s CEO, will take part in the <a href="http://www.localizationworld.com/lwber2010/programDescription.php#C7">MT in the Real World</a> discussion panel at the <a href="http://www.localizationworld.com/">Localization World</a> 2010 conference in Berlin on Wednesday, 9 June. As its title suggests, this session focuses on real-world MT implementations and practices, moving away from mere sales talk towards verifiable facts and informative experiences. From our own standpoint, Manuel will stress what it takes to implement the PangeaMT system successfully. Discussion among the panelists, a mix of MT buyers and providers, and with the audience is meant to be highly interactive, providing practical insight into current projects and results and outlining ongoing work and future challenges.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.pangeanic.com/2010/06/01/pangeanic-ceo-to-speak-in-localization-world-2010/?lang=en/feed/&amp;lang=en</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New version of PangeaMT for German</title>
		<link>http://www.pangeanic.com/2010/05/31/new-version-of-pangeamt-for-german/?lang=en</link>
		<comments>http://www.pangeanic.com/2010/05/31/new-version-of-pangeamt-for-german/?lang=en#comments</comments>
		<pubDate>Mon, 31 May 2010 08:28:13 +0000</pubDate>
		<dc:creator>Pangeanic</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.pangeanic.com/?p=544</guid>
		<description><![CDATA[Pangeanic follows a policy of constant improvement in all PangeaMT developments. We are a forward-looking, innovation-driven company, always eager to test, adopt and apply the latest MT-related techniques, including for the languages our customers use or are considering.
An example of this would be the improved version [...]]]></description>
			<content:encoded><![CDATA[<p>Pangeanic follows a policy of constant improvement in all PangeaMT developments. We are a forward-looking, innovation-driven company, always eager to test, adopt and apply the latest MT-related techniques, including for the languages our customers use or are considering.</p>
<p>One example is the improved version of PangeaMT made available today to Sybase, a long-standing client, for the English-German language pair.<br />
This version uses special tokenization techniques and post-processing modules to produce considerably better output.</p>
<p>It is a customization of Moses that integrates features such as:<br />
* TMX generator (for TMX input and output)<br />
* TXT data handling<br />
* Inline parser to handle tags and formatting information contained in TMX files for documentation (HTML, FrameMaker, InDesign, Word, etc.)<br />
* Cygwin integration</p>
<p>If you are considering implementing MT in your workflow, please contact Ms <a href="mailto:e.yuste@pangeanic.com" target="_blank">Elia Yuste</a> at our Business Development Department.</p>
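<p>The post lists TMX input and output among the features but does not describe how they are implemented. As a rough, hypothetical illustration only, assuming TMX 1.4 files with one source and one target variant per translation unit, the Python sketch below reads segment pairs from a TMX file and writes pairs back out using only the standard library. The function names and header values (including the creation tool name) are assumptions, not PangeaMT&#8217;s actual TMX generator.</p>

```python
import xml.etree.ElementTree as ET

# ElementTree stores the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx(source, src_lang, tgt_lang):
    """Return (source, target) text pairs from a TMX file path or file object.

    Inline markup inside each seg element is flattened to plain text here.
    """
    src_lang, tgt_lang = src_lang.lower(), tgt_lang.lower()
    pairs = []
    root = ET.parse(source).getroot()
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain "lang" attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                segs[lang] = "".join(seg.itertext())
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs


def write_tmx(pairs, src_lang, tgt_lang):
    """Serialize (source, target) pairs as a minimal TMX 1.4 document string."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src_text, tgt_text in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src_text), (tgt_lang, tgt_text)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

<p>In this sketch, read_tmx would turn an English-German memory into training pairs, and write_tmx would wrap machine-translated output back into TMX. Because inline markup is flattened, a real workflow would also need the kind of inline tag parser listed above.</p>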
]]></content:encoded>
			<wfw:commentRss>http://www.pangeanic.com/2010/05/31/new-version-of-pangeamt-for-german/?lang=en/feed/&amp;lang=en</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pangeanic to visit associated companies in Japan and China</title>
		<link>http://www.pangeanic.com/2010/05/31/pangeanic-to-visit-associated-lsp-companies-in-japan-and-china/?lang=en</link>
		<comments>http://www.pangeanic.com/2010/05/31/pangeanic-to-visit-associated-lsp-companies-in-japan-and-china/?lang=en#comments</comments>
		<pubDate>Mon, 31 May 2010 08:00:42 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.pangeanic.com/?p=548</guid>
		<description><![CDATA[Following our participation in the TAUS Tokyo Summit in April, our CEO, Mr M Herranz, will visit B.I. Japan in Tokyo and B.I. China in Shanghai. Apart from discussing ongoing joint business operations in detail with these outstanding language service providers in Asia, with whom Pangeanic has been working for a number of years [...]]]></description>
			<content:encoded><![CDATA[<p>Following our participation in the <a href="http://www.pangea.com.mt/2010/04/pangeamt-presented-tokyo/">TAUS Tokyo Summit</a> in April, our CEO, Mr M Herranz, will visit B.I. Japan in Tokyo and B.I. China in Shanghai. Apart from discussing ongoing joint business operations in detail with these outstanding language service providers in Asia, with whom Pangeanic has been working for a number of years now, the main goal will be to explore further business avenues in connection with PangeaMT. Our translation automation solutions are already in use internally at Pangeanic for major localization accounts that come from these Asian partners, especially those in the automotive sector.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pangeanic.com/2010/05/31/pangeanic-to-visit-associated-lsp-companies-in-japan-and-china/?lang=en/feed/&amp;lang=en</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PangeaMT to be presented in Tokyo</title>
		<link>http://www.pangeanic.com/2010/04/10/pangeamt-to-be-presented-in-tokyo/?lang=en</link>
		<comments>http://www.pangeanic.com/2010/04/10/pangeamt-to-be-presented-in-tokyo/?lang=en#comments</comments>
		<pubDate>Sat, 10 Apr 2010 09:12:19 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.pangeanic.com/?p=532</guid>
		<description><![CDATA[PangeaMT will be introduced in Japan at the TAUS Tokyo Summit, which runs from 14 to 16 April 2010.
The Tokyo summit will be the first of its kind in the country and will focus strongly on use cases and practical applications of MT in localization workflows for Japanese industries and the Japanese language.
PangeaMT will feature as [...]]]></description>
			<content:encoded><![CDATA[<p>PangeaMT will be introduced in Japan at the <a href="http://translationautomation.com/events/forums/taus-executive-forum-localization-business-innovation-focus-on-asia.html">TAUS Tokyo Summit</a>, which runs from 14 to 16 April 2010.<br />
The Tokyo summit will be the first of its kind in the country and will focus strongly on use cases and practical applications of MT in localization workflows for Japanese industries and the Japanese language.<br />
PangeaMT will feature as a leader in open-standards implementation, with a strong focus on compatibility through its TMX and XLIFF workflows.<br />
If you would like to speak to a representative of PangeaMT, please <a href="mailto:eyuste@pangea.com.mt">email us</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pangeanic.com/2010/04/10/pangeamt-to-be-presented-in-tokyo/?lang=en/feed/&amp;lang=en</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>FEDER funds award to develop Statistical Machine Translation</title>
		<link>http://www.pangeanic.com/2010/01/15/feder-funds-award-to-develop-statistical-machine-translation/?lang=en</link>
		<comments>http://www.pangeanic.com/2010/01/15/feder-funds-award-to-develop-statistical-machine-translation/?lang=en#comments</comments>
		<pubDate>Fri, 15 Jan 2010 11:12:49 +0000</pubDate>
		<dc:creator>Pangeanic</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.pangeanic.com/?p=509</guid>
		<description><![CDATA[Pangeanic has been awarded EU funds under the FEDER programme and through IMPIVA, part of Valencia&#8217;s regional government, to develop English-Spanish Statistical Machine Translation prototypes.
The award number is IMIDTA/2009/741. For Pangeanic, this marks the beginning of a series of developments into other European languages and language combinations to serve both industry and institutions.
The award corroborates the company&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>Pangeanic has been awarded EU funds under the FEDER programme and through IMPIVA, part of Valencia&#8217;s regional government, to develop English-Spanish Statistical Machine Translation prototypes.</p>
<p>The award number is IMIDTA/2009/741. For Pangeanic, this marks the beginning of a series of developments into other European languages and language combinations to serve both industry and institutions.</p>
<p>The award corroborates the company&#8217;s long-term drive to implement, develop and offer customized translation automation solutions that speed up multilingual translation and cut its costs.</p>
<p><img class="alignnone size-full wp-image-511" src="http://www.pangeanic.com/wp-content/uploads/2010/01/Feder.png" alt="Feder" width="50" height="35" /> <img class="alignnone size-full wp-image-510" src="http://www.pangeanic.com/wp-content/uploads/2010/01/inpiva.jpg" alt="inpiva" width="98" height="35" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.pangeanic.com/2010/01/15/feder-funds-award-to-develop-statistical-machine-translation/?lang=en/feed/&amp;lang=en</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pangeanic only Spanish LSP to be mentioned in EU report</title>
		<link>http://www.pangeanic.com/2009/12/17/pangeanic-only-spanish-lsp-to-be-mentioned-in-eu-report/?lang=en</link>
		<comments>http://www.pangeanic.com/2009/12/17/pangeanic-only-spanish-lsp-to-be-mentioned-in-eu-report/?lang=en#comments</comments>
		<pubDate>Thu, 17 Dec 2009 19:34:19 +0000</pubDate>
		<dc:creator>Pangeanic</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.pangeanic.com/?p=485</guid>
		<description><![CDATA[Pangeanic is mentioned in the recent EU report &#8220;Studies on translation and multilingualism &#8211; The size of the language industry in the EU&#8221; (p. 83) as one of very few LSPs that are embracing technology and leading the way in deploying and tuning open-source machine translation (MT) solutions for particular needs.
The strategy of investing in [...]]]></description>
			<content:encoded><![CDATA[<p>Pangeanic is mentioned in the recent EU report <a href="http://ec.europa.eu/dgs/translation/publications/studies/size_of_language_industry_en.pdf" target="_blank">&#8220;Studies on translation and multilingualism &#8211; The size of the language industry in the EU&#8221;</a> (p. 83) as one of very few LSPs that are embracing technology and leading the way in deploying and tuning open-source machine translation (MT) solutions for particular needs.</p>
<p>The report highlights the strategy of investing in people who can bring new skills and customize fit-for-purpose solutions for particular machine-translation applications, and it places importance on access to data through initiatives such as <a href="http://www.tausdata.org" target="_blank">TDA</a>.</p>
<p>While large data sets do not automatically translate into perfect statistical machine translation engines, selecting and customizing TMs and other data is one of the keys to developments designed to accelerate language transfer, bringing time and cost savings to companies.</p>
<p>The study also reports on large LSPs such as SDL and Lionbridge and the possible MT &#8220;lock&#8221; strategies behind their marketing.</p>
<p><a href="Pangeanic.com.MT" target="_blank">Pangeanic.com.MT</a> is Pangeanic&#8217;s statistical machine translation division, whose mission is to build and adapt SMT solutions that work effectively in particular applications and domains, using client data and customized data sets.</p>
<p>The report can be downloaded from the link below.</p>
<p><a href="http://ec.europa.eu/dgs/translation/publications/studies/size_of_language_industry_en.pdf" target="_blank">http://ec.europa.eu/dgs/translation/publications/studies/size_of_language_industry_en.pdf</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.pangeanic.com/2009/12/17/pangeanic-only-spanish-lsp-to-be-mentioned-in-eu-report/?lang=en/feed/&amp;lang=en</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PangeaMT with TDA tests provide up to 50% more</title>
		<link>http://www.pangeanic.com/2009/10/12/pangeamt-with-tda-tests-provide-up-to-50-more/?lang=en</link>
		<comments>http://www.pangeanic.com/2009/10/12/pangeamt-with-tda-tests-provide-up-to-50-more/?lang=en#comments</comments>
		<pubDate>Mon, 12 Oct 2009 14:57:53 +0000</pubDate>
		<dc:creator>Pangeanic</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://localhost/wordpress_pangeanic/?p=364</guid>
		<description><![CDATA[Valencia, 1 October 2009.
In late September, Pangeanic conducted a series of tests with PangeaMT for specific language domains, combining its own statistical data with data obtained from TAUS&#8217;s TDA. The aim of the test was to show that larger amounts of reliable, consistent data from TDA would help Pangeanic&#8217;s own [...]]]></description>
			<content:encoded><![CDATA[<p align="left">Valencia, 1 October 2009.</p>
<p align="left">In late September, Pangeanic conducted a series of tests with PangeaMT<a name="PangeanicMT"></a> for specific language domains, combining its own statistical data with data obtained from <a href="http://www.translationautomation.com/">TAUS</a>&#8217;s <a href="http://www.tausdata.org/">TDA</a>. The aim of the test was to show that larger amounts of reliable, consistent data from TDA would help Pangeanic&#8217;s own technologies to improve output quality scores and to open up new domain developments.</p>
<p align="left"><span style="font-family: Arial, sans-serif; color: #ff6633;"><span style="font-size: medium;"><span style="text-decoration: underline;"><strong>Background</strong></span></span></span></p>
<p align="left">Version 1 was developed mainly for the technical/engineering, electronics and automotive industries, covering general texts, user manuals and scientific journal publications. Version 2 (PangeaMT) builds on that experience and adds several new areas: Software (SOF), Consumer and Professional Electronics + Computer Hardware (ECH), Marketing-Business-Economics (MBE), Legal-Pro (LEG) and Healthcare-Pharma-Life Sciences (HEALTH).</p>
<p align="left"><a name="pangeanicMT"></a>PangeaMT is based on a Moses engine enhanced with a set of heuristics tailored to each language. The translation process is fully TMX-based. The idea is for SMT to act as a plug-in to existing systems, not as an alternative solution or technology. PangeaMT also integrates a parser that interprets the code/tags in the TMX and places them in the resulting translated segment. Post-editing can take place in any environment, which makes this an application-agnostic SMT plug-in.</p>
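<p align="left">The post does not explain how PangeaMT&#8217;s parser carries tags across to the translated segment. One common approach, sketched here in Python purely as a hypothetical illustration, is to swap each inline tag for a numbered placeholder before translation and restore the original tags afterwards. The regular expression, the __TAGn__ placeholder format and the function names are assumptions, and the sketch relies on the engine copying placeholders through unchanged.</p>

```python
import re

# Hypothetical inline-tag pattern: anything from "<" to the next ">".
# Assumes well-formed tags with no ">" inside attribute values.
TAG_RE = re.compile(r"<[^>]+>")
# Hypothetical placeholder format that the MT engine is expected to copy through.
PLACEHOLDER_RE = re.compile(r"__TAG(\d+)__")


def mask_tags(segment):
    """Replace each inline tag with a numbered placeholder; return text and tags."""
    tags = []

    def to_placeholder(match):
        tags.append(match.group(0))
        return f"__TAG{len(tags) - 1}__"

    return TAG_RE.sub(to_placeholder, segment), tags


def restore_tags(translated, tags):
    """Put the original tags back wherever their placeholders ended up."""
    return PLACEHOLDER_RE.sub(lambda m: tags[int(m.group(1))], translated)
```

<p align="left">Any placeholder that the engine drops or garbles is simply lost on restoration, which is one reason a production parser would need additional checks.</p>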
<p align="left"><span style="font-family: Arial, sans-serif; color: #ff6633;"><span style="font-size: medium;"><span style="text-decoration: underline;"><strong>Data</strong></span></span></span></p>
<p align="left">Three domains were selected for the test in the English-Spanish language pair (with no distinction between Latin American and European Spanish), with the following numbers of TMX files:</p>
<ul>
<li>ECH (Electronics-Computer Hardware): 800 TMX files</li>
<li>MBE (Marketing-Business-Economics): 76 TMX files</li>
<li>SOF (Software): 80 TMX files</li>
</ul>
<p align="left">Data sets were selected according to the following criteria:</p>
<p align="left">a) Language model to follow</p>
<p align="left">b) TDA data availability</p>
<p align="left">c) Subject field</p>
<p align="left"><span style="text-decoration: underline;"><strong>ELECTRONICS – COMPUTER HARDWARE</strong></span></p>
<p align="left">The aim was to improve on existing engines (Electronics). To this end, Spanish TDA data from Intel and Dell was added to existing data sets from Sony. Not all the TDA data available from particular donors was deemed fit for the customized training; some was discarded for a variety of reasons. Client-specific terminology was applied to the original donors&#8217; data sets to standardize terminology. Pangeanic also contributed small sets of self-generated data. The result was a medium-sized, 3.9M-word engine designed specifically for the field of application, with the client&#8217;s terminology applied through the donors&#8217; TMX files to ease post-editing.</p>
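<p align="left">The post does not say how client-specific terminology was applied to the donors&#8217; data. A minimal Python sketch, assuming a simple donor-term to client-term glossary applied to target segments (the glossary entries and the function name are hypothetical), could replace all terms in a single pass so that a replacement is never itself replaced again:</p>

```python
import re


def apply_terminology(segment, glossary):
    """Replace donor terms with the client's preferred terms in one pass.

    glossary maps donor term -> client term. Longer terms are tried first so a
    multi-word entry wins over a shorter term it contains, and word boundaries
    stop partial-word matches (e.g. inside a plural form).
    """
    if not glossary:
        return segment
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b")
    return pattern.sub(lambda m: glossary[m.group(0)], segment)
```

<p align="left">With a hypothetical entry mapping &#8220;ordenador&#8221; to &#8220;equipo&#8221;, the plural &#8220;ordenadores&#8221; is left untouched by the word-boundary check; real terminology work would also need to handle inflection and capitalization.</p>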
<p align="left">The data set for electronics was:</p>
<p align="left"><img class="alignnone size-full wp-image-399" title="2009-09__m21a22d76" src="http://www.pangeanic.com/wp-content/uploads/2009/10/2009-09__m21a22d76.gif" alt="2009-09__m21a22d76" width="364" height="126" /></p>
<p align="left"><strong>SOFTWARE</strong></p>
<p align="left">The aim of this development was to build a new engine using only TDA data in the subject field of a potential client, to offer a solution that would show enough ROI for our SMT as a plug-in. To this end, we selected TDA data from several software donors in a subject field related to the client&#8217;s product lines. We did not include Microsoft data initially, as the size of that TM would have biased the engine towards Microsoft terminology. However, adding it is not ruled out for future or more general releases. Again, not all the TDA data available from particular donors was used in the customized training: some data was discarded, and Pangeanic contributed small sets of self-generated data.</p>
<p align="left">The data set for software was:</p>
<p align="left"><img class="alignnone size-full wp-image-400" title="2009-09__m40c55628" src="http://www.pangeanic.com/wp-content/uploads/2009/10/2009-09__m40c55628.gif" alt="2009-09__m40c55628" width="364" height="162" /></p>
<p align="left"><strong>MARKETING-BUSINESS-ECONOMICS</strong></p>
<p align="left">The aim of this development was to build a first test-bench engine to serve as a business case in an uncontrolled, general field that has usually been treated as “a work of literature” and left out of the scope of traditional MT systems (particularly rule-based MT). Marketing and economics texts go beyond plain everyday language: they can be elaborate and complex, and sometimes flowery or metaphorical. Again, the aim was to offer a solution that would show enough ROI for our SMT as a plug-in. The client did not provide enough training data, and TDA did not offer enough related material in bulk for this purpose. In this case, showing some results was more important than finalizing a large engine.</p>
<p align="left">The data set for marketing-business-economics was:</p>
<p align="left"><img class="alignnone size-full wp-image-401" title="2009-09__3cf5d5a2" src="http://www.pangeanic.com/wp-content/uploads/2009/10/2009-09__3cf5d5a2.gif" alt="2009-09__3cf5d5a2" width="364" height="126" /></p>
<p align="left"><span style="text-decoration: underline;"><strong>Process</strong></span></p>
<p align="left">The tables below describe the training process. They show that sentence length increases from domain to domain and that 2,000 representative segments (just over 20,000 words in each of the three cases) were held out of the training so that they could be used in the tests (BLEU/METEOR scores). Some test sentences happened to be identical to sentences in the training data (18, 12 and 2, respectively), mostly because of the nature of the source files (user manuals and, in some cases, software strings/commands, which contain a certain amount of repetition).</p>
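<p align="left">The post does not say how the 2,000 representative segments were chosen. As a simplified, hypothetical sketch, a random hold-out split, plus a check for test sentences that also occur verbatim in the training data (the kind of overlap counted above), could look like this in Python. Random sampling is an assumption: a &#8220;representative&#8221; selection may well have been done differently, and the function names are illustrative only.</p>

```python
import random


def split_held_out(pairs, test_size=2000, seed=0):
    """Randomly hold out test_size segment pairs and keep the rest for training."""
    if test_size > len(pairs):
        raise ValueError("test_size is larger than the corpus")
    rng = random.Random(seed)
    held_out = set(rng.sample(range(len(pairs)), test_size))
    train = [p for i, p in enumerate(pairs) if i not in held_out]
    test = [p for i, p in enumerate(pairs) if i in held_out]
    return train, test


def count_overlap(train, test):
    """Count held-out source sentences that also appear verbatim in training."""
    train_sources = {src for src, _ in train}
    return sum(1 for src, _ in test if src in train_sources)
```

<p align="left">Repeated strings in user manuals or software resources end up on both sides of such a split, which is why a small non-zero overlap is expected even when no segment is deliberately reused.</p>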
<p align="left">Perplexity measures how well the language model predicts the test set; it gives an idea of the complexity of the task and of how similar the test data is to the training data. The higher the perplexity, the harder the task.</p>
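<p align="left">As a worked illustration of that definition (not the tool used in these tests), perplexity is the exponential of the average negative log-probability that the language model assigns to each test token. The Python sketch assumes the per-token probabilities from the 5-gram model are already available; computing them is not shown, and the function name is illustrative.</p>

```python
import math


def perplexity(token_probs):
    """Perplexity of a test text given the model's probability for each token.

    PP = exp(-(1/N) * sum(ln p_i)). Lower values mean the test text is less
    surprising to the language model, i.e. closer to its training data.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))
```

<p align="left">For example, a model that assigns every token a probability of 0.25 has a perplexity of 4, as if it were choosing uniformly among four options at every step.</p>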
<p align="left"><img class="alignnone size-full wp-image-403" title="2009-09__mb136cc4" src="http://www.pangeanic.com/wp-content/uploads/2009/10/2009-09__mb136cc4.gif" alt="2009-09__mb136cc4" width="449" height="227" /></p>
<p align="left"><img class="alignnone size-full wp-image-404" title="2009-09__7327d93a" src="http://www.pangeanic.com/wp-content/uploads/2009/10/2009-09__7327d93a.gif" alt="2009-09__7327d93a" width="449" height="244" /></p>
<p align="left"><img class="alignnone size-full wp-image-405" title="2009-09__mcfe648c" src="http://www.pangeanic.com/wp-content/uploads/2009/10/2009-09__mcfe648c.gif" alt="2009-09__mcfe648c" width="449" height="244" /></p>
<p align="left"><span style="text-decoration: underline;"><strong>Results</strong></span></p>
<p align="left">Model training + optimization: Moses + MERT</p>
<p align="left">Language models: 5-grams</p>
<p align="left">Translation results, English-&gt;Spanish:</p>
<table>
<tr><th>Domain</th><th>TMX files</th><th>BLEU</th><th>METEOR 0.8.3</th></tr>
<tr><td>ECH</td><td>800</td><td>49.98</td><td>0.4312</td></tr>
<tr><td>MBE</td><td>76</td><td>24.39</td><td>0.2610</td></tr>
<tr><td>SOF</td><td>80</td><td>47.78</td><td>0.4377</td></tr>
</table>
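<p align="left">For readers unfamiliar with BLEU, the following Python sketch is a simplified re-implementation of the standard corpus-level definition: clipped n-gram precisions up to 4-grams, combined by a geometric mean and multiplied by a brevity penalty, reported on a 0-100 scale. It assumes pre-tokenized text and a single reference per segment, applies no smoothing, and is not the scoring script used in these tests; scores from different tools and tokenizations are not directly comparable.</p>

```python
import math
from collections import Counter


def ngrams(tokens, n):
    # Count every contiguous n-token sequence in the segment.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) for tokenized hypotheses, one reference each.

    Clipped n-gram precisions for n = 1..max_n are combined with a geometric
    mean and multiplied by the brevity penalty. No smoothing is applied, so a
    corpus with zero matches at some order scores 0.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

<p align="left">A hypothesis identical to its reference scores 100, while a hypothesis that is correct but too short is penalized by the brevity term rather than by the precisions.</p>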
<p align="left">The best-scoring domain in BLEU is Electronics-Computer Hardware, with almost 50% in BLEU and 43% in METEOR.</p>
<p align="left">Results in Software are also very high (47.78% in BLEU and 43.77% in METEOR, the highest METEOR score of the three).</p>
<p align="left">Software is a new domain for our development, and we used TDA data almost exclusively.</p>
<p align="left">Marketing-Business-Economics lags behind, with around 25% in both metrics. Specific, “imaginative” marketing TMs carry a lot of weight here, and there is less content from TDA. Marketing literature is, by definition, not necessarily as precise as the other two fields, which use fairly controlled language. The engine was a first step: a test development still to be enhanced with further data.</p>
<p align="left">Nevertheless, the results surpass our expectations. BLEU scores of around 50% can translate into large increases in language production. Even the 25% obtained as an initial result for marketing is a starting point with a lot of room for improvement once more data is available.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pangeanic.com/2009/10/12/pangeamt-with-tda-tests-provide-up-to-50-more/?lang=en/feed/&amp;lang=en</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Improving the quality of a customized SMT system using shared training data</title>
		<link>http://www.pangeanic.com/2009/09/19/improving-the-quality-of-a-customized-smt-system-using-shared-training-data/?lang=en</link>
		<comments>http://www.pangeanic.com/2009/09/19/improving-the-quality-of-a-customized-smt-system-using-shared-training-data/?lang=en#comments</comments>
		<pubDate>Sat, 19 Sep 2009 15:02:10 +0000</pubDate>
		<dc:creator>Pangeanic</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://localhost/wordpress_pangeanic/2009/09/19/improving-the-quality-of-a-customized-smt-system-using-shared-training-data/</guid>
		<description><![CDATA[
At the MT Summit in Ottawa (August 28, 2009), Microsoft’s Chris Wendt presented the findings from a recent pilot project using translation memories from more than ten TDA members to train the Microsoft statistical machine translation engine.
Main tests were performed in two languages: Chinese and German, with customization for Sybase iAnywhere. Additional tests also were [...]]]></description>
			<content:encoded><![CDATA[<div>
<p>At the MT Summit in Ottawa (August 28, 2009), Microsoft’s Chris Wendt presented the findings from a recent pilot project using translation memories from more than ten <a href="http://www.tausdata.org" target="_blank">TDA</a> members to train the Microsoft statistical machine translation engine.</p>
<p>The main tests were performed in two languages, Chinese and German, with customization for Sybase iAnywhere. Additional tests were also run on Polish and Japanese, with customization for Adobe and Dell.</p>
<p>BLEU scores rose significantly, with increases of between 22% and 74% compared with engines trained purely on Microsoft data or generally available data.</p>
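<p>The summary does not state whether the 22% to 74% figures are relative or absolute BLEU increases. Assuming they are relative to the baseline engine&#8217;s score (an assumption, not something the presentation summary confirms), the arithmetic is a simple percentage change; the function name below is illustrative.</p>

```python
def relative_increase(baseline, improved):
    """Percentage increase of a score over a baseline score."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return 100.0 * (improved - baseline) / baseline
```

<p>Under that reading, a hypothetical baseline of 20 BLEU points rising to 34.8 would be a 74% increase, a gain of 14.8 points in absolute terms.</p>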
<p>The presentation is available at the following link:</p>
<p><a href="http://www.slideshare.net/TAUS/improving-the-quality-of-a-customized-smt-system-using-shared-training-data-1965125">Improving the quality of a customized SMT system using shared training data</a></p></div>
]]></content:encoded>
			<wfw:commentRss>http://www.pangeanic.com/2009/09/19/improving-the-quality-of-a-customized-smt-system-using-shared-training-data/?lang=en/feed/&amp;lang=en</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
