MARIE
Ngra(m)-based Statistical M(a)chine T(r)anslat(i)on d(e)coder
MARIE is an Ngram-based statistical machine translation decoder, intended to be helpful to the research community in the field of Statistical Machine Translation (SMT). It was developed at the TALP Research Center of the Universitat Politècnica de Catalunya (UPC) by Josep M. Crego as part of his PhD thesis, with the aid of Adrià de Gispert and under the advice of Professor José B. Mariño.
Description
The MARIE decoder can perform statistical machine translation when supplied with at least a translation model. It was specially designed to work with tuples (bilingual translation units) and a translation model learnt as a typical Ngram language model (Ngram-based SMT). Nevertheless, MARIE can also use phrases (bilingual translation units) and behave as a typical phrase-based decoder (phrase-based SMT).
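As a rough illustration of what "a translation model learnt as a typical Ngram language model" means, the tuple-based model is commonly written as follows. The notation is an assumption made here for illustration and is not taken from the MARIE manual: the sentence pair (s, t) is assumed to be segmented into K tuples, (s, t)_k denotes the k-th tuple, and N is the Ngram order.

```latex
p(s, t) \approx \prod_{k=1}^{K} p\big( (s,t)_k \mid (s,t)_{k-N+1}, \dots, (s,t)_{k-1} \big)
```

Under this reading, each tuple is scored given the N-1 tuples that precede it, exactly as a word would be in a monolingual Ngram language model.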
To produce better translations, the decoder can also make use of a target language model, a reordering model, a word penalty and any additional translation models, all of which are introduced in the search through a log-linear combination of models.
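The log-linear combination mentioned above can be sketched as a weighted sum of per-model log-scores, with the best hypothesis being the one with the highest total. This is only a minimal illustration and not MARIE's actual code: the function names, feature names, weights and scores below are all hypothetical, and a real decoder applies this scoring incrementally during search rather than to a finished list of hypotheses.

```python
def log_linear_score(feature_scores, weights):
    """Weighted sum of feature scores (log-probabilities or penalties)."""
    return sum(weights[name] * feature_scores[name] for name in weights)


def best_hypothesis(hypotheses, weights):
    """Return the hypothesis string with the highest log-linear score."""
    return max(hypotheses, key=lambda h: log_linear_score(hypotheses[h], weights))


# Hypothetical model weights (in practice these are tuned on held-out data).
weights = {
    "translation_model": 1.0,
    "target_lm": 0.5,
    "reordering": 0.5,
    "word_penalty": 0.1,
}

# Hypothetical feature values for two candidate translations.
# Model scores are log-probabilities; word_penalty is the target word count.
hypotheses = {
    "the house is small": {
        "translation_model": -2.0,
        "target_lm": -3.0,
        "reordering": 0.0,
        "word_penalty": 4,
    },
    "small is the house": {
        "translation_model": -2.0,
        "target_lm": -6.0,
        "reordering": -1.5,
        "word_penalty": 4,
    },
}

if __name__ == "__main__":
    for text, feats in hypotheses.items():
        print(text, round(log_linear_score(feats, weights), 2))
    print("best:", best_hypothesis(hypotheses, weights))
```

In this toy setting the two candidates share the same translation model score, so the target language model and the reordering model decide between them.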
Tools for building language models are freely available (we recommend the SRI Language Modeling Toolkit). Methods for learning translation models are described in current research papers on SMT.
The decoder is released with a manual that describes its usage and inner workings. Details of the decoder have also been presented at the following international conference (please cite this paper when referring to the MARIE decoder):
How to download
MARIE can be downloaded free of charge under the GNU General Public License. Enter your email address (it will be used to notify you of future software upgrades) and you will be redirected to the web page where you can download it.
Acknowledgements
This work has been partially supported by the Spanish government, under grant TIC-2002-04447-C02 (Aliado Project), the European Union, under grant FP6-506738 (TC-STAR project), and the Universitat Politècnica de Catalunya (UPC-RECERCA grant).
We would also like to thank the other members of the SMT group in the Signal Theory and Communications Department of the UPC for their comments, suggestions and contributions to the development and testing work: Patrick Lambert, Rafael Banchs, Marta Ruiz and José A. R. Fonollosa.
Send your comments and suggestions to Josep M. Crego.
TALP Research Center
Universitat Politècnica de Catalunya (UPC)
September 24th, 2005. Barcelona