Free/open-source machine translation software
Here’s a non-exhaustive list of links to existing free/open-source machine translation systems, which I will try to complete as I find out about them. To the best of my knowledge, software listed here has:
- either a free license as defined by the Free Software Foundation,
- or an open-source license as defined by the Open Source Initiative.
Rule-based machine translation systems
- Apertium, a free/open-source rule-based machine translation platform.
- Matxin, a free/open-source rule-based machine translation system for Basque.
- OpenLogos, a free/open-source version of the historical Logos machine translation system.
- Anusaaraka, an English-Hindi machine translation system.
Statistical machine translation systems
- Moses, a statistical machine translation system.
- Marie, an n-gram-based statistical machine translation decoder.
- Joshua, an open-source decoder for statistical translation models based on synchronous context-free grammars.
- Phramer, an open-source statistical phrase-based machine translation decoder.
- GREAT, a decoder based on stochastic finite-state transducers, which includes a training toolkit.
Training translation models
- Giza++ is a tool to train translation models for statistical machine translation (see also the related mkcls tool to train word classes).
- Thot is a toolkit to train phrase-based models for statistical machine translation.
- IRSTLM, a free/open-source language modelling toolkit that can be used with Moses instead of SRILM, which is not free.
- RandLM, space-efficient n-gram-based language models built using randomized representations (Bloom filters, etc.).
- Kenneth Heafield’s software for the fast filtering of ARPA format language models to multiple vocabularies.
- Holger Schwenk’s Continuous Space Language Model toolkit (CSLM) works by projecting the word indices onto a continuous space and using a probability estimator operating on this space.
- Kenneth Heafield’s scripts that make it easy to score machine translation output with NIST’s implementations of the BLEU and NIST metrics, as well as with TER and METEOR.
- RIA is a tool for the automatic induction of transfer rules for transfer-based statistical machine translation using dependency structures.
- Chaski, a distributed phrase-based machine translation training tool based on Hadoop.
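Several of the tools above report BLEU scores. As a rough illustration of what that metric computes (this is not Heafield’s scripts or NIST’s scorer), here is a minimal Python sketch assuming a single reference per sentence, whitespace tokenization and no smoothing: clipped n-gram precisions for n = 1 to 4 are combined as a geometric mean and multiplied by a brevity penalty. The function names `ngrams` and `sentence_bleu` are hypothetical, chosen only for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed BLEU of one hypothesis against one reference.

    Assumptions (not from any particular tool): whitespace tokenization,
    a single reference, no smoothing, sentence-level scoring.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if total == 0 or matches == 0:
            # Without smoothing, any zero precision makes the score 0.
            return 0.0
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

For example, `sentence_bleu("the cat sat on the", "the cat sat on the mat")` has all four precisions equal to 1, so the score is just the brevity penalty, exp(1 - 6/5) ≈ 0.82. Standard scorers usually aggregate n-gram counts over a whole test corpus before taking ratios, so sentence-level scores like this one are mainly useful for intuition.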
Example-based machine translation systems
- The Cunei machine translation platform, an example-based machine translation system.
- The CMU Example-Based Machine Translation System.
- The Tilburg University phrase-based memory-based machine translation system.
- The DCU OpenMaTrEx marker-driven example-based machine translation system (partially released before as Marclator).
Multi-engine machine translation / system combination
- MANY, an open-source machine translation system combination tool.
- Kenneth Heafield’s multi-engine machine translation system.
Aligners and translation models
- Giza++: training of statistical translation models.
- Anymalign, a multilingual sub-sentential aligner.
- Ventsislav Zhechev’s sub-tree aligner, which can be used for the automatic generation of parallel treebanks.
Web services around machine translation
- Tradubi is an open-source Ajax-based web application for social translation built upon Apertium (may be tested online).
Distributed machine translation
- ScaleMT (no release yet; the code can be browsed in the Apertium Subversion repository) is a free/open-source framework for building scalable machine translation web services.
Other useful tools
… that may be used to build machine translation systems
- Freeling, a free/open-source suite of language analyzers.
- Bitextor, an automatic bitext harvester.
- Foma, a finite-state machine toolkit and library.
- HFST, Helsinki Finite State Technology for natural-language morphologies.
- VISL CG-3, the constraint grammar parser of the Visual Interactive Syntax Learning project at Syddansk Universitet; its Subversion repository can be browsed and source snapshots are available.