Many thanks to everyone who took part.

First International Workshop on Free/Open-Source Rule-Based Machine Translation

2nd–3rd November

Universitat d'Alacant,
Alacant (Spain)



The free/open-source software movement has arrived in the field of machine translation. Machine translation is special in that, in addition to specific algorithms, it depends heavily on extensive language-dependent data. Therefore, not only the engine and the tools used to manage these data have to be free/open-source, but also the data themselves. Many machine translation packages of this type are available, but most of them are corpus-based, and in particular statistical, machine translation systems: rule-based systems built on these principles are still little known and little used.

There are distinct advantages to using free/open-source licences for rule-based machine translation. Linguistic knowledge for a language pair is encoded explicitly in the form of linguistic data, so that both humans and the machine translation engine can process it. This makes that knowledge naturally available for building resources for other language pairs, or even for human language technologies other than machine translation; conversely, linguistic knowledge from other sources may be reused to build machine translation systems. The free and open scenario makes this reuse easier and, if copylefted licences are used, builds a commons of knowledge and resources that benefits all the language communities involved. These advantages are even clearer for less-resourced languages, for which large bilingual corpora are not available, and for morphologically rich languages, which suffer from data sparseness even when large corpora exist.

This workshop aims to bring together researchers and developers in the field of rule-based machine translation who have decided to board the free/open-source train and are actively contributing to that commons of explicit knowledge: machine translation rules and dictionaries, and machine translation systems whose behaviour is transparent and clearly traceable through their explicit logic.


The main areas of interest for the workshop are as follows. Note that this is intended only as a guideline, and we welcome submissions on other aspects of free/open-source rule-based machine translation.

Important dates:


All submissions should be made through the conference management system, at the following URL:

Submissions should describe original work, completed or in progress; be anonymous (no authors, affiliations or addresses, and no explicit self-reference); be no longer than eight (8) A4 pages; and be in PDF format. Initial versions of papers must conform to the conference format, which can be found in freerbmt09.tar.gz.

Where a submission discusses software or data, the final version must include information on how both the software and the data can be publicly accessed. The software and data should be clearly licensed under an approved licence. A list of free software licences may be found at

Proceedings will be published as part of the open-access repository of Universitat d'Alacant.

If you encounter any problems with your submission, please do not hesitate to contact the organisers.


In order to register for the workshop, the registration form should be filled out and emailed to Along with the form, a fee of 50€ should be paid by bank transfer to the Escuela de Negocios. Fundación General Universidad de Alicante (IBAN: ES54 2090 3191 14 0040113031; SWIFT: CAAMES2A). Any transfer charges must be paid by the sender.

The name of the bank, in case you need it, is "Caja Mediterráneo" and their address is:

Campus Universitario
Centro de Servicios, Local 1
03690, San Vicente del Raspeig, Spain


The workshop will take place at the Departament de Llenguatges i Sistemes Informàtics (DLSI) at the Universitat d'Alacant. Once in the Politecnica 4 building, look for the Salón de Actos room in the basement.

For accommodation we recommend the Villa Universitaria, which is located right next to the university (contact: or +34 902 10 19 29). Accommodation is also available in the city, but please bear in mind that the bus journey between the city and the university takes around 30 minutes.

Further information about Alicante can be found on Wikipedia and Wikitravel.

Invited speakers:

Amba Kulkarni, "Anusaaraka: An accessor cum Machine Translator"
Since India is multilingual, there is demand for translation both among Indian languages and from English into Indian languages. As machine translation is not fully reliable, Anusaaraka aims to provide complete access to the source text in addition to a translation. With an appropriate division of load between human and machine, the Kannada-Hindi Anusaaraka, developed in the early 1990s, demonstrated that it is possible to reduce the language barrier considerably. However, an Anusaaraka reader needs some training in the syntactic divergences between the source and target languages, and in the special notation used to handle divergences in word-meaning mappings between them. In the later version of Anusaaraka, to reduce the burden on the user, a state-of-the-art MT system was incorporated as an important component. Care was taken to design the architecture so that it can cater to diverse requirements, ranging from faithful access to the source text to full-fledged translation.
Kepa Sarasola, "Matxin: developing sustainable MT for a less-resourced language"
Following the strategy defined in the IXA group for reusing linguistic resources and NLP tools, in 2000 (and not before) we decided that we had enough language resources and tools (bilingual dictionaries, morphological and syntactic analysers, and parsers) that could be reused to build an RBMT system for the Spanish-Basque pair. The resulting system is called Matxin and is available at Since 2006 we have been collaborating with DCU to build a Spanish-Basque system based on the EBMT and SMT paradigms. Better results could be obtained with a larger parallel corpus, but such corpora are difficult to obtain for Basque, a minority language. Based on our work, we have published a strategy for sustainable MT for less-resourced languages, built on incremental design, reusability, standardisation and open source. We have developed MT engines based on all three paradigms (RBMT, SMT and EBMT), which puts us in an ideal position to experiment with hybrid and multi-engine systems.


The programme for the workshop can be downloaded here.


All the papers included in the proceedings can be found at the Open Access Repository of Universitat d'Alacant.



Programme committee:


If you have any questions at all, please feel free to contact the organisers at

Supported by:

DLSI, Transducens, Prompsit