Here are some resources that I have collected and written for free (as in free software) computational linguistics in Afrikaans. Most of them are used, or will be used, with the Apertium machine translation system for the English-Afrikaans language pair.
The morphological analyser is specified in an XML format and compiled into a finite-state machine which can be read by the lt-proc utility. For any given input surface form, it outputs all of the possible analyses (lexical forms) which can be found in the dictionary. For example:
$ echo "aan die deur" | lt-proc af-en.automorf.bin
^aan/aan<pr>$ ^die/die<det><def><sg>/die<det><def><sp>/die<det><def><pl>$ ^deur/deur<n><sg>/deur<pr>$
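The output above can be post-processed with a few lines of Python. The following is only a minimal sketch: the function name parse_analyses is hypothetical, and it assumes the simple case shown in the example (no escaped characters, no unknown-word marks, no multiword units), so it does not cover the full stream format.

```python
import re

# One lexical unit looks like ^surface/analysis1/analysis2$
# (assumption: no escaped "/", "^" or "$" inside the unit).
UNIT_RE = re.compile(r"\^([^/$^]*)/([^$]*)\$")
TAG_RE = re.compile(r"<([^>]+)>")


def parse_analyses(stream):
    """Return a list of (surface, [(lemma, [tags]), ...]) tuples."""
    units = []
    for surface, rest in UNIT_RE.findall(stream):
        analyses = []
        for analysis in rest.split("/"):
            lemma = analysis.split("<", 1)[0]   # text before the first tag
            tags = TAG_RE.findall(analysis)     # tag names without brackets
            analyses.append((lemma, tags))
        units.append((surface, analyses))
    return units


line = ("^aan/aan<pr>$ ^die/die<det><def><sg>/die<det><def><sp>"
        "/die<det><def><pl>$ ^deur/deur<n><sg>/deur<pr>$")

for surface, analyses in parse_analyses(line):
    print(surface, analyses)
```

Here "deur" comes back with two analyses, a singular noun and a preposition, which is exactly the kind of ambiguity the tagger has to resolve.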
The analyser currently has around 11,000 lemmata and can analyse around 20,000 surface forms. You can try it online here.
The part-of-speech tagger was trained in an unsupervised manner on the database dump of the Afrikaans Wikipedia. As input it takes the output of the morphological analyser (see above). For any given set of input analyses, it outputs the most likely analysis for the word in context. For example:
$ echo "Die man klop aan die deur." | lt-proc af-en.automorf.bin | apertium-tagger -g af-en.prob
^Die<det><def><sg>$ ^man<n><sg>$ ^klop<vblex><pres>$ ^aan<pr>$ ^die<det><def><sg>$ ^deur<n><sg>$^.<sent>$
$ echo "Die man stap deur die gang." | lt-proc af-en.automorf.bin | apertium-tagger -g af-en.prob
^Die<det><def><sg>$ ^man<n><sg>$ ^stap<n><sg>$ ^deur<pr>$ ^die<det><def><sg>$ ^gang<n><sg>$^.<sent>$
$ echo "Die man loop deur die deur." | lt-proc af-en.automorf.bin | apertium-tagger -g af-en.prob
^Die<det><def><sg>$ ^man<n><sg>$ ^loop<vblex><pres>$ ^deur<pr>$ ^die<det><def><sg>$ ^deur<n><sg>$^.<sent>$
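To see how the tagger resolved an ambiguous word, the tagged output can be read back into (lemma, tags) pairs. This is a rough sketch under the same assumptions as the examples above (one analysis per unit, no escaped characters); parse_tagged is a hypothetical helper name, not part of Apertium.

```python
import re

# After tagging, each unit carries a single analysis: ^lemma<tag1><tag2>$
TAGGED_RE = re.compile(r"\^([^<$]*)((?:<[^>]+>)*)\$")


def parse_tagged(stream):
    """Return a list of (lemma, [tags]) pairs, one per lexical unit."""
    return [(lemma, re.findall(r"<([^>]+)>", tags))
            for lemma, tags in TAGGED_RE.findall(stream)]


tagged = ("^Die<det><def><sg>$ ^man<n><sg>$ ^loop<vblex><pres>$ ^deur<pr>$ "
          "^die<det><def><sg>$ ^deur<n><sg>$^.<sent>$")

# Collect the readings chosen for each occurrence of "deur".
print([tags for lemma, tags in parse_tagged(tagged) if lemma == "deur"])
# [['pr'], ['n', 'sg']]
```

In the third sentence the first "deur" is tagged as a preposition and the second as a noun, so the same surface form receives a different analysis depending on its context.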
The compound resolver splits a compound word into its parts, using a word list. For example:
$ python compound-resolver.py wordlist-af nasionaleverdedigingsoorwegings
['nasionale', 'verdedigings', 'oorwegings']
This module is currently under development.
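One simple way to split a compound against a word list is a recursive longest-match search with backtracking. The sketch below illustrates that general idea only; it is not the code of compound-resolver.py. The function split_compound, the min_len parameter and the tiny word list are all made up for the example, and the list is assumed to contain the forms with a linking "s".

```python
from functools import lru_cache

# Hypothetical word list; a real one would be read from a file such as wordlist-af.
WORDS = {"nasionale", "verdediging", "verdedigings",
         "oorweging", "oorwegings", "deur"}


def split_compound(word, lexicon, min_len=2):
    """Split `word` into lexicon entries, or return None if no full split exists."""
    lexicon = frozenset(lexicon)

    @lru_cache(maxsize=None)
    def split_from(start):
        if start == len(word):
            return ()                      # consumed the whole word
        # Try the longest candidate first, backtrack if the rest cannot be split.
        for end in range(len(word), start + min_len - 1, -1):
            part = word[start:end]
            if part in lexicon:
                rest = split_from(end)
                if rest is not None:
                    return (part,) + rest
        return None                        # no split from this position

    result = split_from(0)
    return list(result) if result is not None else None


print(split_compound("nasionaleverdedigingsoorwegings", WORDS))
# ['nasionale', 'verdedigings', 'oorwegings']
```

Trying the longest part first keeps "verdedigings" together instead of stopping at "verdediging" and being left with the unsplittable remainder "soorwegings". The cache means each position in the word is only solved once.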
The content of this site that is Copyright © Francis Tyers is dual-licensed under the GNU General Public Licence and the Creative Commons Attribution Share-Alike 3.0 Licence. The important thing for me is Copyleft (Kopielinks), which basically means that we all contribute our changes and work to a common pool that everyone can use. So if you need this material under a licence different from the ones above, but still Copyleft, email me.