Erasmusgebouw, room E4.05
6525 HT Nijmegen
Postal address: Faculty of Arts, CIW
P.O. Box 9103
NL-6500 HD Nijmegen
In my research I develop machine learning and language
technology. Most of my work lies at the intersection of the two
fields: computers that learn to understand and generate natural
language. Specific interests include memory-based learning, machine
translation, the relation between written and spoken language, text
mining, the Dutch language, computational humanities, and cultural heritage. My CV has more
CLARIAH, Common Lab Research Infrastructure for the Arts and Humanities. I am a board member of this exciting new project that will continue and enlarge the digital infrastructure for the Humanities in the Netherlands.
ADNEXT, Adaptive Information Extraction over Time, a work package of the Infiniti project, part of COMMIT.
Nederlab, bringing together massive amounts of digitized Dutch texts from the Middle Ages to the present in one user-friendly and tool-enriched web interface. Funded by NWO.
Click and explore the following demos showcasing our recent work:
Valkuil.net, our context-sensitive spelling corrector for Dutch
Aside from papers and dissertations, our projects tend to produce software. We make a point of maximizing the availability of this software by releasing the best software projects under open source licenses. Some of our software, such as Timbl and Frog, is packaged and available in Debian Science. Other packages, particularly the ones that perform some natural language processing function, are available as webservices, usually with a web interface.
As part of past and ongoing projects with many colleagues I was involved in developing the following software:
Natural language processing
Frog: Dutch tagger-lemmatizer, morphological analyzer, and dependency parser. With the Frog development team.
Valkuil.net and Fowlt.net: Dutch and English context-sensitive spelling correctors. With Maarten van Gompel, Wessel Stoop, Tanja Gaustad van Zaanen, and Monica Hajek.
PBMBMT: Phrase-based memory-based machine translation. With Maarten van Gompel.
Mbt: Memory-based tagger-generator and tagger. With Ko van der Sloot, Jakub Zavrel, and Walter Daelemans.
WOPR: Memory-based word prediction, language modeling, and spelling correction. Main developer: Peter Berck.
Timbl: Tilburg memory-based learner. With Ko van der Sloot, Walter Daelemans, and Jakub Zavrel.
Dimbl: Distributed Timbl, parallel k-NN classification on multi-CPU machines. Programmer: Ko van der Sloot.
Timpute: TiMBL-wrapper for internal database correction through imputation. Programmer: Steve Hunt.
paramsearch: automatic parameter optimization for various machine
Help research and add your common sense! This crowdsourcing experiment is run at the CLiPS research group in Antwerp. The top widget asks you to rate Dutch words on their subjectivity and their polarity (negative to positive); the bottom one shows English words.