In my research I develop machine learning and language technology. Most of my work involves the intersection of the two fields: computers that learn to understand and generate natural language. Specific interests include memory-based learning, machine translation, the relation between written and spoken language, text mining, the Dutch language, computational humanities, and cultural heritage. My CV has more detailed information.




Since September 2011 I have been full professor of Language and Speech Technology at Radboud University in Nijmegen, the Netherlands, within the Centre for Language Studies, where I co-lead the Language and Speech Technology research group. I am also a member of the department of Communication and Information Sciences of the Faculty of Arts.

I am a guest professor at CLiPS, the Computational Linguistics and Psycholinguistics Research Centre at the University of Antwerp. I spent many good years at the ILK Research Group at Tilburg University. I am humanities integrator at the Netherlands eScience Center, a fellow of the Donders Institute, and a member of the Royal Netherlands Academy of Arts and Sciences.

  • HistoInformatics-2014, 2nd International Workshop on Computational History, Barcelona, Spain, November 10, 2014. Co-chair.

Aside from papers and dissertations, our projects tend to produce software. We make a point of maximizing its availability by releasing our best software projects under open source licenses. Some of our software, such as Timbl and Frog, is packaged and available in Debian Science. Other packages, particularly those that perform a natural language processing function, are available as web services, usually with a web interface.

    Valkuil.net, our context-sensitive spelling corrector for Dutch

As part of past and ongoing projects with many colleagues I was involved in developing the following software:

Natural language processing

  • Frog: Dutch tagger-lemmatizer, morphological analyzer, and dependency parser. With the Frog development team.
  • Valkuil.net and Fowlt.net: Dutch and English context-sensitive spelling correctors. With Maarten van Gompel, Wessel Stoop, Tanja Gaustad van Zaanen, and Monica Hajek.
  • PBMBMT: Phrase-based memory-based machine translation. With Maarten van Gompel.
  • Mbt: Memory-based tagger-generator and tagger. With Ko van der Sloot, Jakub Zavrel, and Walter Daelemans.
  • WOPR: Memory-based word prediction, language modeling, and spelling correction. Main developer: Peter Berck.

Machine learning

  • Timbl: Tilburg memory-based learner. With Ko van der Sloot, Walter Daelemans, and Jakub Zavrel.
  • Dimbl: Distributed Timbl, parallel k-NN classification on multi-CPU machines. Programmer: Ko van der Sloot.
  • Timpute: Timbl wrapper for internal database correction through imputation. Programmer: Steve Hunt.
  • paramsearch: automatic parameter optimization for various machine learning algorithms.
  • Fambl: Family-based learner, a generalized-example k-NN classifier. Reference guide.

Crowd sourcing

