Vol. 29 No. 3 (2020): Nordic Journal of African Studies
Linguistics

Computational morphology systems for Zulu – a comparison

Sonja Bosch
Department of African Languages, University of South Africa (UNISA)

Published 2020-10-26

Keywords

  • computational morphology systems
  • morphological analyser
  • morphological decomposer
  • segmentation
  • Zulu morphology

How to Cite

Bosch, S. (2020). Computational morphology systems for Zulu – a comparison. Nordic Journal of African Studies, 29(3), 28. https://doi.org/10.53228/njas.v29i3.548

Abstract

The morphological analysis of Bantu languages, particularly those with a conjunctive orthography such as Zulu, is crucial not only for accurate corpus searches by Bantu linguists, but also as a basic enabling application that facilitates the development of more advanced tools and practical language processing applications, such as tokenising, disambiguation, part-of-speech tagging, parsing and machine translation. This article compares four freely available computational morphology systems for Zulu: isiZulu.net, a Zulu–English online dictionary that also offers morphological analysis; ZulMorph, a finite-state morphological analyser for Zulu, currently available as a finite-state morphology demo; the NCHLT (National Centre for HLT) IsiZulu Morphological Decomposer, an open source morphological decomposer available as modules and data; and CHIPMUNK, a morphological segmenter and stemmer that contains components for modelling Zulu morphotactics. The criteria considered include accessibility and lookup capacity, embedded lexicons, the degree of granularity of morphological analysis or decomposition, and the documentation of the tagsets used for analysis. The results of an evaluation based on recall and precision are also presented. This first comparison of the four systems is based on output examples from a broad range of word categories of varying morphological complexity, extracted by random sampling from the freely available Leipzig Wortschatz Collection corpus.
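
The abstract does not state how recall and precision are computed for the systems compared. As a minimal sketch of one common formulation, the Python snippet below scores a predicted segmentation against a gold segmentation by comparing morpheme boundary positions. The function names, the hyphen separator and the predicted output are assumptions made for illustration only; they do not reproduce the output of any of the four systems or the article's evaluation procedure.

```python
def boundaries(segmented, sep="-"):
    """Return the character offsets at which morpheme boundaries fall."""
    positions = set()
    offset = 0
    # Every morph except the last one is followed by a boundary.
    for morph in segmented.split(sep)[:-1]:
        offset += len(morph)
        positions.add(offset)
    return positions


def precision_recall(gold, predicted):
    """Boundary-based precision and recall for a single word."""
    gold_b = boundaries(gold)
    pred_b = boundaries(predicted)
    correct = len(gold_b & pred_b)
    precision = correct / len(pred_b) if pred_b else 0.0
    recall = correct / len(gold_b) if gold_b else 0.0
    return precision, recall


# Hypothetical example: gold segmentation of "ngiyabonga" versus an
# under-segmented prediction that misses one boundary.
gold = "ngi-ya-bong-a"
predicted = "ngi-yabong-a"
p, r = precision_recall(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this illustration both predicted boundaries are correct, so precision is perfect, while one of the three gold boundaries is missed, which lowers recall. A full evaluation would aggregate such counts over a whole test sample rather than a single word.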