Vol. 17 No. 2 (2008): Nordic Journal of African Studies

Experimental Bootstrapping of Morphological Analysers for Nguni Languages

Sonja Bosch
University of South Africa
Laurette Pretorius
University of South Africa and Meraka Institute, CSIR
Axel Fleisch
University of South Africa and University of Helsinki

Published 2008-06-30

How to Cite

Bosch, S., Pretorius, L., & Fleisch, A. (2008). Experimental Bootstrapping of Morphological Analysers for Nguni Languages. Nordic Journal of African Studies, 17(2), 23. https://doi.org/10.53228/njas.v17i2.237

Abstract

This paper addresses the experimental bootstrapping of the development of broad-coverage finite-state morphological analysers for Xhosa, Swati and (Southern) Ndebele by using an existing prototype of a morphological analyser for Zulu. These languages are both morphologically complex and resource-scarce. The research question is whether bootstrapping is feasible across the language boundaries between these closely related varieties. The objective is an assessment of the recognition rates yielded by the Zulu morphological analyser for the three related languages. The strategy is to use bootstrapping techniques that consist of the following steps: applying the analyser to corpus data from all languages, identifying (types of) failures, and implementing the respective changes in the analyser. The results show that the high degree of shared typological properties and formal similarities among the Nguni varieties warrants a modular bootstrapping approach. Word forms in these languages that were recognized by the Zulu analyser were mostly adequately analysed. Therefore, the focus lies on providing the necessary adaptations based on an analysis of the failure output for each language. As a result, the development of analysers for Xhosa, Swati and Ndebele is considerably faster than the creation of the Zulu prototype. The paper concludes with comments on the feasibility of the experiment, and the results of the evaluation.
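The first two steps of the strategy described above (applying the analyser to corpus data and identifying failures) can be illustrated with a minimal sketch. It is not the authors' implementation: the finite-state analyser is stood in for by a hypothetical `toy_analyser` function backed by a two-word lexicon, the analysis strings are placeholders, and the names `recognition_report` and `Analyser` are assumptions introduced here for illustration only.

```python
from typing import Callable, Iterable, List, Tuple

# Assumption: an analyser maps a word form to a list of analyses,
# returning an empty list when the form is not recognised.
Analyser = Callable[[str], List[str]]


def recognition_report(analyser: Analyser,
                       tokens: Iterable[str]) -> Tuple[float, List[str]]:
    """Return (recognition rate over word types, sorted list of failures)."""
    types = sorted({t.lower() for t in tokens})
    failures = [w for w in types if not analyser(w)]
    rate = (len(types) - len(failures)) / len(types) if types else 0.0
    return rate, failures


def toy_analyser(word: str) -> List[str]:
    """Hypothetical stand-in for a real analyser; placeholder analyses only."""
    lexicon = {
        "umuntu": ["umuntu+PLACEHOLDER"],
        "abantu": ["abantu+PLACEHOLDER"],
    }
    return lexicon.get(word, [])


# Illustrative corpus sample from a related language; one form is unknown.
sample_tokens = ["umuntu", "abantu", "umntu", "Abantu"]
rate, failures = recognition_report(toy_analyser, sample_tokens)
print(f"recognition rate: {rate:.2f}")
print("failures to inspect:", failures)
```

The failure list is the input to the third step: each unrecognised form is inspected, classified by the type of failure, and addressed by adapting the analyser for the language in question.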