Building Algorithms and Databases for Arabic Linguistic Analysis

Faculty	Engineering	Year:	2009
Type of Publication:	Theses	Pages:	195
Authors:	Hitham Mohamed Abo Bakr Abd Rabo
		BibID	10794248
Keywords:	Computer programs
Abstract:	The approach requires an Arabic lexicon and a large corpus of fully diacritized text for training in order to detect diacritics. One of the main contributions of this dissertation is that we distinguish between internal and case-ending diacritization, since the former requires morphological analysis while the latter depends on syntactic analysis. We have successfully solved the Arabic internal diacritization problem using three different techniques, each of which has its own strengths and weaknesses. We combined them to optimize the performance of our diacritizer and, to a large extent, remove ambiguities. Case-ending diacritization is treated as a post-processing step after internal diacritization. We have built a novel statistical approach, based on the Support Vector Machine (SVM) learning algorithm, that detects case-ending diacritic signs using a combination of morphological and syntactic features. The final result is a fully diacritized Arabic statement.
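To make the distinction between internal and case-ending diacritics concrete, here is a minimal Python sketch. It is not taken from the thesis: the function name `split_diacritics` is hypothetical, and the sketch assumes, simplistically, that the case-ending mark is whatever diacritic follows the final letter of the word. Real Arabic text has exceptions (for example, words carrying suffixed pronouns) that would need the morphological and syntactic analysis the abstract describes.

```python
# Illustrative sketch only; not the method used in the thesis.
# Arabic diacritic marks (tashkeel) occupy U+064B through U+0652.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def split_diacritics(word):
    """Return (bare_letters, internal_marks, case_ending_marks).

    internal_marks has one string per non-final letter, holding the
    marks that follow that letter. case_ending_marks holds the marks
    that follow the final letter (a simplifying assumption).
    """
    letters = []
    marks = []  # marks[i] collects the diacritics attached to letters[i]
    for ch in word:
        if ch in DIACRITICS:
            if letters:  # ignore a stray mark before any letter
                marks[-1] += ch
        else:
            letters.append(ch)
            marks.append("")
    if not letters:
        return "", [], ""
    return "".join(letters), marks[:-1], marks[-1]


# The word "kitabun" (a book): kaf+kasra, ta+fatha, alif, ba+dammatan.
word = "\u0643\u0650\u062A\u064E\u0627\u0628\u064C"
bare, internal, ending = split_diacritics(word)
```

In this example `internal` holds the kasra and fatha on the first two letters (and an empty string for the unmarked alif), while `ending` holds the dammatan on the final letter. A diacritizer along the lines the abstract describes would predict the first group from morphology and the second from syntax.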
