Zagazig University Digital Repository
Effects of Light Stemming on Feature Extraction and Selection for Arabic Documents Classification
Faculty
Science
Year:
2020
Type of Publication:
ZU Hosted
Pages:
Authors:
Mohamed El Sayed Ahmed Muhamed
Journal:
Studies in Computational Intelligence (Springer)
Volume:
Keywords:
Effects , Light Stemming , Feature Extraction , Selection
Abstract:
This chapter studies the effect of the light stemming technique on feature extraction with Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) for Arabic document classification. Feature selection methods, namely Chi-square (Chi2), Information Gain (IG), and Singular Value Decomposition (SVD), are used to select the most relevant features, and k-Nearest Neighbor (kNN), Logistic Regression (LR), and Support Vector Machine (SVM) classifiers are used to build the classification models. Experiments are conducted on a public dataset collected from Arabic websites, namely the BBC Arabic dataset. The results show that SVM outperforms LR and kNN, and that, without stemming, BoW outperforms TF-IDF. Using a Robust Arabic Light Stemmer (ARLStem) as the main light stemmer has a positive effect over the baseline when combined with TF-IDF. With Chi2 as the feature selection technique, SVM achieved a micro-averaged F1 score (F1-micro) of 0.9568 using BoW with 5000 selected features. With IG, SVM achieved an F1-micro of 0.9588 using BoW with 4000 selected features. Finally, with SVD, SVM reached an F1-micro of 0.9569 using BoW with 5000 selected features. All of these best results were obtained without stemming.
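The pipeline described in the abstract (optional light stemming, BoW or TF-IDF feature extraction, Chi2 feature selection, a kNN classifier, and F1-micro evaluation) can be sketched in plain Python as below. This is a toy illustration under stated assumptions, not the chapter's implementation: every function name is hypothetical, the affix lists in `PREFIXES` and `SUFFIXES` are made up and far simpler than ARLStem's rules, `tf_idf` uses one common idf variant (log(N/df) + 1), the chi-square score is computed on term presence, and LR, SVM, IG, and SVD are omitted. In practice a library such as scikit-learn would typically provide these pieces.

```python
import math
from collections import Counter

# Hypothetical affix lists for illustration only; these are NOT ARLStem's rules.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "و")
SUFFIXES = ("ات", "ون", "ين", "ها", "ة")


def light_stem(token, min_len=3):
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    for p in PREFIXES:  # longer prefixes are listed first
        if token.startswith(p) and len(token) - len(p) >= min_len:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= min_len:
            token = token[: -len(s)]
            break
    return token


def bag_of_words(docs, stem=False):
    """Raw term counts per document (BoW), optionally after light stemming."""
    return [Counter(light_stem(t) if stem else t for t in d.split()) for d in docs]


def tf_idf(counts):
    """Weight raw counts by idf = log(N / df) + 1 (one common TF-IDF variant)."""
    n = len(counts)
    df = Counter(t for c in counts for t in c)
    return [{t: tf * (math.log(n / df[t]) + 1.0) for t, tf in c.items()} for c in counts]


def chi2_scores(counts, labels):
    """Chi-square of term presence vs. each class; keep the max over classes."""
    n, scores = len(counts), {}
    for t in {t for c in counts for t in c}:
        best = 0.0
        for k in set(labels):
            a = sum(1 for c, y in zip(counts, labels) if t in c and y == k)
            b = sum(1 for c, y in zip(counts, labels) if t in c and y != k)
            c_ = sum(1 for c, y in zip(counts, labels) if t not in c and y == k)
            d = n - a - b - c_
            denom = (a + b) * (c_ + d) * (a + c_) * (b + d)
            if denom:
                best = max(best, n * (a * d - b * c_) ** 2 / denom)
        scores[t] = best
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def cosine(u, v):
    # Cosine similarity between two sparse term -> weight dictionaries.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(train_vecs, train_labels, vec, k=1):
    """Majority vote among the k most cosine-similar training documents."""
    ranked = sorted(zip(train_vecs, train_labels), key=lambda p: -cosine(p[0], vec))
    return Counter(y for _, y in ranked[:k]).most_common(1)[0][0]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes before computing P and R."""
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    fp = fn = len(y_true) - tp  # single-label: each error is one FP and one FN
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def demo():
    """Tiny made-up example (not the BBC Arabic data): stemmed BoW + Chi2 + 1-NN."""
    docs = ["المباراة الفريق هدف", "الفريق اللاعبين هدف",
            "الحكومة الوزير البرلمان", "الوزير الحكومة القرار"]
    labels = ["sport", "sport", "politics", "politics"]
    counts = bag_of_words(docs, stem=True)
    keep = set(select_top_k(chi2_scores(counts, labels), 4))
    train = [{t: w for t, w in c.items() if t in keep} for c in counts]
    query = bag_of_words(["الفريق هدف"], stem=True)[0]
    query = {t: w for t, w in query.items() if t in keep}
    return knn_predict(train, labels, query, k=1)


if __name__ == "__main__":
    print(demo())  # prints: sport
```

For single-label classification, micro-averaged F1 pools every error as one false positive and one false negative, so it equals accuracy and always lies between 0 and 1.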
Author Related Publications
Mohamed El Sayed Ahmed Muhamed, "A Grunwald–Letnikov based Manta ray foraging optimizer for global optimization and image segmentation", Elsevier, 2020
Mohamed El Sayed Ahmed Muhamed, "A novel hybrid gradient-based optimizer and grey wolf optimizer feature selection method for human activity recognition using smartphone sensors", MDPI, 2021
Mohamed El Sayed Ahmed Muhamed, "Efficient schemes for playout latency reduction in P2P-VOD systems", Springer, 2018
Mohamed El Sayed Ahmed Muhamed, "A novel algorithm for source localization based on nonnegative matrix factorization using αβ-divergence in cochleagram", WSEAS, 2013
Mohamed El Sayed Ahmed Muhamed, "Open cluster membership probability based on K-means clustering algorithm", Springer, 2016
Department Related Publications
Hany Samih Bayoumi Ibrahim, "Passive and active controllers for suppressing the torsional vibration of multiple-degree-of-freedom system", Sage, 2014
Ahmed Mohamed Khedr Souliman, "SEP-CS: Effective Routing Protocol for Heterogeneous Wireless Sensor Networks", Ad Hoc & Sensor Wireless Networks, 2012
Ahmed Mohamed Khedr Souliman, "Minimum connected cover of a query region in heterogeneous wireless sensor networks", Information Sciences, 2013
Ahmed Mohamed Khedr Souliman, "IBLEACH: intra-balanced LEACH protocol for wireless sensor networks", Wireless Netw, 2014
Ahmed Mohamed Khedr Souliman, "Agents for integrating distributed data for function computations", Computing and Informatics, 2012
Mansoura University
Alexandria University
Cairo University
Sohag University
Fayoum University
Benha University
Damietta University
Port Said University
Helwan University
Suez University
Sharqawa
Minia University
Damanhour University
Menoufia University
Aswan University
South Valley University
Suez Canal University
Ain Shams University
Assiut University
Kafrelsheikh University
University of Sadat City
Tanta University
Beni-Suef University