Effects of Light Stemming on Feature Extraction and Selection for Arabic Documents Classification

Faculty Science Year: 2020
Type of Publication: ZU Hosted Pages:
Authors:
Journal: Studies in Computational Intelligence Springer Volume:
Keywords: Effects, Light Stemming, Feature Extraction, Selection
Abstract:
This chapter studies the effects of light stemming on feature extraction for Arabic document classification, where Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) are employed. Moreover, feature selection methods such as Chi-square (Chi2), Information Gain (IG), and Singular Value Decomposition (SVD) are used to select the most relevant features. K-Nearest Neighbor (kNN), Logistic Regression (LR), and Support Vector Machine (SVM) classifiers are used to build the classification model. Experiments are conducted on a public dataset collected from Arab websites, namely the BBC Arabic dataset. Experimental results show that SVM outperforms LR and kNN. Furthermore, BoW outperforms TF-IDF when no stemming technique is used. Using the Robust Arabic Light Stemmer (ARLStem) as the main light stemmer has a positive effect over the baseline when combined with TF-IDF. With Chi2 as the feature selection technique, SVM achieved an F1-micro of 0.9568 using BoW features with 5000 relevant features selected. With IG as the feature selection method, SVM achieved an F1-micro of 0.9588 with BoW and 4000 selected features. Finally, with SVD as the feature selection technique, SVM reached an F1-micro of 0.9569 with BoW and 5000 selected features. These best results were all obtained without stemming.
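The feature extraction and selection steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the chapter's implementation: the toy corpus, the function names, the raw-count TF with log(N/df) IDF, and the presence-based 2x2 chi-square statistic are all assumptions chosen for illustration. Stemming, IG, SVD, and the classifiers are omitted.

```python
import math
from collections import Counter

# Toy, made-up corpus (hypothetical; NOT the BBC Arabic dataset).
docs = [
    ("goal match team win", "sport"),
    ("team player goal", "sport"),
    ("market bank price", "business"),
    ("bank price market win", "business"),
]

def bag_of_words(text):
    """BoW: raw term counts for one whitespace-tokenized document."""
    return Counter(text.split())

def tf_idf(corpus_counts):
    """TF-IDF with tf = raw count and idf = log(N / df) (one common variant)."""
    n_docs = len(corpus_counts)
    df = Counter(term for counts in corpus_counts for term in counts)
    return [
        {t: c * math.log(n_docs / df[t]) for t, c in counts.items()}
        for counts in corpus_counts
    ]

def chi2_score(term, label, corpus_counts, labels):
    """Chi-square of the 2x2 table (term present?) x (class == label?)."""
    a = b = c = d = 0
    for counts, y in zip(corpus_counts, labels):
        present, in_class = term in counts, y == label
        if present and in_class:
            a += 1          # term present, document in class
        elif present:
            b += 1          # term present, document in another class
        elif in_class:
            c += 1          # term absent, document in class
        else:
            d += 1          # term absent, document in another class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

counts = [bag_of_words(text) for text, _ in docs]
labels = [label for _, label in docs]
weights = tf_idf(counts)

# Rank the vocabulary by chi-square against the "sport" class and keep k terms.
vocab = sorted({t for cnt in counts for t in cnt})
scores = {t: chi2_score(t, "sport", counts, labels) for t in vocab}
top_k = sorted(vocab, key=lambda t: (-scores[t], t))[:5]
print(top_k)
```

In this toy corpus, "win" occurs once in each class, so its chi-square score is zero and it is dropped, while terms confined to a single class rank highest. The experiments in the abstract apply the same idea at the scale of 4000 to 5000 selected features.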
   
     
 
       

Author Related Publications

  • Mohamed El Sayed Ahmed Muhamed, "A Grunwald–Letnikov based Manta ray foraging optimizer for global optimization and image segmentation", Elsevier, 2020
  • Mohamed El Sayed Ahmed Muhamed, "A novel hybrid gradient-based optimizer and grey wolf optimizer feature selection method for human activity recognition using smartphone sensors", MDPI, 2021
  • Mohamed El Sayed Ahmed Muhamed, "Efficient schemes for playout latency reduction in P2P-VOD systems", Springer, 2018
  • Mohamed El Sayed Ahmed Muhamed, "A novel algorithm for source localization based on nonnegative matrix factorization using αβ-divergence in cochleagram", WSEAS, 2013
  • Mohamed El Sayed Ahmed Muhamed, "Open cluster membership probability based on K-means clustering algorithm", Springer, 2016
