Effects of Light Stemming on Feature Extraction and Selection for Arabic Documents Classification

Faculty: Science
Year: 2020
Type of Publication: ZU Hosted
Pages:
Authors:
Journal: Studies in Computational Intelligence, Springer
Volume:
Keywords: Effects, Light Stemming, Feature Extraction, Selection
Abstract:
This chapter studies the effects of the light stemming technique on feature extraction for Arabic document classification, where Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) are employed. Moreover, feature selection methods such as Chi-square (Chi2), Information Gain (IG), and Singular Value Decomposition (SVD) are used to select the most relevant features. K-Nearest Neighbor (kNN), Logistic Regression (LR), and Support Vector Machine (SVM) classifiers are used to build the classification model. Experiments are conducted on a public dataset collected from Arabic websites, namely the BBC Arabic dataset. Experimental results show that SVM outperforms LR and kNN. Furthermore, without stemming, BoW outperforms TF-IDF. Using the Robust Arabic Light Stemmer (ARLStem) as the main light stemmer has a positive effect over the baseline when combined with TF-IDF. In the experiment where Chi2 is used as the feature selection technique, SVM achieved a micro-averaged F1 score (F1-micro) of 0.9568 using BoW features with 5000 selected features. In the experiment where IG is used as the feature selection method, SVM achieved an F1-micro of 0.9588 using BoW with 4000 selected features. Finally, in the experiment where SVD is used as the feature selection technique, SVM reached an F1-micro of 0.9569 using BoW with 5000 selected features. These best results were all obtained without stemming.
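The pipeline described in the abstract (optional light stemming, BoW or TF-IDF feature extraction, Chi2-based selection of the top k terms, and F1-micro evaluation) can be sketched in standard-library Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `light_stem` with its `PREFIXES` and `SUFFIXES` tuples is a hypothetical toy stemmer that is far simpler than ARLStem; the Chi2 score uses the common document-presence 2x2 contingency formulation, taking each term's maximum over classes; the kNN, LR, and SVM classifiers, IG, and SVD are omitted; and all function names and the toy corpus are illustrative only.

```python
import math
from collections import Counter

# Hypothetical toy light stemmer: strips at most one common prefix and one
# common suffix. It is NOT ARLStem; the affix lists are illustrative assumptions.
PREFIXES = ("وال", "بال", "كال", "فال", "ال")  # longer prefixes checked first
SUFFIXES = ("ات", "ون", "ين", "ها", "ة")


def light_stem(token: str) -> str:
    """Remove at most one prefix and one suffix, keeping a stem of >= 2 letters."""
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= 2:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= 2:
            token = token[: -len(s)]
            break
    return token


def bow(docs: list[list[str]]) -> list[Counter]:
    """Bag of Words: raw term counts per document."""
    return [Counter(doc) for doc in docs]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF with smoothed IDF, ln((1 + n) / (1 + df)) + 1, and no normalization."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def chi2_scores(docs: list[list[str]], labels: list[str]) -> dict[str, float]:
    """Chi-square of term presence vs. each class; a term's score is its max over classes."""
    n = len(docs)
    present = [set(doc) for doc in docs]
    scores = {}
    for term in set().union(*present):
        best = 0.0
        for cls in set(labels):
            a = sum(1 for s, y in zip(present, labels) if term in s and y == cls)
            b = sum(1 for s, y in zip(present, labels) if term in s and y != cls)
            c = sum(1 for s, y in zip(present, labels) if term not in s and y == cls)
            d = n - a - b - c
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_k(scores: dict[str, float], k: int) -> list[str]:
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def micro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Micro-averaged F1: pool TP, FP, and FN over all classes, then compute F1 once."""
    tp = fp = fn = 0
    for cls in set(y_true) | set(y_pred):
        tp += sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp += sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn += sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage with placeholder English tokens (not BBC Arabic data).
docs = [["goal", "match", "goal"], ["match", "team"], ["market", "stock"], ["stock", "bank", "market"]]
labels = ["sport", "sport", "econ", "econ"]
print(light_stem("الكتاب"))                                 # كتاب
print(select_top_k(chi2_scores(docs, labels), 3))            # ['market', 'match', 'stock']
print(micro_f1(labels, ["sport", "econ", "econ", "econ"]))  # 0.75
```

Assuming single-label classification, each misclassified document counts once as a false positive and once as a false negative, so F1-micro equals accuracy and lies between 0 and 1; an F1-micro of 0.9568 therefore corresponds to roughly 95.7% of documents classified correctly. In practice, scikit-learn's `CountVectorizer`, `TfidfVectorizer` (which also L2-normalizes by default), `SelectKBest` with `chi2`, `TruncatedSVD`, and `f1_score(..., average="micro")` cover these steps, and NLTK ships an `ARLSTem` class in `nltk.stem.arlstem`. Note that scikit-learn's `chi2` scores feature values such as counts rather than document presence, so its term rankings can differ from this sketch.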

Author Related Publications

  • Mohamed El Sayed Ahmed Muhamed, "A Grunwald–Letnikov based Manta ray foraging optimizer for global optimization and image segmentation", Elsevier, 2020
  • Mohamed El Sayed Ahmed Muhamed, "A novel hybrid gradient-based optimizer and grey wolf optimizer feature selection method for human activity recognition using smartphone sensors", MDPI, 2021
  • Mohamed El Sayed Ahmed Muhamed, "Efficient schemes for playout latency reduction in P2P-VOD systems", Springer, 2018
  • Mohamed El Sayed Ahmed Muhamed, "A novel algorithm for source localization based on nonnegative matrix factorization using α-β divergence in cochleagram", WSEAS, 2013
  • Mohamed El Sayed Ahmed Muhamed, "Open cluster membership probability based on K-means clustering algorithm", Springer, 2016

Department Related Publications

  • Rodyna Ahmed Mahmoud, "Pre-Open Sets with Ideal", Scientific Research Platform (SRP), 2013
  • Rodyna Ahmed Mahmoud, "ON BCL-ALGEBRA", Council for Innovative Research, 2013
  • Yasser AbdelAziz Amer Tolba, "The improved (G′/G)-expansion method for constructing exact traveling wave solutions for a nonlinear PDE of nanobiosciences", USA, 2013
  • Alaa Hassan Attia Hassan, "A Unified Representation of Some Starlike and Convex Harmonic Functions with Negative Coefficients", AGH University of Science and Technology Press, Krakow 2013, Poland, 2013
  • Alaa Hassan Attia Hassan, "Generalizations of Hadamard Product of Certain Meromorphic Multivalent Functions with Positive Coefficients", Istanbul Universitesi, Turkey, 2013