Effects of Light Stemming on Feature Extraction and Selection for Arabic Documents Classification

Faculty: Science
Year: 2020
Type of Publication: ZU Hosted
Pages:
Authors:
Journal: Studies in Computational Intelligence, Springer
Volume:
Keywords: Effects, Light Stemming, Feature Extraction, Selection
Abstract:
This chapter studies the effects of light stemming on feature extraction for Arabic document classification, where Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) are used to extract features. Feature selection methods, namely Chi-square (Chi2), Information Gain (IG), and Singular Value Decomposition (SVD), are used to select the most relevant features. K-Nearest Neighbor (kNN), Logistic Regression (LR), and Support Vector Machine (SVM) classifiers are used to build the classification model. Experiments are conducted on a public dataset collected from Arab websites, namely the BBC Arabic dataset. The results show that SVM outperforms LR and kNN. Furthermore, BoW outperforms TF-IDF when no stemming technique is used. Using the Robust Arabic Light Stemmer (ARLStem) as the main light stemmer has a positive effect over the baseline when combined with TF-IDF. With Chi2 as the feature selection technique, SVM achieved an F1-micro of 0.9568 using BoW features with 5000 relevant features selected. With IG as the feature selection method, SVM achieved an F1-micro of 0.9588 using BoW with 4000 selected features. Finally, with SVD as the feature selection technique, SVM reached an F1-micro of 0.9569 using BoW with 5000 relevant features selected. These are the best results achieved, and all of them were obtained without stemming.
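To make the terms in the abstract concrete, the following is a minimal Python sketch of BoW counting, TF-IDF weighting, Chi-square feature scoring, and micro-averaged F1, using only the standard library. It is not the chapter's implementation: the toy documents, the labels, the function names, the smoothed IDF variant, and the presence-based contingency-table form of Chi-square are all assumptions chosen for illustration. Stemming, IG, SVD, and the classifiers are left out.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count matrix) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for tok, count in Counter(doc).items():
            matrix[i][index[tok]] = count
    return vocab, matrix


def tf_idf(matrix):
    """Weight raw counts by a smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    n_docs, n_terms = len(matrix), len(matrix[0])
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in matrix]


def chi2_scores(matrix, labels):
    """Chi-square statistic per term from its presence/class contingency table."""
    n = len(matrix)
    classes = sorted(set(labels))
    scores = []
    for j in range(len(matrix[0])):
        present = [row[j] > 0 for row in matrix]
        score = 0.0
        for c in classes:
            for flag in (True, False):
                observed = sum(
                    1 for p, y in zip(present, labels) if p == flag and y == c
                )
                expected = present.count(flag) * labels.count(c) / n
                if expected > 0:
                    score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(vocab, scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(zip(scores, vocab), key=lambda pair: (-pair[0], pair[1]))
    return [tok for _, tok in ranked[:k]]


def f1_micro(y_true, y_pred):
    """Micro-averaged F1 over all classes (pooled TP, FP, FN)."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            tp += (t == c and p == c)
            fp += (t != c and p == c)
            fn += (t == c and p != c)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical toy corpus: English placeholder tokens, not the BBC Arabic data.
docs = [
    ["goal", "match", "team"],
    ["team", "win", "match"],
    ["market", "bank", "price"],
    ["bank", "price", "rise"],
]
labels = ["sport", "sport", "economy", "economy"]

vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)
selected = select_top_k(vocab, chi2_scores(counts, labels), k=4)
print(selected)
```

In this sketch, terms that occur in every document of one class and in none of the other receive the highest Chi-square scores, which is the intuition behind keeping only the top-ranked features before training a classifier. For single-label classification, the micro-averaged F1 computed here is equal to accuracy.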
