Determining Extractive Summary for a Single Document Based on Collaborative Filtering Frequency Prediction and Mean Shift Clustering

Faculty: Computer Science
Year: 2019
Journal: IAENG International Journal of Computer Science
Keywords: Determining Extractive Summary, Single Document Based
Abstract:
This paper presents a new unsupervised algorithm for producing an extractive summary of a single document. It combines term frequency prediction, obtained from a memory-based collaborative filtering (CF) approach, with the Mean Shift clustering algorithm. The new algorithm uses Term-Sentence Collaborative Filtering (TSCF) to predict term frequencies, which are then used to rank sentences according to the presence percentage of each word/term in each sentence. TSCF computes frequencies both for terms that are present in a sentence and for terms that are missing from it (the sparse entries) via a collaborative filtering prediction algorithm. Mean Shift clustering serves as the final stage: it groups sentences according to their ranks in order to produce more coherent summaries. Experiments show the effect of different weighting functions, including Term Frequency (TF), Term Frequency-Inverse Document Frequency (TFIDF), and binary TF. They also show the effect of different distance metrics that support sparse matrix representations, including Cosine, Euclidean, and Manhattan, and the effect of L1 and L2 normalization. ROUGE is used as a fully automatic evaluation metric for text summarization on the DUC2002 datasets. Results are reported as average recall, precision, and F-measure scores for ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4. They show that the proposed TSCF algorithm is promising and outperforms related baseline techniques on many ROUGE scores.
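The pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the prediction formula, the ranking formula, or the rule for choosing a cluster, so everything below is an assumption made for illustration. The sketch assumes a sentence-by-term frequency matrix, fills zero entries with a standard memory-based CF weighted average over cosine-similar sentences, scores each sentence by its mean filled frequency, and groups the scores with a one-dimensional flat-kernel mean shift. All function names (`predict_missing`, `sentence_scores`, `mean_shift_1d`, `select_summary`) and the `bandwidth` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_missing(matrix):
    """Fill zero entries of a sentence-by-term matrix (assumed CF rule).

    A missing frequency for sentence i and term t is predicted as the
    similarity-weighted average of that term's frequency in the other
    sentences that contain it.
    """
    n = len(matrix)
    sims = [[cosine(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]
    filled = [list(row) for row in matrix]
    for i in range(n):
        for t in range(len(matrix[i])):
            if matrix[i][t] != 0:
                continue  # observed entries are kept as they are
            neighbours = [j for j in range(n) if j != i and matrix[j][t] != 0]
            weight = sum(abs(sims[i][j]) for j in neighbours)
            if weight > 0:
                total = sum(sims[i][j] * matrix[j][t] for j in neighbours)
                filled[i][t] = total / weight
    return filled


def sentence_scores(filled):
    """Assumed ranking rule: mean filled frequency per sentence."""
    return [sum(row) / len(row) for row in filled]


def mean_shift_1d(points, bandwidth, max_iter=100):
    """Flat-kernel mean shift on scalars; returns (labels, centers)."""
    modes = list(points)
    for _ in range(max_iter):
        shifted = []
        for m in modes:
            window = [p for p in points if abs(p - m) <= bandwidth]
            shifted.append(sum(window) / len(window) if window else m)
        converged = all(abs(a - b) < 1e-9 for a, b in zip(shifted, modes))
        modes = shifted
        if converged:
            break
    labels, centers = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if abs(m - c) <= bandwidth / 2:  # merge modes that coincide
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


def select_summary(matrix, bandwidth):
    """Assumed selection rule: keep the cluster with the highest center."""
    scores = sentence_scores(predict_missing(matrix))
    labels, centers = mean_shift_1d(scores, bandwidth)
    best = max(range(len(centers)), key=lambda k: centers[k])
    return [i for i, label in enumerate(labels) if label == best]
```

For example, `select_summary([[2, 1, 0, 0], [1, 1, 1, 0], [0, 0, 2, 1]], bandwidth=0.1)` returns the indices, in document order, of the sentences in the highest-scoring cluster. The paper's experiments additionally vary the weighting function, distance metric, and normalization, none of which this sketch covers.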
