A Concurrent Tree-Based Clustering Approach for Big Data Applications

Faculty Computer Science Year: 2024
Type of Publication: ZU Hosted Pages:
Authors:
Journal: Volume:
Keywords: Concurrent Tree-Based Clustering Approach, Data Applications
Abstract:
Clustering algorithms have become one of the most critical research areas in multiple domains, especially data mining. However, with the massive growth of big data applications in the cloud, these applications face many challenges. Because big data refers to enormous volumes of data, most traditional clustering algorithms incur high computational costs. Hence, the research question is how to handle this volume of data and obtain accurate results within a critical time frame. Despite ongoing research into algorithms that facilitate complex clustering processes, many difficulties still arise when dealing with large volumes of data. Tasks such as clustering and classification assume the existence of a similarity measure to assess the similarity (or dissimilarity) of a pair of observations or clusters, and the key difference between most clustering methods lies in their similarity measures. In another direction, finding rare association rules can be more important than finding frequent itemsets: rare rules represent rare cases, activities, or events in real-world applications, and it is essential to extract exceptional, critical activity from vast amounts of routine data. In this thesis, we first review the most relevant clustering algorithms in a categorized manner, compare clustering methods for large-scale data, and explain the overall challenges by clustering type. The key idea is to highlight the main advantages and disadvantages of clustering algorithms for big data, with scalability considered alongside their other features.
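To make the role of a similarity measure concrete, the following minimal sketch compares categorical records with the Jaccard coefficient over attribute=value pairs. This is a generic overlap-based measure chosen only for illustration; it is not the measure proposed in this thesis, and the function name and sample records are hypothetical.

```python
def jaccard_similarity(a, b):
    """Jaccard coefficient over the attribute=value pairs of two categorical records.

    Illustrative stand-in only: a generic overlap-based measure, not the
    thesis's own similarity function.
    """
    items_a = set(a.items())
    items_b = set(b.items())
    union = items_a | items_b
    if not union:
        return 1.0  # assumption: two empty records are treated as identical
    return len(items_a & items_b) / len(union)


# Hypothetical categorical records, invented for this example.
r1 = {"color": "red", "shape": "round", "size": "small"}
r2 = {"color": "red", "shape": "square", "size": "small"}
r3 = {"color": "blue", "shape": "square", "size": "large"}

print(jaccard_similarity(r1, r2))  # r1 and r2 share 2 of 4 distinct pairs
print(jaccard_similarity(r1, r3))  # r1 and r3 share no pairs
```

A clustering method built on such a measure would group r1 with r2 before r3; swapping in a different measure can change the grouping, which is why the choice of measure matters.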
In addition, we propose a new similarity measure called PWO ("Probability of the Weights between Overlapped items") that can be used to cluster categorical datasets, prove that PWO is a metric, present a framework implementation that detects the best similarity value for different datasets, and improve the F-Tree clustering algorithm with a semi-supervised method, called SF-Tree, to refine the results. The experimental evaluation on real categorical datasets shows that PWO measures the similarity between categorical data more effectively than state-of-the-art algorithms, that clustering based on PWO with a predefined number of clusters separates classes well, with high purity and an average coverage of 80% of the real classes, and that the overlap estimator accurately estimates the overlap threshold from a small sample of around 5% of the dataset. Lastly, we propose a new algorithm called FR-Tree that mines association rules and produces the essential ones. This work aims to demonstrate that the algorithm is suitable for extracting rare association rules with high confidence. The proposed algorithm generates, filters, and classifies all the important rules, whether frequent or rare, and it produces the rare rules without requiring an additional threshold, which gives it an advantage over other rare association rule techniques. The generated rules were tested on well-known datasets, and the performance was compared with that of other rare association rule techniques. The results show that our method outperforms the existing rare association rule techniques in terms of speed and number of generated rules.
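The notion of a rare rule with high confidence can be illustrated with the standard definitions of support and confidence. The sketch below is not FR-Tree; it only computes the two textbook measures on a small set of transactions invented for this example, and the helper names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    ante = support(transactions, antecedent)
    if ante == 0:
        return 0.0  # assumption: a rule with an unseen antecedent gets confidence 0
    return support(transactions, set(antecedent) | set(consequent)) / ante


# Hypothetical market-basket data, invented for this example.
transactions = [frozenset(t) for t in [
    {"bread", "milk"}, {"bread", "milk"}, {"bread", "butter"}, {"milk", "butter"},
    {"bread", "milk", "butter"}, {"bread"}, {"milk"}, {"bread", "milk"},
    {"caviar", "champagne"}, {"bread", "butter"},
]]

# Frequent rule: bread -> milk (high support, moderate confidence).
print(support(transactions, {"bread", "milk"}), confidence(transactions, {"bread"}, {"milk"}))

# Rare rule: caviar -> champagne (low support, but it holds whenever caviar appears).
print(support(transactions, {"caviar", "champagne"}), confidence(transactions, {"caviar"}, {"champagne"}))
```

A miner that keeps only itemsets above a typical minimum support would discard the second rule even though it always holds when its antecedent occurs, which is the kind of case rare association rule mining targets.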

Author Related Publications

  • Mahmoud Abdel Moneim Mahdi Abdul Rahman, "Scalable Clustering Algorithms for Big Data: A Review", IEEE, 2021
  • Mahmoud Abdel Moneim Mahdi Abdul Rahman, "FR-Tree: A novel rare association rule for big data problem", scinapse, 2022
  • Mahmoud Abdel Moneim Mahdi Abdul Rahman, "A Proactive Intelligent E-Commerce Environment", 2024
