| Journal: |
Scientific Reports
NATURE PORTFOLIO
|
Volume: |
|
| Abstract: |
Air pollution poses a significant threat to public health and environmental sustainability, necessitating accurate predictive models for effective air quality management. This study applies machine learning techniques to forecast air quality using the annual AQI dataset from the U.S. Environmental Protection Agency (EPA). Feature selection (FS) was conducted using binary versions of the Grey Wolf Optimizer (BGWO), Particle Swarm Optimization (BPSO), and the Whale Optimization Algorithm (BWAO), together with a novel hybrid BPSO-BWAO approach, to identify the most relevant features for AQI prediction. Among the feature selection methods, BPSO achieved the lowest Mean Squared Error (MSE), 53.56, but with high variance, while BWAO showed lower variance and more consistent results. The hybrid BPSO-BWAO method emerged as the best overall choice: its MSE of 53.93 was slightly higher than that of BPSO, but it offered improved stability and a more balanced feature set, selecting key features such as ‘Days with AQI,’ ‘Median AQI,’ ‘Days CO,’ ‘Days NO2,’ ‘Days PM2.5,’ ‘Good_Days_Percent,’ and ‘Unhealthy_Days_Percent.’ Six machine learning models, Random Forest (RF), Gradient Boosting (GB), K-Nearest Neighbors (KNN), Multi-Layer Perceptron (MLP), Support Vector Machine (SVM), and Linear Regression (LR), were evaluated before and after feature selection. Random Forest performed best after feature selection, with an MSE of 53.93, an R² of 0.9710, and a reduced fitting time. Hyperparameter tuning of RF with a novel hybrid PSO-WAO optimizer further improved performance to an MSE of 51.82 and an R² of 0.9821. The study concludes that feature selection and hyperparameter optimization significantly improve model accuracy and computational efficiency, offering a robust framework for air quality forecasting.
|
|
|
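The binary swarm-based feature selection summarized in the abstract can be illustrated with a minimal sketch of a sigmoid-transfer binary PSO. This is not the authors' implementation: the function name `binary_pso`, the parameter values, the `RELEVANT` mask, and the `toy_fitness` function are all hypothetical. In particular, `toy_fitness` merely stands in for the validation MSE that a real wrapper method would obtain by training a model on the selected feature columns.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise fitness(mask) over 0/1 feature masks (illustrative sketch)."""
    rng = random.Random(seed)
    # Random initial bit masks and zero velocities.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    # Personal bests and the global best found so far.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid stable
                vel[i][d] = v
                # Sigmoid transfer: velocity becomes the probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical "truly relevant" features, used only to build a toy objective.
RELEVANT = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_fitness(mask):
    # Assumption: a stand-in for validation MSE; counts mismatched bits.
    return sum(m != r for m, r in zip(mask, RELEVANT))


best_mask, best_fit = binary_pso(toy_fitness, n_features=len(RELEVANT))
print(best_mask, best_fit)
```

In a realistic setting, `toy_fitness` would be replaced by a function that fits a regressor on the columns where the mask is 1 and returns its held-out MSE. A hybrid scheme such as the BPSO-BWAO approach named in the abstract would change the position update rule, which this sketch does not attempt to reproduce.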