A Distributed Feature Selection Approach over Hadoop for Accurate Classification based on Grasshopper Algorithm and Rough Sets

Document Type: Scientific

Author

Department of Computer Science, Faculty of Computers and Information, Damanhour University, Damanhour, 22511, Egypt

Abstract

In data analysis, precise feature selection has become paramount, especially given the large and intricate datasets now available. Many of these datasets contain a large number of features, a substantial share of which may be redundant, leading to reduced accuracy and increased computational cost. Although the Rough Set (RS) and Multigranular Rough Set (MGRS) models have proven effective for feature selection, their computational complexity can be limiting. To address this, we integrate the MGRS model with the Grasshopper Optimization Algorithm (GOA), a metaheuristic technique derived from grasshopper foraging behavior. To manage large-scale data, we employ the Hadoop framework for distributed processing; by distributing the enhanced GOA tasks within Hadoop, we aim to process large-scale datasets efficiently. The proposed algorithm is assessed on dedicated datasets and benchmarked with classifiers such as Random Forest and K-Nearest Neighbor. Preliminary results indicate that our approach outperforms prevalent metaheuristic strategies, and that using the MGRS model as the objective function notably improves performance.
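
To make the idea of a rough-set objective function concrete, the following is a minimal sketch rather than the method evaluated here. It computes the classical single-granulation rough-set dependency degree of a feature subset, not the MGRS variant, and combines it with a subset-size penalty in a weighted sum. The function names, the weight `alpha`, and the toy data are illustrative assumptions; the GOA search and the Hadoop distribution are not shown.

```python
from collections import defaultdict


def dependency_degree(rows, labels, subset):
    """Classical rough-set dependency: the fraction of objects whose
    equivalence class under `subset` carries a single decision label."""
    if not rows:
        return 0.0
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        # Objects with equal values on the chosen features are indiscernible.
        key = tuple(row[i] for i in subset)
        classes[key].append(label)
    # Positive region: objects in classes that are consistent in the label.
    positive = sum(len(v) for v in classes.values() if len(set(v)) == 1)
    return positive / len(rows)


def fitness(rows, labels, mask, alpha=0.9):
    """Hypothetical objective: reward dependency, penalize subset size.
    `mask` is a binary vector, one bit per feature; `alpha` is an assumed weight."""
    subset = [i for i, bit in enumerate(mask) if bit]
    gamma = dependency_degree(rows, labels, subset)
    n = len(mask)
    return alpha * gamma + (1 - alpha) * (n - len(subset)) / n


# Toy decision table (assumed data): feature 0 alone determines the label.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = [0, 0, 1, 1]

full = fitness(rows, labels, [1, 1, 1])
reduced = fitness(rows, labels, [1, 0, 0])
print(reduced > full)
```

In a metaheuristic such as GOA, each candidate position would be mapped to a binary mask and scored by an objective of this kind; in this sketch the smaller subset scores higher because it preserves the full dependency with fewer features.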

Keywords

Main Subjects