This work is licensed under a Creative Commons Attribution 4.0 International License.
Feature Reduction of Lung Cancer Microarray Data Using Mutual Information Selection and PyCaret-Supported Recursive Feature Elimination
Corresponding Author(s): Andrew Jonathan Brahms Simangunsong
Nusantara Science and Technology Proceedings,
Multi-Conference Proceeding Series E
Abstract
Lung cancer remains a leading cause of cancer-related mortality worldwide, and rising pollution levels in Indonesia make improvements in early detection urgent. One established detection method is molecular diagnosis using DNA microarrays, which has been shown to be effective. However, microarray data contain a very large number of features, which hinders timely and accurate detection of lung cancer. This study seeks to reduce the feature set in order to improve classification performance. Our approach combines Mutual Information feature selection with Recursive Feature Elimination (RFE), using the PyCaret library to train and evaluate machine learning models. Mutual Information is first used to reduce the number of features and improve computational efficiency, after which machine learning models are trained with PyCaret. The two best-performing models for each dataset are then used to perform RFE in search of an optimal feature subset; a Support Vector Machine is also used for comparison. This yields three feature subsets, plus a fourth that combines the features of the other three. Finally, PyCaret is used again to train machine learning models on all feature subsets. The results show that the other models can select fewer features than the Support Vector Machine while still maintaining strong predictive performance, with accuracy of 95% to 98%. In conclusion, this research offers a new approach to selecting optimal features for microarray analysis, with implications for more effective and timely cancer diagnosis.
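The two-stage selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It uses scikit-learn directly, and the model-ranking step that the study delegates to PyCaret is replaced here by two fixed stand-in estimators (logistic regression and a random forest). The synthetic data, the helper name `two_stage_selection`, the feature counts (`n_mi`, `n_final`), and the reading of the "combined" subset as a union are all assumptions made for illustration.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for a microarray matrix: few samples, many features.
X, y = make_classification(
    n_samples=120, n_features=500, n_informative=15, n_redundant=5, random_state=0
)


def two_stage_selection(X, y, estimator, n_mi=100, n_final=20):
    """Hypothetical helper: mutual-information filter, then RFE.

    Returns the column indices (in the original X) of the selected features.
    """
    # Stage 1: keep the n_mi features with the highest mutual information with y.
    score_func = partial(mutual_info_classif, random_state=0)
    mi_filter = SelectKBest(score_func=score_func, k=n_mi)
    X_mi = mi_filter.fit_transform(X, y)
    mi_idx = mi_filter.get_support(indices=True)

    # Stage 2: recursively drop the weakest features according to the estimator.
    rfe = RFE(estimator=estimator, n_features_to_select=n_final, step=5)
    rfe.fit(X_mi, y)
    return mi_idx[rfe.support_]


# Assumed stand-ins for the "two best-performing models", plus a linear SVM.
estimators = {
    "logreg": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "svm": SVC(kernel="linear"),
}
subsets = {name: two_stage_selection(X, y, est) for name, est in estimators.items()}

# Fourth subset: here taken as the union of the three (an assumption).
combined = np.unique(np.concatenate(list(subsets.values())))
subsets["combined"] = combined

# Re-train on each subset. Selection was done on all samples, so these
# cross-validated scores are optimistic and only illustrate the workflow.
for name, idx in subsets.items():
    scores = cross_val_score(LogisticRegression(max_iter=1000), X[:, idx], y, cv=5)
    print(f"{name}: {len(idx)} features, mean CV accuracy {scores.mean():.3f}")
```

In a faithful reproduction, the feature selection would be nested inside the cross-validation folds, and the two RFE estimators would be chosen per dataset from the model comparison rather than fixed in advance.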