Feature Reduction of Lung Cancer Microarray Data Using Mutual Information Selection and PyCaret-Supported Recursive Feature Elimination

Authors

  • Andrew Jonathan Brahms Simangunsong, Universitas Indonesia, Depok 16911, Indonesia
  • Valha Tsabita Hidayat, Universitas Indonesia, Depok 16911, Indonesia

DOI:

https://doi.org/10.11594/nstp.2023.3701

Keywords:

Lung cancer, microarray data, feature reduction, mutual information feature selection, recursive feature elimination, PyCaret

Abstract

Lung cancer remains a leading cause of cancer-related mortality worldwide, and rising pollution levels in Indonesia make improved early detection of lung cancer urgent. One detection method is molecular diagnosis using DNA microarrays, which has been proven to be effective. However, microarray data contain a vast number of features, and this complexity hinders timely and accurate detection. This study seeks to optimize the feature set of such data to improve classification performance. Our approach combines Mutual Information Feature Selection with Recursive Feature Elimination, using the PyCaret library to train and evaluate machine learning models. First, Mutual Information is used for an initial feature reduction that improves computational efficiency, and machine learning models are then trained with PyCaret. The two best-performing models for each dataset are used to perform recursive feature elimination in search of an optimal feature subset. A support vector machine is also used for comparison. The output is three feature subsets, plus a fourth subset that combines the features of the other three. Finally, PyCaret is used again to train machine learning models on all feature subsets. The results show that the other models select fewer features than the support vector machine while maintaining strong predictive performance, with accuracy of 95% to 98%. In conclusion, our research offers a new approach to selecting optimal features for microarray analysis, with implications for more effective and timely cancer diagnosis.
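
The two-stage selection described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it uses scikit-learn directly rather than PyCaret, runs on synthetic data instead of microarray data, and the helper name `mi_then_rfe`, the feature counts (`n_mi=50`, `n_final=10`), and the logistic regression and ridge classifiers standing in for the "two best-performing models" are all assumptions made for the example.

```python
from functools import partial, reduce

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.svm import SVC


def mi_then_rfe(X, y, estimator, n_mi=50, n_final=10, seed=0):
    """Return column indices kept by a mutual-information filter followed by RFE.

    n_mi and n_final are hypothetical values chosen only for this sketch.
    """
    # Stage 1: keep the n_mi features with the highest mutual information with y.
    score_func = partial(mutual_info_classif, random_state=seed)
    mi_filter = SelectKBest(score_func=score_func, k=n_mi).fit(X, y)
    mi_idx = mi_filter.get_support(indices=True)

    # Stage 2: recursively drop the weakest feature until n_final remain.
    # RFE needs an estimator exposing coef_ or feature_importances_.
    rfe = RFE(estimator, n_features_to_select=n_final, step=1)
    rfe.fit(X[:, mi_idx], y)

    # Map the RFE mask back to column indices of the original matrix.
    return mi_idx[rfe.support_]


# Synthetic stand-in for a microarray matrix: few samples, many features.
X, y = make_classification(
    n_samples=80, n_features=300, n_informative=10, n_redundant=0, random_state=0
)

# One subset per model: a linear SVM for comparison, plus two hypothetical
# stand-ins for the best-performing models from the comparison step.
subsets = {
    "svm": mi_then_rfe(X, y, SVC(kernel="linear")),
    "logreg": mi_then_rfe(X, y, LogisticRegression(max_iter=1000)),
    "ridge": mi_then_rfe(X, y, RidgeClassifier()),
}

# Fourth subset: the union of the features selected by the three models.
combined = reduce(np.union1d, subsets.values())
```

In a workflow like the one the abstract describes, each of the four index arrays would then be used to slice the data before a final round of model training and evaluation.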

Published

21-12-2023

Section

Articles

How to Cite

Simangunsong, A. J. B., & Hidayat, V. T. (2023). Feature Reduction of Lung Cancer Microarray Data Using Mutual Information Selection and PyCaret-Supported Recursive Feature Elimination. Nusantara Science and Technology Proceedings, 2023(36), 1-6. https://doi.org/10.11594/nstp.2023.3701
