Unveiling Machine Learning Algorithms for Predicting Drug Activity against Lung Cancer Cell Lines
Kanagasabapathy Gokulakrishnan, Krishnamoorthy Hema Nandini Rajendran, Veerappapillai
Shanthi, Pachaiappan Jayakrishnan and Karuppasamy Ramanathan
Res. J. Biotech.; Vol. 20(11); 68-75;
doi: https://doi.org/10.25303/2011rjbt068075; (2025)
Abstract
Lung cancer remains a significant global health concern, posing a substantial burden
on both patients and healthcare systems. As a result, there is an urgent need for
innovative therapeutic interventions to manage lung cancer more effectively. In
this study, we developed classification models using machine learning algorithms
to predict drug responses in lung cancer cell lines. A diverse dataset was retrieved,
consisting of 692 active and 1,071 inactive compounds tested against five major
lung cancer cell lines: CaLu-06, HCC-78, NCI-H322, NCI-H358 and NCI-H522. Drug-like
properties of these compounds were computed and used as descriptors for model
development.
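The class counts reported above imply a moderately imbalanced dataset. The short sketch below works through that arithmetic. The function name `imbalance_summary` and its output keys are illustrative assumptions, not part of the published code, and the majority-class baseline is computed on the full dataset rather than on whatever train/test split the authors used.

```python
# Class counts as reported in the abstract.
N_ACTIVE = 692
N_INACTIVE = 1071


def imbalance_summary(n_minority: int, n_majority: int) -> dict:
    """Summarise class balance for a binary dataset (hypothetical helper)."""
    total = n_minority + n_majority
    return {
        "total": total,
        # Share of compounds labelled active.
        "minority_fraction": n_minority / total,
        # Inactive compounds per active compound.
        "imbalance_ratio": n_majority / n_minority,
        # Accuracy of a trivial model that always predicts "inactive".
        "majority_baseline_accuracy": n_majority / total,
        # Synthetic actives needed to reach a 1:1 ratio on the full dataset.
        "synthetic_needed_for_balance": n_majority - n_minority,
    }


summary = imbalance_summary(N_ACTIVE, N_INACTIVE)
for key, value in summary.items():
    print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

Under these assumptions, roughly 39% of the 1,763 compounds are active, so a classifier that always predicts "inactive" would score about 0.61 accuracy on the full dataset. This gives some context for reading accuracy figures on imbalanced data.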
The proposed method applied techniques including z-score analysis, correlation analysis,
recursive feature elimination with cross-validation (RFECV) and the synthetic minority
oversampling technique (SMOTE) to pre-process the data and identify key features.
Hyperparameters were then optimised with Optuna to improve model performance. Random
Forest was the best-performing model, reaching an accuracy of 0.80 and an AUC of 0.85,
a result with potential implications for drug discovery and personalised lung cancer
therapies. The implementation materials and Python code are accessible
freely at https://github.com/Gokulakrish13/Machine-Learning-Classifiers-for-Predicting-Active
Molecules-Against-Lung-Cancer-Cells.git.
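To make three of the ideas named above concrete, the following is a minimal standard-library sketch of z-score standardisation, SMOTE-style interpolation and a rank-based AUC. It is not the authors' implementation, which is in the linked repository. The function names, the toy two-descriptor data and the choice of `k` are all assumptions made for illustration, and z-scores are used here for scaling, which may differ from how the study applied them.

```python
import math
import random


def zscore_columns(rows):
    """Standardise each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: two hypothetical descriptors per compound.
actives = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2]]
inactives = [[3.0, 0.5], [3.2, 0.7], [2.8, 0.4], [3.1, 0.6], [2.9, 0.8]]

scaled = zscore_columns(actives + inactives)
scaled_actives, scaled_inactives = scaled[:3], scaled[3:]

# Oversample the minority class until both classes have the same size.
n_missing = len(scaled_inactives) - len(scaled_actives)
balanced_actives = scaled_actives + smote_like(scaled_actives, n_new=n_missing)

# AUC for four hypothetical predictions: 3 of 4 pairs are ranked correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(len(balanced_actives), len(scaled_inactives), auc)
```

In a real workflow, scaling statistics and synthetic samples would normally be derived from the training split only, so that the held-out data used to report accuracy and AUC stays untouched.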