
\documentclass[a4paper, final]{article}
%\usepackage{literat} % Proper fonts
\usepackage[14pt]{extsizes} % enables the non-standard 14pt font size
\usepackage{tabularx}
\usepackage[T2A]{fontenc}
\usepackage[utf8]{inputenc}
% \usepackage[russian]{babel}
\usepackage{amsmath}
\usepackage[left=25mm, top=20mm, right=20mm, bottom=20mm, footskip=10mm]{geometry}
\usepackage{ragged2e} % for justified text
\usepackage{setspace} % for line spacing
\usepackage{moreverb} % for typesetting program source code in listings
\usepackage{indentfirst} % for first-paragraph indentation
\usepackage{graphicx}

\usepackage{pdfpages}
\usepackage{tikz}

\usepackage{array}
\usepackage{multirow}

\renewcommand\verbatimtabsize{4\relax}
\renewcommand\listingoffset{0.2em} % offset from line numbers in listings
\renewcommand{\arraystretch}{1.4} % row height in tables
\usepackage[font=small, singlelinecheck=false, justification=centering, format=plain, labelsep=period]{caption} % table caption settings
\usepackage{listings} % listings
\usepackage{xcolor} % colors
\usepackage{hyperref}% for hyperlinks
\usepackage{enumitem} % for lists
\newtheorem{theorem}{Theorem} % new theorem environment
\setlist[enumerate,itemize]{leftmargin=1.2cm} % indentation in lists

\hypersetup{colorlinks,
    allcolors=[RGB]{010 090 200}} % nice hyperlinks (not red)

% languages to preload; see the listings documentation for details (all of this is for listings)
\lstloadlanguages{ C++}
% enable Cyrillic and add a few options
\lstset{tabsize=2,
    breaklines,
    basicstyle=\footnotesize,
    columns=fullflexible,
    flexiblecolumns,
    numbers=left,
    numberstyle={\footnotesize},
    keywordstyle=\color{blue},
    inputencoding=cp1251,
    extendedchars=true
}
\lstdefinelanguage{MyC}{
    language=C++,
%    ndkeywordstyle=\color{darkgray}\bfseries,
%    identifierstyle=\color{black},
%    morecomment=[n]{/**}{*/},
%    commentstyle=\color{blue}\ttfamily,
%    stringstyle=\color{red}\ttfamily,
%    morestring=[b]",
%    showstringspaces=false,
%    morecomment=[l][\color{gray}]{//},
    keepspaces=true,
    escapechar=\%,
    texcl=true
}

\textheight=24cm % text height
\textwidth=16cm % text width
\oddsidemargin=0pt % offset from the left edge
\topmargin=-1.5cm % offset from the top edge
\parindent=24pt % paragraph indentation
\parskip=5pt % spacing between paragraphs
\tolerance=2000 % tolerance for loose lines
\flushbottom % equalize page heights


% Listings setup
\lstset{
    language=C++,
    extendedchars=true,
    inputencoding=utf8,
    keepspaces=true,
    % captionpos=b,
}

\begin{document} % start of document

    % START OF TITLE PAGE
    \begin{center}
        \hfill \break
        \hfill \break
        \normalsize{MINISTRY OF SCIENCE AND HIGHER EDUCATION OF THE RUSSIAN FEDERATION\\
        Federal State Autonomous Educational Institution of Higher Education Peter the Great St. Petersburg Polytechnic University\\[10pt]}
        \normalsize{Institute of Computer Science and Cybersecurity}\\[10pt]
        \normalsize{Higher School of Artificial Intelligence Technology}\\[10pt]
        \normalsize{Direction 02.03.01 Mathematics and Computer Science}\\

        \hfill \break
        \hfill \break
        \hfill \break
        \hfill \break
        \large{\textbf{Literature Review}}\\
        \large{\textit{Machine learning approaches for assessing drug resistance in cancer treatment}}\\

        \hfill \break
        \hfill \break
    \end{center}

    \small{
        \begin{tabular}{lrrl}
            \!\!\!Student, & \hspace{2cm} & & \\
            \!\!\!group 5130201/20102 & \hspace{2cm} & \underline{\hspace{3cm}} &Tishenko A. A. \\\\
            \!\!\!Supervisor, Ph. D. & \hspace{2cm} &  \underline{\hspace{3cm}} &  Motorin D. E. \\\\
            &&\hspace{4cm}
        \end{tabular}
        \begin{flushright}
            <<\underline{\hspace{1cm}}>>\underline{\hspace{2.5cm}} 2024
        \end{flushright}
    }

    \hfill \break
    % \hfill \break
    \begin{center} \small{Saint-Petersburg, 2024} \end{center}
    \thispagestyle{empty} % suppress the page number on this page

    % END OF TITLE PAGE
    \newpage

    % \tableofcontents
    % \newpage

    \section*{Introduction}
    \addcontentsline{toc}{section}{Introduction}
    Despite progress in chemotherapy drugs, drug resistance remains a major challenge in cancer treatment and the main cause of cancer progression and even death. There are still no clear indicators for predicting the risk of drug resistance in patients, and existing drug sensitivity assessment methods have limitations such as low modeling success rates, high cost, and long turnaround times. Machine learning is a rapidly expanding field of computing that may help significantly in addressing chemotherapy resistance. Here we provide an overview of how different studies apply machine learning algorithms to predict and understand chemotherapy resistance in various cancer types. We also consider the strengths and limitations of each approach and discuss the reported results.

    \newpage

    \section{Machine learning and chemotherapy resistance}
    Machine learning has been widely applied to classification, regression, feature extraction and many other problems in biology and medicine. Cancer treatment is no exception: machine learning has recently been actively used in research on the chemotherapy resistance of cancer cells.

    The authors of~\cite{paclitaxel} applied and compared five machine learning algorithms to classify cancer cells by their level of drug resistance. They extracted 112 morphological features from a dataset of nearly 3000 single-cell quantitative phase images of epithelial ovarian cancer (EOC) cells. They then employed five supervised machine learning algorithms, namely decision tree, Naive Bayes, K-nearest neighbors (KNN), support vector machine (SVM), and neural network (NN), to perform multi-class classification of four types of drug-resistant cancer cells. The optimal classification algorithm was determined by comparing the test accuracy for each cell type and the confusion matrix. The chosen trained model was then used for further interpretability analysis.

    Another study evaluates the potential of mitochondria-related chemoradiotherapy (CRT) resistance (MRCRTR) genes for predicting esophageal cancer prognosis using machine learning~\cite{mitochondria}. The authors used machine learning algorithms for both classification and regression tasks. For classification they applied seven algorithms: generalized linear model (GLM), K-nearest neighbor (KNN), least absolute shrinkage and selection operator (LASSO) regression, neural network (NN), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB). The task is similar to that in~\cite{paclitaxel}, but here the authors distinguished only two classes: CRT response and CRT non-response. Beyond classification, the authors also trained 10 machine learning algorithms, including random survival forest (RSF), elastic network (Enet), LASSO, ridge, stepwise Cox, Coxboost, partial least squares regression for Cox (plsRcox), supervised principal components (SuperPC), generalized boosted regression modeling (GBM), and survival support vector machine (survival-SVM), to build a consensus prognostic model that predicts the MRCRTR score. Using the leave-one-out cross-validation (LOOCV) framework, a total of 101 algorithm combinations were evaluated as candidate prognostic models.

    Machine learning was also successfully applied by the authors of~\cite{sers} to a classification task similar to those in~\cite{paclitaxel} and~\cite{mitochondria}. They employed a robust machine learning algorithm based on principal component analysis and linear discriminant analysis (PCA-LDA) to extract features from blood-SERS data and to build a predictive model that distinguishes radiotherapy-resistant subjects from radiotherapy-sensitive ones, and nasopharyngeal cancer (NPC) subjects from healthy ones.
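
    The two stages of a PCA-LDA pipeline can be written compactly. The formulation below is the textbook one and is given only as a sketch: the exact variant and the number of retained components used in~\cite{sers} are not stated here, and the symbols $x$, $\mu$, $W_k$, $z$, $w$, $S_B$ and $S_W$ are notation introduced for illustration. PCA first projects a spectrum $x$ onto the $k$ leading principal directions (the columns of $W_k$):
    \[
    z = W_k^{\top} (x - \mu),
    \]
    where $\mu$ is the mean spectrum. LDA then seeks a direction $w$ in the reduced space that maximizes the Fisher criterion
    \[
    J(w) = \frac{w^{\top} S_B\, w}{w^{\top} S_W\, w},
    \]
    where $S_B$ and $S_W$ are the between-class and within-class scatter matrices of the projected data. Reducing the dimensionality before LDA helps keep $S_W$ well conditioned when the number of spectral channels exceeds the number of samples.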

    The authors of article~\cite{heterogeneity} chose a different approach by applying machine learning algorithms from the specialized software CellProfiler~\cite{cellprofile} to extract quantitative image features. They subsequently used bioinformatics analysis to explore the relationship between these features of intra-tumor heterogeneity (ITH) and drug resistance. Notably, the authors did not aim to train new models but instead utilized pre-trained algorithms from CellProfiler. Unlike studies \cite{paclitaxel}, \cite{mitochondria}, and \cite{sers}, where algorithms were employed for regression and classification tasks, this research focused specifically on extracting quantitative features from images. Based on CellProfiler, the authors constructed a pipeline for the extraction and analysis of these features, which enabled them to draw conclusions regarding the connection between these features and drug resistance in cancer cells.

    In~\cite{platinum}, the authors performed differential protein analysis on the expression profiles of 745 proteins related to platinum-based chemotherapy resistance. They used LASSO regression to select 10 proteins linked to chemotherapy outcomes, followed by univariate logistic regression on nine clinical factors. Variables with $p < 0.1$ were included in a multivariate logistic regression analysis, resulting in four significant variables: three proteins and one clinical parameter (postoperative residual tumor). This analysis enabled the construction of a predictive machine-learning model for chemotherapy resistance in patients with EOC.
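
    To make the selection step concrete, a common formulation of LASSO for a binary outcome is $\ell_1$-penalized logistic regression. It is shown here as a textbook sketch and is not necessarily the exact objective optimized in~\cite{platinum}; the symbols $x_i$, $y_i$, $\beta_0$, $\beta$, $\lambda$, $N$ and $m$ are notation introduced for illustration. With the predicted probability of resistance
    \[
    p_i = \frac{1}{1 + e^{-(\beta_0 + x_i^{\top} \beta)}},
    \]
    the coefficients are estimated as
    \[
    (\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\, \beta} \left\{ -\frac{1}{N} \sum_{i=1}^{N} \bigl[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \bigr] + \lambda \sum_{j=1}^{m} |\beta_j| \right\},
    \]
    where $x_i$ is the vector of $m$ protein expression values for patient $i$, $y_i \in \{0, 1\}$ is the resistance label, and $\lambda \ge 0$ controls the strength of the penalty. Because the $\ell_1$ penalty drives many coefficients exactly to zero, the proteins with nonzero $\hat{\beta}_j$ form the selected subset.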

    The authors of~\cite{kras} applied machine learning algorithms for two purposes. First, they used them to identify genes strongly associated with therapy resistance. Each sample in their data contained the expression of 8687 genes, of which only a small portion was correlated with targeted therapy resistance. To rank the genes, the authors tried seven algorithms: least absolute shrinkage and selection operator (LASSO), light gradient boosting machine (LightGBM), Monte Carlo feature selection (MCFS), minimum redundancy maximum relevance (mRMR), random forest (RF)-based ranking, categorical boosting (CATBoost), and extreme gradient boosting (XGBoost). Second, they selected four algorithms, namely random forest (RF), support vector machine (SVM), K-nearest neighbors (KNN), and decision tree (DT), to perform binary classification (resistant vs.\ sensitive) of tumor cells based on the selected features.

    The authors of~\cite{glut} took an alternative approach: instead of directly predicting chemotherapy resistance, they constructed a machine learning-derived immunosenescence-related score (MLIRS). Patients with high MLIRS values had a worse prognosis, whereas the low-MLIRS group demonstrated greater sensitivity to both chemotherapy and immunotherapy. To obtain an optimal hazard scoring system, they trained a total of 101 combined machine learning algorithms (based on 10-fold cross-validation) across 10 basal categories: survival support vector machine (survival-SVM), CoxBoost, random survival forest (RSF), Lasso, stepwise Cox, partial least squares regression for Cox (plsRcox), Ridge, supervised principal components (SuperPC), elastic network (Enet), and generalized boosted regression modeling (GBM). In this study, these algorithms were applied to a regression task, allowing the authors to compute the coefficients for the MLIRS formula:
    \[
    \text{MLIRS} = \sum_{i=1}^{n} \text{expr}_{\text{gene}_i} \times \text{coef}_{\text{gene}_i},
    \]
    where \(\text{expr}_{\text{gene}_i}\) denotes the expression level of the \(i\)-th gene and \(\text{coef}_{\text{gene}_i}\) is the coefficient for that gene, as determined by the model. The authors derived the C-index of each machine learning algorithm on each dataset and selected the algorithm with the largest mean C-index as the optimal hazard scoring algorithm.
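
    For reference, the concordance index (C-index) is commonly defined in Harrell's form, given below. This is the standard definition and is stated as an assumption about the variant used in~\cite{glut}; the symbols $T_i$, $\delta_i$ and $s_i$ are notation introduced for illustration, and ties in the scores are ignored for simplicity:
    \[
    C = \frac{\sum_{i \neq j} \delta_i \, \mathbf{1}[T_i < T_j] \, \mathbf{1}[s_i > s_j]}{\sum_{i \neq j} \delta_i \, \mathbf{1}[T_i < T_j]},
    \]
    where $T_i$ is the observed survival time of patient $i$, $\delta_i = 1$ if the event was observed and $0$ if the patient was censored, and $s_i$ is the predicted risk score. A value of $C = 0.5$ corresponds to random ranking, and $C = 1$ means that every comparable pair of patients is ordered correctly.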

    The authors of~\cite{tabular} developed a machine learning model to predict cisplatin sensitivity based on gene expression changes induced by cisplatin treatment. They combined gene expression data from sensitive ovarian cancer cell lines and patients with specific signaling alterations to identify a gene signature. Using this signature, they trained TabNet, an interpretable deep learning algorithm for tabular data, to perform binary classification of sensitivity to cisplatin. Several other machine learning algorithms, including Ridge, LASSO, Elastic Net, Nu-Support Vector Classification (Nu-SVC), XGBoost, and Random Forest, were applied to the same task for comparison with TabNet.


    \section{Datasets}
    Data plays a crucial role in machine learning, serving as the foundation for model training and evaluation. The quality and quantity of data directly influence the performance and generalizability of machine learning algorithms. In the fields of biology and medicine, data collection is often costly and time-consuming. Additionally, the complexity and variability inherent in biological systems further complicate data acquisition and interpretation. In cancer research, these challenges are even more pronounced due to the heterogeneity of tumors and the intricate nature of cancer biology. However, there are valuable resources available, such as the Gene Expression Omnibus (GEO) database~\cite{geo} and The Cancer Genome Atlas (TCGA) database~\cite{tcga}, which provide researchers with access to extensive datasets. Moreover, nonprofit organizations like the American Type Culture Collection (ATCC)~\cite{atcc} enable researchers to obtain biological materials, including cancer cells.

    In~\cite{paclitaxel}, \cite{sers} and~\cite{cervical}, the authors prepared their own datasets specifically for their research.

    In~\cite{paclitaxel}, four kinds of epithelial ovarian cancer cells with different drug sensitivity (SKOV3, SKOV3\_Ta\_2\textmu M, SKOV3\_Ta\_8\textmu M, and SKOV3\_Ta\_20\textmu M) were studied. The SKOV3 cells were sourced from the ATCC~\cite{atcc} and preserved at the Obstetrics and Gynecology Laboratory of Peking University People’s Hospital. The drug-resistant lines SKOV3\_Ta\_2\textmu M, SKOV3\_Ta\_8\textmu M, and SKOV3\_Ta\_20\textmu M were obtained by progressively exposing SKOV3 cells to varying concentrations of paclitaxel, a process that took approximately ten months. The authors then used Digital Holographic Flow Cytometry (DHFC), an advanced technology for label-free, high-throughput cell detection. Using DHFC along with additional post-processing, they generated a dataset of approximately 3000 quantitative phase images (QPIs) of EOC cells, each 300 by 300 pixels in size. Fig.~\ref{fig:skov3} presents the reconstructed QPIs of EOC cells with various degrees of drug resistance.

    \begin{figure}[h]
       \centering
       \includegraphics[width=1\linewidth]{img/skov3.png}
       \caption{Reconstructed QPIs of EOC cells used by authors of~\cite{paclitaxel}.}
       \label{fig:skov3}
    \end{figure}

    The dataset in~\cite{sers} was based on clinical plasma samples from 60 healthy volunteers, used as a control group, and 60 nasopharyngeal cancer patients (30 samples from radiotherapy-sensitive patients and 30 from radiotherapy-resistant patients). All plasma samples were obtained from Fujian Provincial Cancer Hospital. As in~\cite{paclitaxel}, the authors relied on a specialized measurement technique, in this case surface-enhanced Raman spectroscopy (SERS), to extract molecular profiles of the patients' plasma. The authors claim that SERS based on surface plasmon resonance was used for this task for the first time. The SERS spectra were processed by subtracting the fluorescence background using fifth-order polynomial fitting and were then peak-normalized; finally, the spectra of the same plasma sample were averaged to give the final SERS data for that sample.

    In~\cite{cervical}, the authors prepared a dataset of 259 samples. They selected 259 patients at the People’s Hospital of Gansu Province and the First and Second Hospital of Lanzhou University who had been diagnosed with locally advanced cervical cancer (LACC), treated them with neoadjuvant chemotherapy (NACT), and extracted their whole-blood genomic DNA. They then selected 24 SNPs from genes of the PTEN/PI3K/AKT pathway: PTEN, PIK3CA, Akt1, and Akt2. One-hot encoding of these 24 SNPs in the raw data produced 70 features, resulting in a $259 \times 70$ dataset. Clinical examination, colposcopy, and abdominal computed tomography were used to estimate the change in tumor size in all patients before and after each NACT cycle. Patients with a complete or partial response were assigned to the NACT-effective group, and patients with stable or progressive disease to the NACT-ineffective group.
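
    The feature count can be checked with a short calculation. With one-hot encoding, SNP $j$ contributes one binary column per genotype observed in the cohort; the count $k_j$ is notation introduced here for illustration. A biallelic SNP has at most three genotypes, so
    \[
    \sum_{j=1}^{24} k_j \le 24 \times 3 = 72 .
    \]
    The reported 70 features are two fewer than this upper bound. One possible explanation, assuming that no additional category such as a missing value was encoded, is that two SNPs showed only two genotypes in the cohort:
    \[
    22 \times 3 + 2 \times 2 = 70 .
    \]
    The source does not state which SNPs account for the difference, so this breakdown is only illustrative.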

    The authors of~\cite{heterogeneity}, \cite{mitochondria}, \cite{kras}, \cite{glut} and~\cite{tabular} turned to open databases to prepare datasets for their research. The authors of~\cite{heterogeneity} downloaded frozen histopathologic images of 494 ovarian and 70 paracarcinoma tissues with hematoxylin--eosin (HE) staining from TCGA~\cite{tcga}. The corresponding clinical information, genomics, and transcriptomics profiles required for this study were also obtained from this database.

    The authors of~\cite{mitochondria} also used TCGA. They obtained information on 183 esophageal cancer patients (95 squamous cell carcinomas and 88 adenocarcinomas), including mRNA expression profiles and clinical features such as survival time and status, age, gender, and pathological stage (T, N, and M). They additionally used the Gene Expression Omnibus (GEO) database~\cite{geo}, from which the RNA sequencing (RNA-seq) data for GSE45670 were downloaded. GSE45670 includes 17 esophageal squamous cell carcinomas (ESCC) that did not respond to preoperative CRT, 11 ESCC that responded to preoperative CRT, and 10 samples from normal esophageal epithelium. The GEO dataset GSE53625 comprises 358 samples, including 179 ESCC tissue samples and an equal number of samples of adjacent normal tissues, along with detailed clinical data for the 179 ESCC patients. The GEO dataset GSE19417 contains data from 76 esophageal adenocarcinoma patients, with detailed clinical data for 48 of them.

    The authors of~\cite{kras} also took gene expression profile data from the GEO database, specifically from accession number GSE137912. Their analysis involved 7612 samples treated with KRAS G12C inhibitors. Among these samples, 4297 were tumor cells that persisted in proliferation, whereas 3315 were tumor cells that had ceased proliferating. Each sample contained the expression of 8687 genes.

    In~\cite{glut}, the authors used datasets from TCGA, GEO, and the European Genome-Phenome Archive (EGA)~\cite{ega}. The authors of~\cite{tabular} used GSE47856, GSE15622 and GSE146965 from the GEO database and RNA-seq data from TCGA.

    In~\cite{platinum}, the authors both prepared their own dataset and used open databases. 4D data-independent acquisition (DIA) proteomic sequencing was performed on tissue-derived extracellular vesicles (tsEVs) obtained from 58 platinum-sensitive and 30 platinum-resistant patients with EOC. The authors also used the GSE15372, GSE33482, GSE26712 and GSE63885 microarray datasets from the Gene Expression Omnibus database~\cite{geo}. GSE15372 and GSE33482 are EOC cell line-derived RNA microarray datasets: GSE15372 comprises 5 platinum-sensitive and 5 platinum-resistant cell line samples, and GSE33482 comprises 6 of each. GSE26712 and GSE63885 contain clinical and sequencing data for 195 and 101 EOC patients, respectively. Additionally, transcriptomic sequencing data and clinical information from the tumour tissues of 379 patients with EOC, sourced from the TCGA database~\cite{tcga}, were used.

    % \section{Feature analysis}

    \section{Results}
    In all works, the construction of machine learning models is essentially a secondary result. First of all, studies show the applicability of these methods to tasks related to the problems of cancer cell resistance to chemotherapy. Also, using machine learning methods, the authors test their hypotheses, confirm or discover links between various characteristics of cancer cells, patient clinical data and drug resistance.

    In articles \cite{paclitaxel}, \cite{sers}, \cite{platinum}, \cite{kras}, \cite{cervical}, \cite{tabular}, the authors try to solve the problem of determining drug resistance directly. In \cite{sers}, \cite{platinum}, \cite{kras}, \cite{cervical}, \cite{tabular}, the problem of binary classification (drug resistant vs drug sensitive) is solved, and in \cite{paclitaxel}, cells are classified into 4 classes, which constitute a gradation of the level of resistance of cancer cells to chemotherapy.

    In~\cite{paclitaxel}, five machine learning algorithms were compared; the best results were achieved with the support vector machine (accuracy of 93.4\%) and the neural network (accuracy of 94.5\%). The classification was based on morphological features, and by constructing effective classifiers the authors demonstrated that these features are directly related to the level of resistance of cancer cells to chemotherapy. Using SHapley Additive exPlanations (SHAP), the authors also showed that only 25 of the 112 features are important for the classification.
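
    The accuracy figures for a multi-class problem can be read directly off the confusion matrix. The expression below is the usual definition, given as a reminder; $M$ and $K$ are notation introduced here, with $M_{ab}$ the number of test cells of true class $a$ predicted as class $b$ and $K = 4$ for the four cell types:
    \[
    \text{Accuracy} = \frac{\sum_{a=1}^{K} M_{aa}}{\sum_{a=1}^{K} \sum_{b=1}^{K} M_{ab}} .
    \]
    If the four classes were balanced, which is an assumption not confirmed here, random guessing would yield an accuracy of about 25\%, which puts the reported values above 93\% into perspective.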

    The authors of~\cite{sers} applied a robust machine learning algorithm based on principal component analysis and linear discriminant analysis and built a predictive model with an accuracy of 96.7\% for distinguishing radiotherapy-resistant subjects from radiotherapy-sensitive ones, and 100\% for distinguishing NPC subjects from healthy ones. The authors also showed the importance of separating plasma into upper and lower fractions by comparing model results: for example, on the radiotherapy resistance vs.\ radiotherapy sensitivity task, the model achieved 98.7\% accuracy on upper plasma but only 93.9\% on lower plasma.

    A LASSO-based classifier was built by the authors of~\cite{platinum}. Their model achieved an area under the curve (AUC) of 0.864. By analysing the model and its results, the authors found that three immune-related proteins (CCR1, IGHV3-35, and CD72), along with the presence of postoperative residual tumors, are strong predictors of platinum resistance in EOC patients.
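
    The AUC has a convenient probabilistic reading, recalled here as a standard fact rather than a result of~\cite{platinum}. The symbols $s(\cdot)$, $x^{+}$ and $x^{-}$ are notation introduced for illustration: $s$ is the model score, $x^{+}$ is a randomly drawn positive case, and $x^{-}$ is a randomly drawn negative case. Ignoring ties,
    \[
    \text{AUC} = P\bigl( s(x^{+}) > s(x^{-}) \bigr) .
    \]
    Assuming the resistant patients are treated as the positive class, an AUC of 0.864 means that in roughly 86\% of resistant--sensitive patient pairs the model assigns the higher score to the resistant patient, while 0.5 would correspond to random ranking.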

    In~\cite{kras}, the authors first applied machine learning algorithms to rank features by importance, producing seven feature lists, and then applied four classification algorithms. Their best result was achieved with the CATBoost feature list and a support vector machine as the classifier (accuracy of 93.1\%). By analysing the resulting feature lists, the authors also identified top genes associated with tumor progression and drug resistance (H2AFZ, CKS1B, TUBA1B, RRM2, BIRC5).
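
    As a point of reference, the class counts reported for this dataset (4297 proliferating and 3315 non-proliferating cells out of 7612) give the accuracy of a trivial classifier that always predicts the majority class:
    \[
    \frac{\max(4297,\ 3315)}{7612} = \frac{4297}{7612} \approx 0.565 .
    \]
    Assuming the accuracy in~\cite{kras} was measured on data with similar class proportions, the reported 93.1\% is well above this baseline of about 56.5\%.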

    The study \cite{cervical} employed a Random Forest model utilizing genomic features. The model successfully predicted the response to platinum-based neoadjuvant chemotherapy in patients with locally advanced cervical cancer (LACC). However, the main focus of the study was not on building the model but on analyzing feature importance to identify key genes associated with chemoresistance. Through importance analysis, the authors identified that the top three significant single nucleotide polymorphisms (SNPs)—rs4558508, rs1130233, and rs7259541—were all located within the Akt gene family. Specifically, patients carrying the heterozygous GA genotype in Akt2 rs4558508 had a significantly increased risk of chemoresistance compared to those with GG or AA genotypes.

    The authors of \cite{tabular} developed a deep learning model using the TabNet algorithm to predict cisplatin sensitivity based on cisplatin-perturbed gene expression data. Their model achieved over 80\% accuracy, surpassing a variety of other machine learning algorithms such as ridge regression, lasso, elastic net, Nu-SVC, XGBoost, and random forest. The TabNet model consistently demonstrated strong predictive performance with an average AUC of 0.808 across 500 different sample splits. By analyzing feature importance, the authors identified several key genes contributing to cisplatin resistance, most notably BCL2L1. The upregulation of BCL2L1, along with genes like CCND1 and PLK2, was associated with poor survival in ovarian cancer patients, highlighting potential targets for overcoming drug resistance. These findings are in line with the results of \cite{kras}, where important genes associated with tumor progression and drug resistance were also identified using machine learning feature selection techniques.

    In~\cite{heterogeneity}, \cite{mitochondria} and~\cite{glut}, the authors used machine learning for different tasks. The authors of~\cite{heterogeneity} used machine learning algorithms from the specialized software CellProfiler~\cite{cellprofile} to extract quantitative image features and then performed statistical analysis of feature importance. The authors of~\cite{mitochondria} and~\cite{glut} applied machine learning algorithms to a regression task and proposed their own scores: the mitochondria-related chemoradiotherapy resistance (MRCRTR) score and the machine learning-derived immunosenescence-related score (MLIRS), respectively.

    The study \cite{heterogeneity} demonstrated that specific computational pathomic signatures extracted from histopathological images can effectively predict drug resistance in ovarian cancer patients. By analyzing 1212 statistical image features derived from whole-slide images, the authors identified 26 key features related to patient survival. Among these, the Perimeter.sd feature, which measures the standard deviation of nuclear perimeter, stood out as the most significant predictor. A higher Perimeter.sd value was positively correlated with increased intra-tumor heterogeneity and was associated with a higher risk of platinum-based chemotherapy resistance.

    The authors of \cite{mitochondria} developed a prognostic model based on mitochondria-related chemoradiotherapy resistance (MRCRTR) genes to predict survival outcomes in esophageal cancer patients. They identified six key genes (CTSL, TBL1X, CLN8, MMP1, PDPN, and MRPL37) that have high diagnostic value for chemoradiotherapy resistance. The MRCRTR score derived from these genes showed that patients with high scores had significantly lower survival rates than those with low scores (log-rank test, $p < 0.001$). Cox regression analyses confirmed the MRCRTR score as an independent prognostic factor. Additionally, the MRCRTR score was significantly correlated with increased expression of immune checkpoints and higher angiogenesis, epithelial-mesenchymal transition (EMT), and cancer-associated fibroblast (CAF) scores.
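
    The statement that the score is an independent prognostic factor can be made precise with the Cox proportional hazards model. The form below is the standard one and is given as a sketch; the symbols $h_0$, $s$, $z$, $\beta$ and $\gamma$ are notation introduced here, and the exact covariates used in~\cite{mitochondria} are not listed:
    \[
    h(t \mid s, z) = h_0(t) \, \exp\bigl( \beta s + \gamma^{\top} z \bigr),
    \]
    where $h_0(t)$ is the baseline hazard, $s$ is the MRCRTR score, and $z$ collects other clinical covariates such as age or stage. The score is called an independent prognostic factor when $\beta$ remains significantly different from zero in the multivariate model that includes $z$; the quantity $e^{\beta}$ is the hazard ratio associated with a one-unit increase in the score.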

    The authors of~\cite{glut} identified two immunosenescence-associated phenotypes (IMSP1 and IMSP2) with significant differences in prognosis and immune cell infiltration. They constructed the machine learning-derived immunosenescence-related score (MLIRS) using a combination of stepwise Cox regression and generalized boosted regression modeling (GBM), selected from 101 algorithm combinations evaluated with cross-validation. Their MLIRS model demonstrated robust prognostic performance with an area under the curve (AUC) of 0.91. They found that patients with high MLIRS values had a worse prognosis and lower abundance of immune cell infiltration, whereas those with low MLIRS values showed better sensitivity to chemotherapy and immunotherapy.

    \addtocounter{table}{1}
    \includepdf[pages={1}, fitpaper, pagecommand={
        \thispagestyle{empty}
            \begin{tikzpicture}[remember picture, overlay]
                \node at (current page.north) [anchor=north, yshift=-25pt] {
                    \begin{minipage}{3.38\textwidth}
                        Table 1. Methods used in research papers. Abbreviations: Epithelial Ovarian Cancer (EOC), ESophageal Cancer (ESC), NaSopharyngeal Cancer (NSC), Lung Cancer (LC), Pancreatic Cancer (PC), Cervical Cancer (CC), Decision Tree (DT), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Neural Network (NN), Least Absolute Shrinkage and Selection Operator (LASSO), Random Forest (RF), Principal Component Analysis - Linear Discriminant Analysis (PCA-LDA), eXtreme Gradient Boosting (XGB), Generalized Linear Model (GLM), Logistic Regression (LR), Cox Regression based algorithms including stepwise Cox, Coxboost, plsRcox (Cox), Supervised Principal Components (SuperPC), Elastic Network (Enet), Gradient Boosting Machine (GBM).
                    \end{minipage}
                };
            \end{tikzpicture}
    }]{methods_table/methods.pdf}


    \newpage
    \begin{table}[h!]
        \centering
        \caption{Methods used in research papers.}
        \footnotesize
        \begin{tabularx}{\textwidth}{|X|p{2cm}|X|X|X|}
        \hline
        \textbf{Article} & \textbf{Cancer type} & \textbf{Machine learning algorithms} & \textbf{Datasets} & \textbf{Feature importance analysis} \\
        \hline
        Classification of paclitaxel-resistant ovarian cancer cells using holographic flow cytometry through interpretable machine learning~\cite{paclitaxel} & Epithelial ovarian cancer (EOC) & Tree, Naive Bayes, K-nearest neighbors
        (KNN), support vector machine (SVM), and neural network (NN) & Self-produced dataset of 2998 quantitative phase images (QPIs) of EOC cells & SHapley Additive
        exPlanations (SHAP), Pearson coefficient, Kruskal-Wallis test \\
        \hline
        Heterogeneity of computational pathomic signature predicts drug resistance and intra-tumor heterogeneity of ovarian cancer~\cite{heterogeneity} & Epithelial ovarian cancer (EOC) & CellProfiler~\cite{cellprofile}, least absolute shrinkage and selection operator (LASSO) regression & 494  ovarian and 70 paracarcinoma tissues images from The Cancer Genome Atlas (TCGA) database~\cite{tcga} & Statistical analysis using R~\cite{r-lang}. Various visualizations, including heatmaps, Venn diagrams, ROC curves, and survival curves. \\
        \hline
        Mitochondria-related chemoradiotherapy resistance genes-based machine learning model associated with immune cell infiltration on the prognosis of esophageal cancer and its value in pan-cancer~\cite{mitochondria} & Esophageal cancer & Generalized linear model (GLM), K-nearest neighbor (KNN),  least absolute shrinkage and selection operator (LASSO) regression, neural network (NN), random forest (RF), support vector machine (SVM), extreme gradient boosting (XGB) & Nearly 500 tissue samples, RNA-sequences and some other clinical data from Gene Expression Omnibus (GEO) database~\cite{geo}, information on 183 esophageal cancer patients from The Cancer Genome Atlas (TCGA) database~\cite{tcga} & Statistical analysis using DALEX package~\cite{dalex} for~R~\cite{r-lang} \\
        \hline
        A Predictive Model for Initial Platinum-Based Chemotherapy Efficacy in Patients with Postoperative Epithelial Ovarian Cancer Using Tissue-Derived Small Extracellular Vesicles~\cite{platinum} & Epithelial ovarian cancer (EOC) & Least absolute shrinkage and selection operator (LASSO) regression, logistic regression (LR) & Nearly 300 tissue samples, and other clinical data from Gene Expression Omnibus (GEO) database~\cite{geo}, transcriptomic sequencing data and clinical information from tumor tissues of 379 EOC patients from The Cancer Genome Atlas (TCGA) database~\cite{tcga} & \\
        \hline
    \end{tabularx}
\end{table}

\newpage
\addtocounter{table}{-1}
\begin{table}[h!]
    \centering
    \caption{Methods used in research papers (continued).}
    \footnotesize
    \begin{tabularx}{\textwidth}{|X|p{2cm}|X|X|X|}
        \hline
        \textbf{Article} & \textbf{Cancer type} & \textbf{Machine learning algorithms} & \textbf{Datasets} & \textbf{Feature importance analysis} \\
        \hline
        Molecular separation-assisted label-free SERS combined with machine learning for nasopharyngeal cancer screening and radiotherapy resistance prediction~\cite{sers} & Nasopharyn\-geal cancer & Principal component analysis and linear discriminant analysis (PCA-LDA) & Self-produced dataset of 120 plasma samples: 60 from healthy volunteers, 30 from radiotherapy-sensitive patients, and 30 from radiotherapy-resistant patients & \\
        \hline
        Identifying genes associated with resistance to KRAS G12C inhibitors via machine learning methods~\cite{kras} & Lung cancer & Random forest (RF), support vector machine (SVM), K-nearest neighbors (KNN), decision tree (DT) & 7612 samples of gene expression profile data from the Gene Expression Omnibus (GEO) database~\cite{geo}. Each sample contained the expression of 8687 genes & Seven feature ranking algorithms were applied: least absolute shrinkage and selection operator (LASSO), light gradient boosting machine (LightGBM), Monte Carlo feature selection (MCFS), minimum redundancy maximum relevance (mRMR), random forest (RF)-based ranking, categorical boosting (CATB), and extreme gradient boosting (XGB) \\
        \hline
        Turning to immunosuppressive tumors: Deciphering the immunosenescence-related microenvironment and prognostic characteristics in pancreatic cancer, in which GLUT1 contributes to gemcitabine resistance~\cite{glut} & Pancreatic cancer & Support vector machine (SVM), CoxBoost, random forest (RF), least absolute shrinkage and selection operator (LASSO), stepwise Cox, partial least squares regression for Cox (plsRcox), Ridge, supervised principal components (SuperPC), elastic net (Enet), generalized boosted regression modeling (GBM) & Nearly 1000 samples from 12 datasets from The Cancer Genome Atlas (TCGA)~\cite{tcga}, Gene Expression Omnibus (GEO)~\cite{geo} and The European Genome-phenome Archive (EGA)~\cite{ega} & Univariate Cox regression analysis was used to identify immunosenescence-related genes with prognostic significance in pancreatic cancer. Genes with a p-value below 0.01 were selected as meaningful features for subsequent analysis \\
        \hline
        \end{tabularx}
    \end{table}


    \includepdf[pages={1}, fitpaper, pagecommand={
    \thispagestyle{empty}
        \begin{tikzpicture}[remember picture, overlay]
            \node at (current page.north) [anchor=north, yshift=-20pt] {
                \begin{minipage}{1.63\textwidth}
                    Table 1. Methods used in research papers. Abbreviations: area under the curve (AUC), root mean squared error (RMSE), receiver operating characteristic (ROC), Matthews correlation coefficient (MCC).
                \end{minipage}
            };
        \end{tikzpicture}
    }]{results_table/results.pdf}


    % \section*{Conclusion}
    % \addcontentsline{toc}{section}{Conclusion}
    % Conclusion text

    \newpage
    % \section*{Literature}
    % \addcontentsline{toc}{section}{Literature}

    \vspace{-1.5cm}
    \begin{thebibliography}{99}
        \bibitem{paclitaxel}
        Lu Xin, Wen Xiao, Huanzhi Zhang, Yakun Liu, Xiaoping Li, Pietro Ferraro, Feng Pan, Classification of paclitaxel-resistant ovarian cancer cells using holographic flow cytometry through interpretable machine learning, 2024.
        \bibitem{heterogeneity}
        Qiuli Zhu, Hua Dai, Feng Qiu, Weiming Lou, Xin Wang, Libin Deng, Chao Shi, Heterogeneity of computational pathomic signature predicts drug resistance and intra-tumor heterogeneity of ovarian cancer, 2024.
        \bibitem{mitochondria}
        Ziyu Liu, Zahra Zeinalzadeh, Tao Huang, Yingying Han, Lushan Peng, Dan Wang, Zongjiang Zhou, Ousmane Diabate, Junpu Wang, Mitochondria-related chemoradiotherapy resistance genes-based machine learning model associated with immune cell infiltration on the prognosis of esophageal cancer and its value in pan-cancer, 2024.
        \bibitem{sers}
        Jun Zhang, Youliang Weng, Yi Liu, Nan Wang, Shangyuan Feng, Sufang Qiu, Duo Lin, Molecular separation-assisted label-free SERS combined with machine learning for nasopharyngeal cancer screening and radiotherapy resistance prediction, 2024.
        \bibitem{platinum}
        S. Shen, C. Wang, J. Gu, F. Song, X. Wu, F. Qian, X. Chen, L. Wang, Q. Peng, Z. Xing, L. Gu, F. Wang, X. Cheng, A Predictive Model for Initial Platinum-Based Chemotherapy Efficacy in Patients with Postoperative Epithelial Ovarian Cancer Using Tissue-Derived Small Extracellular Vesicles, 2024.
        \bibitem{kras}
        Xiandong Lin, QingLan Ma, Lei Chen, Wei Guo, Zhiyi Huang, Tao Huang, Yu-Dong Cai, Identifying genes associated with resistance to KRAS G12C inhibitors via machine learning methods, 2023.
        \bibitem{glut}
        Si-Yuan Lu, Qiong-Cong Xu, De-Liang Fang, Yin-Hao Shi, Ying-Qin Zhu, Zhi-De Liu, Ming-Jian Ma, Jing-Yuan Ye, Xiao Yu Yin, Turning to immunosuppressive tumors: Deciphering the immunosenescence-related microenvironment and prognostic characteristics in pancreatic cancer, in which GLUT1 contributes to gemcitabine resistance, 2024.
        \bibitem{cervical}
        Lu Guo, Wei Wang, Xiaodong Xie, Shuihua Wang, Yudong Zhang, Machine learning-based models for genomic predicting neoadjuvant chemotherapeutic sensitivity in cervical cancer, 2023.
        \bibitem{tabular}
        Ahmad Nasimian, Mehreen Ahmed, Ingrid Hedenfalk, Julhash U. Kazi, A deep tabular data learning model predicting cisplatin sensitivity identifies BCL2L1 dependency in cancer, 2023.
        \bibitem{cellprofile}
        C. McQuin, A. Goodman, V. Chernyshev, L. Kamentsky, B.A. Cimini, et al., CellProfiler 3.0: next-generation image processing for biology, 2018.
        \bibitem{tcga}
        The Cancer Genome Atlas (TCGA) database. Available at \url{https://www.cancer.gov/ccg/research/genome-sequencing/tcga}. Accessed October 8, 2024.
        \bibitem{geo}
        Gene Expression Omnibus (GEO) database. Available at \url{https://www.ncbi.nlm.nih.gov/geo/}. Accessed October 8, 2024.
        \bibitem{ega}
        The European Genome-phenome Archive (EGA). Available at \url{https://ega-archive.org/}. Accessed October 8, 2024.
        \bibitem{atcc}
        American Type Culture Collection (ATCC). Available at \url{https://www.atcc.org/}. Accessed October 8, 2024.
        \bibitem{r-lang}
        The R Project for Statistical Computing. Available at \url{https://www.r-project.org/}. Accessed October 8, 2024.
        \bibitem{dalex}
        Przemyslaw Biecek, DALEX: explainers for complex predictive models, 2018.
    \end{thebibliography}
\end{document}