dataset section
22
report.tex
@@ -143,12 +143,28 @@
The authors of article~\cite{heterogeneity} chose a different approach by applying machine learning algorithms from the specialized software CellProfiler~\cite{cellprofile} to extract quantitative image features. They subsequently used bioinformatics analysis to explore the relationship between these features of intra-tumor heterogeneity (ITH) and drug resistance. Notably, the authors did not aim to train new models but instead utilized pre-trained algorithms from CellProfiler. Unlike studies \cite{paclitaxel}, \cite{mitochondria}, and \cite{sers}, where algorithms were employed for regression and classification tasks, this research focused specifically on extracting quantitative features from images. Based on CellProfiler, the authors constructed a pipeline for the extraction and analysis of these features, which enabled them to draw conclusions regarding the connection between these features and drug resistance in cancer cells.
\section{Feature analysis}
\section{Datasets}
Data plays a crucial role in machine learning, serving as the foundation for model training and evaluation. The quality and quantity of data directly influence the performance and generalizability of machine learning algorithms. In biology and medicine, data collection is often costly and time-consuming, and the complexity and variability inherent in biological systems further complicate data acquisition and interpretation. In cancer research, these challenges are even more pronounced because of the heterogeneity of tumors and the intricate nature of cancer biology. Nevertheless, valuable public resources are available, such as the Gene Expression Omnibus (GEO) database~\cite{geo} and The Cancer Genome Atlas (TCGA) database~\cite{tcga}, which give researchers access to extensive datasets. Moreover, nonprofit organizations such as the American Type Culture Collection (ATCC)~\cite{atcc} enable researchers to obtain biological materials, including cancer cells.

The authors of~\cite{paclitaxel} prepared a dataset specifically for their research. They studied four kinds of epithelial ovarian cancer (EOC) cells with different drug sensitivity: SKOV3, SKOV3\_Ta\_2\textmu M, SKOV3\_Ta\_8\textmu M, and SKOV3\_Ta\_20\textmu M. The SKOV3 cells were sourced from the ATCC~\cite{atcc} and preserved at the Obstetrics and Gynecology Laboratory of Peking University People’s Hospital. The drug-resistant lines SKOV3\_Ta\_2\textmu M, SKOV3\_Ta\_8\textmu M, and SKOV3\_Ta\_20\textmu M were obtained by progressively exposing SKOV3 cells to varying concentrations of paclitaxel, a process that took approximately ten months. The authors then used digital holographic flow cytometry (DHFC), a technique for label-free, high-throughput cell detection. Using DHFC along with additional post-processing, they generated a dataset of approximately 3000 quantitative phase images (QPIs) of EOC cells, each 300 by 300 pixels in size. Fig.~\ref{fig:skov3} presents the reconstructed QPIs of EOC cells with various degrees of drug resistance.
\section{Results}
\begin{figure}[h]
\centering
\includegraphics[width=1\linewidth]{img/skov3.png}
\caption{Reconstructed QPIs of EOC cells used by the authors of~\cite{paclitaxel}.}
\label{fig:skov3}
\end{figure}

In article~\cite{sers}, as in~\cite{paclitaxel}, the authors chose to collect their own dataset. It was based on clinical plasma samples from 60 healthy volunteers, who served as a control group, and 60 nasopharyngeal cancer patients (30 plasma samples from radiotherapy-sensitive patients and 30 from radiotherapy-resistant patients). All plasma samples were obtained from Fujian Provincial Cancer Hospital. As in~\cite{paclitaxel}, the authors relied on a specialized measurement technique, in this case surface-enhanced Raman spectroscopy (SERS), to extract molecular profiles of the patients' plasma. The authors even claim that SERS based on surface plasmon resonance was used for this task for the first time. The SERS spectra were processed by subtracting the fluorescence background signal using a fifth-order polynomial fitting method; the SERS signals were then peak normalized, and the spectra of the same plasma sample were averaged to represent the final SERS data for that sample.
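
The preprocessing steps just described can be written compactly. The following is only a sketch in our own notation, which is an assumption and is not taken from~\cite{sers}: $s_j(\nu_i)$ denotes the raw intensity of the $j$-th spectrum of a plasma sample at Raman shift $\nu_i$, $b_j$ is the fitted fifth-order polynomial background with coefficients $a_{j,k}$, and $m$ is the number of spectra recorded for that sample. We further assume that peak normalization means division by the maximum of the background-corrected spectrum; the exact fitting procedure and normalization convention used in~\cite{sers} may differ.
\[
b_j(\nu) = \sum_{k=0}^{5} a_{j,k}\,\nu^{k}, \qquad
\tilde{s}_j(\nu_i) = s_j(\nu_i) - b_j(\nu_i),
\]
\[
\hat{s}_j(\nu_i) = \frac{\tilde{s}_j(\nu_i)}{\max_{i'} \tilde{s}_j(\nu_{i'})}, \qquad
\bar{s}(\nu_i) = \frac{1}{m} \sum_{j=1}^{m} \hat{s}_j(\nu_i).
\]
Under these assumptions, $\bar{s}$ is the single spectrum that represents the plasma sample in the final dataset.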

The authors of articles~\cite{heterogeneity} and~\cite{mitochondria} turned to open databases to prepare datasets for their research. The authors of~\cite{heterogeneity} downloaded frozen histopathologic images of 494 ovarian and 70 paracarcinoma tissues with hematoxylin–eosin (HE) staining from TCGA~\cite{tcga}. The corresponding clinical information, genomics, and transcriptomics profiles required for this study were also obtained from this database. The authors of~\cite{mitochondria} also used TCGA. They downloaded information on 183 esophageal cancer patients (95 squamous cell carcinomas and 88 adenocarcinomas), including mRNA expression profiles and clinical features such as survival time and status, age, gender, and pathological stage (T, N, and M). Additionally, the authors used the Gene Expression Omnibus (GEO) database~\cite{geo}, from which they downloaded RNA sequencing (RNA-seq) data for GSE45670. GSE45670 includes 17 esophageal squamous cell carcinomas (ESCC) that did not respond to preoperative chemoradiotherapy (CRT), 11 ESCC that responded to preoperative CRT, and 10 samples of normal esophageal epithelium. The GEO dataset GSE53625 comprises 358 samples: 179 ESCC tissue samples and an equal number of adjacent normal tissue samples, along with detailed clinical data for the 179 ESCC patients. The GEO dataset GSE19417 contains data from 76 esophageal adenocarcinoma patients, with detailed clinical data for 48 of them.
% \section{Feature analysis}
% \section{Results}
@@ -223,6 +239,8 @@
The Cancer Genome Atlas (TCGA) database. Available at \url{https://www.cancer.gov/ccg/research/genome-sequencing/tcga}. Accessed October 8, 2024.
\bibitem{geo}
Gene Expression Omnibus (GEO) database. Available at \url{https://www.ncbi.nlm.nih.gov/geo/}. Accessed October 8, 2024.
\bibitem{atcc}
American Type Culture Collection (ATCC). Available at \url{https://www.atcc.org/}. Accessed October 8, 2024.
\bibitem{r-lang}
The R Project for Statistical Computing. Available at \url{https://www.r-project.org/}. Accessed October 8, 2024.
\bibitem{dalex}