Automating cancer diagnosis using advanced deep learning techniques for multi-cancer image classification
TL;DR Summary
This paper automates multi-cancer diagnosis by evaluating ten CNN models on segmented medical images across seven cancer types. DenseNet121 proved most effective, achieving 99.94% validation accuracy, demonstrating deep learning's high potential for efficient and accurate multi-cancer diagnosis.
Abstract
Yogesh Kumar, Supriya Shrivastav, Kinny Garg, Nandini Modi, Katarzyna Wiltos, Marcin Woźniak & Muhammad Fazal Ijaz

Cancer detection poses a significant challenge for researchers and clinical experts due to its status as the leading cause of global mortality. Early detection is crucial, but traditional cancer detection methods often rely on invasive procedures and time-consuming analyses, creating a demand for more efficient and accurate solutions. This paper addresses these challenges by utilizing automated cancer detection through AI-based techniques, specifically focusing on deep learning models. Convolutional Neural Networks (CNNs), including DenseNet121, DenseNet201, Xception, InceptionV3, MobileNetV2, NASNetLarge, NASNetMobile, InceptionResNetV2, VGG19, and ResNet152V2, are evaluated on image datasets for seven types of cancer: brain, oral, breast, kidney, Acute Lymphocytic Leukemia, lung and colon, and cervical cancer.
In-depth Reading
1. Bibliographic Information
- Title: Automating cancer diagnosis using advanced deep learning techniques for multi-cancer image classification
- Authors: Yogesh Kumar, Supriya Shrivastav, Kinny Garg, Nandini Modi, Katarzyna Wiltos, Marcin Woźniak, and Muhammad Fazal Ijaz. The authors are affiliated with various institutions, including universities in India and Poland, indicating a collaborative international research effort.
- Journal/Conference: The paper carries "Received," "Accepted," and "Published online" dates (published online 23 October 2024), indicating a journal publication. The publisher is Springer Nature, a reputable publisher of scientific journals. The specific journal name is not given in the excerpt, but the context suggests a peer-reviewed venue in computational medicine or engineering.
- Publication Year: 2024
- Abstract: The paper tackles the challenge of cancer detection, a leading cause of global mortality. It proposes an automated multi-cancer diagnosis system using deep learning to overcome the limitations of traditional invasive and time-consuming methods. The authors evaluate ten different Convolutional Neural Network (CNN) models on image datasets for seven types of cancer: brain, oral, breast, kidney, Acute Lymphocytic Leukemia (ALL), lung and colon, and cervical cancer. The methodology involves image segmentation and contour feature extraction. The models are evaluated on metrics like accuracy, loss, and Root Mean Square Error (RMSE). The key finding is that the DenseNet121 model performed best, achieving a validation accuracy of 99.94%, a loss of 0.0017, and low RMSE values, demonstrating the high potential of AI for accurate cancer detection.
- Original Source Link: The paper is available at /files/papers/68ebccbf00e47ee3518bc8a5/paper.pdf. It is an Open Access article published under a Creative Commons license.
2. Executive Summary
- Background & Motivation (Why):
- Core Problem: Traditional cancer diagnosis methods are often invasive, slow, and prone to human error, particularly when dealing with multiple cancer types simultaneously. Early and accurate detection is critical for improving patient outcomes, creating a pressing need for more efficient and reliable diagnostic tools.
- Importance & Gaps: Cancer remains a primary cause of death worldwide. Manual interpretation of medical images (like histopathology slides) is subjective and cannot easily scale. Previous AI-based studies often focused on a single cancer type or used smaller, less diverse datasets, limiting their real-world applicability (generalizability). This paper addresses the gap by developing a framework for multi-cancer classification using a variety of publicly available datasets.
- Innovation: The paper's novelty lies in its comprehensive comparative analysis of ten well-established deep learning models on a combined dataset of seven different cancers. It also employs a detailed image preprocessing pipeline, including advanced segmentation and feature extraction techniques, to enhance the models' ability to identify cancerous regions accurately.
- Main Contributions / Findings (What):
- Primary Contributions:
- Development and evaluation of a deep learning framework using 10 different transfer learning models for classifying seven types of cancer from histopathology and medical images.
- Implementation of an advanced image preprocessing pipeline, including grayscale conversion, Otsu binarization, noise removal, watershed transformation for segmentation, and contour feature extraction.
- A rigorous comparative analysis of the models based on multiple performance metrics (accuracy, loss, RMSE, precision, recall, F1-score); a short metric-computation sketch follows this summary.
- Key Conclusions:
- The study demonstrates that deep learning models can achieve exceptionally high accuracy in multi-cancer classification.
- The DenseNet121 model emerged as the most effective classifier, achieving the highest validation accuracy (99.94%) and the lowest validation loss (0.0017).
- The results confirm that AI-based techniques, particularly deep learning, are powerful tools that can support clinicians by providing early, accurate, and automated cancer diagnoses, potentially reducing mortality rates.
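Since these metrics recur throughout the paper, here is a hedged sketch of how they can be computed for a multi-class classifier with NumPy and scikit-learn; the `y_true`/`y_prob` arrays are made-up toy values for illustration, not the paper's outputs.

```python
# Toy example: accuracy, macro precision/recall/F1, and RMSE for a
# 3-class classifier. Values are invented purely to make the code runnable.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = np.array([0, 1, 2, 2, 1, 0])            # ground-truth class ids
y_prob = np.array([[0.9, 0.05, 0.05],            # softmax outputs per sample
                   [0.1, 0.8, 0.1],
                   [0.2, 0.2, 0.6],
                   [0.1, 0.3, 0.6],
                   [0.3, 0.6, 0.1],
                   [0.7, 0.2, 0.1]])
y_pred = y_prob.argmax(axis=1)                   # predicted class per sample

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro")

# RMSE between one-hot targets and predicted class probabilities.
one_hot = np.eye(y_prob.shape[1])[y_true]
rmse = np.sqrt(np.mean((one_hot - y_prob) ** 2))
```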
3. Prerequisite Knowledge & Related Work
- Foundational Concepts:
- Histopathology Images: These are microscopic images of tissue that have been stained to highlight different cellular structures. Pathologists examine these images to diagnose diseases like cancer. In this paper, they are the primary input data for the AI models.
- Deep Learning (DL): A subfield of machine learning based on artificial neural networks with many layers (hence "deep"). DL models can automatically learn complex patterns and features from large amounts of data, like images, without being explicitly programmed.
- Convolutional Neural Network (CNN): A specialized type of deep learning network designed for processing grid-like data, such as images. CNNs use special layers called convolutional layers to automatically and adaptively learn spatial hierarchies of features, from simple edges and textures to complex objects. They are the standard for image classification tasks.
- Transfer Learning: A technique where a model developed for one task is reused as the starting point for a model on a second task. In this paper, the authors use models (like VGG19 and DenseNet121) that were pre-trained on a massive image dataset (ImageNet) and fine-tune them on the specific cancer datasets. This saves significant training time and often improves performance, especially with limited medical data (a minimal sketch follows this list).
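A minimal transfer-learning sketch in Keras (assuming TensorFlow): DenseNet121 pre-trained on ImageNet is reused as a frozen feature extractor, and only a small dense head is trained on the cancer images. The 60- and 10-unit head mirrors the DenseNet121 entry in Table 4; the 224x224 input size and the Adam optimizer are assumptions, not settings confirmed by the paper.

```python
import tensorflow as tf

base = tf.keras.applications.DenseNet121(
    weights="imagenet",            # reuse ImageNet features (transfer learning)
    include_top=False,             # drop the original 1000-class head
    input_shape=(224, 224, 3),     # assumed input size
    pooling="avg")                 # -> (None, 1024) feature vector
base.trainable = False             # freeze pre-trained weights initially

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(60, activation="relu"),   # head sizes from Table 4
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```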
- Previous Works: The paper reviews several prior studies that used deep learning for diagnosing specific cancers. The "Background" section and Table 1 summarize this landscape, highlighting the progress and remaining gaps.
- Acute Lymphocytic Leukemia (ALL): Anilkumar et al. (2022) used models like AlexNet and achieved 94.12% accuracy, but on a small dataset.
- Brain Tumors: Saedi et al. (2023) used a 2D CNN on MRI images, reaching 96.47% accuracy, but again with a limited dataset. Mohsen et al. (2018) also worked on brain tumors with a CNN, achieving 96.97% accuracy on an even smaller set of 66 images.
- Lung and Colon Cancer: Kwon et al. (2023) used a DNN on liquid biopsies, showing high AUC but on a small patient cohort. Hadiyoso et al. (2023) used VGG16 on a large histopathological dataset (25,000 images), achieving a high accuracy of 98.96%.
- Breast Cancer: Abunasser et al. (2023) proposed a BCCNN model on the BreakHis dataset, reaching 98.28% accuracy but noted it was time-consuming.
- Cervical Cancer: Kalbhor and Shinde (2023) used ResNet and GoogleNet on the Herlev dataset, achieving up to 96.01% accuracy but noted a lack of generalization.

Below is a transcription of the comparative analysis from the paper's Table 1.
**Table 1. Comparative analysis.** (Manual Transcription; some cells are missing or interleaved in the source extraction and are left blank.)

| Author's Name | Cancer | Dataset | Techniques | Outcomes | Limitations |
| --- | --- | --- | --- | --- | --- |
| Anilkumar et al. (2022)10 | ALL | Images collected from American Society of Haematology (ASH) | AlexNet, LuekNet | Accuracy = 94.12% | Small dataset |
| Saedi et al. (2023)13 | Brain | 3264 Magnetic Resonance Imaging (MRI) brain images | 2D CNN; Autoencoder | 2D CNN: Accuracy = 96.47%, AUC-ROC = 0.99; Autoencoder: Accuracy = 95.63%, AUC-ROC = 1 | Limited dataset |
| Kwon et al. (2023)25 | Lung and colon | 95 lung cancers, 96 colorectal cancers | DNN, CNN | Accuracy = 0.92 | The parameters of the model need to be fine-tuned |
| Bansal et al. (2022)28 | Oral | Histopathological dataset | | Accuracy = 92.41%, Loss = 0.03, RMSE = 0.09 | Real-time images are of low resolution, which hindered the performance of the models |
| Raza et al. (2023)17 | Breast | BUSI dataset | DeepBraestCancerNet DL model | Accuracy = 99.63%, F1 Score = 99.50%, Recall = 100%, Precision = 99.50% | More images should be taken |
| Uhm et al. (2021)24 | Kidney | TCIA dataset | | | |
| Nawaz et al. (2018)18 | | | | | Anisotropic diffusion method does not address all types of noise in images |

4. Methodology (Core Technology & Implementation)
- Principles: The core idea is to automate the classification of seven different types of cancer from medical images using deep learning. The methodology is built on the principle that pre-trained CNNs (transfer learning) can effectively learn the distinguishing visual features of different cancer cells after a systematic preprocessing and segmentation pipeline enhances the relevant parts of the images.
- Steps & Procedures: The overall workflow, as illustrated in Fig. 1, consists of several key stages:
Fig. 1: A schematic of the system design workflow for multi-cancer classification, covering preprocessing of the multi-cancer datasets, image segmentation (with multi-step details), feature extraction (e.g., area, perimeter, pixel intensity), the application of multiple deep learning classifier models, and an evaluation process based on accuracy, loss, RMSE, and other metrics, culminating in cancer-type prediction.

- Data Collection: Images for seven cancer types are collected from various public datasets.
- Data Pre-processing and Segmentation: Raw images are processed to isolate and highlight the regions of interest (cancerous cells).
- Feature Extraction: Geometric and textural features are computed from the segmented regions.
- Classification: Ten different CNN models are trained and evaluated on the processed images.
- Evaluation: The models' performance is measured using various metrics to identify the best-performing architecture.
- Datasets: The study utilizes seven publicly available datasets for different cancer types. Table 2 provides a breakdown of the image distribution for training, validation, and testing (roughly a 70%/15%/15% split; a code sketch of this split follows the table).
- Acute Lymphocytic Leukemia (ALL): 3,256 images of peripheral blood smears from Taleqani Hospital, Iran.
- Brain Tumor: 3,064 T1-weighted MRI images from 233 patients, covering glioma, meningioma, and pituitary tumors.
- Lung and Colon Cancer: 25,000 histopathological images, generated and augmented from 750 original images of lung and colon tissue.
- Kidney Tumor: 12,446 DICOM images from PACS in Dhaka, Bangladesh, representing cysts, normal cases, stones, and tumors.
- Cervical Cancer: 4,049 images of isolated cells from the SIPaKMeD Database.
- Oral Cancer: 5,192 histopathological images, including normal and Oral Squamous Cell Carcinoma (OSCC) cases.
- Breast Cancer: 7,909 histopathology images from 82 patients, including benign and malignant cases.
**Table 2. Dataset distribution of various types of cancer.** (Manual Transcription)

| Cancer Type | Training Set | Validation Set | Test Set | Total Images |
| --- | --- | --- | --- | --- |
| Lung and Colon Cancer | 17,500 | 3,750 | 3,750 | 25,000 |
| Breast Cancer | 5,536 | 1,186 | 1,186 | 7,909 |
| Brain Cancer | 2,144 | 459 | 459 | 3,064 |
| Kidney Cancer | 8,712 | 1,866 | 1,866 | 12,446 |
| Cervical Cancer | 2,834 | 607 | 607 | 4,049 |
| Acute Lymphocytic Leukemia | 2,280 | 488 | 488 | 3,256 |
| Oral Cancer | 4,946 | 120 | 126 | 5,192 |
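A hedged sketch of the roughly 70/15/15 split behind Table 2, using scikit-learn's `train_test_split`; the `paths`/`labels` placeholders are hypothetical, not the paper's data loaders.

```python
from sklearn.model_selection import train_test_split

# Placeholder data: hypothetical image paths and binary class ids.
paths = [f"img_{i:04d}.png" for i in range(20)]
labels = [i % 2 for i in range(20)]

# First carve out 30%, then split that 30% half-and-half into val/test,
# stratifying so each subset keeps the overall class proportions.
train_p, rest_p, train_y, rest_y = train_test_split(
    paths, labels, test_size=0.30, stratify=labels, random_state=42)
val_p, test_p, val_y, test_y = train_test_split(
    rest_p, rest_y, test_size=0.50, stratify=rest_y, random_state=42)
```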
- Data Pre-processing and Segmentation: This crucial pipeline, visualized in Fig. 2, prepares the images for the deep learning models; a consolidated code sketch follows the final (watershed) step below.
Fig. 2: Segmentation pipelines for images of the seven cancer types (Acute Lymphocytic Leukemia, brain, breast, cervical, kidney, lung and colon, oral). Each cancer type is shown across six processing stages: the original image, grayscale conversion, Otsu binarization, noise removal, distance transform, and watershed transformation, giving a visual overview of the multi-step segmentation applied for cancer detection.
- Grayscale Conversion: The RGB (color) images are converted to single-channel grayscale images to reduce computational complexity while retaining essential luminance information, using the standard luminance weighting:

$$I_{gray} = 0.299\,R + 0.587\,G + 0.114\,B$$

- $I_{gray}$: intensity of the pixel in the grayscale image.
- $R, G, B$: intensities of the red, green, and blue channels of the original pixel.
- Otsu Binarization: This is an automatic thresholding method that converts a grayscale image into a binary (black and white) image, separating the foreground (e.g., cells) from the background. It works by finding the threshold that maximizes the variance between the two classes of pixels:

$$\sigma_B^{2}(t) = \omega_0(t)\,\omega_1(t)\,\left[\mu_0(t) - \mu_1(t)\right]^{2}$$

- $\sigma_B^{2}(t)$: the between-class variance for a given threshold $t$.
- $\omega_0(t), \omega_1(t)$: probabilities of a pixel belonging to the background and foreground, respectively.
- $\mu_0(t), \mu_1(t)$: mean intensity values of the background and foreground pixels.
- Noise Removal: A Gaussian filter is applied to smooth the image and remove random variations in intensity (noise):

$$I'(x, y) = \sum_{i=-k}^{k} \sum_{j=-k}^{k} I(x+i,\, y+j)\, \frac{1}{2\pi\sigma^{2}}\, e^{-\frac{i^{2}+j^{2}}{2\sigma^{2}}}$$

- $I'(x, y)$: the new intensity value of the pixel at $(x, y)$.
- $I(x+i, y+j)$: intensity of a neighboring pixel.
- $\sigma$: standard deviation of the Gaussian function, controlling the amount of smoothing.
- The double summation represents a convolution operation over a neighborhood of size $(2k+1) \times (2k+1)$.
- Distance Transform: This technique calculates, for each pixel, the distance to the nearest non-zero pixel, which helps in separating touching objects. The paper provides a formula for this step, though it appears to be non-standard for a typical distance transform and may represent a different calculation.
- Critique: The paper's formula seems incorrect for a standard Euclidean distance transform: it computes the L2-norm of pixel intensities in a local neighborhood, not the distance to a specific feature. A true distance transform assigns each pixel $p$ its distance to the closest "object" pixel (e.g., a white pixel in a binary image): $D(p) = \min_{q \in S} \lVert p - q \rVert_2$, where $S$ is the set of object pixels.
- Watershed Transformation: This is an advanced segmentation algorithm often used to separate touching or overlapping objects in an image, such as cells in a cluster. It treats the image like a topographic map, where pixel intensities are elevations. "Flooding" the map from its local minima (markers) allows the algorithm to find the boundaries (watershed lines) that separate different catchment basins (objects).
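A consolidated sketch of the above pipeline with OpenCV, in the order shown in Fig. 2 (grayscale, Otsu, Gaussian denoising, distance transform, marker-based watershed); the file name, kernel sizes, and the 0.7-of-peak foreground threshold are illustrative assumptions, not values from the paper.

```python
import cv2
import numpy as np

img = cv2.imread("histopathology_slide.png")            # hypothetical input

# 1. Grayscale conversion (applies the 0.299R + 0.587G + 0.114B weighting).
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 2. Otsu binarization: threshold chosen to maximize between-class variance.
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# 3. Noise removal with a 5x5 Gaussian filter (sigma derived by OpenCV).
denoised = cv2.GaussianBlur(binary, (5, 5), 0)

# 4. Distance transform: each non-zero pixel receives its Euclidean distance
#    to the nearest zero pixel, so cell centers become bright peaks.
dist = cv2.distanceTransform(denoised, cv2.DIST_L2, 5)

# 5. Watershed: seed markers from strong distance peaks (sure foreground)
#    and a dilated mask (sure background), then flood to split touching cells.
_, sure_fg = cv2.threshold(dist, 0.7 * dist.max(), 255, cv2.THRESH_BINARY)
sure_fg = sure_fg.astype(np.uint8)
sure_bg = cv2.dilate(denoised, np.ones((3, 3), np.uint8), iterations=3)
unknown = cv2.subtract(sure_bg, sure_fg)

_, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1            # keep label 0 free for the "unknown" region
markers[unknown == 255] = 0
markers = cv2.watershed(img, markers)   # boundary pixels are labeled -1
```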
- Feature Extraction: After segmentation, contour features are extracted to quantify the shape and properties of the detected regions. These features, summarized in Table 3, include Area, Perimeter, Width, Height, Aspect Ratio, and pixel intensity values (Min/Max/Mean Color). These quantitative descriptors can help the model differentiate between the morphologies of various cancer types; a short extraction sketch follows the table below.
**Table 3. Contour characteristics of images of various classes of cancer dataset.** (Manual Transcription)

| Parameters | ALL | Kidney | Cervical | Brain | Breast | Oral | Lung and Colon |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Area | 2.0 | 2.0 | 4.0 | 6.5 | 0.0 | 2.0 | 0.5 |
| Perimeter | 6.0 | 6.82 | 7.65 | 13.07 | 10.0 | 5.65 | 3.414 |
| Epsilon | 0.60 | 0.682 | 0.765 | 1.307 | 1.0 | 0.565 | 0.3414 |
| Width | 3 | 2 | 3 | 6 | 1 | 3 | 2 |
| Height | 2 | 4 | 4 | 4 | 6 | 3 | 2 |
| Aspect Ratio | 1.5 | 0.5 | 0.75 | 1.5 | 0.16 | 1.0 | 1 |
| Extent | 0.33 | 0.25 | 0.33 | 0.27 | 0.0 | 0.22 | 0.125 |
| Diameter | 1.59 | 1.59 | 2.25 | 2.87 | 0.0 | 1.59 | 0.79 |
| Min Value | 128.0 | 128.0 | 127.0 | 128.0 | 128.0 | 98.0 | 128.0 |
| Max Value | 130.0 | 166.0 | 136.0 | 137.0 | 135.0 | 184.0 | 134.0 |
| Min Value Loc | (12,510) | (262,480) | (420,502) | (316,317) | (311,506) | (90,437) | (105,511) |
| Max Value Loc | (11,510) | (261,478) | (421,502) | (317,317) | (311,511) | (91,437) | (106,511) |
| Mean Color | 129.16 | 148.5 | 130.625 | 132.0 | 132.0 | 137.4 | 131.0 |
| Extreme leftmost point | (11,510) | (261,477) | (419,502) | (315,316) | (311,506) | (89,437) | (105,510) |
| Extreme rightmost point | (13,511) | (262,480) | (421,502) | (320,319) | (311,506) | (91,437) | (106,511) |
| Extreme topmost point | (11,510) | (261,477) | (420,502) | (315,316) | (311,506) | (90,436) | (105,510) |
| Extreme bottommost point | (11,511) | (262,480) | (420,504) | (318,319) | (311,511) | (90,438) | (105,511) |
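A sketch of how the contour features in Table 3 can be computed with OpenCV on a segmented binary mask; the file name is a placeholder. Epsilon follows the Table 3 pattern of 10% of the perimeter, and Diameter matches the equivalent circular diameter $\sqrt{4 \cdot \text{Area} / \pi}$.

```python
import math
import cv2

mask = cv2.imread("segmented_mask.png", cv2.IMREAD_GRAYSCALE)  # placeholder
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for c in contours:
    area = cv2.contourArea(c)
    perimeter = cv2.arcLength(c, True)        # closed-contour arc length
    epsilon = 0.1 * perimeter                 # approximation tolerance (Table 3)
    x, y, w, h = cv2.boundingRect(c)          # bounding-box width and height
    aspect_ratio = w / h
    extent = area / (w * h)                   # contour area / bounding-box area
    diameter = math.sqrt(4 * area / math.pi)  # equivalent circular diameter
    # Min/Max intensity values and their locations within the bounding box.
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(mask[y:y + h, x:x + w])
```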
- Applied Classifiers: Ten different pre-trained CNN architectures were used. Table 4 in the paper details their layered structure and parameter counts. The hyperparameters (like the learning rate) were optimized using techniques such as grid search and early stopping to prevent overfitting; a hedged training sketch follows Table 4 below.
**Table 4. Layered architecture of applied classifiers.** (Manual Transcription; rows after the InceptionV3 base layer are fragmentary in the source and transcribed as-is.)

| Models | Layer | Output Shape | Param # |
| --- | --- | --- | --- |
| DenseNet121 | Densenet121 | (None, 1, 1, 1024) | 7,037,504 |
| | Flatten_1 | (None, 1024) | 0 |
| | Dense_1 | (None, 60) | 61,500 |
| | Dense_2 | (None, 10) | 610 |
| | Total parameters | | 7,099,614 |
| | Trainable parameters | | 7,015,966 |
| | Non-trainable parameters | | 83,648 |
| DenseNet201 | Densenet201 | (None, 1, 1, 1920) | 18,321,984 |
| | Flatten_2 | (None, 1920) | 0 |
| | Dense_3 | (None, 60) | 115,260 |
| | Dense_4 | (None, 10) | 610 |
| | Total parameters | | 18,437,854 |
| | Trainable parameters | | 18,208,798 |
| | Non-trainable parameters | | 229,056 |
| InceptionResNetV2 | InceptionResNetV2 | (None, 8, 8, 1536) | 54,336,736 |
| | Flatten_3 | (None, 98304) | 0 |
| | Dense_5 | (None, 60) | 5,898,300 |
| | Dense_6 | (None, 10) | 610 |
| | Total parameters | | 60,235,646 |
| | Trainable parameters | | 60,175,102 |
| | Non-trainable parameters | | 60,544 |
| InceptionV3 | InceptionV3 | (None, 8, 8, 2048) | 21,802,784 |
| | Flatten_4 | | 0 |
| | Dense_13 | (None, 60) | 3,104,700 |
| | Dense_14 | (None, 10) | 610 |
| | Total parameters | | 7,375,026 |
| | Trainable parameters | | 7,338,288 |
| | Non-trainable parameters | | 36,738 |
| | Dense_16 | (None, 60) | |
| | Dense_18 | (None, 60) | 1,505,340 |
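A hedged sketch of the hyperparameter search described above: a small grid over learning rates, each run regularized by Keras `EarlyStopping`. The learning-rate grid, patience, epoch budget, and the `train_ds`/`val_ds` tf.data pipelines are assumptions for illustration, not settings reported in the paper.

```python
import tensorflow as tf

def build_model(lr: float) -> tf.keras.Model:
    # DenseNet121 backbone with the Table 4 head (60 -> 10 units).
    base = tf.keras.applications.DenseNet121(
        weights="imagenet", include_top=False,
        input_shape=(224, 224, 3), pooling="avg")
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(60, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Stop training once validation loss plateaus, keeping the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

# Grid search: train one model per candidate learning rate, keep the best.
best_acc, best_model = 0.0, None
for lr in (1e-3, 1e-4, 1e-5):
    model = build_model(lr)
    history = model.fit(train_ds, validation_data=val_ds,  # assumed datasets
                        epochs=50, callbacks=[early_stop])
    val_acc = max(history.history["val_accuracy"])
    if val_acc > best_acc:
        best_acc, best_model = val_acc, model
```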