Comparative analysis of deep learning architectures for performance of image classification in pineal region tumors
Highlight box
Key findings
• Typical convolutional neural network architectures provide useful diagnostic information to help doctors distinguish between pure and non-pure germinoma.
What is known and what is new?
• The surgical procedures in the pineal area are intricate and have been associated with documented complications.
• Using artificial intelligence to distinguish between germinoma and other malignancies in the pineal region can assist neurosurgeons in planning surgical techniques.
What is the implication, and what should change now?
• Deep learning models can improve preoperative tumor resection planning.
Introduction
Pineal region tumors are uncommon, accounting for approximately 0.4% to 1% of all intracranial malignancies, 1% to 4% of adult brain tumors, and 10% of pediatric brain tumors (1,2). The pineal region is characterized by a wide variety of histoanatomical structures, which results in a variety of neoplasms developing in this area, such as germ cell tumors, pineal parenchymal tumors, gliomas, miscellaneous tumors, and cysts (3).
The management of pineal region tumors still requires neurosurgical procedures for histological diagnosis because of the diverse variety of histological subtypes found in this area. In addition, some of these tumors can be treated with radiation and/or chemotherapy (4). However, the surgical approach to the pineal region remains complicated, with related mortality around 2% and morbidity reaching nearly 20% in some series (5). Bruce et al. studied postoperative complications of microsurgery in 160 patients with various pineal tumors and found postoperative extraocular muscle dysfunction in 16%, postoperative hemorrhage in 6%, and altered mental status in 5% (6). Konovalov et al. reported postoperative complications of various microsurgical approaches to pineal tumors, with extraocular muscle dysfunction, postoperative hemorrhage, and hemianopsia in 31%, 11%, and 4% of cases, respectively (7). Moreover, endoscopic and stereotactic intraventricular biopsies are currently the two main minimally invasive methods for tissue biopsy in the pineal region. Previous studies of endoscopic biopsy have reported new neurological deficits in about 2.9–8.2% of cases, postoperative hemorrhage in 2.9–19%, and perioperative mortality of 0% (8,9), whereas morbidity, postoperative hemorrhage, and mortality following stereotactic biopsy were 0.8–8%, 4.4–5.7%, and 1.3–1.9%, respectively (10,11).
In recent years, several techniques have been utilized to distinguish germinoma from other malignancies, such as advanced magnetic resonance imaging (MRI), texture analysis, and machine learning (ML). Chen et al. (12) used texture analysis of MRI scans with ML to differentiate germinoma from craniopharyngioma in the suprasellar region and reported an area under the receiver operating characteristic curve (AUC) of 0.91, while Ye et al. extracted radiomics features from multi-modal MR images for ML model training and reported a highest AUC of 0.88 [95% confidence interval (CI): 0.81–0.96] (13).
The introduction of deep learning (DL) marked the start of a paradigm shift in image classification, because this approach automatically incorporates the feature extraction and model development processes (14-17). Hence, DL-based image classification has been studied for distinguishing various types of brain tumors. McAvoy et al. used a DL model to categorize primary central nervous system lymphoma and glioblastoma, with classification AUCs ranging from 0.94 to 0.95 (14). Tariciotti et al. used DL for image classification among primary central nervous system lymphoma, glioblastoma, and brain metastasis and found classification AUCs of 0.81–0.98 (15).
Currently, several DL architectures have been investigated for brain tumor classification. Özkaraca et al. used typical convolutional neural network (CNN), Densely Connected Convolutional Network (DenseNet), and Visual Geometry Group (VGG)-16 architectures to classify types of brain tumors (glioma, meningioma, pituitary adenoma, and no tumor) from brain MRI images. The sensitivities of the typical CNN architecture for glioma, no tumor, meningioma, and pituitary adenoma were 0.90, 0.98, 0.85, and 0.97, respectively; those of the VGG-16 architecture were 0.89, 0.96, 0.67, and 0.94, respectively; and the DenseNet architecture demonstrated sensitivities of 0.97, 0.98, 0.91, and 0.99 for the same four conditions (16). For the LeNet architecture, Aluri et al. classified brain tumors using the LeNet model, whose sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were 0.880, 0.888, 0.821, 0.907, and 0.896, respectively (17). The Residual Neural Network (ResNet) model was utilized by Oladimeji et al. to classify brain tumors with a sensitivity of 0.990, PPV of 0.987, accuracy of 0.994, and AUC of 0.992 (18).
A review of the literature reveals a lack of evidence from studies using DL to distinguish between germinoma and non-germinoma in the pineal area. Furthermore, neurosurgical approaches for obtaining tissue specimens in the pineal region remain challenging. Therefore, image classification using artificial intelligence (AI) could inform neurosurgeons’ decisions about the extent of resection and treatment techniques. The present study aimed to evaluate the diagnostic performance of several DL architectures in distinguishing between pure and non-pure germinoma in the pineal region. We present this article in accordance with the STROBE reporting checklist (available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-120/rc).
Methods
Study design and study population
This is a historical cohort study of patients with pineal tumors whose diagnoses were confirmed from tissue specimens by a pathologist. The study population comprised two cohorts: the first was gathered between January 2010 and December 2021 from a previous study by Supbumrung et al. (19), and the MRI scans of 14 additional individuals diagnosed with pineal region tumors between January 2022 and December 2023 were pooled with it. Patients were excluded as follows: (I) patients with unavailable T1-weighted (T1), T1-weighted gadolinium-enhanced (T1-Gd), T2-weighted (T2), or fluid-attenuated inversion recovery (FLAIR) MRI scans; (II) patients with inadequate MRI scans or scans with movement artifacts. Tumor classification was divided into two groups: pure germinoma and non-pure germinoma, with non-pure germinoma comprising other tumor types as well as mixtures of germinoma with other malignancies. Following these criteria, 52 patients were selected. Descriptive statistics were used to summarize clinical features and imaging findings: mean and standard deviation (SD) for continuous variables, and percentages for categorical data. Statistical analyses were performed with R version 4.4.0 software (R Foundation, Vienna, Austria).
Data gathering and preprocessing
The workflow of the present study is shown in Figure 1. Initially, T1, T1-Gd, T2, and FLAIR MRI scans of pineal tumors were collected in axial, coronal, and sagittal planes. As a result, a total of 5,000 images were gathered for the dataset, of which 2,500 MRI scans were of pure germinoma and the remaining 2,500 were of non-pure germinoma.
Twenty percent of the total MRI scans (1,000 unseen images) were randomly set aside as the test dataset to verify the trained models’ classification performance. The remaining 4,000 images were used to train the classification models; during training, these images were randomly split into training and validation datasets in an 80/20 ratio.
Because the architectures used here are designed and optimized for this dimension, the input images were resized to 224×224 pixels to guarantee a comparable and compatible dimension for each model. In addition, images were normalized to pixel values between 0 and 1.
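A minimal preprocessing and splitting sketch consistent with this description is shown below; the `raw_images` and `labels` arrays, the stratification, and the random seed are illustrative assumptions rather than details reported in the study.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# `raw_images` (a list of MRI slices) and `labels` are placeholders here.
# Resize every image to 224x224 and scale pixel values to the [0, 1] range.
images = np.stack([tf.image.resize(img, [224, 224]).numpy()
                   for img in raw_images])
images = images / 255.0

# Hold out 20% of the scans as the unseen test dataset, then split the
# remainder 80/20 into training and validation datasets.
x_rest, x_test, y_rest, y_test = train_test_split(
    images, labels, test_size=0.20, random_state=42, stratify=labels)
x_train, x_val, y_train, y_val = train_test_split(
    x_rest, y_rest, test_size=0.20, random_state=42, stratify=y_rest)
```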
Model architectures
Typical CNN architecture
A typical CNN model was constructed using the Keras application library (TensorFlow software). The model started with an input layer that takes images of shape [224, 224, 3]. The input layer was followed by a convolutional layer with 32 filters of size [3, 3] and a Rectified Linear Unit (ReLU) activation function, which extracted features from the input images. Next, a max-pooling layer with a pool size of [2, 2] was applied to reduce the spatial dimensions of the feature maps, and another convolutional layer with 64 filters of size [3, 3] and a ReLU activation function extracted higher-level features from the input. The output from the last convolutional layer was then flattened into a one-dimensional vector and passed through a fully connected dense layer with 64 units and a ReLU activation function. Finally, an output layer with a softmax activation function produced the final probabilities for each class (Non_germinoma and Germinoma_pure).
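The following Keras sketch reproduces the layer sequence described above; hyperparameters not stated in the text (e.g., padding and strides) are left at the Keras defaults.

```python
from tensorflow.keras import layers, models

def build_typical_cnn(input_shape=(224, 224, 3), num_classes=2):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # First convolutional layer: 32 filters of size 3x3 with ReLU
        layers.Conv2D(32, (3, 3), activation="relu"),
        # Max pooling reduces the spatial dimensions of the feature maps
        layers.MaxPooling2D(pool_size=(2, 2)),
        # Second convolutional layer extracts higher-level features
        layers.Conv2D(64, (3, 3), activation="relu"),
        # Flatten the feature maps into a one-dimensional vector
        layers.Flatten(),
        # Fully connected layer with 64 units and ReLU activation
        layers.Dense(64, activation="relu"),
        # Softmax output over the two classes (Non_germinoma, Germinoma_pure)
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model
```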
DenseNet architecture
The DenseNet-121 base model was utilized without the top (fully connected) layer. Following the DenseNet feature extractor, a Global Average Pooling 2D layer was employed; this layer reduced the spatial dimensions of the output from the DenseNet-121 convolutional base, summarizing each feature map to a single number and thus minimizing the model parameters and computational complexity. A dense layer with 64 neurons and ReLU activation then provided the ability to learn non-linear relationships in the features extracted by DenseNet-121. Finally, the output layer was a dense layer equipped with a softmax activation function, tailored to the number of classes in the dataset.
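A sketch of the described DenseNet-121 pipeline follows; the use of ImageNet weights is an assumption, as the text does not state how this base model was initialized.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet121

def build_densenet(input_shape=(224, 224, 3), num_classes=2):
    # DenseNet-121 convolutional base without the fully connected top layer
    base = DenseNet121(include_top=False, weights="imagenet",
                       input_shape=input_shape)
    model = models.Sequential([
        base,
        # Summarize each feature map to a single value, reducing parameters
        layers.GlobalAveragePooling2D(),
        # Dense layer learns non-linear relationships in extracted features
        layers.Dense(64, activation="relu"),
        # Softmax output tailored to the number of classes
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model
```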
LeNet architecture
Images were inputted with the same shape as the training data, ensuring consistency in feature extraction. The LeNet-5 started with a convolutional layer with six filters of size 5×5, followed by a ReLU activation function. Max-pooling layers with a 2×2 pool size were applied after each convolutional layer, reducing spatial dimensions and extracting dominant features. The output from the last convolutional layer was flattened into a one-dimensional vector and passed to the fully connected layers: two dense layers with 120 and 84 neurons, respectively, with ReLU activation functions, were employed to learn complex patterns in the flattened features. Finally, the final dense layer with softmax activation produced class probabilities.
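A sketch of this LeNet-5 variant follows; the second convolutional layer with 16 filters is an assumption following the classic LeNet-5 design, since the text explicitly specifies only the first convolutional layer.

```python
from tensorflow.keras import layers, models

def build_lenet(input_shape=(224, 224, 3), num_classes=2):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # First convolution: six 5x5 filters with ReLU activation
        layers.Conv2D(6, (5, 5), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        # Second convolution: 16 filters, as in the classic LeNet-5 (assumed)
        layers.Conv2D(16, (5, 5), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        # Two dense layers with 120 and 84 neurons learn complex patterns
        layers.Dense(120, activation="relu"),
        layers.Dense(84, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model
```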
ResNet architecture
The ResNet-50 architecture, pre-trained on ImageNet, was utilized as the backbone for feature extraction. It is composed of numerous residual blocks that allow for effective propagation of gradients and alleviate the vanishing gradient problem. Images were resized to 224×224 pixels with RGB color channels to match the input shape expected by the model. ResNet-50 was used to extract features, which were then combined using global average pooling to reduce spatial dimensions while maintaining crucial information. A dense layer with 64 neurons and ReLU activation was then added to capture intricate patterns in the aggregated features. Finally, the final dense layer with softmax activation produced class probabilities.
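A corresponding ResNet-50 sketch, following the description above:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

def build_resnet(input_shape=(224, 224, 3), num_classes=2):
    # ResNet-50 backbone pre-trained on ImageNet, without the top classifier
    base = ResNet50(include_top=False, weights="imagenet",
                    input_shape=input_shape)
    model = models.Sequential([
        base,
        # Global average pooling collapses spatial dimensions while
        # maintaining crucial information
        layers.GlobalAveragePooling2D(),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model
```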
VGG architecture
The VGG-16 architecture was loaded from the Keras applications library with weights pre-trained on the ImageNet dataset. Additional layers were added on top of the pre-trained VGG-16 base model: a flatten layer, a fully connected dense layer with 256 units and ReLU activation, and a dropout layer with a dropout rate of 0.5 to prevent overfitting. Finally, the output layer was a dense layer with softmax activation, producing class probabilities.
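A sketch of the VGG-16 pipeline described above:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

def build_vgg(input_shape=(224, 224, 3), num_classes=2):
    # VGG-16 base with ImageNet weights and no top classifier
    base = VGG16(include_top=False, weights="imagenet",
                 input_shape=input_shape)
    model = models.Sequential([
        base,
        layers.Flatten(),
        # Fully connected layer with 256 units and ReLU activation
        layers.Dense(256, activation="relu"),
        # Dropout at a rate of 0.5 to prevent overfitting
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model
```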
Vision Transformer (ViT) architecture
Using the “vit_b16” function, the ViT model was instantiated with the following parameters: 224×224 image size, pre-trained weights enabled, and top layers excluded for customization. All layers in the base ViT model were frozen to retain the pre-trained weights and facilitate transfer learning. Then, a classification head was added to the ViT base model, consisting of a flattening layer followed by a dense layer with softmax activation to generate class predictions.
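A sketch assuming the “vit_b16” function comes from the third-party vit-keras package, which the text does not name explicitly:

```python
from tensorflow.keras import layers, models
from vit_keras import vit  # assumed source of the vit_b16 function

def build_vit(num_classes=2):
    # 224x224 image size, pre-trained weights enabled, top layers excluded
    base = vit.vit_b16(image_size=224, pretrained=True,
                       include_top=False, pretrained_top=False)
    # Freeze the base so pre-trained weights are retained (transfer learning)
    base.trainable = False
    model = models.Sequential([
        base,
        # Classification head: flatten, then a softmax dense layer
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model
```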
Model training
Each model was compiled with the Adam optimizer, a categorical cross-entropy loss function, and accuracy as an additional evaluation metric. Training was performed for 10 epochs with a batch size of 32 and a learning rate of 0.001. Cross-entropy loss and accuracy for the training and validation datasets were plotted across epochs.
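A compile-and-fit sketch with the stated hyperparameters; the data arrays come from the split sketched earlier, and one-hot encoded labels are assumed for the categorical cross-entropy loss.

```python
import tensorflow as tf
import matplotlib.pyplot as plt

model = build_typical_cnn()  # or any of the architectures above
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Train for 10 epochs with a batch size of 32
history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=10, batch_size=32)

# Plot training/validation loss and accuracy across epochs
for metric in ("loss", "accuracy"):
    plt.plot(history.history[metric], label=f"train {metric}")
    plt.plot(history.history[f"val_{metric}"], label=f"val {metric}")
plt.legend()
plt.show()
```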
Statistical analysis
Each trained model was evaluated on the test dataset to generate predictions. Performance metrics, including the confusion matrix, sensitivity (recall), specificity, PPV, NPV, accuracy, F1-score, receiver operating characteristic (ROC) curve, and AUC, were calculated, and 95% CIs were computed for the performance metrics. In addition, we compared the discrimination of the models using the AUC: acceptable discrimination corresponds to an AUC of 0.7, whereas high and outstanding discrimination correspond to AUCs of 0.8 and 0.9, respectively (20,21). The DL architectures were developed and validated using Python software with Keras version 2.15.0 (Python Software Foundation) via Google Colaboratory (Google).
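The point estimates of these metrics can be derived from the confusion matrix as sketched below; `x_test`/`y_test` are placeholders, and the 0.5 decision threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Predicted probability of the positive class (assumed index 1 = pure
# germinoma); y_test is assumed one-hot encoded as in training.
probs = model.predict(x_test)[:, 1]
y_true = np.argmax(y_test, axis=1)
y_pred = (probs >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)               # recall
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)                       # positive predictive value
npv = tn / (tn + fn)                       # negative predictive value
accuracy = (tp + tn) / (tp + tn + fp + fn)
f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
auc = roc_auc_score(y_true, probs)         # area under the ROC curve
```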
Ethical considerations
The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Human Research Ethics Committee of the Faculty of Medicine, Prince of Songkla University (REC.65-431-10-1). Because it was a retrospective analysis, the current study did not require patients’ informed consent. However, the identity numbers of patients were encoded before the training process.
Results
Characteristics of patients in the present cohort
The 52 patients with pineal tumors in the present study included 25 (48.1%) with pure germinoma and 27 (51.9%) with non-pure germinoma, as shown in Table 1. The mean ages of the pure germinoma and non-pure germinoma groups were 18.16 and 18.00 years, respectively. Male predominance was observed in both groups. The pure germinoma group had a mean maximum tumor diameter of 3.13 centimeters (SD, 1.36), whereas the non-pure germinoma group had a mean tumor size of 3.53 centimeters (SD, 1.50).
Table 1
Factor | Pure germinoma (N=25) | Non-pure germinoma (N=27) | P value
---|---|---|---
Gender | | | 0.33
Male | 22 (88.0) | 21 (77.8) |
Female | 3 (12.0) | 6 (22.2) |
Age (years) | 18.16±6.42 | 18.00±15.45 | 0.96
Age group | | | 0.65
≤18 years | 16 (64.0) | 19 (70.4) |
>18 years | 9 (36.0) | 8 (29.6) |
Number of tumors | | | 0.12
Single | 18 (72.0) | 24 (88.9) |
Multiple | 7 (28.0) | 3 (11.1) |
Suprasellar/pineal bifocal tumor | 5 (20.0) | 1 (3.7) | 0.06
Size of tumor (cm) | 3.13±1.36 | 3.53±1.50 | 0.32
Preoperative leptomeningeal dissemination | 5 (20.0) | 5 (18.5) | –
Preoperative hydrocephalus | 16 (64.0) | 20 (74.1) | 0.43
Operation | | | 0.08
Endoscopic approach | 10 (40.0) | 4 (14.8) |
Microscopic approach | 15 (60.0) | 23 (85.2) |
Histological diagnosis | | | –
Pure germinoma | 25 (100.0) | – |
Mature teratoma | – | 4 (14.8) |
Yolk sac tumor | – | 3 (11.1) |
Pineoblastoma | – | 3 (11.1) |
Anaplastic astrocytoma | – | 3 (11.1) |
Diffuse astrocytoma | – | 2 (7.4) |
Atypical meningioma | – | 1 (3.7) |
Immature teratoma | – | 1 (3.7) |
Mixed germinoma with mature teratoma | – | 1 (3.7) |
Mixed germinoma with immature teratoma, choriocarcinoma, and embryonal carcinoma | – | 1 (3.7) |
Mixed germinoma with choriocarcinoma | – | 1 (3.7) |
Mixed germinoma with yolk sac tumor | – | 1 (3.7) |
Mixed mature teratoma with embryonal carcinoma | – | 1 (3.7) |
Mixed mature teratoma with choriocarcinoma | – | 1 (3.7) |
Pineal parenchymal tumor of intermediate differentiation | – | 1 (3.7) |
Epidermoid cyst | – | 1 (3.7) |
Benign cyst | – | 1 (3.7) |
Metastatic clear cell tumor | – | 1 (3.7) |
Data are presented as mean ± SD or n (%). SD, standard deviation.
A suprasellar/pineal bifocal characteristic was observed in 11.5% of the present cohort, 19.2% of all cases had preoperative leptomeningeal dissemination, and 69.2% presented with hydrocephalus before tumor surgery. In the pure germinoma group, microscopic surgery was performed in 60.0% of cases, while the remaining 40.0% underwent an endoscopic approach. In the non-pure germinoma group, microscopic surgery was the predominant operation (85.2% of cases), and mature teratoma was the most common tumor type.
DL model development and validation
The models were trained for a total of 10 epochs, and accuracy and loss were evaluated, as shown in Figure 2. For the typical CNN, DenseNet, and LeNet architectures, training accuracy was higher than validation accuracy, while the validation loss decreased across epochs. For the remaining architectures, accuracy and loss values were closely comparable between the training and validation datasets.
The performance of the DL architectures for the diagnosis of pure germinoma was assessed using unseen MRI scans from the test dataset without image augmentation, as shown in Table 2 and Figure 3. The typical CNN architecture had sensitivity, specificity, PPV, NPV, accuracy, and F1-score values of 0.98, 0.94, 0.94, 0.97, 0.95, and 0.95, respectively. DenseNet yielded a sensitivity of 0.95, specificity of 0.78, PPV of 0.81, NPV of 0.94, accuracy of 0.87, and F1-score of 0.88. The LeNet architecture exhibited sensitivity, specificity, PPV, NPV, accuracy, and F1-score values of 0.96, 0.95, 0.95, 0.96, 0.95, and 0.95, respectively, while for ResNet these values were 0.99, 0.08, 0.52, 0.90, 0.54, and 0.69, respectively. The VGG architecture achieved a sensitivity of 0.97, specificity of 0.95, PPV of 0.95, NPV of 0.97, accuracy of 0.96, and F1-score of 0.96, and the ViT architecture had sensitivity, specificity, PPV, NPV, accuracy, and F1-score values of 0.84, 0.77, 0.79, 0.83, 0.80, and 0.81, respectively. Compared with training without image augmentation, Table 3 shows a slight, non-significant decrease in the performance of the DL architectures trained with image augmentation.
Table 2
Architectures | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | Accuracy (95% CI) | F1-score (95% CI) | AUC (95% CI) |
---|---|---|---|---|---|---|---|
Test dataset | |||||||
Typical CNN | 0.98 (0.97–0.99) | 0.94 (0.91–0.95) | 0.94 (0.92–0.96) | 0.97 (0.96–0.99) | 0.95 (0.94–0.97) | 0.95 (0.89–1.00) | 0.96 (0.94–0.97) |
DenseNet | 0.95 (0.93–0.97) | 0.78 (0.75–0.82) | 0.81 (0.78–0.85) | 0.94 (0.91–0.96) | 0.87 (0.85–0.86) | 0.88 (0.81–0.94) | 0.87 (0.85–0.88) |
LeNet | 0.96 (0.94–0.98) | 0.95 (0.93–0.97) | 0.95 (0.93–0.97) | 0.96 (0.94–0.98) | 0.95 (0.94–0.97) | 0.95 (0.89–0.97) | 0.95 (0.94–0.97) |
ResNet | 0.99 (0.98–1.00) | 0.08 (0.05–0.10) | 0.52 (0.49–0.55) | 0.90 (0.82–0.99) | 0.54 (0.53–0.55) | 0.69 (0.62–0.74) | 0.54 (0.50–0.56) |
VGG | 0.97 (0.96–0.98) | 0.95 (0.93–0.97) | 0.95 (0.93–0.97) | 0.97 (0.95–0.98) | 0.96 (0.95–0.98) | 0.96 (0.90–1.00) | 0.96 (0.94–0.97) |
Vision Transformer | 0.84 (0.80–0.87) | 0.77 (0.74–0.81) | 0.79 (0.75–0.82) | 0.83 (0.79–0.86) | 0.80 (0.77–0.84) | 0.81 (0.75–0.87) | 0.80 (0.78–0.83) |
AUC, area under the receiver operating characteristic curve; CI, confidence interval; CNN, convolutional neural network; DenseNet, Densely Connected Convolutional Network; PPV, positive predictive value; NPV, negative predictive value; ResNet, Residual Neural Network; VGG, Visual Geometry Group.
Table 3
Architectures | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | Accuracy (95% CI) | F1-score (95% CI) | AUC (95% CI) |
---|---|---|---|---|---|---|---|
Test dataset | |||||||
Typical CNN | 0.97 (0.96–0.98) | 0.93 (0.90–0.95) | 0.93 (0.91–0.95) | 0.96 (0.95–0.98) | 0.94 (0.93–0.96) | 0.94 (0.88–0.99) | 0.95 (0.93–0.96) |
DenseNet | 0.95 (0.93–0.97) | 0.78 (0.75–0.82) | 0.81 (0.78–0.85) | 0.94 (0.91–0.96) | 0.87 (0.85–0.86) | 0.88 (0.81–0.94) | 0.87 (0.85–0.88) |
LeNet | 0.95 (0.93–0.98) | 0.94 (0.92–0.96) | 0.94 (0.92–0.97) | 0.95 (0.93–0.97) | 0.94 (0.92–0.96) | 0.95 (0.89–0.97) | 0.94 (0.93–0.96) |
ResNet | 0.98 (0.97–0.99) | 0.08 (0.05–0.10) | 0.52 (0.49–0.55) | 0.89 (0.81–0.98) | 0.54 (0.53–0.55) | 0.68 (0.61–0.73) | 0.54 (0.50–0.56) |
VGG | 0.97 (0.96–0.98) | 0.95 (0.93–0.97) | 0.95 (0.93–0.97) | 0.97 (0.95–0.98) | 0.96 (0.95–0.98) | 0.96 (0.90–1.00) | 0.96 (0.94–0.97) |
Vision Transformer | 0.84 (0.80–0.87) | 0.77 (0.74–0.81) | 0.79 (0.75–0.82) | 0.83 (0.79–0.86) | 0.80 (0.77–0.84) | 0.81 (0.75–0.87) | 0.80 (0.78–0.83) |
AUC, area under the receiver operating characteristic curve; CI, confidence interval; CNN, convolutional neural network; DenseNet, Densely Connected Convolutional Network; PPV, positive predictive value; NPV, negative predictive value; ResNet, Residual Neural Network; VGG, Visual Geometry Group.
As shown in Figure 4, the AUCs of the typical CNN and LeNet architectures reached outstanding levels of discrimination at 0.96 (95% CI: 0.94–0.97) and 0.95 (95% CI: 0.94–0.97), respectively, as did the VGG architecture at 0.96 (95% CI: 0.94–0.97), while the DenseNet and ViT architectures discriminated at a high level and the ResNet architecture discriminated poorly (AUC 0.54).
Discussion
Because pineal tumors are rare, limited data for image classification using ML or DL has been noted as a study limitation. Ye et al. (13) studied the classification of germinoma and non-germinoma in the pineal region by texture analysis with various ML algorithms and reported an AUC of 0.88, while another study reported that the k-nearest neighbors algorithm with the histogram of oriented gradients achieved an AUC of 0.845. ML-based models that rely on handcrafted feature extraction require professionals with the knowledge to select the region of interest. DL models may therefore improve the differentiation of pineal region tumors, as various DL architectures incorporate automatic feature extraction and model development within their structures (19,22).
Based on the AUCs in the test dataset, the typical CNN and LeNet architectures performed well in classifying pure germinoma versus non-pure germinoma. These findings are consistent with previous research. Das et al. used a typical CNN model for classifying brain tumors in T1-weighted contrast-enhanced MRI images, reporting an accuracy of 0.943, an average precision of 0.933, and an average recall of 0.930 (23). Özkaraca et al. employed CNN models, including typical CNN, DenseNet, and VGG-16 architectures, to classify multiple types of brain malignancies, namely glioma, meningioma, pituitary adenoma, and the absence of a tumor (16). The present study’s CNN models likewise demonstrated high levels of diagnostic performance, consistent with these studies. Beyond CNN models, the ViT architecture has also been studied for brain tumor classification. Asiri et al. trained various ViT models on 4,855 MRI scans and reported overall accuracies ranging from 0.903 to 0.982, whereas the discrimination performance of the ViT model in the present study was 0.800 (24). This may be explained by the fact that germinoma and non-germinoma share similar imaging features that make them challenging to categorize. Furthermore, prior studies have demonstrated that the ViT architecture requires a more extensive image dataset and higher memory demands than CNNs for training (25,26). The number of training images available to develop the ViT model in the present study may have been limited, which could result in a decline in accuracy.
The typical CNN, VGG, and LeNet architectures in the present study demonstrated both high sensitivity and high specificity; therefore, these models could potentially serve as a screening tool in clinical practice (27-29). Because operations in the pineal region are complicated and postoperative complications remain a concern, preoperative diagnosis is useful for the management of pineal tumors. In pineal malignancies with positive tumor markers, a biopsy is not required (30,31): choriocarcinoma is associated with a high level of beta-human chorionic gonadotropin, while high alpha-fetoprotein is found in yolk sac tumors or mixed germ cell tumors. By contrast, a pineal tumor with negative tumor markers still requires biopsy for histological diagnosis, which remains challenging (32). AI-based image classification, whether by DL or ML, may support a neurosurgeon in preoperative planning, such as choosing an extent-of-resection strategy or deciding on re-operation in a patient who could not be diagnosed from a prior operation. Choi et al. proposed treatment strategies for pineal tumors: surgery after a trial of chemotherapy and/or radiotherapy is suggested for a pineal tumor with negative tumor markers and suspected germinoma on MRI (31). Hence, AI-based diagnosis may assist physicians in proposing treatment options in the future.
To the authors’ knowledge, the present study is the first to use DL for image classification between germinoma and non-germinoma in the pineal region. However, its limitations should be noted. The dataset was drawn from a single institution; multi-center studies should resolve this problem and provide additional images of this specific tumor location for training the DL models (33). In addition, additional unseen images could boost the performance of a prior DL model through transfer learning or pre-trained models (34). Moreover, data augmentation is one approach to enhancing training performance: Jacaruso et al. reported that data augmentation increased classification accuracy on electrocardiogram images from 82% at baseline to 88–90% (35). However, this technique did not improve diagnostic performance in the present study, perhaps because flipping, rotating, and zooming the images did not add more informative features for training the models; a multi-center trial collecting more unseen images in the future may increase performance (35-37).
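For reference, a minimal sketch of the kind of flipping, rotation, and zooming augmentation mentioned above, using Keras preprocessing layers; the parameter values are assumptions, as the original settings are not reported.

```python
from tensorflow.keras import layers, models

# Hypothetical augmentation pipeline applied to training images only
augmentation = models.Sequential([
    layers.RandomFlip("horizontal"),  # random horizontal flipping
    layers.RandomRotation(0.1),       # rotate up to ±10% of a full turn
    layers.RandomZoom(0.1),           # zoom in/out by up to 10%
])

x_train_augmented = augmentation(x_train, training=True)
```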
Conclusions
The use of DL architectures for image classification is a potential method to build a diagnostic tool to classify pure germinoma and non-pure germinoma. When more images of pineal tumors are gathered in the future, transfer learning will be a valuable technique for performance enhancement.
Acknowledgments
Funding: None.
Footnote
Reporting Checklist: The authors have completed the STROBE reporting checklist. Available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-120/rc
Data Sharing Statement: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-120/dss
Peer Review File: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-120/prf
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-120/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Human Research Ethics Committee of the Faculty of Medicine, Prince of Songkla University (REC.65-431-10-1). Because it was a retrospective analysis, the current study did not require patients’ informed consent. However, the identity numbers of patients were encoded before the training process.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Hirato J, Nakazato Y. Pathology of pineal region tumors. J Neurooncol 2001;54:239-49. [Crossref] [PubMed]
- Mottolese C, Szathmari A, Beuriat PA. Incidence of pineal tumours. A review of the literature. Neurochirurgie 2015;61:65-9. [Crossref] [PubMed]
- Fedorko S, Zweckberger K, Unterberg AW. Quality of life following surgical treatment of lesions within the pineal region. J Neurosurg 2019;130:28-37. [Crossref] [PubMed]
- Pettorini BL, Al-Mahfoud R, Jenkinson MD, et al. Surgical pathway and management of pineal region tumours in children. Childs Nerv Syst 2013;29:433-9. [Crossref] [PubMed]
- Balossier A, Blond S, Touzet G, et al. Endoscopic versus stereotactic procedure for pineal tumour biopsies: Comparative review of the literature and learning from a 25-year experience. Neurochirurgie 2015;61:146-54. [Crossref] [PubMed]
- Bruce JN, Ogden AT. Surgical strategies for treating patients with pineal region tumors. J Neurooncol 2004;69:221-36. [Crossref] [PubMed]
- Konovalov AN, Pitskhelauri DI. Principles of treatment of the pineal region tumors. Surg Neurol 2003;59:250-68. [Crossref] [PubMed]
- Song JH, Kong DS, Shin HJ. Feasibility of neuroendoscopic biopsy of pediatric brain tumors. Childs Nerv Syst 2010;26:1593-8. [Crossref] [PubMed]
- Pople IK, Athanasiou TC, Sandeman DR, et al. The role of endoscopic biopsy and third ventriculostomy in the management of pineal region tumours. Br J Neurosurg 2001;15:305-11. [Crossref] [PubMed]
- Regis J, Bouillot P, Rouby-Volot F, et al. Pineal region tumors and the role of stereotactic biopsy: review of the mortality, morbidity, and diagnostic rates in 370 cases. Neurosurgery 1996;39:907-12; discussion 912-4. [PubMed]
- Kreth FW, Schätz CR, Pagenstecher A, et al. Stereotactic management of lesions of the pineal region. Neurosurgery 1996;39:280-9; discussion 289-91. [Crossref] [PubMed]
- Chen B, Chen C, Zhang Y, et al. Differentiation between Germinoma and Craniopharyngioma Using Radiomics-Based Machine Learning. J Pers Med 2022;12:45. [Crossref] [PubMed]
- Ye N, Yang Q, Liu P, et al. A comprehensive machine-learning model applied to MRI to classify germinomas of the pineal region. Comput Biol Med 2023;152:106366. [Crossref] [PubMed]
- McAvoy M, Prieto PC, Kaczmarzyk JR, et al. Classification of glioblastoma versus primary central nervous system lymphoma using convolutional neural networks. Sci Rep 2021;11:15219. [Crossref] [PubMed]
- Tariciotti L, Caccavella VM, Fiore G, et al. A Deep Learning Model for Preoperative Differentiation of Glioblastoma, Brain Metastasis and Primary Central Nervous System Lymphoma: A Pilot Study. Front Oncol 2022;12:816638. [Crossref] [PubMed]
- Özkaraca O, Bağrıaçık Oİ, Gürüler H, et al. Multiple Brain Tumor Classification with Dense CNN Architecture Using Brain MRI Images. Life (Basel) 2023;13:349. [Crossref] [PubMed]
- Aluri S, Imambi SS. Brain tumour classification using MRI images based on lenet with golden teacher learning optimization. Network 2024;35:27-54. [Crossref] [PubMed]
- Oladimeji OO, Ibitoye AOJ. Brain tumor classification using ResNet50-convolutional block attention module. Applied Computing and Informatics 2023. doi: 10.1108/ACI-09-2023-0022.
- Supbumrung S, Kaewborisutsakul A, Tunthanathip T. Machine learning-based classification of pineal germinoma from magnetic resonance imaging. World Neurosurg X 2023;20:100231. [Crossref] [PubMed]
- Taweesomboonyat C, Tunthanathip T, Sae-Heng S, et al. Diagnostic Yield and Complication of Frameless Stereotactic Brain Biopsy. J Neurosci Rural Pract 2019;10:78-84. [Crossref] [PubMed]
- Tunthanathip T, Duangsuwan J, Wattanakitrungroj N, et al. Comparison of intracranial injury predictability between machine learning algorithms and the nomogram in pediatric traumatic brain injury. Neurosurg Focus 2021;51:E7. [Crossref] [PubMed]
- Jaruenpunyasak J, Duangsoithong R, Tunthanathip T. Deep learning for image classification between primary central nervous system lymphoma and glioblastoma in corpus callosal tumors. J Neurosci Rural Pract 2023;14:470-6. [Crossref] [PubMed]
- Das S, Aranya OFMRR, Labiba NN. Brain Tumor Classification Using Convolutional Neural Network. 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT); Dhaka, Bangladesh. IEEE Xplore; 2019:1-5.
- Asiri AA, Shaf A, Ali T, et al. Advancing Brain Tumor Classification through Fine-Tuned Vision Transformers: A Comparative Study of Pre-Trained Models. Sensors (Basel) 2023;23:7913. [Crossref] [PubMed]
- Maurício J, Domingues I, Bernardino J. Comparing Vision Transformers and Convolutional Neural Networks for Image Classification: A Literature Review. Appl Sci 2023;13:5521. [Crossref]
- Rustamy F. Vision Transformers vs. Convolutional Neural Networks [Internet]. [cited April 15, 2024]. Available online: https://medium.com/@faheemrustamy/vision-transformers-vs-convolutional-neural-networks-5fe8f9e18efc
- Zhang Y, Liang K, He J, et al. Deep Learning With Data Enhancement for the Differentiation of Solitary and Multiple Cerebral Glioblastoma, Lymphoma, and Tumefactive Demyelinating Lesion. Front Oncol 2021;11:665891. [Crossref] [PubMed]
- Xia W, Hu B, Li H, et al. Deep Learning for Automatic Differential Diagnosis of Primary Central Nervous System Lymphoma and Glioblastoma: Multi-Parametric Magnetic Resonance Imaging Based Convolutional Neural Network Model. J Magn Reson Imaging 2021;54:880-7. [Crossref] [PubMed]
- Pewsner D, Battaglia M, Minder C, et al. Ruling a diagnosis in or out with "SpPIn" and "SnNOut": a note of caution. BMJ 2004;329:209-13. [Crossref] [PubMed]
- Azab WA, Nasim K, Salaheddin W. An overview of the current surgical options for pineal region tumors. Surg Neurol Int 2014;5:39. [Crossref] [PubMed]
- Choi JU, Kim DS, Chung SS, et al. Treatment of germ cell tumors in the pineal region. Childs Nerv Syst 1998;14:41-8. [Crossref] [PubMed]
- Cho BK, Wang KC, Nam DH, et al. Pineal tumors: experience with 48 cases over 10 years. Childs Nerv Syst 1998;14:53-8. [Crossref] [PubMed]
- Tunthanathip T, Sae-Heng S, Oearsakul T, et al. Economic impact of a machine learning-based strategy for preparation of blood products in brain tumor surgery. PLoS One 2022;17:e0270916. [Crossref] [PubMed]
- TensorFlow. Transfer learning [Internet]. 2022 [cited April 1, 2024]. Available online: https://www.tensorflow.org/tutorials/images/transfer_learning
- Jacaruso LC. Accuracy improvement for Fully Convolutional Networks via selective augmentation with applications to electrocardiogram data. Inform Med Unlocked 2021;26:100729. [Crossref]
- Fangmongkol S, Posai V. A Discriminant Analysis of the Factors Affecting Abnormalities in Chest Computed Tomographies of 608-Group Patients with COVID-19 Pneumonia. J Health Sci Med Res 2023;41:e2023952. [Crossref]
- Trakulpanitkit A, Tunthanathip T. Comparison of intracranial pressure prediction in hydrocephalus patients among linear, non-linear, and machine learning regression models in Thailand. Acute Crit Care 2023;38:362-70. [Crossref] [PubMed]
Cite this article as: Tunthanathip T, Kaewborisutsakul A, Supbumrung S. Comparative analysis of deep learning architectures for performance of image classification in pineal region tumors. J Med Artif Intell 2025;8:13.