Introduction
Tumor Type | Origin | Grade/Severity | Symptoms |
---|---|---|---|
Gliomas (arising from glial cells) | Brain tissue | Varies (I-IV) | Headaches, seizures, vision problems, weakness, personality changes |
Meningioma | Membranes surrounding the brain | Usually benign | Headaches, seizures, vision problems, numbness, weakness |
Schwannoma | Nerve sheath (Schwann cells) | Usually benign | Hearing loss, tinnitus, dizziness, facial weakness |
Pituitary Adenoma | Pituitary gland | Varies (benign to aggressive) | Vision problems, headaches, fatigue, excessive thirst, milk production (galactorrhea) |
Medulloblastoma | Embryonic cells | Malignant | Headaches, nausea, vomiting, balance problems, difficulty walking |
Craniopharyngioma | Near the pituitary gland | Usually benign | Vision problems, headaches, fatigue, hormonal imbalances |
Metastatic Brain Tumors | Spread from other cancers | Any grade | Varies depending on primary cancer |
- The primary objective of this research is to develop and validate a federated learning-based CNN model for the classification of brain tumors from MRI images.
- This model aims to enhance classification accuracy while addressing data privacy concerns, a significant step forward in medical imaging and diagnostics.
Related work
- Traditional Image Analysis Techniques: While pioneering, traditional image-processing techniques such as edge detection and region-based segmentation have been limited by their reliance on manual intervention and the potential for subjective interpretation. These methods laid the groundwork for automated analysis but often fell short in handling the complex and varied morphology of brain tumors, necessitating the development of more sophisticated, automated systems.
- Machine Learning Approaches: The advent of machine learning brought a significant improvement in automated classification with algorithms such as Support Vector Machines (SVMs) and Random Forests. However, these approaches required extensive feature engineering to capture the nuances of tumor morphology, a process that is both labor-intensive and potentially limiting in capturing the full complexity of the data. Moreover, classical machine learning methods sometimes struggled to manage the high-dimensional nature of MRI data effectively [3].
- Deep Learning Developments: Deep learning models learn to diagnose by analyzing patterns in vast training datasets, iteratively adjusting internal parameters to minimize the error between predicted and actual diagnoses; once trained, they apply the learned patterns to new data. However, the precise reasoning behind each diagnosis can be difficult to recover, as deep learning models often operate as “black boxes” without transparent decision-making. Despite their impressive accuracy, efforts to enhance interpretability, such as attention mechanisms and saliency maps, aim to shed light on the features or patterns influencing the models’ diagnoses, thereby improving trust and understanding in their clinical applications [4].
- Federated Learning in Medical Imaging: Federated learning emerges as a promising solution to some of these challenges, especially data privacy and scarcity. By enabling models to be trained across multiple decentralized datasets, federated learning circumvents the need for data centralization, thus preserving privacy. One critical challenge, however, is the communication overhead introduced between devices, which can impact efficiency [5, 6].
- Multi-Task Learning and Transfer Learning: Some studies have explored multi-task learning and transfer learning to improve the efficiency and generalizability of models. While delivering impressive performance, these models often lack transparency in their decision-making, making it challenging for clinicians and researchers to understand how they arrive at diagnoses [7, 8].
Study | Accuracy | Summary |
---|---|---|
Pedada, Kameswara Rao, et al. [9] | 93.40% and 92.20% | U-Net model for segmentation on the BraTS 2017 and 2018 datasets. |
Saeedi, Soheila, et al. [10] | 96.47% | 2D CNN combined with machine learning ensemble techniques. |
Mahmud, Md Ishtyaq, Muntasir Mamun, and Ahmed Abdelgawad. [11] | 93.3% | Redefined CNN model with modified classification. |
Wang, Nathan, et al. [12] | 94.90% | Deep CNN on OCT images. |
Prakash, R. Meena, et al. [13] | 97.39% | Hyperparameter tuning of DenseNet. |
Senan, Ebrahim Mohammed, et al. [14] | 95.10% | AlexNet + SVM. |
Haq, Amin ul, et al. [15] | 97.40% | CNN with transfer learning. |
Rasool, Mohammed, et al. [16] | 98.1% | GoogLeNet with SVM as classifier. |
Khan, Abdul Hannan, et al. [17] | 94.84% | Hierarchical Deep Learning-Based Brain Tumor (HDL2BT) classification. |
Gaur, Loveleen, et al. [18] | 94.64% | CNN with Gaussian noise. |
Vidyarthi, Ankit, et al. [19] | 95.86% | CNN with NN classifier. |
Lamrani, Driss, et al. [20] | 96% | CNN with enhanced classifiers. |
Methodology
A. Dataset description and preparation
Type | Training | Testing |
---|---|---|
Glioma | 1321 | 300 |
Meningioma | 1339 | 306 |
No Tumor | 1595 | 405 |
Pituitary | 1457 | 300 |
Image preprocessing techniques
- Normalization: Following augmentation, a critical preprocessing step involved normalizing the images’ pixel values, scaling them between 0 and 1. Such normalization is imperative for optimizing CNN training: it stabilizes and accelerates the training process by ensuring consistent data ranges across the entire dataset, preventing certain features from disproportionately influencing learning.
- Resizing: Consistency in input dimensions is pivotal for Convolutional Neural Networks (CNNs) to process images effectively. Hence, all images were resized to a uniform dimension of 128 × 128 pixels. This standardization ensures that the model receives inputs of a consistent size, facilitating uniform processing and enabling the CNN to extract relevant features from the images consistently.
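The two preprocessing steps above can be sketched in plain NumPy. The nearest-neighbour `resize_nearest` helper is a stand-in for a library resize (e.g., from OpenCV or Pillow), and the function names are illustrative, not the paper's actual code:

```python
import numpy as np

def resize_nearest(img: np.ndarray, size: int = 128) -> np.ndarray:
    """Nearest-neighbour resize to size x size (stand-in for a cv2/PIL resize)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return img[rows][:, cols]

def preprocess(img: np.ndarray) -> np.ndarray:
    """Resize to 128x128 and scale 8-bit pixel values into [0, 1]."""
    img = resize_nearest(img, 128)
    return img.astype(np.float32) / 255.0

# A fake 512x512 RGB scan stands in for an MRI slice.
raw = np.random.randint(0, 256, size=(512, 512, 3), dtype=np.uint8)
x = preprocess(raw)
print(x.shape, float(x.min()), float(x.max()))
```

After this step every image has the same shape and value range, so no single feature scale can dominate gradient updates.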
B. Convolutional neural network (CNN) model architecture
C. Model architecture
- Input Layer: The model begins with an input layer that accepts images of 128 × 128 pixels with three color channels (Red, Green, Blue: RGB). This layer serves as the gateway through which images enter the network.
- Base Model: The cornerstone of the architecture is the pre-trained VGG16 base model, shorn of its original top layers. Retaining the base layers while discarding the classification head preserves the model’s proficiency in extracting intricate features from images while enabling its adaptation to the task at hand: brain tumor classification. These base layers come with weights learned from the vast and diverse ImageNet dataset, a valuable foundation for discerning pertinent features in our dataset.
- Flattening Layer: Following the convolutional layers, a flattening layer transforms the two-dimensional output of the last convolutional layer into a one-dimensional array, preparing the data for the subsequent fully connected layers.
- Dense Layers: The architecture incorporates several dense (fully connected) layers with Rectified Linear Unit (ReLU) activation functions. These layers capture the intricate, non-linear relationships embedded within the data; by stacking them, the model learns hierarchical representations of the input, crucial for discerning the complex patterns associated with brain tumor classification.
- Dropout Layers: To combat overfitting, a common concern in neural network models, dropout layers are strategically incorporated. During training, these layers randomly deactivate a fraction of input units, reducing reliance on specific neurons and preventing the network from overfitting to the training data. This regularization technique promotes the model’s ability to generalize to unseen data.
- Output Layer: The architecture culminates in a dense layer with a SoftMax activation function. This final layer performs the classification, assigning a probability to each class (glioma, meningioma, no tumor, pituitary). The SoftMax function normalizes these probabilities so they sum to one, and the input image is assigned to the category with the highest probability.
- Fine-Tuning and Training: The base layers of the VGG16 model were fine-tuned on our specific dataset. Making these layers trainable lets the model adapt and learn features more relevant to brain tumor classification: their weights are updated during training, tailoring the network’s representations to the intricacies and complexities inherent in our dataset.
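The classification head described above (flatten, dense + ReLU, softmax) can be traced numerically with NumPy. The layer widths and the 4 × 4 × 512 feature-map shape are illustrative assumptions, not values reported in the paper; the point is that the softmax output is a valid probability distribution over the four classes:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

# Illustrative shapes: a 4x4x512 convolutional feature map flattened to
# 8192 values, one ReLU dense layer, then a 4-way softmax head.
features = rng.standard_normal((4, 4, 512))
w1 = rng.standard_normal((features.size, 256)) * 0.01
w2 = rng.standard_normal((256, 4)) * 0.01

h = relu(features.ravel() @ w1)   # flattening layer + dense + ReLU
probs = softmax(h @ w2)           # output layer: four class probabilities
print(probs, float(probs.sum()))
```

The predicted class is simply `probs.argmax()`; the softmax normalization is what lets that probability vector be read as class confidence.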
D. Federated learning implementation
- Client Selection: The federated learning process begins by randomly selecting a subset of clients for model training at each iteration, or training round. In this scenario, approximately 50% of the available clients (ten clients) were chosen to participate in each training round.
- Local Training: Upon selection, each participating client receives a copy of the global model, initially derived from the modified VGG16 architecture, and trains it locally on its own dataset. The decentralized nature of federated learning allows each client to leverage its local dataset without transmitting any raw or identifiable patient data outside its environment. This local training occurs autonomously at each client’s end, enabling clients to iteratively update the model based on the unique characteristics and nuances within their data.
- Model Aggregation: Following the local training phase, the models from each client are aggregated to update the global model by averaging their weights. By amalgamating the locally trained models through weight averaging, the global model is iteratively refined and enhanced.
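The select / train locally / average loop above can be sketched in a few lines of NumPy. The two-tensor "model" and the noise-injection stand-in for local training are illustrative only; the averaging step, however, is the standard FedAvg aggregation the text describes:

```python
import numpy as np

rng = np.random.default_rng(42)

def fedavg(client_models):
    """Average each weight tensor elementwise across clients (FedAvg)."""
    return [np.mean(layers, axis=0) for layers in zip(*client_models)]

# Ten clients, each of which can receive a copy of a tiny two-layer model.
n_clients = 10
global_model = [np.zeros((3, 3)), np.zeros(3)]

# One round: sample ~50% of clients, "train" locally (here just adding
# noise as a placeholder for real gradient updates), then aggregate.
selected = rng.choice(n_clients, size=n_clients // 2, replace=False)
local_models = [[w + rng.standard_normal(w.shape) for w in global_model]
                for _ in selected]
global_model = fedavg(local_models)
print(len(selected), [w.shape for w in global_model])
```

Only weight tensors cross the network in this scheme; the raw images stay on each client, which is the privacy property the text emphasizes.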
E. Training and evaluation
Training batches
- Image Processing and Augmentation: The training process begins with data processing and augmentation. The dataset is loaded and shuffled to ensure randomness, which is crucial for robust model training. Augmentation techniques such as brightness and contrast adjustments are applied to diversify the dataset and mitigate overfitting, and images are preprocessed to ensure uniform size and pixel values, enhancing the model’s ability to learn from varied samples.
- Batch Processing for Efficient Learning: The model receives data in batches, a practice vital for efficient training. Batching aids in managing memory resources and facilitates parallel processing, letting the model learn iteratively in manageable chunks rather than processing the entire dataset at once. The code segments demonstrate a data generator function that yields batches of images and labels, so the model learns from a subset of the data in each iteration. The training loss can be observed in Fig. 6.
- Validation Set: A crucial step in model development is the validation phase, in which a subset of the dataset is reserved exclusively for validation. This set acts as unseen data for the model, enabling assessment of its generalization capabilities. Validation occurs iteratively during training to monitor performance on data the model has not been trained on, safeguarding against overfitting and ensuring the model’s ability to generalize to new, unseen samples.
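A data generator of the kind described above can be sketched as follows; the function name and batch size are illustrative assumptions, not the paper's exact code:

```python
import numpy as np

def batch_generator(images, labels, batch_size=32, seed=0):
    """Shuffle once per pass and yield (images, labels) batches."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))           # shuffled sample order
    for start in range(0, len(images), batch_size):
        idx = order[start:start + batch_size]
        yield images[idx], labels[idx]             # one manageable chunk

# 100 dummy 128x128 RGB images with four-class labels.
X = np.zeros((100, 128, 128, 3), dtype=np.float32)
y = np.arange(100) % 4
batches = list(batch_generator(X, y, batch_size=32))
print(len(batches), batches[0][0].shape, batches[-1][0].shape)
```

Because the generator yields one batch at a time, only `batch_size` images need to be in memory per step, which is the efficiency the text describes.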
Hyperparameter tuning:
- Optimization for Enhanced Performance: The success of the model hinges on optimal hyperparameter configuration. Hyperparameters such as the learning rate and dropout rates significantly influence the model’s learning process, and tuning them is critical for achieving superior performance while preventing issues like underfitting or overfitting. The code illustrates setting these hyperparameters and their values for fine-tuning.
- Iterative Optimization: Hyperparameter tuning is an iterative process aimed at finding the values that optimize the model’s learning without compromising its ability to generalize. It involves adjusting hyperparameters, training the model, evaluating its performance on the validation set, and refining the parameters until the best possible model performance is achieved [17].
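The iterative loop above amounts to a search over hyperparameter combinations scored on the validation set. A minimal grid-search sketch follows; `validation_score` is a hypothetical stand-in for a full train-and-evaluate cycle, and the candidate values are illustrative:

```python
import itertools

def validation_score(lr, dropout):
    """Stand-in for training a model and scoring it on the validation
    set; this toy surface peaks at lr=1e-4, dropout=0.5."""
    return 1.0 - abs(lr - 1e-4) * 1000 - abs(dropout - 0.5)

# Candidate learning rates and dropout rates to try.
grid = itertools.product([1e-3, 1e-4, 1e-5], [0.3, 0.5])
best = max(grid, key=lambda params: validation_score(*params))
print(best)
```

In practice each call to the scoring function is a full training run, so the grid is kept small or replaced by random or Bayesian search.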
Results and discussions
- Accuracy: Accuracy, calculated in Eq. 3, quantifies the proportion of correctly classified images out of the total number of images. While it offers a quick glimpse of overall performance, accuracy can be misleading on imbalanced datasets: if one class dominates, the model can achieve high accuracy by simply predicting the majority class most of the time, neglecting the minority classes [18].
- Precision: Precision, calculated in Eq. 4, delves deeper into the model’s performance by assessing the correctness of positive predictions. It measures the ratio of correctly predicted positive instances (true positives) to all instances predicted as positive (true positives + false positives). In brain tumor classification, precision signifies how accurately the model identifies a specific tumor type when it makes a positive prediction for that class.
- Recall: Recall, calculated using Eq. 5 and also known as sensitivity or the true positive rate, signifies the model’s ability to correctly identify all instances of a particular class. It quantifies the ratio of correctly predicted positive instances (true positives) to all actual positive instances (true positives + false negatives). In brain tumor classification, recall emphasizes the model’s capability to detect, and not miss, instances of a particular tumor type.
- F1-Score: The F1-score, calculated using Eq. 6, balances precision and recall. It is the harmonic mean of the two, providing a single metric that accounts for both false positives and false negatives. It is particularly useful when the classes are imbalanced or when both precision and recall are crucial for the classification task. In brain tumor classification, where identifying each tumor type is vital, the F1-score is a critical measure of overall performance.
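The four metric definitions above can be computed directly from label arrays; this NumPy sketch mirrors the ratios described in Eqs. 3-6 (the toy labels are illustrative):

```python
import numpy as np

def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall and F1 for one class (Eqs. 4-6)."""
    tp = np.sum((y_pred == positive) & (y_true == positive))  # true positives
    fp = np.sum((y_pred == positive) & (y_true != positive))  # false positives
    fn = np.sum((y_pred != positive) & (y_true == positive))  # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)        # harmonic mean
    return precision, recall, f1

y_true = np.array([0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2])
accuracy = np.mean(y_true == y_pred)   # Eq. 3: fraction correct overall
print(accuracy, per_class_metrics(y_true, y_pred, positive=1))
```

Computing precision and recall per class, then averaging, is what guards against the majority-class blind spot that plain accuracy has on imbalanced data.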
A. Model performance
Tumor | Precision | Recall | F1-score |
---|---|---|---|
Glioma | 0.99 | 0.94 | 0.96 |
Meningioma | 0.94 | 0.96 | 0.95 |
No tumor | 1.00 | 1.00 | 1.00 |
Pituitary | 0.97 | 0.99 | 0.98 |
- Accuracy Scores: The model demonstrated an overall accuracy of 98%, indicating its effectiveness in correctly identifying the presence and type of brain tumors in MRI images.
- Confusion Matrix: A confusion matrix was generated to provide a visual representation of the model’s performance [23, 24]. The matrix highlights the true positives, false positives, true negatives, and false negatives for each tumor category. The high numbers of true positives and true negatives, together with the small numbers of false positives and false negatives, underscore the model’s accuracy, as can be observed in Fig. 8.
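A confusion matrix of the kind shown in Fig. 8 is straightforward to build; in this sketch the class indices and the handful of predictions are hypothetical, not the paper's actual test results:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=4):
    """Count predictions per class pair: rows = actual, columns = predicted."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        cm[actual, predicted] += 1
    return cm

# Hypothetical predictions over the four classes
# (0 = glioma, 1 = meningioma, 2 = no tumor, 3 = pituitary).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 2, 3, 3]
cm = confusion_matrix(y_true, y_pred)
print(cm)
print("correct:", int(np.trace(cm)), "of", int(cm.sum()))
```

The diagonal holds the correct classifications, so a strong model shows a heavy diagonal and near-zero off-diagonal cells, which is the pattern the text describes.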
B. Analysis of results
- High Precision and Recall: The model’s high precision indicates a low rate of false positives, which is crucial in medical diagnostics to avoid unnecessary treatments. Similarly, the high recall scores indicate a low rate of false negatives, ensuring that the presence of tumors is accurately identified.
- Effectiveness in Classifying Tumor Types: The near-perfect F1-scores across all tumor types reflect the model’s exceptional ability to differentiate between glioma, meningioma, no tumor, and pituitary cases. This is particularly significant given the challenges of distinguishing between these tumor types with traditional methods.
- Generalization Capability: The high overall accuracy demonstrates the model’s capability to generalize well across the diverse dataset, suggesting it can be used reliably in different clinical settings and with varying MRI image qualities.
- Federated Learning Impact: The implementation of federated learning contributed to the model’s robustness and accuracy. By training across multiple decentralized datasets, the model benefited from a wider variety of data, enhancing its ability to generalize and perform accurately on unseen data.
C. Comparison with existing methods
Study | Accuracy | Technique |
---|---|---|
Pedada, Kameswara Rao, et al. [9] | 93.40% and 92.20% | U-Net model for segmentation on the BraTS 2017 and 2018 datasets. |
Saeedi, Soheila, et al. [10] | 96.47% | 2D CNN employed with ensemble techniques of machine learning. |
Mahmud, Md Ishtyaq, Muntasir Mamun, and Ahmed Abdelgawad. [11] | 93.3% | Redefined CNN Model with modified classification. |
Khan, Abdul Hannan, et al. [19] | 94.84% | Hierarchical Deep Learning-Based Brain Tumor (HDL2BT) classification |
Gaur, Loveleen, et al. [20] | 94.64% | CNN with Gaussian Noise |
Vidyarthi, Ankit, et al. [21] | 95.86% | CNN with NN Classifier |
Lamrani, Driss, et al. [22] | 96% | CNN with Enhanced Classifiers |
Islam, Moinul, et al. [23] | 91.05% | Federated Learning |
Alshammari, Abdulaziz. [24] | 93.74% | VGG-16 with Integration of CNN |
Proposed Model | 98% | VGG with Federated Learning |
D. Challenges and limitations
- Data Biases and Diversity: The model’s performance is contingent on the diversity and quality of its training data. Biases in the dataset, such as overrepresentation of certain tumor types or imaging styles, could skew the model’s learning and prediction accuracy [25].
- Federated Learning Complexities: While federated learning offers benefits in data privacy and diversity, it also introduces complexities in model training and aggregation. Ensuring consistent model performance across clients with potentially non-IID (not independently and identically distributed) data is a challenge.
- Scalability and Computational Resources: The scalability of the federated learning approach and the computational resources required for training and aggregating models across multiple clients are significant considerations, especially in resource-constrained settings [26].
E. Future directions
- Expanding Dataset Diversity: Testing the model on a more diverse set of MRI images, including those from different demographics and with varying imaging conditions, would improve its robustness and generalizability.
- Cross-Institutional Collaboration: Implementing the model across multiple medical institutions would not only evaluate its scalability but also enrich the training data, potentially leading to improved model accuracy.
- Broader Medical Imaging Applications: The success of this model in brain tumor classification opens avenues for applying similar federated learning-based deep learning approaches to other areas of medical imaging, such as detecting tumors in other organs or diagnosing different neurological disorders [27].
- Model Optimization and Efficiency: Ongoing research to optimize the model’s computational efficiency and training time can make the approach more feasible for real-world medical settings [28].