Use of artificial intelligence and deep learning in fetal ultrasound imaging
TL;DR Summary
This review explores the application of deep learning in fetal ultrasound imaging. It highlights the potential to improve diagnostic accuracy, which is otherwise strongly affected by operator experience, and covers areas such as fetal anatomy identification and biometric measurement.
Abstract
Deep learning is considered the leading artificial intelligence tool in image analysis in general. Deep-learning algorithms excel at image recognition, which makes them valuable in medical imaging. Obstetric ultrasound has become the gold standard imaging modality for detection and diagnosis of fetal malformations. However, ultrasound relies heavily on the operator’s experience, making it unreliable in inexperienced hands. Several studies have proposed the use of deep-learning models as a tool to support sonographers, in an attempt to overcome these problems inherent to ultrasound. Deep learning has many clinical applications in the field of fetal imaging, including identification of normal and abnormal fetal anatomy and measurement of fetal biometry. In this Review, we provide a comprehensive explanation of the fundamentals of deep learning in fetal imaging, with particular focus on its clinical applicability.
1. Bibliographic Information
1.1. Title
The central topic of this paper is the use of artificial intelligence (AI) and deep learning (DL) in fetal ultrasound imaging. It focuses on how these computational techniques are being applied, and could be applied, to improve the accuracy and efficiency of prenatal ultrasound examinations.
1.2. Authors
The authors are R. Ramirez Zegarra and T. Ghi. They are affiliated with the Department of Medicine and Surgery, Obstetrics and Gynecology Unit, University of Parma, Parma, Italy. T. Ghi is the corresponding author.
1.3. Journal/Conference
This paper was published as a State-of-the-Art Review in Ultrasound in Obstetrics & Gynecology. This is a leading peer-reviewed journal in maternal-fetal medicine and obstetric ultrasound that publishes original research, guidelines, and reviews.
1.4. Publication Year
The paper was published in 2022. The listed publication date is 27 November 2022.
1.5. Abstract
This paper reviews the application of deep learning (DL), a leading artificial intelligence (AI) tool, in fetal ultrasound imaging. It highlights that while obstetric ultrasound is the gold standard for detecting fetal malformations, its reliability is heavily dependent on the operator's experience. To address this, various studies have proposed DL models as support tools for sonographers. The review focuses on the clinical applicability of DL in fetal imaging, covering areas such as the identification of normal and abnormal fetal anatomy and the automatic measurement of fetal biometry. The authors aim to provide a comprehensive explanation of the fundamentals and clinical relevance of DL in this field.
1.6. Original Source Link
The original source is the PDF at /files/papers/691aa4e8110b75dcc59ae3e2/paper.pdf.
2. Executive Summary
2.1. Background & Motivation
The core problem the paper aims to solve revolves around the inherent limitations of conventional obstetric ultrasound, despite its status as the gold standard for detecting fetal malformations. These limitations are primarily:
- Operator Dependency: The accuracy and reliability of ultrasound examinations rely heavily on the sonographer's experience, skill, and knowledge of fetal anatomy. This makes the examination unreliable in inexperienced hands.
- Subjectivity and Variability: Human analysis introduces subjectivity and interobserver variability, particularly in biometric measurements. This can lead to significant errors in estimating fetal weight and classifying fetal growth.
- Low Detection Rates: Despite technical advances, overall detection rates for fetal malformations remain low. This is partly due to the human factor and partly due to problems inherent to ultrasound itself (e.g., acoustic shadows, speckle noise, motion blurring, unclear boundaries).
- Time-Consuming Workflow: Acquiring the correct scanning planes and performing detailed assessments is laborious and time-consuming.

The paper's entry point is the recognition of deep learning (DL) as a powerful artificial intelligence (AI) tool that is particularly adept at image recognition and classification. DL models have been shown to match or even exceed human capability in image-analysis tasks. The paper's central idea is therefore to use DL as a potential supporting tool for clinicians in fetal imaging. The aims are to overcome these challenges, reduce examination times, improve reliability, and potentially help train new doctors.
2.2. Main Contributions / Findings
The paper provides a comprehensive review of the state-of-the-art applications of deep learning in fetal ultrasound imaging. Its primary contributions and key findings include:
- Comprehensive Overview of DL Fundamentals: The paper explains what deep learning is, how it works, and why it is particularly suitable for fetal imaging, laying a foundation of understanding for clinicians.
- Detailed Clinical Applications: It systematically describes the diverse clinical applications of DL across fetal imaging, grouped into:
  - Automatic Measurement of Fetal Structures: DL models can automate biometric measurements (e.g., head circumference, femur length, abdominal circumference, crown-rump length, nuchal translucency). This significantly reduces interobserver variability and examination times.
  - Identification of Normal and Abnormal Fetal Anatomy: DL algorithms can be trained to:
    - Accurately detect different fetal standard planes (brain, heart, face, abdomen).
    - Localize and label fetal anatomical structures on various planes.
    - Differentiate between normal and abnormal anatomy, acting either as a screening tool (e.g., normal/abnormal classification) or as a diagnostic tool (e.g., localizing and classifying specific malformations).
  - Specific Fetal Systems: The review details applications for the fetal central nervous system (CNS), the fetal heart (including congenital heart disease (CHD) detection), the placenta (biometry, lacunae), and other fetal structures (face, spine, kidneys, lungs, adipose tissue, sexual organs).
  - Intrapartum Ultrasound: It discusses the emerging role of DL in assessing fetal head station, flexion, and position during labor.
- Identification of Limitations and Future Directions: The paper critically discusses the limitations of deep learning in this field. These include the amount of data required, the inherent bias of image recognition, the interpretability of results (the black box problem), and ethical challenges. It also outlines future perspectives, emphasizing the need for multitasking DL models and prospective validation in real-life clinical scenarios.

The key conclusions are that DL holds considerable potential as a support tool for antenatal ultrasound, offering advantages in objectivity, reproducibility, speed, and accuracy. It is envisioned not as a replacement for experts but as a tool to support them and improve workflow. Ultimately, it could enhance healthcare services, especially in rural areas or low-income countries with limited access to expert sonographers.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To fully understand this paper, a reader needs to grasp several fundamental concepts related to artificial intelligence, machine learning, deep learning, and ultrasound imaging.
- Artificial Intelligence (AI):
  - Conceptual Definition: AI is a broad field of computer science that enables machines to perform tasks typically associated with human intelligence. These include learning, decision-making, visual perception, speech recognition, and problem-solving. The goal is to create systems that can reason and act intelligently.
  - In this paper: AI algorithms are highlighted for their ability to identify complex patterns within data and provide quantitative solutions automatically, often more accurately and reproducibly than humans.
- Machine Learning (ML):
  - Conceptual Definition: ML is a subset of AI that allows computers to learn from data and improve their performance on a specific task without being explicitly programmed for that task. Instead of following fixed, hand-written rules, ML algorithms build a model from example data, called training data, to make predictions or decisions.
  - In this paper: ML is presented as the overarching approach that enables computers to gain experience and improve. Deep learning is then introduced as a prominent type of ML.
- Deep Learning (DL):
  - Conceptual Definition: DL is a specialized subset of machine learning that uses artificial neural networks with multiple layers (hence "deep") to learn representations of data at multiple levels of abstraction. These networks are inspired by the structure and function of the human brain. Each layer processes the output of the previous layer, extracting increasingly complex features from the raw input data.
  - In this paper: DL is considered the leading artificial intelligence tool in image analysis. Its architecture is described as complex, involving multiple deep layers of artificial neural networks.
- Artificial Neural Networks (ANNs):
  - Conceptual Definition: ANNs are computational models inspired by the structure and function of biological neural networks. They consist of interconnected nodes (neurons) organized in layers: an input layer, one or more hidden layers, and an output layer. Each connection between neurons has a weight, and each neuron applies an activation function. During training, these weights are adjusted to minimize the difference between the network's output and the desired output.
  - In this paper: DL models are built upon ANNs, with the "deep" aspect referring to the multiple hidden layers.
- Convolutional Neural Networks (CNNs):
  - Conceptual Definition: CNNs are a class of deep neural networks designed for processing structured, grid-like data such as images, and they are particularly effective for image recognition tasks. They have three key components:
    - Convolutional layers apply filters that detect features such as edges, textures, or patterns.
    - Pooling layers reduce spatial dimensions, making the model more robust to small variations.
    - Fully connected layers perform classification based on the learned features.
  - In this paper: CNNs are highlighted as the most commonly used deep neural network in medical imaging because of their excellence in image recognition and classification.
- Supervised Learning:
  - Conceptual Definition: A type of machine learning in which an algorithm learns from a labeled dataset, meaning each input data point is paired with a corresponding correct output. The algorithm uses this ground-truth data to learn a mapping function from inputs to outputs. The goal is to predict the output for new, unseen input data.
  - In this paper: Supervised DL models are described as requiring labeled data, or 'ground-truth' data, as input for the neural networks during the training phase. This is the most common type of DL model discussed.
- Unsupervised Learning:
  - Conceptual Definition: A type of machine learning in which an algorithm learns from unlabeled data, meaning there are no predefined correct outputs. The algorithm's goal is to discover hidden patterns, structures, or relationships within the input data on its own. Common tasks include clustering and dimensionality reduction.
  - In this paper: Unsupervised learning techniques are described as not requiring labels, with the DL model searching for the main patterns and similarities within the data.
- Obstetric Ultrasound Imaging:
  - Conceptual Definition: A medical imaging technique that uses high-frequency sound waves (ultrasound) to create real-time images of the fetus in the uterus. It is non-invasive and does not use ionizing radiation. It is crucial for monitoring fetal growth, assessing fetal anatomy, and detecting potential malformations throughout pregnancy.
  - In this paper: It is described as the gold standard imaging modality for detection and diagnosis of fetal malformations.
- Fetal Biometry:
  - Conceptual Definition: The measurement of specific fetal body parts using ultrasound to estimate gestational age, monitor growth, and assess fetal well-being. Common measurements include head circumference (HC), biparietal diameter (BPD), abdominal circumference (AC), femur length (FL), crown-rump length (CRL), and nuchal translucency (NT).
  - In this paper: Measurement of fetal biometry is identified as a key clinical application for DL to improve accuracy and reduce variability.
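To make the CNN components described above concrete, here is a minimal, dependency-free Python sketch of a single forward pass. It chains one convolutional filter, a ReLU activation, 2×2 max pooling, and one fully connected output neuron with a sigmoid. This is an illustration under simplifying assumptions, not a model from the reviewed studies:
- The vertical-edge kernel, the output weights, and the bias are hand-picked hypothetical values, fixed in the way a feature-engineering approach would fix them. In a trained CNN they are learned from data.
- Real networks stack many filters and layers.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image with no padding (cross-correlation, as most DL libraries do)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + u][j + v] * kernel[u][v] for u in range(kh) for v in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(x: Matrix) -> Matrix:
    """Activation function: keep positive responses, zero out negative ones."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool_2x2(x: Matrix) -> Matrix:
    """Keep the largest value in each non-overlapping 2x2 window, halving height and width."""
    return [
        [max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
         for j in range(0, len(x[0]) - 1, 2)]
        for i in range(0, len(x) - 1, 2)
    ]


def dense_sigmoid(features: List[float], weights: List[float], bias: float) -> float:
    """Fully connected output neuron; the sigmoid squashes the score into (0, 1)."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def forward(image: Matrix, kernel: Matrix, weights: List[float], bias: float) -> float:
    feature_map = relu(conv2d_valid(image, kernel))  # convolutional layer + activation
    pooled = max_pool_2x2(feature_map)               # pooling layer
    flat = [v for row in pooled for v in row]        # flatten for the fully connected layer
    return dense_sigmoid(flat, weights, bias)


# Hypothetical hand-picked parameters (a real CNN learns these during training).
VERTICAL_EDGE = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
WEIGHTS = [0.5, 0.5, 0.5, 0.5]
BIAS = -5.0

# Toy 6x6 "images": one with a dark-to-bright vertical boundary, one uniform.
edge_image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
flat_image = [[0] * 6 for _ in range(6)]

print(forward(edge_image, VERTICAL_EDGE, WEIGHTS, BIAS))  # ≈ 0.731 (edge detected)
print(forward(flat_image, VERTICAL_EDGE, WEIGHTS, BIAS))  # ≈ 0.0067 (no edge)
```

Each stage shrinks the data: the 6×6 input becomes a 4×4 feature map after convolution, then 2×2 after pooling. The classifier therefore sees only four summarized feature values rather than raw pixels.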
3.2. Previous Works
The paper, being a review, references numerous prior studies throughout its discussion of DL applications in fetal imaging. Here’s a summary of key areas and examples of prior work mentioned:
- General AI/DL in Medical Imaging:
  - Studies by Litjens et al. (2017) and Liu et al. (2019) are cited to establish the dominance of DL in medical image analysis. They note that over 80% of AI studies in medical imaging use a DL approach and that DL can match or exceed human capability.
  - Drukker et al. (2020) and Fiorentino et al. (2023) are broader reviews of AI/DL in obstetrics and fetal ultrasound, providing context for the current paper's focus.
- Automatic Measurement of Fetal Structures:
  - For fetal head biometry (head circumference, occipitofrontal diameter, biparietal diameter), works by Sinclair et al. (2018), Li et al. (2020), and Rasuli et al. (2021) are referenced.
  - For femur length, Zhu et al. (2021) is cited.
  - Automatic abdominal circumference measurement is more challenging. For it, Chen et al. (2018) and Jang et al. (2018) proposed object detection or segmentation of landmarks.
  - Multitasking DL models for simultaneous biometric measurements in standard planes are mentioned, with examples from Plotka et al. (2021) and Ghelich Oghli et al. (2021).
  - For CRL and NT in the first trimester, Ryou et al. (2019), Cengiz et al. (2021), and Chen et al. (2017) used 3D imaging and segmentation.
- Identification of Normal and Abnormal Fetal Anatomy:
  - Standard Plane Detection: Burgos-Artizzu et al. (2020) compared 19 DL algorithms for assigning four anatomical standard planes (abdomen, brain, femur, thorax). The best models performed similarly to trained sonographers but 25 times faster. Other works (Baumgartner et al. 2017; Chen et al. 2017; Yaqub et al. 2017) are cited for automatic detection of various standard planes.
  - Structural Segmentation: Minaee et al. (2022) are cited for showing that segmentation DL models outperform humans and other AI models in structural segmentation tasks.
  - Fetal Central Nervous System (CNS) Anomalies: Lin et al. (2022) developed a DL algorithm that localized and classified nine different brain malformations with 99% accuracy. Two studies by Xie et al. (2020) also developed models for classifying normal/abnormal brain images.
  - Fetal Heart Anomalies (CHD): Arnaout et al. (2021) developed an ensemble of neural networks for expert-level prenatal detection of complex CHD. Dozen et al. (2020) and Nurmaini et al. (2020) are mentioned for detecting specific CHDs, such as hypoplastic left heart syndrome (HLHS) and ventricular septal defects (VSD).
- Placenta: Hu et al. (2019) and Looney et al. (2018) explored automated placenta segmentation and volume estimation. Qi et al. (2018) worked on lacunae localization.
- Intrapartum Ultrasound: Lu et al. (2022) and Ramirez Zegarra et al. (2022) developed DL models for assessing fetal occiput position during labor.
As a review, the paper does not re-derive the methods or formulas of these prior works. Instead, it summarizes their achievements and contributions to the field.
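Several of the head-biometry studies cited above report the head circumference (HC) alongside the biparietal (BPD) and occipitofrontal (OFD) diameters. Here is a minimal sketch of how HC relates to those diameters. It assumes the skull outline is approximated by an ellipse whose full axes are the OFD and BPD. HC is then that ellipse's perimeter, given by Ramanujan's approximation $P \approx \pi\left[3(a+b) - \sqrt{(3a+b)(a+3b)}\right]$, with $a = \text{OFD}/2$ and $b = \text{BPD}/2$. The function name and the ellipse model are illustrative assumptions, not the method of any reviewed study, and caliper-placement conventions are ignored.

```python
import math


def head_circumference_mm(bpd_mm: float, ofd_mm: float) -> float:
    """Approximate HC as the perimeter of an ellipse with axes OFD and BPD (hypothetical helper).

    Ramanujan's first approximation: P ≈ π[3(a + b) − sqrt((3a + b)(a + 3b))],
    where a and b are the semi-axes.
    """
    if bpd_mm <= 0 or ofd_mm <= 0:
        raise ValueError("diameters must be positive")
    a, b = ofd_mm / 2.0, bpd_mm / 2.0  # semi-axes in mm
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))


print(round(head_circumference_mm(bpd_mm=80, ofd_mm=100), 1))  # ≈ 283.6
```

When BPD equals OFD, the formula reduces exactly to the circumference of a circle, π × diameter. This is a quick sanity check on the implementation.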
3.3. Technological Evolution
The field of medical imaging, and particularly fetal ultrasound, has seen a rapid technological evolution, with AI, and specifically DL, representing the latest significant leap.
- Early Stages (Pre-AI): Ultrasound technology itself evolved from basic 2D imaging to more advanced 3D and 4D (real-time 3D) capabilities. However, image interpretation remained largely manual, requiring extensive human expertise. This led to the issues of operator dependency and interobserver variability highlighted in the paper.
- Emergence of Traditional Machine Learning (Early 2000s): Initial attempts to automate medical image analysis involved traditional machine learning algorithms. These methods typically required feature engineering, in which human experts manually designed algorithms to extract relevant features (e.g., edges, textures) from images before feeding them to classifiers. While useful, these methods were often limited by the quality of the feature engineering and struggled with the complexity and variability of medical images.
- Rise of Deep Learning (2010s onwards): The advent of deep learning revolutionized image analysis.
  - Unlike traditional ML, DL models (especially CNNs) can automatically learn hierarchical features directly from raw image data, eliminating the need for manual feature engineering.
  - This ability, coupled with increased computational power (GPUs) and access to large datasets, led to breakthroughs in tasks such as image classification, object detection, and segmentation.
  - General Medical Imaging: DL quickly demonstrated human-level performance, or even superiority, in various medical imaging domains, as noted by Litjens et al. (2017) and Liu et al. (2019).
  - Fetal Imaging Specifics: In fetal ultrasound, DL has evolved to address specific challenges:
    - Standardization: DL models emerged to automate the difficult task of acquiring and recognizing fetal standard planes, a fundamental step for consistent examinations.
    - Biometry Automation: Moving beyond manual caliper placement, DL models advanced to automatically measure fetal structures, reducing human error and time.
    - Anomaly Detection: Applications range from simple classification (normal/abnormal) to complex object detection and segmentation of specific malformations in the brain, heart, and other organs.
    - 3D/4D Integration: DL is increasingly integrated with 3D ultrasound to select optimal planes and enhance analysis.
- Current State: The paper highlights the current surge in DL applications in fetal imaging. Ongoing efforts aim to develop multitasking DL models that integrate the various steps of an ultrasound examination, from plane detection to anomaly identification. The focus is now shifting towards prospective validation and addressing ethical considerations for clinical implementation.

This paper's work fits squarely within the current state of this evolution. It synthesizes the latest applications and critically evaluates the path towards routine clinical integration.
3.4. Differentiation Analysis
As a State-of-the-Art Review, this paper does not propose a new method or algorithm; instead, its innovation lies in its comprehensive synthesis and critical analysis of the existing landscape of deep learning in fetal ultrasound imaging.
Compared to primary research papers that introduce specific DL models for particular tasks (e.g., a CNN for fetal brain anomaly detection), this review differentiates itself by:
- Breadth over Specificity: It covers a wide array of DL applications across all major fetal systems (CNS, heart, placenta, other structures) and tasks (biometry, plane detection, anomaly detection, intrapartum use). By contrast, individual research papers often focus on a single task or organ system.
- Educational and Foundational Focus: It provides a comprehensive explanation of the fundamentals of deep learning in fetal imaging, making the field accessible to clinicians and researchers who may be new to it. This includes defining core concepts such as AI, ML, DL, and the various DL tasks.
- Clinical Applicability Emphasis: The review places particular focus on clinical applicability. It bridges the gap between theoretical DL capabilities and their practical utility in improving obstetric ultrasound workflow, accuracy, and patient care.
- Critical Evaluation: Beyond listing achievements, the paper dedicates substantial sections to the limitations of current DL models (data requirements, bias, interpretability) and to ethical challenges. It also outlines future perspectives, including the need for multitasking models and prospective validation. Individual research papers, which focus on presenting novel methods, usually give this critical lens less prominence.
- Synthesizing Knowledge: It brings together findings from numerous disparate studies, providing a unified view of progress and challenges for researchers and practitioners looking for an overview of the field.
In essence, while primary research papers expand the frontier of what DL can do in fetal imaging, this review paper maps that frontier, explains its terrain, and points out the remaining unexplored territories and potential pitfalls.
4. Methodology
This section details the fundamental concepts of deep learning as presented in the paper, explaining what deep learning is and how it works in the context of fetal imaging. The paper primarily describes various deep learning tasks and how they are applied, rather than a single overarching methodology for a new model.
4.1. Principles
The core idea behind implementing deep learning in fetal imaging stems from its exceptional capabilities in image recognition and classification. Deep learning models, particularly convolutional neural networks (CNNs), are designed to analyze large amounts of data in a layered, non-linear manner. They use pattern recognition to automatically extract highly representative image features from raw ultrasound data. This feature extraction allows the models to then label or classify an image (e.g., as normal or abnormal, or identifying specific anatomical planes).
The theoretical basis and intuition run as follows. The model is exposed to a large number of ultrasound images, often labeled by human experts (known as ground-truth data). From these, it learns to identify intricate patterns and relationships among the pixels that humans might miss or find difficult to interpret consistently. This learning process gives the model a form of 'experience' that enables it to perform tasks such as image classification, detection, and segmentation, in some studies matching or exceeding human performance. The goal is to reduce the subjectivity, interobserver variability, and long examination times associated with human operators.
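The idea of a model gaining 'experience' from expert-labeled examples can be sketched with the smallest possible neural network. This is a single neuron with a sigmoid output (equivalently, logistic regression), trained by gradient descent to reduce its error on labeled data. The sketch simplifies under stated assumptions: the one-dimensional inputs and 0/1 labels below are made-up toy values standing in for image features. The DL models discussed in the review instead learn many weights directly from pixels.

```python
import math
from typing import List, Tuple

Model = Tuple[float, float, float, float]  # (weight, bias, feature mean, feature std)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs: List[float], ys: List[int], lr: float = 0.5, epochs: int = 500) -> Model:
    """Learn one weight and one bias from labeled examples by full-batch gradient descent on log-loss."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n) or 1.0
    zs = [(x - mean) / std for x in xs]  # standardize the input feature
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Error of each prediction against its ground-truth label.
        errors = [sigmoid(w * z + b) - y for z, y in zip(zs, ys)]
        # Gradient of the mean log-loss with respect to w and b.
        w -= lr * sum(e * z for e, z in zip(errors, zs)) / n
        b -= lr * sum(errors) / n
    return w, b, mean, std


def predict_proba(x: float, model: Model) -> float:
    w, b, mean, std = model
    return sigmoid(w * (x - mean) / std + b)


# Toy labeled training set: 0 = "normal", 1 = "abnormal" (made-up values).
train_x = [6.0, 7.0, 8.0, 9.0, 11.0, 12.0, 13.0, 14.0]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

model = fit_logistic(train_x, train_y)
for x in (7.5, 12.5):  # inputs not seen during training
    label = "abnormal" if predict_proba(x, model) >= 0.5 else "normal"
    print(x, label)  # 7.5 normal, then 12.5 abnormal
```

The learning happens entirely in the loop: each pass compares predictions with the expert labels and nudges the weight and bias to shrink the error. A deep network does the same with many layers of weights.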
4.2. Core Methodology In-depth (Layer by Layer)
The paper explains deep learning by contrasting it with broader AI and machine learning concepts and then detailing the specific tasks DL models perform in fetal imaging.
4.2.1. Defining Deep Learning within AI and Machine Learning
The paper positions deep learning as a specialized tool within a broader hierarchy:
- Artificial Intelligence (AI): This is the broadest concept, referring to a computer's ability to perform tasks requiring human-like intelligence, such as learning, decision-making, visual perception, and speech recognition. AI algorithms are designed to identify complex patterns within data and automatically provide a quantitative solution to a problem.
- Machine Learning (ML): A subset of AI, ML enables computers to learn and improve their performance with 'experience' (i.e., using available data) without being explicitly programmed to do so. This is achieved by building models from data.
- Deep Learning (DL): The most important algorithm type within ML for medical imaging. DL models are characterized by their complex architecture, involving multiple deep layers of artificial neural networks.

The most common type of deep neural network mentioned for medical imaging is the convolutional neural network (CNN), though other types exist (Figure 1).
The following figure (Figure 1 from the original paper) provides an overview of main types of deep-learning algorithm based on training techniques.
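The image is a schematic diagram of the main types of deep-learning algorithm, grouped by training technique. It includes supervised and unsupervised learning algorithms such as convolutional neural networks (CNN), recurrent neural networks (RNN), and generative adversarial networks (GAN).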
4.2.2. Training Approaches for Deep Learning Models
DL models can be developed using two primary learning approaches:
- Supervised Learning:
  - This is the most common type.
  - It requires labeled data, or 'ground-truth' data, as input for the neural networks during the training phase. Each ultrasound image used for training must therefore be pre-annotated by a human expert with the correct classification or identification (e.g., "normal brain scan" or "scan with ventriculomegaly").
  - After training, the model's performance is tested on unlabeled data (images whose labels the model does not see). On these, it makes a prediction (output) and classifies the image.
- Unsupervised Learning:
  - These techniques do not require labels.
  - The DL model searches for the main patterns and similarities within the data (input) on its own in order to classify the images (output). This approach is useful when labeled data are scarce or difficult to obtain.
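Unlike supervised learning, an unsupervised method receives no labels at all. A classic example is k-means clustering, sketched below in plain Python. Note the following simplifications:
- k-means is a standard machine-learning algorithm rather than a deep network. It is used here only to show what "searching for patterns on its own" means.
- The 2-D points are made-up toy features.
- Centroids are initialized from evenly spaced input points. Common implementations instead use random or k-means++ initialization.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def dist2(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Group unlabeled points into k clusters by alternating assignment and centroid updates."""
    n = len(points)
    centroids = [points[i * n // k] for i in range(k)]  # deterministic init (simplification)
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                xs, ys = zip(*members)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster where it was
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return labels, centroids


# Unlabeled toy features: no ground truth is given to the algorithm.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centroids = kmeans(data, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The cluster numbers are arbitrary. Deciding that a cluster corresponds to a clinically meaningful category still requires human interpretation.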
4.2.3. Deep Learning Tasks in Fetal Imaging
For applying DL in obstetrics (e.g., identification of normal/abnormal anatomy, biometric measurements), DL models perform one or a combination of up to four specific tasks, depending on the required output:
- Classification:
  - Purpose: Assigns a 'class label' to an image. The label is either binary (e.g., normal/abnormal) or one of several categories (e.g., a specific anatomical plane).
  - Examples: Labeling an image as normal/abnormal, or identifying a specific anatomical plane such as the four-chamber view or the left ventricular outflow tract.
  - Illustration (from Box 1): Suppose a classification DL model is given an image of ventriculomegaly in the axial transventricular view of the fetal brain. It will analyze the image and classify it as 'anomalous', but it does not provide information about where the anomaly is located.
- Localization:
  - Purpose: Provides the precise location of any given object in an image, typically indicating its position with a bounding box. This task assists in identifying anatomical landmarks and performing automatic measurements.
  - Illustration (from Box 1): For a normal transventricular plane of the fetal brain, a localization DL model would analyze the image and provide the location of several anatomical landmarks. These include the anterior and posterior horns of the lateral ventricles and the cavum septi pellucidi.
- Object Detection:
  - Purpose: This task is a combination of classification and localization. It simultaneously provides the location of fetal structures in an image (and, if necessary, their measurement) and their classification as normal or abnormal.
  - Illustration (from Box 1): Consider an object-detection DL model given an image of the four-chamber view of the fetal heart. It would first localize all the anatomical landmarks in this plane (e.g., both atria and ventricles, descending aorta, pulmonary veins, fetal spine) and then classify the image as a four-chamber view.
- Segmentation:
  - Purpose: Involves the delineation of an object present in the image, effectively isolating the object of interest from the rest of the structures. It is similar to localization but goes further by providing an assessment of the morphology (shape, volume, and contour) of the object. It can also be paired with classification tasks.
  - Illustration (from Box 1): Consider a segmentation DL model provided with an image of the four-chamber view in a fetus with fetal growth restriction. It would not only identify all the anatomical landmarks but also assess the morphology of the fetal heart (e.g., shape, area of the chambers), which is known to be affected in growth-restricted fetuses.

These tasks are crucial for enabling DL models to support both inexperienced and experienced operators. For inexperienced users, they can automate biometric measurements or identify anatomical landmarks. For experienced operators, they can flag subtle anomalies or reassure a junior examiner.
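The difference between the outputs of these tasks can be sketched in a few lines of Python. The sketch assumes a hypothetical binary segmentation mask (1 = pixel belongs to the structure). From it, a localization-style bounding box can be derived, while the mask itself also supports morphological measurements such as area. The mask, the class label, and the helper names are illustrative assumptions, not outputs of any model in the review.

```python
from typing import Dict, List, Optional, Tuple

Mask = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive pixel indices


def bounding_box(mask: Mask) -> Optional[Box]:
    """Localization-style output: the smallest box enclosing all foreground pixels."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    if not rows:
        return None  # structure not found
    return rows[0], cols[0], rows[-1], cols[-1]


def area_px(mask: Mask) -> int:
    """Segmentation-only output: number of pixels inside the delineated structure."""
    return sum(sum(row) for row in mask)


# Hypothetical segmentation of a small, non-rectangular structure.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

box = bounding_box(mask)
outputs: Dict[str, object] = {
    "classification": "anomalous",           # hypothetical label only, no location
    "localization": box,                     # where, but not the exact shape
    "object_detection": ("anomalous", box),  # label and location together
    "segmentation_area_px": area_px(mask),   # shape-dependent measurement
}
print(outputs)
```

Here the bounding box spans 3 × 3 = 9 pixels while the segmented structure covers only 5. The difference is the extra morphological information that segmentation provides over localization.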
The following figure (Figure 2 from the original paper) shows an example of anatomical landmark identification using deep learning.
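The image shows ultrasound images. The left panel (a) shows the annotated cerebellum and cavum septi pellucidi (CSP), and the right panel (b) shows the corresponding contours identified by the deep-learning algorithm, illustrating the application of deep learning in fetal ultrasound imaging.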
The following figure (Figure 4 from the original paper) illustrates the workflow of an ideal DL model for screening fetal malformations.
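The image is a schematic of automatic standard-plane detection and fetal anatomical structure identification in ultrasound imaging. It includes different standard planes, such as the transverse brain plane and the four-chamber view of the heart. It also indicates how fetal defects can be diagnosed through biometric measurement.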
5. Experimental Setup
As a State-of-the-Art Review, this paper does not present its own experimental setup, but rather synthesizes the common practices and challenges related to datasets, evaluation metrics, and baselines observed in the multitude of studies it reviews.
5.1. Datasets
The studies reviewed in the paper rely heavily on ultrasound image data for training and testing deep learning models.
- Source and Characteristics:
  - Most databases are built retrospectively, which introduces inherent bias into the studies.
  - The data consist of various fetal ultrasound planes (e.g., brain, heart, abdomen, femur, face) and fetal anatomical structures.
  - A significant challenge is the similar appearance of different ultrasound planes, which requires large and diverse datasets for accurate distinction. For example, distinguishing the transventricular and transthalamic planes for the diagnosis of ventriculomegaly requires many images of both.
  - The data need to be labeled (for supervised learning). This is a tedious and time-consuming process, often done manually by human operators, which can introduce bias into training.
- Scale:
  - Deep learning algorithms, especially those for fetal imaging, generally require a larger database than other AI algorithms.
  - The paper notes considerable heterogeneity of sample sizes across studies, ranging from only a few hundred to several thousand fetal images.
  - Data Augmentation: Data augmentation is a common technique for meeting the need for large quantities of data without acquiring huge numbers of new cases. It generates new images from an existing image by making minor alterations to it, such as rotating it or adjusting the echogenicity.
- Challenges in Data Management:
  - Building prospective databases to address specific research questions using DL is described as tedious and time-consuming. Because data labeling is laborious, this raises costs and demands human resources.
  - There is no standard method to calculate the sample size required for building accurate DL algorithms in fetal imaging.
  - Data Privacy: Confidential patient data and images need to be shared with third parties involved in model development. This raises ethical challenges and necessitates regulations on data privacy.
5.2. Evaluation Metrics
As a review, the paper does not list specific evaluation metrics. The deep learning models it discusses would, however, typically be assessed with the standard metrics used in medical image analysis and machine learning for classification, detection, and segmentation tasks. The most relevant metrics likely employed in the reviewed studies are:
5.2.1. Accuracy
- Conceptual Definition: Accuracy measures the proportion of total predictions that were correct. It's a straightforward measure of overall correctness.
- Mathematical Formula: $ \text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} $
- Symbol Explanation:
  - $\text{TP}$: True Positives (positive cases correctly predicted as positive).
  - $\text{TN}$: True Negatives (negative cases correctly predicted as negative).
  - $\text{FP}$: False Positives (negative cases incorrectly predicted as positive, also known as Type I error).
  - $\text{FN}$: False Negatives (positive cases incorrectly predicted as negative, also known as Type II error).
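The four counts behind these formulas can be tallied directly from a model's binary outputs, and accuracy follows from them. The sketch below assumes labels coded as 1 for a positive case (e.g., an abnormal image) and 0 for a negative one. The names `confusion_counts` and `accuracy` are illustrative and do not come from the reviewed studies.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Tally TP, TN, FP and FN; labels are assumed to be 1 (positive) or 0 (negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1:
            counts["TP" if p == 1 else "FN"] += 1
        else:
            counts["FP" if p == 1 else "TN"] += 1
    return counts


def accuracy(c: Dict[str, int]) -> float:
    """(TP + TN) / (TP + TN + FP + FN)."""
    total = c["TP"] + c["TN"] + c["FP"] + c["FN"]
    if total == 0:
        raise ValueError("no predictions to score")
    return (c["TP"] + c["TN"]) / total


if __name__ == "__main__":
    # Hypothetical data: 1 = abnormal image, 0 = normal image.
    truth = [1, 0, 0, 1, 0, 0, 0, 0]
    model = [1, 0, 1, 0, 0, 0, 0, 0]
    always_normal = [0] * len(truth)
    print(confusion_counts(truth, model))                    # {'TP': 1, 'TN': 5, 'FP': 1, 'FN': 1}
    print(accuracy(confusion_counts(truth, model)))          # 0.75
    print(accuracy(confusion_counts(truth, always_normal)))  # 0.75, yet both abnormal cases are missed
```

In this hypothetical example, the trivial "always normal" predictor reaches the same accuracy as the model while missing every abnormal case. This is why accuracy is usually reported alongside sensitivity and specificity when positives are rare.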
5.2.2. Sensitivity (Recall)
- Conceptual Definition: Sensitivity, also known as recall or true positive rate, measures the proportion of actual positive cases that were correctly identified. It's crucial in medical diagnosis where missing a positive case (e.g., a malformation) can have severe consequences.
- Mathematical Formula: $ \text{Sensitivity} = \frac{\text{TP}}{\text{TP} + \text{FN}} $
- Symbol Explanation:
  - $\text{TP}$: True Positives.
  - $\text{FN}$: False Negatives.
5.2.3. Specificity
- Conceptual Definition: Specificity, or true negative rate, measures the proportion of actual negative cases that were correctly identified. It's important for ensuring that healthy cases are not misdiagnosed as abnormal.
- Mathematical Formula: $ \text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}} $
- Symbol Explanation:
  - $\text{TN}$: True Negatives.
  - $\text{FP}$: False Positives.
5.2.4. Precision (Positive Predictive Value)
- Conceptual Definition: Precision, or positive predictive value, measures the proportion of positive predictions that were actually correct. It answers the question: "Of all items the model identified as positive, how many are truly positive?"
- Mathematical Formula: $ \text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} $
- Symbol Explanation:
  - $\text{TP}$: True Positives.
  - $\text{FP}$: False Positives.
5.2.5. F1-score
- Conceptual Definition: The F1-score is the harmonic mean of precision and recall. It's a useful metric when you need to balance both precision and recall, especially in cases of imbalanced class distribution.
- Mathematical Formula: $ \text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $
- Symbol Explanation:
  - $\text{Precision}$: As defined above.
  - $\text{Recall}$: As defined above (same as Sensitivity).
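Sensitivity, specificity, precision and F1 all follow from the same four counts. The sketch below computes F1 from precision and recall, exactly as in the F1 formula. The function names are hypothetical. An undefined ratio such as 0/0 is returned as NaN, which is one convention among several (some libraries return 0 instead).

```python
import math


def _ratio(num: int, den: int) -> float:
    # 0/0 is undefined; returning NaN is one convention (others return 0.0).
    return num / den if den else math.nan


def sensitivity(tp: int, fn: int) -> float:
    return _ratio(tp, tp + fn)


def specificity(tn: int, fp: int) -> float:
    return _ratio(tn, tn + fp)


def precision(tp: int, fp: int) -> float:
    return _ratio(tp, tp + fp)


def f1_score(prec: float, rec: float) -> float:
    """Harmonic mean of precision and recall."""
    if math.isnan(prec) or math.isnan(rec):
        return math.nan
    if prec + rec == 0:
        return 0.0  # no true positives at all
    return 2 * prec * rec / (prec + rec)


if __name__ == "__main__":
    # Hypothetical counts for an anomaly detector on 100 test images.
    tp, fn, tn, fp = 8, 2, 85, 5
    sens = sensitivity(tp, fn)
    prec = precision(tp, fp)
    print(round(sens, 3), round(specificity(tn, fp), 3))   # 0.8 0.944
    print(round(prec, 3), round(f1_score(prec, sens), 3))  # 0.615 0.696
```

Algebraically, the harmonic-mean form reduces to $ \frac{2\,\text{TP}}{2\,\text{TP} + \text{FP} + \text{FN}} $, so F1 ignores true negatives entirely. That is one reason it is favoured when normal cases vastly outnumber abnormal ones.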
5.2.6. Area Under the Receiver Operating Characteristic Curve (AUC-ROC)
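- Conceptual Definition: The ROC curve plots the true positive rate (Sensitivity) against the false positive rate (1 - Specificity) across all classification thresholds. The AUC-ROC summarizes this curve as a single number that measures separability, i.e., how well the model distinguishes between classes. A higher AUC indicates a better ability to discriminate between positive and negative cases.
- Mathematical Formula: AUC has no single closed-form expression in terms of TP, TN, FP and FN, because it integrates over all thresholds. It is computed as the area under the empirical ROC curve, typically with the trapezoidal rule. Equivalently, it is the fraction of (positive, negative) case pairs in which the positive case receives the higher score: $ \text{AUC} = \frac{1}{n_{+} n_{-}} \sum_{i \in P} \sum_{j \in N} \left[ \mathbb{1}(s_i > s_j) + \tfrac{1}{2}\,\mathbb{1}(s_i = s_j) \right] $
- Symbol Explanation:
  - $P$, $N$: The sets of actual positive and actual negative cases, with sizes $n_{+}$ and $n_{-}$.
  - $s_i$, $s_j$: The model's predicted scores (e.g., probabilities) for cases $i$ and $j$.
  - $\mathbb{1}(\cdot)$: The indicator function, equal to 1 when its condition holds and 0 otherwise.
  - The AUC value ranges from 0 to 1. A value of 0.5 suggests no discrimination (equivalent to random guessing), while 1.0 suggests perfect discrimination.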
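Both ways of obtaining AUC mentioned here can be sketched in a few lines: counting concordant (positive, negative) pairs, and applying the trapezoidal rule to the ROC curve. `auc_pairwise` counts correctly ranked pairs, scoring ties as one half. `auc_trapezoid` lowers the decision threshold one distinct score at a time and integrates the resulting ROC points. On the same data the two agree. Labels are assumed to be 0/1, and the function names are illustrative. The pairwise version is quadratic in the number of cases, which is acceptable for a sketch but not for large test sets.

```python
from typing import List, Sequence, Tuple


def _split(scores: Sequence[float], labels: Sequence[int]):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    return pos, neg


def auc_pairwise(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos, neg = _split(scores, labels)
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))


def auc_trapezoid(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the empirical ROC curve via the trapezoidal rule."""
    pos, neg = _split(scores, labels)
    points: List[Tuple[float, float]] = [(0.0, 0.0)]  # (FPR, TPR)
    tp = fp = 0
    # Lower the decision threshold one distinct score at a time.
    for thr in sorted(set(scores), reverse=True):
        tp += sum(1 for s in pos if s == thr)
        fp += sum(1 for s in neg if s == thr)
        points.append((fp / len(neg), tp / len(pos)))
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.85]  # hypothetical model outputs
    labels = [1, 1, 0, 0]
    print(auc_pairwise(scores, labels), auc_trapezoid(scores, labels))  # 0.75 0.75
```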
5.2.7. Intersection over Union (IoU) / Jaccard Index
- Conceptual Definition: Particularly relevant for segmentation and object detection tasks, IoU measures the similarity between the predicted bounding box/segmentation mask and the ground-truth bounding box/segmentation mask. It is defined as the area of overlap between the predicted and ground-truth regions divided by the area of their union.
- Mathematical Formula: $ \text{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}} $
- Symbol Explanation:
  - $\text{Area of Overlap}$: The region where the predicted and actual bounding boxes/masks both exist.
  - $\text{Area of Union}$: The total region covered by either the predicted or actual bounding box/mask (or both).
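The IoU definition translates directly into code for the two cases mentioned: axis-aligned bounding boxes from object detection and pixel masks from segmentation. The sketch below assumes boxes given as `(x_min, y_min, x_max, y_max)` and masks given as sets of foreground `(row, col)` pixels. Both representations and the function names are illustrative choices.

```python
from typing import Set, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Pixel = Tuple[int, int]                  # (row, col) of a foreground pixel


def box_iou(a: Box, b: Box) -> float:
    """IoU of two axis-aligned bounding boxes."""
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    overlap = overlap_w * overlap_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - overlap  # the overlap is counted only once
    return overlap / union if union > 0 else 0.0


def mask_iou(pred: Set[Pixel], truth: Set[Pixel]) -> float:
    """IoU (Jaccard index) of two segmentation masks given as pixel sets."""
    union = pred | truth
    if not union:
        return 1.0  # one convention: two empty masks agree perfectly
    return len(pred & truth) / len(union)
```

For example, `box_iou((0, 0, 2, 2), (1, 1, 3, 3))` gives 1/7 (about 0.143): the boxes share one unit of area out of a combined seven.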
5.3. Baselines
The studies reviewed in the paper typically compare their deep learning methods against several benchmarks to demonstrate their effectiveness:
- Human Operators (Sonographers): This is a primary baseline. DL models are often compared with fully trained sonographers or experienced operators to assess whether they can match or even exceed human capability. The paper mentions comparisons of accuracy, speed (e.g., 25 times faster classification), and interobserver variability.
- Traditional Image Processing Techniques: Before the widespread adoption of DL, other computer vision algorithms (e.g., rule-based systems, statistical pattern recognition, active shape models) were used for tasks such as segmentation or feature extraction. Although not explicitly detailed, these would form an implicit baseline for demonstrating the superior performance of DL.
- Other AI/Machine Learning Models: Within the broader AI/ML landscape, DL models are often compared against other AI models (e.g., traditional machine learning classifiers such as Support Vector Machines or Random Forests, or simpler neural network architectures) to show the advantage of deep architectures.
- Different Deep Learning Architectures: When a new DL model is proposed (in the reviewed papers), it is often compared against existing or state-of-the-art DL architectures (e.g., different CNN variants, U-Net for segmentation) to demonstrate incremental improvements or efficiency gains.

These baselines are representative because they cover the spectrum from human expert performance to various computational approaches, allowing for a comprehensive evaluation of the deep learning models' advancements.
6. Results & Analysis
This section synthesizes the main findings and reported successes of deep learning applications in fetal ultrasound imaging, as discussed throughout the review paper. Since this is a review, there are no specific tables or experimental results generated by the authors themselves; rather, the paper summarizes findings from other published studies.
6.1. Core Results Analysis
The paper highlights that deep learning (DL) has demonstrated significant potential to improve fetal imaging by addressing key challenges like operator dependency, variability, and low detection rates.
6.1.1. Deep Learning for Automatic Measurement of Fetal Structures
- Biometric Measurements: DL models have been successfully developed for automatic measurement of crucial fetal biometric parameters.
  - Head Biometry: Good performance has been shown for head circumference, occipitofrontal diameter, and biparietal diameter.
  - Femur Length: Automatic measurement of femur length has also been achieved.
  - Abdominal Circumference: This is more challenging because of its irregular shape and unclear boundaries, but object detection or segmentation of landmarks (stomach bubble, umbilical vein, fetal spine) has enabled automatic measurement.
  - Multitasking Models: Advanced DL models can perform all biometric measurements simultaneously across the three fetal standard planes (head, abdomen, femur), and can even estimate gestational age (GA).
  - First Trimester Biometry: Automatic measurement of crown-rump length (CRL) and nuchal translucency (NT) is possible, often leveraging 3D imaging and segmentation techniques to find ideal planes.
- Impact: These automations reduce interobserver variability and examination times, optimizing workflow and potentially decreasing fatigue and workplace injuries for sonographers.
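The review does not describe how a segmented head is converted into a circumference. One common approach, offered here as an assumption rather than as the method of any particular study, is to summarize the segmented region as an ellipse and take that ellipse's perimeter. The sketch estimates the semi-axes from the second moments of a filled binary head mask; for a uniformly filled ellipse, the variance along an axis equals the squared semi-axis divided by 4. It then applies Ramanujan's perimeter approximation. The mask format (a filled region, not just the skull outline), the isotropic `pixel_spacing_mm` parameter and all function names are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical mask format: rows of 0/1 values, 1 = pixel inside the segmented head.
Mask = List[List[int]]


def ellipse_semi_axes(mask: Mask) -> Tuple[float, float]:
    """Estimate (a, b), a >= b, of the ellipse with the same second moments as the mask."""
    pts = [(c, r) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pts:
        raise ValueError("empty mask")
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / n
    syy = sum((y - my) ** 2 for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts) / n
    # Eigenvalues of the 2x2 covariance matrix = variances along the ellipse axes.
    mean_var = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    var_major, var_minor = mean_var + spread, max(mean_var - spread, 0.0)
    # A uniformly filled ellipse has variance semi_axis**2 / 4 along each axis.
    return 2 * math.sqrt(var_major), 2 * math.sqrt(var_minor)


def ellipse_perimeter(a: float, b: float) -> float:
    """Ramanujan's second approximation to the perimeter of an ellipse."""
    h = ((a - b) / (a + b)) ** 2
    return math.pi * (a + b) * (1 + 3 * h / (10 + math.sqrt(4 - 3 * h)))


def head_circumference_mm(mask: Mask, pixel_spacing_mm: float = 1.0) -> float:
    """HC as the perimeter of the moment-matched ellipse; assumes square pixels."""
    a, b = ellipse_semi_axes(mask)
    return ellipse_perimeter(a, b) * pixel_spacing_mm
```

The moment-based fit is used because it needs no iterative optimization. Pipelines that output only a contour would instead fit an ellipse to the boundary points, for which this shortcut does not apply.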
6.1.2. Deep Learning for Identification of Normal and Abnormal Fetal Anatomy
This is a multi-step process that DL models can automate or support:
- Correct Acquisition of Fetal Standard Planes:
  - DL algorithms have been trained to accurately detect different fetal standard planes, including those for the brain, heart, face, and abdomen. Object-detection and segmentation models have been found to be more accurate than classification models for this task, because they localize anatomical landmarks before classifying the plane, mimicking how a human operator works.
  - Performance: Studies such as the one by Burgos-Artizzu et al. found that the performance of the best models was similar to that of a fully trained sonographer, with a classification speed that was 25 times faster.
- Accurate Identification of Normal Fetal Anatomy:
  - DL models can localize and label fetal anatomical structures on different standard planes using object-detection and segmentation tasks. Segmentation DL models have been shown to outperform both humans and other AI models at structural segmentation, a task that is prone to high variability when performed manually.
- Differentiating between Normal and Abnormal Anatomy:
  - DL models serve either as screening tools (using classification models to determine whether an image contains normal or abnormal anatomy) or as diagnostic tools (using object detection and segmentation to locate and classify the type of malformation).
6.1.3. Clinical Applicability Across Fetal Systems
- Fetal Central Nervous System (CNS):
  - Identification: DL models identify standard brain planes and anatomical landmarks (lateral ventricles, choroid plexus, cavum septi pellucidi, thalami, cerebellum, cisterna magna, Sylvian fissure, brainstem) with good accuracy.
  - Measurements: Automatic measurements of structures such as the lateral ventricles or cavum septi pellucidi are possible.
  - Cortical Development: DL models can assess cortical morphology to estimate gestational age, alerting operators to potential developmental anomalies if the estimated GA does not match the actual GA.
  - Malformation Detection: DL models can detect structural abnormalities of the fetal brain or spine on standard screening planes. Notably, Lin et al. demonstrated an algorithm that localized and classified nine different brain malformations from standard screening planes with an overall accuracy of 99%.
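The cortical-development alert reduces to comparing two gestational ages. A minimal sketch, assuming both ages are available in days, is shown below. `ga_discrepancy_flag`, `format_ga` and the default 14-day tolerance are hypothetical; the review gives no threshold, and a clinical system would need a validated, GA-dependent one.

```python
from typing import Tuple


def format_ga(days: int) -> str:
    """Gestational age in weeks+days notation, e.g. 150 -> '21w+3d'."""
    return f"{days // 7}w+{days % 7}d"


def ga_discrepancy_flag(
    estimated_days: int, dating_days: int, tolerance_days: int = 14
) -> Tuple[int, bool]:
    """Return (estimated - dating, flag); flag is True when the gap exceeds the tolerance.

    tolerance_days is a hypothetical threshold chosen for illustration only.
    """
    gap = estimated_days - dating_days
    return gap, abs(gap) > tolerance_days


if __name__ == "__main__":
    # Hypothetical case: a cortical-morphology model estimates 21w+3d; dating gives 23w+4d.
    gap, flag = ga_discrepancy_flag(150, 165)
    print(format_ga(150), format_ga(165), gap, flag)  # 21w+3d 23w+4d -15 True
```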
- Fetal Heart:
  - Plane Acquisition: Standard heart planes (four-chamber, left/right ventricular outflow tract, three-vessel-and-trachea views) can be acquired automatically.
  - Structure Identification: DL models using object detection or segmentation can identify detailed structures beyond the four chambers, including the foramen ovale, mitral and tricuspid valves, aorta, apex cordis, moderator band, ventricular walls, interventricular septum, and pulmonary veins. They can even determine the cardiac cycle phase.
  - Morphology and Measurements: Segmentation DL models allow evaluation of heart morphology and automatic measurement of cardiac structures, including chamber areas and the cardiothoracic ratio or cardiac axis angle.
  - Doppler Evaluation: Models can automatically assess pulsed-wave Doppler traces of left ventricular inflows and outflows.
  - Congenital Heart Disease (CHD) Detection: DL models are proposed to alert operators to suspected cardiac malformations. Models have been built to detect specific CHDs such as hypoplastic left heart syndrome (HLHS) and ventricular septal defects (VSD) with object detection or segmentation, even delineating and measuring the defect size.
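Once segmentation or landmark detection has run, two of the measurements listed above become simple geometry. The sketch computes a cardiothoracic ratio as an area ratio of two masks; using areas rather than circumferences or diameters is an assumption, and pixel size cancels out of the ratio. It also computes the cardiac axis as the angle between two direction vectors. It assumes, hypothetically, that an upstream model supplies binary heart and thorax masks and two consistently oriented direction vectors: the anteroposterior chest line from spine to sternum, and the interventricular septum from the crux toward the apex. All names are illustrative.

```python
import math
from typing import List, Tuple

Mask = List[List[int]]        # 1 = pixel inside the segmented region
Vector = Tuple[float, float]  # direction vector in image coordinates


def mask_area(mask: Mask) -> int:
    return sum(1 for row in mask for v in row if v)


def cardiothoracic_area_ratio(heart: Mask, thorax: Mask) -> float:
    """Cardiac area divided by thoracic area; pixel size cancels out."""
    thorax_area = mask_area(thorax)
    if thorax_area == 0:
        raise ValueError("empty thorax mask")
    return mask_area(heart) / thorax_area


def cardiac_axis_deg(ap_line: Vector, septum_line: Vector) -> float:
    """Angle in degrees between the anteroposterior chest line and the septum.

    Both vectors must be oriented consistently (assumption: AP from spine to
    sternum, septum from the crux toward the apex).
    """
    dot = ap_line[0] * septum_line[0] + ap_line[1] * septum_line[1]
    norms = math.hypot(*ap_line) * math.hypot(*septum_line)
    if norms == 0:
        raise ValueError("zero-length direction vector")
    cos_angle = max(-1.0, min(1.0, dot / norms))  # guard against rounding drift
    return math.degrees(math.acos(cos_angle))
```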
- Placenta:
  - DL models can perform placental biometry (currently not routine because it is time-consuming and operator-dependent) rapidly and reliably.
  - They can assess placental location (anterior/posterior) and appearance (normal/abnormal). Segmentation DL models applied to 3D ultrasound provide information on morphology and volume. Placental lacunae, which are associated with abnormal invasive placentation, can be identified and localized with good accuracy.
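Once a 3D segmentation model has labelled the placental voxels, volume is essentially a counting exercise: the number of labelled voxels times the volume of one voxel. The sketch assumes a nested-list mask indexed `[z][y][x]` and a known voxel spacing in millimetres ordered `(z, y, x)`. Both conventions and the function name are hypothetical.

```python
from typing import List, Tuple

# Hypothetical 3D mask layout: mask[z][y][x] in {0, 1}, 1 = voxel labelled as placenta.
Volume = List[List[List[int]]]


def placental_volume_ml(mask: Volume, spacing_mm: Tuple[float, float, float]) -> float:
    """Voxel count x voxel volume, converted from mm^3 to mL (1 mL = 1000 mm^3)."""
    dz, dy, dx = spacing_mm
    voxels = sum(1 for plane in mask for row in plane for v in row if v)
    return voxels * dx * dy * dz / 1000.0
```

For example, with 0.5 mm in-plane spacing and 1 mm slices each voxel is 0.25 mm³, so 1,000 labelled voxels correspond to 0.25 mL.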
- Other Fetal Structures:
  - DL algorithms are expanding to detect abnormalities in the face, spine, kidneys, lungs, adipose tissue, and sexual organs.
  - Some ultrasound machines already integrate AI-powered checklists and guidance for comprehensive examinations.
- Deep Learning and Intrapartum Ultrasound:
  - Research aims to develop DL models to assess fetal occiput position (occiput anterior, posterior, or transverse) during the second stage of labor. Such models could play a role in daily labor-ward practice by providing simultaneous assessment of station, angle, and position.

The following figure (Figure 3 from the original paper) summarizes the most important clinical applications of DL in fetal imaging.

[Figure 3: Schematic of the applications of deep learning in fetal ultrasound imaging, including automatic measurement of fetal structures, identification of normal and abnormal fetal anatomy, automatic detection of standard planes, and detection of fetal malformations.]
6.2. Data Presentation (Tables)
The original paper is a State-of-the-Art Review and does not contain any original experimental result tables of its own. It synthesizes findings and advancements from numerous other research papers. Therefore, there are no tables from the original paper to transcribe here.
6.3. Ablation Studies / Parameter Analysis
As a review paper that summarizes the state of the field rather than presenting a new model, the authors do not conduct their own ablation studies or parameter analyses. These types of studies are typically performed in original research papers to evaluate the contribution of individual components of a proposed model or to optimize its hyperparameters. The paper implicitly refers to the results of such analyses from the reviewed literature by stating the performance of various DL models for specific tasks.
7. Conclusion & Reflections
7.1. Conclusion Summary
The paper concludes that the future routine implementation of DL in obstetrics and fetal imaging is inevitable. Deep learning offers significant advantages, including objectivity, reproducibility, speed, and accuracy, positioning it as a powerful support tool for antenatal ultrasound. The authors emphasize that this technology is not intended to replace experts but rather to support them and improve workflow, ultimately benefiting both patients and healthcare providers by saving time and enhancing service quality. Furthermore, DL has the potential to improve healthcare in rural areas or low-income countries where access to expert sonographers is limited. While acknowledging that there is still a long road ahead before full clinical implementation, the rapidly increasing number of publications in the field suggests that this could be achieved sooner than we think.
7.2. Limitations & Future Work
The authors provide a thorough discussion of the current limitations of deep learning in fetal imaging and suggest critical future research directions.
7.2.1. Limitations
- Amount of Data Required:
  - A major barrier is the large amount of labeled data needed for effective model training, especially because different ultrasound planes look similar.
  - Most databases are built retrospectively, potentially introducing inherent bias.
  - Building prospective databases is tedious, time-consuming, and costly.
  - There is no standard method to calculate the sample size required for an accurate DL algorithm.
  - Catastrophic Interference: A risk with unlocked algorithms (which continuously learn from new data) is catastrophic interference, where the algorithm forgets previously learned information upon learning new data. This is a concern for regulatory institutions, which often prefer locked algorithms.
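Catastrophic interference can be reproduced even in a toy setting. The sketch below is deliberately tiny and is not a deep network: it uses a two-weight logistic model trained by plain gradient descent. The model first learns "task A" (label = sign of the first feature). Training then continues on "task B" alone (label = sign of the second feature), with no task A data, mimicking an unlocked algorithm that keeps learning from new cases. The tasks, hyperparameters and names are all hypothetical. The only point shown is that accuracy on task A falls once the shared weights are updated for task B.

```python
import math
from typing import List, Tuple

Sample = Tuple[Tuple[float, float], int]  # ((feature_1, feature_2), label)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(w: List[float], data: List[Sample], epochs: int = 300, lr: float = 1.0) -> List[float]:
    """Full-batch gradient descent on the logistic loss; returns updated weights."""
    w = list(w)
    for _ in range(epochs):
        g = [0.0, 0.0]
        for (x1, x2), t in data:
            err = sigmoid(w[0] * x1 + w[1] * x2) - t
            g[0] += err * x1
            g[1] += err * x2
        w = [w[i] - lr * g[i] / len(data) for i in range(2)]
    return w


def accuracy(w: List[float], data: List[Sample]) -> float:
    hits = sum(1 for (x1, x2), t in data if (w[0] * x1 + w[1] * x2 > 0) == (t == 1))
    return hits / len(data)


# Toy tasks: the same 10 x 10 grid of feature vectors, labelled by different rules.
grid = [k / 10 for k in range(-9, 10, 2)]
task_a = [((x1, x2), int(x1 > 0)) for x1 in grid for x2 in grid]
task_b = [((x1, x2), int(x2 > 0)) for x1 in grid for x2 in grid]

w_a = train([0.0, 0.0], task_a)  # learn task A
w_ab = train(w_a, task_b)        # keep learning on task B only, no task A data
acc_a_before = accuracy(w_a, task_a)
acc_a_after = accuracy(w_ab, task_a)

if __name__ == "__main__":
    print(f"Task A accuracy: {acc_a_before:.2f} before, {acc_a_after:.2f} after training on B")
```

Continual-learning remedies (replaying old data, penalizing changes to important weights) exist, but they add complexity that regulators would also have to assess. This is one reason locked algorithms are preferred.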
- Inherent Bias of Image Recognition:
  - DL models are highly sensitive to misinterpretation. They are usually trained on normal fetal anatomy, so small but normal anatomical variations might be misinterpreted as anomalies.
  - Human operators manually annotate fetal images for supervised training, which can introduce bias into the training process and affect validity.
  - Current DL models do not take into account the clinical history of the patient (e.g., previous malformations, genetic disorders, maternal habitus), which is crucial for accurate diagnosis in human practice.
- Interpretability of Results (Black Box Problem):
  - DL models lack explanatory power regarding why a particular decision is made. Their complexity, with multiple hidden layers, makes it difficult to understand the decision-making process. This lack of understanding can hinder medical professionals' trust in the technology.
- Ethical Challenges:
  - Accountability: A key question is whether AI models or human operators should be held accountable for incorrect diagnoses or inappropriate clinical decisions, especially given the huge medicolegal implications of false-positive or false-negative diagnoses in pregnancy.
  - Data Privacy: Concerns exist regarding data privacy, as confidential patient data and images need to be shared with third parties involved in model development, necessitating strict regulations.
  - Job Displacement: The potential for algorithms to replace humans and lead to higher rates of unemployment is another ethical consideration.
7.2.2. Future Work
- Multitasking DL Models: There is an urgent need for more multitasking DL models that integrate detection of fetal standard planes, identification of fetal anatomical structures, and automatic measurements, so that an entire ultrasound examination can be processed and suspected malformations flagged.
- Prospective Validation: Before clinical implementation, it is mandatory to validate the algorithms prospectively. This involves testing algorithms in real-life scenarios with real patients to understand how they perform with malformations or normal anatomical variations on which they have not been trained.
- Addressing Ethical Concerns: Well-conducted prospective studies are needed to inform policymakers and legislators regarding accountability, data privacy, and job displacement as AI is integrated into healthcare.
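The integrated workflow called for here can be pictured as a chain of three stages applied to each frame: detect the standard plane, identify the structures expected in it, and measure. Frames with missing expected structures are then flagged for review. The sketch below is a purely hypothetical architecture. The class names, the stub "models" and the rule that any missing expected structure triggers review are illustrative assumptions, not a design from the review. In practice the stages might share one network backbone rather than being separate callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

Frame = object  # placeholder for an ultrasound frame; the real type depends on the system


@dataclass
class FrameReport:
    frame_index: int
    plane: str
    structures: List[str]
    measurements: Dict[str, float]
    missing_structures: List[str]

    @property
    def flag_for_review(self) -> bool:
        # Hypothetical rule: any expected structure that was not found triggers review.
        return bool(self.missing_structures)


@dataclass
class MultitaskExam:
    """Chains plane detection, structure identification and measurement per frame."""
    detect_plane: Callable[[Frame], Optional[str]]
    find_structures: Callable[[Frame, str], List[str]]
    measure: Callable[[Frame, str], Dict[str, float]]
    expected: Dict[str, List[str]]  # plane name -> structures that should be visible

    def run(self, frames: Sequence[Frame]) -> List[FrameReport]:
        reports = []
        for i, frame in enumerate(frames):
            plane = self.detect_plane(frame)
            if plane is None:  # not a standard plane: nothing further to do
                continue
            found = self.find_structures(frame, plane)
            missing = [s for s in self.expected.get(plane, []) if s not in found]
            reports.append(FrameReport(i, plane, found, self.measure(frame, plane), missing))
        return reports


# Stub "models" standing in for trained networks (purely illustrative values).
_planes = {"f0": "transthalamic", "f2": "four-chamber"}
_structures = {"f0": ["thalami", "cavum septi pellucidi"],
               "f2": ["left ventricle", "right ventricle", "left atrium"]}
demo_exam = MultitaskExam(
    detect_plane=_planes.get,
    find_structures=lambda f, p: _structures[f],
    measure=lambda f, p: {"HC_mm": 180.0} if p == "transthalamic" else {},
    expected={"transthalamic": ["thalami", "cavum septi pellucidi"],
              "four-chamber": ["left ventricle", "right ventricle", "left atrium", "right atrium"]},
)
demo_reports = demo_exam.run(["f0", "f1", "f2"])  # "f1" is not a standard plane

if __name__ == "__main__":
    for report in demo_reports:
        print(report.plane, report.measurements, report.flag_for_review, report.missing_structures)
```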
7.3. Personal Insights & Critique
This review offers a highly valuable synthesis of a rapidly evolving field, striking a good balance between technical explanation and clinical relevance.
- Inspirations and Transferability: The core concept of using AI to standardize and enhance operator-dependent diagnostic imaging is highly transferable. Similar DL applications are already being explored in other ultrasound domains (e.g., cardiac, abdominal, musculoskeletal) and other imaging modalities (e.g., MRI, CT). The idea of tutoring young and inexperienced doctors or providing support in low-resource settings is particularly inspiring, as it suggests a path toward democratizing access to high-quality diagnostic imaging expertise globally. The automation of tedious tasks such as biometric measurements could indeed reduce burnout and workplace injuries for sonographers, highlighting a patient- and provider-centric benefit beyond diagnostic accuracy alone.
- Potential Issues and Areas for Improvement:
  - The "Black Box" Problem: Although the authors acknowledge it, the lack of interpretability remains a critical hurdle for clinical adoption. Clinicians need to trust the diagnosis, and understanding the reasoning behind an AI's decision is paramount, especially in high-stakes fields such as fetal anomaly detection. Future research should focus more on explainable AI (XAI) methods that give transparent insight into how models arrive at their conclusions.
  - Bias and Generalizability: The reliance on retrospective databases and human annotation for supervised learning inherently embeds existing biases. The paper also points out the risk of normal anatomical variations being misinterpreted. This highlights a need for more diverse, multi-ethnic, and multi-institutional prospective datasets, so that models are robust and generalizable to a global patient population and not just to those represented in the training data.
  - Integration of Clinical Context: The paper correctly identifies that current DL models do not take the clinical history into account. For true decision support, AI models in medicine must integrate multimodal data (images, clinical history, genetic information, laboratory results) to provide a comprehensive and personalized assessment, moving beyond purely image-based diagnostics.
  - Validation Standards: The lack of a standard method to calculate the sample size for DL algorithms in fetal imaging, and the call for prospective validation, are crucial points. Regulatory bodies are likely to require robust prospective evidence, ideally including large-scale randomized controlled trials, before these technologies are approved for widespread clinical use. The transition from promising academic results to validated, regulated clinical tools is often the slowest part of the innovation cycle.
  - Ethical Frameworks: Although only briefly discussed, the ethical challenges of accountability, data privacy, and job displacement require proactive, collaborative efforts from technologists, clinicians, ethicists, legal experts, and policymakers to establish clear guidelines and safeguards before widespread deployment. The catastrophic interference problem for unlocked algorithms also highlights a fundamental tension between continuous learning and regulatory stability, which needs innovative solutions.

Overall, this review paints an optimistic yet realistic picture of DL's future in fetal imaging. Its greatest value may lie in prompting the community to address the critical limitations and ethical considerations with the same rigor and innovation that have driven the technological advancements themselves.