Use of artificial intelligence and deep learning in fetal ultrasound imaging

Published: 11/27/2022

This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

This review explores the application of deep learning in fetal ultrasound imaging. It highlights deep learning's potential to improve diagnostic accuracy, which currently depends heavily on operator experience, and covers applications such as fetal anatomy identification and biometry measurement.

Abstract

Deep learning is considered the leading artificial intelligence tool in image analysis in general. Deep-learning algorithms excel at image recognition, which makes them valuable in medical imaging. Obstetric ultrasound has become the gold standard imaging modality for detection and diagnosis of fetal malformations. However, ultrasound relies heavily on the operator’s experience, making it unreliable in inexperienced hands. Several studies have proposed the use of deep-learning models as a tool to support sonographers, in an attempt to overcome these problems inherent to ultrasound. Deep learning has many clinical applications in the field of fetal imaging, including identification of normal and abnormal fetal anatomy and measurement of fetal biometry. In this Review, we provide a comprehensive explanation of the fundamentals of deep learning in fetal imaging, with particular focus on its clinical applicability.

1. Bibliographic Information

1.1. Title

The paper is titled "Use of artificial intelligence and deep learning in fetal ultrasound imaging". It focuses on how these computational techniques are applied, and could be applied, to improve the accuracy and efficiency of prenatal ultrasound examinations.

1.2. Authors

The authors are R. Ramirez Zegarra and T. Ghi. They are affiliated with the Department of Medicine and Surgery, Obstetrics and Gynecology Unit, University of Parma, Parma, Italy. T. Ghi is the corresponding author.

1.3. Journal/Conference

This paper was published as a State-of-the-Art Review in Ultrasound in Obstetrics & Gynecology. This journal is a highly reputable and influential publication in the field of maternal-fetal medicine and ultrasound imaging, known for publishing high-impact research, guidelines, and reviews. Its standing reinforces the academic rigor and clinical relevance of the content.

1.4. Publication Year

The paper was published in 2022. The listed publication date is 27 November 2022.

1.5. Abstract

This paper reviews the application of deep learning (DL), a leading artificial intelligence (AI) tool, in fetal ultrasound imaging. It highlights that while obstetric ultrasound is the gold standard for detecting fetal malformations, its reliability is heavily dependent on the operator's experience. To address this, various studies have proposed DL models as support tools for sonographers. The review focuses on the clinical applicability of DL in fetal imaging, covering areas such as the identification of normal and abnormal fetal anatomy and the automatic measurement of fetal biometry. The authors aim to provide a comprehensive explanation of the fundamentals and clinical relevance of DL in this field.

1.6. Original Source Link

The original source link is /files/papers/691aa4e8110b75dcc59ae3e2/paper.pdf. This link points to the PDF of the paper.

2. Executive Summary

2.1. Background & Motivation

The paper addresses the limitations of conventional obstetric ultrasound, which persist despite its status as the gold standard for detecting fetal malformations. The main limitations are:

Operator Dependency: The accuracy and reliability of ultrasound examinations heavily rely on the sonographer's experience, skill, and extensive knowledge of fetal anatomy. This makes it unreliable in inexperienced hands.
Subjectivity and Variability: Human analysis introduces subjectivity and interobserver variability, particularly in tasks like biometric measurements, which can lead to significant errors in estimating fetal weight and classifying fetal growth.
Low Detection Rates: Even with advancements, the overall detection rates for fetal malformations remain low, partly due to the human factor and problems inherent to ultrasound itself (e.g., acoustic shadows, speckle noise, motion blurring, unclear boundaries).
Time-Consuming Workflow: Acquiring correct scanning planes and performing detailed assessments is laborious and time-consuming.

The paper's entry point is the recognition of deep learning (DL) as a powerful artificial intelligence (AI) tool, particularly adept at image recognition and classification. DL models have demonstrated the ability to match or even exceed human capability in image analysis tasks. Therefore, the innovative idea is to leverage DL as a potential supporting tool for clinicians in fetal imaging to overcome these challenges, reduce examination times, improve reliability, and potentially aid in the training of new doctors.

2.2. Main Contributions / Findings

The paper provides a comprehensive review of the state-of-the-art applications of deep learning in fetal ultrasound imaging. Its primary contributions and key findings include:

Comprehensive Overview of DL Fundamentals: The paper explains what deep learning is, how it works, and why it is particularly suitable for fetal imaging, laying a foundational understanding for clinicians.
Detailed Clinical Applications: It systematically describes the diverse clinical applications of DL across various aspects of fetal imaging, categorizing them into:
- Automatic Measurement of Fetal Structures: DL models can automate biometric measurements (e.g., head circumference, femur length, abdominal circumference, crown-rump length, nuchal translucency), significantly reducing interobserver variability and examination times.
- Identification of Normal and Abnormal Fetal Anatomy: DL algorithms can be trained to:
  - Accurately detect different fetal standard planes (brain, heart, face, abdomen).
  - Localize and label fetal anatomical structures on various planes.
  - Differentiate between normal and abnormal anatomy, acting as a screening tool (e.g., normal/abnormal classification) or a diagnostic tool (e.g., localizing and classifying specific malformations).
- Specific Fetal Systems: The review details applications for the fetal central nervous system (CNS), fetal heart (including congenital heart disease (CHD) detection), placenta (biometry, lacunae), and other fetal structures (face, spine, kidneys, lungs, adipose tissue, sexual organs).
- Intrapartum Ultrasound: Discusses the emerging role of DL in assessing fetal head station, flexion, and position during labor.
Identification of Limitations and Future Directions: The paper critically discusses the limitations of deep learning in this field, such as the amount of data required, inherent bias of image recognition, the interpretability of results (black box problem), and ethical challenges. It also outlines future perspectives, emphasizing the need for multitasking DL models and prospective validation in real-life clinical scenarios.

The key conclusions are that DL holds considerable potential as a support tool for antenatal ultrasound, offering advantages in objectivity, reproducibility, speed, and accuracy. It is envisioned not as a replacement for experts but as a tool to support them and improve workflow, ultimately enhancing healthcare services, especially in rural areas or low-income countries with limited access to expert sonographers.

3.1. Foundational Concepts

To fully understand this paper, a reader needs to grasp several fundamental concepts related to artificial intelligence, machine learning, deep learning, and ultrasound imaging.

Artificial Intelligence (AI):
- Conceptual Definition: AI is a broad field of computer science that enables machines to perform tasks typically associated with human intelligence. This includes capabilities like learning, decision-making, visual perception, speech recognition, and problem-solving. The goal is to create systems that can reason and act intelligently.
- In this paper: AI algorithms are highlighted for their ability to identify complex patterns within data and provide quantitative solutions automatically, often more accurately and reproducibly than humans.
Machine Learning (ML):
- Conceptual Definition: ML is a subset of AI that allows computers to learn from data and improve their performance on a specific task without being explicitly programmed for that task. Instead of writing fixed rules, ML algorithms build a model from example data, called training data, to make predictions or decisions.
- In this paper: ML is presented as the overarching approach that enables computers to gain experience and improve. Deep learning is then introduced as a prominent type of ML.
Deep Learning (DL):
- Conceptual Definition: DL is a specialized subset of machine learning that uses artificial neural networks with multiple layers (hence "deep") to learn representations of data with multiple levels of abstraction. These networks are inspired by the structure and function of the human brain. Each layer processes the output from the previous layer, extracting increasingly complex features from the raw input data.
- In this paper: DL is considered the leading artificial intelligence tool in image analysis. Its architecture is described as complex, involving multiple deep layers of artificial neural networks.
Artificial Neural Networks (ANNs):
- Conceptual Definition: ANNs are computational models inspired by the structure and function of biological neural networks. They consist of interconnected nodes (neurons) organized in layers: an input layer, one or more hidden layers, and an output layer. Each connection between neurons has a weight, and each neuron has an activation function. During training, these weights are adjusted to minimize the difference between the network's output and the desired output.
- In this paper: DL models are built upon ANNs, with the "deep" aspect referring to the multiple hidden layers.
Convolutional Neural Networks (CNNs):
- Conceptual Definition: CNNs are a class of deep neural networks specifically designed for processing structured grid-like data, such as images. They are particularly effective for image recognition tasks. Key components of CNNs include convolutional layers (which apply filters to detect features like edges, textures, or patterns), pooling layers (which reduce spatial dimensions, making the model more robust to variations), and fully connected layers (which perform classification based on the learned features).
- In this paper: CNNs are highlighted as the most commonly used deep neural network in medical imaging due to their excellence in image recognition and classification.
Supervised Learning:
- Conceptual Definition: A type of machine learning where an algorithm learns from a labeled dataset, meaning each input data point is paired with a corresponding correct output. The algorithm uses this ground-truth data to learn a mapping function from inputs to outputs. The goal is to predict the output for new, unseen input data.
- In this paper: Supervised DL models are described as requiring labeled data or 'ground-truth' data as input for the neural networks during the training phase. This is the most common type of DL model discussed.
Unsupervised Learning:
- Conceptual Definition: A type of machine learning where an algorithm learns from unlabeled data, meaning there are no pre-defined correct outputs. The algorithm's goal is to discover hidden patterns, structures, or relationships within the input data on its own. Common tasks include clustering and dimensionality reduction.
- In this paper: Unsupervised learning techniques are mentioned as not requiring labels, with the DL model searching for the main patterns and similarities within the data.
Obstetric Ultrasound Imaging:
- Conceptual Definition: A medical imaging technique that uses high-frequency sound waves (ultrasound) to create real-time images of the fetus inside the mother's womb. It is non-invasive and does not involve radiation. It is crucial for monitoring fetal growth, assessing fetal anatomy, and detecting potential malformations throughout pregnancy.
- In this paper: It is described as the gold standard imaging modality for detection and diagnosis of fetal malformations.
Fetal Biometry:
- Conceptual Definition: The measurement of specific fetal body parts using ultrasound to estimate gestational age, monitor growth, and assess fetal well-being. Common measurements include head circumference (HC), biparietal diameter (BPD), abdominal circumference (AC), femur length (FL), crown-rump length (CRL), and nuchal translucency (NT).
- In this paper: Measurement of fetal biometry is identified as a key clinical application for DL to improve accuracy and reduce variability.
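
The convolutional and pooling layers described under Convolutional Neural Networks above can be illustrated with a minimal NumPy sketch. It is not taken from the paper: the 6x6 toy image, the hand-written edge filter, and the function names `conv2d_valid` and `max_pool2x2` are assumptions chosen for illustration. As in most deep-learning libraries, the "convolution" here is technically a cross-correlation (the filter is not flipped).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 block."""
    h2, w2 = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

# Toy 6x6 "image": dark left half, bright right half (one vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written vertical-edge filter; a trained CNN would learn such weights.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

features = np.maximum(conv2d_valid(image, kernel), 0.0)  # ReLU activation
pooled = max_pool2x2(features)

print(features[0])  # [0. 3. 3. 0.]: the filter responds only around the edge
print(pooled)       # 2x2 map, every entry 3.0
```

The filter output is large only where the window straddles the edge, and pooling halves the spatial size while keeping that response. A real CNN stacks many such filters and learns their weights from data.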

3.2. Previous Works

The paper, being a review, references numerous prior studies throughout its discussion of DL applications in fetal imaging. Here’s a summary of key areas and examples of prior work mentioned:

General AI/DL in Medical Imaging:
- Studies by Litjens et al. (2017) and Liu et al. (2019) are cited to establish the dominance of DL in medical image analysis, noting that over 80% of AI studies in medical imaging use a DL approach and that DL can match or exceed human capability.
- Drukker et al. (2020) and Fiorentino et al. (2023) are broader reviews on AI/DL in obstetrics and fetal ultrasound, providing context for the current paper's focus.
Automatic Measurement of Fetal Structures:
- For fetal head biometry (head circumference, occipitofrontal diameter, biparietal diameter), works by Sinclair et al. (2018), Li et al. (2020), and Rasuli et al. (2021) are referenced.
- For femur length, Zhu et al. (2021) is cited.
- For automatic abdominal circumference measurement, which is more challenging, object detection or segmentation of landmarks has been proposed, as noted by Chen et al. (2018) and Jang et al. (2018).
- Multitasking DL models for simultaneous biometric measurements in standard planes are mentioned, with examples from Plotka et al. (2021) and Ghelich Oghli et al. (2021).
- For CRL and NT in the first trimester, Ryou et al. (2019), Cengiz et al. (2021), and Chen et al. (2017) utilized 3D imaging and segmentation.
Identification of Normal and Abnormal Fetal Anatomy:
- Standard Plane Detection: Burgos-Artizzu et al. (2020) compared 19 DL algorithms for assigning four anatomical standard planes (abdomen, brain, femur, thorax), finding the best models performed similarly to trained sonographers but 25 times faster. Other works (Baumgartner et al. 2017; Chen et al. 2017; Yaqub et al. 2017) are cited for automatic detection of various standard planes.
- Structural Segmentation: Minae et al. (2022) are cited for showing segmentation DL models outperform humans and other AI models for structural segmentation tasks.
- Fetal Central Nervous System (CNS) Anomalies: Lin et al. (2022) developed a DL algorithm that localized and classified nine different brain malformations with 99% accuracy. Two studies by Xie et al. (both 2020) also developed models for classifying normal/abnormal brain images.
- Fetal Heart Anomalies (CHD): Arnaout et al. (2021) developed an ensemble of neural networks for expert-level prenatal detection of complex CHD. Dozen et al. (2020) and Nurmaini et al. (2020) are mentioned for detecting specific CHDs like hypoplastic left heart syndrome (HLHS) and ventricular septal defects (VSD).
Placenta: Hu et al. (2019) and Looney et al. (2018) explored automated placenta segmentation and volume estimation. Qi et al. (2018) worked on lacunae localization.
Intrapartum Ultrasound: Lu et al. (2022) and Ramirez Zegarra et al. (2022) developed DL models for assessing fetal occiput position during labor.

Because it is a review, the paper does not reproduce formulas from these prior works. Instead, it summarizes their achievements and contributions to the field.

3.3. Technological Evolution

The field of medical imaging, and particularly fetal ultrasound, has seen a rapid technological evolution, with AI, and specifically DL, representing the latest significant leap.

Early Stages (Pre-AI): Ultrasound technology itself evolved from basic 2D imaging to more advanced 3D and 4D (real-time 3D) capabilities. However, image interpretation remained largely manual, requiring extensive human expertise. This led to the issues of operator dependency and interobserver variability highlighted in the paper.
Emergence of Traditional Machine Learning (Early 2000s): Initial attempts to automate medical image analysis involved traditional machine learning algorithms. These methods typically required feature engineering, where human experts manually designed algorithms to extract relevant features (e.g., edges, textures) from images before feeding them to classifiers. While useful, these methods were often limited by the quality of feature engineering and struggled with the complexity and variability of medical images.
Rise of Deep Learning (2010s onwards): The advent of deep learning revolutionized image analysis. Unlike traditional ML, DL models (especially CNNs) can automatically learn hierarchical features directly from raw image data, eliminating the need for manual feature engineering. This ability, coupled with increased computational power (GPUs) and access to large datasets, led to breakthroughs in tasks like image classification, object detection, and segmentation.
- General Medical Imaging: DL quickly demonstrated human-level performance or even superiority in various medical imaging domains, as noted by Litjens et al. (2017) and Liu et al. (2019).
- Fetal Imaging Specifics: In fetal ultrasound, DL has specifically evolved to address the unique challenges:
  - Standardization: DL models emerged to automate the difficult task of acquiring and recognizing fetal standard planes, a fundamental step for consistent examinations.
  - Biometry Automation: Moving beyond manual caliper placement, DL models advanced to automatically measure fetal structures, reducing human error and time.
  - Anomaly Detection: From simple classification (normal/abnormal) to complex object detection and segmentation of specific malformations in the brain, heart, and other organs.
  - 3D/4D Integration: DL is increasingly integrated with 3D ultrasound to select optimal planes and enhance analysis.
Current State: The paper highlights the current surge in DL applications in fetal imaging, with ongoing efforts to develop multitasking DL models that can integrate various steps of an ultrasound examination, from plane detection to anomaly identification. The focus is now shifting towards prospective validation and addressing ethical considerations for clinical implementation.

This paper's work fits squarely within the current state of technological evolution, synthesizing the cutting-edge applications and critically evaluating the path towards routine clinical integration.

3.4. Differentiation Analysis

As a State-of-the-Art Review, this paper does not propose a new method or algorithm; instead, its innovation lies in its comprehensive synthesis and critical analysis of the existing landscape of deep learning in fetal ultrasound imaging.

Compared to primary research papers that introduce specific DL models for particular tasks (e.g., a CNN for fetal brain anomaly detection), this review differentiates itself by:

Breadth over Specificity: It covers a wide array of DL applications across all major fetal systems (CNS, heart, placenta, other structures) and tasks (biometry, plane detection, anomaly detection, intrapartum use). This contrasts with individual research papers that often focus on a single task or organ system.
Educational and Foundational Focus: It provides a comprehensive explanation of the fundamentals of deep learning in fetal imaging, making it accessible to clinicians and researchers who may be new to the field. This includes defining core concepts like AI, ML, DL, and various DL tasks.
Clinical Applicability Emphasis: The review has a particular focus on its clinical applicability, bridging the gap between theoretical DL capabilities and their practical utility in improving obstetric ultrasound workflow, accuracy, and patient care.
Critical Evaluation: Beyond simply listing achievements, the paper dedicates significant sections to discussing the limitations of current DL models (data requirements, bias, interpretability) and ethical challenges. It also outlines future perspectives, including the need for multitasking models and prospective validation. This critical lens is often absent in individual research papers focused on presenting novel methods.
Synthesizing Knowledge: It brings together findings from numerous disparate studies, providing a unified view of the progress and challenges, which is invaluable for researchers and practitioners looking for an overview of the field.

In essence, while primary research papers expand the frontier of what DL can do in fetal imaging, this review paper maps that frontier, explains its terrain, and points out the remaining unexplored territories and potential pitfalls.

4. Methodology

This section details the fundamental concepts of deep learning as presented in the paper, explaining what deep learning is and how it works in the context of fetal imaging. The paper primarily describes various deep learning tasks and how they are applied, rather than a single overarching methodology for a new model.

4.1. Principles

The core idea behind implementing deep learning in fetal imaging stems from its exceptional capabilities in image recognition and classification. Deep learning models, particularly convolutional neural networks (CNNs), are designed to analyze large amounts of data in a layered, non-linear manner. They use pattern recognition to automatically extract highly representative image features from raw ultrasound data. This feature extraction allows the models to then label or classify an image (e.g., as normal or abnormal, or identifying specific anatomical planes).

The theoretical basis and intuition are that by exposing a deep learning model to a vast number of ultrasound images (often labeled by human experts, known as ground-truth data), the model learns to identify intricate patterns and relationships within the pixels that humans might miss or find difficult to consistently interpret. This learning process allows the model to develop an 'experience' that enables it to perform tasks such as image classification, detection, and segmentation, often matching or exceeding human performance. The goal is to overcome subjectivity, interobserver variability, and examination times inherent to human operators.

4.2. Core Methodology In-depth (Layer by Layer)

The paper explains deep learning by contrasting it with broader AI and machine learning concepts and then detailing the specific tasks DL models perform in fetal imaging.

4.2.1. Defining Deep Learning within AI and Machine Learning

The paper positions deep learning as a specialized tool within a broader hierarchy:

Artificial Intelligence (AI): This is the broadest concept, referring to a computer's ability to perform tasks requiring human-like intelligence, such as learning, decision-making, visual perception, and speech recognition. AI algorithms are designed to identify complex patterns within data to provide automatically a quantitative solution to a problem.
Machine Learning (ML): A subset of AI, ML enables computers to learn and improve their performance with 'experience' (i.e., using available data) without being programmed explicitly to do so. This is achieved by building models from data.
Deep Learning (DL): The most important algorithm type within ML for medical imaging. DL models are characterized by their complex architecture, involving multiple deep layers of artificial neural networks.

The most common type of deep neural network mentioned for medical imaging is the convolutional neural network (CNN), though other types exist (Figure 1).

The following figure (Figure 1 from the original paper) provides an overview of main types of deep-learning algorithm based on training techniques.

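Figure 1 Overview of main types of deep-learning algorithm based on training techniques. The image is a schematic showing the main types of deep-learning algorithm by training technique, including supervised and unsupervised algorithms such as convolutional neural networks (CNN), recurrent neural networks (RNN) and generative adversarial networks (GAN).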

4.2.2. Training Approaches for Deep Learning Models

DL models can be developed using two primary learning approaches:

Supervised Learning:
- This is the most common type.
- It requires the use of labeled data or 'ground-truth' data as input for the neural networks during the training phase. This means each ultrasound image used for training must be pre-annotated by a human expert with the correct classification or identification (e.g., "normal brain scan" or "scan with ventriculomegaly").
- After training, the model's performance is tested on images it has not seen and whose labels are withheld. The model makes a prediction (output) and classifies each image, and the prediction is then compared with the ground truth.
Unsupervised Learning:
- These techniques do not require labels.
- The DL model searches for the main patterns and similarities within the data (input) on its own in order to classify the images (output). This approach is useful when labeled data is scarce or difficult to obtain.
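
The difference between the two training approaches can be sketched with a toy example. This is a hedged illustration, not the paper's method: it uses simple nearest-centroid and two-cluster k-means routines in place of neural networks, and the 2-D feature vectors, the labels, and the function names (`fit_centroids`, `predict`, `two_means`) are all hypothetical.

```python
import numpy as np

# Hypothetical 2-D feature vectors standing in for images (not real data).
features = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
                     [4.0, 4.2], [3.9, 3.8], [4.2, 4.1]])
# Expert "ground truth" labels: 0 = normal, 1 = abnormal (assumed for the demo).
labels = np.array([0, 0, 0, 1, 1, 1])

def predict(centroids, x):
    """Assign each sample to its nearest centroid."""
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def fit_centroids(x, y):
    """Supervised: use the labels to compute one mean vector per class."""
    return np.stack([x[y == c].mean(axis=0) for c in np.unique(y)])

def two_means(x, iters=10):
    """Unsupervised: split the data into two groups without seeing labels."""
    centroids = x[[0, -1]].copy()  # deterministic start; fine for this toy data
    for _ in range(iters):
        assign = predict(centroids, x)
        centroids = np.stack([x[assign == c].mean(axis=0) for c in range(2)])
    return assign

# Supervised: learn from labeled examples, then predict for unseen samples.
model = fit_centroids(features, labels)
new_samples = np.array([[1.0, 1.0], [3.8, 4.0]])
print(predict(model, new_samples))  # [0 1]

# Unsupervised: only the grouping is discovered; the cluster ids carry no meaning.
print(two_means(features))          # [0 0 0 1 1 1]
```

In the supervised case the output is a named class because the labels defined it. In the unsupervised case the algorithm only reports which samples belong together, and a human still has to interpret what each group represents.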

4.2.3. Deep Learning Tasks in Fetal Imaging

For applying DL in obstetrics (e.g., identification of normal/abnormal anatomy, biometric measurements), DL models perform one or a combination of up to four specific tasks, depending on the required output:

Classification:
- Purpose: Assigns a binary 'class label' to an image.
- Examples: Labeling an image as normal/abnormal or identifying a specific anatomical plane like four-chamber view or left ventricular outflow tract.
- Illustration (from Box 1): If a classification DL model is given an image of ventriculomegaly in the axial transventricular view of the fetal brain, it will analyze the image and classify it as 'anomalous', but it does not provide information regarding where the anomaly is located.
Localization:
- Purpose: Provides the precise location of any given object in an image, typically indicating its position with a bounding box. This task assists in identifying anatomical landmarks and performing automatic measurements.
- Illustration (from Box 1): For a normal transventricular plane of the fetal brain, a localization DL model would analyze the image and provide the location of the anterior and posterior horns of the lateral ventricles, cavum septi pellucidi and other anatomical landmarks.
Object Detection:
- Purpose: This task is a combination of classification and localization. It simultaneously provides the location of fetal structures in an image (and, if necessary, their measurement) and its classification into normal or abnormal.
- Illustration (from Box 1): An object-detection DL model given an image of the four-chamber view of the fetal heart would first localize all the anatomical landmarks in this plane (e.g., both atria and ventricles, descending aorta, pulmonary veins, fetal spine) and thereafter, classify it as four-chamber view.
Segmentation:
- Purpose: Involves the delineation of an object present in the image, effectively isolating the object of interest from the rest of the structures. It is similar to localization but goes further by providing assessment of the morphology (shape, volume, and contour) of the object. It can also be paired with classification tasks.
- Illustration (from Box 1): A segmentation DL model provided with an image of the four-chamber view in a fetus with fetal growth restriction would not only identify all the anatomical landmarks but additionally provide an assessment of the morphology of the fetal heart (e.g., shape, area of the chambers), which is known to be affected in growth-restricted fetuses.
  
These tasks are crucial for enabling DL models to support both inexperienced and experienced operators. For inexperienced users, they can automate biometric measurements or identify anatomical landmarks. For experienced operators, they can alert to subtle anomalies or reassure a junior examiner.
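
To make the difference between localization and segmentation outputs concrete, the sketch below derives a bounding box and an area from a binary segmentation mask. It is a minimal illustration under stated assumptions: the 8x8 mask, the 0.5 mm pixel spacing, and the helper names `bounding_box` and `area_mm2` are hypothetical and do not come from the paper.

```python
import numpy as np

def bounding_box(mask):
    """Return (row_min, col_min, row_max, col_max) around all True pixels, or None."""
    if not mask.any():
        return None
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])

def area_mm2(mask, pixel_spacing_mm):
    """Area of the segmented region, assuming square pixels of known size."""
    return float(mask.sum()) * pixel_spacing_mm ** 2

# Hypothetical segmentation output: True marks pixels belonging to the structure.
mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 3:7] = True   # rows 2-4, columns 3-6
mask[5, 4:6] = True     # row 5, columns 4-5

print(bounding_box(mask))   # (2, 3, 5, 6): what a localization model would report
print(area_mm2(mask, 0.5))  # 3.5: morphology that only the mask can provide
```

A bounding box can always be derived from a mask, but not the other way around, which is why segmentation is described as going further than localization.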

The following figure (Figure 2 from the original paper) shows an example of anatomical landmark identification using deep learning.

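The image is an ultrasound illustration: the left panel (a) shows the annotated cerebellum and cavum septi pellucidi (CSP), and the right panel (b) shows the corresponding contours identified by the deep-learning algorithm. The cerebellum and CSP are labeled, illustrating the application of deep learning in fetal ultrasound imaging.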

The following figure (Figure 4 from the original paper) illustrates the workflow of an ideal DL model for screening fetal malformations.

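The image is a schematic showing automatic detection of standard planes and identification of fetal anatomical structures in ultrasound imaging. It includes different standard planes, such as the transverse brain plane and the four-chamber view of the heart, and indicates how biometric measurements contribute to the diagnosis of fetal anomalies.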

5. Experimental Setup

As a State-of-the-Art Review, this paper does not present its own experimental setup, but rather synthesizes the common practices and challenges related to datasets, evaluation metrics, and baselines observed in the multitude of studies it reviews.

5.1. Datasets

The studies reviewed in the paper rely heavily on ultrasound image data for training and testing deep learning models.

Source and Characteristics:
- Most databases are built retrospectively, which introduces inherent bias into the studies.
- The data consists of various fetal ultrasound planes (e.g., brain, heart, abdomen, femur, face) and fetal anatomical structures.
- A significant challenge is the similar appearance of different ultrasound planes, requiring large and diverse datasets for accurate distinction. For example, distinguishing transventricular and transthalamic planes for ventriculomegaly diagnosis requires many images from both.
- The data needs to be labeled (for supervised learning), which is a tedious and time-consuming process, often done manually by human operators, potentially introducing bias into the training process.
Scale:
- Deep learning algorithms, especially those for fetal imaging, generally require a larger database compared to other AI algorithms.
- The paper notes considerable heterogeneity of sample sizes in different studies, ranging from only a few hundred to several thousand fetal images.
- Data Augmentation: To address the need for large quantities of data without having to acquire huge numbers of new cases, data augmentation is a common technique. This involves generation of new images from an existing image, by making minor alterations to the original image, such as rotating it or adjusting the echogenicity.
Challenges in Data Management:
- Building prospective databases to address specific research questions using DL is described as tedious and time-consuming, leading to elevated costs and use of human resources due to the laborious nature of data labeling.
- There is no standard method to calculate the sample size required for building accurate DL algorithms in fetal imaging.
- Data Privacy: Confidential patient data and images need to be shared with third parties involved in model development, raising ethical challenges and necessitating regulations on data privacy.
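
The data augmentation technique mentioned above (rotating an image or adjusting its echogenicity) can be sketched as follows. This is an assumption-laden illustration rather than a pipeline from any reviewed study: the random array stands in for a real ultrasound frame, rotations are limited to multiples of 90 degrees to avoid interpolation, and a simple brightness gain is used as a crude stand-in for an echogenicity change. The function name `augment` is hypothetical.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly altered copy of a grayscale image with values in [0, 1]."""
    k = int(rng.integers(0, 4))      # 0 to 3 quarter-turns
    rotated = np.rot90(image, k)
    gain = rng.uniform(0.8, 1.2)     # brightness scaling (assumed range)
    return np.clip(rotated * gain, 0.0, 1.0)

rng = np.random.default_rng(seed=0)
image = rng.random((64, 64))         # placeholder for a real ultrasound frame

# Five new training images generated from one original image.
augmented = [augment(image, rng) for _ in range(5)]
print(len(augmented), augmented[0].shape)
```

Each augmented image keeps the original label, so the labeled dataset grows without new patients or new annotation work. The variants remain correlated with their source image, so augmentation supplements real data and does not replace it.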

5.2. Evaluation Metrics

As a review, the paper does not list the evaluation metrics used in each study. The DL models it discusses would be assessed with metrics that are standard in medical image analysis and machine learning for classification, detection, and segmentation tasks. The metrics below are those most likely to be used in the reviewed studies:

5.2.1. Accuracy

Conceptual Definition: Accuracy measures the proportion of total predictions that were correct. It's a straightforward measure of overall correctness.
Mathematical Formula: $ \text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} $
Symbol Explanation:
- $\text{TP}$ : True Positives (correctly predicted positive cases).
- $\text{TN}$ : True Negatives (correctly predicted negative cases).
- $\text{FP}$ : False Positives (incorrectly predicted positive cases, also known as Type I error).
- $\text{FN}$ : False Negatives (incorrectly predicted negative cases, also known as Type II error).

5.2.2. Sensitivity (Recall)

Conceptual Definition: Sensitivity, also known as recall or true positive rate, measures the proportion of actual positive cases that were correctly identified. It's crucial in medical diagnosis where missing a positive case (e.g., a malformation) can have severe consequences.
Mathematical Formula: $ \text{Sensitivity} = \frac{\text{TP}}{\text{TP} + \text{FN}} $
Symbol Explanation:
- $\text{TP}$ : True Positives.
- $\text{FN}$ : False Negatives.

5.2.3. Specificity

Conceptual Definition: Specificity, or true negative rate, measures the proportion of actual negative cases that were correctly identified. It's important for ensuring that healthy cases are not misdiagnosed as abnormal.
Mathematical Formula: $ \text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}} $
Symbol Explanation:
- $\text{TN}$ : True Negatives.
- $\text{FP}$ : False Positives.

5.2.4. Precision (Positive Predictive Value)

Conceptual Definition: Precision, or positive predictive value, measures the proportion of positive predictions that were actually correct. It answers the question: "Of all items the model identified as positive, how many are truly positive?"
Mathematical Formula: $ \text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} $
Symbol Explanation:
- $\text{TP}$ : True Positives.
- $\text{FP}$ : False Positives.

5.2.5. F1-score

Conceptual Definition: The F1-score is the harmonic mean of precision and recall. It's a useful metric when you need to balance both precision and recall, especially in cases of imbalanced class distribution.
Mathematical Formula: $ \text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $
Symbol Explanation:
- $\text{Precision}$ : As defined above.
- $\text{Recall}$ : As defined above (same as Sensitivity).
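The five confusion-matrix metrics defined so far can be computed together. The sketch below is illustrative only: the function name `classification_metrics` and the counts are made up for the example and do not come from any reviewed study. The counts mimic a screening setting with 10 abnormal cases among 1000 examinations.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity, specificity, precision and F1 from counts."""
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)   # recall, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts: 10 true abnormal cases out of 1000 examinations.
metrics = classification_metrics(tp=8, tn=985, fp=5, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 0.993 while the precision is only about 0.62, which illustrates why accuracy alone can be misleading when abnormal cases are rare and why the F1-score is preferred for imbalanced classes. The sketch assumes every denominator is non-zero.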

5.2.6. Area Under the Receiver Operating Characteristic Curve (AUC-ROC)

Conceptual Definition: The receiver operating characteristic (ROC) curve plots the true positive rate (Sensitivity) against the false positive rate (1 - Specificity) at various threshold settings. The AUC is the area under this curve and measures separability, that is, how well the model distinguishes between classes. A higher AUC indicates a better ability to discriminate between positive and negative classes.
Mathematical Formula: There is no single closed-form formula for AUC; it is computed as the area under the ROC curve. In practice it is obtained from the ranked predicted probabilities, either by applying the trapezoidal rule to the ROC points or by counting concordant pairs.
Symbol Explanation:
- The AUC value ranges from 0 to 1. A value of 0.5 suggests no discrimination (equivalent to random guessing), while 1.0 suggests perfect discrimination.
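The concordant-pair view of AUC can be made concrete with a short sketch. The function name `auc_by_pairs` and the toy labels and scores are assumptions for illustration. The AUC equals the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half.

```python
def auc_by_pairs(labels, scores):
    """AUC as the fraction of concordant (positive, negative) score pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0   # positive ranked above negative
            elif p == n:
                concordant += 0.5   # ties count as half
    return concordant / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_by_pairs(labels, scores)
print(auc)  # 0.75: three of the four positive/negative pairs are concordant
```

The double loop is quadratic in the number of cases, which is fine for illustration; rank-based implementations are used for large datasets.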

5.2.7. Intersection over Union (IoU) / Jaccard Index

Conceptual Definition: Particularly relevant for segmentation and object detection tasks, IoU measures the similarity between the predicted bounding box/segmentation mask and the ground-truth bounding box/segmentation mask. It is defined as the area of overlap between the predicted and ground-truth regions divided by the area of their union.
Mathematical Formula: $ \text{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}} $
Symbol Explanation:
- $\text{Area of Overlap}$ : The region where the predicted and actual bounding boxes/masks both exist.
- $\text{Area of Union}$ : The total region covered by either the predicted or actual bounding box/mask (or both).
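For segmentation masks, IoU reduces to counting pixels. The following sketch assumes binary masks stored as NumPy arrays; the function name `iou` and the two square masks are hypothetical. Returning 1.0 when both masks are empty is a convention chosen here, not something specified in the paper.

```python
import numpy as np

def iou(pred, truth):
    """Intersection over Union of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    overlap = np.logical_and(pred, truth).sum()
    return float(overlap / union)

# Two 4x4 squares on an 8x8 grid, offset by one pixel in each direction.
truth = np.zeros((8, 8), dtype=bool)
truth[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool)
pred[3:7, 3:7] = True

score = iou(pred, truth)
print(round(score, 3))  # 0.391 (overlap of 9 pixels, union of 23 pixels)
```

The example shows how quickly IoU drops with a small misalignment: a one-pixel shift of a 4x4 region already brings the score below 0.4.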

5.3. Baselines

The studies reviewed in the paper typically compare their deep learning methods against several benchmarks to demonstrate their effectiveness:

Human Operators (Sonographers): This is a primary baseline. DL models are often compared to fully trained sonographers or experienced operators to assess if they can match or even exceed human capability. The paper mentions comparisons regarding accuracy, speed (e.g., 25 times faster classification), and interobserver variability.
Traditional Image Processing Techniques: Before the widespread adoption of DL, other computer vision algorithms (e.g., rule-based systems, statistical pattern recognition, active shape models) were used for tasks like segmentation or feature extraction. While not explicitly detailed, these would form an implicit baseline for demonstrating the superior performance of DL.
Other AI/Machine Learning Models: Within the broader AI/ML landscape, DL models are often compared against other AI models (e.g., traditional machine learning classifiers like Support Vector Machines or Random Forests, or simpler neural network architectures) to show the advantage of deep architectures.
Different Deep Learning Architectures: When a new DL model is proposed (in the reviewed papers), it is often compared against existing or state-of-the-art DL architectures (e.g., different CNN variants, U-Net for segmentation) to demonstrate incremental improvements or efficiency gains.

These baselines are representative because they cover the spectrum from human expert performance to various computational approaches, allowing for a comprehensive evaluation of the deep learning models' advancements.

6. Results & Analysis

This section synthesizes the main findings and reported successes of deep learning applications in fetal ultrasound imaging, as discussed throughout the review paper. Since this is a review, there are no specific tables or experimental results generated by the authors themselves; rather, the paper summarizes findings from other published studies.

6.1. Core Results Analysis

The paper highlights that deep learning (DL) has demonstrated significant potential to improve fetal imaging by addressing key challenges like operator dependency, variability, and low detection rates.

6.1.1. Deep Learning for Automatic Measurement of Fetal Structures

Biometric Measurements: DL models have been successfully developed for automatic measurement of crucial fetal biometric parameters.
- Head Biometry: Good performance has been shown for head circumference, occipitofrontal diameter, and biparietal diameter.
- Femur Length: Automatic measurement of femur length has also been achieved.
- Abdominal Circumference: More challenging due to irregular shape and unclear boundaries, but object detection or segmentation of landmarks (stomach bubble, umbilical vein, fetal spine) has enabled automatic measurement.
- Multitasking Models: Advanced DL models can perform all biometric measurements simultaneously across the three fetal standard planes (head, abdomen, femur), and even estimate gestational age (GA).
- First Trimester Biometry: Automatic measurement of crown-rump length (CRL) and nuchal translucency (NT) is possible, often leveraging 3D imaging and segmentation techniques to find ideal planes.
Impact: These automations reduce interobserver variability and examination times, optimizing workflow and potentially decreasing fatigue and workplace injuries for sonographers.
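To make the link between segmentation and biometry concrete, the sketch below derives a head circumference from two diameters. It rests on assumptions the paper does not state: that the skull outline is modeled as an ellipse whose axes are the occipitofrontal and biparietal diameters, and that Ramanujan's approximation is an acceptable estimate of the ellipse perimeter. The function name and the input values are hypothetical and are not clinical reference values.

```python
import math

def ellipse_circumference(major_diameter, minor_diameter):
    """Approximate an ellipse perimeter from its two diameters (Ramanujan)."""
    a = major_diameter / 2.0  # semi-major axis
    b = minor_diameter / 2.0  # semi-minor axis
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))

# Hypothetical diameters in millimetres, e.g. taken from a fitted ellipse.
ofd_mm = 110.0  # occipitofrontal diameter
bpd_mm = 90.0   # biparietal diameter
hc_mm = ellipse_circumference(ofd_mm, bpd_mm)
print(f"estimated head circumference: {hc_mm:.1f} mm")
```

In a pipeline of this kind, the DL model would supply the segmentation or ellipse fit, and the measurement itself would follow from simple geometry. Calibration from pixels to millimetres is assumed to have been done beforehand.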

6.1.2. Deep Learning for Identification of Normal and Abnormal Fetal Anatomy

This is a multi-step process that DL models can automate or support:

Correct Acquisition of Fetal Standard Planes:
- DL algorithms have been trained to accurately detect different fetal standard planes, including those for the brain, heart, face, and abdomen.
- Object-detection and segmentation models are found to be more accurate than classification models for this task, as they localize anatomical landmarks before classifying the plane, mimicking human operation.
- Performance: Studies, such as one by Burgos-Artizzu et al., found that the performance of the best models was similar to that of a fully trained sonographer, with a classification speed that was 25 times faster.
Accurate Identification of Normal Fetal Anatomy:
- DL models can localize and label fetal anatomical structures on different standard planes using object-detection and segmentation tasks.
- Segmentation DL models have been shown to outperform both human operators and other AI models at structural segmentation, a task that is prone to high variability when performed manually.
Differentiating between Normal and Abnormal Anatomy:
- DL models serve as screening tools (using classification models to determine if an image contains normal or abnormal anatomy) or diagnostic tools (using object-detection and segmentation to locate and classify the type of malformation).

6.1.3. Clinical Applicability Across Fetal Systems

Fetal Central Nervous System (CNS):
- Identification: DL models identify standard brain planes and anatomical landmarks (lateral ventricles, choroid plexus, cavum septi pellucidi, thalami, cerebellum, cisterna magna, Sylvian fissure, brainstem) with good accuracy.
- Measurements: Automatic measurements of structures like lateral ventricles or cavum septi pellucidi are possible.
- Cortical Development: DL models can assess cortical morphology to estimate gestational age, alerting operators to potential developmental anomalies if the estimated GA doesn't match the actual GA.
- Malformation Detection: DL models can detect structural abnormalities of the fetal brain or spine on standard screening planes. Notably, Lin et al. demonstrated an algorithm that localized and classified nine different brain malformations from standard screening planes with an overall accuracy of 99%.
Fetal Heart:
- Plane Acquisition: Standard heart planes (four-chamber, left/right ventricular outflow tract, three-vessel-and-trachea views) can be acquired automatically.
- Structure Identification: DL models using object detection or segmentation can identify detailed structures beyond the four chambers, including foramen ovale, mitral and tricuspid valves, aorta, apex cordis, moderator band, ventricular walls, interventricular septum, and pulmonary veins. They can even determine the cardiac cycle phase.
- Morphology and Measurements: Segmentation DL models allow evaluation of heart morphology and automatic measurement of cardiac structures, including chamber areas and cardiothoracic ratio or cardiac axis angle.
- Doppler Evaluation: Models can automatically assess pulsed-wave Doppler traces of left ventricular inflows and outflows.
- Congenital Heart Disease (CHD) Detection: DL models are proposed to alert operators to suspected cardiac malformations. Models have been built to detect specific CHDs like hypoplastic left heart syndrome (HLHS) and ventricular septal defects (VSD) with object detection or segmentation, even delineating and measuring defect size.
Placenta:
- DL models can perform placental biometry rapidly and reliably; this measurement is not currently routine because it is time-consuming and operator-dependent.
- They can assess placenta location (anterior/posterior) and appearance (normal/abnormal).
- Segmentation DL models with 3D ultrasound provide information on morphology and volume.
- Placental lacunae, associated with abnormal invasive placentation, can be identified and localized with good accuracy.
Other Fetal Structures:
- DL algorithms are expanding to detect abnormalities in the face, spine, kidneys, lungs, adipose tissue, and sexual organs.
- Some ultrasound machines are already integrating checklists and guidance powered by AI for comprehensive examinations.
Deep Learning and Intrapartum Ultrasound:
- Research aims to develop DL models to assess fetal occiput position (occiput anterior, posterior, or transverse) during the second stage of labor, which could play a role in daily labor-ward practice by providing simultaneous assessment of station, angle, and position.
  
  The following figure (Figure 3 from the original paper) summarizes the most important clinical applications of DL in fetal imaging.
  
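  The image is a schematic diagram showing the applications of deep learning in fetal ultrasound imaging, including automatic measurement of fetal structures, identification of normal and abnormal fetal anatomy, automatic detection of standard planes, and detection of fetal malformations.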

6.2. Data Presentation (Tables)

The original paper is a State-of-the-Art Review and does not contain any original experimental result tables of its own. It synthesizes findings and advancements from numerous other research papers. Therefore, there are no tables from the original paper to transcribe here.

6.3. Ablation Studies / Parameter Analysis

As a review paper that summarizes the state of the field rather than presenting a new model, the authors do not conduct their own ablation studies or parameter analyses. These types of studies are typically performed in original research papers to evaluate the contribution of individual components of a proposed model or to optimize its hyperparameters. The paper implicitly refers to the results of such analyses from the reviewed literature by stating the performance of various DL models for specific tasks.

7. Conclusion & Reflections

7.1. Conclusion Summary

The paper concludes that the future routine implementation of DL in obstetrics and fetal imaging is inevitable. Deep learning offers significant advantages, including objectivity, reproducibility, speed, and accuracy, positioning it as a powerful support tool for antenatal ultrasound. The authors emphasize that this technology is not intended to replace experts but rather to support them and improve workflow, ultimately benefiting both patients and healthcare providers by saving time and enhancing service quality. Furthermore, DL has the potential to improve healthcare in rural areas or low-income countries where access to expert sonographers is limited. While acknowledging that there is still a long road ahead before full clinical implementation, the rapidly increasing number of publications in the field suggests that this could be achieved sooner than we think.

7.2. Limitations & Future Work

The authors provide a thorough discussion of the current limitations of deep learning in fetal imaging and suggest critical future research directions.

7.2.1. Limitations

Amount of Data Required:
- A major barrier is the large amount of labeled data needed for effective model training, especially due to the similar appearance of different ultrasound planes.
- Most databases are built retrospectively, potentially introducing inherent bias.
- Building prospective databases is tedious, time-consuming, and costly.
- There is no standard method to calculate the sample size required for an accurate DL algorithm.
- Catastrophic Interference: A risk with unlocked algorithms (which continuously learn from new data) is catastrophic interference, where the algorithm forgets previously learned information upon learning new data. This is a concern for regulatory institutions, which often prefer locked algorithms.
Inherent Bias of Image Recognition:
- DL models are prone to misinterpretation. They are usually trained on normal fetal anatomy, so minor anatomical variations that are in fact normal might be misinterpreted as anomalies.
- Human operators manually annotate fetal images for supervised training, which can introduce bias into the training process and affect validity.
- Current DL models do not take into account the clinical history of the patient (e.g., previous malformations, genetic disorders, maternal habitus), which is crucial for accurate diagnosis in human practice.
Interpretability of Results (Black Box Problem):
- DL models lack explanatory power regarding why a particular decision is made. Their complexity, with multiple hidden layers, makes it difficult to understand the decision-making process. This lack of understanding can hinder medical professionals' trust in the technology.
Ethical Challenges:
- Accountability: A key question is whether AI models or human operators should be held accountable for incorrect diagnoses or inappropriate clinical decisions, especially given the huge medicolegal implications of false-positive or false-negative diagnoses in pregnancy.
- Data Privacy: Concerns exist regarding data privacy, as confidential patient data and images need to be shared with third parties involved in model development, necessitating strict regulations.
- Job Displacement: The potential for algorithms to replace humans and lead to higher rates of unemployment is another ethical consideration.

7.2.2. Future Work

Multitasking DL Models: There is an urgent need for more multitasking DL models that can integrate detection of fetal standard planes, identification of fetal anatomical structures, and performance of automatic measurements to process an entire ultrasound examination and flag suspected malformations.
Prospective Validation: Before clinical implementation, it is mandatory to validate the algorithms prospectively. This involves testing algorithms in real-life scenarios with real patients to understand how they perform with malformations or normal anatomical variations for which they have not been trained.
Addressing Ethical Concerns: Well-conducted prospective studies are needed to inform policymakers and legislators regarding accountability, data privacy, and job displacement as AI integrates into healthcare.

7.3. Personal Insights & Critique

This review offers a highly valuable synthesis of a rapidly evolving field, striking a good balance between technical explanation and clinical relevance.

Inspirations and Transferability: The core concept of using AI to standardize and enhance operator-dependent diagnostic imaging is highly transferable. Similar DL applications are already being explored in other ultrasound domains (e.g., cardiac, abdominal, musculoskeletal) and other imaging modalities (e.g., MRI, CT). The idea of tutoring young and inexperienced doctors or providing support in low-resource settings is particularly inspiring, as it suggests a path toward democratizing access to high-quality diagnostic imaging expertise globally. The automation of tedious tasks like biometric measurements could also reduce burnout and workplace injuries for sonographers, a patient- and provider-centric benefit that goes beyond diagnostic accuracy.
Potential Issues and Areas for Improvement:
- The "Black Box" Problem: While acknowledged, the lack of interpretability remains a critical hurdle for clinical adoption. Clinicians need to trust the diagnosis, and understanding the reasoning behind an AI's decision is paramount, especially in high-stakes fields like fetal anomaly detection. Future research should focus more on explainable AI (XAI) methods to provide transparent insights into how models arrive at their conclusions.
- Bias and Generalizability: The reliance on retrospective databases and human annotation for supervised learning inherently embeds existing biases. The paper also points out the challenge of normal anatomical variations being misinterpreted. This highlights a need for more diverse, multi-ethnic, and multi-institutional prospective datasets to ensure models are robust and generalizable to a global patient population, not just those represented in the training data.
- Integration of Clinical Context: The paper correctly identifies that current DL models do not take into account the clinical history. For true decision support, AI models in medicine must integrate multimodal data (images, clinical history, genetic information, laboratory results) to provide a comprehensive and personalized assessment, moving beyond purely image-based diagnostics.
- Validation Standards: The lack of a standard method to calculate the sample size for DL algorithms in fetal imaging and the call for prospective validation are crucial. Regulatory bodies will require robust evidence from large-scale, randomized controlled trials to approve these technologies for widespread clinical use. The transition from promising academic results to validated, regulated clinical tools is often the slowest part of the innovation cycle.
- Ethical Frameworks: While briefly mentioned, the ethical challenges of accountability, data privacy, and job displacement require proactive and collaborative efforts from technologists, clinicians, ethicists, legal experts, and policymakers to establish clear guidelines and safeguards before widespread deployment. The catastrophic interference problem for unlocked algorithms also highlights a fundamental tension between continuous learning and regulatory stability, which needs innovative solutions.
  
  Overall, this review paints an optimistic yet realistic picture of DL's future in fetal imaging. Its greatest value may lie in prompting the community to address the critical limitations and ethical considerations with the same rigor and innovation that has driven the technological advancements themselves.
