Spatial Context Energy Curve-Based Multilevel 3-D Otsu Algorithm for Image Segmentation
TL;DR Summary
This paper presents a multilevel 3-D Otsu algorithm based on spatial context energy curves, improving segmentation results for low-contrast and low-SNR images by integrating pixel intensity with spatial information. Experimental results show superior performance across various metrics.
Abstract
2760 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 51, NO. 5, MAY 2021
Spatial Context Energy Curve-Based Multilevel 3-D Otsu Algorithm for Image Segmentation
Ashish Kumar Bhandari, Anurag Singh, and Immadisetty Vinod Kumar
Abstract: While yielding satisfactory segmentation results for images with low SNR and poor contrast, one-dimensional (1-D) and two-dimensional (2-D) Otsu's thresholding methods have the downside of high computational complexity. So far, three-dimensional (3-D) Otsu method has been based on histogram, which has only probability distribution of pixels as an object of interest. Histogram-based segmentation methods do not consider the contextual information which is significant to enrich the quality of segmented image. In this paper, a context-…
In-depth Reading
1. Bibliographic Information
1.1. Title
Spatial Context Energy Curve-Based Multilevel 3-D Otsu Algorithm for Image Segmentation
1.2. Authors
The authors of this paper are Ashish Kumar Bhandari, Anurag Singh, and Immadisetty Vinod Kumar.
- Ashish Kumar Bhandari: At the time of publication, he was an Assistant Professor at the National Institute of Technology Patna, India. His Ph.D. is in digital image processing, and his research focuses on image enhancement, segmentation, denoising, and soft computing techniques.
- Anurag Singh: He was an Assistant Professor at the International Institute of Information Technology Naya Raipur, India. His Ph.D. is in biomedical signal processing, with research interests in biomedical signal and image processing.
- Immadisetty Vinod Kumar: He received his B.Tech. degree from the National Institute of Technology Patna, India. His research interests include image segmentation using multilevel thresholding.
1.3. Journal/Conference
The paper was published in IEEE Transactions on Systems, Man, and Cybernetics: Systems, a peer-reviewed journal of the Institute of Electrical and Electronics Engineers (IEEE) that covers systems engineering, including theory, analysis, and applications.
1.4. Publication Year
The paper was formally published in Volume 51, Issue 5, in May 2021. The metadata indicates an initial online publication date of June 4, 2019.
1.5. Abstract
The abstract introduces the problem that while one-dimensional (1-D) and two-dimensional (2-D) Otsu's thresholding methods provide good segmentation for noisy and low-contrast images, they suffer from high computational complexity. It notes that existing three-dimensional (3-D) Otsu methods are histogram-based and thus ignore spatial context, which is crucial for high-quality segmentation. The authors propose a novel context-based 3-D Otsu algorithm that utilizes a "spatial context energy curve" instead of a histogram. This approach considers pixel intensity, spatial information, and histogram properties simultaneously. The proposed method is extensively evaluated and compared with histogram-based 1-D, 2-D, 3-D Otsu, and energy-based 1-D, 2-D Otsu methods. The experimental results demonstrate the superiority of the proposed energy-based 3-D Otsu algorithm, showing improved performance across multiple metrics like Mean Error (ME), Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Feature Similarity Index (FSIM), Structure Similarity Index (SSIM), and Entropy.
1.6. Original Source Link
The provided link /files/papers/692bb5a74114e99a4cde875b/paper.pdf points to a local file. The paper is officially published and can be accessed through the IEEE Xplore digital library.
2. Executive Summary
2.1. Background & Motivation
- Core Problem: The paper addresses the challenge of image segmentation, a fundamental process in digital image processing where an image is partitioned into multiple meaningful regions or segments. A popular technique for this is thresholding, which classifies pixels based on their intensity values.
- Specific Challenges:
- Ignoring Spatial Context: Traditional thresholding methods, including the widely used Otsu's method, rely on the image's histogram. A histogram only records the frequency of pixel intensities, completely disregarding the spatial relationships between pixels. This makes them vulnerable to noise and can lead to inaccurate segmentation, as they treat a pixel in a smooth region the same as a pixel on an edge if they have the same intensity.
- Computational Complexity: As thresholding is extended to multiple levels (multilevel thresholding) to segment an image into more than two regions, the search space for optimal thresholds grows exponentially. This makes methods like 1-D and 2-D Otsu computationally expensive. While 3-D Otsu methods were introduced to incorporate more spatial information (like neighborhood mean and median), they often exacerbate the complexity problem.
- Paper's Entry Point: The authors identify the histogram as the primary weakness. Their innovative idea is to replace the intensity histogram with a "Spatial Context Energy Curve." This "energy curve" is not based on pixel counts but on the spatial correlation between a pixel and its neighbors. By applying the powerful 3-D Otsu framework to this richer, context-aware representation, the paper aims to achieve more accurate and robust image segmentation, especially for complex color images.
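The growth of the multilevel search space can be made concrete with a small count. Assuming an exhaustive search over strictly increasing thresholds $0 \le r_1 < \dots < r_n \le L-1$ (the constraint used in Equation (20)), there are $\binom{L}{n}$ candidate threshold sets to score. The helper below is a hypothetical illustration, not code from the paper.

```python
from math import comb


def num_threshold_sets(levels: int, n_thresholds: int) -> int:
    """Count tuples 0 <= r_1 < ... < r_n <= levels - 1 that an exhaustive search must score."""
    return comb(levels, n_thresholds)


# For an 8-bit image (L = 256), each extra threshold multiplies the work by roughly L / n.
for n in (1, 2, 3, 4):
    print(n, num_threshold_sets(256, n))
```

For one to four thresholds this prints 256, 32640, 2763520, and 174792640 candidate sets, which is why exhaustive multilevel search quickly becomes impractical.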
2.2. Main Contributions / Findings
- Main Contributions:
- A Novel Algorithm: The primary contribution is a new multilevel segmentation algorithm, the "Spatial Context Energy Curve-Based Multilevel 3-D Otsu Algorithm" (3D Otsu-Energy).
- Integration of Energy Curve: The paper presents this as the first work to integrate the spatial energy curve with the 3-D Otsu objective function. The combination uses the spatial information encoded in the energy curve to guide the multi-feature thresholding of 3-D Otsu.
- Comprehensive Evaluation: The paper compares the proposed method against five baseline algorithms: histogram-based 1-D, 2-D, and 3-D Otsu, and energy curve-based 1-D and 2-D Otsu.
- Key Findings:
- The proposed 3D Otsu-Energy algorithm is quantitatively superior to the compared methods in nearly all reported cases. It achieves better scores across a range of image quality metrics, including PSNR, SSIM, FSIM, and Entropy, indicating that the segmented images are more faithful to the originals in structure, features, and information content.
- The method is also qualitatively superior, producing segmented images with better-defined regions and clearer separation between objects and background.
- The energy curve addresses the main limitation of histogram-based methods by incorporating spatial context, leading to more robust and accurate segmentation results.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
- Image Segmentation: This is the process of partitioning a digital image into multiple segments (sets of pixels, also called regions). The goal of segmentation is to simplify or change the representation of an image into something that is more meaningful and easier to analyze. For example, in a medical image, segmentation could be used to identify and isolate a tumor from the surrounding healthy tissue.
- Thresholding: A simple yet powerful method of image segmentation. In its most basic form, bi-level thresholding, a single threshold value is chosen. All pixels with intensity values above the threshold are assigned to one class (e.g., white), and all pixels below are assigned to another class (e.g., black). This is effective for separating a foreground object from a background. Multilevel thresholding extends this by using multiple thresholds to partition the image into several distinct classes or regions.
- Image Histogram: A histogram is a bar graph that represents the frequency distribution of intensity values in an image. The x-axis represents the intensity levels (e.g., 0 to 255 for an 8-bit grayscale image), and the y-axis represents the number of pixels at each intensity level. Histograms provide a global description of an image's appearance but contain no information about the spatial location of pixels.
- Otsu's Method: A classic, widely used automatic thresholding technique proposed by Nobuyuki Otsu in 1979. Its core idea is to find the threshold that best separates the pixels into two classes (foreground and background), which it does by searching for the threshold that maximizes the between-class variance. Because the total variance of the image does not depend on the threshold, maximizing the between-class variance is equivalent to minimizing the weighted sum of the within-class variances. This leads to classes that are as distinct from each other and as internally homogeneous as possible. The between-class variance, $\sigma_B^2(t)$, for a single threshold $t$ is calculated as:
$
\sigma_B^2(t) = \omega_0(t)(\mu_0(t) - \mu_T)^2 + \omega_1(t)(\mu_1(t) - \mu_T)^2
$
where:
- $\omega_0(t)$ and $\omega_1(t)$ are the probabilities of the two classes (background and foreground).
- $\mu_0(t)$ and $\mu_1(t)$ are the mean intensity levels of the two classes.
- $\mu_T$ is the total mean intensity of the entire image.
Otsu's method exhaustively checks every possible threshold $t$ and chooses the one that gives the maximum $\sigma_B^2(t)$.
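The 1-D Otsu search above can be sketched in a few lines. This is a minimal illustration, assuming a histogram given as a list of pixel counts indexed by gray level and the convention that class 0 contains the levels $\le t$; the function name `otsu_threshold` is hypothetical. Cumulative counts and sums avoid recomputing class statistics for every $t$.

```python
def otsu_threshold(hist: list[int]) -> int:
    """Return the threshold t that maximizes sigma_B^2(t) for a gray-level histogram.

    Class 0 holds levels 0..t and class 1 holds levels t+1..L-1.
    """
    total = sum(hist)
    total_sum = sum(i * h for i, h in enumerate(hist))
    mu_T = total_sum / total  # global mean intensity
    best_t, best_var = 0, -1.0
    n0 = 0  # pixel count in class 0
    s0 = 0  # sum of intensities in class 0
    for t in range(len(hist) - 1):
        n0 += hist[t]
        s0 += t * hist[t]
        n1 = total - n0
        if n0 == 0 or n1 == 0:
            continue  # one class is empty, so sigma_B^2 is undefined
        omega0, omega1 = n0 / total, n1 / total
        mu0, mu1 = s0 / n0, (total_sum - s0) / n1
        var_b = omega0 * (mu0 - mu_T) ** 2 + omega1 * (mu1 - mu_T) ** 2
        if var_b > best_var:
            best_t, best_var = t, var_b
    return best_t


# Two modes around levels 2 and 6 of an 8-level histogram.
print(otsu_threshold([0, 5, 10, 5, 0, 5, 10, 5]))  # prints 3
```

Thresholds 3 and 4 tie here (level 4 is empty); the strict `>` comparison keeps the first one.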
3.2. Previous Works
- 1-D Otsu: This is the original method described above. It operates on a 1-D histogram of pixel gray levels. While simple and fast, it is sensitive to noise and intensity inhomogeneities, as it ignores all spatial information.
- 2-D Otsu: To improve robustness, researchers proposed 2-D Otsu. This method uses a 2-D histogram. Instead of just considering the pixel's intensity, it also considers a second feature, which is typically the average intensity of the pixel's local neighborhood. By doing so, it incorporates some spatial context. Pixels on an edge will have a very different neighborhood average than pixels in a smooth region, even if their own intensity is the same. This makes 2-D Otsu more resilient to noise.
- 3-D Otsu: This is a further extension that uses a 3-D histogram. It adds a third feature to the pixel intensity and neighborhood mean, often the median intensity of the pixel's neighborhood. The median is less sensitive to outliers (noise) than the mean, adding another layer of robustness. However, building and searching a 3-D histogram is computationally very expensive: with $L$ intensity levels the histogram has $L^3$ bins, and an exhaustive search for a single threshold triple must evaluate $L^3$ candidate triples. The paper mentions that faster versions have been developed to reduce this complexity.
- Energy Curve: The concept of an energy curve for segmentation was proposed in prior work [19] as an alternative to the histogram. It is derived from a model inspired by Hopfield neural networks. The "energy" of an image at a certain intensity level is calculated based on the spatial correlation between neighboring pixels. The resulting curve tends to be smoother than a histogram, and its valleys correspond to optimal threshold values that separate distinct regions in the image. This approach inherently captures spatial contextual information.
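The limitation described for 1-D Otsu, and the partial fix offered by 2-D Otsu, can be checked on two toy images with identical intensity counts but different layouts. This sketch assumes a $3 \times 3$ window with clamped borders and rounds the neighborhood mean to an integer bin; these choices and all function names are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter


def histogram_1d(img: list[list[int]]) -> Counter:
    """Counts of each gray level, ignoring where the pixels are."""
    return Counter(v for row in img for v in row)


def neighborhood_mean(img: list[list[int]], x: int, y: int, k: int = 3) -> int:
    """Rounded mean of the k x k window centered at (x, y); border indices are clamped."""
    rows, cols = len(img), len(img[0])
    r = k // 2
    vals = [img[min(max(x + i, 0), rows - 1)][min(max(y + j, 0), cols - 1)]
            for i in range(-r, r + 1) for j in range(-r, r + 1)]
    return round(sum(vals) / len(vals))


def histogram_2d(img: list[list[int]], k: int = 3) -> Counter:
    """Joint counts of (pixel intensity, neighborhood mean), the 2-D Otsu input."""
    return Counter((img[x][y], neighborhood_mean(img, x, y, k))
                   for x in range(len(img)) for y in range(len(img[0])))


halves = [[0, 0, 8, 8] for _ in range(4)]  # two smooth regions
checker = [[0, 8, 0, 8] if x % 2 == 0 else [8, 0, 8, 0] for x in range(4)]  # no smooth regions

print(histogram_1d(halves) == histogram_1d(checker))  # True: same 1-D histogram
print(histogram_2d(halves) == histogram_2d(checker))  # False: 2-D histograms differ
```

Both images contain eight 0s and eight 8s, so any 1-D histogram method treats them identically; the neighborhood-mean axis is what separates smooth regions from texture.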
3.3. Technological Evolution
The evolution of Otsu-based thresholding reflects a clear trend toward incorporating more spatial information to improve segmentation accuracy and robustness:
- 1-D Otsu (Pixel Intensity Only): The starting point. Fast but ignores context.
- 2-D Otsu (Intensity + Neighborhood Mean): The first step into spatial context. More robust to noise but more complex.
- 3-D Otsu (Intensity + Mean + Median): A more advanced incorporation of spatial context, offering even greater robustness. However, this comes at a significant computational cost.
- Energy Curve-Based Otsu (This Paper): Instead of adding more spatial features to the histogram-based framework, this work replaces the histogram itself with the energy curve. This allows it to be combined with the most advanced framework (3-D Otsu) to create a method that uses several sources of spatial information simultaneously.
3.4. Differentiation Analysis
The core difference between this paper's approach and previous work is the fundamental data representation used for thresholding.
- Previous Otsu Methods (1D, 2D, 3D): All are fundamentally histogram-based. They operate on a data structure that counts pixel occurrences. 2D and 3D Otsu try to embed spatial context by adding neighborhood statistics as extra dimensions to this histogram.
- Proposed Method (3D Otsu-Energy): This method is energy-curve-based. It completely discards the intensity histogram. Instead, it computes a new 1-D representation of the image (the energy curve) in which each point's value is determined by local spatial correlations. The Otsu criterion is then applied to this energy curve and to two quantities derived from it (its mean- and median-filtered versions). This is a deeper integration of spatial context, because the data being thresholded is itself spatial in nature rather than a simple frequency count.
4. Methodology
4.1. Principles
The central principle of the proposed method is that the spatial arrangement and correlation of pixels provide more valuable information for segmentation than their intensity distribution alone. Traditional histograms ignore this spatial context. The proposed method addresses this by first transforming the image's intensity information into a "Spatial Context Energy Curve," where the value at each intensity level reflects the degree of spatial homogeneity among pixels at that level. This energy curve, being inherently context-aware, serves as a superior basis for thresholding. The method then employs a 3-D Otsu framework, which considers the energy value, the local neighborhood mean, and the local neighborhood median, to find optimal multilevel thresholds. This multi-pronged approach aims to maximize segmentation accuracy by leveraging rich spatial information at every stage.
4.2. Core Methodology In-depth (Layer by Layer)
The proposed algorithm can be broken down into two main stages: (1) calculating the energy curve, and (2) applying the 3-D Otsu criterion to find the optimal thresholds.
4.2.1. Stage 1: Energy Curve Calculation
The energy curve replaces the traditional histogram. It is calculated for an image of size $M \times N$ with a maximum intensity level of $L$.
Step 1: Define the Neighborhood System
The method first defines a spatial neighborhood system. The paper uses a second-order neighborhood, $N_{xy}^{2}$, for a pixel at position (x, y). This consists of the 8 pixels immediately surrounding the central pixel, as illustrated in Figure 1 of the original paper.
Step 2: Create Binary Matrices for Each Intensity Level
For each possible intensity level $l$ (where $0 \le l \le L$), a temporary binary matrix $B_l = \{b_{xy}\}$ of the same size as the image is created. The value of each element in this matrix is determined by the following rule:
- $b_{xy} = +1$ if the intensity of the pixel at $(x, y)$ in the original image, $I(x, y)$, is greater than $l$ ($I(x, y) > l$).
- $b_{xy} = -1$ if $I(x, y) \le l$.
This process is repeated for every intensity level $l$, creating one such matrix per level.
Step 3: Calculate the Energy for Each Intensity Level
The energy $E_l$ for each intensity level $l$ is computed using the following formula, presented as Equation (12) in the paper:
$ E_l = - \sum_{x=1}^{M} \sum_{y=1}^{N} \sum_{rs \in N_{xy}^{2}} b_{xy} \cdot b_{rs} + \sum_{x=1}^{M} \sum_{y=1}^{N} \sum_{rs \in N_{xy}^{2}} c_{xy} \cdot c_{rs} $
Let's break down this formula:
- $b_{xy}$: The value (-1 or +1) of the pixel at $(x, y)$ in the binary matrix $B_l$.
- $b_{rs}$: The value (-1 or +1) of a neighboring pixel $(r, s)$ within the neighborhood $N_{xy}^{2}$.
- $c_{xy}$, $c_{rs}$: Elements of a constant matrix $C$ in which every element is 1.
- First Term: $-\sum_{x=1}^{M} \sum_{y=1}^{N} \sum_{rs \in N_{xy}^{2}} b_{xy} \cdot b_{rs}$. This term measures the spatial correlation of the binary matrix $B_l$.
- If a pixel and its neighbor have the same sign (both +1 or both -1), their product is +1. This indicates homogeneity.
- If they have different signs, their product is -1. This indicates a boundary at intensity level $l$: one pixel is brighter than $l$ and its neighbor is not.
- The triple sum therefore equals the number of agreeing neighbor pairs minus the number of disagreeing ones. Because of the negative sign in front, the energy is low at levels where binarizing the image cuts few neighbor pairs, which happens when $l$ falls in an intensity gap between homogeneous regions. Therefore, the valleys of the energy curve represent the best candidate thresholds.
- Second Term: $\sum_{x=1}^{M} \sum_{y=1}^{N} \sum_{rs \in N_{xy}^{2}} c_{xy} \cdot c_{rs}$. Since $c_{xy} \cdot c_{rs}$ is always 1, this term is a constant equal to the total number of pixel-neighbor pairs. It offsets the first term so that the final energy value is always non-negative ($E_l \ge 0$).
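The following sketch evaluates Equation (12) directly on a toy image. It also makes one property easy to verify: every pixel-neighbor pair contributes $+1$ to the second sum and $\pm 1$ to the first, so $E_l = 2D_l$, where $D_l$ is the number of ordered neighbor pairs that fall on opposite sides of level $l$. Treating out-of-image neighbors as absent is an assumption (this summary does not state the paper's border rule), and the function name is hypothetical.

```python
def energy_curve(img: list[list[int]], max_level: int) -> list[int]:
    """E_l from Equation (12) for l = 0..max_level, using the 8-pixel neighborhood.

    b_xy = +1 if I(x, y) > l, else -1; c_xy = 1 everywhere.
    Neighbors that fall outside the image are skipped (an assumption).
    """
    rows, cols = len(img), len(img[0])
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    energies = []
    for level in range(max_level + 1):
        b = [[1 if v > level else -1 for v in row] for row in img]  # binary matrix B_l
        correlation = 0  # first triple sum: sum of b_xy * b_rs
        pairs = 0        # second triple sum: c_xy * c_rs is always 1
        for x in range(rows):
            for y in range(cols):
                for dx, dy in offsets:
                    r, s = x + dx, y + dy
                    if 0 <= r < rows and 0 <= s < cols:
                        correlation += b[x][y] * b[r][s]
                        pairs += 1
        energies.append(-correlation + pairs)
    return energies


# A noisy dark region (levels 1-2) next to a noisy bright region (levels 6-7).
img = [[1, 2, 6, 7],
       [2, 1, 7, 6],
       [1, 2, 6, 7],
       [2, 1, 7, 6]]
print(energy_curve(img, 7))  # [0, 60, 40, 40, 40, 40, 60, 0]
```

The curve is zero below the darkest and at the brightest level, and it has a flat valley at levels 2 to 5, which is exactly the intensity gap separating the two regions; levels 1 and 6 score higher because they cut through the texture inside a region.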
After this process, we have a set of energy values $E_l$, one per intensity level, which form the energy curve. This curve is used as the basis for the Otsu algorithm instead of the histogram. Figure 2 from the paper visually compares histograms and energy curves for the "Lena" image.
The following figure (Figure 2 from the original paper) illustrates the difference between an energy curve and a histogram, and the resulting segmentation.
Original caption: Fig. 2. (a) Lena image. Energy curve of the (b) first frame, (c) second frame, and (d) third frame. (e) 3-D Otsu-based thresholded image at 5-level using energy curve. Histogram of the (f) first frame, (g) second frame, and (h) third frame. (i) 3-D Otsu-based thresholded image at 5-level using histogram concept.
4.2.2. Stage 2: Energy-Based 3-D Otsu for Multilevel Thresholding
Once the energy curve is computed, the 3-D Otsu algorithm is applied to find the optimal set of thresholds. This algorithm operates on three feature dimensions.
Step 1: Define the Three Feature Dimensions
For each pixel at position (x, y), three features are considered:
- f(x, y): The energy value of the pixel, derived from the energy curve.
- g(x, y): The mean gray value of its $k \times k$ neighborhood.
- h(x, y): The median gray value of its $k \times k$ neighborhood.
The mean g(x, y) and median h(x, y) are calculated using the formulas from Equation (14):
$ g(x, y) = \frac{1}{k^2} \sum_{i=-(k-1)/2}^{(k-1)/2} \sum_{j=-(k-1)/2}^{(k-1)/2} f(x+i, y+j) $
$ h(x, y) = \mathrm{med} \left\{ f(x+i, y+j) : i, j = -\frac{k-1}{2}, \ldots, \frac{k-1}{2} \right\} $
Step 2: Compute Probability Distributions
The algorithm computes three separate 1-D probability distributions, one for each feature dimension. Let $E_i$, $E_j$, and $E_k$ denote feature values in the $f$, $g$, and $h$ dimensions, respectively. Their probabilities of occurrence are given by Equation (15):
$ \left\{ \begin{array}{ll} P_{E_i}^{(f)} = \frac{E_i}{N}, & P_{E_i}^{(f)} \ge 0 \\ P_{E_j}^{(g)} = \frac{E_j}{N}, & P_{E_j}^{(g)} \ge 0 \\ P_{E_k}^{(h)} = \frac{E_k}{N}, & P_{E_k}^{(h)} \ge 0 \end{array} \right. $
where $N$ is the total number of pixels (here $N$ denotes the pixel count, not the image width used in Equation (12)).
Step 3: Maximize Between-Class Variance for Each Dimension Independently
The core of the 3-D Otsu method used here is to find the optimal thresholds for each of the three dimensions (f, g, h) independently. This simplification reduces the computational complexity substantially, because three separate 1-D searches replace one joint search over a 3-D histogram. The goal is to find $n$ thresholds per dimension that divide the pixels into $n+1$ classes.
The between-class variance for each dimension is given by Equation (16):
$ \left\{ \begin{array}{ll} \sigma_{(f)}^2 = \sum_{c=1}^{n+1} \omega_c^{(f)} \left( \mu_c^{(f)} - \mu_T^{(f)} \right)^2 \\ \sigma_{(g)}^2 = \sum_{c=1}^{n+1} \omega_c^{(g)} \left( \mu_c^{(g)} - \mu_T^{(g)} \right)^2 \\ \sigma_{(h)}^2 = \sum_{c=1}^{n+1} \omega_c^{(h)} \left( \mu_c^{(h)} - \mu_T^{(h)} \right)^2 \end{array} \right. $
where for each dimension (e.g., $f$):
- $\omega_c^{(f)}$ is the probability of class $c$.
- $\mu_c^{(f)}$ is the mean value of class $c$.
- $\mu_T^{(f)}$ is the total mean value for that dimension.
These terms are calculated using Equations (17), (18), and (19) for each class defined by the thresholds (e.g., $r_1$ to $r_n$ for the energy dimension).
The algorithm then finds the optimal set of thresholds for each dimension by maximizing its respective between-class variance, as shown in Equation (20):
$ \left\{ \begin{array}{ll} \{r_1^*, \dots, r_n^*\} = \arg\max_{0 \le r_1 < \dots < r_n \le L-1} \sigma_{(f)}^2(r_1, \dots, r_n) \\ \{s_1^*, \dots, s_n^*\} = \arg\max_{0 \le s_1 < \dots < s_n \le L-1} \sigma_{(g)}^2(s_1, \dots, s_n) \\ \{t_1^*, \dots, t_n^*\} = \arg\max_{0 \le t_1 < \dots < t_n \le L-1} \sigma_{(h)}^2(t_1, \dots, t_n) \end{array} \right. $
This results in three sets of optimal thresholds: one for energy ($r^*$), one for neighborhood mean ($s^*$), and one for neighborhood median ($t^*$).
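Each row of Equation (20) is a 1-D multilevel Otsu search over one feature's distribution. The sketch below uses exhaustive search purely for clarity (this summary does not say how the paper searches the threshold space) and assumes the class convention $[0, r_1], (r_1, r_2], \dots, (r_n, L-1]$; the function names are hypothetical. Under the independence described in Step 3, the same function would be called once each for the $f$, $g$, and $h$ distributions.

```python
from itertools import combinations


def between_class_variance(p: list[float], thresholds: tuple[int, ...]) -> float:
    """Equation (16) for one dimension; classes are [0, r_1], (r_1, r_2], ..., (r_n, L-1]."""
    mu_T = sum(i * pi for i, pi in enumerate(p))
    edges = [-1, *thresholds, len(p) - 1]
    var = 0.0
    for lo, hi in zip(edges, edges[1:]):
        omega = sum(p[lo + 1:hi + 1])  # class probability omega_c
        if omega > 0:
            mu = sum(i * p[i] for i in range(lo + 1, hi + 1)) / omega  # class mean mu_c
            var += omega * (mu - mu_T) ** 2
    return var


def multilevel_otsu(weights: list[float], n: int) -> tuple[int, ...]:
    """Exhaustive argmax of one row of Equation (20) over 0 <= r_1 < ... < r_n <= L-1."""
    total = sum(weights)
    p = [w / total for w in weights]  # normalize to a probability distribution
    return max(combinations(range(len(p)), n),
               key=lambda ts: between_class_variance(p, ts))


# Three well-separated modes around levels 1, 6 and 11.
weights = [5, 10, 5, 0, 0, 5, 10, 5, 0, 0, 5, 10, 5]
print(multilevel_otsu(weights, 2))  # (2, 7)
```

Thresholds placed anywhere inside an empty gap give the same variance; `max` keeps the first such tuple in lexicographic order.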
Step 4: Combine Thresholds and Segment the Image
The final set of thresholds is obtained by averaging the corresponding thresholds from the three dimensions: $ \mathrm{Th}_k = (r_k^* + s_k^* + t_k^*) / 3, \quad \text{for } k = 1, \dots, n $
These final thresholds ($\mathrm{th}_k = \mathrm{Th}_k$) are then applied to the original grayscale image $I_G$ to produce the segmented image $I_{sg}$ according to the rule in Equation (13):
$ I_{sg}(r, c) = \left\{ \begin{array}{ll} I_G(r, c) & \text{if } I_G(r, c) \le \mathrm{th}_1 \\ \mathrm{th}_{k-1} & \text{if } \mathrm{th}_{k-1} < I_G(r, c) \le \mathrm{th}_k, \quad k=2, \dots, n \\ I_G(r, c) & \text{if } I_G(r, c) > \mathrm{th}_n \end{array} \right. $
This process is illustrated by the flowchart in Figure 3 of the paper.
The following figure (Figure 3 from the original paper) shows the flowchart of the proposed method.
Original caption: Fig. 3. Flowchart of proposed method.
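Step 4 can be sketched as follows: average the three threshold sets, then apply Equation (13) as written, which keeps the original intensities in the lowest and highest classes and maps each middle class to its lower threshold. Because each of $r^*$, $s^*$, and $t^*$ is strictly increasing, their element-wise average is too, so the combined thresholds are already sorted. The function names and sample thresholds are hypothetical.

```python
from bisect import bisect_left


def combine_thresholds(r: list[int], s: list[int], t: list[int]) -> list[float]:
    """Th_k = (r_k* + s_k* + t_k*) / 3 for k = 1..n."""
    return [(rk + sk + tk) / 3 for rk, sk, tk in zip(r, s, t)]


def apply_thresholds(img: list[list[int]], th: list[float]) -> list[list[float]]:
    """Segmentation rule of Equation (13) with sorted thresholds th = [th_1, ..., th_n]."""
    out = []
    for row in img:
        new_row = []
        for v in row:
            k = bisect_left(th, v)  # number of thresholds strictly below v
            if k == 0 or k == len(th):
                new_row.append(v)          # v <= th_1 or v > th_n: keep I_G
            else:
                new_row.append(th[k - 1])  # th_k < v <= th_{k+1} (1-based): map to th_k
        out.append(new_row)
    return out


th = combine_thresholds([80, 160], [90, 170], [100, 150])  # [90.0, 160.0]
print(apply_thresholds([[50, 90, 120, 160, 200]], th))  # [[50, 90, 90.0, 90.0, 200]]
```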
5. Experimental Setup
5.1. Datasets
The authors used two standard benchmark datasets for their experiments to ensure the results are robust and generalizable.
- Berkeley Segmentation Dataset and Benchmark (BSDS500): This is a widely used and challenging dataset containing 500 natural images with human-annotated ground-truth segmentations. Its diversity in content (animals, landscapes, people) makes it an excellent benchmark for evaluating segmentation algorithms.
- Kodak Lossless True Color Image Suite (Kodim): This dataset consists of high-quality, uncompressed color images. It is often used in image processing research to test algorithms on clean, artifact-free data.
For the detailed quantitative analysis in Tables II-VI, the authors selected ten diverse color images (shown in Figure 4) from these datasets. For the broader evaluation using the PRI, BDE, GCE, and VoI metrics, the full BSDS500 dataset was used.
The following figure (Figure 4 from the original paper) shows the ten test images used for detailed evaluation.
Original caption: Fig. 4. (a)-(j) Original test images [23], [24].
5.2. Evaluation Metrics
The paper employs a comprehensive set of metrics to evaluate the quality of the segmented images from different perspectives.
- Mean Error (ME):
- Conceptual Definition: ME measures the average absolute difference in intensity between the pixels of the original image and the segmented image. A lower ME value indicates that the segmented image is a closer representation of the original.
- Mathematical Formula: The formula printed in the paper's Table I appears to contain a typo. The standard formula for Mean Absolute Error, which is likely intended, is: $ ME = \frac{1}{J \times K} \sum_{i=1}^{J} \sum_{j=1}^{K} |I(i,j) - I_s(i,j)| $
- Symbol Explanation:
- J, K: The dimensions (height and width) of the image.
- I(i,j): The intensity of the pixel at position (i,j) in the original image.
- $I_s(i,j)$: The intensity of the pixel at position (i,j) in the segmented image.
- Mean Square Error (MSE):
- Conceptual Definition: MSE calculates the average of the squares of the intensity differences between the original and segmented images. It penalizes larger errors more heavily than ME. A lower MSE is better.
- Mathematical Formula: $ MSE = \frac{1}{J \times K} \sum_{i=1}^{J} \sum_{j=1}^{K} (I(i,j) - I_s(i,j))^2 $
- Symbol Explanation: Same as for ME.
- Peak Signal-to-Noise Ratio (PSNR):
- Conceptual Definition: PSNR measures the ratio between the maximum possible power of a signal (the image) and the power of corrupting noise that affects the fidelity of its representation. It is widely used to measure the quality of reconstructed or compressed images. A higher PSNR value indicates better quality.
- Mathematical Formula: $ PSNR = 20 \log_{10} \left( \frac{MAX_I}{\sqrt{MSE}} \right) $
- Symbol Explanation:
- $MAX_I$: The maximum possible pixel value of the image (e.g., 255 for an 8-bit image).
- MSE: The Mean Square Error calculated above.
- Structural Similarity Index (SSIM):
- Conceptual Definition: SSIM is a perceptual metric that measures the similarity between two images by considering changes in structural information, luminance, and contrast. It is considered to be more aligned with human visual perception than MSE or PSNR. SSIM values range from -1 to 1, where 1 indicates perfect similarity.
- Mathematical Formula: $ SSIM(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} $
- Symbol Explanation:
- $\mu_x$, $\mu_y$: The means of images $x$ and $y$.
- $\sigma_x^2$, $\sigma_y^2$: The variances of images $x$ and $y$.
- $\sigma_{xy}$: The covariance of $x$ and $y$.
- $c_1$, $c_2$: Small constants that stabilize the division.
- Feature Similarity Index (FSIM):
- Conceptual Definition: FSIM evaluates image quality based on the premise that the human visual system understands an image mainly according to its low-level features. It measures similarity using phase congruency (a measure of edge and corner significance) and gradient magnitude. Higher FSIM values (closer to 1) indicate better feature preservation.
- Mathematical Formula: The paper provides a simplified version in Table I. The general form is: $ FSIM = \frac{\sum_{x \in \Omega} S_L(x) \cdot PC_m(x)}{\sum_{x \in \Omega} PC_m(x)} $
- Symbol Explanation:
- $\Omega$: The entire image domain.
- $S_L(x)$: A similarity score at pixel $x$ combining phase congruency and gradient magnitude.
- $PC_m(x)$: The maximum phase congruency value at pixel $x$, used as a weighting factor.
- Entropy:
- Conceptual Definition: In image processing, entropy is a statistical measure of randomness that can be used to characterize the texture or information content of an image. A higher entropy value in the segmented image suggests that more information from the original image has been preserved.
- Mathematical Formula: $ H = - \sum_{i=0}^{L-1} p(i) \log_2 p(i) $
- Symbol Explanation:
- L-1: The maximum intensity level.
- p(i): The probability of occurrence of intensity level $i$.
- Other Metrics (PRI, VoI, GCE, BDE): For the larger dataset evaluation, the paper also uses the Probabilistic Rand Index (PRI, higher is better), Variation of Information (VoI, lower is better), Global Consistency Error (GCE, lower is better), and Boundary Displacement Error (BDE, lower is better). These are standard metrics for comparing a computed segmentation against a ground-truth segmentation.
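A minimal sketch of the pixel-wise metrics above, assuming 8-bit grayscale images stored as nested lists of equal size. ME, MSE, PSNR, and Entropy follow the formulas given; SSIM is deliberately simplified to a single global window, whereas the standard SSIM averages the formula over local windows, so its values will not match library implementations. The constants $c_1 = (0.01 \cdot MAX_I)^2$ and $c_2 = (0.03 \cdot MAX_I)^2$ are the customary SSIM defaults, not values stated in this summary. FSIM and the ground-truth metrics (PRI, VoI, GCE, BDE) are omitted, and all function names are hypothetical.

```python
import math


def _pixel_pairs(a: list[list[int]], b: list[list[int]]) -> list[tuple[int, int]]:
    """Flatten two equally sized grayscale images into (original, segmented) pairs."""
    return [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]


def mean_error(a, b) -> float:
    pairs = _pixel_pairs(a, b)
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


def mse(a, b) -> float:
    pairs = _pixel_pairs(a, b)
    return sum((x - y) ** 2 for x, y in pairs) / len(pairs)


def psnr(a, b, max_i: float = 255.0) -> float:
    err = mse(a, b)
    return math.inf if err == 0 else 20 * math.log10(max_i / math.sqrt(err))


def entropy(img: list[list[int]]) -> float:
    """Shannon entropy (bits) of the gray-level distribution; absent levels contribute 0."""
    counts: dict[int, int] = {}
    for row in img:
        for v in row:
            counts[v] = counts.get(v, 0) + 1
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def global_ssim(a, b, max_i: float = 255.0) -> float:
    """SSIM formula evaluated once over the whole image (no local windows)."""
    xs, ys = zip(*_pixel_pairs(a, b))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    c1, c2 = (0.01 * max_i) ** 2, (0.03 * max_i) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


original = [[0, 255], [255, 0]]
segmented = [[0, 255], [255, 10]]
print(mean_error(original, segmented), mse(original, segmented))  # 2.5 25.0
print(round(psnr(original, segmented), 2), entropy(original))     # 34.15 1.0
```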
5.3. Baselines
The proposed method, 3D Otsu-Energy, was compared against a comprehensive set of five state-of-the-art and foundational methods to demonstrate its superiority.
- Histogram-based methods:
1. 1-D Otsu: The classic algorithm using only the intensity histogram.
2. 2-D Otsu: The extension using a 2-D histogram of intensity and neighborhood mean.
3. 3-D Otsu: The advanced version using a 3-D histogram of intensity, mean, and median.
- Energy curve-based methods:
4. 1-D Otsu-Energy: A baseline created by applying the 1-D Otsu criterion to the energy curve.
5. 2-D Otsu-Energy: A baseline created by applying the 2-D Otsu criterion to the energy curve and neighborhood mean.
This selection of baselines allows for a thorough analysis, isolating the benefit of using the energy curve (by comparing the histogram and energy versions) and the benefit of a higher-dimensional framework (by comparing the 1-D, 2-D, and 3-D versions).
6. Results & Analysis
6.1. Core Results Analysis
The paper presents a detailed quantitative and qualitative analysis of the experimental results. The proposed 3D Otsu-Energy method outperforms the five baseline methods in nearly all reported cases, across metrics and threshold levels.
6.1.1. Quantitative Analysis
The results are presented in Tables III, IV, V, and VI for the ten test images, and in Tables VII-X for the full BSDS500 dataset.
- Information Preservation (Entropy) and Error (ME): As shown in Table III, the proposed 3D Otsu-Energy method yields the highest Entropy and the lowest Mean Error (ME) for almost all test images and at all threshold levels (L=2, 3, 5, 8). For example, for Image 1 at L=8, the proposed method achieves an Entropy of 4.3062 (compared to 4.0323 for 3D Otsu and 3.6391 for 2D Otsu-Energy) and an ME of 0.0352 (compared to 0.0577 for 3D Otsu and 0.0520 for 2D Otsu-Energy). This indicates superior information preservation and accuracy.
- Image Fidelity (MSE and PSNR): Table IV demonstrates that 3D Otsu-Energy achieves the lowest MSE and consequently the highest PSNR. A higher PSNR value signifies that the segmented image is a higher-fidelity representation of the original. For Image 2 at L=8, the PSNR for the proposed method is 27.3898 dB, surpassing all other methods, including the next best (2D Otsu-Energy at 27.1142 dB).
- Perceptual Quality (SSIM and FSIM): Table V, which reports on metrics more aligned with human perception, further supports the superiority of the proposed method. It achieves the highest SSIM and FSIM values in nearly every case. For Image 3 at L=8, the FSIM is 0.9246, a notable improvement over the 0.8874 from histogram-based 3D Otsu. This means the structural and feature-level details are better preserved.
- Computational Time (CPU Time): Table VI reveals the primary trade-off of the proposed method. The energy-based methods are significantly slower than the histogram-based ones due to the overhead of calculating the energy curve. Furthermore, the 3D Otsu-Energy method is the most computationally intensive of all. For example, for Image 1 at L=5, it takes 237.98 seconds, whereas the histogram-based 3D Otsu takes only 12.81 seconds. This highlights a trade-off between segmentation quality and computational efficiency.
- Performance on BSDS500 Dataset: The results on the larger dataset (Tables VII-X) confirm these findings. The 3D Otsu-Energy method achieves the best average scores across all metrics: highest PRI (0.6932 at L=8) and lowest BDE (8.056 at L=8), GCE (0.4592 at L=8), and VoI (3.4789 at L=8). This demonstrates its robustness and generalizability.
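Placing the quoted figures side by side shows the scale of the quality-versus-time trade-off. The values below are simple arithmetic on the numbers reported above, not additional results from the paper.

$ \begin{aligned} \text{CPU time ratio (Image 1, } L=5\text{)} &= 237.98 / 12.81 \approx 18.6 \\ \Delta\text{PSNR (Image 2, } L=8\text{)} &= 27.3898 - 27.1142 = 0.2756 \text{ dB} \\ \text{ME reduction vs. 3D Otsu (Image 1, } L=8\text{)} &= (0.0577 - 0.0352) / 0.0577 \approx 39\% \end{aligned} $

In these examples, the energy-based 3-D method costs roughly 18 times the CPU time of histogram-based 3-D Otsu, while its PSNR gain over the next-best method is about 0.28 dB.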
6.1.2. Qualitative Analysis
The paper provides visual results in Figures 5, 6, 7, and a summary comparison in Figure 8.
The following figure (Figure 8 from the original paper) provides a direct visual comparison of the segmentation results from all six methods at a 5-level threshold.
Original caption: Fig. 8. Visual comparison of segmentation results at 5-level of thresholding for 1-D Otsu, 2-D Otsu, 3-D Otsu, 1-D Otsu-Energy, 2-D Otsu-Energy, and 3-D Otsu-Energy methods, respectively.
As seen in Figure 8, the results from the proposed 3D Otsu-Energy method (the rightmost column) are visually superior.
- Clarity and Detail: In images like the bird and the dog, the proposed method provides a much clearer distinction between the subject and the background. The regions are well delimited, and fine details are better preserved.
- Region Homogeneity: The histogram-based methods, particularly 1-D Otsu, often produce noisy or fragmented regions. In contrast, the 3D Otsu-Energy results show more uniform and consistent regions, thanks to the incorporation of spatial context, which helps to group neighboring pixels correctly.
- Visual Effect: The segmented images have a more pleasant visual effect with better gray gradation, making the different segmented levels easier to discriminate.
In summary, both the quantitative data and the visual evidence support the claim that the proposed Spatial Context Energy Curve-Based 3-D Otsu Algorithm provides a more accurate and robust solution for multilevel image segmentation than the compared methods, at a substantially higher computational cost.
6.2. Data Presentation (Tables)
The following are the results from Table II of the original paper:
Histogram- and energy-based threshold values (L is the number of thresholds):

| Test Image | L | 1D Otsu [5, 7, 9, 12] | 2D Otsu [14-15, 17] | 3D Otsu [22] | 1D Otsu-Energy | 2D Otsu-Energy | 3D Otsu-Energy |
|---|---|---|---|---|---|---|---|
| 1 | 2 | 88 137 | 119 179 | 85 146 | 68 142 | 127 169 | 141 194 |
|  | 3 | 83 108 152 | 61 80 152 | 75 127 216 | 39 82 133 | 109 115 164 | 91 114 195 |
|  | 5 | 56 89 108 152 166 | 56 73 132 176 203 | 12 73 113 155 181 | 33 63 98 145 218 | 38 72 85 135 163 | 46 79 157 207 214 |
|  | 8 | 7 86 113 123 152 173 245 245 | 58 89 130 143 156 165 184 204 | 46 61 100 116 136 174 194 219 | 24 36 62 92 127 157 178 182 | 27 47 50 85 114 153 160 177 | 30 85 96 98 147 163 187 207 |
| 2 | 2 | 65 139 | 60 140 | 37 107 | 84 153 | 96 154 | 116 208 |
|  | 3 | 50 109 161 | 51 105 151 | 63 163 188 | 62 118 174 | 61 119 177 | 95 116 198 |
|  | 5 | 18 54 100 128 190 | 14 29 124 186 248 | 28 69 84 102 114 | 38 58 96 136 175 | 78 98 105 143 180 | 27 59 102 143 186 |
|  | 8 | 26 52 52 75 79 124 178 216 | 40 69 98 150 185 223 239 245 | 9 26 48 59 125 148 171 210 | 30 59 82 106 131 166 178 186 | 25 32 51 59 78 98 123 163 | 33 65 88 142 152 186 197 232 |
| 3 | 2 | 74 129 | 57 141 | 81 139 | 63 122 | 61 150 | 50 106 |
|  | 3 | 55 80 115 | 43 89 193 | 99 130 213 | 47 78 151 | 51 103 151 | 91 165 205 |
|  | 5 | 14 48 73 109 135 | 41 82 128 173 199 | 42 63 107 147 196 | 30 53 82 110 183 | 22 57 95 139 170 | 69 125 164 182 218 |
|  | 8 | 39 60 69 84 99 125 127 147 | 42 86 96 99 132 164 185 197 | 42 66 95 118 130 143 186 198 | 20 36 54 80 96 107 128 192 | 39 66 70 75 80 112 126 170 | 44 56 72 99 129 141 168 236 |
| 4 | 2 | 67 125 | 101 227 | 77 183 | 76 136 | 85 179 | 97 204 |
|  | 3 | 58 94 128 | 60 146 228 | 56 114 218 | 73 138 164 | 78 99 164 | 60 107 193 |
|  | 5 | 59 92 138 177 192 | 44 81 152 169 189 | 53 101 113 156 222 | 62 92 103 143 182 | 15 47 106 141 161 | 23 56 89 155 219 |
|  | 8 | 16 49 71 98 129 134 170 242 | 36 56 112 122 162 183 195 214 | 10 73 101 124 143 173 222 233 | 46 55 79 87 96 118 164 194 | 25 46 72 73 115 145 162 166 | 13 42 70 109 123 136 175 178 |
| 5 | 2 | 154 205 | 89 170 | 110 152 | 120 171 | 79 153 | 130 198 |
|  | 3 | 93 143 206 | 21 39 134 | 86 143 182 | 79 147 185 | 50 87 147 | 219 |
|  | 5 | 86 145 155 195 220 | 47 67 136 165 206 | 34 96 112 154 187 | 79 117 122 152 199 | 19 41 79 120 171 | 33 66 101 156 234 |
|  | 8 | 55 85 115 149 186 191 223 239 | 47 104 137 149 186 206 221 234 | 30 57 85 94 119 149 179 199 | 21 55 87 134 145 173 190 195 | 10 38 57 80 111 125 149 162 | 49 109 121 158 176 193 223 239 |
| 6 | 2 | 72 164 | 100 136 | 54 162 | 63 142 | 117 179 | 106 230 |
|  | 3 | 41 75 118 | 48 105 140 | 32 66 154 | 44 83 172 | 32 105 157 | 75 117 195 |
|  | 5 | 19 46 67 108 218 | 12 83 112 137 183 | 56 144 169 215 239 | 13 36 84 128 179 | 37 98 132 161 187 | 43 59 144 179 221 |
|  | 8 | 15 34 45 59 99 104 113 189 | 37 80 86 102 162 170 194 208 | 21 53 91 111 175 185 208 211 | 6 29 51 82 107 154 162 187 | 59 88 95 118 126 148 161 197 | 9 50 81 88 97 152 194 208 |
| 7 | 2 | 63 149 | 69 183 | 84 144 | 69 148 | 61 141 | 119 199 |
|  | 3 | 53 96 174 | 82 138 198 | 233 | 62 123 168 | 8 111 154 | 106 188 210 |
|  | 5 | 27 70 129 165 199 | 58 83 174 188 236 | 31 53 148 188 234 | 34 60 96 142 170 | 28 82 104 146 168 | 29 37 108 172 202 |
|  | 8 | 36 48 79 97 148 179 214 254 | 33 104 116 133 150 179 209 211 | 34 65 101 142 189 217 238 249 | 22 54 76 103 120 137 155 199 | 29 40 68 86 97 123 133 177 | 62 78 84 133 138 155 187 214 |
| 8 | 2 | 69 120 | 52 107 | 97 173 | 73 138 | 88 153 | 56 187 |
|  | 3 | 47 102 145 | 49 115 188 | 58 148 187 | 45 98 177 | 30 82 153 | 64 119 195 |
|  | 5 | 46 64 89 139 196 | 27 42 104 166 201 | 55 96 104 152 200 | 33 47 82 117 168 | 34 59 96 155 199 |  |
|  | 8 | 51 64 128 175 235 | 18 61 105 136 164 176 200 248 | 32 44 69 104 113 151 166 184 | 28 74 114 122 152 171 186 225 | 11 35 57 64 95 129 135 172 | 34 59 101 137 156 178 198 224 |
| 9 | 2 | 122 172 | 106 175 | 105 212 | 103 178 | 95 167 | 72 182 |
|  | 3 | 96 149 186 | 71 139 206 | 76 151 237 | 72 135 186 | 79 112 168 | 92 177 215 |
|  | 5 | 82 123 154 174 233 | 27 75 116 136 177 | 60 82 162 201 228 | 41 92 115 161 196 | 24 32 80 94 119 122 160 195 | 64 102 142 181 216 |
|  | 8 | 23 86 129 150 171 205 205 224 | 10 16 45 57 107 133 162 232 | 35 52 83 101 134 170 207 253 | 27 34 92 148 172 193 208 224 | 21 50 89 105 140 156 180 192 | 25 72 87 113 145 185 214 238 |
| 10 | 2 | 126 176 | 144 179 | 125 199 | 118 185 | 131 195 | 116 165 |
|  | 3 | 99 157 195 | 89 170 213 | 42 154 227 | 62 113 176 | 92 148 174 | 73 123 170 |
|  | 5 | 90 140 181 196 199 | 79 119 123 138 173 | 49 138 161 196 213 | 64 122 141 172 184 | 60 98 130 154 178 | 23 101 138 170 228 |
|  | 8 | 56 124 147 164 183 190 199 230 | 25 95 143 168 185 203 217 247 | 20 80 116 142 150 170 235 245 | 29 55 92 127 128 162 182 198 | 6 54 76 118 154 164 178 186 | 25 36 54 91 144 160 183 201 |
Tables III-X of the original paper are not transcribed here; their results are summarized in the analysis above.
6.3. Ablation Studies / Parameter Analysis
The paper does not contain a formal ablation study in a separate section. However, the entire experimental design acts as a form of ablation. By comparing 1D/2D/3D Otsu (histogram-based) with 1D/2D/3D Otsu-Energy (energy-based), the authors effectively isolate and demonstrate the impact of replacing the histogram with the energy curve. Similarly, by comparing the 1D, 2D, and 3D versions within each category, they demonstrate the incremental benefit of adding more spatial features (neighborhood mean and median). The results consistently show that both the energy curve and the higher-dimensional framework contribute to the final performance, with the combination of the two (3D Otsu-Energy) yielding the best results.
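The progression from 1-D to 2-D to 3-D described above amounts to attaching more spatial features to each pixel. Below is a minimal sketch of the per-pixel triplet (gray level, neighborhood mean, neighborhood median) that 3-D Otsu variants typically work with. The 3 × 3 window, edge padding, rounding to integers, and the name `spatial_features` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def spatial_features(img, window=3):
    """Stack (gray level, local mean, local median) for every pixel.

    Assumes a 2-D gray image and an odd window size; borders use edge padding.
    Returns an integer array of shape (H, W, 3).
    """
    x = np.asarray(img, dtype=np.float64)
    r = window // 2
    padded = np.pad(x, r, mode="edge")
    # (H, W, window, window) read-only view of each pixel's neighbourhood
    patches = sliding_window_view(padded, (window, window))
    local_mean = patches.mean(axis=(-2, -1))
    local_median = np.median(patches, axis=(-2, -1))
    feats = np.stack([x, local_mean, local_median], axis=-1)
    return np.rint(feats).astype(np.int64)
```

On an isolated bright pixel the three channels disagree strongly (gray level high, median low), which is the kind of cue that helps suppress the fragmented regions seen in 1-D results. How the paper combines the three channels into its 3-D objective is not reproduced here.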
7. Conclusion & Reflections
7.1. Conclusion Summary
The paper successfully proposes and validates a novel multilevel thresholding method for color image segmentation called the Spatial Context Energy Curve-Based 3-D Otsu Algorithm. The key innovation is the replacement of the conventional intensity histogram with a spatial context energy curve, which captures crucial information about the spatial relationships between pixels. By combining this context-rich representation with a 3-D Otsu framework, the algorithm achieves segmentation results that are demonstrably superior to traditional histogram-based methods and lower-dimensional energy-based approaches. The experimental outcomes, supported by a comprehensive set of quantitative metrics (ME, MSE, PSNR, SSIM, FSIM, Entropy) and qualitative visual assessments, confirm that the proposed method produces higher quality segmented images with better-preserved information, structure, and features.
7.2. Limitations & Future Work
- Limitations:
- Computational Complexity: The most significant limitation, clearly shown in the experimental results (Table VI), is the high computational cost. The process of calculating the energy curve for every intensity level is intensive, making the proposed method much slower than its histogram-based counterparts. This could be a barrier to its use in real-time applications.
- Future Work:
- The authors suggest that future research could focus on designing more efficient 3-D Otsu algorithms to mitigate the high computational time.
- They also propose exploring the use of the energy curve concept with other objective functions besides Otsu's method for color image multilevel thresholding.
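The cost noted under Limitations comes from building one binary map per intensity level and summing neighbor interactions over it. The sketch below computes an energy curve in a form along the lines of the energy-curve thresholding literature: for each level l, pixels above l are coded +1 and the rest -1, and E_l sums the -b·b' term plus a constant c·c' = 1 term over every ordered pair of 8-neighbors. The 8-neighborhood, the ±1 coding, and the name `energy_curve` are assumptions; the paper's exact Hopfield-style energy and neighborhood size may differ. With this coding, E_l equals twice the number of neighbor pairs that straddle level l, so it is zero below the image minimum and at or above the maximum.

```python
import numpy as np

# Second-order (8-connected) neighbourhood offsets: an assumption.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def energy_curve(img, levels=256):
    """Spatial context energy E_l for every gray level l in [0, levels)."""
    x = np.asarray(img)
    h, w = x.shape
    energy = np.zeros(levels, dtype=np.int64)
    for l in range(levels):
        b = np.where(x > l, 1, -1)  # binary +/-1 map for level l
        total = 0
        for dy, dx in OFFSETS:
            # pixel (i, j) paired with neighbour (i + dy, j + dx), both in bounds
            src = b[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
            dst = b[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
            # -b*b' plus the constant c*c' = 1 term for each neighbour pair
            total += int(np.sum(1 - src * dst))
        energy[l] = total
    return energy
```

The nested loops make the cost explicit: levels × neighbors × pixels. For a hypothetical 512 × 512 image with 256 levels that is 256 × 8 × 512² ≈ 5.4 × 10^8 neighbor products, before any thresholding search begins.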
7.3. Personal Insights & Critique
- Strengths:
- Novelty and Intuition: The core idea of replacing the histogram with a spatial context energy curve is elegant and powerful. It directly targets a fundamental weakness of many classic image processing techniques: their disregard for spatial context. The intuition that a representation based on spatial correlation is better suited to a spatial task like segmentation is very strong.
- Thoroughness of Evaluation: The experimental design is a major strength of this paper. The authors did not just compare their method to one or two baselines; they compared it against a full spectrum of related techniques, allowing them to precisely demonstrate the benefits of both the energy curve and the 3-D framework.
- Clear and Significant Results: The performance gains are not marginal. The consistent and often large improvements across multiple metrics provide compelling evidence for the method's effectiveness.
- Potential Issues and Areas for Improvement:
- Efficiency is a Major Hurdle: While the quality of segmentation is excellent, the reported CPU times (often several minutes per image) make the method impractical for many applications. Future work must address this, perhaps through GPU acceleration, algorithmic optimizations for energy curve calculation, or by finding a less costly proxy for spatial energy.
- Clarity on Complexity Analysis: The paper briefly mentions that the 3-D Otsu function used reduces time complexity from to . This claim is not fully elaborated. The independent optimization of the three dimensions reduces the search space significantly compared to a true 3D search, but the overall complexity is still heavily dependent on the multilevel search, which is not . Given the very high runtimes in Table VI, a more detailed complexity analysis would have been beneficial.
- Generalization of the Energy Function: The specific energy function used is based on a Hopfield network model. It would be interesting to investigate whether other methods of quantifying spatial correlation could yield similar or better results, and how sensitive the algorithm is to the choice of the neighborhood size () and the energy function itself.
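The complexity concern above is driven largely by the multilevel search. As a minimal 1-D sketch (not the paper's 3-D formulation), the function below maximizes Otsu's between-class variance $\sum_k \omega_k (\mu_k - \mu_T)^2$ by exhaustive search over k thresholds on any nonnegative 1-D distribution, whether an intensity histogram or an energy curve. Prefix sums make each candidate cost O(k), but there are C(L-1, k) candidates. The name `multilevel_otsu` and the class convention (class j covers levels t_{j-1}+1 through t_j) are assumptions.

```python
import itertools
import numpy as np


def multilevel_otsu(dist, k):
    """Exhaustive k-threshold Otsu search on a 1-D distribution.

    `dist` may be a histogram or an energy curve (any nonnegative weights).
    Class j covers levels t_{j-1}+1 .. t_j; returns (thresholds, variance).
    """
    p = np.asarray(dist, dtype=np.float64)
    p = p / p.sum()
    levels = np.arange(len(p))
    # Prefix sums: omega[i] = sum_{l<i} p_l, mu[i] = sum_{l<i} l * p_l
    omega = np.concatenate(([0.0], np.cumsum(p)))
    mu = np.concatenate(([0.0], np.cumsum(levels * p)))
    mu_total = mu[-1]
    best_t, best_var = None, -1.0
    for t in itertools.combinations(range(len(p) - 1), k):
        edges = (0, *(ti + 1 for ti in t), len(p))
        var = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            w = omega[hi] - omega[lo]  # class probability
            if w > 0:
                mean = (mu[hi] - mu[lo]) / w  # class mean level
                var += w * (mean - mu_total) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t, float(best_var)
```

With 256 levels and k = 5 there are `math.comb(255, 5)`, roughly 8.6 × 10^9, candidate threshold sets, so exhaustive search like this is only practical for checking small cases; a faster search strategy is needed at realistic sizes.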