Abstract

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 51, NO. 5, MAY 2021

Spatial Context Energy Curve-Based Multilevel 3-D Otsu Algorithm for Image Segmentation

Ashish Kumar Bhandari, Anurag Singh, and Immadisetty Vinod Kumar

Abstract: While yielding satisfactory segmentation results for images with low SNR and poor contrast, one-dimensional (1-D) and two-dimensional (2-D) Otsu's thresholding methods have the downside of high computational complexity. So far, three-dimensional (3-D) Otsu method has been based on histogram, which has only probability distribution of pixels as an object of interest. Histogram-based segmentation methods do not consider the contextual information which is significant to enrich the quality of segmented image. In this paper, a context-…

1. Bibliographic Information

1.1. Title

Spatial Context Energy Curve-Based Multilevel 3-D Otsu Algorithm for Image Segmentation

1.2. Authors

The authors of this paper are Ashish Kumar Bhandari, Anurag Singh, and Immadisetty Vinod Kumar.

Ashish Kumar Bhandari: At the time of publication, he was an Assistant Professor at the National Institute of Technology Patna, India. His Ph.D. is in digital image processing, and his research focuses on image enhancement, segmentation, denoising, and soft computing techniques.
Anurag Singh: He was an Assistant Professor at the International Institute of Information Technology Naya Raipur, India. His Ph.D. is in biomedical signal processing, with research interests in biomedical signal and image processing.
Immadisetty Vinod Kumar: He received his B.Tech. degree from the National Institute of Technology Patna, India. His research interests include image segmentation using multilevel thresholding.

1.3. Journal/Conference

The paper was published in IEEE Transactions on Systems, Man, and Cybernetics: Systems. This is a prestigious, high-impact journal published by the Institute of Electrical and Electronics Engineers (IEEE). It focuses on systems engineering, including theory, analysis, and applications. Publication in this journal indicates that the work has undergone a rigorous peer-review process and is considered a significant contribution to the field.

1.4. Publication Year

The paper was formally published in Volume 51, Issue 5, in May 2021. The metadata indicates an initial online publication date of June 4, 2019.

1.5. Abstract

The abstract introduces the problem that while one-dimensional (1-D) and two-dimensional (2-D) Otsu's thresholding methods provide good segmentation for noisy and low-contrast images, they suffer from high computational complexity. It notes that existing three-dimensional (3-D) Otsu methods are histogram-based and thus ignore spatial context, which is crucial for high-quality segmentation. The authors propose a novel context-based 3-D Otsu algorithm that utilizes a "spatial context energy curve" instead of a histogram. This approach considers pixel intensity, spatial information, and histogram properties simultaneously. The proposed method is extensively evaluated and compared with histogram-based 1-D, 2-D, 3-D Otsu, and energy-based 1-D, 2-D Otsu methods. The experimental results demonstrate the superiority of the proposed energy-based 3-D Otsu algorithm, showing improved performance across multiple metrics like Mean Error (ME), Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Feature Similarity Index (FSIM), Structural Similarity Index (SSIM), and Entropy.

1.6. Original Source Link

The provided link /files/papers/692bb5a74114e99a4cde875b/paper.pdf points to a local file. The paper is officially published and can be accessed through the IEEE Xplore digital library.

2. Executive Summary

2.1. Background & Motivation

Core Problem: The paper addresses the challenge of image segmentation, a fundamental process in digital image processing where an image is partitioned into multiple meaningful regions or segments. A popular technique for this is thresholding, which classifies pixels based on their intensity values.
Specific Challenges:
1. Ignoring Spatial Context: Traditional thresholding methods, including the widely used Otsu's method, rely on the image's histogram. A histogram only records the frequency of pixel intensities, completely disregarding the spatial relationships between pixels. This makes them vulnerable to noise and can lead to inaccurate segmentation, as they treat a pixel in a smooth region the same as a pixel on an edge if they have the same intensity.
2. Computational Complexity: As thresholding is extended to multiple levels (multilevel thresholding) to segment an image into more than two regions, the search space for optimal thresholds grows exponentially. This makes methods like 1-D and 2-D Otsu computationally expensive. While 3-D Otsu methods were introduced to incorporate more spatial information (like neighborhood mean and median), they often exacerbate the complexity problem.
Paper's Entry Point: The authors identify the histogram as the primary weakness. Their innovative idea is to replace the intensity histogram with a "Spatial Context Energy Curve." This "energy curve" is not based on pixel counts but on the spatial correlation between a pixel and its neighbors. By applying the powerful 3-D Otsu framework to this richer, context-aware representation, the paper aims to achieve more accurate and robust image segmentation, especially for complex color images.

2.2. Main Contributions / Findings

Main Contributions:
1. A Novel Algorithm: The primary contribution is the proposal of a new multilevel segmentation algorithm, the "Spatial Context Energy Curve-Based Multilevel 3-D Otsu Algorithm" (3D Otsu-Energy).
2. Integration of Energy Curve: This is the first work to integrate the concept of a spatial energy curve with the 3-D Otsu objective function. This novel combination leverages the spatial information encoded in the energy curve to guide the powerful multi-feature thresholding of 3-D Otsu.
3. Comprehensive Evaluation: The paper provides an exhaustive experimental comparison of the proposed method against a suite of five baseline algorithms: histogram-based 1-D, 2-D, and 3-D Otsu, as well as energy curve-based 1-D and 2-D Otsu.
Key Findings:
- The proposed 3D Otsu-Energy algorithm is quantitatively superior to all compared methods. It consistently achieves better scores across a wide range of image quality metrics, including PSNR, SSIM, FSIM, and Entropy, indicating that the segmented images are more faithful to the original images in terms of structure, features, and information content.
- The method is also qualitatively superior, producing segmented images that are visually more appealing, with better-defined regions and clearer separation between objects and background.
- The use of the energy curve effectively addresses the limitation of histogram-based methods by incorporating spatial context, leading to more robust and accurate segmentation results.

3.1. Foundational Concepts

Image Segmentation: This is the process of partitioning a digital image into multiple segments (sets of pixels, also known as super-pixels). The goal of segmentation is to simplify or change the representation of an image into something that is more meaningful and easier to analyze. For example, in a medical image, segmentation could be used to identify and isolate a tumor from the surrounding healthy tissue.
Thresholding: A simple yet powerful method of image segmentation. In its most basic form, bi-level thresholding, a single threshold value is chosen. All pixels with intensity values above the threshold are assigned to one class (e.g., white), and all pixels below are assigned to another class (e.g., black). This is effective for separating a foreground object from a background. Multilevel thresholding extends this by using multiple thresholds to partition the image into several distinct classes or regions.
Image Histogram: A histogram is a bar graph that represents the frequency distribution of intensity values in an image. The x-axis represents the intensity levels (e.g., 0 to 255 for an 8-bit grayscale image), and the y-axis represents the number of pixels at each intensity level. Histograms provide a global description of an image's appearance but contain no information about the spatial location of pixels.
Otsu's Method: A classic and highly effective automatic thresholding technique proposed by Nobuyuki Otsu in 1979. Its core idea is to find the threshold that best separates the pixels into two classes (foreground and background). It does this by searching for the threshold that maximizes the between-class variance. Because the total variance of the image is fixed, maximizing the between-class variance is equivalent to minimizing the weighted within-class variance. This leads to classes that are as distinct from each other and as internally homogeneous as possible. The between-class variance, $\sigma_B^2$, for a single threshold $t$ is calculated as: $ \sigma_B^2(t) = \omega_0(t)(\mu_0(t) - \mu_T)^2 + \omega_1(t)(\mu_1(t) - \mu_T)^2 $ where:
- $\omega_0, \omega_1$ are the probabilities of the two classes (background and foreground).
- $\mu_0, \mu_1$ are the mean intensity levels of the two classes.
- $\mu_T$ is the total mean intensity of the entire image.

Otsu's method exhaustively checks every possible threshold $t$ and chooses the one that gives the maximum $\sigma_B^2(t)$.
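The bi-level search described above can be sketched in a few lines. This is a minimal illustration rather than the paper's implementation: the function name `otsu_threshold` and the toy histogram are assumptions made for the example, and class 0 is taken to hold levels up to and including $t$.

```python
def otsu_threshold(hist):
    """Return (t, variance) where t maximizes the between-class variance.

    Class 0 holds levels 0..t, class 1 holds levels t+1..len(hist)-1.
    """
    total = sum(hist)
    probs = [h / total for h in hist]
    mu_total = sum(i * p for i, p in enumerate(probs))

    best_t, best_var = 0, -1.0
    w0 = 0.0    # cumulative probability of class 0
    sum0 = 0.0  # cumulative first moment of class 0
    for t in range(len(probs) - 1):
        w0 += probs[t]
        sum0 += t * probs[t]
        w1 = 1.0 - w0
        if w0 <= 0.0 or w1 <= 0.0:
            continue  # one class is empty, so the split is meaningless
        mu0 = sum0 / w0
        mu1 = (mu_total - sum0) / w1
        var_b = w0 * (mu0 - mu_total) ** 2 + w1 * (mu1 - mu_total) ** 2
        if var_b > best_var:
            best_t, best_var = t, var_b
    return best_t, best_var


# Toy bimodal histogram over 8 intensity levels (hypothetical data).
# Levels 3 and 4 are empty, so t = 2, 3, and 4 describe the same split.
hist = [5, 9, 4, 0, 0, 3, 8, 6]
t, var_b = otsu_threshold(hist)
```

The running sums `w0` and `sum0` avoid recomputing class statistics for each candidate, which keeps the scan linear in the number of levels.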

3.2. Previous Works

1-D Otsu: This is the original method described above. It operates on a 1-D histogram of pixel gray levels. While simple and fast, it is sensitive to noise and intensity inhomogeneities, as it ignores all spatial information.
2-D Otsu: To improve robustness, researchers proposed 2-D Otsu. This method uses a 2-D histogram. Instead of just considering the pixel's intensity, it also considers a second feature, which is typically the average intensity of the pixel's local neighborhood. By doing so, it incorporates some spatial context. Pixels on an edge will have a very different neighborhood average than pixels in a smooth region, even if their own intensity is the same. This makes 2-D Otsu more resilient to noise.
3-D Otsu: This is a further extension that uses a 3-D histogram. It adds a third feature to the pixel intensity and neighborhood mean. This third feature is often the median intensity of the pixel's neighborhood. The median is less sensitive to outliers (noise) than the mean, adding another layer of robustness. However, building and searching a 3-D histogram is computationally very expensive: even for a single threshold, an exhaustive search must examine on the order of $L^3$ candidate threshold triples, where $L$ is the number of intensity levels. The paper mentions that faster versions have been developed to reduce this complexity.
Energy Curve: The concept of an energy curve for segmentation was proposed in prior work [19] as an alternative to the histogram. It is derived from a model inspired by Hopfield neural networks. The "energy" of an image at a certain intensity level is calculated based on the spatial correlation between neighboring pixels. The resulting curve tends to be smoother than a histogram, and its valleys correspond to optimal threshold values that separate distinct regions in the image. This approach inherently captures spatial contextual information.

3.3. Technological Evolution

The evolution of Otsu-based thresholding reflects a clear trend toward incorporating more spatial information to improve segmentation accuracy and robustness:

1-D Otsu (Pixel Intensity Only): The starting point. Fast but ignores context.
2-D Otsu (Intensity + Neighborhood Mean): The first step into spatial context. More robust to noise but more complex.
3-D Otsu (Intensity + Mean + Median): A more advanced incorporation of spatial context, offering even greater robustness. However, this comes at a significant computational cost.
Energy Curve-Based Otsu (This Paper): This work represents a paradigm shift. Instead of just adding more spatial features to the histogram-based framework, it replaces the histogram itself with the energy curve. This allows it to be combined with the most advanced framework (3-D Otsu) to create a method that leverages multiple, rich sources of spatial information simultaneously.

3.4. Differentiation Analysis

The core difference between this paper's approach and previous work is the fundamental data representation used for thresholding.

Previous Otsu Methods (1D, 2D, 3D): All are fundamentally histogram-based. They operate on a data structure that counts pixel occurrences. 2D and 3D Otsu try to embed spatial context by adding neighborhood statistics as extra dimensions to this histogram.
Proposed Method (3D Otsu-Energy): This method is energy-curve-based. It completely discards the intensity histogram. Instead, it computes a new 1-D representation of the image (the energy curve) where each point's value is determined by local spatial correlations. The Otsu criterion is then applied to this energy curve and its derivatives (mean and median filtered versions). This is a more profound integration of spatial context, as the very data being thresholded is inherently spatial in nature, rather than being a simple frequency count.

4. Methodology

4.1. Principles

The central principle of the proposed method is that the spatial arrangement and correlation of pixels provide more valuable information for segmentation than their intensity distribution alone. Traditional histograms ignore this spatial context. The proposed method addresses this by first transforming the image's intensity information into a "Spatial Context Energy Curve," where the value at each intensity level reflects the degree of spatial homogeneity among pixels at that level. This energy curve, being inherently context-aware, serves as a superior basis for thresholding. The method then employs a 3-D Otsu framework, which considers the energy value, the local neighborhood mean, and the local neighborhood median, to find optimal multilevel thresholds. This multi-pronged approach aims to maximize segmentation accuracy by leveraging rich spatial information at every stage.

4.2. Core Methodology In-depth (Layer by Layer)

The proposed algorithm can be broken down into two main stages: (1) calculating the energy curve, and (2) applying the 3-D Otsu criterion to find the optimal thresholds.

4.2.1. Stage 1: Energy Curve Calculation

The energy curve replaces the traditional histogram. It is calculated for an image $I$ of size $M \times N$ with a maximum intensity level of $L$ .

Step 1: Define the Neighborhood System The method first defines a spatial neighborhood system. The paper uses a second-order neighborhood, $N_{xy}^2$, for a pixel at position (x, y). This consists of the 8 pixels immediately surrounding the central pixel, as illustrated in Figure 1 of the original paper.


Step 2: Create Binary Matrices for Each Intensity Level For each possible intensity level $l$ (where $0 \le l \le L$ ), a temporary binary matrix $B_l$ of the same size as the image is created. The value of each element $b_{x,y}$ in this matrix is determined by the following rule:

$b_{x,y} = 1$ if the intensity of the pixel at (x,y) in the original image, $I_{x,y}$, is greater than $l$ ($I_{x,y} > l$).
$b_{x,y} = -1$ if $I_{x,y} \le l$.

This process is repeated for every intensity level $l$, creating $L+1$ such matrices.

Step 3: Calculate the Energy for Each Intensity Level The energy $E_l$ for each intensity level $l$ is computed using the following formula, presented as Equation (12) in the paper:

$ E_l = - \sum_{x=1}^{M} \sum_{y=1}^{N} \sum_{rs \in N_{xy}^{2}} b_{xy} \cdot b_{rs} + \sum_{x=1}^{M} \sum_{y=1}^{N} \sum_{rs \in N_{xy}^{2}} c_{xy} \cdot c_{rs} $

Let's break down this formula:

$b_{xy}$ : The value (-1 or +1) of the pixel at (x,y) in the binary matrix $B_l$ .
$b_{rs}$ : The value (-1 or +1) of a neighboring pixel within the neighborhood $N_{xy}^2$ .
$c_{xy}$ : A constant matrix where every element is 1.
First Term: $- \sum \sum \sum b_{xy} \cdot b_{rs}$. This term measures the spatial correlation.
- If a pixel $b_{xy}$ and its neighbor $b_{rs}$ have the same sign (e.g., both are +1 or both are -1), their product $b_{xy} \cdot b_{rs}$ is +1. This indicates local homogeneity at intensity level $l$.
- If they have different signs, their product is -1. This indicates a transition between the two binary classes at intensity level $l$.
The summation therefore measures the total agreement between neighboring pixels, and the leading minus sign turns agreement into low energy: $E_l$ is small when thresholding at $l$ yields spatially coherent regions with few transitions, and large when it fragments the image. The valleys of the energy curve are therefore the best candidate thresholds.
Second Term: $\sum \sum \sum c_{xy} \cdot c_{rs}$. Since $c_{xy}$ is always 1, this term simply counts the pixel-neighbor pairs. It is a constant offset that ensures the final energy value $E_l$ is always non-negative ($E_l \ge 0$).

After this process, we have a set of energy values $\{E_0, E_1, ..., E_L\}$ , which form the energy curve. This curve is used as the basis for the Otsu algorithm instead of the histogram. Figure 2 from the paper visually compares histograms and energy curves for the "Lena" image.
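The three steps above can be sketched directly from Equation (12). This is an illustrative, unoptimized version: the function name `energy_curve` and the toy image are assumptions, and neighbors falling outside the image are simply skipped because the text does not state how borders are handled.

```python
def energy_curve(image, max_level):
    """Energy E_l for l = 0..max_level with an 8-connected neighborhood.

    image: list of rows of integer intensities in [0, max_level].
    Assumption: out-of-image neighbors are ignored at the borders.
    """
    rows, cols = len(image), len(image[0])
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
               if (dx, dy) != (0, 0)]

    # Constant term: the number of (pixel, neighbor) pairs, i.e. the
    # triple sum of c_xy * c_rs with every c equal to 1.
    pair_count = sum(
        1
        for x in range(rows) for y in range(cols)
        for dx, dy in offsets
        if 0 <= x + dx < rows and 0 <= y + dy < cols
    )

    curve = []
    for level in range(max_level + 1):
        # Binary matrix B_l: +1 where intensity > level, otherwise -1.
        b = [[1 if v > level else -1 for v in row] for row in image]
        correlation = 0
        for x in range(rows):
            for y in range(cols):
                for dx, dy in offsets:
                    r, s = x + dx, y + dy
                    if 0 <= r < rows and 0 <= s < cols:
                        correlation += b[x][y] * b[r][s]
        curve.append(-correlation + pair_count)
    return curve


# Hypothetical image: a textured dark region (values 1 and 2) next to
# a textured bright region (values 5 and 6).
image = [
    [1, 2, 5, 6],
    [2, 1, 6, 5],
    [1, 2, 5, 6],
    [2, 1, 6, 5],
]
curve = energy_curve(image, max_level=7)
```

In this toy case the curve peaks at the levels that cut through the texture of each region (levels 1 and 5) and dips at the levels in between, where the binary matrix splits cleanly into two coherent halves. The energy is zero at the extreme levels, where every entry of the binary matrix has the same sign.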

The following figure (Figure 2 from the original paper) illustrates the difference between an energy curve and a histogram, and the resulting segmentation.

Original caption: Fig. 2. (a) Lena image. Energy curve of the (b) first frame, (c) second frame, and (d) third frame. (e) 3-D Otsu-based thresholded image at 5-level using energy curve. Histogram of the (f) first frame, (g) second frame, and (h) third frame. (i) 3-D Otsu-based thresholded image at 5-level using histogram concept.

4.2.2. Stage 2: Energy-Based 3-D Otsu for Multilevel Thresholding

Once the energy curve is computed, the 3-D Otsu algorithm is applied to find the optimal set of $n$ thresholds. This algorithm operates on three feature dimensions.

Step 1: Define the Three Feature Dimensions For each pixel at position (x, y), three features are considered:

f(x, y): The energy value of the pixel, derived from the energy curve.
g(x, y): The mean gray value of its $k \times k$ neighborhood (the paper uses $k=3$ ).
h(x, y): The median gray value of its $k \times k$ neighborhood.

The mean g(x,y) and median h(x,y) are calculated using the formulas from Equation (14): $ g(x, y) = \frac{1}{k^2} \sum_{i=-(k-1)/2}^{(k-1)/2} \sum_{j=-(k-1)/2}^{(k-1)/2} f(x+i, y+j) $ $ h(x, y) = \mathrm{med} \left\{ f(x+i, y+j) : i, j = -\frac{k-1}{2}, \ldots, \frac{k-1}{2} \right\} $
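A small sketch of the two neighborhood features follows. The function name `neighborhood_features` is hypothetical, and border pixels average only over the neighbors that exist, which is an assumption: Equation (14) divides by $k^2$ and does not say how borders are treated.

```python
from statistics import median


def neighborhood_features(f, k=3):
    """Return (g, h): the k x k neighborhood mean and median of f.

    Assumption: at the borders only in-image neighbors are used, so the
    mean divides by the actual window size rather than k * k.
    """
    rows, cols = len(f), len(f[0])
    half = (k - 1) // 2
    g = [[0.0] * cols for _ in range(rows)]
    h = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            window = [
                f[x + i][y + j]
                for i in range(-half, half + 1)
                for j in range(-half, half + 1)
                if 0 <= x + i < rows and 0 <= y + j < cols
            ]
            g[x][y] = sum(window) / len(window)
            h[x][y] = median(window)
    return g, h


# Hypothetical 3 x 3 patch with a single outlier in the centre.
f = [
    [10, 10, 10],
    [10, 90, 10],
    [10, 10, 10],
]
g, h = neighborhood_features(f)
```

At the centre pixel the mean is pulled upward by the outlier while the median stays at the background value, which is the robustness argument for using the median as the third feature.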

Step 2: Compute Probability Distributions The algorithm computes three separate 1-D probability distributions, one for each feature dimension. Let $E_i$ be the value of a feature. The probability of its occurrence is given by Equation (15): $ \left\{ \begin{array}{ll} P_{E_i}^{(f)} = \frac{E_i}{N}, & P_{E_i}^{(f)} \ge 0 \\ P_{E_j}^{(g)} = \frac{E_j}{N}, & P_{E_j}^{(g)} \ge 0 \\ P_{E_k}^{(h)} = \frac{E_k}{N}, & P_{E_k}^{(h)} \ge 0 \end{array} \right. $ where $N$ is the total number of pixels.

Step 3: Maximize Between-Class Variance for Each Dimension Independently The core of the 3-D Otsu method used here is to find the optimal thresholds for each of the three dimensions (f, g, h) independently. This is a crucial simplification that reduces the computational complexity significantly. The goal is to find $n$ thresholds that divide the pixels into $n+1$ classes.

The between-class variance for each dimension is given by Equation (16): $ \left\{ \begin{array}{l} \sigma_{(f)}^2 = \sum_{c=1}^{n+1} \omega_c^{(f)} \left( \mu_c^{(f)} - \mu_T^{(f)} \right)^2 \\ \sigma_{(g)}^2 = \sum_{c=1}^{n+1} \omega_c^{(g)} \left( \mu_c^{(g)} - \mu_T^{(g)} \right)^2 \\ \sigma_{(h)}^2 = \sum_{c=1}^{n+1} \omega_c^{(h)} \left( \mu_c^{(h)} - \mu_T^{(h)} \right)^2 \end{array} \right. $ where for each dimension (e.g., $f$):

$\omega_c^{(f)}$ is the probability of class $c$ .
$\mu_c^{(f)}$ is the mean value of class $c$ .
$\mu_T^{(f)}$ is the total mean value for that dimension.

These terms are calculated using Equations (17), (18), and (19) for each class $c$ defined by thresholds (e.g., $r_{c-1}$ to $r_c$ for the energy dimension).

The algorithm then finds the optimal set of thresholds for each dimension by maximizing its respective between-class variance, as shown in Equation (20): $ \left\{ \begin{array}{l} \{r_1^*, \dots, r_n^*\} = \arg\max_{0 \le r_1 < \dots < r_n \le L-1} \left\{ \sigma_{(f)}^2(r_1, \dots, r_n) \right\} \\ \{s_1^*, \dots, s_n^*\} = \arg\max_{0 \le s_1 < \dots < s_n \le L-1} \left\{ \sigma_{(g)}^2(s_1, \dots, s_n) \right\} \\ \{t_1^*, \dots, t_n^*\} = \arg\max_{0 \le t_1 < \dots < t_n \le L-1} \left\{ \sigma_{(h)}^2(t_1, \dots, t_n) \right\} \end{array} \right. $ This results in three sets of optimal thresholds: one for energy $\{r^*\}$, one for neighborhood mean $\{s^*\}$, and one for neighborhood median $\{t^*\}$.
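The per-dimension maximization can be sketched as a brute-force search over threshold combinations. This is a minimal illustration under stated assumptions, not the paper's optimizer: the name `multilevel_otsu` is hypothetical, the input curve is normalized by its own sum so that the weights add to 1, and class $c$ is taken to cover the levels in $(r_{c-1}, r_c]$. The same routine would be called once per dimension.

```python
from itertools import combinations


def multilevel_otsu(curve, n):
    """Exhaustively find n thresholds maximizing between-class variance.

    curve: non-negative weight per level (histogram counts or energies).
    Assumption: weights are normalized by their sum, and class c holds
    the levels in (r_{c-1}, r_c]; the last class runs to the final level.
    """
    total = sum(curve)
    p = [v / total for v in curve]
    levels = len(p)
    mu_total = sum(i * pi for i, pi in enumerate(p))

    best, best_var = None, -1.0
    for thresholds in combinations(range(levels - 1), n):
        bounds = (-1,) + thresholds + (levels - 1,)
        var_b = 0.0
        for lo, hi in zip(bounds, bounds[1:]):
            w = sum(p[lo + 1:hi + 1])
            if w == 0.0:
                continue  # an empty class contributes nothing
            mu = sum(i * p[i] for i in range(lo + 1, hi + 1)) / w
            var_b += w * (mu - mu_total) ** 2
        if var_b > best_var:
            best, best_var = thresholds, var_b
    return best, best_var


# Hypothetical curve with three equal spikes at levels 0, 3, and 6.
curve = [10, 0, 0, 10, 0, 0, 10]
thresholds, var_b = multilevel_otsu(curve, n=2)
```

The number of candidate combinations grows combinatorially with both the number of levels and the number of thresholds, which is why treating the three dimensions independently, rather than searching a joint 3-D space, matters for run time.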

Step 4: Combine Thresholds and Segment the Image The final set of $n$ thresholds is obtained by averaging the corresponding thresholds from the three dimensions: $ \mathrm{Th}_k = (r_k^* + s_k^* + t_k^*) / 3, \quad \text{for } k = 1, \dots, n $

These final thresholds are then applied to the original grayscale image $I_G$ to produce the segmented image $I_{sg}$ according to the rule in Equation (13): $ I_{sg}(r, c) = \left\{ \begin{array}{ll} I_G(r, c) & \text{if } I_G(r, c) \le \mathrm{Th}_1 \\ \mathrm{Th}_{k-1} & \text{if } \mathrm{Th}_{k-1} < I_G(r, c) \le \mathrm{Th}_k, \quad k=2, \dots, n \\ I_G(r, c) & \text{if } I_G(r, c) > \mathrm{Th}_n \end{array} \right. $ This process is illustrated by the flowchart in Figure 3 of the paper.
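The averaging and segmentation rules can be sketched as follows. The function names and the threshold values are hypothetical, and the code applies the piecewise rule exactly as written above; whether the averaged thresholds are rounded to integers is not stated, so they are left as floats here.

```python
def combine_thresholds(r, s, t):
    """Average corresponding thresholds from the three dimensions."""
    return [(rk + sk + tk) / 3 for rk, sk, tk in zip(r, s, t)]


def segment(image, th):
    """Apply the piecewise rule as stated in the text.

    A pixel at or below th[0], or above th[-1], keeps its value.
    A pixel in (th[k-1], th[k]] is replaced by th[k-1].
    """
    out = []
    for row in image:
        new_row = []
        for v in row:
            label = v
            for lower, upper in zip(th, th[1:]):
                if lower < v <= upper:
                    label = lower
                    break
            new_row.append(label)
        out.append(new_row)
    return out


# Hypothetical per-dimension thresholds for n = 3.
th = combine_thresholds([60, 120, 180], [63, 126, 186], [57, 114, 174])
segmented = segment([[10, 70, 130, 200]], th)
```

With these inputs the averaged thresholds are 60, 120, and 180, so the pixel values 70 and 130 are mapped to 60 and 120, while 10 and 200 fall outside the middle bands and keep their original values.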

The following figure (Figure 3 from the original paper) shows the flowchart of the proposed method.

Original caption: Fig. 3. Flowchart of proposed method.

5. Experimental Setup

5.1. Datasets

The authors used two standard benchmark datasets for their experiments to ensure the results are robust and generalizable.

Berkeley Segmentation Dataset and Benchmark (BSDS500): This is a widely-used and challenging dataset containing 500 natural images with human-annotated ground truth segmentations. Its diversity in content (animals, landscapes, people) makes it an excellent benchmark for evaluating segmentation algorithms.
Kodak Lossless True Color Image Suite (Kodim): This dataset consists of high-quality, uncompressed color images. It is often used in image processing research to test algorithms on clean, artifact-free data.

For detailed quantitative analysis in Tables II-VI, the authors selected ten diverse color images (shown in Figure 4) of size $256 \times 256$ pixels from these datasets. For the broader evaluation using PRI, BDE, GCE, and VoI metrics, the full BSDS500 dataset was used.

The following figure (Figure 4 from the original paper) shows the ten test images used for detailed evaluation.

Original caption: Fig. 4. (a)-(j) Original test images [23], [24].

5.2. Evaluation Metrics

The paper employs a comprehensive set of metrics to evaluate the quality of the segmented images from different perspectives.

Mean Error (ME):
- Conceptual Definition: ME measures the average absolute difference in intensity between the pixels of the original image and the segmented image. A lower ME value indicates that the segmented image is a closer representation of the original.
- Mathematical Formula: The formula in the paper's Table I is not legible in the extracted text. The standard formula for mean absolute error, which is likely intended, is: $ ME = \frac{1}{J \times K} \sum_{i=1}^{J} \sum_{j=1}^{K} |I(i,j) - I_s(i,j)| $
- Symbol Explanation:
  - J, K: The dimensions (height and width) of the image.
  - I(i,j): The intensity of the pixel at position (i,j) in the original image.
  - $I_s(i,j)$ : The intensity of the pixel at position (i,j) in the segmented image.
Mean Square Error (MSE):
- Conceptual Definition: MSE calculates the average of the squares of the intensity differences between the original and segmented images. It penalizes larger errors more heavily than ME. A lower MSE is better.
- Mathematical Formula: $ MSE = \frac{1}{J \times K} \sum_{i=1}^{J} \sum_{j=1}^{K} (I(i,j) - I_s(i,j))^2 $
- Symbol Explanation: Same as for ME.
Peak Signal-to-Noise Ratio (PSNR):
- Conceptual Definition: PSNR measures the ratio between the maximum possible power of a signal (the image) and the power of corrupting noise that affects the fidelity of its representation. It is widely used to measure the quality of reconstructed or compressed images. A higher PSNR value indicates better quality.
- Mathematical Formula: $ PSNR = 20 \log_{10} \left( \frac{MAX_I}{\sqrt{MSE}} \right) $
- Symbol Explanation:
  - $MAX_I$ : The maximum possible pixel value of the image (e.g., 255 for an 8-bit image).
  - MSE: The Mean Square Error calculated above.
Structural Similarity Index (SSIM):
- Conceptual Definition: SSIM is a perceptual metric that measures the similarity between two images by considering changes in structural information, luminance, and contrast. It is considered to be more aligned with human visual perception than MSE or PSNR. SSIM values range from -1 to 1, where 1 indicates perfect similarity.
- Mathematical Formula: $ SSIM(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} $
- Symbol Explanation:
  - $\mu_x, \mu_y$ : The mean of images $x$ and $y$ .
  - $\sigma_x^2, \sigma_y^2$ : The variance of images $x$ and $y$ .
  - $\sigma_{xy}$ : The covariance of $x$ and $y$ .
  - $c_1, c_2$ : Small constants to stabilize the division.
Feature Similarity Index (FSIM):
- Conceptual Definition: FSIM evaluates image quality based on the premise that the human visual system understands an image mainly according to its low-level features. It measures similarity using phase congruency (a measure of edge and corner significance) and gradient magnitude. Higher FSIM values (closer to 1) indicate better feature preservation.
- Mathematical Formula: The paper provides a simplified version in Table I. The general form is: $ FSIM = \frac{\sum_{x \in \Omega} S_L(x) \cdot PC_m(x)}{\sum_{x \in \Omega} PC_m(x)} $
- Symbol Explanation:
  - $\Omega$ : The entire image domain.
  - $S_L(x)$ : A similarity score at pixel $x$ combining phase congruency and gradient magnitude.
  - $PC_m(x)$ : The maximum phase congruency value at pixel $x$ , used as a weighting factor.
Entropy:
- Conceptual Definition: In image processing, entropy is a statistical measure of randomness that can be used to characterize the texture or information content of an image. A higher entropy value in the segmented image suggests that more information from the original image has been preserved.
- Mathematical Formula: $ H = - \sum_{i=0}^{L-1} p(i) \log_2 p(i) $
- Symbol Explanation:
  - L-1: The maximum intensity level.
  - p(i): The probability of occurrence of intensity level $i$ .
Other Metrics (PRI, VoI, GCE, BDE): For the larger dataset evaluation, the paper also uses Probabilistic Rand Index (PRI, higher is better), Variation of Information (VoI, lower is better), Global Consistency Error (GCE, lower is better), and Boundary Displacement Error (BDE, lower is better). These are standard metrics for comparing a computed segmentation against a ground-truth segmentation.
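The pixel-wise metrics defined above (ME, MSE, PSNR, and Entropy) follow directly from their formulas. The sketch below is an illustration on a hypothetical 2 x 2 image pair with 8-bit intensities; the function names are assumptions, and the paper's reported values may use a different intensity scale or normalization.

```python
import math


def mean_error(a, b):
    """Mean absolute intensity difference between two equal-size images."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def mean_square_error(a, b):
    """Mean squared intensity difference between two equal-size images."""
    sq = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(sq) / len(sq)


def psnr(a, b, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = mean_square_error(a, b)
    if mse == 0:
        return math.inf
    return 20 * math.log10(max_value / math.sqrt(mse))


def entropy(image, levels=256):
    """Shannon entropy (bits) of the intensity distribution."""
    counts = [0] * levels
    for row in image:
        for v in row:
            counts[v] += 1
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


# Hypothetical original and segmented images.
original = [[0, 100], [200, 255]]
segmented = [[0, 90], [210, 255]]

me = mean_error(original, segmented)
mse = mean_square_error(original, segmented)
quality_db = psnr(original, segmented)
h_seg = entropy(segmented)
```

Because PSNR is a monotone function of MSE for a fixed peak value, the two metrics always rank methods identically on the same image; they are reported together mainly for readability.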

5.3. Baselines

The proposed method, 3D Otsu-Energy, was compared against a comprehensive set of five state-of-the-art and foundational methods to demonstrate its superiority.

Histogram-based methods:
1. 1-D Otsu: The classic algorithm using only the intensity histogram.
2. 2-D Otsu: The extension using a 2D histogram of intensity and neighborhood mean.
3. 3-D Otsu: The advanced version using a 3D histogram of intensity, mean, and median.

Energy curve-based methods:
4. 1-D Otsu-Energy: A baseline created by applying the 1-D Otsu criterion to the energy curve.
5. 2-D Otsu-Energy: A baseline created by applying the 2-D Otsu criterion to the energy curve and neighborhood mean.

This selection of baselines allows for a thorough analysis, isolating the benefits of using the energy curve (by comparing histogram vs. energy versions) and the benefits of using a higher-dimensional framework (by comparing 1D vs. 2D vs. 3D versions).

6. Results & Analysis

6.1. Core Results Analysis

The paper presents a detailed quantitative and qualitative analysis of the experimental results. The proposed 3D Otsu-Energy method consistently outperforms the five baseline methods across various metrics and threshold levels.

6.1.1. Quantitative Analysis

The results are presented in Tables III, IV, V, and VI for the ten test images, and in Tables VII-X for the full BSDS500 dataset.

Information Preservation (Entropy) and Error (ME): As shown in Table III, the proposed 3D Otsu-Energy method consistently yields the highest Entropy and the lowest Mean Error (ME) for almost all test images and at all threshold levels (L=2, 3, 5, 8). For example, for Image 1 at L=8, the proposed method achieves an Entropy of 4.3062 (compared to 4.0323 for 3D Otsu and 3.6391 for 2D Otsu-Energy) and an ME of 0.0352 (compared to 0.0577 for 3D Otsu and 0.0520 for 2D Otsu-Energy). This indicates superior information preservation and accuracy.
Image Fidelity (MSE and PSNR): Table IV demonstrates that 3D Otsu-Energy achieves the lowest MSE and consequently the highest PSNR. A higher PSNR value signifies that the segmented image is a higher-fidelity representation of the original. For Image 2 at L=8, the PSNR for the proposed method is 27.3898 dB, surpassing all other methods, including the next best (2D Otsu-Energy at 27.1142 dB).
Perceptual Quality (SSIM and FSIM): Table V, which reports on metrics more aligned with human perception, further solidifies the superiority of the proposed method. It achieves the highest SSIM and FSIM values in nearly every case. For Image 3 at L=8, the FSIM is 0.9246, a notable improvement over the 0.8874 from histogram-based 3D Otsu. This means the structural and feature-level details are better preserved.
Computational Time (CPU Time): Table VI reveals the primary trade-off of the proposed method. The energy-based methods are significantly slower than the histogram-based ones due to the overhead of calculating the energy curve. Furthermore, the 3D Otsu-Energy method is the most computationally intensive of all. For example, for Image 1 at L=5, it takes 237.98 seconds, whereas the histogram-based 3D Otsu takes only 12.81 seconds. This highlights a trade-off between segmentation quality and computational efficiency.
Performance on BSDS500 Dataset: The results on the larger dataset (Tables VII-X) confirm these findings. The 3D Otsu-Energy method achieves the best average scores across all metrics: highest PRI (0.6932 at L=8) and lowest BDE (8.056 at L=8), GCE (0.4592 at L=8), and VoI (3.4789 at L=8). This demonstrates its robustness and generalizability.

6.1.2. Qualitative Analysis

The paper provides visual results in Figures 5, 6, 7, and a summary comparison in Figure 8.

The following figure (Figure 8 from the original paper) provides a direct visual comparison of the segmentation results from all six methods at a 5-level threshold.

Original caption: Fig. 8. Visual comparison of segmentation results at 5-level of thresholding for 1-D Otsu, 2-D Otsu, 3-D Otsu, 1-D Otsu-Energy, 2-D Otsu-Energy, and 3-D Otsu-Energy methods, respectively.

As seen in Figure 8, the results from the proposed 3D Otsu-Energy method (the rightmost column) are visually superior.

Clarity and Detail: In images like the bird and the dog, the proposed method provides a much clearer distinction between the subject and the background. The regions are well-delimited, and fine details are better preserved.
Region Homogeneity: The histogram-based methods, particularly 1-D Otsu, often produce noisy or fragmented regions. In contrast, the 3D Otsu-Energy results show more uniform and consistent regions, because the incorporated spatial context helps group neighboring pixels correctly.
Visual Effect: The segmented images have a more pleasant visual effect with better gray gradation, making the different segmented levels easier to discriminate.

In summary, both the quantitative data and the visual evidence strongly support the claim that the proposed Spatial Context Energy Curve-Based 3-D Otsu Algorithm provides a more accurate and robust solution for multilevel image segmentation compared to existing methods.

6.2. Data Presentation (Tables)

The following are the results from Table II of the original paper (histogram-based and energy-based threshold values for each test image and threshold count L):

Test Image	L	1D Otsu [5, 7, 9, 12]	2D Otsu [14-15, 17]	3D Otsu [22]	1D Otsu-Energy	2D Otsu-Energy	3D Otsu-Energy
1	2	88 137	119 179	85 146	68 142	127 169	141 194
	3	83 108 152	61 80 152	75 127 216	39 82 133	109 115 164	91 114 195
	5	56 89 108 152 166	56 73 132 176 203	12 73 113 155 181	33 63 98 145 218	38 72 85 135 163	46 79 157 207 214
	8	7 86 113 123 152 173 245 245	58 89 130 143 156 165 184 204	46 61 100 116 136 174 194 219	24 36 62 92 127 157 178 182	27 47 50 85 114 153 160 177	30 85 96 98 147 163 187 207
2	2	65 139	60 140	37 107	84 153	96 154	116 208
	3	50 109 161	51 105 151	63 163 188	62 118 174	61 119 177	95 116 198
	5	18 54 100 128 190	14 29 124 186 248	28 69 84 102 114	38 58 96 136 175	78 98 105 143 180	27 59 102 143 186
	8	26 52 52 75 79 124 178 216	40 69 98 150 185 223 239 245	9 26 48 59 125 148 171 210	30 59 82 106 131 166 178 186	25 32 51 59 78 98 123 163	33 65 88 142 152 186 197 232
3	2	74 129	57 141	81 139	63 122	61 150	50 106
	3	55 80 115	43 89 193	99 130 213	47 78 151	51 103 151	91 165 205
	5	14 48 73 109 135	41 82 128 173 199	42 63 107 147 196	30 53 82 110 183	22 57 95 139 170	69 125 164 182 218
	8	39 60 69 84 99 125 127 147	42 86 96 99 132 164 185 197	42 66 95 118 130 143 186 198	20 36 54 80 96 107 128 192	39 66 70 75 80 112 126 170	44 56 72 99 129 141 168 236
4	2	67 125	101 227	77 183	76 136	85 179	97 204
	3	58 94 128	60 146 228	56 114 218	73 138 164	78 99 164	60 107 193
	5	59 92 138 177 192	44 81 152 169 189	53 101 113 156 222	62 92 103 143 182	15 47 106 141 161	23 56 89 155 219
	8	16 49 71 98 129 134 170 242	36 56 112 122 162 183 195 214	10 73 101 124 143 173 222 233	46 55 79 87 96 118 164 194	25 46 72 73 115 145 162 166	13 42 70 109 123 136 175 178
5	2	154 205	89 170	110 152	120 171	79 153	130 198
	3	93 143 206	21 39 134	86 143 182	79 147 185	50 87 147	219
	5	86 145 155 195 220	47 67 136 165 206	34 96 112 154 187	79 117 122 152 199	19 41 79 120 171	33 66 101 156 234
	8	55 85 115 149 186 191 223 239	47 104 137 149 186 206 221 234	30 57 85 94 119 149 179 199	21 55 87 134 145 173 190 195	10 38 57 80 111 125 149 162	49 109 121 158 176 193 223 239
6	2	72 164	100 136	54 162	63 142	117 179	106 230
	3	41 75 118	48 105 140	32 66 154	44 83 172	32 105 157	75 117 195
	5	19 46 67 108 218	12 83 112 137 183	56 144 169 215 239	13 36 84 128 179	37 98 132 161 187	43 59 144 179 221
	8	15 34 45 59 99 104 113 189	37 80 86 102 162 170 194 208	21 53 91 111 175 185 208 211	6 29 51 82 107 154 162 187	59 88 95 118 126 148 161 197	9 50 81 88 97 152 194 208
7	2	63 149	69 183	84 144	69 148	61 141	119 199
	3	53 96 174	82 138 198	233	62 123 168	8 111 154	106 188 210
	5	27 70 129 165 199	58 83 174 188 236	31 53 148 188 234	34 60 96 142 170	28 82 104 146 168	29 37 108 172 202
	8	36 48 79 97 148 179 214 254	33 104 116 133 150 179 209 211	34 65 101 142 189 217 238 249	22 54 76 103 120 137 155 199	29 40 68 86 97 123 133 177	62 78 84 133 138 155 187 214
8	2	69 120	52 107	97 173	73 138	88 153	56 187
	3	47 102 145	49 115 188	58 148 187	45 98 177	30 82 153	64 119 195
	5	46 64 89 139 196	27 42 104 166 201	55 96 104 152 200	33 47 82 117 168	34 59 96 155 199
	8	51 64 128 175 235	18 61 105 136 164 176 200 248	32 44 69 104 113 151 166 184	28 74 114 122 152 171 186 225	11 35 57 64 95 129 135 172	34 59 101 137 156 178 198 224
9	2	122 172	106 175	105 212	103 178	95 167	72 182
	3	96 149 186	71 139 206	76 151 237	72 135 186	79 112 168	92 177 215
	5	82 123 154 174 233	27 75 116 136 177	60 82 162 201 228	41 92 115 161 196	24 32 80 94 119 122 160 195	64 102 142 181 216
	8	23 86 129 150 171 205 205 224	10 16 45 57 107 133 162 232	35 52 83 101 134 170 207 253	27 34 92 148 172 193 208 224	21 50 89 105 140 156 180 192	25 72 87 113 145 185 214 238
10	2	126 176	144 179	125 199	118 185	131 195	116 165
	3	99 157 195	89 170 213	42 154 227	62 113 176	92 148 174	73 123 170
	5	90 140 181 196 199	79 119 123 138 173	49 138 161 196 213	64 122 141 172 184	60 98 130 154 178	23 101 138 170 228
	8	56 124 147 164 183 190 199 230	25 95 143 168 185 203 217 247	20 80 116 142 150 170 235 245	29 55 92 127 128 162 182 198	6 54 76 118 154 164 178 186	25 36 54 91 144 160 183 201

Note: Tables III-X are not transcribed here; their key results are summarized in the analysis above.
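
To make the threshold vectors in Table II concrete, the following minimal sketch shows how one such vector could be applied to a grayscale image to produce a multilevel segmentation. It is an illustration, not the authors' code: the function name `apply_thresholds`, the convention that a pixel equal to a threshold falls into the upper class, and the choice to render each class by its mean intensity are all assumptions. The example thresholds (91, 114, 195) are the 3D Otsu-Energy values for Image 1 at L=3, applied here to random data because the test images are not available.

```python
import numpy as np

def apply_thresholds(gray, thresholds):
    """Split a grayscale image into len(thresholds) + 1 classes (hypothetical helper)."""
    t = np.sort(np.asarray(thresholds))
    # np.digitize returns k such that t[k-1] <= pixel < t[k];
    # a pixel equal to a threshold lands in the upper class (assumed convention).
    labels = np.digitize(gray, t)
    # Render each class by the mean intensity of its pixels (assumed rendering).
    out = np.zeros(gray.shape, dtype=np.float64)
    for k in range(len(t) + 1):
        mask = labels == k
        if mask.any():
            out[mask] = gray[mask].mean()
    return labels, np.rint(out).astype(np.uint8)

# Stand-in image: random values, since the paper's test images are not included here.
rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
labels, segmented = apply_thresholds(gray, [91, 114, 195])
```

With three thresholds the label map contains at most four classes, which is what the visual comparisons at a given L are showing.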

6.3. Ablation Studies / Parameter Analysis

The paper does not contain a formal ablation study in a separate section. However, the entire experimental design acts as a form of ablation. By comparing 1D/2D/3D Otsu (histogram-based) with 1D/2D/3D Otsu-Energy (energy-based), the authors effectively isolate and demonstrate the impact of replacing the histogram with the energy curve. Similarly, by comparing the 1D, 2D, and 3D versions within each category, they demonstrate the incremental benefit of adding more spatial features (neighborhood mean and median). The results consistently show that both the energy curve and the higher-dimensional framework contribute to the final performance, with the combination of the two (3D Otsu-Energy) yielding the best results.
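
The three per-pixel features that separate the 1D, 2D, and 3D variants (intensity, neighborhood mean, neighborhood median) can be sketched as follows. This is a rough illustration under stated assumptions: the function name `otsu3d_features`, the 3x3 window, and reflect padding at the borders are choices made here, not details confirmed by the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def otsu3d_features(gray, k=3):
    """Return (intensity, local mean, local median) maps for a grayscale image.

    Hypothetical helper: window size k and reflect padding are assumptions.
    """
    pad = k // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="reflect")
    # One k x k window per pixel; resulting shape is (H, W, k, k).
    windows = sliding_window_view(padded, (k, k))
    local_mean = windows.mean(axis=(2, 3))
    local_median = np.median(windows, axis=(2, 3))
    return gray, np.rint(local_mean).astype(np.uint8), np.rint(local_median).astype(np.uint8)

rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
intensity, mean_map, median_map = otsu3d_features(image)
```

In this reading, a 1D method thresholds only `intensity`, a 2D method adds `mean_map`, and a 3D method uses all three maps.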

7. Conclusion & Reflections

7.1. Conclusion Summary

The paper successfully proposes and validates a novel multilevel thresholding method for color image segmentation called the Spatial Context Energy Curve-Based 3-D Otsu Algorithm. The key innovation is the replacement of the conventional intensity histogram with a spatial context energy curve, which captures crucial information about the spatial relationships between pixels. By combining this context-rich representation with a 3-D Otsu framework, the algorithm achieves segmentation results that are demonstrably superior to traditional histogram-based methods and lower-dimensional energy-based approaches. The experimental outcomes, supported by a comprehensive set of quantitative metrics (ME, MSE, PSNR, SSIM, FSIM, Entropy) and qualitative visual assessments, confirm that the proposed method produces higher quality segmented images with better-preserved information, structure, and features.
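
To illustrate what replacing the histogram with an energy curve involves, here is a minimal sketch of one formulation commonly used in the energy-curve thresholding literature. It is an assumption that this matches the paper's exact definition: for each gray level l, the image is binarized to +1 (pixel > l) or -1, and the energy is the negated sum of products over all 8-neighbor pairs plus a constant that keeps the curve non-negative. The function name `energy_curve` and the choice to ignore neighbors outside the image are also assumptions.

```python
import numpy as np

def energy_curve(gray, levels=256):
    """Spatial-context energy per gray level (assumed formulation, see lead-in)."""
    h, w = gray.shape
    offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
    energy = np.zeros(levels)
    for l in range(levels):
        b = np.where(gray > l, 1, -1)  # +1 above level l, -1 otherwise
        total = 0   # sum of b_ij * b_pq over in-image neighbor pairs
        pairs = 0   # number of such pairs (the constant term)
        for di, dj in offsets:
            # Overlapping slices pair every pixel with its neighbor at this offset.
            a = b[max(di, 0):h + min(di, 0), max(dj, 0):w + min(dj, 0)]
            n = b[max(-di, 0):h + min(-di, 0), max(-dj, 0):w + min(-dj, 0)]
            total += int((a * n).sum())
            pairs += a.size
        energy[l] = pairs - total
    return energy

# Tiny example: left half dark (0), right half bright (255).
toy = np.zeros((4, 4), dtype=np.uint8)
toy[:, 2:] = 255
curve = energy_curve(toy)
```

Even in this sketch, the loop over every gray level and every neighbor offset is visibly more work than a single histogram pass, which is consistent with the runtime overhead reported for the energy-based methods.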

7.2. Limitations & Future Work

Limitations:
- Computational Complexity: The most significant limitation, clearly shown in the experimental results (Table VI), is the high computational cost. The process of calculating the energy curve for every intensity level is intensive, making the proposed method much slower than its histogram-based counterparts. This could be a barrier to its use in real-time applications.
Future Work:
- The authors suggest that future research could focus on designing more efficient 3-D Otsu algorithms to mitigate the high computational time.
- They also propose exploring the use of the energy curve concept with other objective functions besides Otsu's method for color image multilevel thresholding.

7.3. Personal Insights & Critique

Strengths:
- Novelty and Intuition: The core idea of replacing the histogram is elegant and powerful. It directly targets a fundamental weakness of many classic image processing techniques: the disregard for spatial context. The intuition that a representation based on spatial correlation is better for a spatial task like segmentation is very strong.
- Thoroughness of Evaluation: The experimental design is a major strength of this paper. The authors did not just compare their method to one or two baselines; they compared it against a full spectrum of related techniques, allowing them to precisely demonstrate the benefits of both the energy curve and the 3-D framework.
- Clear and Significant Results: The performance gains are not marginal. The consistent and often large improvements across multiple metrics provide compelling evidence for the method's effectiveness.
Potential Issues and Areas for Improvement:
- Efficiency is a Major Hurdle: While the quality of segmentation is excellent, the reported CPU times (often several minutes per image) make the method impractical for many applications. Future work must address this, perhaps through GPU acceleration, algorithmic optimizations for energy curve calculation, or by finding a less costly proxy for spatial energy.
- Clarity on Complexity Analysis: The paper briefly mentions that the 3-D Otsu function used reduces time complexity from $O(L^3)$ to $O(L)$. This claim is not fully elaborated. The independent optimization of the three dimensions reduces the search space significantly compared to a true 3D search, but the overall complexity is still heavily dependent on the multilevel search, which is not $O(L)$. Given the very high runtimes in Table VI, a more detailed complexity analysis would have been beneficial.
- Generalization of the Energy Function: The specific energy function used is based on a Hopfield network model. It would be interesting to investigate whether other methods of quantifying spatial correlation could yield similar or better results, and how sensitive the algorithm is to the choice of the neighborhood size ($k=3$) and the energy function itself.
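
The complexity concern above can be made concrete by counting candidate solutions. The sketch below is one interpretation, not the paper's analysis: it assumes 256 gray levels, a single threshold per dimension for the $O(L^3)$ versus $O(L)$ comparison, and an exhaustive search for the multilevel case (the paper may use a different search strategy). All function names are hypothetical.

```python
from math import comb

def joint_single_threshold_candidates(levels=256):
    # A joint search tests every (intensity, mean, median) threshold triple.
    return levels ** 3

def decoupled_single_threshold_candidates(levels=256):
    # Optimizing each of the three dimensions independently.
    return 3 * levels

def exhaustive_multilevel_candidates(n_thresholds, levels=256):
    # Ways to pick n distinct thresholds among the levels - 1 cut points of one dimension.
    return comb(levels - 1, n_thresholds)

joint = joint_single_threshold_candidates()
decoupled = decoupled_single_threshold_candidates()
two_level = exhaustive_multilevel_candidates(2)
eight_level = exhaustive_multilevel_candidates(8)
```

Under these assumptions, decoupling shrinks the single-threshold search from 16,777,216 candidates to 768, but the per-dimension multilevel count still grows combinatorially with the number of thresholds, which is why the $O(L)$ claim does not describe the full multilevel cost.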