M3W: Multistep Three-Way Clustering
TL;DR Summary
M3W tackles two limitations of three-way clustering (information deficiency and fixed global thresholds) by proposing a progressive erosion strategy that builds a multilevel data structure. Coupled with a multistep allocation scheme that exploits neighborhood information, it gradually acquires more information for uncertain points, increasing the probability of correct assignment. Experiments on 18 benchmark datasets show that M3W outperforms eight competing algorithms.
Abstract
Three-way clustering has been an active research topic in the field of cluster analysis in recent years. Some efforts focus on the technique due to its feasibility and rationality. We observe, however, that existing three-way clustering algorithms struggle to obtain more information and limit fault tolerance excessively. Moreover, although a one-step three-way allocation based on a pair of fixed, global thresholds is the most straightforward way to generate three-way cluster representations, the clusters derived from a pair of global thresholds cannot exactly reveal the inherent clustering structure of the dataset, and the threshold values are often difficult to determine beforehand. Inspired by sequential three-way decisions, we propose an algorithm, called multistep three-way clustering (M3W), to address these issues. Specifically, we first use a progressive erosion strategy to construct a multilevel structure of the data, separating high-density core regions from lower-density boundary regions.
In-depth Reading
English Analysis
1. Bibliographic Information
- Title: M3W: Multistep Three-Way Clustering
- Authors: Mingjing Du (Member, IEEE), Jingqi Zhao, Jiarui Sun, and Yongquan Dong. Their affiliations are with Jiangsu Normal University, Xuzhou, China, and Shandong University, Jinan, China.
- Journal/Conference: IEEE Transactions on Neural Networks and Learning Systems (TNNLS), Vol. 35, No. 4, a top-tier IEEE journal, as indicated by the paper's running header.
- Publication Year: April 2024, per the journal header.
- Abstract: The paper addresses key limitations in existing three-way clustering algorithms, namely their difficulty in leveraging sufficient information for ambiguous data points and their reliance on fixed, global thresholds that are hard to determine. To solve this, the authors propose a novel algorithm called Multistep Three-Way Clustering (M3W), inspired by sequential three-way decisions. M3W first employs a progressive erosion strategy to create a multilevel data structure, separating high-density core regions from lower-density boundary regions. It then uses a multistep allocation strategy that considers neighborhood information to gradually assign data points, starting from the core and moving outwards. This allows the algorithm to adaptively capture the inherent clustering structure of the data. The paper demonstrates through experiments on 18 benchmark datasets that M3W outperforms eight competing algorithms.
- Original Source Link: /files/papers/68ef675cc4954f31b6b55d82/paper.pdf (Formally published paper).
2. Executive Summary
- Background & Motivation (Why):
- Core Problem: Traditional clustering methods, often called "two-way clustering," force each data point to belong to exactly one cluster. This is problematic for data with overlapping regions or ambiguous boundaries. Three-way clustering addresses this by introducing a "boundary region" for points that might belong to multiple clusters.
- Identified Gaps: Existing three-way clustering algorithms suffer from two main weaknesses:
- They use a one-step allocation process with fixed, global thresholds. This approach is not adaptive to clusters with different shapes, densities, or distributions and struggles to handle complex boundaries. Determining these global thresholds is also a significant challenge.
- They often fail to gather sufficient information for points in the boundary regions, leading to potential misclassifications and limiting their fault tolerance.
- Innovation: The paper introduces a novel approach inspired by sequential three-way decisions. Instead of making an immediate, final decision for every data point, this paradigm allows for deferring judgment on uncertain points until more information becomes available. The M3W algorithm applies this concept by processing data in stages, from the most certain (cluster cores) to the least certain (cluster boundaries).
- Main Contributions / Findings (What):
- A Novel Algorithm (M3W): The paper proposes M3W, a multistep three-way clustering algorithm that combines a progressive erosion strategy with the principles of sequential three-way decision-making. This iterative process is better suited for handling clusters with complex distributions and unclear boundaries.
- Multilevel Structure via Progressive Erosion: M3W uses a progressive erosion technique to peel away the outer, lower-density layers of data points, effectively creating a multilevel structure. This process clearly separates the high-density core regions of different clusters, which are easier to identify correctly.
- Multistep Allocation Strategy: Leveraging the multilevel structure, M3W employs a multistep clustering process. It first clusters the high-confidence core data and then iteratively assigns the eroded (boundary) data points level by level. This allows boundary points to gather information from their already-classified neighbors, increasing the probability of correct assignment.
- Superior Performance: Experimental results on 18 synthetic and real-world datasets show that M3W consistently outperforms eight state-of-the-art and benchmark clustering algorithms, demonstrating its effectiveness and robustness.
3. Prerequisite Knowledge & Related Work
Foundational Concepts
- Clustering: An unsupervised machine learning task that involves grouping a set of objects (data points) such that objects in the same group (called a cluster) are more similar to each other than to those in other groups.
- Two-Way Clustering: The most common form of clustering, where every data point is assigned to at most one cluster. The relationship is binary: a point either "belongs to" or "does not belong to" a cluster.
- Three-Way Decision Theory: A framework for problem-solving that mimics human triadic thinking. Instead of making a binary decision (e.g., accept/reject), it divides a set into three regions: a positive region (accept), a negative region (reject), and a boundary region (defer judgment and seek more information). This theory originated from the interpretation of rough sets.
- Three-Way Clustering: An extension of clustering based on three-way decision theory. For any given cluster, a data point can:
  - Belong-to certainly (Positive Region / Core Region): The point is definitively part of the cluster.
  - Belong-to partially (Boundary Region): The point lies in an overlapping or ambiguous area and may belong to this cluster and potentially others.
  - Not belong-to certainly (Negative Region): The point is definitively not part of the cluster. As shown in Image 2, this allows a more nuanced representation of clusters, especially where they overlap. A cluster $C$ is represented by an interval set $[\mathrm{Co}(C), \mathrm{Co}(C) \cup \mathrm{Fr}(C)]$, where $\mathrm{Co}(C)$ is the core region and $\mathrm{Co}(C) \cup \mathrm{Fr}(C)$ is the core plus the boundary region $\mathrm{Fr}(C)$ (see the notation block after this list).
This figure is a schematic comparing two clustering scenarios. (a) shows two adjacent clusters C1 and C2 with some data points lying between them that are hard to assign unambiguously. (b) depicts the multistep three-way clustering (M3W) view, in which inner core clusters and outer layers expose the multilevel structure of the data and support more accurate handling of boundary points.
- Sequential Three-Way Decisions: An extension of three-way decision theory where decisions are not made in a single step. Instead, it uses a multilevel or iterative process. At each step, confident decisions (accept/reject) are made, while uncertain cases are passed to the next level for re-evaluation with more information or at a finer granularity. M3W is directly inspired by this concept.
- Density-Based Clustering: A family of algorithms (like DBSCAN) that identify clusters as dense regions of data points separated by sparser regions. M3W uses density estimation as a core component.
- Progressive Erosion: A technique, borrowed from mathematical morphology, where the outer layers of a data distribution are iteratively "eroded" or removed. In this paper, low-density points are considered the outer layers and are peeled away to reveal the high-density cores of clusters.
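For concreteness, the interval-set representation mentioned above can be written in display form. The $\mathrm{Co}/\mathrm{Fr}$ symbols follow the convention common in the three-way clustering literature and are an assumption here, since the excerpt does not preserve the paper's exact notation:

```latex
% A cluster C over a universe U is described by a pair of nested sets:
% the core region Co(C) and the fringe (boundary) region Fr(C).
C = [\,\mathrm{Co}(C),\ \mathrm{Co}(C) \cup \mathrm{Fr}(C)\,], \qquad
\mathrm{Co}(C) \cap \mathrm{Fr}(C) = \varnothing, \qquad
\mathrm{Neg}(C) = U \setminus \big(\mathrm{Co}(C) \cup \mathrm{Fr}(C)\big)
```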
Previous Works and Differentiation
The paper positions M3W against several categories of existing work:
- Extensions of Classic Algorithms:
  - CE3: A three-way clustering framework based on K-means that uses contraction and expansion.
  - 3W-DBSCAN: A three-way extension of DBSCAN that uses a pair of density thresholds.
  - TWKM and TCM: Other variations of three-way K-means.
  - 3WDPET: A three-way version of density peak clustering.
  - Limitation: The performance of these methods is heavily tied to the underlying algorithms (K-means, DBSCAN). Furthermore, CE3, TWKM, TCM, and 3WDPET require the number of clusters to be specified in advance.
- Threshold-Based Methods: Most existing three-way clustering algorithms rely on a pair of fixed, global thresholds to partition data into core and boundary regions.
  - Limitation: A single pair of global thresholds cannot adapt to clusters with varying densities and distributions. The paper argues this is a fundamental flaw.
- Automatic Threshold Selection Methods: Some algorithms, like 3WC-OR (using genetic algorithms) and a method by Yu et al. (using universal gravitation), attempt to determine the thresholds automatically.
  - Differentiation: While these methods automate threshold selection, they still rely on a one-step allocation process driven by the estimated thresholds. In contrast, M3W uses a multistep, adaptive allocation scheme in which decisions are made progressively, level by level, without relying on a single pair of global thresholds.
4. Methodology (Core Technology & Implementation)
The M3W algorithm is structured into three main phases, as illustrated in Image 1.
This figure is the architecture diagram of the M3W algorithm (Fig. 2), showing the clustering pipeline from input to output. A progressive erosion process first builds a multilevel structure of the data. In the clustering stage, connectivity-based clustering is combined with a three-way allocation strategy to identify and assign data points step by step, increasing the probability of correct assignment. The original input is finally partitioned into two distinct output clusters, revealing the intrinsic structure of the dataset.
A. Dynamic Density Estimation
Instead of using a static density measure, M3W re-estimates the density of the remaining data points at each level of the erosion process. This dynamic approach lets the density estimation adapt as the outer, sparser points are removed. The density of a data point $x_i$ at level $l$ is calculated using an adaptive Gaussian kernel:

$$\rho^{l}(x_i) = \sum_{x_j \in R_k^{l}(x_i)} \exp\!\left(-\frac{d(x_i, x_j)^2}{d\big(x_j, \mathrm{NN}_k^{l}(x_j)\big)^2}\right)$$

- $\rho^{l}(x_i)$: Density of point $x_i$ at level $l$.
- $R_k^{l}(x_i)$: The set of reverse $k$-nearest neighbors of $x_i$ at level $l$. A point $x_j$ is a reverse neighbor of $x_i$ if $x_i$ is one of the $k$-nearest neighbors of $x_j$. This measure helps identify the local density structure.
- $d(x_i, x_j)$: Euclidean distance between points $x_i$ and $x_j$.
- $\mathrm{NN}_k^{l}(x_j)$: The $k$-th nearest neighbor of $x_j$ at level $l$. The denominator term $d(x_j, \mathrm{NN}_k^{l}(x_j))$ acts as an adaptive bandwidth, which varies based on the local data density around $x_j$.
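To make the estimator concrete, here is a minimal Python sketch of the formula as reconstructed above. It is an illustration under that reading of the symbols, not the authors' reference implementation; the helper name `dynamic_density` and the scikit-learn neighbor search are my own choices.

```python
# Sketch of M3W-style dynamic density estimation (one erosion level).
import numpy as np
from sklearn.neighbors import NearestNeighbors


def dynamic_density(X, k):
    """Adaptive-Gaussian-kernel density over reverse k-nearest neighbors.

    X: (n, d) array of points remaining at the current level.
    Returns an (n,) array of densities.
    """
    n = X.shape[0]
    # k + 1 neighbors because each point is its own nearest neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)       # dist[:, 0] == 0 (the point itself)
    knn_idx = idx[:, 1:]               # indices of the k nearest neighbors
    bandwidth = dist[:, -1]            # distance to the k-th neighbor

    density = np.zeros(n)
    for j in range(n):
        for i in knn_idx[j]:
            # x_i is among x_j's k nearest neighbors, so x_j is a reverse
            # k-NN of x_i and contributes to rho(x_i), scaled by x_j's
            # own adaptive bandwidth.
            d_ij = np.linalg.norm(X[i] - X[j])
            density[i] += np.exp(-(d_ij ** 2) / (bandwidth[j] ** 2))
    return density
```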
B. Progressive Erosion Strategy
This phase constructs the multilevel structure of the data. It is an iterative process over $L$ levels.
- Density Calculation: At each level $l$, the density of all remaining points is calculated using the dynamic density estimation formula above.
- Erosion: Points with the lowest densities are identified as boundary points and are "eroded." Specifically, points whose density falls below a cutoff value are removed; the cutoff is set so that a fixed fraction of the lowest-density points is eroded at each level.
- Update: The set of remaining points for the next level is updated as $X^{l+1} = X^{l} \setminus E^{l}$, where $E^{l}$ is the set eroded at level $l$. After $L$ levels, the remaining data $X^{L}$ represent the high-confidence "initial" cores of the clusters, and the eroded sets $E^{1}, \ldots, E^{L}$ form the successive layers of the boundary regions. A code sketch of this loop follows.
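A minimal sketch of the erosion loop, reusing the hypothetical `dynamic_density` helper above. The fixed `erode_frac` parameter is an illustrative assumption; the paper derives its cutoff from the density distribution at each level.

```python
# Progressive erosion: peel off the lowest-density points level by level.
import numpy as np


def progressive_erosion(X, k, n_levels, erode_frac=0.1):
    """Returns (core_idx, eroded_layers): indices (into X) of the surviving
    core points, plus one index array per eroded level."""
    remaining = np.arange(X.shape[0])
    eroded_layers = []
    for _ in range(n_levels):
        density = dynamic_density(X[remaining], k)
        n_erode = max(1, int(erode_frac * remaining.size))
        order = np.argsort(density)              # ascending: sparsest first
        eroded_layers.append(remaining[order[:n_erode]])
        remaining = remaining[order[n_erode:]]   # X^{l+1} = X^l \ E^l
    return remaining, eroded_layers
```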
C. Multistep Clustering Process
This phase reconstructs the clusters in reverse order of erosion, from the inside out.
1) Connectivity-Based Approach for Core Clustering
The initial cores, $X^{L}$, are clustered first. This is done using a connectivity-based method inspired by DBSCAN and HDBSCAN.
- Core Distance: The distance from a point to its $k$-th nearest neighbor.
- Adaptive Distance Threshold: A dynamic threshold that adapts based on neighborhood information from previously eroded layers. For the initial level it is based on the mean and standard deviation of the core distances; for subsequent levels it is the minimum of the previous level's threshold and a value derived from the distances to the nearest eroded neighbors. This allows the threshold to shrink in dense areas.
- Mutual Reachability: Two core points are considered mutually reachable if there is a path of core points between them in which the distance between any two adjacent points on the path is at most their mutual reachability distance threshold. All mutually reachable points in $X^{L}$ are merged to form the initial core regions of the clusters. A simplified code sketch follows.
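The sketch below illustrates the connectivity idea for the initial core set only, using the HDBSCAN-style mutual reachability distance and the level-0 threshold rule (mean plus standard deviation of core distances). The per-level threshold update from eroded layers is omitted, so this is a simplification of the paper's full scheme.

```python
# Connectivity-based core clustering via connected components.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import NearestNeighbors


def cluster_cores(X_core, k):
    n = X_core.shape[0]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_core)
    dist, idx = nn.kneighbors(X_core)
    core_dist = dist[:, -1]                     # distance to k-th neighbor
    theta = core_dist.mean() + core_dist.std()  # initial-level threshold

    rows, cols = [], []
    for i in range(n):
        for pos in range(1, k + 1):
            j = idx[i, pos]
            # Mutual reachability distance, as in HDBSCAN.
            mrd = max(core_dist[i], core_dist[j], dist[i, pos])
            if mrd <= theta:
                rows.append(i)
                cols.append(j)
    graph = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
    _, labels = connected_components(graph, directed=False)
    return labels                                # initial core-region labels
```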
2) Three-Way Approach for Boundary Assignment
Next, the eroded points are assigned back in reverse order: $E^{L}, E^{L-1}, \ldots, E^{1}$. For each level $l$ (from $L$ down to 1), the points in $E^{l}$ are assigned using a "two-stage" allocation scheme based on their neighbors in the already-clustered, higher-level data.
- Probability Calculation: For an unassigned point $x$, the probability $P(C_j \mid x)$ of it belonging to an existing cluster core $C_j$ is calculated from the labels of its nearest neighbors that are already in a core region (its core $k$-nearest neighbors). This is essentially the fraction of its core neighbors that belong to cluster $C_j$.
- Rule 1 (First Stage):
  - If $P(C_j \mid x) = 1$ (all core neighbors belong to a single cluster $C_j$), $x$ is assigned to the core region of $C_j$.
  - If $P(C_j \mid x) < 1$ but it is the maximum probability over all clusters, $x$ is temporarily assigned to a "candidate" boundary region of $C_j$ for further processing.
- Rule 2 (Second Stage): For a point in a "candidate" boundary region of cluster $C_j$:
  - If there exists another cluster $C_{j'}$ whose probability is close to that of $C_j$ (the gap between $P(C_j \mid x)$ and $P(C_{j'} \mid x)$ is small), the point is considered truly ambiguous. It is assigned to the boundary region of $C_j$ and of every other cluster satisfying this condition.
  - Otherwise, the point is considered confident enough and is assigned to the core region of $C_j$.

This process repeats until all eroded points are assigned, completing the clustering. A code sketch of this two-stage allocation follows.
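Here is a hedged sketch of the two-stage allocation for a single level. The interface, the helper name `allocate_level`, and the gap threshold `tau` are illustrative assumptions rather than the paper's definitions.

```python
# Two-stage allocation of one eroded layer given already-labeled points.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def allocate_level(X_assigned, labels, X_new, k, tau=0.2):
    """Returns one label set per point in X_new: a singleton set means a
    core assignment, a multi-element set means the point lies in the
    boundary regions of several clusters."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_assigned)
    _, idx = nn.kneighbors(X_new)
    results = []
    for neighbors in idx:
        clusters, counts = np.unique(labels[neighbors], return_counts=True)
        probs = counts / counts.sum()       # P(C_j | x) for each cluster
        best = np.argmax(probs)
        if probs[best] == 1.0:              # Rule 1: unanimous -> core
            results.append({clusters[best]})
        else:                               # Rule 2: inspect the gap
            close = clusters[probs >= probs[best] - tau]
            # A singleton means "confident enough" (core of the winner);
            # otherwise the point joins several boundary regions.
            results.append(set(close))
    return results
```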
D. Algorithm and Complexity Analysis
The full procedure is summarized in Algorithm 1.
The computational complexity is dominated by the $k$-NN searches during the erosion phase. With efficient data structures like k-d trees, the complexity is approximately $O(L \cdot k \cdot \bar{n} \log \bar{n})$, where $L$ is the number of levels, $k$ is the number of neighbors, and $\bar{n}$ is the average number of points per level. Since $L$ and $k$ are much smaller than the total number of points $n$, the overall complexity is roughly $O(n \log n)$, making it efficient for large datasets. A toy illustration of the dominant k-NN step follows.
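To make the cost claim tangible, the toy snippet below (not from the paper) runs the dominant operation, a batched k-NN query over one level's points using a k-d tree; each query costs roughly $O(\log n)$ on low-dimensional data.

```python
# Batched k-NN queries via a k-d tree (scipy).
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 2))   # one erosion level's remaining points
tree = cKDTree(X)                  # O(n log n) construction
dist, idx = tree.query(X, k=21)    # k-NN for every point (k=20 + self)
print(dist.shape, idx.shape)       # (10000, 21) each
```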
5. Experimental Setup
- Datasets: The evaluation uses 18 benchmark datasets:
  - 8 Synthetic Datasets: Triangle1, Triangle2, S1, S2, T2, Pathbased, Ds2c2sc13, and T4. These are 2D datasets designed to test clustering algorithms on various challenges like different densities, shapes, and overlapping regions.
  - 10 Real-World Datasets: Glass, Dermatology, Digits, MSRA, Segmentation, Optdigits, Statlog, Pendigits, Htru, and Shuttle from repositories like UCI. The details are transcribed in the table below, as per the paper's Table II.

Table II: Datasets Used in Experiments (Manual Transcription)

| Data sets | #Instances | #Features | #Classes |
| --- | --- | --- | --- |
| Triangle1 | 1000 | 2 | 4 |
| Triangle2 | 1000 | 2 | 4 |
| S1 | 5000 | 2 | 15 |
| S2 | 5000 | 2 | 15 |
| T2 | 4000 | 2 | 6 |
| Pathbased | 300 | 2 | 3 |
| Ds2c2sc13 | 588 | 2 | 13 |
| T4 | 7236 | 2 | 6 |
| Glass | 214 | 10 | 6 |
| Dermatology | 366 | 34 | 6 |
| Digits | 1797 | 64 | 10 |
| MSRA | 1799 | 256 | 12 |
| Segmentation | 2310 | 19 | 7 |
| Optdigits | 5620 | 64 | 10 |
| Statlog | 6435 | 36 | 6 |
| Pendigits | 10992 | 16 | 10 |
| Htru | 17898 | 8 | 2 |
| Shuttle | 22170 | 9 | 3 |
- Evaluation Metrics:
  - Normalized Mutual Information (NMI):
    - Conceptual Definition: Measures the mutual information (statistical dependency) between the predicted clustering and the true ground-truth labels, normalized to a value between 0 and 1. A score of 1 means a perfect match, while 0 means the clustering is independent of the true labels.
    - Mathematical Formula:
      $$\mathrm{NMI}(Y, C) = \frac{2\, I(Y; C)}{H(Y) + H(C)}$$
    - Symbol Explanation:
      - $Y$: The ground-truth class labels.
      - $C$: The predicted cluster labels.
      - $I(Y; C)$: The mutual information between $Y$ and $C$.
      - $H(Y)$ and $H(C)$: The entropy of the ground-truth labels and cluster labels, respectively.
  - Adjusted Rand Index (ARI):
    - Conceptual Definition: Measures the similarity between two data clusterings by considering all pairs of samples and counting pairs that are assigned to the same or different clusters in both the predicted and true clusterings. It is adjusted for chance, so a random clustering has an ARI close to 0. The range is -1 to 1, with 1 indicating a perfect match.
    - Mathematical Formula:
      $$\mathrm{ARI} = \frac{\sum_{ij} \binom{n_{ij}}{2} - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] / \binom{n}{2}}{\frac{1}{2}\left[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right] - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] / \binom{n}{2}}$$
    - Symbol Explanation:
      - $n_{ij}$: Number of objects in true class $i$ and predicted cluster $j$.
      - $a_i$: Sum of $n_{ij}$ over $j$.
      - $b_j$: Sum of $n_{ij}$ over $i$.
      - $\binom{n}{2}$: Total number of pairs of objects.
  - Pairwise F1 Score:
    - Conceptual Definition: Views clustering as a series of pairwise decisions. For each pair of points, it checks whether they are correctly placed together or correctly separated. The F1 score is the harmonic mean of pairwise precision and recall.
    - Mathematical Formula:
      $$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
    - Symbol Explanation:
      - TP (True Positives): Number of pairs of points that are in the same cluster in both the ground truth and the prediction.
      - FP (False Positives): Number of pairs that are in the same cluster in the prediction but not in the ground truth.
      - FN (False Negatives): Number of pairs that are in the same cluster in the ground truth but not in the prediction.
    A short code sketch computing all three metrics appears after the baselines list below.
- Baselines: M3W is compared against eight algorithms:
  - Three-Way: 3W-DBSCAN, 3W-DPET, CE3.
  - Overlapping/Soft: NEO-K-Means, Fuzzy C-means (FCM), Rough K-Means (RKM).
  - Two-Way: Kernel K-means (KnK-Means), Spectral Clustering (SC).
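All three metrics are standard and can be computed with scikit-learn, as in the sketch below on a toy labeling. Deriving pairwise F1 from the pair confusion matrix is my choice here, since scikit-learn has no single-call pairwise F1.

```python
# NMI, ARI, and pairwise F1 for a toy clustering result.
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from sklearn.metrics.cluster import pair_confusion_matrix

y_true = [0, 0, 1, 1, 2, 2]   # ground-truth classes
y_pred = [0, 0, 1, 2, 2, 2]   # predicted cluster labels

# "arithmetic" matches the 2*I / (H(Y) + H(C)) normalization given above.
nmi = normalized_mutual_info_score(y_true, y_pred, average_method="arithmetic")
ari = adjusted_rand_score(y_true, y_pred)

# pair_confusion_matrix counts ordered pairs; the factor of 2 cancels out.
(tn, fp), (fn, tp) = pair_confusion_matrix(y_true, y_pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"NMI={nmi:.3f}  ARI={ari:.3f}  pairwise F1={f1:.3f}")
```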
6. Results & Analysis
Core Results on Synthetic Datasets
The visual results on synthetic datasets demonstrate M3W's ability to handle complex structures.
- On Triangle1 and Triangle2 (Gaussian clusters), M3W perfectly or near-perfectly identifies the clusters, even when the margins are small (Images 8 and 9).
- On S1 and S2 (many overlapping clusters), M3W shows strong performance, whereas 3W-DBSCAN fails on S2 by merging adjacent clusters. M3W's erosion strategy effectively prevents this incorrect linking.
- On Pathbased (a challenging dataset with non-convex shapes), M3W is the only algorithm that gives a satisfactory result, correctly separating the half-circle from the two inner clusters.
- On T4 (complex shapes), M3W is again the only one to find the optimal structure (Image 3).
This figure is Fig. 10, showing the clustering results on the T4 dataset. It compares M3W against the eight competing algorithms, including 3W-DBSCAN and 3W-DPET. M3W (panel a) clearly identifies the cluster structures, and its result reflects the intrinsic distribution of the data more accurately than the other algorithms, confirming its advantage.
The quantitative results are transcribed from Table III below. M3W achieves the highest scores on nearly all synthetic datasets across all three metrics.
Table III: Performance Comparison of M3W on Synthetic Datasets (Manual Transcription)
| Algorithm | Triangle1 | | | | Triangle2 | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | NMI | ARI | F1 | Params | NMI | ARI | F1 | Params |
| M3W | 1.000 | 1.000 | 1.000 | k=21, L=3 | 0.987 | 0.990 | 0.993 | k=25, L=4 |
| 3W-DBSCAN | 1.000 | 1.000 | 1.000 | k=5, η=0.1, ε=0.1 | 0.974 | 0.986 | 0.993 | k=15, η=0.4, ε=0.1 |
| 3W-DPET | 1.000 | 1.000 | 1.000 | K=4, k=5 | 0.968 | 0.981 | 0.993 | K=4, k=5 |
| CE3 | 0.948±0.000 | 0.970±0.000 | 1.000±0.000 | K=4, q=6 | 0.859±0.000 | 0.917±0.000 | 0.993±0.000 | K=4, q=14 |
| NEO-K-Means | 0.952±0.000 | 0.963±0.000 | 1.000±0.000 | K=4, α=2, β=6 | 0.876±0.000 | 0.893±0.000 | 0.993±0.000 | K=4, α=1.5, β=3 |
| FCM | 0.949±0.000 | 0.961±0.000 | 0.971±0.000 | K=4, m=2 | 0.871±0.000 | 0.885±0.000 | 0.919±0.000 | K=4, m=2 |
| RKM | 0.952±0.000 | 0.963±0.000 | 0.972±0.000 | K=4, wu=0.2, ε=1 | 0.869±0.000 | 0.882±0.000 | 0.916±0.000 | K=4, wu=0.4, ε=1 |
| KnK-Means | 0.978±0.108 | 0.984±0.185 | 0.988±0.118 | K=4 | 0.886±0.056 | 0.901±0.065 | 0.938±0.004 | K=4 |
| SC | 0.995±0.000 | 0.997±0.000 | 0.996±0.000 | K=4, σ=0.5 | 0.901±0.001 | 0.919±0.001 | 0.953±0.001 | K=4, σ=0.5 |
| Algorithm | S1 | | | | S2 | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | NMI | ARI | F1 | Params | NMI | ARI | F1 | Params |
| M3W | 0.989 | 0.989 | 0.990 | k=25, L=12 | 0.946 | 0.938 | 0.922 | k=30, L=12 |
| 3W-DBSCAN | 0.821 | 0.507 | 0.990 | k=17, η=0.2, ε=0.1 | 0.000 | 0.000 | 0.922 | k=20, η=1, ε=1 |
| 3W-DPET | 0.819 | 0.587 | 0.990 | K=15, k=20 | 0.656 | 0.337 | 0.922 | K=15, k=20 |
| CE3 | 0.979±0.025 | 0.982±0.076 | 0.990±0.033 | K=15, q=8 | 0.898±0.019 | 0.840±0.057 | 0.854±0.068 | K=15, q=8 |
| NEO-K-Means | 0.986±0.008 | 0.986±0.010 | 0.990±0.046 | K=15, α=1, β=6 | 0.946±0.008 | 0.937±0.053 | 0.941±0.028 | K=15, α=1, β=6 |
| FCM | 0.951±0.017 | 0.890±0.042 | 0.898±0.036 | K=15, m=2 | 0.946±0.010 | 0.938±0.014 | 0.942±0.032 | K=15, m=2.5 |
| RKM | 0.962±0.017 | 0.908±0.045 | 0.914±0.007 | K=15, wu=0.3, ε=1 | 0.922±0.024 | 0.901±0.055 | 0.908±0.049 | K=15, wu=0.4, ε=0.8 |
| KnK-Means | 0.861±0.017 | 0.662±0.029 | 0.691±0.089 | K=15 | 0.820±0.034 | 0.622±0.072 | 0.654±0.082 | K=15 |
| SC | 0.950±0.007 | 0.898±0.027 | 0.905±0.037 | K=15, σ=2 | 0.905±0.021 | 0.845±0.062 | 0.856±0.050 | K=15, σ=1.5 |
| Algorithm | T2 | | | | Pathbased | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | NMI | ARI | F1 | Params | NMI | ARI | F1 | Params |
| M3W | 0.986 | 0.995 | 0.997 | k=24, L=2 | 0.935 | 0.960 | 0.973 | k=9, L=4 |
| 3W-DBSCAN | 0.873 | 0.809 | 0.997 | k=20, η=0.9, ε=0.1 | 0.810 | 0.801 | 0.973 | k=5, η=0.4, ε=0.1 |
| 3W-DPET | 0.890 | 0.767 | 0.997 | K=6, k=10 | 0.781 | 0.732 | 0.973 | K=3, k=8 |
| CE3 | 0.806±0.036 | 0.640±0.082 | 0.717±0.055 | K=6, q=9 | 0.550±0.008 | 0.473±0.002 | 0.668±0.001 | K=3, q=18 |
| NEO-K-Means | 0.789±0.028 | 0.614±0.028 | 0.706±0.011 | K=6, α=1.5, β=6 | 0.549±0.001 | 0.464±0.001 | 0.660±0.000 | K=3, α=0, β=6 |
| FCM | 0.758±0.017 | 0.515±0.051 | 0.615±0.032 | K=6, m=3.5 | 0.535±0.000 | 0.460±0.000 | 0.657±0.000 | K=3, m=2 |
| RKM | 0.727±0.013 | 0.480±0.042 | 0.588±0.007 | K=6, wu=0.2, ε=0.8 | 0.549±0.000 | 0.464±0.000 | 0.660±0.000 | K=3, wu=0.4, ε=1 |
| KnK-Means | 0.719±0.031 | 0.522±0.048 | 0.622±0.033 | K=6 | 0.552±0.001 | 0.468±0.001 | 0.664±0.001 | K=3 |
| SC | 0.778±0.042 | 0.522±0.069 | 0.622±0.032 | K=6, σ=2 | 0.501±0.000 | 0.459±0.000 | 0.668±0.031 | K=3, σ=0.5 |
| Algorithm | Ds2c2sc13 | | | | T4 | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | NMI | ARI | F1 | Params | NMI | ARI | F1 | Params |
| M3W | 0.994 | 0.994 | 0.995 | k=8, L=3 | 1.000 | 1.000 | 1.000 | k=20, L=2 |
| 3W-DBSCAN | 0.936 | 0.848 | 0.995 | k=10, η=0.1, ε=0.1 | 0.000 | 0.000 | 0.322 | k=20, η=1, ε=1 |
| 3W-DPET | 0.948 | 0.842 | 0.995 | K=13, k=13 | 0.924 | 0.872 | 0.897 | K=6, k=7 |
| CE3 | 0.778±0.019 | 0.567±0.033 | 0.605±0.010 | K=13, q=16 | 0.713±0.004 | 0.615±0.019 | 0.685±0.000 | K=6, q=17 |
| NEO-K-Means | 0.795±0.017 | 0.577±0.025 | 0.615±0.008 | K=13, α=0.5, β=3 | 0.707±0.004 | 0.607±0.019 | 0.678±0.014 | K=6, α=0, β=6 |
| FCM | 0.782±0.016 | 0.579±0.050 | 0.616±0.072 | K=13, m=2 | 0.727±0.055 | 0.636±0.085 | 0.702±0.089 | K=6, m=4 |
| RKM | 0.784±0.011 | 0.472±0.030 | 0.524±0.043 | K=13, wu=0.3, ε=1 | 0.707±0.005 | 0.607±0.021 | 0.678±0.003 | K=6, wu=0.1, ε=1 |
| KnK-Means | 0.792±0.037 | 0.626±0.081 | 0.586±0.060 | K=13 | 0.629±0.045 | 0.511±0.063 | 0.600±0.037 | K=6 |
| SC | 0.799±0.024 | 0.567±0.070 | 0.607±0.065 | K=13, σ=1 | 0.588±0.024 | 0.393±0.020 | 0.504±0.046 | K=6, σ=1.5 |
Core Results on Real-World Datasets
The results on real-world datasets, transcribed from Table IV, confirm M3W's strong performance.
Table IV: Performance Comparison of M3W on Real-World Datasets (Manual Transcription)
| Algorithm | Glass | | | | Dermatology | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | NMI | ARI | F1 | Params | NMI | ARI | F1 | Params |
| M3W | 0.811 | 0.709 | 0.772 | k=5, L=2 | 0.915 | 0.849 | 0.882 | k=8, L=6 |
| 3W-DBSCAN | 0.665 | 0.458 | 0.616 | k=7, η=0.2, ε=0.1 | 0.763 | 0.622 | 0.712 | k=15, η=0.5, ε=0.5 |
| 3W-DPET | 0.596 | 0.398 | 0.590 | K=6, k=7 | 0.910 | 0.892 | 0.917 | K=6, k=6 |
| CE3 | 0.699±0.010 | 0.486±0.010 | 0.590±0.023 | K=6, q=18 | 0.821±0.093 | 0.740±0.051 | 0.789±0.086 | K=6, q=15 |
| NEO-K-Means | 0.720±0.010 | 0.531±0.015 | 0.625±0.017 | K=6, α=-0.5, β=6 | 0.877±0.038 | 0.738±0.018 | 0.791±0.128 | K=6, α=0.5, β=6 |
| FCM | 0.762±0.000 | 0.570±0.000 | 0.656±0.000 | K=6, m=2 | 0.784±0.000 | 0.657±0.000 | 0.729±0.020 | K=6, m=2 |
| RKM | 0.730±0.010 | 0.537±0.014 | 0.630±0.009 | K=6, wu=0.1, ε=0.8 | 0.872±0.034 | 0.798±0.009 | 0.840±0.165 | K=6, wu=0.2, ε=0.8 |
| KnK-Means | 0.645±0.062 | 0.460±0.121 | 0.581±0.026 | K=6 | 0.856±0.047 | 0.821±0.026 | 0.856±0.096 | K=6 |
| SC | 0.726±0.033 | 0.554±0.043 | 0.646±0.034 | K=6, σ=1.5 | 0.868±0.017 | 0.705±0.011 | 0.763±0.056 | K=6, σ=0.5 |
| Algorithm | Digits | | | | MSRA | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | NMI | ARI | F1 | Params | NMI | ARI | F1 | Params |
| M3W | 0.879 | 0.787 | 0.718 | k=14, L=11 | 0.836 | 0.622 | 0.705 | k=22, L=8 |
| 3W-DBSCAN | 0.820 | 0.710 | 0.744 | k=15, η=0.6, ε=0.5 | 0.692 | 0.517 | 0.644 | k=16, η=0.2, ε=0.8 |
| 3W-DPET | 0.813 | 0.543 | 0.610 | K=10, k=15 | 0.310 | 0.048 | 0.553 | K=12, k=20 |
| CE3 | 0.749±0.029 | 0.652±0.061 | 0.689±0.057 | K=10, q=6 | 0.546±0.053 | 0.276±0.078 | 0.196 | K=12, q=8 |
| NEO-K-Means | 0.742±0.009 | 0.645±0.027 | 0.683±0.003 | K=10, α=0.5, β=6 | 0.582±0.031 | 0.314±0.051 | 0.352±0.038 | K=12, α=0, β=6 |
| FCM | 0.405±0.030 | 0.231±0.035 | 0.339±0.012 | K=10, m=2.5 | 0.258±0.008 | 0.119±0.007 | 0.385±0.032 | K=12, m=2 |
| RKM | 0.000±0.000 | 0.000±0.000 | 0.181±0.000 | K=10, wu=0.4, ε=0.8 | 0.655±0.031 | 0.440±0.037 | 0.490±0.037 | K=12, wu=0.1, ε=0.8 |
| KnK-Means | 0.750±0.030 | 0.653±0.062 | 0.690±0.022 | K=10 | 0.604±0.020 | 0.351±0.031 | 0.411±0.045 | K=12 |
| SC | 0.732±0.002 | 0.665±0.003 | 0.698±0.002 | K=10, σ=1 | 0.614±0.025 | 0.408±0.036 | 0.460±0.015 | K=12, σ=1 |
| Algorithm | Segmentation | | | | Optdigits | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | NMI | ARI | F1 | Params | NMI | ARI | F1 | Params |
| M3W | 0.728 | 0.411 | 0.533 | k=30, L=6 | 0.871 | 0.814 | 0.833 | k=20, L=12 |
| 3W-DBSCAN | 0.607 | 0.253 | 0.417 | k=18, η=0.1, ε=0.2 | 0.845 | 0.788 | 0.810 | k=17, η=0.4, ε=0.6 |
| 3W-DPET | 0.512 | 0.284 | 0.428 | K=7, k=9 | 0.817 | 0.695 | 0.733 | K=10, k=5 |
| CE3 | 0.557±0.101 | 0.393±0.118 | 0.494±0.038 | K=7, q=8 | 0.747±0.034 | 0.680±0.078 | 0.713±0.041 | K=10, q=5 |
| NEO-K-Means | 0.518±0.031 | 0.337±0.040 | 0.464±0.043 | K=7, α=0.5, β=6 | 0.745±0.007 | 0.674±0.026 | 0.707±0.022 | K=10, α=-0.5, β=3 |
| FCM | 0.516±0.035 | 0.391±0.039 | 0.479±0.034 | K=7, m=2 | 0.406±0.017 | 0.224±0.032 | 0.339±0.032 | K=10, m=4 |
| RKM | 0.561±0.050 | 0.401±0.061 | 0.498±0.040 | K=7, wu=0.1, ε=1 | 0.000±0.000 | 0.000±0.000 | 0.182±0.000 | K=10, wu=0.4, ε=0.8 |
| KnK-Means | 0.518±0.060 | 0.360±0.062 | 0.464±0.035 | K=7 | 0.768±0.009 | 0.686±0.009 | 0.719±0.031 | K=10 |
| SC | 0.390±0.027 | 0.274±0.047 | 0.384±0.037 | K=7, σ=2 | 0.731±0.007 | 0.629±0.023 | 0.668±0.039 | K=10, σ=0.5 |
| Algorithm | Statlog | | | | Pendigits | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | NMI | ARI | F1 | Params | NMI | ARI | F1 | Params |
| M3W | 0.693 | 0.686 | 0.752 | k=25, L=12 | 0.865 | 0.809 | 0.827 | k=30, L=8 |
| 3W-DBSCAN | 0.620 | 0.432 | 0.580 | k=20, η=0.7, ε=0.1 | 0.732 | 0.646 | 0.683 | k=20, η=1, ε=0.1 |
| 3W-DPET | 0.556 | 0.318 | 0.510 | K=6, k=20 | 0.706 | 0.514 | 0.578 | K=10, k=5 |
| CE3 | 0.584±0.047 | 0.516±0.066 | 0.611±0.067 | K=6, q=19 | 0.692±0.008 | 0.612±0.039 | 0.654±0.036 | K=10, q=17 |
| NEO-K-Means | 0.584±0.047 | 0.528±0.086 | 0.616±0.053 | K=6, α=-0.5, β=6 | 0.691±0.012 | 0.595±0.038 | 0.637±0.027 | K=10, α=-1, β=6 |
| FCM | 0.584±0.000 | 0.509±0.000 | 0.600±0.000 | K=6, m=2 | 0.638±0.026 | 0.523±0.039 | 0.577±0.000 | K=10, m=2 |
| RKM | 0.612±0.058 | 0.530±0.090 | 0.617±0.043 | K=6, wu=0.4, ε=1 | 0.266±0.095 | 0.082±0.082 | 0.236±0.049 | K=10, wu=0.1, ε=0.8 |
| KnK-Means | 0.617±0.048 | 0.530±0.069 | 0.617±0.071 | K=6 | 0.690±0.012 | 0.581±0.049 | 0.624±0.023 | K=10 |
| SC | 0.617±0.048 | 0.500±0.052 | 0.592±0.026 | K=6, σ=1 | 0.679±0.014 | 0.574±0.037 | 0.618±0.023 | K=10, σ=1 |
| Algorithm | Htru | | | | Shuttle | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | NMI | ARI | F1 | Params | NMI | ARI | F1 | Params |
| M3W | 0.554 | 0.709 | 0.951 | k=25, L=12 | 0.809 | 0.773 | 0.849 | k=35, L=5 |
| 3W-DBSCAN | 0.446 | 0.554 | 0.944 | k=15, η=0.3, ε=0.1 | 0.015 | 0.001 | 0.558 | k=8, η=0.1, ε=0.4 |
| 3W-DPET | 0.436 | 0.519 | 0.941 | K=2, k=7 | 0.449 | 0.383 | 0.656 | K=3, k=20 |
| CE3 | 0.353±0.000 | 0.530±0.000 | 0.907±0.000 | K=2, q=18 | 0.010±0.003 | 0.001±0.000 | 0.558±0.014 | K=3, q=17 |
| NEO-K-Means | 0.342±0.000 | 0.530±0.001 | 0.906±0.000 | K=2, α=-0.5, β=6 | 0.000±0.139 | 0.001±0.000 | 0.558±0.014 | K=3, α=3, β=6 |
| FCM | 0.269±0.000 | 0.343±0.000 | 0.819±0.000 | K=2, m=4 | 0.236±0.001 | 0.244±0.001 | 0.526±0.000 | K=3, m=4 |
| RKM | 0.365±0.000 | 0.579±0.000 | 0.929±0.000 | K=2, wu=0.7, ε=0.8 | 0.384±0.102 | 0.364±0.099 | 0.591±0.061 | K=3, wu=0.5, ε=0.8 |
| KnK-Means | 0.290±0.000 | 0.400±0.000 | 0.848±0.000 | K=2 | 0.309±0.038 | 0.293±0.027 | 0.582±0.008 | K=3 |
| SC | 0.376±0.000 | 0.571±0.000 | 0.917±0.000 | K=2, σ=0.5 | 0.010±0.000 | 0.002±0.000 | 0.558±0.000 | K=3, σ=0.5 |
- Key Findings: M3W achieves the highest NMI scores on all 10 real-world datasets. It also achieves the best ARI and F1 scores on at least 8 of them. The performance gap is particularly large on datasets like Pendigits, Htru, and Shuttle, where M3W outperforms the second-best algorithm by a significant margin. This confirms that the benefits of its multistep approach are not limited to synthetic data but carry over to complex, high-dimensional real-world problems.
Running Time
Figure 11 compares the running time of M3W with other three-way clustering algorithms.
This figure is Fig. 11, comparing the running times of the algorithms. Panel (a) uses a linear y-axis and panel (b) a logarithmic y-axis. As the data size grows, the running times of M3W, 3W-DBSCAN, CE3, and 3WDPET all increase. At most data points, M3W's running time is lower than that of the other three algorithms, and its efficiency advantage is especially visible on the log scale (red curve), confirming its efficiency.
- 3WDPET is significantly slower than the others.
- On a log scale, M3W is shown to be faster than 3W-DBSCAN and CE3 in most cases.
- The performance advantage of M3W becomes more pronounced as the number of data instances increases, indicating better scalability, consistent with its complexity analysis.
7. Conclusion & Reflections
- Conclusion Summary: The paper successfully introduces M3W, a novel three-way clustering algorithm that addresses fundamental limitations of existing methods. By incorporating a progressive erosion strategy to create a multilevel data structure and a multistep allocation scheme inspired by sequential three-way decisions, M3W can adaptively and accurately identify clusters, especially those with complex shapes and ambiguous boundaries. Extensive experiments validate its superior performance and efficiency compared to a wide range of competitors.
- Limitations & Future Work: The authors identify several directions for future research:
  - Online Learning: Extending M3W to handle streaming data.
  - Automatic Parameter Determination: Developing methods to automatically select the optimal values for the number of neighbors ($k$) and the number of levels ($L$).
  - General Framework: Building a more general framework for sequential three-way clustering that could be applied to different types of algorithms.
- Personal Insights & Critique:
  - Novelty and Strength: The main strength of M3W is its elegant fusion of ideas from sequential decision theory and density-based clustering. The multistep, inside-out clustering process is intuitive and powerful. It provides a robust way to handle boundary points by allowing them to "wait" for more information (i.e., for their neighbors to be classified first). This is a significant conceptual advance over methods that rely on a single, global thresholding step.
  - Potential Limitations:
    - Parameter Sensitivity: While effective, the algorithm still has two key parameters, $k$ and $L$, that require tuning. The paper provides a range for them but does not offer a principled way to set them, a common challenge in clustering.
    - Density Assumption: Like other density-based methods, M3W may struggle with datasets containing clusters of vastly different densities, although the dynamic density estimation and adaptive thresholds are designed to mitigate this.
  - Overall Impact: M3W presents a strong and well-motivated contribution to the field of three-way clustering. Its core idea of sequential, information-gathering allocation could inspire new approaches in other areas of unsupervised learning where uncertainty and ambiguity are prevalent. The work is a prime example of how concepts from decision theory can enrich machine learning models.