M3W: Multistep Three-Way Clustering
TL;DR Summary
M3W tackles two limitations of three-way clustering (information deficiency and fixed global thresholds) by proposing a progressive erosion strategy that builds a multilevel data structure. Coupled with a multistep allocation scheme that exploits neighborhood information, it gradually acquires more information for uncertain points, increasing the probability of correct assignment. Experiments on 18 benchmark datasets show that M3W outperforms eight competing algorithms.
Abstract
Three-way clustering has been an active research topic in the field of cluster analysis in recent years. Some efforts focus on the technique due to its feasibility and rationality. We observe, however, that existing three-way clustering algorithms struggle to obtain more information and limit fault tolerance excessively. Moreover, although a one-step three-way allocation based on a pair of fixed, global thresholds is the most straightforward way to generate three-way cluster representations, the clusters derived from a pair of global thresholds cannot exactly reveal the inherent clustering structure of the dataset, and the threshold values are often difficult to determine beforehand. Inspired by sequential three-way decisions, we propose an algorithm, called multistep three-way clustering (M3W), to address these issues. Specifically, we first use a progressive erosion strategy to construct a multilevel structure of the data, separating high-density core regions from lower-density boundary regions.
In-depth Reading
English Analysis
1. Bibliographic Information
- Title: M3W: Multistep Three-Way Clustering
- Authors: Mingjing Du (Member, IEEE), Jingqi Zhao, Jiarui Sun, and Yongquan Dong. Their affiliations are with Jiangsu Normal University, Xuzhou, China, and Shandong University, Jinan, China.
- Journal/Conference: IEEE Transactions on Neural Networks and Learning Systems (TNNLS), Vol. 35, No. 4, a top-tier IEEE journal, as indicated by the paper's running header.
- Publication Year: April 2024, per the journal header.
- Abstract: The paper addresses key limitations in existing three-way clustering algorithms, namely their difficulty in leveraging sufficient information for ambiguous data points and their reliance on fixed, global thresholds that are hard to determine. To solve this, the authors propose a novel algorithm called Multistep Three-Way Clustering (M3W), inspired by sequential three-way decisions. M3W first employs a progressive erosion strategy to create a multilevel data structure, separating high-density core regions from lower-density boundary regions. It then uses a multistep allocation strategy that considers neighborhood information to gradually assign data points, starting from the core and moving outwards. This allows the algorithm to adaptively capture the inherent clustering structure of the data. The paper demonstrates through experiments on 18 benchmark datasets that M3W outperforms eight competing algorithms.
- Original Source Link: /files/papers/68ef675cc4954f31b6b55d82/paper.pdf (Formally published paper).
2. Executive Summary
- Background & Motivation (Why):
- Core Problem: Traditional clustering methods, often called "two-way clustering," force each data point to belong to exactly one cluster. This is problematic for data with overlapping regions or ambiguous boundaries. Three-way clustering addresses this by introducing a "boundary region" for points that might belong to multiple clusters.
- Identified Gaps: Existing three-way clustering algorithms suffer from two main weaknesses:
- They use a one-step allocation process with fixed, global thresholds. This approach is not adaptive to clusters with different shapes, densities, or distributions and struggles to handle complex boundaries. Determining these global thresholds is also a significant challenge.
- They often fail to gather sufficient information for points in the boundary regions, leading to potential misclassifications and limiting their fault tolerance.
- Innovation: The paper introduces a novel approach inspired by sequential three-way decisions. Instead of making an immediate, final decision for every data point, this paradigm allows for deferring judgment on uncertain points until more information becomes available. The M3W algorithm applies this concept by processing data in stages, from the most certain (cluster cores) to the least certain (cluster boundaries).
- Main Contributions / Findings (What):
- A Novel Algorithm (M3W): The paper proposes M3W, a multistep three-way clustering algorithm that combines a progressive erosion strategy with the principles of sequential three-way decision-making. This iterative process is better suited for handling clusters with complex distributions and unclear boundaries.
- Multilevel Structure via Progressive Erosion: M3W uses a progressive erosion technique to peel away the outer, lower-density layers of data points, effectively creating a multilevel structure. This process clearly separates the high-density core regions of different clusters, which are easier to identify correctly.
- Multistep Allocation Strategy: Leveraging the multilevel structure, M3W employs a multistep clustering process. It first clusters the high-confidence core data and then iteratively assigns the eroded (boundary) data points level by level. This allows boundary points to gather information from their already-classified neighbors, increasing the probability of correct assignment.
- Superior Performance: Experimental results on 18 synthetic and real-world datasets show that M3W consistently outperforms eight state-of-the-art and benchmark clustering algorithms, demonstrating its effectiveness and robustness.
3. Prerequisite Knowledge & Related Work
Foundational Concepts
- Clustering: An unsupervised machine learning task that involves grouping a set of objects (data points) such that objects in the same group (called a cluster) are more similar to each other than to those in other groups.
- Two-Way Clustering: The most common form of clustering, where every data point is assigned to at most one cluster. The relationship is binary: a point either "belongs to" or "does not belong to" a cluster.
- Three-Way Decision Theory: A framework for problem-solving that mimics human triadic thinking. Instead of making a binary decision (e.g., accept/reject), it divides a set into three regions: a positive region (accept), a negative region (reject), and a boundary region (defer judgment and seek more information). This theory originated from the interpretation of rough sets.
- Three-Way Clustering: An extension of clustering based on three-way decision theory. For any given cluster, a data point can:
  - Belong-to certainly (Positive Region / Core Region): The point is definitively part of the cluster.
  - Belong-to partially (Boundary Region): The point lies in an overlapping or ambiguous area and may belong to this cluster and potentially others.
  - Not belong-to certainly (Negative Region): The point is definitively not part of the cluster. As shown in Image 2, this allows a more nuanced representation of clusters, especially where they overlap. A cluster $C$ is represented by an interval set $[\mathrm{Co}(C), \mathrm{Co}(C) \cup \mathrm{Fr}(C)]$, where $\mathrm{Co}(C)$ is the core region and $\mathrm{Co}(C) \cup \mathrm{Fr}(C)$ is the core plus the boundary region $\mathrm{Fr}(C)$ (see the notation block after this list).
This figure is a schematic comparing two clustering scenarios. (a) shows two adjacent clusters C1 and C2 with some data points lying between them that are hard to assign unambiguously. (b) depicts the multistep three-way clustering (M3W) view, in which inner core clusters and outer layers expose the multilevel structure of the data and support more accurate handling of boundary points.
- Sequential Three-Way Decisions: An extension of three-way decision theory where decisions are not made in a single step. Instead, it uses a multilevel or iterative process. At each step, confident decisions (accept/reject) are made, while uncertain cases are passed to the next level for re-evaluation with more information or at a finer granularity. M3W is directly inspired by this concept.
- Density-Based Clustering: A family of algorithms (like DBSCAN) that identify clusters as dense regions of data points separated by sparser regions. M3W uses density estimation as a core component.
- Progressive Erosion: A technique, borrowed from mathematical morphology, where the outer layers of a data distribution are iteratively "eroded" or removed. In this paper, low-density points are considered the outer layers and are peeled away to reveal the high-density cores of clusters.
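For concreteness, the interval-set representation mentioned above can be written in display form. The $\mathrm{Co}/\mathrm{Fr}$ symbols follow the convention common in the three-way clustering literature and are an assumption here, since the excerpt does not preserve the paper's exact notation:

```latex
% A cluster C over a universe U is described by a pair of nested sets:
% the core region Co(C) and the fringe (boundary) region Fr(C).
C = [\,\mathrm{Co}(C),\ \mathrm{Co}(C) \cup \mathrm{Fr}(C)\,], \qquad
\mathrm{Co}(C) \cap \mathrm{Fr}(C) = \varnothing, \qquad
\mathrm{Neg}(C) = U \setminus \big(\mathrm{Co}(C) \cup \mathrm{Fr}(C)\big)
```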
Previous Works and Differentiation
The paper positions M3W against several categories of existing work:
- Extensions of Classic Algorithms:
  - CE3: A three-way clustering framework based on K-means that uses contraction and expansion.
  - 3W-DBSCAN: A three-way extension of DBSCAN that uses a pair of density thresholds.
  - TWKM and TCM: Other variations of three-way K-means.
  - 3WDPET: A three-way version of density peak clustering.
  - Limitation: The performance of these methods is heavily tied to the underlying algorithms (K-means, DBSCAN). Furthermore, CE3, TWKM, TCM, and 3WDPET require the number of clusters to be specified in advance.
- Threshold-Based Methods: Most existing three-way clustering algorithms rely on a pair of fixed, global thresholds to partition data into core and boundary regions.
  - Limitation: A single pair of global thresholds cannot adapt to clusters with varying densities and distributions. The paper argues this is a fundamental flaw.
- Automatic Threshold Selection Methods: Some algorithms, like 3WC-OR (using genetic algorithms) and a method by Yu et al. (using universal gravitation), attempt to determine the thresholds automatically.
  - Differentiation: While these methods automate threshold selection, they still rely on a one-step allocation process driven by the estimated thresholds. In contrast, M3W uses a multistep, adaptive allocation scheme in which decisions are made progressively, level by level, without relying on a single pair of global thresholds.
4. Methodology (Core Technology & Implementation)
The M3W algorithm is structured into three main phases, as illustrated in Image 1.
This figure is the architecture diagram of the M3W algorithm (Fig. 2), showing the clustering pipeline from input to output. A progressive erosion process first builds a multilevel structure of the data. In the clustering stage, connectivity-based clustering is combined with a three-way allocation strategy to identify and assign data points step by step, increasing the probability of correct assignment. The original input is finally partitioned into two distinct output clusters, revealing the intrinsic structure of the dataset.
A. Dynamic Density Estimation
Instead of using a static density measure, M3W re-estimates the density of the remaining data points at each level of the erosion process. This dynamic approach lets the density estimation adapt as the outer, sparser points are removed. The density of a data point $x_i$ at level $l$ is calculated using an adaptive Gaussian kernel:

$$\rho^{l}(x_i) = \sum_{x_j \in R_k^{l}(x_i)} \exp\!\left(-\frac{d(x_i, x_j)^2}{d\big(x_j, \mathrm{NN}_k^{l}(x_j)\big)^2}\right)$$

- $\rho^{l}(x_i)$: Density of point $x_i$ at level $l$.
- $R_k^{l}(x_i)$: The set of reverse $k$-nearest neighbors of $x_i$ at level $l$. A point $x_j$ is a reverse neighbor of $x_i$ if $x_i$ is one of the $k$-nearest neighbors of $x_j$. This measure helps identify the local density structure.
- $d(x_i, x_j)$: Euclidean distance between points $x_i$ and $x_j$.
- $\mathrm{NN}_k^{l}(x_j)$: The $k$-th nearest neighbor of $x_j$ at level $l$. The denominator term $d(x_j, \mathrm{NN}_k^{l}(x_j))$ acts as an adaptive bandwidth, which varies based on the local data density around $x_j$.
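To make the estimator concrete, here is a minimal Python sketch of the formula as reconstructed above. It is an illustration under that reading of the symbols, not the authors' reference implementation; the helper name `dynamic_density` and the scikit-learn neighbor search are my own choices.

```python
# Sketch of M3W-style dynamic density estimation (one erosion level).
import numpy as np
from sklearn.neighbors import NearestNeighbors


def dynamic_density(X, k):
    """Adaptive-Gaussian-kernel density over reverse k-nearest neighbors.

    X: (n, d) array of points remaining at the current level.
    Returns an (n,) array of densities.
    """
    n = X.shape[0]
    # k + 1 neighbors because each point is its own nearest neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)       # dist[:, 0] == 0 (the point itself)
    knn_idx = idx[:, 1:]               # indices of the k nearest neighbors
    bandwidth = dist[:, -1]            # distance to the k-th neighbor

    density = np.zeros(n)
    for j in range(n):
        for i in knn_idx[j]:
            # x_i is among x_j's k nearest neighbors, so x_j is a reverse
            # k-NN of x_i and contributes to rho(x_i), scaled by x_j's
            # own adaptive bandwidth.
            d_ij = np.linalg.norm(X[i] - X[j])
            density[i] += np.exp(-(d_ij ** 2) / (bandwidth[j] ** 2))
    return density
```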
B. Progressive Erosion Strategy
This phase constructs the multilevel structure of the data. It is an iterative process over $L$ levels.
- Density Calculation: At each level $l$, the density of all remaining points is calculated using the dynamic density estimation formula above.
- Erosion: Points with the lowest densities are identified as boundary points and are "eroded." Specifically, points whose density falls below a cutoff value are removed; the cutoff is set so that a fixed fraction of the lowest-density points is eroded at each level.
- Update: The set of remaining points for the next level is updated as $X^{l+1} = X^{l} \setminus E^{l}$, where $E^{l}$ is the set eroded at level $l$. After $L$ levels, the remaining data $X^{L}$ represent the high-confidence "initial" cores of the clusters, and the eroded sets $E^{1}, \ldots, E^{L}$ form the successive layers of the boundary regions. A code sketch of this loop follows.
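A minimal sketch of the erosion loop, reusing the hypothetical `dynamic_density` helper above. The fixed `erode_frac` parameter is an illustrative assumption; the paper derives its cutoff from the density distribution at each level.

```python
# Progressive erosion: peel off the lowest-density points level by level.
import numpy as np


def progressive_erosion(X, k, n_levels, erode_frac=0.1):
    """Returns (core_idx, eroded_layers): indices (into X) of the surviving
    core points, plus one index array per eroded level."""
    remaining = np.arange(X.shape[0])
    eroded_layers = []
    for _ in range(n_levels):
        density = dynamic_density(X[remaining], k)
        n_erode = max(1, int(erode_frac * remaining.size))
        order = np.argsort(density)              # ascending: sparsest first
        eroded_layers.append(remaining[order[:n_erode]])
        remaining = remaining[order[n_erode:]]   # X^{l+1} = X^l \ E^l
    return remaining, eroded_layers
```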
C. Multistep Clustering Process
This phase reconstructs the clusters in reverse order of erosion, from the inside out.
1) Connectivity-Based Approach for Core Clustering
The initial cores, $X^{L}$, are clustered first. This is done using a connectivity-based method inspired by DBSCAN and HDBSCAN.
- Core Distance: The distance from a point to its $k$-th nearest neighbor.
- Adaptive Distance Threshold: A dynamic threshold that adapts based on neighborhood information from previously eroded layers. For the initial level it is based on the mean and standard deviation of the core distances; for subsequent levels it is the minimum of the previous level's threshold and a value derived from the distances to the nearest eroded neighbors. This allows the threshold to shrink in dense areas.
- Mutual Reachability: Two core points are considered mutually reachable if there is a path of core points between them in which the distance between any two adjacent points on the path is at most their mutual reachability distance threshold. All mutually reachable points in $X^{L}$ are merged to form the initial core regions of the clusters. A simplified code sketch follows.
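The sketch below illustrates the connectivity idea for the initial core set only, using the HDBSCAN-style mutual reachability distance and the level-0 threshold rule (mean plus standard deviation of core distances). The per-level threshold update from eroded layers is omitted, so this is a simplification of the paper's full scheme.

```python
# Connectivity-based core clustering via connected components.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import NearestNeighbors


def cluster_cores(X_core, k):
    n = X_core.shape[0]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_core)
    dist, idx = nn.kneighbors(X_core)
    core_dist = dist[:, -1]                     # distance to k-th neighbor
    theta = core_dist.mean() + core_dist.std()  # initial-level threshold

    rows, cols = [], []
    for i in range(n):
        for pos in range(1, k + 1):
            j = idx[i, pos]
            # Mutual reachability distance, as in HDBSCAN.
            mrd = max(core_dist[i], core_dist[j], dist[i, pos])
            if mrd <= theta:
                rows.append(i)
                cols.append(j)
    graph = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
    _, labels = connected_components(graph, directed=False)
    return labels                                # initial core-region labels
```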
2) Three-Way Approach for Boundary Assignment
Next, the eroded points are assigned back in reverse order: $E^{L}, E^{L-1}, \ldots, E^{1}$. For each level $l$ (from $L$ down to 1), the points in $E^{l}$ are assigned using a "two-stage" allocation scheme based on their neighbors in the already-clustered, higher-level data.
- Probability Calculation: For an unassigned point $x$, the probability $P(C_j \mid x)$ of it belonging to an existing cluster core $C_j$ is calculated from the labels of its nearest neighbors that are already in a core region (its core $k$-nearest neighbors). This is essentially the fraction of its core neighbors that belong to cluster $C_j$.
- Rule 1 (First Stage):
  - If $P(C_j \mid x) = 1$ (all core neighbors belong to a single cluster $C_j$), $x$ is assigned to the core region of $C_j$.
  - If $P(C_j \mid x) < 1$ but it is the maximum probability over all clusters, $x$ is temporarily assigned to a "candidate" boundary region of $C_j$ for further processing.
- Rule 2 (Second Stage): For a point in a "candidate" boundary region of cluster $C_j$:
  - If there exists another cluster $C_{j'}$ whose probability is close to that of $C_j$ (the gap between $P(C_j \mid x)$ and $P(C_{j'} \mid x)$ is small), the point is considered truly ambiguous. It is assigned to the boundary region of $C_j$ and of every other cluster satisfying this condition.
  - Otherwise, the point is considered confident enough and is assigned to the core region of $C_j$.

This process repeats until all eroded points are assigned, completing the clustering. A code sketch of this two-stage allocation follows.
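Here is a hedged sketch of the two-stage allocation for a single level. The interface, the helper name `allocate_level`, and the gap threshold `tau` are illustrative assumptions rather than the paper's definitions.

```python
# Two-stage allocation of one eroded layer given already-labeled points.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def allocate_level(X_assigned, labels, X_new, k, tau=0.2):
    """Returns one label set per point in X_new: a singleton set means a
    core assignment, a multi-element set means the point lies in the
    boundary regions of several clusters."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_assigned)
    _, idx = nn.kneighbors(X_new)
    results = []
    for neighbors in idx:
        clusters, counts = np.unique(labels[neighbors], return_counts=True)
        probs = counts / counts.sum()       # P(C_j | x) for each cluster
        best = np.argmax(probs)
        if probs[best] == 1.0:              # Rule 1: unanimous -> core
            results.append({clusters[best]})
        else:                               # Rule 2: inspect the gap
            close = clusters[probs >= probs[best] - tau]
            # A singleton means "confident enough" (core of the winner);
            # otherwise the point joins several boundary regions.
            results.append(set(close))
    return results
```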
D. Algorithm and Complexity Analysis
The full procedure is summarized in Algorithm 1.
The computational complexity is dominated by the $k$-NN searches during the erosion phase. With efficient data structures like k-d trees, the complexity is approximately $O(L \cdot k \cdot \bar{n} \log \bar{n})$, where $L$ is the number of levels, $k$ is the number of neighbors, and $\bar{n}$ is the average number of points per level. Since $L$ and $k$ are much smaller than the total number of points $n$, the overall complexity is roughly $O(n \log n)$, making it efficient for large datasets. A toy illustration of the dominant k-NN step follows.
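To make the cost claim tangible, the toy snippet below (not from the paper) runs the dominant operation, a batched k-NN query over one level's points using a k-d tree; each query costs roughly $O(\log n)$ on low-dimensional data.

```python
# Batched k-NN queries via a k-d tree (scipy).
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 2))   # one erosion level's remaining points
tree = cKDTree(X)                  # O(n log n) construction
dist, idx = tree.query(X, k=21)    # k-NN for every point (k=20 + self)
print(dist.shape, idx.shape)       # (10000, 21) each
```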
5. Experimental Setup
- Datasets: The evaluation uses 18 benchmark datasets:
  - 8 Synthetic Datasets: Triangle1, Triangle2, S1, S2, T2, Pathbased, Ds2c2sc13, and T4. These are 2D datasets designed to test clustering algorithms on various challenges like different densities, shapes, and overlapping regions.
  - 10 Real-World Datasets: Glass, Dermatology, Digits, MSRA, Segmentation, Optdigits, Statlog, Pendigits, Htru, and Shuttle from repositories like UCI. The details are transcribed in the table below, as per the paper's Table II.

Table II: Datasets Used in Experiments (Manual Transcription)

| Data sets | #Instances | #Features | #Classes |
| --- | --- | --- | --- |
| Triangle1 | 1000 | 2 | 4 |
| Triangle2 | 1000 | 2 | 4 |
| S1 | 5000 | 2 | 15 |
| S2 | 5000 | 2 | 15 |
| T2 | 4000 | 2 | 6 |
| Pathbased | 300 | 2 | 3 |
| Ds2c2sc13 | 588 | 2 | 13 |
| T4 | 7236 | 2 | 6 |
| Glass | 214 | 10 | 6 |
| Dermatology | 366 | 34 | 6 |
| Digits | 1797 | 64 | 10 |
| MSRA | 1799 | 256 | 12 |
| Segmentation | 2310 | 19 | 7 |
| Optdigits | 5620 | 64 | 10 |
| Statlog | 6435 | 36 | 6 |
| Pendigits | 10992 | 16 | 10 |
| Htru | 17898 | 8 | 2 |
| Shuttle | 22170 | 9 | 3 |
- Evaluation Metrics:
  - Normalized Mutual Information (NMI):
    - Conceptual Definition: Measures the mutual information (statistical dependency) between the predicted clustering and the true ground-truth labels, normalized to a value between 0 and 1. A score of 1 means a perfect match, while 0 means the clustering is independent of the true labels.
    - Mathematical Formula:
      $$\mathrm{NMI}(Y, C) = \frac{2\, I(Y; C)}{H(Y) + H(C)}$$
    - Symbol Explanation:
      - $Y$: The ground-truth class labels.
      - $C$: The predicted cluster labels.
      - $I(Y; C)$: The mutual information between $Y$ and $C$.
      - $H(Y)$ and $H(C)$: The entropy of the ground-truth labels and cluster labels, respectively.
  - Adjusted Rand Index (ARI):
    - Conceptual Definition: Measures the similarity between two data clusterings by considering all pairs of samples and counting pairs that are assigned to the same or different clusters in both the predicted and true clusterings. It is adjusted for chance, so a random clustering has an ARI close to 0. The range is -1 to 1, with 1 indicating a perfect match.
    - Mathematical Formula:
      $$\mathrm{ARI} = \frac{\sum_{ij} \binom{n_{ij}}{2} - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] / \binom{n}{2}}{\frac{1}{2}\left[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right] - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] / \binom{n}{2}}$$
    - Symbol Explanation:
      - $n_{ij}$: Number of objects in true class $i$ and predicted cluster $j$.
      - $a_i$: Sum of $n_{ij}$ over $j$.
      - $b_j$: Sum of $n_{ij}$ over $i$.
      - $\binom{n}{2}$: Total number of pairs of objects.
  - Pairwise F1 Score:
    - Conceptual Definition: Views clustering as a series of pairwise decisions. For each pair of points, it checks whether they are correctly placed together or correctly separated. The F1 score is the harmonic mean of pairwise precision and recall.
    - Mathematical Formula:
      $$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
    - Symbol Explanation:
      - TP (True Positives): Number of pairs of points that are in the same cluster in both the ground truth and the prediction.
      - FP (False Positives): Number of pairs that are in the same cluster in the prediction but not in the ground truth.
      - FN (False Negatives): Number of pairs that are in the same cluster in the ground truth but not in the prediction.
    A short code sketch computing all three metrics appears after the baselines list below.
- Baselines: M3W is compared against eight algorithms:
  - Three-Way: 3W-DBSCAN, 3W-DPET, CE3.
  - Overlapping/Soft: NEO-K-Means, Fuzzy C-means (FCM), Rough K-Means (RKM).
  - Two-Way: Kernel K-means (KnK-Means), Spectral Clustering (SC).
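All three metrics are standard and can be computed with scikit-learn, as in the sketch below on a toy labeling. Deriving pairwise F1 from the pair confusion matrix is my choice here, since scikit-learn has no single-call pairwise F1.

```python
# NMI, ARI, and pairwise F1 for a toy clustering result.
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from sklearn.metrics.cluster import pair_confusion_matrix

y_true = [0, 0, 1, 1, 2, 2]   # ground-truth classes
y_pred = [0, 0, 1, 2, 2, 2]   # predicted cluster labels

# "arithmetic" matches the 2*I / (H(Y) + H(C)) normalization given above.
nmi = normalized_mutual_info_score(y_true, y_pred, average_method="arithmetic")
ari = adjusted_rand_score(y_true, y_pred)

# pair_confusion_matrix counts ordered pairs; the factor of 2 cancels out.
(tn, fp), (fn, tp) = pair_confusion_matrix(y_true, y_pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"NMI={nmi:.3f}  ARI={ari:.3f}  pairwise F1={f1:.3f}")
```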
6. Results & Analysis
Core Results on Synthetic Datasets
The visual results on synthetic datasets demonstrate M3W's ability to handle complex structures.
- On Triangle1 and Triangle2 (Gaussian clusters), M3W perfectly or near-perfectly identifies the clusters, even when the margins are small (Images 8 and 9).
- On S1 and S2 (many overlapping clusters), M3W shows strong performance, whereas 3W-DBSCAN fails on S2 by merging adjacent clusters. M3W's erosion strategy effectively prevents this incorrect linking.
- On Pathbased (a challenging dataset with non-convex shapes), M3W is the only algorithm that gives a satisfactory result, correctly separating the half-circle from the two inner clusters.
- On T4 (complex shapes), M3W is again the only one to find the optimal structure (Image 3).
This figure is Fig. 10, showing the clustering results on the T4 dataset. It compares M3W against the eight competing algorithms, including 3W-DBSCAN and 3W-DPET. M3W (panel a) clearly identifies the cluster structures, and its result reflects the intrinsic distribution of the data more accurately than the other algorithms, confirming its advantage.
The quantitative results are transcribed from Table III below. M3W achieves the highest scores on nearly all synthetic datasets across all three metrics.
Table III: Performance Comparison of M3W on Synthetic Datasets (Manual Transcription)
| Algorithm | Triangle1 | | | | Triangle2 | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | NMI | ARI | F1 | Params | NMI | ARI | F1 | Params |
| M3W | 1.000 | 1.000 | 1.000 | k=21, L=3 | 0.987 | 0.990 | 0.993 | k=25, L=4 |
| 3W-DBSCAN | 1.000 | 1.000 | 1.000 | k=5, η=0.1, ε=0.1 | 0.974 | 0.986 | 0.993 | k=15, η=0.4, ε=0.1 |
| 3W-DPET | 1.000 | 1.000 | 1.000 | K=4, k=5 | 0.968 | 0.981 | 0.993 | K=4, k=5 |
| CE3 | 0.948±0.000 | 0.970±0.000 | 1.000±0.000 | K=4, q=6 | 0.859±0.000 | 0.917±0.000 | 0.993±0.000 | K=4, q=14 |
| NEO-K-Means | 0.952±0.000 | 0.963±0.000 | 1.000±0.000 | K=4, α=2, β=6 | 0.876±0.000 | 0.893±0.000 | 0.993±0.000 | K=4, α=1.5, β=3 |
| FCM | 0.949±0.000 | 0.961±0.000 | 0.971±0.000 | K=4, m=2 | 0.871±0.000 | 0.885±0.000 | 0.919±0.000 | K=4, m=2 |
| RKM | 0.952±0.000 | 0.963±0.000 | 0.972±0.000 | K=4, wu=0.2, ε=1 | 0.869±0.000 | 0.882±0.000 | 0.916±0.000 | K=4, wu=0.4, ε=1 |
| KnK-Means | 0.978±0.108 | 0.984±0.185 | 0.988±0.118 | K=4 | 0.886±0.056 | 0.901±0.065 | 0.938±0.004 | K=4 |
| SC | 0.995±0.000 | 0.997±0.000 | 0.996±0.000 | K=4, σ=0.5 | 0.901±0.001 | 0.919±0.001 | 0.953±0.001 | K=4, σ=0.5 |
| Algorithm | S1 | | | | S2 | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | NMI | ARI | F1 | Params | NMI | ARI | F1 | Params |
| M3W | 0.989 | 0.989 | 0.990 | k=25, L=12 | 0.946 | 0.938 | 0.922 | k=30, L=12 |
| 3W-DBSCAN | 0.821 | 0.507 | 0.990 | k=17, η=0.2, ε=0.1 | 0.000 | 0.000 | 0.922 | k=20, η=1, ε=1 |
| 3W-DPET | 0.819 | 0.587 | 0.990 | K=15, k=20 | 0.656 | 0.337 | 0.922 | K=15, k=20 |
| CE3 | 0.979±0.025 | 0.982±0.076 | 0.990±0.033 | K=15, q=8 | 0.898±0.019 | 0.840±0.057 | 0.854±0.068 | K=15, q=8 |
| NEO-K-Means | 0.986±0.008 | 0.986±0.010 | 0.990±0.046 | K=15, α=1, β=6 | 0.946±0.008 | 0.937±0.053 | 0.941±0.028 | K=15, α=1, β=6 |
| FCM | 0.951±0.017 | 0.890±0.042 | 0.898±0.036 | K=15, m=2 | 0.946±0.010 | 0.938±0.014 | 0.942±0.032 | K=15, m=2.5 |
| RKM | 0.962±0.017 | 0.908±0.045 | 0.914±0.007 | K=15, wu=0.3, ε=1 | 0.922±0.024 | 0.901±0.055 | 0.908±0.049 | K=15, wu=0.4, ε=0.8 |
| KnK-Means | 0.861±0.017 | 0.662±0.029 | 0.691±0.089 | K=15 | 0.820±0.034 | 0.622±0.072 | 0.654±0.082 | K=15 |
| SC | 0.950±0.007 | 0.898±0.027 | 0.905±0.037 | K=15, σ=2 | 0.905±0.021 | 0.845±0.062 | 0.856±0.050 | K=15, σ=1.5 |
| Algorithm | T2 | | | | Pathbased | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | NMI | ARI | F1 | Params | NMI | ARI | F1 | Params |
| M3W | 0.986 | 0.995 | 0.997 | k=24, L=2 | 0.935 | 0.960 | 0.973 | k=9, L=4 |
| 3W-DBSCAN | 0.873 | 0.809 | 0.997 | k=20, η=0.9, ε=0.1 | 0.810 | 0.801 | 0.973 | k=5, η=0.4, ε=0.1 |
| 3W-DPET | 0.890 | 0.767 | 0.997 | K=6, k=10 | 0.781 | 0.732 | 0.973 | K=3, k=8 |
| CE3 | 0.806±0.036 | 0.640±0.082 | 0.717±0.055 | K=6, q=9 | 0.550±0.008 | 0.473±0.002 | 0.668±0.001 | K=3, q=18 |
| NEO-K-Means | 0.789±0.028 | 0.614±0.028 | 0.706±0.011 | K=6, α=1.5, β=6 | 0.549±0.001 | 0.464±0.001 | 0.660±0.000 | K=3, α=0, β=6 |
| FCM | 0.758±0.017 | 0.515±0.051 | 0.615±0.032 | K=6, m=3.5 | 0.535±0.000 | 0.460±0.000 | 0.657±0.000 | K=3, m=2 |
| RKM | 0.727±0.013 | 0.480±0.042 | 0.588±0.007 | K=6, wu=0.2, ε=0.8 | 0.549±0.000 | 0.464±0.000 | 0.660±0.000 | K=3, wu=0.4, ε=1 |
| KnK-Means | 0.719±0.031 | 0.522±0.048 | 0.622±0.033 | K=6 | 0.552±0.001 | 0.468±0.001 | 0.664±0.001 | K=3 |
| SC | 0.778±0.042 | 0.522±0.069 | 0.622±0.032 | K=6, σ=2 | 0.501±0.000 | 0.459±0.000 | 0.668±0.031 | K=3, σ=0.5 |
| Algorithm | Ds2c2sc13 | | | | T4 | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | NMI | ARI | F1 | Params | NMI | ARI | F1 | Params |
| M3W | 0.994 | 0.994 | 0.995 | k=8, L=3 | 1.000 | 1.000 | 1.000 | k=20, L=2 |
| 3W-DBSCAN | 0.936 | 0.848 | 0.995 | k=10, η=0.1, ε=0.1 | 0.000 | 0.000 | 0.322 | k=20, η=1, ε=1 |
| 3W-DPET | 0.948 | 0.842 | 0.995 | K=13, k=13 | 0.924 | 0.872 | 0.897 | K=6, k=7 |
| CE3 | 0.778±0.019 | 0.567±0.033 | 0.605±0.010 | K=13, q=16 | 0.713±0.004 | 0.615±0.019 | 0.685±0.000 | K=6, q=17 |
| NEO-K-Means | 0.795±0.017 | 0.577±0.025 | 0.615±0.008 | K=13, α=0.5, β=3 | 0.707±0.004 | 0.607±0.019 | 0.678±0.014 | K=6, α=0, β=6 |
| FCM | 0.782±0.016 | 0.579±0.050 | 0.616±0.072 | K=13, m=2 | 0.727±0.055 | 0.636±0.085 | 0.702±0.089 | K=6, m=4 |
| RKM | 0.784±0.011 | 0.472±0.030 | 0.524±0.043 | K=13, wu=0.3, ε=1 | 0.707±0.005 | 0.607±0.021 | 0.678±0.003 | K=6, wu=0.1, ε=1 |
| KnK-Means | 0.792±0.037 | 0.626±0.081 | 0.586±0.060 | K=13 | 0.629±0.045 | 0.511±0.063 | 0.600±0.037 | K=6 |
| SC | 0.799±0.024 | 0.567±0.070 | 0.607±0.065 | K=13, σ=1 | 0.588±0.024 | 0.393±0.020 | 0.504±0.046 | K=6, σ=1.5 |
Core Results on Real-World Datasets
The results on real-world datasets, transcribed from Table IV, confirm M3W's strong performance.
Table IV: Performance Comparison of M3W on Real-World Datasets (Manual Transcription)
| Algorithm | Glass | | | | Dermatology | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | NMI | ARI | F1 | Params | NMI | ARI | F1 | Params |
| M3W | 0.811 | 0.709 | 0.772 | k=5, L=2 | 0.915 | 0.849 | 0.882 | k=8, L=6 |
| 3W-DBSCAN | 0.665 | 0.458 | 0.616 | k=7, η=0.2, ε=0.1 | 0.763 | 0.622 | 0.712 | k=15, η=0.5, ε=0.5 |
| 3W-DPET | 0.596 | 0.398 | 0.590 | K=6, k=7 | 0.910 | 0.892 | 0.917 | K=6, k=6 |
| CE3 | 0.699±0.010 | 0.486±0.010 | 0.590±0.023 | K=6, q=18 | 0.821±0.093 | 0.740±0.051 | 0.789±0.086 | K=6, q=15 |
| NEO-K-Means | 0.720±0.010 | 0.531±0.015 | 0.625±0.017 | K=6, α=-0.5, β=6 | 0.877±0.038 | 0.738±0.018 | 0.791±0.128 | K=6, α=0.5, β=6 |
| FCM | 0.762±0.000 | 0.570±0.000 | 0.656±0.000 | K=6, m=2 | 0.784±0.000 | 0.657±0.000 | 0.729±0.020 | K=6, m=2 |
| RKM | 0.730±0.010 | 0.537±0.014 | 0.630±0.009 | K=6, wu=0.1, ε=0.8 | 0.872±0.034 | 0.798±0.009 | 0.840±0.165 | K=6, wu=0.2, ε=0.8 |
| KnK-Means | 0.645±0.062 | 0.460±0.121 | 0.581±0.026 | K=6 | 0.856±0.047 | 0.821±0.026 | 0.856±0.096 | K=6 |
| SC | 0.726±0.033 | 0.554±0.043 | 0.646±0.034 | K=6, σ=1.5 | 0.868±0.017 | 0.705±0.011 | 0.763±0.056 | K=6, σ=0.5 |
| Algorithm | Digits | | | | MSRA | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | NMI | ARI | F1 | Params | NMI | ARI | F1 | Params |
| M3W | 0.879 | 0.787 | 0.718 | k=14, L=11 | 0.836 | 0.622 | 0.705 | k=22, L=8 |
| 3W-DBSCAN | 0.820 | 0.710 | 0.744 | k=15, η=0.6, ε=0.5 | 0.692 | 0.517 | 0.644 | k=16, η=0.2, ε=0.8 |
| 3W-DPET | 0.813 | 0.543 | 0.610 | K=10, k=15 | 0.310 | 0.048 | 0.553 | K=12, k=20 |
| CE3 | 0.749±0.029 | 0.652±0.061 | 0.689±0.057 | K=10, q=6 | 0.546±0.053 | 0.276±0.078 | 0.196 | K=12, q=8 |
| NEO-K-Means | 0.742±0.009 | 0.645±0.027 | 0.683±0.003 | K=10, α=0.5, β=6 | 0.582±0.031 | 0.314±0.051 | 0.352±0.038 | K=12, α=0, β=6 |
| FCM | 0.405±0.030 | 0.231±0.035 | 0.339±0.012 | K=10, m=2.5 | 0.258±0.008 | 0.119±0.007 | 0.385±0.032 | K=12, m=2 |
| RKM | 0.000±0.000 | 0.000±0.000 | 0.181±0.000 | K=10, wu=0.4, ε=0.8 | 0.655±0.031 | 0.440±0.037 | 0.490±0.037 | K=12, wu=0.1, ε=0.8 |
| KnK-Means | 0.750±0.030 | 0.653±0.062 | 0.690±0.022 | K=10 | 0.604±0.020 | 0.351±0.031 | 0.411±0.045 | K=12 |
| SC | 0.732±0.002 | 0.665±0.003 | 0.698±0.002 | K=10, σ=1 | 0.614±0.025 | 0.408±0.036 | 0.460±0.015 | K=12, σ=1 |
| Algorithm | Segmentation | | | | Optdigits | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | NMI | ARI | F1 | Params | NMI | ARI | F1 | Params |
| M3W | 0.728 | 0.411 | 0.533 | k=30, L=6 | 0.871 | 0.814 | 0.833 | k=20, L=12 |
| 3W-DBSCAN | 0.607 | 0.253 | 0.417 | k=18, η=0.1, ε=0.2 | 0.845 | 0.788 | 0.810 | k=17, η=0.4, ε=0.6 |
| 3W-DPET | 0.512 | 0.284 | 0.428 | K=7, k=9 | 0.817 | 0.695 | 0.733 | K=10, k=5 |
| CE3 | 0.557±0.101 | 0.393±0.118 | 0.494±0.038 | K=7, q=8 | 0.747±0.034 | 0.680±0.078 | 0.713±0.041 | K=10, q=5 |
| NEO-K-Means | 0.518±0.031 | 0.337±0.040 | 0.464±0.043 | K=7, α=0.5, β=6 | 0.745±0.007 | 0.674±0.026 | 0.707±0.022 | K=10, α=-0.5, β=3 |
| FCM | 0.516±0.035 | 0.391±0.039 | 0.479±0.034 | K=7, m=2 | 0.406±0.017 | 0.224±0.032 | 0.339±0.032 | K=10, m=4 |
| RKM | 0.561±0.050 | 0.401±0.061 | 0.498±0.040 | K=7, wu=0.1, ε=1 | 0.000±0.000 | 0.000±0.000 | 0.182±0.000 | K=10, wu=0.4, ε=0.8 |
| KnK-Means | 0.518±0.060 | 0.360±0.062 | 0.464±0.035 | K=7 | 0.768±0.009 | 0.686±0.009 | 0.719±0.031 | K=10 |
| SC | 0.390±0.027 | 0.274±0.047 | 0.384±0.037 | K=7, σ=2 | 0.731±0.007 | 0.629±0.023 | 0.668±0.039 | K=10, σ=0.5 |
| Algorithm | Statlog | | | | Pendigits | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | NMI | ARI | F1 | Params | NMI | ARI | F1 | Params |
| M3W | 0.693 | 0.686 | 0.752 | k=25, L=12 | 0.865 | 0.809 | 0.827 | k=30, L=8 |
| 3W-DBSCAN | 0.620 | 0.432 | 0.580 | k=20, η=0.7, ε=0.1 | 0.732 | 0.646 | 0.683 | k=20, η=1, ε=0.1 |
| 3W-DPET | 0.556 | 0.318 | 0.510 | K=6, k=20 | 0.706 | 0.514 | 0.578 | K=10, k=5 |
| CE3 | 0.584±0.047 | 0.516±0.066 | 0.611±0.067 | K=6, q=19 | 0.692±0.008 | 0.612±0.039 | 0.654±0.036 | K=10, q=17 |
| NEO-K-Means | 0.584±0.047 | 0.528±0.086 | 0.616±0.053 | K=6, α=-0.5, β=6 | 0.691±0.012 | 0.595±0.038 | 0.637±0.027 | K=10, α=-1, β=6 |
| FCM | 0.584±0.000 | 0.509±0.000 | 0.600±0.000 | K=6, m=2 | 0.638±0.026 | 0.523±0.039 | 0.577±0.000 | K=10, m=2 |
| RKM | 0.612±0.058 | 0.530±0.090 | 0.617±0.043 | K=6, wu=0.4, ε=1 | 0.266±0.095 | 0.082±0.082 | 0.236±0.049 | K=10, wu=0.1, ε=0.8 |
| KnK-Means | 0.617±0.048 | 0.530±0.069 | 0.617±0.071 | K=6 | 0.690±0.012 | 0.581±0.049 | 0.624±0.023 | K=10 |
| SC | 0.617±0.048 | 0.500±0.052 | 0.592±0.026 | K=6, σ=1 | 0.679±0.014 | 0.574±0.037 | 0.618±0.023 | K=10, σ=1 |
| Algorithm | Htru | | | | Shuttle | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | NMI | ARI | F1 | Params | NMI | ARI | F1 | Params |
| M3W | 0.554 | 0.709 | 0.951 | k=25, L=12 | 0.809 | 0.773 | 0.849 | k=35, L=5 |
| 3W-DBSCAN | 0.446 | 0.554 | 0.944 | k=15, η=0.3, ε=0.1 | 0.015 | 0.001 | 0.558 | k=8, η=0.1, ε=0.4 |
| 3W-DPET | 0.436 | 0.519 | 0.941 | K=2, k=7 | 0.449 | 0.383 | 0.656 | K=3, k=20 |
| CE3 | 0.353±0.000 | 0.530±0.000 | 0.907±0.000 | K=2, q=18 | 0.010±0.003 | 0.001±0.000 | 0.558±0.014 | K=3, q=17 |
| NEO-K-Means | 0.342±0.000 | 0.530±0.001 | 0.906±0.000 | K=2, α=-0.5, β=6 | 0.000±0.139 | 0.001±0.000 | 0.558±0.014 | K=3, α=3, β=6 |
| FCM | 0.269±0.000 | 0.343±0.000 | 0.819±0.000 | K=2, m=4 | 0.236±0.001 | 0.244±0.001 | 0.526±0.000 | K=3, m=4 |
| RKM | 0.365±0.000 | 0.579±0.000 | 0.929±0.000 | K=2, wu=0.7, ε=0.8 | 0.384±0.102 | 0.364±0.099 | 0.591±0.061 | K=3, wu=0.5, ε=0.8 |
| KnK-Means | 0.290±0.000 | 0.400±0.000 | 0.848±0.000 | K=2 | 0.309±0.038 | 0.293±0.027 | 0.582±0.008 | K=3 |
| SC | 0.376±0.000 | 0.571±0.000 | 0.917±0.000 | K=2, σ=0.5 | 0.010±0.000 | 0.002±0.000 | 0.558±0.000 | K=3, σ=0.5 |
- Key Findings: M3W achieves the highest NMI scores on all 10 real-world datasets. It also achieves the best ARI and F1 scores on at least 8 of them. The performance gap is particularly large on datasets like Pendigits, Htru, and Shuttle, where M3W outperforms the second-best algorithm by a significant margin. This confirms that the benefits of its multistep approach are not limited to synthetic data but carry over to complex, high-dimensional real-world problems.
Running Time
Figure 11 compares the running time of M3W with other three-way clustering algorithms.
This figure is Fig. 11, comparing the running times of the algorithms. Panel (a) uses a linear y-axis and panel (b) a logarithmic y-axis. As the data size grows, the running times of M3W, 3W-DBSCAN, CE3, and 3WDPET all increase. At most data points, M3W's running time is lower than that of the other three algorithms, and its efficiency advantage is especially visible on the log scale (red curve), confirming its efficiency.
- 3WDPET is significantly slower than the others.
- On a log scale, M3W is shown to be faster than 3W-DBSCAN and CE3 in most cases.
- The performance advantage of M3W becomes more pronounced as the number of data instances increases, indicating better scalability, consistent with its complexity analysis.
7. Conclusion & Reflections
- Conclusion Summary: The paper successfully introduces M3W, a novel three-way clustering algorithm that addresses fundamental limitations of existing methods. By incorporating a progressive erosion strategy to create a multilevel data structure and a multistep allocation scheme inspired by sequential three-way decisions, M3W can adaptively and accurately identify clusters, especially those with complex shapes and ambiguous boundaries. Extensive experiments validate its superior performance and efficiency compared to a wide range of competitors.
- Limitations & Future Work: The authors identify several directions for future research:
  - Online Learning: Extending M3W to handle streaming data.
  - Automatic Parameter Determination: Developing methods to automatically select the optimal values for the number of neighbors ($k$) and the number of levels ($L$).
  - General Framework: Building a more general framework for sequential three-way clustering that could be applied to different types of algorithms.
- Personal Insights & Critique:
  - Novelty and Strength: The main strength of M3W is its elegant fusion of ideas from sequential decision theory and density-based clustering. The multistep, inside-out clustering process is intuitive and powerful. It provides a robust way to handle boundary points by allowing them to "wait" for more information (i.e., for their neighbors to be classified first). This is a significant conceptual advance over methods that rely on a single, global thresholding step.
  - Potential Limitations:
    - Parameter Sensitivity: While effective, the algorithm still has two key parameters, $k$ and $L$, that require tuning. The paper provides a range for them but does not offer a principled way to set them, a common challenge in clustering.
    - Density Assumption: Like other density-based methods, M3W may struggle with datasets containing clusters of vastly different densities, although the dynamic density estimation and adaptive thresholds are designed to mitigate this.
  - Overall Impact: M3W presents a strong and well-motivated contribution to the field of three-way clustering. Its core idea of sequential, information-gathering allocation could inspire new approaches in other areas of unsupervised learning where uncertainty and ambiguity are prevalent. The work is a prime example of how concepts from decision theory can enrich machine learning models.