Enhancing Graph Contrastive Learning with Reliable and Informative Augmentation for Recommendation
TL;DR Summary
The CoGCL framework enhances graph contrastive learning for recommendation by constructing contrastive views that carry stronger collaborative information via discrete codes. It employs a multi-level quantizer to map users and items to discrete codes, generates reliable contrastive views through virtual neighbor augmentation and semantic relevance sampling, and aligns them with a triple-view contrastive learning objective.
1. Bibliographic Information
1.1. Title
Enhancing Graph Contrastive Learning with Reliable and Informative Augmentation for Recommendation
1.2. Authors
Hongyu Lu (WeChat, Tencent), Bowen Zheng (Renmin University of China), Junjie Zhang (Renmin University of China), Yu Chen (WeChat, Tencent), Ming Chen (WeChat, Tencent), Wayne Xin Zhao (Renmin University of China), Ji-Rong Wen (Renmin University of China). The authors are affiliated with prominent technology companies (Tencent) and academic institutions (Renmin University of China), indicating a strong research background in areas like recommendation systems, graph neural networks, and contrastive learning.
1.3. Journal/Conference
The paper is published in the proceedings of ACM (Conference acronym 'XX'). Given the context and authors' previous publications, it is likely a top-tier conference in information retrieval or data mining, such as SIGIR or KDD, known for their significant influence in the field of recommender systems.
1.4. Publication Year
2024
1.5. Abstract
Graph neural network (GNN) has been a powerful approach in collaborative filtering (CF) due to its ability to model high-order user-item relationships. Recently, to alleviate the data sparsity and enhance representation learning, many efforts have been conducted to integrate contrastive learning (CL) with GNNs. Despite the promising improvements, the contrastive view generation based on structure and representation perturbations in existing methods potentially disrupts the collaborative information in contrastive views, resulting in limited effectiveness of positive alignment. To overcome this issue, we propose CoGCL, a novel framework that aims to enhance graph contrastive learning by constructing contrastive views with stronger collaborative information via discrete codes. The core idea is to map users and items into discrete codes rich in collaborative information for reliable and informative contrastive view generation. To this end, we initially introduce a multi-level vector quantizer in an end-to-end manner to quantize user and item representations into discrete codes. Based on these discrete codes, we enhance the collaborative information of contrastive views by considering neighborhood structure and semantic relevance respectively. For neighborhood structure, we propose virtual neighbor augmentation by treating discrete codes as virtual neighbors, which expands an observed user-item interaction into multiple edges involving discrete codes. Regarding semantic relevance, we identify similar users/items based on shared discrete codes and interaction targets to generate the semantically relevant view. Through these strategies, we construct contrastive views with stronger collaborative information and develop a triple-view graph contrastive learning approach. Extensive experiments on four public datasets demonstrate the effectiveness of our proposed approach.
1.6. Original Source Link
https://arxiv.org/abs/2409.05633 PDF Link: https://arxiv.org/pdf/2409.05633v2.pdf Publication Status: Preprint on arXiv.
2. Executive Summary
2.1. Background & Motivation
The paper addresses the critical challenge of enhancing recommender systems, particularly collaborative filtering (CF), by leveraging Graph Neural Networks (GNNs). GNNs are effective at modeling complex, high-order user-item relationships, but they often struggle with data sparsity—a common problem where user-item interaction data is scarce. This sparsity limits the quality of learned user and item representations, which are crucial for accurate recommendations.
To mitigate data sparsity and improve representation learning, contrastive learning (CL) has been integrated with GNNs. However, existing CL-based methods typically generate contrastive views through perturbations (e.g., stochastic node/edge dropout or adding random noise to embeddings). The core problem identified by the authors is that these perturbation-based approaches can unintentionally disrupt the underlying collaborative information within the generated contrastive views. This disruption leads to ineffective positive alignment, where the model is taught to align views that no longer fully represent the true collaborative semantics, thereby limiting the overall effectiveness of CL.
The paper's entry point is to overcome this limitation by proposing a novel method for contrastive view generation that preserves and enhances collaborative information rather than disrupting it. The innovative idea is to use discrete codes derived from user and item representations to construct more reliable and informative contrastive views.
2.2. Main Contributions / Findings
The paper introduces CoGCL (Collaborative Graph Contrastive Learning), a novel framework designed to enhance graph contrastive learning for recommendation by creating contrastive views with stronger collaborative information.
The primary contributions are:
- A Reliable and Informative Graph CL Framework (CoGCL): CoGCL is proposed as a method to construct contrastive views that inherently contain stronger collaborative information, moving beyond perturbation-based view generation.
- End-to-End Discrete Code Learning: An end-to-end multi-level vector quantizer is introduced to map continuous user and item representations (learned by a GNN) into discrete codes. These codes are specifically designed to capture and represent rich collaborative semantics.
- Enhanced Contrastive View Generation via Discrete Codes: The learned discrete codes are utilized in two innovative ways to create high-quality contrastive views:
  - Virtual Neighbor Augmentation: Discrete codes are treated as "virtual neighbors," expanding existing user-item interactions into multiple edges involving these codes. This strategy enriches the neighborhood structure of nodes and alleviates data sparsity by providing more connections. The augmentation can either replace existing neighbors with codes or add codes as extra neighbors.
  - Semantic Relevance Sampling: Users or items are identified as semantically similar if they share discrete codes or common interaction targets. This allows positive pair sampling that focuses on fine-grained semantic relevance rather than arbitrary pairings.
- Triple-View Graph Contrastive Learning: The framework integrates three distinct contrastive views: two augmented views derived from virtual neighbor augmentation and one semantically relevant view from similar users/items. A triple-view graph contrastive learning approach is developed to align these views, thereby integrating both enhanced structural and semantic collaborative information into the model.

The key findings demonstrate that CoGCL consistently outperforms state-of-the-art baseline models across four public datasets, achieving significant improvements in recommendation performance. Detailed analyses further confirm that the proposed components (virtual neighbor augmentation, semantic relevance sampling, and discrete code learning) are crucial for enhancing graph CL, particularly in scenarios with high data sparsity. The study also empirically validates that CoGCL achieves a better balance between alignment and uniformity in representation learning compared to previous methods.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
Collaborative Filtering (CF)
Collaborative Filtering (CF) is a fundamental technique in recommender systems that predicts a user's interest in items by collecting preferences or taste information from many users (collaborators). The basic idea is that if two users have similar preferences for some items, they are likely to have similar preferences for other items. There are two main types:
- User-based CF: Recommends items that similar users have liked.
- Item-based CF: Recommends items that are similar to items a user has liked in the past.
CF relies on the user-item interaction matrix, which records implicit (e.g., clicks, purchases) or explicit (e.g., ratings) feedback. The goal is to fill in the missing entries in this matrix to predict preferences for unseen items.
Graph Neural Networks (GNNs)
Graph Neural Networks (GNNs) are a class of neural networks designed to operate on graph-structured data. Unlike traditional neural networks that work on Euclidean data (like images or text sequences), GNNs can process data where relationships between entities are explicitly represented as a graph. In recommender systems, user-item interaction data can be naturally modeled as a bipartite graph where users and items are nodes, and interactions are edges.
- Node Representation Learning: GNNs learn node embeddings (vector representations) by iteratively aggregating information from a node's neighbors. This process is called message passing or neighbor aggregation.
- High-Order Relationships: By stacking multiple GNN layers, information can propagate across several hops, effectively capturing high-order relationships (e.g., a user's preference for an item might be influenced by items liked by users who liked similar items).
- LightGCN: A simplified GNN architecture commonly used in recommendation. It removes non-linear activation functions and feature transformations from traditional GCNs, arguing that these components can actually hurt performance in collaborative filtering tasks due to over-smoothing or introducing noise. LightGCN focuses purely on neighbor aggregation, essentially performing a weighted sum of neighbor embeddings: the embedding of a node at layer l is an aggregation of its neighbors' embeddings from layer l-1. The final node embedding is typically a concatenation or summation of embeddings from all layers.
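The aggregation described above can be sketched in a few lines of NumPy. This is a hedged illustration, not the paper's implementation: the function name `lightgcn_propagate`, the toy 2-user/2-item graph, and the mean readout are illustrative choices (LightGCN variants also use weighted sums over layers).

```python
import numpy as np

def lightgcn_propagate(emb, adj_norm, num_layers=3):
    """Propagate embeddings over a normalized adjacency matrix and
    average the per-layer outputs (a LightGCN-style readout)."""
    layers = [emb]
    for _ in range(num_layers):
        emb = adj_norm @ emb          # pure neighbor aggregation, no transform
        layers.append(emb)
    return np.mean(layers, axis=0)    # readout: mean over layers 0..L

# Toy bipartite graph: nodes 0-1 are users, nodes 2-3 are items.
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
deg = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
A_norm = D_inv_sqrt @ A @ D_inv_sqrt  # symmetric normalization

Z0 = np.random.RandomState(0).randn(4, 8)  # initial ID embeddings
Z = lightgcn_propagate(Z0, A_norm)
print(Z.shape)  # (4, 8)
```

Because there are no weight matrices or activations, the whole encoder reduces to repeated sparse matrix products, which is what makes LightGCN cheap and robust for CF.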
Contrastive Learning (CL)
Contrastive Learning (CL) is a self-supervised learning paradigm where the model learns representations by distinguishing between similar (positive) and dissimilar (negative) pairs of data samples. The core idea is to pull positive pairs (different augmented views of the same data point, or semantically similar data points) closer in the embedding space while pushing negative pairs (dissimilar data points) farther apart.
- Contrastive Views: These are different augmented versions of an original data sample. How these views are generated is crucial for CL's effectiveness.
- InfoNCE Loss: A common objective function used in CL. For a given anchor z, a positive sample z⁺, and a set of negative samples z⁻, the InfoNCE loss aims to maximize the similarity between z and z⁺ relative to the similarity between z and any z⁻. For a node i with two views z'_i and z''_i, the loss can be written as:

$$
\mathcal{L}_{\mathrm{InfoNCE}} = -\log \frac{\exp\big(s(\mathbf{z}_i', \mathbf{z}_i'')/\tau\big)}{\sum_{j \in \mathcal{B}} \exp\big(s(\mathbf{z}_i', \mathbf{z}_j'')/\tau\big)}
$$

Where:
  - s(·, ·): a similarity function, typically cosine similarity.
  - τ: a temperature coefficient that scales the logits before the softmax operation, controlling the sharpness of the distribution. A smaller τ makes the model more sensitive to small differences in similarity.
  - z'_i and z''_i: the representations of the two contrastive views for node i.
  - B: a set of negative samples for node i (typically the other nodes in the batch).
- Alignment and Uniformity: These are two key properties of learned representations in CL, especially relevant for the InfoNCE loss.
  - Alignment: Measures how close positive pairs are in the embedding space; good alignment means s(z'_i, z''_i) is high for positive pairs.
  - Uniformity: Measures how uniformly the representations are distributed on the unit hypersphere; good uniformity means representations are spread out, preventing representation collapse (where all embeddings become similar). The InfoNCE loss implicitly encourages both.
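A minimal NumPy sketch of InfoNCE with in-batch negatives may help; `info_nce` and the toy batch are hypothetical, and real implementations operate on framework tensors with autograd rather than NumPy arrays.

```python
import numpy as np

def info_nce(z1, z2, tau=0.2):
    """InfoNCE with in-batch negatives: row i of z1 pairs with row i of
    z2; all other rows of z2 act as negatives for anchor i."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)  # cosine similarity
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / tau                  # pairwise similarity matrix
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))          # -log softmax on positives

rng = np.random.RandomState(0)
z = rng.randn(16, 8)
loss_aligned = info_nce(z, z + 0.01 * rng.randn(16, 8))  # near-identical views
loss_random = info_nce(z, rng.randn(16, 8))              # unrelated "views"
print(loss_aligned < loss_random)
```

The contrast between the two calls shows the alignment effect: views that genuinely agree on the positive pair produce a lower loss than arbitrary pairings.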
Vector Quantization (VQ)
Vector Quantization (VQ) is a signal processing technique used to reduce the data rate by mapping vectors from a large vector space to a finite number of regions in that space. Each region is represented by a codevector (or codebook entry), and the collection of all codevectors forms a codebook.
- Discrete Codes: The output of VQ is a discrete index (or a sequence of indices) corresponding to the chosen codevector(s), effectively discretizing the continuous input space.
- Multi-level VQ: Techniques like Residual Quantization (RQ) and Product Quantization (PQ) are multi-level VQ methods.
  - Residual Quantization (RQ): Quantizes a vector in multiple stages. In each stage, a codevector is chosen to best represent the residual (the difference between the original vector and the sum of previously chosen codevectors). This allows for progressive refinement and potentially better accuracy with fewer codevectors per stage.
  - Product Quantization (PQ): Divides the original vector into several sub-vectors and quantizes each sub-vector independently using its own small codebook. The final code is a concatenation of the codes for each sub-vector. This is efficient for high-dimensional vectors.
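The multi-stage idea behind RQ can be sketched concretely. This sketch uses traditional Euclidean nearest-neighbor assignment with untrained random codebooks (CoGCL's own quantizer, described later, uses cosine similarity and is trained end-to-end); the function name is illustrative.

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Greedy residual quantization: at each level, pick the codevector
    nearest to the current residual, then subtract it."""
    codes, residual = [], x.copy()
    for cb in codebooks:                       # one codebook per level
        idx = np.argmin(np.linalg.norm(cb - residual, axis=1))
        codes.append(int(idx))
        residual = residual - cb[idx]
    return codes, x - residual                 # codes and reconstruction

rng = np.random.RandomState(0)
codebooks = [rng.randn(32, 8) for _ in range(3)]   # 3 levels, 32 codes each
x = rng.randn(8)
codes, x_hat = residual_quantize(x, codebooks)
print(codes)
```

The reconstruction is simply the sum of the chosen codevectors, one per level, so a vector of dimension 8 is compressed into 3 small integers.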
3.2. Previous Works
The paper frames its contribution by contrasting with existing approaches in two main areas: GNN-based CF and CL-based methods for recommendation.
Traditional Collaborative Filtering (CF) and GNN-based CF
- BPR [36]: A foundational matrix factorization (MF) approach that optimizes a personalized ranking objective by sampling positive and negative item pairs.
- GCMC [41]: Graph Convolutional Matrix Completion uses GCNs to perform matrix completion for recommendation.
- NGCF [48]: Neural Graph Collaborative Filtering explicitly models high-order connectivity in the user-item interaction graph using GNNs.
- DGCF [49]: Disentangled Graph Collaborative Filtering aims to learn disentangled representations to capture different user intents or item aspects, thereby improving recommendation quality.
- LightGCN [15]: Simplifies GNNs for recommendation by removing non-linear activations and feature transformations, focusing purely on linear message passing, which has proven highly effective.
- SimpleX [31]: A strong baseline that leverages a cosine contrastive loss for learning user/item embeddings without complex graph structures or CL augmentations.
Contrastive Learning (CL) for Recommendation
Existing CL-based methods are categorized by how they construct contrastive views:
- Structure Augmentation: These methods perturb the graph structure to create different views.
  - SGL [51]: Self-supervised Graph Learning for recommendation randomly drops nodes or edges in the interaction graph to create augmented graphs, from which two views are derived. The paper critiques this as potentially disrupting crucial collaborative information, especially in sparse graphs.
  - GFormer [27]: A Graph Transformer for recommendation that distills self-supervised signals using graph rationale discovery based on masked autoencoding.
  - LightGCL [4]: Employs Singular Value Decomposition (SVD) for adjacency matrix reconstruction to generate augmented views, aiming for a lightweight approach.
- Representation Augmentation: These methods perturb or generate additional node representations.
  - SLRec [56]: Uses CL for representation regularization to learn better latent relationships in general item recommendation.
  - NCL [29]: Neighborhood-enriched Contrastive Learning enhances GNN-based recommendation by learning cluster centers via an Expectation-Maximization (EM) algorithm as anchors.
  - HCCF [53]: Hypergraph Contrastive Collaborative Filtering constructs hypergraph-enhanced CL to capture local and global collaborative relations.
  - SimGCL [60]: Creates contrastive views by adding random noise to the node embeddings directly in the embedding space. This is a very simple yet effective method, but the paper argues that random noise can interfere with implicit collaborative semantics.
User/Item ID Discretization in Recommendation
- This area focuses on representing users/items not just by a single ID, but by a tuple of discrete codes.
- Semantic Hashing [5, 19, 37] and Vector Quantization [14, 44] are key techniques used here.
- Early work focused on efficiency (memory/time) by sharing code embeddings [1, 24, 25, 28, 38].
- More recently, discrete codes are used to improve recommendation quality by alleviating sparsity and providing prior semantics, e.g., in transferable recommendation [16], generative sequential recommendation [30, 34, 39, 47], and LLM-based recommendation [18, 64].
- The current paper's approach differs by specifically employing discrete codes for virtual neighbor augmentation and semantic similarity sampling within a graph CL framework for CF.
3.3. Technological Evolution
The evolution of recommender systems has seen a progression from traditional matrix factorization techniques to graph-based methods, and more recently, the integration of self-supervised learning paradigms like contrastive learning.
- Early CF (e.g., BPR): Focused on learning latent factors for users and items, primarily addressing the cold-start problem and sparsity in a basic sense.
- GNN-based CF (e.g., NGCF, LightGCN): Recognized the graph nature of user-item interactions and leveraged GNNs to explicitly model high-order connectivity and information propagation, leading to richer representations. This marked a significant leap in capturing complex relationships.
- CL-enhanced GNN-based CF (e.g., SGL, SimGCL): Introduced self-supervised signals via contrastive learning to further alleviate sparsity and enhance representation quality. These methods typically generate multiple "views" of the graph or node embeddings and maximize agreement between views of the same entity while pushing apart views of different entities.

This paper's work (CoGCL) fits into the third stage. It identifies a crucial limitation in existing CL-enhanced methods: their perturbation-based view generation can inadvertently degrade the very collaborative information they aim to leverage. CoGCL proposes a more sophisticated, collaborative-information-aware view generation mechanism using discrete codes, positioning it as an advancement in making CL more effective and reliable for recommendation.
3.4. Differentiation Analysis
The core differentiation of CoGCL from existing CL-based methods lies in its approach to contrastive view generation:
| Feature | Existing CL Methods (e.g., SGL, SimGCL) | CoGCL |
|---|---|---|
| Contrastive View Generation | - Structure Perturbation: Randomly drops nodes/edges (SGL), SVD-based reconstruction (LightGCL). - Representation Perturbation: Adds random noise to embeddings (SimGCL). | - Discrete Codes: Maps user/item representations to discrete codes rich in collaborative information. - Virtual Neighbor Augmentation: Uses discrete codes as virtual neighbors (replace or add existing neighbors) to expand interaction edges, enriching neighborhood structure and alleviating sparsity. - Semantic Relevance Sampling: Identifies similar users/items based on shared discrete codes or shared interaction targets to form positive pairs, emphasizing fine-grained semantic relevance. |
| Impact on Collaborative Info | - Disruption Risk: Perturbations can inadvertently disrupt or destroy crucial collaborative information, leading to less effective positive alignment and potentially misleading model learning. - Arbitrary Nature: Perturbations can be arbitrary and may not be well-founded in observed user-item interactions. | - Preservation & Enhancement: Explicitly aims to preserve and enhance collaborative information. Discrete codes are learned to be rich in this information. Augmentations are reliable (based on observed interactions) and informative (introducing richer structural and semantic context). |
| Positive Sample Definition | - Views of the same node from perturbed graphs/embeddings. - May indiscriminately distinguish different instances. | - Multi-faceted Positives: Considers views from augmented graphs (abundant neighborhood structure) and semantically similar users/items (fine-grained semantic relevance) as positives. This leads to a triple-view approach. - Focuses on aligning instances with strong, explicit collaborative semantics, not just augmented versions of the same entity. |
| Robustness to Sparsity | - Structural perturbations on sparse graphs can lose key interactions. - Random noise might interfere with implicit semantics. | - Explicitly addresses sparsity through virtual neighbor augmentation, which adds reliable connections via discrete codes. This makes the augmented graphs richer even for sparse nodes. |
| Learning Objective | Primarily InfoNCE aiming for alignment and uniformity, often relying heavily on uniformity to prevent collapse. | Triple-view contrastive learning that integrates the BPR loss, the discrete code learning loss, alignment between neighbor-augmented views, and alignment between semantically relevant users/items. The paper empirically shows that alignment in CoGCL is more effective, not just relying on uniformity. |
In essence, while previous methods often apply generic data augmentation techniques, CoGCL proposes a collaborative-information-centric approach to augmentation by introducing discrete codes, thereby creating more meaningful and less disruptive contrastive views.
4. Methodology
4.1. Principles
The core principle of CoGCL is to overcome the limitations of existing Graph Contrastive Learning (GCL) methods for recommendation, which often disrupt collaborative information when generating contrastive views through perturbations. Instead, CoGCL aims to construct reliable and informative contrastive views that explicitly enhance collaborative information by leveraging discrete codes. The intuition is that if users and items can be represented by a small set of discrete codes that capture their underlying collaborative semantics, these codes can then be used to create more meaningful augmented interactions and identify semantically similar entities, thereby providing stronger self-supervision signals for representation learning.
4.2. Core Methodology In-depth (Layer by Layer)
4.2.1. Approach Overview
The CoGCL framework is built upon three main components:
- End-To-End Discrete Code Learning: This component focuses on how to elegantly learn discrete codes that are rich in collaborative information for both users and items. It uses a multi-level vector quantizer integrated directly into the training process.
- Reliable and Informative Contrastive View Generation: Once the discrete codes are learned, they are employed to create high-quality contrastive views via two sub-strategies: virtual neighbor augmentation to enhance neighborhood structure, and semantic relevance sampling to identify truly similar entities.
- Triple-View Graph Contrastive Learning: Finally, the framework develops a contrastive learning objective that aligns representations from three distinct, collaboratively enhanced views, thereby integrating stronger self-supervised signals into the recommendation model.
4.2.2. End-To-End Discrete Code Learning
4.2.2.1. Representation Encoding via GNN
Following established practices in GNN-based collaborative filtering, CoGCL adopts LightGCN as its primary Graph Neural Network (GNN) encoder. LightGCN is chosen for its simplicity and effectiveness in propagating neighbor information across the user-item interaction graph.
Given a user set U and an item set I, an interaction matrix R ∈ {0,1}^{|U|×|I|} records observed user-item interactions. A bipartite graph G = (V, E) is constructed, where the node set V contains all users and items and the edge set E represents observed interactions.
The GNN encoder, specifically LightGCN, learns node representations by aggregating information from neighbors. The process is formulated as:
$$
\mathbf{Z}^{l} = \mathrm{GNN}\big(\rho(\mathbf{Z}^{l-1}),\, G\big), \qquad
\mathbf{Z} = \mathrm{Readout}\big([\mathbf{Z}^{0}, \mathbf{Z}^{1}, \ldots, \mathbf{Z}^{L}]\big)
$$
Where:
- L: the total number of GNN layers.
- Z^l: the matrix of node representations at the l-th GNN layer, of shape (|U|+|I|) × d, where d is the embedding dimension. These representations capture l-hop neighbor information.
- ρ(·): a dropout operation applied to the input representations of each layer, a regularization technique that mitigates overfitting by randomly zeroing a fraction of input units during training. Importantly, CoGCL applies dropout on the input representation of each layer, not as edge dropout on the graph structure, which is a common practice in some prior works.
- G: the bipartite interaction graph.
- Z^0: the initial trainable ID embedding matrix for all users and items.
- Readout(·): a function that summarizes the representations from all layers (Z^0, ..., Z^L) to obtain the final node representations Z. Following SimGCL, CoGCL specifically skips Z^0 in the readout for a slight performance improvement.

The final user and item representations, denoted z_u and z_i, are then used for both the recommendation task and the multi-level discrete code learning. The predicted score for user u and item i is typically calculated as the inner product or cosine similarity of their representations, e.g., ŷ_{u,i} = z_u^⊤ z_i.
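The readout-and-score step can be sketched directly; mean pooling over layers, the node counts, and the helper name `readout_skip_z0` are assumptions for illustration (the paper only specifies that Z^0 is excluded, not the exact pooling).

```python
import numpy as np

def readout_skip_z0(layer_embs):
    """Mean-pool the layer outputs Z^1..Z^L, skipping the initial ID
    embeddings Z^0, as in the SimGCL-style readout described above."""
    return np.mean(layer_embs[1:], axis=0)

rng = np.random.RandomState(0)
layer_embs = [rng.randn(7, 8) for _ in range(4)]  # Z^0..Z^3 for 7 nodes
Z = readout_skip_z0(layer_embs)

# Split into 3 users and 4 items, then score by inner product.
Z_u, Z_i = Z[:3], Z[3:]
scores = Z_u @ Z_i.T
print(Z.shape, scores.shape)  # (7, 8) (3, 4)
```

The score matrix gives every user's predicted preference for every item; ranking each row yields the top-K recommendations.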
4.2.2.2. End-To-End Multi-Level Code Learning
To obtain discrete codes that are rich in collaborative information, CoGCL employs a multi-level vector quantization (VQ) method in an end-to-end manner. The paper specifically mentions Residual Quantization (RQ) [9] and Product Quantization (PQ) [21] as examples of multi-level VQ. For simplicity, the process is described using a generic multi-level VQ framework, exemplified with RQ mechanics.
Let's consider discrete code learning for a user u; the process applies symmetrically to items.

For each level h ∈ {1, ..., H}, there is a codebook C^h = {v_k^h}_{k=1}^{K}, which consists of K distinct codevectors. Each v_k^h is a learnable embedding representing a specific discrete code at level h.

The quantization process, which assigns a discrete code to the user representation z_u^h at level h, is expressed using a softmax over cosine similarity:

$$
P(c_u^h = k \mid \mathbf{z}_u^h) = \frac{\exp\big(s(\mathbf{z}_u^h, \mathbf{v}_k^h)/\tau\big)}{\sum_{j=1}^{K} \exp\big(s(\mathbf{z}_u^h, \mathbf{v}_j^h)/\tau\big)}
$$

Where:
- c_u^h: the h-th discrete code assigned to user u.
- z_u^h: the user representation at the h-th level. In Residual Quantization (RQ), z_u^1 = z_u (the initial user representation from the GNN), and for subsequent levels z_u^h is the residual left after subtracting the codevector chosen at level h-1, meaning the residual from the previous quantization is passed to the next level. In Product Quantization (PQ), z_u would be split into H sub-vectors, and each z_u^h would be a sub-vector.
- v_k^h: the k-th codevector in the codebook at level h.
- s(·, ·): the cosine similarity function, aligning with the similarity measure used in contrastive learning. This is a deliberate choice, differing from the Euclidean distance often used in traditional VQ, to synchronize the learning objectives.
- τ: a temperature coefficient for the softmax, similar to the one in the InfoNCE loss.

The learning objective for these discrete codes is to maximize the likelihood of assigning representations to their corresponding codebook centers, achieved via a Cross-Entropy (CE) loss. For user discrete code learning, the loss is:

$$
\mathcal{L}_{\mathrm{CE}}^{\mathrm{user}} = -\sum_{h=1}^{H} \log P(c_u^h \mid \mathbf{z}_u^h)
$$

Where:
- L_CE^user: the discrete code loss for the user side.
- H: the number of code levels.
- P(c_u^h | z_u^h): the probability that the user representation at level h is assigned to its chosen code.

A similar loss, L_CE^item, is calculated for items. The total discrete code learning loss is L_CE = L_CE^user + L_CE^item.
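To make the cosine-softmax assignment and CE objective concrete, here is a small NumPy sketch of an RQ-style multi-level code loss. The hard argmax target, the function names, and the random toy codebooks are simplifying assumptions; the actual end-to-end training also requires gradient flow through the quantizer, which this sketch omits.

```python
import numpy as np

def code_assignment_probs(z, codebook, tau=0.2):
    """Softmax over cosine similarities between z and all codevectors."""
    z_n = z / np.linalg.norm(z)
    cb_n = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    logits = cb_n @ z_n / tau
    p = np.exp(logits - logits.max())          # numerically stable softmax
    return p / p.sum()

def code_ce_loss(z, codebooks, tau=0.2):
    """Sum over H levels of -log P(chosen code | level input), with
    residuals fed forward between levels (RQ-style hard assignment)."""
    loss, residual = 0.0, z.copy()
    for cb in codebooks:
        p = code_assignment_probs(residual, cb, tau)
        c = int(np.argmax(p))                  # code chosen at this level
        loss += -np.log(p[c])
        residual = residual - cb[c]            # pass residual to next level
    return loss

rng = np.random.RandomState(0)
codebooks = [rng.randn(16, 8) for _ in range(2)]   # H=2 levels, K=16 codes
loss = code_ce_loss(rng.randn(8), codebooks)
print(loss > 0.0)  # CE of a softmax assignment is strictly positive
```

Minimizing this loss pulls each representation (or residual) toward its chosen codebook center, which is what makes the resulting codes carry collaborative semantics.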
4.2.3. Reliable and Informative Contrastive View Generation
This is a critical component where CoGCL differentiates itself from existing methods by using discrete codes to generate contrastive views that are both reliable (rooted in observed interactions) and informative (enhancing collaborative structure and semantics).
4.2.3.1. Virtual Neighbor Augmentation via Discrete Codes
To create reliable contrastive views with an enhanced neighborhood structure, CoGCL uses discrete codes as virtual neighbors. This process aims to alleviate data sparsity by expanding the observed interaction graph.
For a given user u, a subset N_u^s of their interacted items (neighbors) N_u is selected with a certain probability to form the augmented data. Two operators are then applied to augment the graph edges:

- "Replace" Operator: replaces the selected items in N_u^s with their corresponding discrete codes. The original edges to these items are removed.
- "Add" Operator: adds the discrete codes of the items in N_u^s as additional virtual neighbors, while retaining the original edges to the items.

The augmented edges involving user u can be formally expressed as:

$$
E_{u}^{\mathrm{code}} = \{(u, c) \mid c \in C_i,\ i \in N_u^s\}, \quad
E_{u}^{\mathrm{replace}} = \{(u, i) \mid i \in N_u \setminus N_u^s\} \cup E_{u}^{\mathrm{code}}, \quad
E_{u}^{\mathrm{add}} = \{(u, i) \mid i \in N_u\} \cup E_{u}^{\mathrm{code}}
$$

Where:
- E_u^code: the set of new edges created between user u and the discrete codes C_i of the selected items i ∈ N_u^s. Each item's discrete codes become virtual neighbors.
- E_u^replace: the interaction edges for user u resulting from the "replace" augmentation. It combines original edges to non-selected items with the new edges to discrete codes.
- E_u^add: the interaction edges for user u resulting from the "add" augmentation. It includes all original edges to items plus the new edges to discrete codes.

These operations effectively treat discrete codes as virtual neighbors of the user. By either replacing original neighbors or adding extra virtual neighbors, the process injects richer neighbor information and helps mitigate graph sparsity. The same augmentation logic is applied symmetrically for items.
To generate a pair of augmented nodes for Contrastive Learning (CL), two rounds of this virtual neighbor augmentation are performed. The resulting augmented graphs are denoted as:
Where:
- : The expanded node set, which now includes all users, all items, and all user/item discrete codes. and represent the sets of all discrete codes for users and items, respectively.
- : Two stochastic operators chosen from "replace" () or "add" (). This means could use "replace" and could use "add", or both could use the same operator.
- and : The edge sets generated by applying the chosen virtual neighbor augmentation operations for all users and items across the graph.
These augmented graphs provide nodes with abundant (extensive virtual neighbors) and homogeneous (substantial common neighbors via shared codes) structural information. The alignment objective then seeks to bring representations from these two views closer. The discrete codes and augmented graphs are updated once per training epoch.
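As an illustration, the replace/add operators can be sketched as follows. This is a minimal Python sketch under assumed data structures (a neighbor list per user, a code list per item); function and argument names are mine, not the paper's:

```python
import random

def augment_user_edges(user, neighbors, item_codes, op="add", p=0.2, rng=None):
    """Sketch of virtual neighbor augmentation for one user.

    `neighbors` is the user's interacted items; `item_codes` maps each item
    to its multi-level discrete codes. Each item is selected for augmentation
    with probability `p`: its codes are attached as virtual neighbors, and
    under "replace" the original edge to the item is dropped.
    """
    rng = rng or random.Random(0)
    selected = {i for i in neighbors if rng.random() < p}
    # New edges from the user to the discrete codes of selected items.
    code_edges = {(user, c) for i in selected for c in item_codes[i]}
    if op == "replace":
        item_edges = {(user, i) for i in neighbors if i not in selected}
    else:  # "add": keep all original interaction edges
        item_edges = {(user, i) for i in neighbors}
    return item_edges | code_edges
```

With `p=1.0` every neighbor is augmented, so "add" yields all original edges plus all code edges, while "replace" keeps only the code edges.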
4.2.3.2. Semantic Relevance Sampling via Discrete Codes
Beyond structure, CoGCL also generates more informative contrastive views by identifying distinct users/items with similar semantics as positive pairs. This is more fine-grained than simply treating augmented versions of the same node as positive.
Semantic relevance is assessed in two ways, both leveraging the learned discrete codes:
- Shared Codes: The discrete codes are designed to correlate with the collaborative semantics of user/item representations. If two users (or items) share a significant number of these codes, it suggests a fine-grained semantic similarity. CoGCL specifically identifies users who share at least $H-1$ codes (where $H$ is the total number of code levels) as semantically positive; this high threshold ensures strong semantic overlap.
- Shared Target: If two users interact with a common item (a "shared target"), or two items are interacted with by a common user, they are considered semantically relevant. This is a form of supervised positive sampling, where the inherent interaction pattern serves as a direct indicator of relevance; the approach has shown effectiveness in other CL contexts such as sentence embedding and sequential recommendation.

From the combined set of instances identified by these two criteria, a positive example $u^{+}$ is sampled for each user $u$. This forms a more semantically rich positive set for contrastive learning. The same symmetric process applies to items.
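The two sampling criteria can be sketched as follows; the data layout ($H$-level code tuples per user, `(user, item)` interaction pairs) and all names are illustrative assumptions, not the paper's code:

```python
from itertools import combinations

def shared_code_positives(codes, min_shared):
    """Find user pairs sharing at least `min_shared` code levels.

    `codes` maps each user id to its H-level code tuple; level h of one
    user is compared with level h of the other (codes are level-specific).
    """
    positives = set()
    for u, v in combinations(codes, 2):
        shared = sum(cu == cv for cu, cv in zip(codes[u], codes[v]))
        if shared >= min_shared:
            positives.add((u, v))
    return positives

def shared_target_positives(interactions):
    """Users interacting with a common item are semantically relevant."""
    by_item = {}
    for u, i in interactions:
        by_item.setdefault(i, set()).add(u)
    return {pair for users in by_item.values()
            for pair in combinations(sorted(users), 2)}
```

With $H = 4$, `min_shared = H - 1 = 3` implements the "at least H-1 shared codes" threshold described above.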
4.2.4. Triple-View Graph Contrastive Learning
CoGCL integrates the enhanced structural and semantic information through a triple-view graph contrastive learning approach. This involves aligning three types of views: two augmented views from virtual neighbor augmentation and one semantically relevant view.
4.2.4.1. Multi-View Representation Encoding
To encode the representations for the augmented graphs, CoGCL introduces additional learnable embeddings specifically for the discrete codes, one embedding vector for each unique user code and each unique item code.
The initial embedding matrix for the augmented graphs is formed by concatenating the original user/item ID embeddings with these code embeddings.
The representations for the two augmented views are then obtained using the same GNN encoder from Section 3.2.1, with the concatenated embedding matrix as the initial input and the respective augmented graphs $\mathcal{G}'$ and $\mathcal{G}''$ as the propagation structure. At each layer, message passing over each augmented graph yields new node representations; a dropout operation is applied to the input representations, and the layer-0 representations are initialized from the concatenated embedding matrix.
After applying the Readout function (as described in Section 3.2.1), the final representations for these two views are obtained, one per augmented graph.
For the semantically relevant user/item view, CoGCL directly uses the node representation obtained from the initial (unaugmented) interaction graph, as described in Section 3.2.1. The dropout applied during the GNN encoding for this view already introduces a form of data augmentation, as different dropout masks lead to different features during two forward propagations, effectively creating distinct views.
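A minimal sketch of this encoding, assuming LightGCN-style linear propagation with input dropout and a mean readout over layers (function names and the dropout scheme are my assumptions):

```python
import numpy as np

def encode(adj_norm, emb0, num_layers=3, dropout=0.1, rng=None):
    """LightGCN-style propagation over an (augmented) graph.

    `adj_norm` is the normalized adjacency over the expanded node set
    (users + items + discrete codes); `emb0` stacks ID embeddings and
    code embeddings. Dropout on the input representations makes two
    forward passes yield two distinct views.
    """
    rng = rng or np.random.default_rng(0)
    # Inverted dropout mask on the initial representations.
    mask = (rng.random(emb0.shape) > dropout) / (1.0 - dropout)
    h = emb0 * mask
    layers = [h]
    for _ in range(num_layers):
        h = adj_norm @ h  # linear message passing, no non-linearity
        layers.append(h)
    return np.mean(layers, axis=0)  # mean readout over layers
```

Calling `encode` twice with the same inputs but different dropout masks produces the two distinct forward-pass views mentioned above.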
4.2.4.2. Alignment Between Neighbor Augmented Views
The two augmented views, and , derived from virtual neighbor augmentation, contain abundant structural information. To leverage this, CoGCL introduces an alignment objective to pull these representations closer. This is done using an InfoNCE-like loss. For the user side, the loss is:
$$\mathcal{L}_{\text{aug}}^{U} = -\sum_{u \in \mathcal{B}} \left[ \log \frac{\exp\!\big(\cos(\mathbf{z}'_u, \mathbf{z}''_u)/\tau\big)}{\sum_{v \in \mathcal{B}} \exp\!\big(\cos(\mathbf{z}'_u, \mathbf{z}''_v)/\tau\big)} + \log \frac{\exp\!\big(\cos(\mathbf{z}''_u, \mathbf{z}'_u)/\tau\big)}{\sum_{v \in \mathcal{B}} \exp\!\big(\cos(\mathbf{z}''_u, \mathbf{z}'_v)/\tau\big)} \right]$$

Where:
- $\mathcal{L}_{\text{aug}}^{U}$: the augmentation alignment loss for users.
- $u$: a specific user in the current batch $\mathcal{B}$.
- $v$: any user, including $u$ itself, within the batch $\mathcal{B}$.
- $\mathbf{z}'_u$ and $\mathbf{z}''_u$: the representations of user $u$ from the two different augmented views.
- $\cos(\cdot,\cdot)$: cosine similarity.
- $\tau$: temperature coefficient.
This loss consists of two terms, ensuring bidirectional alignment: each view's representation is aligned with the other's, and vice versa. Analogously, an item-side loss is computed, and the total augmented-view alignment loss is the sum of the user-side and item-side terms.
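The bidirectional in-batch loss can be sketched with NumPy as follows; this assumes the standard InfoNCE form with cosine similarity and is not taken from the paper's code:

```python
import numpy as np

def info_nce_bidirectional(z1, z2, tau=0.2):
    """Bidirectional InfoNCE over a batch, using cosine similarity.

    z1, z2: (B, d) representations of the same users from the two
    augmented views; row b of z1 and row b of z2 form a positive pair,
    and all other in-batch rows act as negatives.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau  # (B, B) scaled cosine similarities
    # Log-probability of the positive pair in each direction.
    log_p12 = np.diag(sim) - np.log(np.exp(sim).sum(axis=1))  # view' -> view''
    log_p21 = np.diag(sim) - np.log(np.exp(sim).sum(axis=0))  # view'' -> view'
    return -(log_p12 + log_p21).sum()
```

Matched views yield a low loss; mismatching the pairing (e.g., shuffling one view's rows) raises it, which is the alignment pressure described above.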
4.2.4.3. Alignment Between Semantically Relevant Users/Items
To incorporate collaborative semantic information, CoGCL aligns users/items with similar collaborative semantics, as identified by semantic relevance sampling. For each user $u$, a positive example $u^{+}$ with similar semantics (based on shared codes or interaction targets) is sampled. The alignment loss, which connects the augmented views of $u$ with its semantically similar counterpart $u^{+}$, is:

$$\mathcal{L}_{\text{sem}}^{U} = -\sum_{(u, u^{+})} \left[ \log \frac{\exp\!\big(\cos(\mathbf{z}'_u, \mathbf{z}_{u^{+}})/\tau\big)}{\sum_{v \in \mathcal{S}} \exp\!\big(\cos(\mathbf{z}'_u, \mathbf{z}_{v})/\tau\big)} + \log \frac{\exp\!\big(\cos(\mathbf{z}''_u, \mathbf{z}_{u^{+}})/\tau\big)}{\sum_{v \in \mathcal{S}} \exp\!\big(\cos(\mathbf{z}''_u, \mathbf{z}_{v})/\tau\big)} \right]$$

Where:
- $\mathcal{L}_{\text{sem}}^{U}$: the semantic alignment loss for users.
- $(u, u^{+})$: a positive user pair, where $u^{+}$ is a semantically relevant user for $u$.
- $\mathcal{S}$: the sampled data in a batch, which includes $u^{+}$ and negative samples for $u$.
- $\mathbf{z}'_u$ and $\mathbf{z}''_u$: representations of user $u$ from the two augmented views.
- $\mathbf{z}_{u^{+}}$: representation of the semantically relevant user $u^{+}$, obtained from the initial GNN encoder on the unaugmented graph.

The two terms in the equation correspond to aligning $\mathbf{z}'_u$ with $\mathbf{z}_{u^{+}}$ and $\mathbf{z}''_u$ with $\mathbf{z}_{u^{+}}$, respectively. An analogous item-side loss is computed, and the total semantic alignment loss is the sum of the user-side and item-side terms.
4.2.4.4. Overall Optimization
The entire CoGCL framework is jointly optimized by minimizing a combined loss function that includes the Bayesian Personalized Ranking (BPR) loss for recommendation, the discrete code learning objective, and the two contrastive learning losses.
The BPR loss is a standard pairwise ranking loss for implicit feedback data, defined as:
$$\mathcal{L}_{\text{BPR}} = -\sum_{(u,i,j) \in \mathcal{O}} \log \sigma\big(\hat{y}_{u,i} - \hat{y}_{u,j}\big) + \lambda_{\Theta} \|\Theta\|_2^2$$

Where:
- $\mathcal{O}$: the set of observed triplets $(u, i, j)$, where user $u$ prefers the interacted item $i$ over the non-interacted item $j$.
- $\sigma(\cdot)$: the sigmoid function.
- $\hat{y}_{u,i}$: the predicted score for user $u$ and item $i$.
- $\lambda_{\Theta}$: regularization coefficient.
- $\Theta$: all learnable parameters.
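A minimal NumPy sketch of the BPR objective under a dot-product scoring model (names are illustrative, and the regularization here covers only the embedding tables):

```python
import numpy as np

def bpr_loss(user_emb, item_emb, triplets, reg=1e-4):
    """BPR loss: for each (u, i, j), the observed item i should score
    higher than the unobserved item j under a dot-product model."""
    loss = 0.0
    for u, i, j in triplets:
        diff = user_emb[u] @ item_emb[i] - user_emb[u] @ item_emb[j]
        loss += -np.log(1.0 / (1.0 + np.exp(-diff)))  # -log sigmoid(diff)
    reg_term = reg * (np.sum(user_emb**2) + np.sum(item_emb**2))
    return loss + reg_term
```

Swapping the positive and negative item in a triplet flips the sign of the score difference and increases the loss, which is what drives the pairwise ranking.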
The overall optimization objective for CoGCL is:

$$\mathcal{L} = \mathcal{L}_{\text{BPR}} + \lambda \mathcal{L}_{\text{code}} + \alpha \mathcal{L}_{\text{aug}} + \beta \mathcal{L}_{\text{sem}}$$

Where:
- $\mathcal{L}_{\text{BPR}}$: the primary recommendation loss.
- $\mathcal{L}_{\text{code}}$: the discrete code learning loss (from Section 3.2.2.2).
- $\mathcal{L}_{\text{aug}}$: the alignment loss between neighbor augmented views (from Section 3.4.2).
- $\mathcal{L}_{\text{sem}}$: the alignment loss between semantically relevant users/items (from Section 3.4.3).
- $\lambda$, $\alpha$, $\beta$: hyperparameters that control the trade-off between these different objectives.
4.3. Discussion
The paper critically compares CoGCL with existing graph Contrastive Learning (CL) methods for Collaborative Filtering (CF), highlighting its unique contributions.
Comparison with Structural Augmentation Methods:
- Existing Methods (e.g., SGL [51], GFormer [27], LightGCL [4]): These approaches typically perturb the graph structure (e.g., stochastic node/edge dropout, SVD-based reconstruction). The paper argues that such perturbations, especially on already sparse graphs, can disrupt crucial collaborative information, leading to uninformative contrastive views: key interactions may be lost, or the reconstructed graph may not truly capture collaborative semantics.
- CoGCL's Differentiation: CoGCL offers a reliable and informative alternative. Instead of perturbing existing structures, it enhances them by introducing discrete codes as virtual neighbors. This process:
  - Reliability: is strictly based on observed interactions (e.g., a code is associated with an item the user has interacted with).
  - Informativeness: the virtual neighbors (codes) alleviate data sparsity by providing more connections and richer local contexts for nodes.
  - Benefit: aligning representations from two such abundantly structured augmented views is expected to introduce more profound collaborative information into the model.
Comparison with Representation Augmentation Methods:
- Existing Methods (e.g., SimGCL [60], NCL [29]): These methods typically model additional representations (e.g., adding random noise to embeddings, learning hypergraph representations, or cluster centers). SimGCL perturbs embeddings with random noise; the paper points out that this noise can interfere with the implicit collaborative semantics in node representations, causing semantic disruption. NCL learns cluster centers as anchors, but its EM-based clustering may not capture semantic relevance in a fine-grained way. These methods also tend to separate dissimilar instances indiscriminately.
- CoGCL's Differentiation: CoGCL's semantic relevance sampling offers a more nuanced approach:
  - Fine-grained Relevance: it identifies semantically similar users/items based on shared discrete codes (learned to embody collaborative semantics) or shared interaction targets, ensuring that positive pairs are truly collaboratively relevant.
  - Targeted Alignment: by aligning users/items with explicit semantic relevance, CoGCL aims to unleash the potential of CL more effectively, fostering better semantic learning within the model, in contrast to methods that simply push all non-identical instances apart.

In summary, CoGCL's novelty lies in its fundamental shift from disruptive perturbations to constructive, information-rich augmentations driven by learned discrete codes, leading to more meaningful positive alignment and a more potent self-supervised signal for recommendation.
5. Experimental Setup
5.1. Datasets
The experiments evaluate CoGCL on four public datasets that vary in domain, scale, and sparsity, ensuring a comprehensive evaluation.
The following are the statistics from Table 1 of the original paper:
| Datasets | #Users | #Items | #Interactions | Sparsity |
|---|---|---|---|---|
| Instrument | 48,453 | 21,413 | 427,674 | 99.959% |
| Office | 181,878 | 67,409 | 1,477,820 | 99.988% |
| Gowalla | 29,858 | 40,988 | 1,027,464 | 99.916% |
| iFashion | 300,000 | 81,614 | 1,607,813 | 99.993% |
Here's a detailed description of each dataset:
- Instrument [17]: A subset of the Amazon 2023 benchmark dataset, representing user-item interactions in the "Musical Instruments" category. Preprocessing: users and items with fewer than five interactions are filtered out.
- Office [17]: Another subset of the Amazon 2023 benchmark dataset, focusing on "Office Products." Preprocessing: users and items with fewer than five interactions are filtered out.
- Gowalla [10]: A location-based social networking dataset, where interactions represent user check-ins at various locations. Preprocessing: 10-core filtering is applied, retaining only users and items with at least 10 interactions, a common practice to ensure data quality and density.
- iFashion [8]: The Alibaba-iFashion dataset, oriented toward fashion recommendation. Preprocessing: the data processed by [51], which randomly sampled 300k users along with their interactions.

These datasets were chosen because they cover diverse domains and scales and, importantly, exhibit varying degrees of data sparsity (see Table 1), a key challenge CoGCL aims to address. The preprocessing steps (filtering low-activity users/items, 10-core filtering) are standard practices in recommendation research for focusing on more active entities and improving data quality.
For each dataset, interactions are split into training, validation, and testing sets with a ratio of 8:1:1.
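A simple random 8:1:1 split can be sketched as follows; the paper does not specify whether the split is global or per-user, so this global-shuffle version is only one plausible reading:

```python
import random

def split_interactions(interactions, ratios=(0.8, 0.1, 0.1), seed=42):
    """Randomly split (user, item) interactions into train/valid/test
    sets with the given ratios (8:1:1 by default)."""
    data = list(interactions)
    random.Random(seed).shuffle(data)
    n = len(data)
    n_train = int(n * ratios[0])
    n_valid = int(n * ratios[1])
    return (data[:n_train],
            data[n_train:n_train + n_valid],
            data[n_train + n_valid:])
```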
5.2. Evaluation Metrics
The paper uses two widely accepted metrics in recommendation systems to evaluate model performance: Recall@N and Normalized Discounted Cumulative Gain (NDCG)@N. The value of $N$ is set to 5, 10, and 20. The evaluation is conducted using full ranking, meaning predictions are made over the entire item set, not just a subset of sampled negative items, for a more rigorous comparison.
Recall@N
- Conceptual Definition: Recall@N measures the proportion of relevant items that are successfully retrieved within the top $N$ recommendations. It focuses on the ability of a recommendation system to find as many relevant items as possible; a higher Recall@N indicates that the model identifies a large fraction of the items a user would like among its top suggestions.
- Mathematical Formula:

$$\text{Recall@}N = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \frac{|\hat{R}_u^N \cap R_u|}{|R_u|}$$

- Symbol Explanation:
  - $|\mathcal{U}|$: the total number of users in the evaluation set.
  - $u$: a specific user.
  - $\hat{R}_u^N$: the set of top-$N$ items recommended to user $u$.
  - $R_u$: the set of items actually relevant to user $u$ (e.g., items the user interacted with in the test set).
  - $|\cdot|$: the cardinality (number of elements) of a set.
Normalized Discounted Cumulative Gain (NDCG)@N
- Conceptual Definition: NDCG@N evaluates the quality of a ranked recommendation list, considering both the relevance of the recommended items and their position: more relevant items appearing at higher positions contribute more to the score. It is normalized by the ideal DCG (IDCG) so that scores are comparable across users. A higher NDCG@N indicates better ranking quality, with the most relevant items placed at the top.
- Mathematical Formula:

$$\text{NDCG@}N = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \frac{\text{DCG}_u@N}{\text{IDCG}_u@N}$$

Where DCG@N (Discounted Cumulative Gain) for a user $u$ is calculated as:

$$\text{DCG}_u@N = \sum_{k=1}^{N} \frac{rel_k}{\log_2(k+1)}$$

And IDCG@N (Ideal Discounted Cumulative Gain) is calculated by ranking the relevant items perfectly:

$$\text{IDCG}_u@N = \sum_{k=1}^{\min(N, |R_u|)} \frac{rel_k^{\text{ideal}}}{\log_2(k+1)}$$

- Symbol Explanation:
  - $|\mathcal{U}|$: the total number of users in the evaluation set.
  - $u$: a specific user.
  - $rel_k$: the relevance score of the item at position $k$ in the recommended list for user $u$; for implicit feedback, this is typically 1 if the item is relevant and 0 otherwise.
  - $rel_k^{\text{ideal}}$: the relevance score at position $k$ in the ideal ranked list (relevant items first); for implicit feedback this is 1 for all relevant items, so IDCG places all relevant items at the top positions.
  - $\log_2(k+1)$: the discount factor, which reduces the contribution of items appearing further down the list.
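With binary relevance, both metrics reduce to short functions. This per-user sketch follows the definitions above (averaging over users is left to the caller):

```python
import numpy as np

def recall_at_n(ranked, relevant, n):
    """Fraction of the user's relevant items found in the top-n list."""
    return len(set(ranked[:n]) & set(relevant)) / len(relevant)

def ndcg_at_n(ranked, relevant, n):
    """Binary-relevance NDCG: DCG over the top-n list, normalized by
    the ideal DCG obtained by placing all relevant items first."""
    rel = set(relevant)
    # Position k (0-based) is discounted by log2(k + 2).
    dcg = sum(1.0 / np.log2(k + 2)
              for k, item in enumerate(ranked[:n]) if item in rel)
    idcg = sum(1.0 / np.log2(k + 2) for k in range(min(n, len(rel))))
    return dcg / idcg
```

A perfect top-$N$ ranking gives NDCG@N of 1.0; pushing relevant items lower reduces it via the logarithmic discount.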
5.3. Baselines
The paper compares CoGCL against a comprehensive set of competitive baseline models, categorized into Traditional CF Models and CL-based Models.
Traditional CF Models:
- BPR [36]: Bayesian Personalized Ranking. A pairwise ranking optimization method for matrix factorization, using the BPR loss to learn latent representations; a fundamental baseline for implicit feedback.
- GCMC [41]: Graph Convolutional Matrix Completion. A GNN-based method that models user-item interactions as a bipartite graph and uses an auto-encoder framework for matrix completion.
- NGCF [48]: Neural Graph Collaborative Filtering. Explicitly models high-order connectivity in the user-item interaction graph through message passing, learning expressive user and item embeddings.
- DGCF [49]: Disentangled Graph Collaborative Filtering. Learns disentangled user and item representations, separating different user intents or item aspects to improve recommendation.
- LightGCN [15]: Simplifies GCN for recommendation by removing non-linear activations and feature transformations, focusing on linear message propagation for efficiency and effectiveness; often a strong GNN baseline.
- SimpleX [31]: A straightforward yet robust collaborative filtering baseline that uses a cosine contrastive loss to learn representations from positive and negative examples.
CL-based Models:
- SLRec [56]: Self-supervised Learning for Large-scale Item Recommendations. Employs contrastive learning as representation regularization to learn improved latent relationships in large-scale recommendation systems.
- SGL [51]: Self-supervised Graph Learning. Integrates self-supervised learning with graph collaborative filtering; the paper uses SGL-ED, the variant with edge dropout as its augmentation strategy.
- NCL [29]: Neighborhood-enriched Contrastive Learning. Enhances GNN-based recommendation with neighborhood-enriched contrastive learning, using cluster centers as anchors.
- HCCF [53]: Hypergraph Contrastive Collaborative Filtering. Uses hypergraphs to capture complex, multi-way collaborative relations and integrates this with contrastive learning.
- GFormer [27]: Graph Transformer for Recommendation. Leverages a graph transformer architecture combined with masked autoencoding to distill self-supervised signals and learn invariant collaborative rationales.
- SimGCL [60]: Simple Graph Contrastive Learning for Recommendation. A simple yet effective method that generates contrastive views by adding random noise to node embeddings, often achieving strong performance.
- LightGCL [4]: Lightweight Graph Contrastive Learning for Recommendation. Employs Singular Value Decomposition (SVD) to generate augmented views from the adjacency matrix, aiming for a lightweight CL approach.

These baselines are representative: they cover foundational recommendation techniques (matrix factorization, graph-based methods) as well as the latest advances in self-supervised, contrastive learning applied to GNNs, allowing a thorough comparison of CoGCL against both traditional and cutting-edge approaches.
5.4. Implementation Details
The paper outlines specific implementation details to ensure reproducibility and fair comparison:
- Optimizer: Adam is used for model training.
- Embedding Dimension: The embedding dimension $d$ is uniformly set to 64 for all models.
- Batch Size: A batch size of 4096 is used for training.
- GNN Layers: The number of GNN layers $L$ in GNN-based methods (including CoGCL) is set to 3.
- Hyperparameter Tuning (Baselines): For all baseline models, grid search is employed to find optimal hyperparameters, guided by the settings reported in their original papers, ensuring each baseline runs under its best configuration.
- CoGCL Specifics:
  - Discrete Code Method: Residual Quantization (RQ) is the default multi-level vector quantizer.
  - Number of Code Levels $H$: set to 4.
  - Temperature Coefficient $\tau$: set to 0.2, used in the InfoNCE loss and the discrete code quantization probability calculation.
  - Codebook Size: set to 256 for the Instrument and Gowalla datasets; increased to 512 for the larger Office and iFashion datasets to accommodate their scale.
  - Loss Coefficients: the coefficient for the discrete code loss is tuned in {5, 1, 0.5}; the coefficients for the two CL losses (augmented-view alignment and semantic alignment) are each tuned in {5, 1, 0.5, 0.2, 0.1, 0.05, 0.02, 0.01, 0.005, 0.001}.
  - Augmentation Probabilities: the probabilities for the "replace" and "add" operators in virtual neighbor augmentation are tuned in {0.01, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6}.
- Full Ranking: All experiments use full-ranking evaluation over the entire item set, rather than sampling-based evaluation, for robust performance assessment.
6. Results & Analysis
6.1. Core Results Analysis
The experimental results demonstrate that CoGCL consistently achieves the best performance across all four public datasets compared to a wide range of baseline models.
The following are the results from Table 2 of the original paper:
| Dataset | Metric | BPR | GCMC | NGCF | DGCF | LightGCN | SimpleX | SLRec | SGL | NCL | HCCF | GFormer | SimGCL | LightGCL | CoGCL | Improv. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Traditional CF Models | CL-based Models | |||||||||||||||
| Instrument | Recall@5 | 0.0293 | 0.0334 | 0.0391 | 0.0401 | 0.0435 | 0.0386 | 0.0381 | 0.0449 | 0.0449 | 0.0456 | 0.0471 | 0.0470 | 0.0468 | 0.0515 | 9.34% |
| NDCG@5 | 0.0194 | 0.0218 | 0.0258 | 0.0269 | 0.0288 | 0.0244 | 0.0256 | 0.0302 | 0.0302 | 0.0303 | 0.0314 | 0.0316 | 0.0310 | 0.0345 | 9.18% | |
| Recall@10 | 0.0469 | 0.0532 | 0.0617 | 0.0628 | 0.0660 | 0.0631 | 0.0574 | 0.0692 | 0.0685 | 0.0703 | 0.0715 | 0.0717 | 0.0715 | 0.0788 | 9.90% | |
| NDCG@10 | 0.0250 | 0.0282 | 0.0331 | 0.0342 | 0.0361 | 0.0324 | 0.0319 | 0.0380 | 0.0377 | 0.0383 | 0.0393 | 0.0395 | 0.0391 | 0.0435 | 10.13% | |
| Recall@20 | 0.0705 | 0.0824 | 0.0929 | 0.0930 | 0.0979 | 0.0984 | 0.0820 | 0.1026 | 0.1011 | 0.1028 | 0.1041 | 0.1057 | 0.1042 | 0.1152 | 8.99% | |
| NDCG@20 | 0.0310 | 0.0357 | 0.0411 | 0.0419 | 0.0442 | 0.0413 | 0.0381 | 0.0466 | 0.0459 | 0.0466 | 0.0478 | 0.0482 | 0.0474 | 0.0526 | 9.13% | |
| Office | Recall@5 | 0.0204 | 0.0168 | 0.0178 | 0.0258 | 0.0277 | 0.0291 | 0.0294 | 0.0349 | 0.0293 | 0.0340 | 0.0353 | 0.0349 | 0.0338 | 0.0411 | 16.43% |
| NDCG@5 | 0.0144 | 0.0109 | 0.0116 | 0.0177 | 0.0186 | 0.0199 | 0.0209 | 0.0242 | 0.0201 | 0.0230 | 0.0245 | 0.0240 | 0.0232 | 0.0287 | 17.14% | |
| Recall@10 | 0.0285 | 0.0270 | 0.0279 | 0.0380 | 0.0417 | 0.0422 | 0.0402 | 0.0493 | 0.0434 | 0.0489 | 0.0492 | 0.0494 | 0.0490 | 0.0582 | 17.81% | |
| NDCG@10 | 0.0170 | 0.0141 | 0.0149 | 0.0217 | 0.0231 | 0.0241 | 0.0244 | 0.0289 | 0.0243 | 0.0282 | 0.0292 | 0.0289 | 0.0280 | 0.0343 | 17.47% | |
| Recall@20 | 0.0390 | 0.0410 | 0.0438 | 0.0544 | 0.0605 | 0.0602 | 0.0534 | 0.0681 | 0.0629 | 0.0677 | 0.0672 | 0.0689 | 0.0698 | 0.0785 | 12.46% | |
| NDCG@20 | 0.0197 | 0.0178 | 0.0189 | 0.0258 | 0.0279 | 0.0287 | 0.0277 | 0.0336 | 0.0292 | 0.0331 | 0.0338 | 0.0337 | 0.0332 | 0.0393 | 14.18% | |
| Gowalla | Recall@5 | 0.0781 | 0.0714 | 0.0783 | 0.0895 | 0.0946 | 0.0782 | 0.0689 | 0.1047 | 0.1040 | 0.0836 | 0.1042 | 0.1047 | 0.0947 | 0.1092 | 4.30% |
| NDCG@5 | 0.0707 | 0.0633 | 0.0695 | 0.0801 | 0.0854 | 0.0712 | 0.0613 | 0.0955 | 0.0933 | 0.0749 | 0.0935 | 0.0959 | 0.0860 | 0.0995 | 3.75% | |
| Recall@10 | 0.1162 | 0.1089 | 0.1150 | 0.1326 | 0.1383 | 0.1187 | 0.1045 | 0.1520 | 0.1508 | 0.1221 | 0.1515 | 0.1525 | 0.1377 | 0.1592 | 4.39% | |
| NDCG@10 | 0.0821 | 0.0749 | 0.0808 | 0.0932 | 0.0985 | 0.0834 | 0.0722 | 0.1092 | 0.1078 | 0.0866 | 0.1085 | 0.1100 | 0.0988 | 0.1145 | 4.09% | |
| Recall@20 | 0.1695 | 0.1626 | 0.1666 | 0.1914 | 0.2002 | 0.1756 | 0.1552 | 0.2160 | 0.2130 | 0.1794 | 0.2166 | 0.2181 | 0.1978 | 0.2253 | 3.30% | |
| NDCG@20 | 0.0973 | 0.0903 | 0.0956 | 0.1100 | 0.1161 | 0.0996 | 0.0868 | 0.1274 | 0.1254 | 0.1029 | 0.1271 | 0.1286 | 0.1159 | 0.1333 | 3.65% | |
| iFashion | Recall@5 | 0.0195 | 0.0240 | 0.0234 | 0.0297 | 0.0309 | 0.0345 | 0.0237 | 0.0377 | 0.0330 | 0.0419 | 0.0354 | 0.0401 | 0.0423 | 0.0463 | 9.46% |
| NDCG@5 | 0.0128 | 0.0156 | 0.0151 | 0.0197 | 0.0205 | 0.0231 | 0.0157 | 0.0252 | 0.0219 | 0.0280 | 0.0235 | 0.0267 | 0.0284 | 0.0310 | 9.15% | |
| Recall@10 | 0.0307 | 0.0393 | 0.0384 | 0.0459 | 0.0481 | 0.0525 | 0.0361 | 0.0574 | 0.0501 | 0.0636 | 0.0540 | 0.0608 | 0.0641 | 0.0696 | 8.58% | |
| NDCG@10 | 0.0164 | 0.0206 | 0.0199 | 0.0249 | 0.0260 | 0.0289 | 0.0198 | 0.0315 | 0.0274 | 0.0350 | 0.0294 | 0.0334 | 0.0354 | 0.0386 | 9.04% | |
| Recall@20 | 0.0470 | 0.0623 | 0.0608 | 0.0685 | 0.0716 | 0.0770 | 0.0535 | 0.0846 | 0.0742 | 0.0929 | 0.0790 | 0.0897 | 0.0932 | 0.1010 | 8.37% | |
| NDCG@20 | 0.0206 | 0.0264 | 0.0256 | 0.0307 | 0.0320 | 0.0351 | 0.0242 | 0.0384 | 0.0335 | 0.0425 | 0.0358 | 0.0407 | 0.0428 | 0.0465 | 8.64% | |
Key Observations from Overall Performance:
- CL-based Methods Outperform Traditional Methods: Contrastive learning (CL)-based methods (SGL, NCL, SimGCL, LightGCL, etc.) generally outperform traditional matrix factorization methods (BPR, SimpleX) and GNN-only methods (NGCF, LightGCN). This confirms the value of self-supervised signals in alleviating data sparsity and enhancing representation learning for recommendation.
- Varied Strengths of CL-based Methods: SimGCL (a representation augmentation method) performs best among baselines on Instrument and Gowalla, suggesting that random noise promoting uniformity is effective in some contexts. GFormer and LightGCL (structure augmentation methods) are more competitive on Office and iFashion respectively, implying that carefully designed structural augmentations can be beneficial. SGL (stochastic edge/node dropout) sometimes underperforms, reinforcing the paper's argument that naive structural perturbations can disrupt crucial information.
- CoGCL's Consistent Superiority: CoGCL achieves the highest scores across all datasets and all metrics (Recall@N, NDCG@N), with improvements ranging from 3.30% to 17.81% over the best baseline, highlighting its robustness and effectiveness.
  - The largest improvements occur on the Office dataset (e.g., 17.81% in Recall@10, 17.47% in NDCG@10), an extremely sparse dataset (99.988% sparsity), suggesting CoGCL is particularly effective in highly sparse environments.
  - The improvements are attributed to CoGCL's construction of contrastive views with stronger collaborative information through discrete codes, virtual neighbor augmentation, and semantic relevance sampling, which collectively provide richer self-supervised signals.
6.2. Ablation Studies / Parameter Analysis
6.2.1. Ablation Study of Data Augmentation
This study investigates the contribution of each specific data augmentation method within CoGCL. The following chart (Figure 4 from the original paper) shows the impact of different data augmentation methods on NDCG@10 for Instrument and Office datasets:
The image is a chart showing the impact of different data augmentation methods on NDCG@10 (left: Instrument; right: Office), comparing 'w/o Replace', 'w/o Add', 'w/o Shared-C', 'w/o Shared-T', and 'CoGCL'.
Alt text: Figure 4: Ablation study of data augmentation methods.
- w/o Replace: removes the "replace" operator in virtual neighbor augmentation.
- w/o Add: removes the "add" operator in virtual neighbor augmentation.
- w/o Shared-C: removes semantic relevance sampling based on shared codes.
- w/o Shared-T: removes semantic relevance sampling based on shared interaction targets.
Analysis:
The results in Figure 4 consistently show that removing any of these data augmentation components leads to a decrease in performance across both Instrument and Office datasets. This empirically validates that:
- Both the "replace" and "add" strategies in virtual neighbor augmentation contribute positively to the model's effectiveness, enriching the neighborhood structure.
- Both shared codes and shared targets are important for identifying semantic relevance and generating informative positive pairs.

The performance decline confirms that all of CoGCL's data augmentation methods are useful and contribute to the overall improvement, suggesting they successfully introduce stronger collaborative information without disruption.
6.2.2. Ablation Study of Triple-View Graph Contrastive Learning
This study delves into the roles of alignment and uniformity within CoGCL's two contrastive learning objectives (the augmented-view alignment loss and the semantic alignment loss). The paper uses variants in which the gradient for either alignment or uniformity is stopped.
The following are the results from Table 3 of the original paper:
| Methods | Instrument | Office | ||
|---|---|---|---|---|
| Recall@10 | NDCG@10 | Recall@10 | NDCG@10 | |
| LightGCN | 0.0660 | 0.0361 | 0.0417 | 0.0231 |
| CoGCL | 0.0788 | 0.0435 | 0.0582 | 0.0343 |
| w/o A | 0.0726 | 0.0401 | 0.0490 | 0.0280 |
| w/o U | 0.0703 | 0.0384 | 0.0465 | 0.0267 |
| w/o AA | 0.0741 | 0.0411 | 0.0536 | 0.0315 |
| w/o AU | 0.0762 | 0.0421 | 0.0542 | 0.0306 |
| w/o SA | 0.0767 | 0.0422 | 0.0554 | 0.0329 |
| w/o SU | 0.0779 | 0.0429 | 0.0574 | 0.0336 |
Analysis:
- Overall Impact of CL Components: Compared to LightGCN, CoGCL shows substantial improvements, demonstrating the effectiveness of the entire triple-view graph contrastive learning approach.
- Importance of Alignment and Uniformity:
  - w/o A (disabling alignment in both CL losses) leads to a significant performance drop (e.g., NDCG@10 falls from 0.0435 to 0.0401 on Instrument).
  - w/o U (disabling uniformity in both CL losses) also causes notable degradation (e.g., NDCG@10 falls from 0.0435 to 0.0384 on Instrument).
  - These results indicate that both alignment (pulling positives together) and uniformity (spreading representations out) are crucial for CoGCL's effectiveness, unlike some previous findings where uniformity was primarily responsible for gains (as discussed in Section 2.2).
- Individual Contributions of Alignment: w/o AA (disabling alignment only for the augmented-view loss) shows a pronounced decrease, and w/o SA (disabling alignment only for the semantic loss) also incurs a noticeable drop. This is strong evidence that the explicit alignment between the two types of positives (neighbor-augmented views and semantically relevant users/items) successfully introduces enhanced collaborative information into the model.
- Individual Contributions of Uniformity: w/o AU and w/o SU (disabling uniformity for one loss at a time) lead to smaller drops than disabling alignment. The paper attributes this to a shared uniformity effect between the two CL losses: some uniformity is still implicitly maintained even when one term is explicitly disabled.

Overall, the ablation study confirms that CoGCL's strength comes from a balanced interplay of alignment and uniformity, with its collaborative information-aware alignment being a particularly strong driver of performance.
6.2.3. Performance Comparison w.r.t. Different Discrete Code Learning Methods
This analysis compares CoGCL's proposed end-to-end discrete code learning method with several alternatives. The following chart (Figure 5 from the original paper) shows the performance comparison of different discrete code learning methods:
The image is a chart comparing the performance (Recall@10 and NDCG@10) of different discrete code learning methods on the Instrument and Office datasets; CoGCL's method performs best across metrics.
Alt text: Figure 5: Performance comparison of different discrete code learning methods.
- Non-Learnable Code: uses the Faiss library to generate discrete codes from pre-trained LightGCN embeddings; the codes are fixed during training.
- Euclidean Code: uses Euclidean distance instead of cosine similarity in the quantization step (Eq. 6), as is common in traditional RQ.
- PQ Code: employs Product Quantization (PQ) instead of Residual Quantization (RQ) as the multi-level quantizer.
- CoGCL: uses the proposed end-to-end RQ with cosine similarity.
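A toy sketch of RQ with cosine-similarity assignment (hard argmax here; the paper's end-to-end variant uses a temperature-scaled softmax over similarities, which this sketch omits):

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Residual quantization with cosine-similarity codeword assignment.

    At each level the current residual is matched against that level's
    codebook by cosine similarity (rather than Euclidean distance); the
    chosen codeword is subtracted and the next level quantizes what
    remains, giving one code index per level.
    """
    codes, residual = [], x.astype(float).copy()
    for book in codebooks:
        norm_r = residual / (np.linalg.norm(residual) + 1e-12)
        norm_b = book / (np.linalg.norm(book, axis=1, keepdims=True) + 1e-12)
        idx = int(np.argmax(norm_b @ norm_r))  # most cosine-similar codeword
        codes.append(idx)
        residual = residual - book[idx]
    return codes, residual
```

The returned code list is the multi-level discrete code of the input vector; a small final residual means the codewords reconstruct the vector well.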
Analysis:
- Importance of Learnable Codes: Non-Learnable Code performs worse than CoGCL, highlighting the importance of learning discrete codes end-to-end alongside the recommendation task. Letting the codes adapt jointly with the GNN representations keeps them informative and reliable throughout training, whereas fixed codes may not capture the evolving collaborative semantics.
- Cosine vs. Euclidean Similarity: Euclidean Code performs worse than CoGCL, validating the choice of cosine similarity for quantization. Synchronizing the similarity measure in VQ with that used in the InfoNCE loss aligns the discrete codes with the directional geometry of the embedding space that matters for contrastive learning.
- RQ vs. PQ: PQ Code also underperforms CoGCL, suggesting that Residual Quantization is more suitable for CoGCL's objectives than Product Quantization. RQ establishes conditional probability relationships between codes at different levels (quantizing residuals), allowing more granular and refined semantic modeling than PQ, which treats sub-vectors (and thus their codes) independently.

These results confirm the effectiveness of CoGCL's end-to-end discrete code learning, particularly its use of RQ with cosine similarity.
6.2.4. Performance Comparison w.r.t. Data Sparsity
This study examines CoGCL's robustness and effectiveness across different levels of data sparsity. Users are divided into five groups based on their number of interactions, with each group having the same number of users.
The following chart (Figure 6 from the original paper) shows the performance comparison on user groups with different sparsity levels:
The image is a bar chart comparing the NDCG@10 of SimGCL and CoGCL across user groups of different sparsity levels (left: Instrument; right: Office); CoGCL outperforms SimGCL at every sparsity level.
Alt text: Figure 6: Performance comparison on user groups with different sparsity levels.
- The x-axis represents user groups from least sparse to most sparse (Group 1 to Group 5).
- The y-axis represents NDCG@10.
- SimGCL is used as a strong baseline for comparison.
Analysis:
- CoGCL's Consistent Outperformance: CoGCL consistently outperforms SimGCL across all sparsity levels on both the Instrument and Office datasets.
- Superiority in High Sparsity: The performance gap between CoGCL and SimGCL appears to be most significant for the highly sparse user groups (Groups 4 and 5). For example, on the Office dataset, CoGCL maintains a much higher NDCG@10 for the sparsest users. This indicates that CoGCL is particularly effective in scenarios with sparse interactions: introducing additional signal through its collaborative-information-aware contrastive views (via virtual neighbor augmentation and semantic relevance sampling) allows it to learn high-quality representations even when direct interaction data is scarce, directly addressing one of the core motivations of the paper.
6.2.5. Hyperparameter Tuning
6.2.5.1. CL loss coefficients μ and η
The following chart (Figure 7 from the original paper) shows the performance comparison of different CL loss coefficients:
(Figure description) The image is a chart showing the impact of different CL loss coefficients on Recall@10 and NDCG@10 for the Instrument and Office datasets, plotting how both metrics vary as the coefficient values change.
Alt text: Figure 7: Performance comparison of different CL loss coefficients.
- The chart shows the effect of varying μ (the weight of the structural augmentation alignment loss) and η (the weight of the semantic relevance alignment loss) on Recall@10 and NDCG@10 for the Instrument and Office datasets.
Analysis:
- Sensitivity to μ: Performance is sensitive to μ; both too large and too small values lead to suboptimal results. The alignment between augmented views needs to be carefully balanced, since an overly strong or weak signal degrades overall performance. The optimal μ is 0.1 for Instrument and 1.0 for Office.
- Sensitivity to η: An excessively large η causes a sharp drop in performance. This suggests that alignment with semantically relevant users/items is very powerful but can be detrimental if overemphasized, potentially leading to over-clustering or representation collapse if not properly balanced. The optimal η is 0.02 for Instrument and 0.2 for Office.
- Relative Magnitude: The optimal η is generally smaller than the optimal μ. This implies that while semantic relevance is crucial, its influence needs to be more carefully constrained than that of the structural augmentation.

These findings emphasize the importance of careful hyperparameter tuning to achieve the optimal balance between the recommendation loss, discrete code learning, structural augmentation alignment, and semantic relevance alignment.
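The four-term balance described above can be written as a one-line objective sketch. The symbol-to-term mapping (μ for the structural CL term, η for the semantic CL term) is an assumption drawn from this summary, and the numeric inputs are illustrative, not the paper's actual loss values.

```python
def total_loss(l_rec, l_code, l_struct, l_sem, mu, eta):
    # Overall objective sketch: recommendation loss + quantization (code
    # learning) loss + mu * structural-augmentation CL + eta * semantic CL.
    # The exact formulation in the paper may differ; this only shows the
    # role of the two tuned coefficients.
    return l_rec + l_code + mu * l_struct + eta * l_sem

# Coefficients reported as optimal for Instrument: mu = 0.1, eta = 0.02
loss = total_loss(l_rec=0.9, l_code=0.05, l_struct=0.4, l_sem=0.6,
                  mu=0.1, eta=0.02)                 # ≈ 1.002
```

The much smaller η keeps the (powerful but collapse-prone) semantic alignment term from dominating the objective.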
6.2.5.2. Augmentation probabilities
The following chart (Figure 8 from the original paper) shows the performance comparison of different augmentation probabilities:
(Figure description) The image is a chart comparing performance under different augmentation probabilities for the two operators, "replace" and "add". It shows how Recall@10 and NDCG@10 vary on the Instrument and Office datasets, with the "replace" probability on the left and the "add" probability on the right.
Alt text: Figure 8: Performance comparison of different augmentation probabilities.
- The chart shows the effect of varying the probabilities for the "replace" and "add" operators in virtual neighbor augmentation on Recall@10 and NDCG@10 for the Instrument and Office datasets.
Analysis:
- Optimal Range: Performance degrades if the probability for either "replace" or "add" is excessively high or too low. This suggests there is an optimal range in which virtual neighbor augmentation is beneficial.
- Dataset Specificity:
  - For Instrument, the ideal probability is 0.3 for "replace" and 0.2 for "add".
  - For Office, the optimal probability is 0.2 for "replace" and 0.5 for "add".

  This indicates that the optimal augmentation strategy can be dataset-dependent, likely reflecting differences in graph density and structure. A moderate level of augmentation is generally preferred, introducing richer context without overwhelming or distorting the original collaborative signals.
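As a rough illustration of the two operators being tuned here, the sketch below stochastically replaces an edge endpoint with one of the item's discrete codes, or adds an extra user-code edge alongside the original. The helper name, data layout, and the item-side-only treatment are assumptions for brevity; the paper applies analogous operators on both user and item sides.

```python
import random

def augment_edges(edges, item_codes, p_replace=0.3, p_add=0.2, seed=0):
    # "replace": swap the item endpoint for one of its discrete codes.
    # "add":     keep the edge and add an extra user-code (virtual) edge.
    rng = random.Random(seed)
    out = []
    for user, item in edges:
        codes = item_codes[item]
        if rng.random() < p_replace:
            out.append((user, rng.choice(codes)))   # replace the item endpoint
        else:
            out.append((user, item))                # keep the original edge
        if rng.random() < p_add:
            out.append((user, rng.choice(codes)))   # add a virtual-neighbor edge
    return out

edges = [("u1", "i1"), ("u2", "i1"), ("u2", "i2")]
item_codes = {"i1": ["c3", "c7"], "i2": ["c3", "c9"]}
aug = augment_edges(edges, item_codes)
```

Raising `p_replace` or `p_add` injects more code-mediated edges, which is exactly the dial whose sweet spot the figure explores.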
6.2.6. Embedding Distribution w.r.t. Augmentation Ratio
This analysis uses t-SNE [43] and Gaussian Kernel Density Estimation (KDE) [3] to visualize the user embedding distributions under different augmentation ratios, providing an intuitive understanding of CoGCL's learned representations.
The following chart (Figure 9 from the original paper) shows the embedding distribution of different data augmentation ratios on Instrument dataset:
(Figure description) The image is a chart showing embedding distributions under different data augmentation ratios for five methods: LightGCN, CoGCL 2p, CoGCL, CoGCL 0.5p, and SimGCL. Colors transition from green to blue, indicating increasing embedding density.
Alt text: Figure 9: Embedding distribution of different data augmentation ratios on Instrument dataset. The transition from green to blue signifies a gradual increase in embedding density.
- LightGCN: Baseline without CL.
- SimGCL: CL baseline with random noise.
- CoGCL: Proposed method at the optimal augmentation ratio.
- CoGCL 2p: CoGCL with probabilities (for both "replace" and "add") set to twice the optimal values.
- CoGCL 0.5p: CoGCL with probabilities set to half the optimal values.
Analysis:
- Uniformity by CL: Both CoGCL and SimGCL exhibit more uniform embedding distributions than LightGCN. This confirms that contrastive learning generally helps spread out representations, preventing representation collapse and improving uniformity.
- CoGCL's Balance: CoGCL achieves a good trade-off between clustering and uniformity. While it maintains uniformity, it also allows for meaningful clusters, suggesting it captures the underlying collaborative semantics effectively. In contrast, SimGCL may lean more heavily on uniformity through random noise, potentially at the cost of some semantic coherence.
- Effect of Augmentation Ratio: As the augmentation ratio rises (from CoGCL 0.5p to CoGCL to CoGCL 2p), the embeddings exhibit a more clustered pattern. Higher augmentation probabilities (i.e., more virtual neighbors or more semantic positives) strengthen the clustering of similar users/items, likely because the added positive signals pull related entities closer. However, too much clustering can be detrimental (as implied by the suboptimal performance at 2p in the hyperparameter tuning), suggesting that over-augmentation may merge distinct semantic groups.

This visualization provides qualitative evidence that CoGCL learns high-quality, well-structured representations by balancing the desire for a uniform distribution with the need to cluster semantically similar entities, thanks to its reliable and informative augmentation strategies.
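For intuition on the density estimate behind this figure, here is a minimal NumPy-only 2-D Gaussian KDE sketch (the paper presumably uses standard t-SNE/KDE tooling; the function name, bandwidth, and stand-in data are assumptions).

```python
import numpy as np

def gaussian_kde_2d(points, grid, bandwidth=0.2):
    # points: (n, 2) array of 2-D projected embeddings (e.g. t-SNE output)
    # grid:   (m, 2) array of query locations where density is evaluated
    diff = grid[:, None, :] - points[None, :, :]   # (m, n, 2) pairwise offsets
    sq_dist = (diff ** 2).sum(axis=-1)             # (m, n) squared distances
    kernel = np.exp(-sq_dist / (2 * bandwidth ** 2))
    # Average the kernel responses and apply the 2-D Gaussian normalizer.
    return kernel.mean(axis=1) / (2 * np.pi * bandwidth ** 2)

rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 2))                    # stand-in for t-SNE output
grid = np.array([[0.0, 0.0], [3.0, 3.0]])
density = gaussian_kde_2d(pts, grid)
# Density near the cluster center should exceed density far in the tail.
```

Higher density values (the blue regions in Figure 9) simply mean more embeddings fall near that location, which is how the clustered-vs-uniform patterns are read off the plot.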
7. Conclusion & Reflections
7.1. Conclusion Summary
This paper introduces CoGCL, a novel framework aimed at enhancing graph contrastive learning (CL) for collaborative filtering (CF). The central idea is to generate reliable and informative contrastive views by leveraging discrete codes that are rich in collaborative information.
The key contributions and findings are:
- End-to-End Discrete Code Learning: A multi-level vector quantizer is integrated into the training process to map user and item representations into discrete codes, specifically designed to capture fine-grained collaborative semantics.
- Enhanced Contrastive View Generation: These discrete codes are used to:
  - Perform virtual neighbor augmentation, treating codes as virtual neighbors to enrich the graph structure and alleviate data sparsity.
  - Enable semantic relevance sampling by identifying users/items sharing codes or interaction targets, creating more meaningful positive pairs.
- Triple-View Graph Contrastive Learning: A novel objective function is proposed to align representations from two augmented structural views and one semantically relevant view, effectively integrating diverse collaborative signals.
Extensive experiments across four public datasets demonstrate CoGCL's superior performance compared to state-of-the-art baselines. Further analyses confirm that all proposed components contribute positively, particularly highlighting CoGCL's robustness in sparse data scenarios and its ability to achieve a better balance between alignment and uniformity in the learned embeddings.
7.2. Limitations & Future Work
The authors explicitly mention that a primary direction for future work is to improve the scalability of CoGCL. This suggests that, while effective, the current framework might have computational bottlenecks that limit its application to extremely large-scale industrial scenarios or different recommendation tasks.
Specifically, they aim to extend CoGCL to:
- Click-Through Rate (CTR) prediction: This task often involves rich feature sets beyond user-item interactions and requires models to predict the probability of a click. Adapting CoGCL to handle auxiliary information and a different prediction objective would be a significant extension.
- Sequential recommendation: This task focuses on predicting the next item a user will interact with, based on their historical sequence of interactions. Integrating the discrete codes and augmented views into sequential models would require adapting the graph construction and contrastive learning objectives to sequence dynamics.
Potential limitations implied by the paper's focus and future work include:
- Computational Overhead: Learning and managing the multi-level discrete codebooks, plus the additional graph propagation over augmented views, could introduce significant computational costs during training, especially as the number of code levels or the codebook size grows. The scalability issue likely stems from this.
- Hyperparameter Sensitivity: The ablation studies show that the model is sensitive to the loss coefficients (μ, η) and the augmentation probabilities. Optimal performance requires careful tuning, which can be resource-intensive.
- Code Cold-Start: While discrete codes alleviate data sparsity for existing users/items, the paper does not explicitly discuss how new users or items (with no interactions, and thus no codes) are handled in the discrete code learning phase.
7.3. Personal Insights & Critique
This paper presents a compelling and well-justified approach to enhancing graph contrastive learning for recommendation. The core insight—that arbitrary perturbations can harm collaborative information—is crucial, and the solution using learned discrete codes is elegant.
Strengths:
- Strong Motivation: The empirical analysis in Section 2.2, demonstrating the ineffectiveness of alignment in existing perturbation-based CL, effectively sets the stage for the proposed method.
- Principled Augmentation: Moving beyond random perturbations to collaborative-information-aware augmentation (virtual neighbors, semantic relevance) is a significant conceptual advancement. It makes the self-supervised signals more meaningful.
- Multi-Level Discrete Codes: The use of multi-level vector quantization allows for a fine-grained, learnable discretization of user/item semantics, which is a powerful way to capture and share collaborative knowledge. The choice of cosine similarity in VQ aligns well with the downstream CL objectives.
- Comprehensive Evaluation: The experiments are thorough, covering multiple datasets, a wide range of baselines, and detailed ablation studies.
Potential Issues/Areas for Improvement:
- Computational Efficiency: While the paper notes that the number of code levels and the codebook size are kept small enough to make the codebook parameters manageable, the process of constructing augmented graphs with codes as virtual nodes and running GNNs on these expanded graphs could still be computationally demanding during training. Future work on scalability is indeed critical.
- Interpretability of Discrete Codes: While codes are "rich in collaborative information," the paper doesn't delve deeply into what specific collaborative patterns or semantic clusters these codes represent. Further analysis of the codebook embeddings or clusters could offer more interpretability.
- Generalizability of "H-1 Shared Codes" rule: The rule of "at least H-1 codes" for semantic relevance sampling is an empirical heuristic. While it worked well, exploring adaptive or more sophisticated ways to determine semantic similarity based on codes could be an interesting avenue.
- Dynamic Codebook: The codebook is updated once per epoch. Investigating more dynamic or online updating strategies for codebooks, especially in highly dynamic recommendation environments, might yield further benefits.
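The "at least H-1 shared codes" heuristic mentioned above is simple to state in code. The sketch below assumes a position-wise (level-wise) notion of "shared", which is an interpretation of this summary rather than a confirmed detail of the paper.

```python
def shares_enough_codes(codes_a, codes_b):
    # Two users/items are treated as semantically relevant when they share
    # at least H-1 of their H level-wise discrete codes.
    h = len(codes_a)
    shared = sum(a == b for a, b in zip(codes_a, codes_b))
    return shared >= h - 1

print(shares_enough_codes([2, 7, 1], [2, 7, 5]))  # True  (2 of 3 levels match)
print(shares_enough_codes([2, 7, 1], [2, 4, 5]))  # False (only 1 level matches)
```

Because RQ codes are coarse-to-fine, requiring all but one level to match mostly pairs entities that agree at the coarser semantic levels, which is what makes a fixed H-1 threshold plausible yet, as noted, heuristic.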
Transferability and Broader Impact:
The methodology of using learnable discrete codes for reliable and informative data augmentation has broad transferability.
- Other Graph-based Tasks: This approach could be applied to other graph-based tasks beyond recommendation where graph augmentation is used but preserving underlying semantic relationships is critical (e.g., node classification, or link prediction in knowledge graphs).
- Sequential Recommendation: As mentioned by the authors, discrete codes could simplify item representation in sequential models, allowing for more efficient modeling of long sequences and capturing user interests at different levels of granularity.
- LLM-based Recommendation: The concept of discrete codes is already gaining traction in Large Language Model (LLM)-based recommendation for indexing and representing items. CoGCL's method of learning these codes to embed collaborative semantics could further enhance LLM capabilities by providing more structured and semantically rich item representations.
- Fairness and Explainability: Discrete codes might offer new avenues for explainable recommendation by mapping user/item properties to interpretable code combinations. They could also potentially be designed to encode diverse preferences, contributing to fairness by ensuring broader representation.

Overall, CoGCL offers a significant step forward in making contrastive learning a more powerful and trustworthy tool for recommender systems, particularly by addressing the often-overlooked problem of maintaining collaborative information integrity during view generation.