
Neural Graph Collaborative Filtering

Published: 05/20/2019
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

The Neural Graph Collaborative Filtering (NGCF) framework integrates user-item interactions by propagating embeddings on a graph, effectively capturing collaborative signals and demonstrating significant improvements across several benchmark datasets.

Abstract

Learning vector representations (aka. embeddings) of users and items lies at the core of modern recommender systems. Ranging from early matrix factorization to recently emerged deep learning based methods, existing efforts typically obtain a user's (or an item's) embedding by mapping from pre-existing features that describe the user (or the item), such as ID and attributes. We argue that an inherent drawback of such methods is that, the collaborative signal, which is latent in user-item interactions, is not encoded in the embedding process. As such, the resultant embeddings may not be sufficient to capture the collaborative filtering effect. In this work, we propose to integrate the user-item interactions -- more specifically the bipartite graph structure -- into the embedding process. We develop a new recommendation framework Neural Graph Collaborative Filtering (NGCF), which exploits the user-item graph structure by propagating embeddings on it. This leads to the expressive modeling of high-order connectivity in user-item graph, effectively injecting the collaborative signal into the embedding process in an explicit manner. We conduct extensive experiments on three public benchmarks, demonstrating significant improvements over several state-of-the-art models like HOP-Rec and Collaborative Memory Network. Further analysis verifies the importance of embedding propagation for learning better user and item representations, justifying the rationality and effectiveness of NGCF. Codes are available at https://github.com/xiangwang1223/neural_graph_collaborative_filtering.


In-depth Reading

English Analysis

1. Bibliographic Information

1.1. Title

Neural Graph Collaborative Filtering

1.2. Authors

Xiang Wang, Meng Wang, Xiangnan He, Fuli Feng, and Tat-Seng Chua. Their affiliations include National University of Singapore and University of Science and Technology of China, indicating a strong academic background in computer science, particularly in areas like recommender systems, data mining, and multimedia.

1.3. Journal/Conference

Published in the Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '19), July 21-25, 2019, Paris, France. SIGIR is a highly reputable and influential conference in the field of information retrieval and recommender systems, known for publishing cutting-edge research.

1.4. Publication Year

2019

1.5. Abstract

The paper addresses a crucial limitation in modern recommender systems: existing methods for learning user and item vector representations (embeddings), such as matrix factorization (MF) and deep learning-based approaches, typically derive embeddings from pre-existing features like IDs and attributes. This approach, the authors argue, fails to explicitly encode the collaborative signal—the latent information within user-item interactions—into the embedding process itself. Consequently, the resulting embeddings may not adequately capture the collaborative filtering (CF) effect.

To overcome this, the paper proposes Neural Graph Collaborative Filtering (NGCF). This novel framework integrates the user-item bipartite graph structure directly into the embedding learning process by propagating embeddings along the graph. This propagation mechanism allows for the expressive modeling of high-order connectivity within the user-item graph, thereby explicitly injecting the collaborative signal into the embeddings. Extensive experiments on three public benchmark datasets demonstrate that NGCF significantly outperforms several state-of-the-art models, including HOP-Rec and Collaborative Memory Network. Further analysis confirms the critical role of embedding propagation in learning superior user and item representations, thus validating NGCF's effectiveness and underlying rationale.

Official Source: https://doi.org/10.1145/3331184.3331267
PDF Link: https://arxiv.org/pdf/1905.08108v2.pdf
Publication Status: Officially published at SIGIR '19.

2. Executive Summary

2.1. Background & Motivation

The core problem the paper addresses is the suboptimality of user and item embeddings in existing recommender systems. Modern recommender systems, ranging from traditional Matrix Factorization (MF) to recent deep learning approaches, primarily learn embeddings from static features like user/item IDs or attributes. The crucial collaborative signal, which inherently exists within user-item interactions and reveals behavioral similarities, is largely not explicitly encoded during the embedding generation process. Instead, these interactions are often only used to define the objective function for training, leaving the embedding function itself unaware of the rich relational information. This omission means that the embeddings might lack the necessary information to fully capture the collaborative filtering (CF) effect, requiring the interaction function to compensate for this deficiency.

This problem is important because the quality of embeddings directly impacts the accuracy and effectiveness of recommendations. If embeddings are insufficient, even sophisticated interaction functions might struggle to provide optimal predictions. The challenge lies in efficiently integrating the massive scale of user-item interactions (which can reach millions or more in real-world applications) into the embedding process in a meaningful way.

The paper's innovative idea or entry point is to explicitly leverage the high-order connectivity present in the user-item interaction graph. Instead of just using interactions for training a model that operates on static embeddings, NGCF proposes to propagate embeddings directly on this graph structure. This allows the embeddings to dynamically evolve and absorb collaborative signals from neighbors up to $L$ hops away, thereby enriching them with contextual relational information that was previously implicit or ignored.

2.2. Main Contributions / Findings

The paper makes the following primary contributions and key findings:

  1. Highlighting the Importance of Explicit Collaborative Signal: The authors explicitly identify and articulate the critical importance of incorporating the collaborative signal directly into the embedding function of model-based collaborative filtering (CF) methods. They argue that prior methods largely neglect this, leading to suboptimal embeddings.
  2. Introducing NGCF Framework: They propose Neural Graph Collaborative Filtering (NGCF), a novel recommendation framework built upon graph neural networks (GNNs). NGCF explicitly encodes the collaborative signal by modeling high-order connectivities through an embedding propagation mechanism on the user-item interaction graph.
  3. Empirical Validation and State-of-the-Art Performance: Through extensive experiments on three large-scale, real-world benchmark datasets (Gowalla, Yelp2018*, Amazon-Book), NGCF demonstrates significant performance improvements over several state-of-the-art CF models, including HOP-Rec and Collaborative Memory Network (CMN).
  4. Verification of Embedding Quality: Further analysis confirms that the embedding propagation process is crucial for learning superior user and item representations. This verifies the rationality and effectiveness of NGCF's design, showing that explicitly integrating graph structure improves embedding quality and subsequently recommendation accuracy, particularly for inactive users (alleviating the sparsity issue).

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To fully grasp Neural Graph Collaborative Filtering (NGCF), a reader should understand the following foundational concepts:

  • Recommender Systems: Systems designed to predict user preferences or ratings for items, helping users discover new items they might like. They are widely used in e-commerce, streaming services, and social media.
  • Collaborative Filtering (CF): A common technique in recommender systems that predicts user preferences by leveraging the preferences of similar users or the characteristics of similar items. The core idea is "users who liked X and Y also liked Z" (user-based CF) or "people who liked X also liked Y" (item-based CF).
  • Embeddings (Vector Representations): In machine learning, an embedding is a mapping from discrete objects (like users, items, words, etc.) to a continuous vector space. Each user or item is represented by a dense vector of real numbers. The idea is that objects with similar properties or relationships will have similar vector representations in this space. For example, if two users have similar tastes, their user embeddings should be close to each other.
  • Matrix Factorization (MF): A popular CF technique. It decomposes the sparse user-item interaction matrix (where rows are users, columns are items, and entries are ratings or implicit feedback) into two lower-rank matrices: a user matrix and an item matrix. Each row in the user matrix is a user embedding, and each column in the item matrix is an item embedding. The product of a user embedding and an item embedding (typically an inner product) reconstructs the original interaction.
  • Deep Learning: A subfield of machine learning that uses neural networks with multiple layers (hence "deep") to learn complex patterns from data. These networks can learn hierarchical representations and capture non-linear relationships.
  • Neural Networks: Computational models inspired by the structure and function of biological neural networks. They consist of interconnected nodes (neurons) organized in layers. Each connection has a weight, and neurons apply an activation function to the weighted sum of their inputs.
  • Activation Functions: Non-linear functions applied by neurons in a neural network. They introduce non-linearity, allowing the network to learn complex patterns. LeakyReLU (Leaky Rectified Linear Unit) is an activation function defined as $f(x) = \max(\alpha x, x)$, where $\alpha$ is a small positive constant. It addresses the "dying ReLU" problem by allowing a small, non-zero gradient when the input is negative.
  • Optimization (Adam): An adaptive learning rate optimization algorithm used to train neural networks. It computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients. It's widely used due to its efficiency and good performance.
  • Regularization (L2 Regularization): A technique used to prevent overfitting in machine learning models. L2 regularization (also known as weight decay) adds a penalty term proportional to the square of the magnitude of the model's weights to the loss function. This encourages the model to use smaller weights, making it simpler and less prone to capturing noise in the training data.
  • Dropout: A regularization technique for neural networks where randomly selected neurons are "dropped out" (i.e., temporarily ignored along with their incoming and outgoing connections) during training. This prevents complex co-adaptations on training data and improves the model's generalization ability.
  • Graph Theory:
    • Bipartite Graph: A graph whose vertices can be divided into two disjoint and independent sets, $U$ (users) and $I$ (items), such that every edge connects a vertex in $U$ to one in $I$. There are no edges within $U$ or within $I$. The user-item interaction network is naturally a bipartite graph.
    • High-order Connectivity: Refers to paths in a graph that involve multiple steps or "hops." In a user-item graph, a 1-hop connection is a direct interaction (user-item). A 2-hop connection (user-item-user) implies behavioral similarity between two users (they interacted with the same item). A 3-hop connection (user-item-user-item) suggests that a user might be interested in an item that a similar user has interacted with. These longer paths carry rich collaborative signals.
    • Adjacency Matrix ($\mathbf{A}$): A square matrix used to represent a finite graph. If there are $N$ nodes, $\mathbf{A}$ is $N \times N$. An entry $A_{ij} = 1$ if there is an edge from node $i$ to node $j$, and 0 otherwise. For a user-item bipartite graph, a combined adjacency matrix can be formed that includes connections between users and items.
    • Degree Matrix ($\mathbf{D}$): A diagonal matrix where $D_{ii}$ is the degree of vertex $i$ (the number of edges connected to it).
    • Graph Laplacian ($\mathcal{L}$): For an unweighted graph, the graph Laplacian is defined as $\mathcal{L} = \mathbf{D} - \mathbf{A}$. A normalized version, often used in Graph Neural Networks, is $\mathcal{L} = \mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2}$. This matrix is crucial for understanding the connectivity and structure of a graph and for spectral graph theory.
  • Graph Neural Networks (GNNs): A class of deep learning methods designed to perform inference on graph-structured data. They extend the idea of neural networks to graphs by iteratively aggregating information from a node's neighbors to update its representation.
  • Message Passing: A fundamental paradigm in GNNs. It involves nodes exchanging "messages" (information/embeddings) with their neighbors and then aggregating these messages to update their own representations. This process can be repeated over multiple layers, allowing information to propagate across longer distances in the graph.

3.2. Previous Works

The paper discusses existing work in three main categories, positioning NGCF in relation to them:

3.2.1. Model-based CF Methods

These methods parameterize users and items with vector representations (embeddings) and reconstruct interactions based on these parameters.

  • Matrix Factorization (MF) [20, 26]: The foundational model. It embeds user/item IDs as vectors and uses their inner product to predict interactions.

    • Formula (Simplified): For a user $u$ and item $i$, the predicted rating $\hat{r}_{ui}$ is given by $\hat{r}_{ui} = \mathbf{p}_u^\top \mathbf{q}_i$, where $\mathbf{p}_u$ is the user embedding and $\mathbf{q}_i$ is the item embedding.
  • Enhanced MF: Efforts to incorporate side information (e.g., item content [2, 30], social relations [34], item relations [37], user reviews [3], external knowledge graph [32, 35]) to enrich embeddings.

  • Neural Collaborative Filtering (NCF) methods (e.g., NeuMF [14]): Replace the inner product in MF with non-linear neural networks to model interactions, capturing more complex relationships.

    • Formula (Conceptual for NeuMF): $\hat{y}_{ui} = f(\mathbf{p}_u, \mathbf{q}_i)$, where $f$ is a multi-layer perceptron (MLP) that concatenates or element-wise multiplies user and item embeddings and processes them through non-linear layers. For example, $f(\mathbf{p}_u, \mathbf{q}_i) = h(g(\mathbf{p}_u \odot \mathbf{q}_i))$ or $h(g(\mathbf{p}_u \oplus \mathbf{q}_i))$, where $g$ is an MLP layer, $h$ is the final output layer, $\odot$ is the element-wise product, and $\oplus$ is concatenation.
  • Translation-based CF (e.g., LRML [28]): Use Euclidean distance metrics as the interaction function.

    NGCF's Critique: These methods primarily build embedding functions using descriptive features (ID, attributes) and use interactions only for the objective function. They lack explicit encoding of collaborative signals in the embedding process, meaning transitivity (e.g., if A likes B and B likes C, A might like C) is not guaranteed in the embedding space.

3.2.2. Graph-Based CF Methods

These methods explicitly use the user-item interaction graph to infer preferences.

  • Label Propagation methods (e.g., ItemRank [7], BiRank [12]): Propagate "labels" (e.g., interacted items) on the graph to score items for a user. These are essentially neighbor-based methods and lack model parameters to optimize a recommendation objective.
  • HOP-Rec [40]: A recent graph-based model that performs random walks on the graph to identify multi-hop connected items. These high-order neighbors are then used to enrich the training data for an MF model.
    • Key Idea: Augments the implicit feedback data by adding synthetic user-item interactions based on paths discovered by random walks. The core prediction model remains MF, but trained on an enriched dataset.
    • Formula (Conceptual): It would still use $\hat{r}_{ui} = \mathbf{p}_u^\top \mathbf{q}_i$, but the training pairs $(u, i)$ would be expanded to include $(u, i')$, where $i'$ is a high-order neighbor of $u$. NGCF's Critique: HOP-Rec enriches training data but does not integrate high-order connectivity into the prediction model's embedding function itself. Its performance heavily relies on random walk parameters (e.g., decay factor).

3.2.3. Graph Convolutional Networks (GCNs) for Recommendation

These methods apply graph convolution operations to the user-item graph.

  • GC-MC [29]: Applies GCN [18] on a user-item graph, but only uses one convolutional layer, thus exploiting only first-order connections (direct user-item interactions).
    • GCN Layer (Simplified from Kipf & Welling [18]): $\mathbf{H}^{(l+1)} = \sigma(\tilde{\mathbf{D}}^{-\frac{1}{2}}\tilde{\mathbf{A}}\tilde{\mathbf{D}}^{-\frac{1}{2}}\mathbf{H}^{(l)}\mathbf{W}^{(l)})$, where $\tilde{\mathbf{A}}$ is the adjacency matrix with self-loops, $\tilde{\mathbf{D}}$ is its degree matrix, $\mathbf{H}^{(l)}$ holds the node features at layer $l$, $\mathbf{W}^{(l)}$ is a weight matrix, and $\sigma$ is an activation function. GC-MC adapts this to a bipartite graph (a minimal sketch of this propagation rule follows this list).
  • PinSage [42]: An industrial solution from Pinterest. It employs multiple graph convolution layers but primarily on an item-item graph (derived from co-occurrence patterns) for item recommendation. The CF effect is captured at the item relation level, not directly on collective user behaviors.
    • Key Idea (from GraphSAGE [8]): Aggregates information from a node's local neighborhood. A generic GraphSAGE layer for node $v$ computes $\mathbf{h}_{\mathcal{N}(v)}^{(k)} = \mathrm{AGGREGATE}_k(\{\mathbf{h}_u^{(k-1)}, \forall u \in \mathcal{N}(v)\})$ and then $\mathbf{h}_v^{(k)} = \sigma(\mathbf{W}_k \cdot \mathrm{CONCAT}(\mathbf{h}_v^{(k-1)}, \mathbf{h}_{\mathcal{N}(v)}^{(k)}))$. PinSage applies this on a large-scale item graph.
  • SpectralCF [43]: Proposes a spectral convolution operation to find connections between users and items in the spectral domain (using eigen-decomposition of the graph adjacency matrix).
    • Key Idea: Utilizes the eigenvectors and eigenvalues of the graph Laplacian to define convolution. NGCF's Critique: GC-MC is limited to first-order connectivity. PinSage focuses on item-item graphs. SpectralCF has high computational complexity due to eigen-decomposition, making it unsuitable for large-scale graphs.
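
To make the GCN propagation rule above concrete, here is a minimal NumPy sketch of a single Kipf & Welling layer on a toy graph. The adjacency matrix, feature dimensions, and the choice of ReLU as the activation are illustrative assumptions, not details taken from GC-MC, PinSage, or SpectralCF.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: ReLU(D~^{-1/2} A~ D~^{-1/2} H W), with A~ = A + I."""
    A_tilde = A + np.eye(A.shape[0])                    # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))     # degrees are >= 1 after self-loops
    A_norm = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)              # ReLU as the activation sigma

# Toy graph with 4 nodes, 3-dimensional input features, 2-dimensional output features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
H0 = np.random.rand(4, 3)
W0 = np.random.rand(3, 2)
H1 = gcn_layer(A, H0, W0)
```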

3.3. Technological Evolution

The evolution of collaborative filtering (CF) has generally moved from simpler statistical methods to more complex model-based approaches:

  1. Early Methods (e.g., Neighborhood-based CF): Relied on calculating similarities between users or items directly from interaction patterns.

  2. Matrix Factorization (MF): Introduced the concept of embeddings for users and items, decomposing the interaction matrix to learn these latent factors. This marked a shift towards explicit model learning.

  3. MF with Side Information: Enhanced MF by incorporating additional data (e.g., item attributes, social networks) to enrich embeddings and alleviate sparsity.

  4. Neural Collaborative Filtering (NCF): Replaced the linear inner product interaction function of MF with non-linear neural networks, allowing for more expressive modeling of user-item relationships. This leveraged the power of deep learning for the interaction function.

  5. Graph-based Methods (e.g., HOP-Rec, early GCNs): Began to explicitly use the user-item interaction graph structure. HOP-Rec used random walks to augment training data, while GC-MC applied GCNs but often limited to first-order connections.

    NGCF's Position: NGCF represents a significant step in this evolution by integrating graph neural networks to enhance the embedding function itself, rather than just the interaction function or the training data. It leverages the message-passing mechanism on the user-item bipartite graph to explicitly capture high-order collaborative signals directly into the user and item embeddings. This means the embeddings are no longer static representations from features but are dynamically refined by the graph structure, making them inherently "collaborative-aware." This positions NGCF at the forefront of GNN-based CF models, focusing on richer, structural information within the embeddings.

3.4. Differentiation Analysis

NGCF distinguishes itself from previous methods primarily through its explicit and integrated approach to embedding the collaborative signal using high-order connectivity on the user-item graph.

  • Compared to MF and NeuMF:
    • Difference: MF and NeuMF learn embeddings primarily from ID mapping or static features, and use interactions only to train an interaction function (inner product for MF, MLP for NeuMF). The collaborative signal (e.g., transitivity like user A and user B both like item I, so A and B are similar) is only implicitly captured if the interaction function is powerful enough to reconstruct interactions perfectly.
    • Innovation: NGCF explicitly injects this collaborative signal into the embedding process itself by propagating embeddings across the user-item graph. This means the embeddings of users and items are constantly refined by their neighbors' embeddings, making them inherently aware of high-order relationships.
  • Compared to HOP-Rec:
    • Difference: HOP-Rec also considers high-order connectivity by performing random walks to enrich the training data, but its underlying prediction model remains MF. The high-order information is used to augment the dataset, not to modify the embedding function of the model.
    • Innovation: NGCF integrates high-order connectivity directly into the model's embedding function through embedding propagation layers. This leads to better embeddings and superior performance, as the collaborative signal becomes an intrinsic part of how embeddings are learned.
  • Compared to GC-MC and PinSage:
    • Difference: GC-MC applies GCNs but typically uses only one convolutional layer, thus capturing only first-order neighbors. PinSage uses multiple layers but operates on an item-item graph (capturing item relations) rather than the user-item bipartite graph for direct CF signals.
    • Innovation: NGCF uses multiple embedding propagation layers on the user-item bipartite graph, explicitly modeling high-order connectivities between users and items. Furthermore, NGCF uses a layer-aggregation mechanism (concatenation) to combine embeddings from different propagation layers, which PinSage typically doesn't, allowing NGCF to capture multi-grained representations. NGCF also incorporates an element-wise product ($\odot$) between user and item embeddings in its message construction, making messages dependent on the affinity between the source and target, which GC-MC lacks.
  • Compared to SpectralCF:
    • Difference: SpectralCF also aims to discover connectivity but does so in the spectral domain using eigen-decomposition of the graph adjacency matrix.
    • Innovation: NGCF operates efficiently in the spatial domain with an efficient matrix-form propagation rule, avoiding the high computational complexity of eigen-decomposition that makes SpectralCF impractical for large-scale recommendation.
  • Compared to SVD++ (see Section 4.2.5.1 below):
    • Difference: SVD++ augments user embeddings with implicit feedback from the items they interacted with. It is essentially a first-order model with a specific aggregation mechanism.

    • Innovation: NGCF generalizes SVD++. By setting $L = 1$ and simplifying its components (disabling transformation matrices and non-linear activation), NGCF can recover SVD++. This shows that NGCF's framework is more general and can extend beyond first-order connections to high-order ones.

      In summary, NGCF's core innovation lies in explicitly and effectively injecting collaborative signals derived from high-order graph connectivity directly into the embedding learning process through a scalable message-passing Graph Neural Network architecture.

4. Methodology

4.1. Principles

The core principle of Neural Graph Collaborative Filtering (NGCF) is to overcome the limitation of traditional recommender systems where the collaborative signal (information about behavioral similarities derived from user-item interactions) is not explicitly encoded within the embedding function. Instead, NGCF proposes to integrate the user-item interaction graph structure directly into the embedding process. The theoretical basis is that high-order connectivity within this graph (e.g., paths like User-Item-User or User-Item-User-Item) contains rich semantic information that reveals deeper collaborative filtering (CF) effects.

The intuition is that a user's preference is not only influenced by the items they directly interacted with (first-order connections) but also by the items that users similar to them interacted with (second-order connections), and so on. Similarly, an item's characteristics can be inferred not just from its direct users, but also from the users who interact with similar items. NGCF achieves this by using a message-passing mechanism inspired by Graph Neural Networks (GNNs). It iteratively refines user and item embeddings by propagating information (messages) between connected nodes (users and items) on the bipartite interaction graph. Each propagation layer allows embeddings to absorb information from further 'hops' in the graph, thus explicitly embedding the high-order collaborative signals.

4.2. Core Methodology In-depth (Layer by Layer)

The NGCF framework consists of three main components: an embedding layer, multiple embedding propagation layers, and a prediction layer.

4.2.1. Embedding Layer

The embedding layer serves as the initialization point for user and item representations.

  • Each user $u$ is associated with an initial embedding vector $\mathbf{e}_u \in \mathbb{R}^d$.
  • Each item $i$ is associated with an initial embedding vector $\mathbf{e}_i \in \mathbb{R}^d$.
  • Here, $d$ denotes the embedding size (dimension).
  • These embeddings are stored in an embedding look-up table $\mathbf{E}$:
    $$\mathbf{E} = \left[ \mathbf{e}_{u_1}, \dots, \mathbf{e}_{u_N}, \mathbf{e}_{i_1}, \dots, \mathbf{e}_{i_M} \right]$$
    where $N$ is the total number of users and $M$ is the total number of items. This $\mathbf{E}$ is the initial state for the embeddings and will be optimized end-to-end. Unlike traditional models where these ID embeddings might directly feed into an interaction layer, in NGCF they are further refined by the embedding propagation layers.
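
As an illustration of the embedding layer, the look-up table $\mathbf{E}$ can be realized as a single trainable matrix holding all user and item rows. The sketch below is a hypothetical PyTorch-style fragment (the official code is in TensorFlow); the sizes `n_users`, `n_items`, and `embed_dim` are assumptions chosen to mirror the Gowalla setting.

```python
import torch
import torch.nn as nn

# Gowalla-like sizes with embedding size d = 64 (illustrative).
n_users, n_items, embed_dim = 29858, 40981, 64

# E = [e_u1, ..., e_uN, e_i1, ..., e_iM]: one trainable row per user and per item.
embedding_table = nn.Parameter(torch.empty(n_users + n_items, embed_dim))
nn.init.xavier_uniform_(embedding_table)   # the paper initializes parameters with Xavier

u, i = 0, 5
e_u0 = embedding_table[u]              # user rows occupy indices [0, n_users)
e_i0 = embedding_table[n_users + i]    # item rows occupy indices [n_users, n_users + n_items)
```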

4.2.2. Embedding Propagation Layers

This is the core of NGCF, where collaborative signals are captured and embeddings are refined through message passing on the user-item graph.

4.2.2.1. First-order Propagation

This stage describes how embeddings are refined by considering first-order neighbors (direct connections). It involves two main operations: message construction and message aggregation.

  • Message Construction: For a directly connected user-item pair $(u, i)$, a message embedding $\mathbf{m}_{u \leftarrow i}$ is constructed, representing the information propagated from item $i$ to user $u$. The function $f(\cdot)$ encodes this message, taking $\mathbf{e}_i$ (item embedding) and $\mathbf{e}_u$ (user embedding) as input, and includes a coefficient $\phi_{ui}$ to control the decay factor during propagation on the edge $(u, i)$. The paper implements $f(\cdot)$ as:
    $$\mathbf{m}_{u \leftarrow i} = \frac{1}{\sqrt{\lvert \mathcal{N}_u \rvert \lvert \mathcal{N}_i \rvert}} \left( \mathbf{W}_1 \mathbf{e}_i + \mathbf{W}_2 (\mathbf{e}_i \odot \mathbf{e}_u) \right)$$

    • $\mathbf{m}_{u \leftarrow i}$: The message embedding from item $i$ to user $u$.
    • $\mathcal{N}_u$: The set of first-hop neighbors (items) of user $u$.
    • $\mathcal{N}_i$: The set of first-hop neighbors (users) of item $i$.
    • $\frac{1}{\sqrt{\lvert \mathcal{N}_u \rvert \lvert \mathcal{N}_i \rvert}}$: This is the coefficient $\phi_{ui}$, derived from the graph Laplacian norm. It acts as a discount factor, meaning messages should decay with path length and connectivity strength.
    • $\mathbf{W}_1, \mathbf{W}_2 \in \mathbb{R}^{d' \times d}$: Trainable weight matrices that transform the embeddings. $d'$ is the transformation size.
    • $\mathbf{e}_i, \mathbf{e}_u$: Initial embedding vectors of item $i$ and user $u$, respectively.
    • $\odot$: The element-wise product (Hadamard product) between $\mathbf{e}_i$ and $\mathbf{e}_u$. This term is crucial as it makes the message dependent on the affinity between the item and user, allowing more messages to pass from similar items.
  • Message Aggregation: After messages are constructed, they are aggregated from a node's neighborhood to refine its representation. For user $u$, the aggregation function for its first-layer representation $\mathbf{e}_u^{(1)}$ is:
    $$\mathbf{e}_u^{(1)} = \mathrm{LeakyReLU} \left( \mathbf{m}_{u \leftarrow u} + \sum_{i \in \mathcal{N}_u} \mathbf{m}_{u \leftarrow i} \right)$$

    • $\mathbf{e}_u^{(1)}$: The representation of user $u$ after the first embedding propagation layer.
    • $\mathrm{LeakyReLU}(\cdot)$: The Leaky Rectified Linear Unit activation function, which allows both positive and small negative signals to be encoded.
    • $\sum_{i \in \mathcal{N}_u} \mathbf{m}_{u \leftarrow i}$: The sum of messages propagated from all items $i$ in user $u$'s neighborhood $\mathcal{N}_u$.
    • $\mathbf{m}_{u \leftarrow u}$: A self-connection term, defined as $\mathbf{W}_1 \mathbf{e}_u$. This term ensures that the user's original features (initial embedding) are retained and contribute to the refined representation. The weight matrix $\mathbf{W}_1$ is shared with the first term in message construction.
    • A similar process is applied to obtain $\mathbf{e}_i^{(1)}$ for an item $i$. This explicitly uses first-order connectivity to relate user and item representations.
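
A per-node sketch of the message construction and aggregation above might look as follows. This is a simplified, assumed PyTorch fragment operating on one user at a time (the paper's implementation works in matrix form, described below); `W1`, `W2`, and the toy neighborhood sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 64
W1 = nn.Linear(d, d, bias=False)   # plays the role of W_1
W2 = nn.Linear(d, d, bias=False)   # plays the role of W_2

def message(e_i, e_u, n_u, n_i):
    """m_{u<-i}: decayed sum of a transformed item embedding and an affinity (element-wise) term."""
    coeff = 1.0 / ((n_u * n_i) ** 0.5)           # 1 / sqrt(|N_u| |N_i|)
    return coeff * (W1(e_i) + W2(e_i * e_u))     # W_1 e_i + W_2 (e_i ⊙ e_u)

def aggregate(e_u, neighbor_item_embs, neighbor_item_degrees):
    """e_u^{(1)} = LeakyReLU(m_{u<-u} + sum_i m_{u<-i}), with m_{u<-u} = W_1 e_u."""
    n_u = len(neighbor_item_embs)
    msgs = sum(message(e_i, e_u, n_u, n_i)
               for e_i, n_i in zip(neighbor_item_embs, neighbor_item_degrees))
    return F.leaky_relu(W1(e_u) + msgs)

# Toy usage: a user with two interacted items whose degrees (|N_i|) are 3 and 5.
e_u = torch.randn(d)
items = [torch.randn(d), torch.randn(d)]
e_u_layer1 = aggregate(e_u, items, neighbor_item_degrees=[3, 5])
```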

4.2.2.2. High-order Propagation

To capture high-order connectivity and more complex collaborative signals, NGCF stacks multiple embedding propagation layers.

  • By stacking $L$ layers, a user (or item) can receive messages from its $L$-hop neighbors.

  • For the $l$-th step (layer), the representation of user $u$ is recursively formulated as:
    $$\mathbf{e}_u^{(l)} = \mathrm{LeakyReLU} \left( \mathbf{m}_{u \leftarrow u}^{(l)} + \sum_{i \in \mathcal{N}_u} \mathbf{m}_{u \leftarrow i}^{(l)} \right)$$
    wherein the messages being propagated are defined using the representations from the previous layer $(l-1)$:
    $$\begin{aligned} \mathbf{m}_{u \leftarrow i}^{(l)} &= \frac{1}{\sqrt{\lvert \mathcal{N}_u \rvert \lvert \mathcal{N}_i \rvert}} \left( \mathbf{W}_1^{(l)} \mathbf{e}_i^{(l-1)} + \mathbf{W}_2^{(l)} \big( \mathbf{e}_i^{(l-1)} \odot \mathbf{e}_u^{(l-1)} \big) \right) \\ \mathbf{m}_{u \leftarrow u}^{(l)} &= \mathbf{W}_1^{(l)} \mathbf{e}_u^{(l-1)} \end{aligned}$$

    • $\mathbf{e}_u^{(l)}$: The representation of user $u$ after the $l$-th layer.
    • $\mathbf{m}_{u \leftarrow i}^{(l)}$: Message from item $i$ to user $u$ at layer $l$.
    • $\mathbf{m}_{u \leftarrow u}^{(l)}$: Self-connection message for user $u$ at layer $l$.
    • $\mathbf{W}_1^{(l)}, \mathbf{W}_2^{(l)} \in \mathbb{R}^{d_l \times d_{l-1}}$: Trainable transformation matrices for layer $l$. $d_l$ is the transformation size for layer $l$.
    • $\mathbf{e}_i^{(l-1)}, \mathbf{e}_u^{(l-1)}$: Item and user representations obtained from the previous layer $(l-1)$. These representations already incorporate messages from $(l-1)$-hop neighbors.
    • The term $\frac{1}{\sqrt{\lvert \mathcal{N}_u \rvert \lvert \mathcal{N}_i \rvert}}$ remains the same as in first-order propagation, as it is a structural coefficient. This recursive process allows information to flow across high-order connectivities, explicitly injecting collaborative signals into the representation learning. An analogous process occurs for items.
  • Propagation Rule in Matrix Form: For efficient batch implementation, the layer-wise propagation rule (Equations (5) and (6)) can be expressed in matrix form:
    $$\mathbf{E}^{(l)} = \mathrm{LeakyReLU} \left( (\mathcal{L} + \mathbf{I}) \mathbf{E}^{(l-1)} \mathbf{W}_1^{(l)} + \mathcal{L} \big( \mathbf{E}^{(l-1)} \odot \mathbf{E}^{(l-1)} \big) \mathbf{W}_2^{(l)} \right)$$

    • $\mathbf{E}^{(l)} \in \mathbb{R}^{(N+M) \times d_l}$: The matrix containing the representations for all users and items after $l$ steps of embedding propagation.
    • $\mathbf{E}^{(0)}$: The initial embedding matrix $\mathbf{E}$ from the embedding layer, where $\mathbf{e}_u^{(0)} = \mathbf{e}_u$ and $\mathbf{e}_i^{(0)} = \mathbf{e}_i$.
    • $\mathbf{I}$: An identity matrix of appropriate size, representing the self-connections.
    • $\mathcal{L}$: The Laplacian matrix for the user-item graph. It is formulated using the adjacency matrix $\mathbf{A}$ and degree matrix $\mathbf{D}$:
      $$\mathcal{L} = \mathbf{D}^{-\frac{1}{2}} \mathbf{A} \mathbf{D}^{-\frac{1}{2}} \quad \text{and} \quad \mathbf{A} = \begin{bmatrix} \mathbf{0} & \mathbf{R} \\ \mathbf{R}^\top & \mathbf{0} \end{bmatrix}$$
      • $\mathbf{R} \in \mathbb{R}^{N \times M}$: The user-item interaction matrix.
      • $\mathbf{0}$: An all-zero matrix.
      • $\mathbf{A}$: The adjacency matrix of the combined user-item graph.
      • $\mathbf{D}$: The diagonal degree matrix, where $D_{tt} = \lvert \mathcal{N}_t \rvert$ (the degree of node $t$).
      • The non-zero off-diagonal entries of $\mathcal{L}$ are $\mathcal{L}_{ui} = 1/\sqrt{\lvert \mathcal{N}_u \rvert \lvert \mathcal{N}_i \rvert}$, which corresponds to the $\phi_{ui}$ coefficient used in message construction.
    • $\mathbf{W}_1^{(l)}, \mathbf{W}_2^{(l)}$: Trainable weight matrices for layer $l$.
    • $\odot$: Element-wise product between the two $\mathbf{E}^{(l-1)}$ matrices, which applies the interaction term across all user and item embeddings simultaneously. This matrix form allows for efficient computation without the explicit node sampling typically required in large-scale GCNs.
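
The matrix-form rule lends itself to a compact implementation. The following is a minimal sketch, assuming a toy interaction matrix and dense tensors for readability (a real implementation would keep $\mathcal{L}$ sparse); it builds $\mathbf{A}$, the normalized Laplacian, and applies one propagation layer.

```python
import numpy as np
import scipy.sparse as sp
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy user-item interaction matrix R (n_users x n_items); 1 marks an observed interaction.
n_users, n_items, d = 4, 5, 8
R = sp.random(n_users, n_items, density=0.4, format="csr", data_rvs=np.ones)

# A = [[0, R], [R^T, 0]] and L = D^{-1/2} A D^{-1/2}; self-connections enter via the identity I.
A = sp.bmat([[None, R], [R.T, None]], format="csr")
deg = np.asarray(A.sum(axis=1)).flatten()
with np.errstate(divide="ignore"):
    d_inv_sqrt = np.power(deg, -0.5)
d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.0            # guard isolated nodes in this toy graph
L = sp.diags(d_inv_sqrt) @ A @ sp.diags(d_inv_sqrt)

L = torch.tensor(L.toarray(), dtype=torch.float32)  # dense only for this small example
I = torch.eye(n_users + n_items)

E_prev = torch.randn(n_users + n_items, d)           # E^{(l-1)}
W1 = nn.Linear(d, d, bias=False)                     # W_1^{(l)}
W2 = nn.Linear(d, d, bias=False)                     # W_2^{(l)}

# E^{(l)} = LeakyReLU((L + I) E^{(l-1)} W_1 + L (E^{(l-1)} ⊙ E^{(l-1)}) W_2)
E_next = F.leaky_relu(W1((L + I) @ E_prev) + W2(L @ (E_prev * E_prev)))
```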

4.2.3. Model Prediction

After $L$ embedding propagation layers, multiple representations are obtained for each user $u$ (i.e., $\{ \mathbf{e}_u^{(0)}, \mathbf{e}_u^{(1)}, \dots, \mathbf{e}_u^{(L)} \}$) and for each item $i$ (i.e., $\{ \mathbf{e}_i^{(0)}, \mathbf{e}_i^{(1)}, \dots, \mathbf{e}_i^{(L)} \}$).

  • Since representations from different layers capture different orders of connectivity and contribute uniquely to user preference, they are combined.
  • The paper uses concatenation to form the final embedding for a user and an item:
    $$\mathbf{e}_u^* = \mathbf{e}_u^{(0)} \parallel \dots \parallel \mathbf{e}_u^{(L)}, \qquad \mathbf{e}_i^* = \mathbf{e}_i^{(0)} \parallel \dots \parallel \mathbf{e}_i^{(L)}$$
    • $\parallel$: The concatenation operation. This layer-aggregation mechanism enriches the initial embeddings and allows the model to control the range of propagation by adjusting $L$. It's a simple yet effective choice as it involves no additional parameters.
  • Finally, the user's preference towards a target item is estimated using an inner product between their final embeddings:
    $$\hat{y}_{\mathrm{NGCF}}(u, i) = (\mathbf{e}_u^*)^\top \mathbf{e}_i^*$$
    • $\hat{y}_{\mathrm{NGCF}}(u, i)$: The predicted affinity score for user $u$ and item $i$.
    • $(\cdot)^\top$: Transpose operation. The paper emphasizes embedding function learning and uses a simple inner product as the interaction function, leaving more complex options for future work.
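
Putting the prediction step together, a minimal sketch (with randomly initialized layer outputs standing in for the real propagated embeddings) is:

```python
import torch

# Layer-wise outputs for one user and one item (random stand-ins for the propagated embeddings).
d, L = 64, 3
e_u_layers = [torch.randn(d) for _ in range(L + 1)]   # e_u^{(0)}, ..., e_u^{(L)}
e_i_layers = [torch.randn(d) for _ in range(L + 1)]   # e_i^{(0)}, ..., e_i^{(L)}

e_u_star = torch.cat(e_u_layers)     # e_u^* = e_u^{(0)} || ... || e_u^{(L)}  (length (L+1)*d)
e_i_star = torch.cat(e_i_layers)     # e_i^* = e_i^{(0)} || ... || e_i^{(L)}

y_hat = torch.dot(e_u_star, e_i_star)   # inner-product prediction score
```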

4.2.4. Optimization

To train the NGCF model, the pairwise BPR (Bayesian Personalized Ranking) loss [26] is optimized. BPR is widely used in recommender systems for implicit feedback, assuming that observed interactions should have higher prediction values than unobserved ones.

  • The objective function is:
    $$Loss = \sum_{(u, i, j) \in O} - \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj}) + \lambda \lVert \Theta \rVert_2^2$$
    • $O = \{ (u, i, j) \mid (u, i) \in \mathcal{R}^+, (u, j) \in \mathcal{R}^- \}$: The set of pairwise training data.
      • $\mathcal{R}^+$: The set of observed user-item interactions (positive instances).
      • $\mathcal{R}^-$: The set of unobserved user-item interactions (negative instances), typically generated by negative sampling.
      • $(u, i, j)$: A triplet indicating that user $u$ prefers item $i$ over item $j$.
    • $\sigma(\cdot)$: The sigmoid function, which squashes values between 0 and 1.
    • $\hat{y}_{ui}$: The predicted score for user $u$ and item $i$.
    • $\hat{y}_{uj}$: The predicted score for user $u$ and item $j$.
    • $- \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj})$: The log-loss for the BPR objective, which maximizes the difference between positive and negative item scores.
    • $\lambda$: The coefficient controlling the strength of L2 regularization.
    • $\lVert \Theta \rVert_2^2$: The L2 norm of all trainable model parameters $\Theta$.
    • $\Theta = \{ \mathbf{E}, \{ \mathbf{W}_1^{(l)}, \mathbf{W}_2^{(l)} \}_{l=1}^L \}$: The set of all trainable parameters, including the initial embedding matrix and all weight matrices from the $L$ propagation layers.
  • The model is optimized using mini-batch Adam [17]. For each mini-batch of sampled triplets, the user and item representations are established through $L$ propagation steps, and then the model parameters are updated based on the gradients of the loss function.
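
A hedged sketch of the BPR objective with L2 regularization, assuming the per-triplet scores have already been computed by the model, could look like this (the function and variable names are illustrative, not from the released code):

```python
import torch
import torch.nn.functional as F

def bpr_loss(pos_scores, neg_scores, params, reg_lambda=1e-5):
    """Pairwise BPR loss: sum over triplets of -ln sigma(y_ui - y_uj), plus lambda * ||Theta||_2^2."""
    ranking_loss = -F.logsigmoid(pos_scores - neg_scores).sum()
    l2_reg = reg_lambda * sum(p.pow(2).sum() for p in params)
    return ranking_loss + l2_reg

# Illustrative mini-batch of 1024 sampled (u, i, j) triplets and one dummy parameter tensor.
theta = torch.randn(100, 64, requires_grad=True)
pos = torch.randn(1024)   # y_ui for the observed items
neg = torch.randn(1024)   # y_uj for the sampled negatives
loss = bpr_loss(pos, neg, params=[theta])
```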

4.2.4.1. Model Size

Despite generating an embedding matrix at each layer, NGCF introduces very few additional parameters compared to MF.

  • The additional parameters are primarily the two weight matrices $\{ \mathbf{W}_1^{(l)}, \mathbf{W}_2^{(l)} \}$ for each of the $L$ propagation layers.
  • Total additional parameters: $2L \times d_l \times d_{l-1}$.
  • Since $L$ is typically small (e.g., $< 5$) and $d_l$ (the transformation size) is often the embedding size (much smaller than the number of users/items), this additional cost is negligible. For instance, on a dataset like Gowalla with 64-dim embeddings and 3 layers, NGCF adds only 0.024 million parameters compared to MF's 4.5 million.
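
Under the assumption that $d_l = d_{l-1} = 64$ for all three layers, the arithmetic behind the Gowalla figures quoted above works out as:

$$\underbrace{(29{,}858 + 40{,}981) \times 64}_{\text{MF embedding table}} \approx 4.5 \times 10^{6}, \qquad \underbrace{2 \times 3 \times 64 \times 64}_{2L \, d_l \, d_{l-1}} = 24{,}576,$$

which matches MF's roughly 4.5 million parameters and the roughly 0.024 million extra weights reported for NGCF's propagation layers.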

4.2.4.2. Message and Node Dropout

To combat overfitting, two dropout techniques are employed:

  • Message Dropout: Randomly drops out outgoing messages during propagation.
    • For the $l$-th propagation layer, messages $\mathbf{m}_{u \leftarrow i}^{(l)}$ (Equation 6) are dropped with a probability $p_1$. This means only a partial set of messages contributes to the refined representations, making the model more robust to individual connections.
  • Node Dropout: Randomly blocks a particular node and discards all its outgoing messages.
    • For the $l$-th layer, $(M+N) p_2$ nodes (from the total $N+M$ users and items) are randomly dropped from the Laplacian matrix. This reduces the influence of specific users or items, enhancing robustness.
  • Dropout is only used during training and disabled during testing. Message dropout strengthens representation robustness against single connections, while node dropout addresses the influence of specific nodes.
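
The two dropout schemes can be sketched as follows. This is an assumed, simplified implementation (the actual code operates on the entries of a sparse Laplacian); `p1`, `p2`, and the dense toy tensors are illustrative.

```python
import torch

def message_dropout(messages, p1, training=True):
    """Randomly drop whole propagated messages (rows) with probability p1 (training only)."""
    if not training or p1 == 0.0:
        return messages
    keep = (torch.rand(messages.size(0)) >= p1).float().unsqueeze(1)
    return messages * keep / (1.0 - p1)   # inverted-dropout rescaling

def node_dropout(L_dense, p2, training=True):
    """Randomly block nodes by zeroing their rows and columns in a (dense toy) Laplacian."""
    if not training or p2 == 0.0:
        return L_dense
    keep = (torch.rand(L_dense.size(0)) >= p2).float()
    return L_dense * keep.unsqueeze(0) * keep.unsqueeze(1)

# Illustrative usage on toy tensors:
msgs = torch.randn(10, 64)           # ten d-dimensional messages
L_toy = torch.rand(6, 6)             # toy dense Laplacian over 6 nodes
msgs_dropped = message_dropout(msgs, p1=0.1)
L_dropped = node_dropout(L_toy, p2=0.1)
```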

4.2.5. Discussions

4.2.5.1. NGCF Generalizes SVD++

The paper claims that NGCF can generalize SVD++ [19], a well-known collaborative filtering model that augments user representations with implicit feedback from items they interacted with.

  • To recover a simplified version of SVD++ (termed NGCF-SVD), NGCF is configured as follows:
    • Set the number of propagation layers $L = 1$.
    • Disable the transformation matrices ($\mathbf{W}_1, \mathbf{W}_2$) and the non-linear activation function ($\mathrm{LeakyReLU}$).
    • Under these conditions, the representations $\mathbf{e}_u^{(1)}$ and $\mathbf{e}_i^{(1)}$ after the first layer simplify.
  • The simplified model NGCF-SVD can be formulated as:
    $$\hat{y}_{\mathrm{NGCF\text{-}SVD}} = \left( \mathbf{e}_u + \sum_{i' \in \mathcal{N}_u} \mathcal{P}_{ui'} \mathbf{e}_{i'} \right)^\top \left( \mathbf{e}_i + \sum_{u' \in \mathcal{N}_i} \mathcal{P}_{iu'} \mathbf{e}_{u'} \right)$$
    • $\hat{y}_{\mathrm{NGCF\text{-}SVD}}$: Predicted score of the NGCF-SVD model.
    • $\mathbf{e}_u, \mathbf{e}_i$: Initial user and item embeddings.
    • $\mathcal{N}_u$: Set of items user $u$ interacted with.
    • $\mathcal{N}_i$: Set of users who interacted with item $i$.
    • $\mathcal{P}_{ui'}, \mathcal{P}_{iu'}$: Coefficients controlling the contribution of neighboring item $i'$ to user $u$'s representation, and of neighboring user $u'$ to item $i$'s representation, respectively.
  • By setting $\mathcal{P}_{ui'} = 1/\sqrt{\lvert \mathcal{N}_u \rvert}$ (the decay factor from the message construction) and $\mathcal{P}_{iu'} = 0$, NGCF-SVD can exactly recover the SVD++ model.
  • Furthermore, FISM [16] (a widely used item-based CF model) can also be seen as a special case where $\mathcal{P}_{iu'}$ is always 0, effectively considering only the item-side aggregation. This demonstrates NGCF's generality and its ability to encompass existing first-order models within its framework.
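
For illustration, the simplified NGCF-SVD predictor, and its SVD++ specialization, can be written as a short NumPy sketch. The embeddings and neighborhoods below are random toy values, and the function name is hypothetical.

```python
import numpy as np

def ngcf_svd_score(e_u, e_i, item_neighbor_embs, user_neighbor_embs, p_ui, p_iu):
    """(e_u + sum_{i'} p_ui' e_i')^T (e_i + sum_{u'} p_iu' e_u') -- the simplified NGCF-SVD score."""
    u_side = e_u + sum(p * e for p, e in zip(p_ui, item_neighbor_embs))
    i_side = e_i + sum(p * e for p, e in zip(p_iu, user_neighbor_embs))
    return float(u_side @ i_side)

# SVD++-style configuration: p_ui' = 1/sqrt(|N_u|) and p_iu' = 0, on toy random embeddings.
d = 4
e_u, e_i = np.random.rand(d), np.random.rand(d)
item_nbrs = [np.random.rand(d) for _ in range(3)]   # items that user u interacted with
user_nbrs = [np.random.rand(d) for _ in range(2)]   # users who interacted with item i
score = ngcf_svd_score(e_u, e_i, item_nbrs, user_nbrs,
                       p_ui=[1 / np.sqrt(len(item_nbrs))] * len(item_nbrs),
                       p_iu=[0.0] * len(user_nbrs))
```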

4.2.5.2. Time Complexity Analysis

The computational complexity of NGCF is primarily determined by its layer-wise propagation rule (Equation 7).

  • For the ll-th propagation layer, the main operation is matrix multiplication involving the Laplacian matrix and embedding matrices.
  • The computational complexity of this operation is $O(\lvert \mathcal{R}^+ \rvert d_l d_{l-1})$, where $\lvert \mathcal{R}^+ \rvert$ is the number of non-zero entries in the Laplacian matrix (which corresponds to the number of observed interactions), $d_l$ is the current transformation size, and $d_{l-1}$ is the previous transformation size.
  • For the prediction layer, only an inner product is involved.
  • Therefore, the overall complexity for evaluating NGCF during training for one epoch is:
    $$O \left( \sum_{l=1}^L \lvert \mathcal{R}^+ \rvert d_l d_{l-1} + \sum_{l=1}^L \lvert \mathcal{R}^+ \rvert d_l \right)$$
  • Empirically, the paper notes that for the Gowalla dataset:
    • Training: MF costs about 20s per epoch, while NGCF costs about 80s per epoch.
    • Inference: MF costs about 80s for all testing instances, while NGCF costs about 260s. This shows that while NGCF is more computationally intensive than MF due to the $L$ propagation layers, it remains tractable.

5. Experimental Setup

5.1. Datasets

The experiments were conducted on three publicly accessible, real-world benchmark datasets that vary in domain, size, and sparsity. For all datasets, a 10-core setting was applied, meaning only users and items with at least ten interactions were retained to ensure data quality. For each dataset, 80% of historical interactions per user were used for the training set, and the remaining 20% for the test set. From the training set, 10% of interactions were randomly selected as a validation set for hyperparameter tuning. For each observed user-item interaction (positive instance), negative sampling was performed to pair it with one unobserved item (negative instance).

  • Gowalla:
    • Source: Check-in dataset from Gowalla, a location-based social networking service.
    • Characteristics: Captures user check-ins at various locations.
    • Domain: Location-based social interactions.
  • Yelp2018*:
    • Source: Adopted from the 2018 edition of the Yelp challenge.
    • Characteristics: Local businesses (restaurants, bars) are treated as items.
    • Domain: Business reviews and local services.
  • Amazon-Book:
    • Source: Part of the Amazon-review dataset, specifically focusing on book reviews.

    • Characteristics: User ratings/reviews for books.

    • Domain: E-commerce product reviews (books).

      The following are the results from Table 1 of the original paper:

      Dataset #Users #Items #Interactions Density
      Gowalla 29,858 40,981 1,027,370 0.00084
      Yelp2018* 31,668 38,048 1,561,406 0.00130
      Amazon-Book 52,643 91,599 2,984,108 0.00062

These datasets were chosen because they are widely accessible, cover different domains, and represent various scales and sparsity levels, making them effective for validating the generality and performance of recommender systems. No concrete example of a data sample (e.g., a specific check-in record or review text) was provided in the paper.

5.2. Evaluation Metrics

To evaluate the effectiveness of top-$K$ recommendation and preference ranking, two widely used evaluation protocols are adopted: recall@K and ndcg@K. For the experiments, $K$ is set to 20 by default. The average metrics over all users in the test set are reported.

5.2.1. Recall@K

  • Conceptual Definition: Recall@K measures the proportion of relevant items (items a user actually interacted with in the test set) that are successfully retrieved among the top KK recommended items. It focuses on how many of the "good" items were found within the recommendation list, regardless of their rank within that list.
  • Mathematical Formula:
    $$\text{Recall@K} = \frac{1}{|U_{\text{test}}|} \sum_{u \in U_{\text{test}}} \frac{|\text{RelevantItems}_u \cap \text{TopKRecommendations}_u|}{|\text{RelevantItems}_u|}$$
  • Symbol Explanation:
    • $U_{\text{test}}$: The set of users in the test set.
    • $|\cdot|$: Denotes the cardinality (number of elements) of a set.
    • $\text{RelevantItems}_u$: The set of items that user $u$ interacted with in the test set (ground truth).
    • $\text{TopKRecommendations}_u$: The set of the top $K$ items recommended to user $u$ by the model.
    • $\cap$: Set intersection.
    • $\sum_{u \in U_{\text{test}}}$: Summation over all users in the test set.
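
A minimal NumPy sketch of recall@K as defined above (the item IDs are made up purely for illustration):

```python
import numpy as np

def recall_at_k(relevant, recommended, k=20):
    """Fraction of a user's test items that appear in their top-K recommendation list."""
    top_k = set(recommended[:k])
    return len(set(relevant) & top_k) / len(relevant)

# Average over all test users (here: two toy users).
per_user = [
    recall_at_k(relevant=[3, 7, 9], recommended=[7, 1, 3, 5, 8], k=5),   # 2 of 3 recovered
    recall_at_k(relevant=[2],       recommended=[4, 6, 2, 0, 1], k=5),   # 1 of 1 recovered
]
print(np.mean(per_user))  # recall@5 averaged over users
```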

5.2.2. NDCG@K (Normalized Discounted Cumulative Gain at K)

  • Conceptual Definition: NDCG@K is a measure of ranking quality, considering not only the presence of relevant items but also their position in the ranked list. It assigns higher scores to relevant items that appear earlier (at higher ranks) in the recommendation list. It is "normalized" by comparing the calculated DCG to the ideal DCG (where all relevant items are perfectly ranked at the top) to produce a score between 0 and 1.
  • Mathematical Formula:
    $$\text{DCG@K} = \sum_{j=1}^K \frac{2^{\text{rel}_j} - 1}{\log_2(j+1)}, \qquad \text{IDCG@K} = \sum_{j=1}^{\min(|\text{RelevantItems}_u|,\, K)} \frac{2^{\text{rel}_j} - 1}{\log_2(j+1)}, \qquad \text{NDCG@K} = \frac{1}{|U_{\text{test}}|} \sum_{u \in U_{\text{test}}} \frac{\text{DCG@K}_u}{\text{IDCG@K}_u}$$
  • Symbol Explanation:
    • $\text{rel}_j$: The relevance score of the item at position $j$ in the ranked list. In implicit feedback scenarios, this is typically 1 if the item is relevant (observed in the test set) and 0 otherwise.
    • $j$: The rank position of an item in the recommendation list (from 1 to $K$).
    • $\log_2(j+1)$: A logarithmic discount factor, penalizing relevant items at lower ranks.
    • $\text{DCG@K}_u$: Discounted Cumulative Gain for user $u$ at rank $K$.
    • $\text{IDCG@K}_u$: Ideal Discounted Cumulative Gain for user $u$ at rank $K$, calculated by placing all relevant items in the test set at the top of the list in decreasing order of relevance (which is 1 for all items under implicit feedback).
    • $\min(|\text{RelevantItems}_u|, K)$: Ensures that IDCG@K sums over at most $K$ items, or over all relevant items if there are fewer than $K$.
    • $U_{\text{test}}$: The set of users in the test set.
    • $|\cdot|$: Denotes the cardinality of a set.
    • $\sum_{u \in U_{\text{test}}}$: Summation over all users in the test set.
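
Likewise, a minimal sketch of binary-relevance NDCG@K matching the formula above (the example item IDs are illustrative):

```python
import numpy as np

def ndcg_at_k(relevant, recommended, k=20):
    """Binary-relevance NDCG@K: DCG of the top-K list divided by the ideal DCG."""
    rel_set = set(relevant)
    gains = [1.0 if item in rel_set else 0.0 for item in recommended[:k]]
    dcg = sum(g / np.log2(j + 2) for j, g in enumerate(gains))   # positions j = 0..K-1
    ideal_hits = min(len(rel_set), k)
    idcg = sum(1.0 / np.log2(j + 2) for j in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k(relevant=[3, 7, 9], recommended=[7, 1, 3, 5, 8], k=5))
```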

5.3. Baselines

NGCF was compared against several representative state-of-the-art collaborative filtering methods:

  • MF (Matrix Factorization) [26]:
    • Description: A basic MF model optimized with Bayesian Personalized Ranking (BPR) loss. It models user-item direct interactions as the target value for its interaction function (inner product).
    • Representativeness: A fundamental and widely used baseline in CF.
  • NeuMF (Neural Matrix Factorization) [14]:
    • Description: A state-of-the-art neural CF model that uses multiple hidden layers above element-wise product and concatenation of user/item embeddings to capture non-linear feature interactions. A two-layered architecture is used with consistent hidden dimension size.
    • Representativeness: A strong baseline representing deep learning advancements in CF's interaction function.
  • CMN (Collaborative Memory Network) [5]:
    • Description: A state-of-the-art memory-based model. It constructs user representations by attentively combining memory slots of neighboring users via memory layers. It uses first-order connections to find similar users.
    • Representativeness: A strong memory network based approach that explicitly considers neighbors.
  • HOP-Rec (High-Order Proximity for Implicit Recommendation) [40]:
    • Description: A state-of-the-art graph-based model. It utilizes random walks to derive high-order neighbors, which are then used to enrich the user-item interaction data for training an MF model with BPR.
    • Representativeness: A direct competitor that also attempts to leverage high-order connectivity, but in a different manner.
  • PinSage [42]:
    • Description: An industrial GraphSAGE [8] implementation for item-item graphs (Pinterest image recommendation). The paper adapts it to the user-item interaction graph, using two graph convolution layers with hidden dimension equal to embedding size.
    • Representativeness: A strong GNN-based model, particularly for large-scale recommendation.
  • GC-MC (Graph Convolutional Matrix Completion) [29]:
    • Description: Applies GCN [18] as an encoder for user and item representations, but it primarily considers first-order neighbors with one graph convolution layer.
    • Representativeness: An early and influential GCN-based CF model.
  • SpectralCF [43]:
    • Description: This model was considered but not selected for comparison due to its high computational and resource cost associated with eigen-decomposition of the graph adjacency matrix, making it impractical for large-scale datasets.
    • Representativeness: A spectral GCN approach, excluded due to practical limitations.

Parameter Settings:

  • Embedding size: Fixed to 64 for all models.
  • HOP-Rec: Random walk steps searched in $\{1, 2, 3\}$; learning rate in $\{0.025, 0.020, 0.015, 0.010\}$.
  • Optimizer: Adam for all models except HOP-Rec.
  • Batch size: Fixed at 1024.
  • Hyperparameters (grid search):
    • Learning rate: $\{0.0001, 0.0005, 0.001, 0.005\}$.
    • L2 regularization coefficient ($\lambda$): $\{10^{-5}, 10^{-4}, \dots, 10^{1}, 10^{2}\}$.
    • Dropout ratio (for message dropout $p_1$ and node dropout $p_2$): $\{0.0, 0.1, \dots, 0.8\}$.
  • Initializer: Xavier initializer [6] for model parameters.
  • Early stopping: Applied if recall@20 on the validation data does not improve for 50 successive epochs.
  • NGCF depth ($L$): Set to 3 for modeling third-order connectivity (unless specified otherwise for ablation studies).
  • Default dropout for NGCF: node dropout ratio 0.0, message dropout ratio 0.1.

6. Results & Analysis

6.1. Core Results Analysis

6.1.1. Overall Performance Comparison

The paper conducted a comprehensive comparison of NGCF against several baselines on three datasets. The results, summarized in Table 2, demonstrate NGCF's superior performance across all benchmarks.

The following are the results from Table 2 of the original paper:

Gowalla Yelp2018* Amazon-Book
recall ndcg recall ndcg recall ndcg
MF 0.1291 0.1109 0.0433 0.0354 0.0250 0.0196
NeuMF 0.1399 0.1212 0.0451 0.0363 0.0258 0.0200
CMN 0.1405 0.1221 0.0457 0.0369 0.0267 0.0218
HOP-Rec 0.1399 0.1214 0.0517 0.0428 0.0309 0.0232
GC-MC 0.1395 0.1204 0.0462 0.0379 0.0288 0.0224
PinSage 0.1380 0.1196 0.0471 0.0393 0.0282 0.0219
NGCF-3 0.1569* 0.1327* 0.0579* 0.0477* 0.0337* 0.0261*
%Improv. 11.68% 8.64% 11.97% 11.29% 9.61% 12.50%
p-value 2.01e-7 3.03e-3 5.34e-3 4.62e-4 3.48e-5 1.26e-4

Key Observations:

  • MF's Weakness: MF performs poorly across all datasets, indicating its linear interaction function and lack of explicit collaborative signal encoding result in insufficient embeddings.
  • NeuMF's Improvement: NeuMF consistently outperforms MF, highlighting the benefit of non-linear feature interactions in the interaction function. However, it still doesn't explicitly encode collaborative signals in embeddings.
  • Impact of First-order Neighbors: GC-MC shows improvements over MF and NeuMF in some cases (e.g., Amazon-Book), confirming that incorporating first-order neighbors helps. However, its performance can be mixed (e.g., ndcg@20 on Yelp2018* is lower than NeuMF), possibly due to its failure to fully capture non-linear interactions.
  • Memory Networks and High-order Connectivity: CMN generally performs better than GC-MC, possibly due to its neural attention mechanism for weighting neighbors. HOP-Rec achieves significant improvements, demonstrating the value of high-order connectivity for enriching training data. PinSage also shows good performance, especially on Yelp2018*, as it leverages high-order connectivity in the embedding function.
  • NGCF's Dominance: NGCF-3 (NGCF with 3 propagation layers) consistently achieves the best performance on all datasets, significantly outperforming all baselines.
    • It improves recall@20 over the strongest baselines by 11.68% (Gowalla), 11.97% (Yelp2018*), and 9.61% (Amazon-Book).
    • The p-values (all $< 0.05$) indicate these improvements are statistically significant.
  • Justification for NGCF: The superior performance is attributed to NGCF's ability to explicitly model high-order connectivity through embedding propagation layers, effectively injecting collaborative signals into the embedding learning process. Unlike CMN and GC-MC which are limited to first-order neighbors, NGCF explores deeper connections. Compared to PinSage, NGCF's layer-aggregation mechanism combines multi-grained representations from different layers, providing a richer final embedding. Its advantage over HOP-Rec further validates that explicitly encoding CF in the embedding function is more effective than merely enriching training data.

6.1.2. Performance Comparison w.r.t. Interaction Sparsity Levels

The paper investigates how NGCF performs across different sparsity levels by dividing users into groups based on their number of interactions. The results for ndcg@20 are shown in Figure 4.

The following figure (Figure 4 from the original paper) shows ndcg@20 on different user groups in Gowalla, Yelp2018*, and Amazon-Book datasets.


Key Observations:

  • Overall Superiority in Sparsity: Both NGCF and HOP-Rec consistently outperform other baselines across all user groups and sparsity levels. This indicates that leveraging connectivity information (especially high-order) is highly beneficial for representation learning, particularly for users with few interactions. This suggests NGCF can effectively alleviate the sparsity issue in recommender systems.
  • Greater Impact on Inactive Users: NGCF shows more significant improvements in the groups with fewer interactions (e.g., the <24 and <50 interaction groups in Gowalla, where the improvements over the best baselines are 8.49% and 7.79%, respectively). For highly active users (e.g., the <1014 group in Gowalla), the improvements are still present but less dramatic (e.g., 1.29%). This verifies that embedding propagation is particularly advantageous for relatively inactive users, as it can synthesize collaborative signals from their sparse neighborhoods. (A sketch of this group-wise evaluation follows this list.)
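
A minimal sketch of this group-wise evaluation, assuming hypothetical per-user interaction counts and per-user ndcg@20 scores; only the Gowalla boundaries quoted above (<24, <50, <1014) are used here, whereas the paper forms the groups so that each contains a comparable total number of interactions.

```python
import numpy as np

# Hypothetical per-user statistics: training-interaction counts and test ndcg@20.
user_counts = np.array([5, 12, 30, 45, 80, 200, 950])
user_ndcg = np.array([0.05, 0.08, 0.11, 0.10, 0.13, 0.15, 0.18])

# Group boundaries in the style quoted above for Gowalla.
bins = [24, 50, 1014]
group_ids = np.searchsorted(bins, user_counts, side='right')

for g, upper in enumerate(bins):
    mask = group_ids == g
    if mask.any():
        print(f"users with <{upper} interactions: mean ndcg@20 = {user_ndcg[mask].mean():.4f}")
```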

6.2. Ablation Studies / Parameter Analysis

6.2.1. Effect of Layer Numbers

The paper investigates the impact of the number of embedding propagation layers (L) on NGCF's performance, varying L from 1 to 4.

The following are the results from Table 3 of the original paper:

| Method | Gowalla recall@20 | Gowalla ndcg@20 | Yelp2018* recall@20 | Yelp2018* ndcg@20 | Amazon-Book recall@20 | Amazon-Book ndcg@20 |
| --- | --- | --- | --- | --- | --- | --- |
| NGCF-1 | 0.1556 | 0.1315 | 0.0543 | 0.0442 | 0.0313 | 0.0241 |
| NGCF-2 | 0.1547 | 0.1307 | 0.0566 | 0.0465 | 0.0330 | 0.0254 |
| NGCF-3 | 0.1569 | 0.1327 | 0.0579 | 0.0477 | 0.0337 | 0.0261 |
| NGCF-4 | 0.1570 | 0.1327 | 0.0566 | 0.0461 | 0.0344 | 0.0263 |

Key Observations:

  • Benefits of Depth: Increasing the depth from NGCF-1 to NGCF-2 and NGCF-3 generally enhances recommendation performance across all datasets. NGCF-1 only captures first-order neighbors, while NGCF-2 and NGCF-3 capture second- and third-order connectivity, which carry more complex collaborative signals (user similarity, potential recommendations).
  • Overfitting with Excessive Depth: NGCF-4 shows slight improvements on Amazon-Book but leads to overfitting on Yelp2018* (performance drops compared to NGCF-3). This suggests that excessively deep architectures might introduce noise into the representation learning process. Three layers (NGCF-3) appear to be a good balance for capturing sufficient CF signals without introducing too much noise or overfitting.
  • Consistent Superiority: Even with varying layer numbers, NGCF consistently outperforms other methods (as seen by comparing Table 3 to Table 2), reaffirming the effectiveness of explicitly modeling high-order connectivity.
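
To make the effect of the layer number L concrete, here is an illustrative PyTorch-style sketch (not the authors' released implementation) of how stacked propagation layers and the concatenation-based layer aggregation fit together. The per-layer update inside `forward` is a simplified stand-in for the full propagation rule; all names and sizes are hypothetical.

```python
import torch
import torch.nn as nn

class StackedPropagation(nn.Module):
    """Illustrative only: L simplified propagation layers with concatenation-based aggregation."""

    def __init__(self, num_nodes, dim=64, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(num_nodes, dim)      # E^(0): initial user/item ID embeddings
        self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])

    def forward(self, norm_adj):
        # norm_adj: normalized (num_nodes x num_nodes) user-item adjacency; dense here for brevity.
        e = self.embed.weight
        views = [e]
        for layer in self.layers:
            # Simplified stand-in for the propagation rule: aggregate neighbours, transform, activate.
            e = torch.relu(layer(norm_adj @ e))
            views.append(e)
        # Layer aggregation: concatenate E^(0), ..., E^(L) into the final representation.
        return torch.cat(views, dim=1)

# Hypothetical usage: 10 users + 8 items = 18 nodes, 3 propagation layers.
model = StackedPropagation(num_nodes=18, dim=8, num_layers=3)
final = model(torch.rand(18, 18))
print(final.shape)      # torch.Size([18, 32]) -> dim * (num_layers + 1)
```

So NGCF-1 through NGCF-4 differ only in how many propagated views are concatenated into the final embedding.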

6.2.2. Effect of Embedding Propagation Layer and Layer-Aggregation Mechanism

To understand the specific contributions of the proposed embedding propagation layer design, variants of NGCF-1 (single layer) were tested.

The following are the results from Table 4 of the original paper:

| Method | Gowalla recall@20 | Gowalla ndcg@20 | Yelp2018* recall@20 | Yelp2018* ndcg@20 | Amazon-Book recall@20 | Amazon-Book ndcg@20 |
| --- | --- | --- | --- | --- | --- | --- |
| NGCF-1 | 0.1556 | 0.1315 | 0.0543 | 0.0442 | 0.0313 | 0.0241 |
| NGCF-1_SVD++ | 0.1517 | 0.1265 | 0.0504 | 0.0414 | 0.0297 | 0.0232 |
| NGCF-1_GC-MC | 0.1523 | 0.1307 | 0.0518 | 0.0420 | 0.0305 | 0.0234 |
| NGCF-1_PinSage | 0.1534 | 0.1308 | 0.0516 | 0.0420 | 0.0293 | 0.0231 |

Key Observations:

  • Importance of Representation Interactions: NGCF-1 consistently outperforms all its variants (NGCF-1_SVD++, NGCF-1_GC-MC, NGCF-1_PinSage). This highlights the crucial role of the representation interaction term ($\mathbf{e}_i \odot \mathbf{e}_u$) in the message construction function: it makes each message dependent on the affinity between the source and target embeddings, acting like an attention mechanism that distills more relevant information. The variants, by contrast, rely only on linear transformations of the neighbor embeddings (a sketch contrasting the two message forms follows this list).
  • Significance of Self-Connections and Non-linearity: NGCF-1_SVD++ generally performs worse than NGCF-1_GC-MC and NGCF-1_PinSage. This suggests the importance of retaining original features (via self-connections) and of the non-linear transformation in the propagation layer, both of which NGCF-1_SVD++ omits in order to reduce to an SVD++-like form.
  • Effectiveness of Layer Aggregation: Comparing the single-layer variants in Table 4 with their multi-layer counterparts in Table 2 (e.g., NGCF-1_GC-MC vs. GC-MC, NGCF-1_PinSage vs. PinSage), the variants often perform better. The authors attribute this to the layer-aggregation mechanism: concatenating the representations produced at different layers (including the initial embeddings) yields richer final embeddings than using the last layer alone, a finding consistent with other GNN research.
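
The contrast drawn in the first bullet can be made concrete with a small illustrative sketch (PyTorch, not the released code): NGCF's message adds the affinity-modulated term $W_2(\mathbf{e}_i \odot \mathbf{e}_u)$ on top of the purely linear term $W_1 \mathbf{e}_i$ that the simplified variants roughly reduce to.

```python
import torch
import torch.nn as nn

dim = 64
W1 = nn.Linear(dim, dim, bias=False)   # transforms the neighbour embedding
W2 = nn.Linear(dim, dim, bias=False)   # transforms the element-wise interaction term

def ngcf_message(e_u, e_i, n_u, n_i):
    """Illustrative message from item i to user u: decay * (W1 e_i + W2 (e_i ⊙ e_u))."""
    decay = 1.0 / (n_u * n_i) ** 0.5          # graph-Laplacian-style normalisation
    return decay * (W1(e_i) + W2(e_i * e_u))  # e_i * e_u is the element-wise product

def linear_only_message(e_u, e_i, n_u, n_i):
    """What the simplified variants roughly use: no affinity term."""
    decay = 1.0 / (n_u * n_i) ** 0.5
    return decay * W1(e_i)

e_u, e_i = torch.randn(dim), torch.randn(dim)
print(ngcf_message(e_u, e_i, n_u=5, n_i=12).shape)   # torch.Size([64])
```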

6.2.3. Effect of Dropout

The paper investigates the effect of message dropout ($p_1$) and node dropout ($p_2$) on NGCF's performance. The results are plotted in Figure 5.

The following figure (Figure 5 from the original paper) shows the effect of node dropout and message dropout ratios on recall@20 across different datasets.

Figure 5: Effect of node dropout and message dropout ratios on recall@20.

Key Observations:

  • Node Dropout's Superiority: Node dropout generally offers better performance than message dropout. For example, on Gowalla, a node dropout ratio of 0.2 yields a recall@20 of 0.1514, which is higher than the best message dropout result of 0.1506.
  • Robustness: This suggests that dropping out entire nodes (and all their outgoing messages) makes the representations more robust not only against specific connections but also against the influence of particular users or items. Node dropout appears to be a more effective strategy for preventing overfitting in graph neural networks by promoting more generalized node representations.
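
A minimal sketch of the difference between the two schemes, assuming the interaction graph is stored as a list of (user, item) edges: message dropout removes individual propagated messages independently, while node dropout samples whole users/items and blocks every message attached to them. The edge list, sizes, and function names are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny interaction graph: 5 users, 5 items, edges as (user, item) pairs.
num_users, num_items = 5, 5
edges = np.array([(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)])

def message_dropout(edges, p):
    """Message dropout: keep each edge (i.e., each propagated message) with probability 1 - p."""
    keep = rng.random(len(edges)) >= p
    return edges[keep]

def node_dropout(edges, p):
    """Node dropout: drop whole users/items and block all messages touching them."""
    dropped_users = set(rng.choice(num_users, size=int(p * num_users), replace=False))
    dropped_items = set(rng.choice(num_items, size=int(p * num_items), replace=False))
    keep = [k for k, (u, i) in enumerate(edges)
            if u not in dropped_users and i not in dropped_items]
    return edges[keep]

print(len(message_dropout(edges, p=0.2)), len(node_dropout(edges, p=0.2)))
```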

6.2.4. Test Performance w.r.t. Epoch

The convergence behavior of NGCF compared to MF over training epochs is shown in Figure 6.

The following figure (Figure 6 from the original paper) shows the test performance (recall@20) of each epoch for MF and NGCF.

Figure 6: Test performance (recall@20) of each epoch of MF and NGCF.

Key Observations:

  • Faster Convergence for NGCF: NGCF exhibits faster convergence than MF across all three datasets. This is likely because NGCF explicitly incorporates indirectly connected users and items through embedding propagation when optimizing interaction pairs in each mini-batch (the pairwise objective is sketched after this list).
  • Better Model Capacity: This observation highlights NGCF's superior model capacity and the effectiveness of performing embedding propagation directly in the embedding space. The richer information flow allows NGCF to learn effective representations more quickly.
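
For context on what "optimizing interaction pairs" means here, below is a hedged sketch of the pairwise BPR objective the paper optimizes over mini-batches of (user, observed item, sampled unobserved item) triples; tensor names, the regularization weight, and the batch size are illustrative.

```python
import torch
import torch.nn.functional as F

def bpr_loss(user_e, pos_e, neg_e, reg=1e-5):
    """Pairwise BPR loss over a mini-batch of (user, positive item, negative item) triples."""
    pos_scores = (user_e * pos_e).sum(dim=1)     # inner-product predictions for observed items
    neg_scores = (user_e * neg_e).sum(dim=1)     # predictions for sampled unobserved items
    ranking = -F.logsigmoid(pos_scores - neg_scores).mean()
    l2 = reg * (user_e.pow(2).sum() + pos_e.pow(2).sum() + neg_e.pow(2).sum()) / user_e.size(0)
    return ranking + l2

# Hypothetical mini-batch of 32 triples with 64-d final embeddings.
u, i, j = (torch.randn(32, 64, requires_grad=True) for _ in range(3))
print(bpr_loss(u, i, j))
```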

6.3. Effect of High-order Connectivity

To qualitatively understand how embedding propagation facilitates representation learning, the paper visualizes t-SNE transformed representations of randomly selected users and their relevant items from the Gowalla dataset. Figure 7 compares MF (treated as NGCF-0, i.e., no propagation layers) with NGCF-3.

The following figure (Figure 7 from the original paper) visualizes the learned t-SNE transformed representations derived from MF and NGCF-3.

Figure 7: Visualization of the learned t-SNE transformed representations derived from MF and NGCF-3. Each star represents a user from Gowalla dataset, while the points with the same color denote the relevant items. Best view in color.

Key Observations:

  • Clustering in Embedding Space: With NGCF-3, the representations of users and their relevant items (even those not seen during training for that specific user) are well-reflected in the embedding space. Points with the same colors (representing items consumed by the same users) tend to form distinct clusters. This means that users and their preferred items (and items preferred by similar users) are embedded into close regions of the vector space.
  • Enhanced Closeness with Propagation: Comparing MF (Figure 7a) and NGCF-3 (Figure 7b), it's evident that NGCF-3 causes the embeddings of a user's historical items to be much closer to each other and to the user's own embedding. For example, for users like 12201 and 6880, their respective item clusters are more compact and distinct in NGCF-3.
  • Qualitative Verification: This visualization qualitatively verifies that the proposed embedding propagation layer effectively injects explicit collaborative signals into the representations. The high-order connectivity learned through multiple layers brings related entities closer in the embedding space, which is crucial for accurate recommendations.
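
For readers who want to reproduce this kind of figure from their own trained model, below is a hedged sketch using scikit-learn's TSNE and matplotlib; the embeddings and user-item assignments are random placeholders standing in for learned final representations, not data from the paper.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(42)

# Placeholders standing in for learned final embeddings of 6 sampled users and their items.
user_emb = rng.normal(size=(6, 64))
item_emb = rng.normal(size=(120, 64))
item_owner = rng.integers(0, 6, size=120)     # which sampled user each item belongs to

coords = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(
    np.vstack([user_emb, item_emb]))
user_xy, item_xy = coords[:6], coords[6:]

plt.scatter(item_xy[:, 0], item_xy[:, 1], c=item_owner, cmap='tab10', s=10)
plt.scatter(user_xy[:, 0], user_xy[:, 1], c=np.arange(6), cmap='tab10', marker='*', s=200)
plt.title('t-SNE of user (stars) and item (dots) embeddings')
plt.show()
```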

7. Conclusion & Reflections

7.1. Conclusion Summary

This work makes a significant contribution to model-based collaborative filtering (CF) by explicitly incorporating the collaborative signal into the embedding function itself. The authors introduce Neural Graph Collaborative Filtering (NGCF), a novel framework built upon graph neural networks (GNNs). NGCF leverages the user-item interaction graph to model high-order connectivities through an embedding propagation mechanism. This process enriches user and item embeddings by allowing them to iteratively absorb collaborative signals from their multi-hop neighbors. Extensive experiments on three real-world datasets consistently demonstrate that NGCF achieves state-of-the-art performance, significantly outperforming existing CF methods. The analysis further validates that embedding propagation is crucial for learning higher-quality user and item representations, with the largest gains observed for relatively inactive users, thereby helping to alleviate the data sparsity issue.

7.2. Limitations & Future Work

The authors acknowledge several areas for future improvement and research:

  • Attention Mechanism: Incorporating an attention mechanism [2] to learn variable weights for neighbors during embedding propagation. This would allow the model to dynamically prioritize more important neighbors rather than using fixed (Laplacian-based) weights, potentially improving model generalization and interpretability.
  • Adversarial Learning: Exploring adversarial learning [13] on user/item embeddings and the graph structure to enhance NGCF's robustness. This could make the model less susceptible to noisy or malicious interactions.
  • Exploiting Other Structural Knowledge: Expanding NGCF to leverage other forms of structural information beyond simple user-item interactions. This includes:
    • Cross features [41] for context-aware and semantics-rich recommendation [22, 27].
    • Item knowledge graphs [32] to establish knowledge-aware connectivities between users and items, which could unveil user decision-making processes.
    • Social networks [34] to incorporate social influence into recommendations. More broadly, the authors hope that NGCF's development will facilitate reasoning about users' online behavior, leading to more effective and interpretable recommender systems.

7.3. Personal Insights & Critique

7.3.1. Personal Insights

  • Paradigm Shift in Embedding Learning: NGCF provides a compelling argument for a paradigm shift in embedding learning for CF. Instead of treating embeddings as static attributes and interactions as mere training signals, NGCF demonstrates the power of making embeddings inherently dynamic and context-aware by integrating graph structure. This is a crucial insight: the "signal" should be in the representation itself, not just an artifact of the loss function.
  • Power of High-Order Connectivity: The paper effectively illustrates the semantic richness of high-order connectivity. The example of u1-i2-u2-i4 clearly shows how multi-hop paths can capture nuanced collaborative signals that simpler models miss. This highlights the importance of going beyond direct neighbors.
  • GNNs as a Natural Fit for CF: NGCF reinforces the idea that Graph Neural Networks are a natural and powerful tool for recommender systems. The user-item interaction data is inherently graph-structured, and GNNs are designed to process such data by message passing, making them highly suitable for capturing collaborative filtering effects.
  • Addressing Sparsity: The empirical results demonstrating NGCF's significant benefits for inactive users (those with fewer interactions) are particularly insightful. This suggests embedding propagation can effectively alleviate the sparsity issue by allowing users to indirectly gain information from a wider network of similar users and items.

7.3.2. Critique

  • Complexity of Message Construction: While the element-wise product term ($\mathbf{e}_i \odot \mathbf{e}_u$) in message construction is shown to be effective, its exact theoretical justification beyond "affinity" could be explored more deeply. Is it the optimal interaction? More complex neural interaction functions within the message could be considered, though the paper mentions this as future work for the prediction layer.

  • Fixed Decay Factor: The use of the graph Laplacian norm ($1/\sqrt{|\mathcal{N}_u||\mathcal{N}_i|}$) as a fixed decay factor $\phi_{ui}$ is a standard choice but might not be optimal. Learning this decay factor or making it context-dependent (e.g., using attention) could lead to further improvements, as mentioned in the future work.

  • Scalability for Extremely Large Graphs: While the matrix-form propagation rule avoids node sampling and is more efficient than some GCNs, the adjacency matrix $\mathbf{A}$ and Laplacian matrix $\mathcal{L}$ can still be very large for industrial-scale recommendation systems with billions of users and items. The current formulation still processes the full graph (or at least its edges) in each layer. Further optimizations for handling truly massive, dynamic graphs might be necessary, such as mini-batching over edges or subgraphs.

  • Interpretability of High-Order Paths: While NGCF explicitly captures high-order connectivity, the final concatenated embedding might lose some interpretability regarding which specific paths contribute most to a recommendation. Future work on attention could help in this regard.

  • Hyperparameter Sensitivity: As with many deep learning models, NGCF has several hyperparameters (number of layers, dropout ratios, learning rate, L2 regularization). Optimal performance depends on careful tuning, and the overfitting observed with NGCF-4 highlights this sensitivity.

    Overall, NGCF is a rigorously designed and empirically validated model that significantly advances the field of collaborative filtering by elegantly integrating graph neural networks to enhance embedding learning. It opens many promising avenues for future research in graph-based recommendation.
