
Neural Graph Collaborative Filtering

Published: 05/20/2019
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

The Neural Graph Collaborative Filtering (NGCF) framework integrates user-item interactions by propagating embeddings on a graph, effectively capturing collaborative signals and demonstrating significant improvements across several benchmark datasets.

Abstract

Learning vector representations (aka. embeddings) of users and items lies at the core of modern recommender systems. Ranging from early matrix factorization to recently emerged deep learning based methods, existing efforts typically obtain a user's (or an item's) embedding by mapping from pre-existing features that describe the user (or the item), such as ID and attributes. We argue that an inherent drawback of such methods is that, the collaborative signal, which is latent in user-item interactions, is not encoded in the embedding process. As such, the resultant embeddings may not be sufficient to capture the collaborative filtering effect. In this work, we propose to integrate the user-item interactions -- more specifically the bipartite graph structure -- into the embedding process. We develop a new recommendation framework Neural Graph Collaborative Filtering (NGCF), which exploits the user-item graph structure by propagating embeddings on it. This leads to the expressive modeling of high-order connectivity in user-item graph, effectively injecting the collaborative signal into the embedding process in an explicit manner. We conduct extensive experiments on three public benchmarks, demonstrating significant improvements over several state-of-the-art models like HOP-Rec and Collaborative Memory Network. Further analysis verifies the importance of embedding propagation for learning better user and item representations, justifying the rationality and effectiveness of NGCF. Codes are available at https://github.com/xiangwang1223/neural_graph_collaborative_filtering.


In-depth Reading

English Analysis

1. Bibliographic Information

1.1. Title

Neural Graph Collaborative Filtering

1.2. Authors

Xiang Wang, Meng Wang, Xiangnan He, Fuli Feng, and Tat-Seng Chua. Their affiliations include National University of Singapore and University of Science and Technology of China, indicating a strong academic background in computer science, particularly in areas like recommender systems, data mining, and multimedia.

1.3. Journal/Conference

Published in the Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '19), July 21-25, 2019, Paris, France. SIGIR is a highly reputable and influential conference in the field of information retrieval and recommender systems, known for publishing cutting-edge research.

1.4. Publication Year

2019

1.5. Abstract

The paper addresses a crucial limitation in modern recommender systems: existing methods for learning user and item vector representations (embeddings), such as matrix factorization (MF) and deep learning-based approaches, typically derive embeddings from pre-existing features like IDs and attributes. This approach, the authors argue, fails to explicitly encode the collaborative signal—the latent information within user-item interactions—into the embedding process itself. Consequently, the resulting embeddings may not adequately capture the collaborative filtering (CF) effect.

To overcome this, the paper proposes Neural Graph Collaborative Filtering (NGCF). This novel framework integrates the user-item bipartite graph structure directly into the embedding learning process by propagating embeddings along the graph. This propagation mechanism allows for the expressive modeling of high-order connectivity within the user-item graph, thereby explicitly injecting the collaborative signal into the embeddings. Extensive experiments on three public benchmark datasets demonstrate that NGCF significantly outperforms several state-of-the-art models, including HOP-Rec and Collaborative Memory Network. Further analysis confirms the critical role of embedding propagation in learning superior user and item representations, thus validating NGCF's effectiveness and underlying rationale.

Official Source: https://doi.org/10.1145/3331184.3331267
PDF Link: https://arxiv.org/pdf/1905.08108v2.pdf
Publication Status: Officially published at SIGIR '19.

2. Executive Summary

2.1. Background & Motivation

The core problem the paper addresses is the suboptimality of user and item embeddings in existing recommender systems. Modern recommender systems, ranging from traditional Matrix Factorization (MF) to recent deep learning approaches, primarily learn embeddings from static features like user/item IDs or attributes. The crucial collaborative signal, which inherently exists within user-item interactions and reveals behavioral similarities, is largely not explicitly encoded during the embedding generation process. Instead, these interactions are often only used to define the objective function for training, leaving the embedding function itself unaware of the rich relational information. This omission means that the embeddings might lack the necessary information to fully capture the collaborative filtering (CF) effect, requiring the interaction function to compensate for this deficiency.

This problem is important because the quality of embeddings directly impacts the accuracy and effectiveness of recommendations. If embeddings are insufficient, even sophisticated interaction functions might struggle to provide optimal predictions. The challenge lies in efficiently integrating the massive scale of user-item interactions (which can reach millions or more in real-world applications) into the embedding process in a meaningful way.

The paper's innovative idea or entry point is to explicitly leverage the high-order connectivity present in the user-item interaction graph. Instead of just using interactions for training a model that operates on static embeddings, NGCF proposes to propagate embeddings directly on this graph structure. This allows the embeddings to dynamically evolve and absorb collaborative signals from neighbors up to $L$ hops away, thereby enriching them with contextual relational information that was previously implicit or ignored.

2.2. Main Contributions / Findings

The paper makes the following primary contributions and key findings:

  1. Highlighting the Importance of Explicit Collaborative Signal: The authors explicitly identify and articulate the critical importance of incorporating the collaborative signal directly into the embedding function of model-based collaborative filtering (CF) methods. They argue that prior methods largely neglect this, leading to suboptimal embeddings.
  2. Introducing NGCF Framework: They propose Neural Graph Collaborative Filtering (NGCF), a novel recommendation framework built upon graph neural networks (GNNs). NGCF explicitly encodes the collaborative signal by modeling high-order connectivities through an embedding propagation mechanism on the user-item interaction graph.
  3. Empirical Validation and State-of-the-Art Performance: Through extensive experiments on three large-scale, real-world benchmark datasets (Gowalla, Yelp2018*, Amazon-Book), NGCF demonstrates significant performance improvements over several state-of-the-art CF models, including HOP-Rec and Collaborative Memory Network (CMN).
  4. Verification of Embedding Quality: Further analysis confirms that the embedding propagation process is crucial for learning superior user and item representations. This verifies the rationality and effectiveness of NGCF's design, showing that explicitly integrating graph structure improves embedding quality and subsequently recommendation accuracy, particularly for inactive users (alleviating the sparsity issue).

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To fully grasp Neural Graph Collaborative Filtering (NGCF), a reader should understand the following foundational concepts:

  • Recommender Systems: Systems designed to predict user preferences or ratings for items, helping users discover new items they might like. They are widely used in e-commerce, streaming services, and social media.
  • Collaborative Filtering (CF): A common technique in recommender systems that predicts user preferences by leveraging the preferences of similar users or the characteristics of similar items. The core idea is "users who liked X and Y also liked Z" (user-based CF) or "people who liked X also liked Y" (item-based CF).
  • Embeddings (Vector Representations): In machine learning, an embedding is a mapping from discrete objects (like users, items, words, etc.) to a continuous vector space. Each user or item is represented by a dense vector of real numbers. The idea is that objects with similar properties or relationships will have similar vector representations in this space. For example, if two users have similar tastes, their user embeddings should be close to each other.
  • Matrix Factorization (MF): A popular CF technique. It decomposes the sparse user-item interaction matrix (where rows are users, columns are items, and entries are ratings or implicit feedback) into two lower-rank matrices: a user matrix and an item matrix. Each row in the user matrix is a user embedding, and each column in the item matrix is an item embedding. The product of a user embedding and an item embedding (typically an inner product) reconstructs the original interaction.
  • Deep Learning: A subfield of machine learning that uses neural networks with multiple layers (hence "deep") to learn complex patterns from data. These networks can learn hierarchical representations and capture non-linear relationships.
  • Neural Networks: Computational models inspired by the structure and function of biological neural networks. They consist of interconnected nodes (neurons) organized in layers. Each connection has a weight, and neurons apply an activation function to the weighted sum of their inputs.
  • Activation Functions: Non-linear functions applied by neurons in a neural network. They introduce non-linearity, allowing the network to learn complex patterns. LeakyReLU (Leaky Rectified Linear Unit) is an activation function defined as $f(x) = \max(\alpha x, x)$, where $\alpha$ is a small positive constant. It addresses the "dying ReLU" problem by allowing a small, non-zero gradient when the input is negative.
  • Optimization (Adam): An adaptive learning rate optimization algorithm used to train neural networks. It computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients. It's widely used due to its efficiency and good performance.
  • Regularization (L2 Regularization): A technique used to prevent overfitting in machine learning models. L2 regularization (also known as weight decay) adds a penalty term proportional to the square of the magnitude of the model's weights to the loss function. This encourages the model to use smaller weights, making it simpler and less prone to capturing noise in the training data.
  • Dropout: A regularization technique for neural networks where randomly selected neurons are "dropped out" (i.e., temporarily ignored along with their incoming and outgoing connections) during training. This prevents complex co-adaptations on training data and improves the model's generalization ability.
  • Graph Theory:
    • Bipartite Graph: A graph whose vertices can be divided into two disjoint and independent sets, $U$ (users) and $I$ (items), such that every edge connects a vertex in $U$ to one in $I$. There are no edges within $U$ or within $I$. The user-item interaction network is naturally a bipartite graph.
    • High-order Connectivity: Refers to paths in a graph that involve multiple steps or "hops." In a user-item graph, a 1-hop connection is a direct interaction (user-item). A 2-hop connection (user-item-user) implies behavioral similarity between two users (they interacted with the same item). A 3-hop connection (user-item-user-item) suggests that a user might be interested in an item that a similar user has interacted with. These longer paths carry rich collaborative signals.
    • Adjacency Matrix ($\mathbf{A}$): A square matrix used to represent a finite graph. If there are $N$ nodes, $\mathbf{A}$ is $N \times N$. An entry $A_{ij} = 1$ if there is an edge from node $i$ to node $j$, and 0 otherwise. For a user-item bipartite graph, a combined adjacency matrix can be formed that includes connections between users and items.
    • Degree Matrix ($\mathbf{D}$): A diagonal matrix where $D_{ii}$ is the degree of vertex $i$ (the number of edges connected to it).
    • Graph Laplacian ($\mathcal{L}$): For an unweighted graph, the graph Laplacian is defined as $\mathcal{L} = \mathbf{D} - \mathbf{A}$. A normalized version, often used in Graph Neural Networks, is $\mathcal{L} = \mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2}$. This matrix is crucial for understanding the connectivity and structure of a graph and for spectral graph theory.
  • Graph Neural Networks (GNNs): A class of deep learning methods designed to perform inference on graph-structured data. They extend the idea of neural networks to graphs by iteratively aggregating information from a node's neighbors to update its representation.
  • Message Passing: A fundamental paradigm in GNNs. It involves nodes exchanging "messages" (information/embeddings) with their neighbors and then aggregating these messages to update their own representations. This process can be repeated over multiple layers, allowing information to propagate across longer distances in the graph.

3.2. Previous Works

The paper discusses existing work in three main categories, positioning NGCF in relation to them:

3.2.1. Model-based CF Methods

These methods parameterize users and items with vector representations (embeddings) and reconstruct interactions based on these parameters.

  • Matrix Factorization (MF) [20, 26]: The foundational model. It embeds user/item IDs as vectors and uses their inner product to predict interactions.

    • Formula (Simplified): For a user $u$ and item $i$, the predicted rating $\hat{r}_{ui}$ is given by $\hat{r}_{ui} = \mathbf{p}_u^\top \mathbf{q}_i$, where $\mathbf{p}_u$ is the user embedding and $\mathbf{q}_i$ is the item embedding.
  • Enhanced MF: Efforts to incorporate side information (e.g., item content [2, 30], social relations [34], item relations [37], user reviews [3], external knowledge graph [32, 35]) to enrich embeddings.

  • Neural Collaborative Filtering (NCF) methods (e.g., NeuMF [14]): Replace the inner product in MF with non-linear neural networks to model interactions, capturing more complex relationships.

    • Formula (Conceptual for NeuMF): $\hat{y}_{ui} = f(\mathbf{p}_u, \mathbf{q}_i)$, where $f$ is a multi-layer perceptron (MLP) that concatenates or element-wise multiplies user and item embeddings and processes them through non-linear layers. For example, $f(\mathbf{p}_u, \mathbf{q}_i) = h(g(\mathbf{p}_u \odot \mathbf{q}_i))$ or $h(g(\mathbf{p}_u \oplus \mathbf{q}_i))$, where $g$ is an MLP layer, $h$ is the final output layer, $\odot$ is the element-wise product, and $\oplus$ is concatenation.
  • Translation-based CF (e.g., LRML [28]): Use Euclidean distance metrics as the interaction function.

    NGCF's Critique: These methods primarily build embedding functions using descriptive features (ID, attributes) and use interactions only for the objective function. They lack explicit encoding of collaborative signals in the embedding process, meaning transitivity (e.g., if A likes B and B likes C, A might like C) is not guaranteed in the embedding space.

3.2.2. Graph-Based CF Methods

These methods explicitly use the user-item interaction graph to infer preferences.

  • Label Propagation methods (e.g., ItemRank [7], BiRank [12]): Propagate "labels" (e.g., interacted items) on the graph to score items for a user. These are essentially neighbor-based methods and lack model parameters to optimize a recommendation objective.
  • HOP-Rec [40]: A recent graph-based model that performs random walks on the graph to identify multi-hop connected items. These high-order neighbors are then used to enrich the training data for an MF model.
    • Key Idea: Augments the implicit feedback data by adding synthetic user-item interactions based on paths discovered by random walks. The core prediction model remains MF, but trained on an enriched dataset.
    • Formula (Conceptual): It would still use $\hat{r}_{ui} = \mathbf{p}_u^\top \mathbf{q}_i$, but the training pairs $(u, i)$ would be expanded to include $(u, i')$, where $i'$ is a high-order neighbor of $u$. NGCF's Critique: HOP-Rec enriches training data but does not integrate high-order connectivity into the prediction model's embedding function itself. Its performance heavily relies on random walk parameters (e.g., decay factor).

3.2.3. Graph Convolutional Networks (GCNs) for Recommendation

These methods apply graph convolution operations to the user-item graph.

  • GC-MC [29]: Applies GCN [18] on a user-item graph, but only uses one convolutional layer, thus exploiting only first-order connections (direct user-item interactions).
    • GCN Layer (Simplified from Kipf & Welling [18]): $\mathbf{H}^{(l+1)} = \sigma(\tilde{\mathbf{D}}^{-\frac{1}{2}}\tilde{\mathbf{A}}\tilde{\mathbf{D}}^{-\frac{1}{2}}\mathbf{H}^{(l)}\mathbf{W}^{(l)})$, where $\tilde{\mathbf{A}}$ is the adjacency matrix with self-loops, $\tilde{\mathbf{D}}$ is its degree matrix, $\mathbf{H}^{(l)}$ holds the node features at layer $l$, $\mathbf{W}^{(l)}$ is a weight matrix, and $\sigma$ is an activation function. GC-MC adapts this to a bipartite graph (a minimal sketch of this propagation rule follows this list).
  • PinSage [42]: An industrial solution from Pinterest. It employs multiple graph convolution layers but primarily on an item-item graph (derived from co-occurrence patterns) for item recommendation. The CF effect is captured at the item relation level, not directly on collective user behaviors.
    • Key Idea (from GraphSAGE [8]): Aggregates information from a node's local neighborhood. A generic GraphSAGE layer for node $v$ computes $\mathbf{h}_{\mathcal{N}(v)}^{(k)} = \mathrm{AGGREGATE}_k(\{\mathbf{h}_u^{(k-1)}, \forall u \in \mathcal{N}(v)\})$ and then $\mathbf{h}_v^{(k)} = \sigma(\mathbf{W}_k \cdot \mathrm{CONCAT}(\mathbf{h}_v^{(k-1)}, \mathbf{h}_{\mathcal{N}(v)}^{(k)}))$. PinSage applies this on a large-scale item graph.
  • SpectralCF [43]: Proposes a spectral convolution operation to find connections between users and items in the spectral domain (using eigen-decomposition of the graph adjacency matrix).
    • Key Idea: Utilizes the eigenvectors and eigenvalues of the graph Laplacian to define convolution. NGCF's Critique: GC-MC is limited to first-order connectivity. PinSage focuses on item-item graphs. SpectralCF has high computational complexity due to eigen-decomposition, making it unsuitable for large-scale graphs.
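
To make the GCN propagation rule above concrete, here is a minimal NumPy sketch of a single Kipf & Welling layer on a toy graph. The adjacency matrix, feature dimensions, and the choice of ReLU as the activation are illustrative assumptions, not details taken from GC-MC, PinSage, or SpectralCF.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: ReLU(D~^{-1/2} A~ D~^{-1/2} H W), with A~ = A + I."""
    A_tilde = A + np.eye(A.shape[0])                    # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))     # degrees are >= 1 after self-loops
    A_norm = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)              # ReLU as the activation sigma

# Toy graph with 4 nodes, 3-dimensional input features, 2-dimensional output features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
H0 = np.random.rand(4, 3)
W0 = np.random.rand(3, 2)
H1 = gcn_layer(A, H0, W0)
```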

3.3. Technological Evolution

The evolution of collaborative filtering (CF) has generally moved from simpler statistical methods to more complex model-based approaches:

  1. Early Methods (e.g., Neighborhood-based CF): Relied on calculating similarities between users or items directly from interaction patterns.

  2. Matrix Factorization (MF): Introduced the concept of embeddings for users and items, decomposing the interaction matrix to learn these latent factors. This marked a shift towards explicit model learning.

  3. MF with Side Information: Enhanced MF by incorporating additional data (e.g., item attributes, social networks) to enrich embeddings and alleviate sparsity.

  4. Neural Collaborative Filtering (NCF): Replaced the linear inner product interaction function of MF with non-linear neural networks, allowing for more expressive modeling of user-item relationships. This leveraged the power of deep learning for the interaction function.

  5. Graph-based Methods (e.g., HOP-Rec, early GCNs): Began to explicitly use the user-item interaction graph structure. HOP-Rec used random walks to augment training data, while GC-MC applied GCNs but often limited to first-order connections.

    NGCF's Position: NGCF represents a significant step in this evolution by integrating graph neural networks to enhance the embedding function itself, rather than just the interaction function or the training data. It leverages the message-passing mechanism on the user-item bipartite graph to explicitly capture high-order collaborative signals directly into the user and item embeddings. This means the embeddings are no longer static representations from features but are dynamically refined by the graph structure, making them inherently "collaborative-aware." This positions NGCF at the forefront of GNN-based CF models, focusing on richer, structural information within the embeddings.

3.4. Differentiation Analysis

NGCF distinguishes itself from previous methods primarily through its explicit and integrated approach to embedding the collaborative signal using high-order connectivity on the user-item graph.

  • Compared to MF and NeuMF:
    • Difference: MF and NeuMF learn embeddings primarily from ID mapping or static features, and use interactions only to train an interaction function (inner product for MF, MLP for NeuMF). The collaborative signal (e.g., transitivity like user A and user B both like item I, so A and B are similar) is only implicitly captured if the interaction function is powerful enough to reconstruct interactions perfectly.
    • Innovation: NGCF explicitly injects this collaborative signal into the embedding process itself by propagating embeddings across the user-item graph. This means the embeddings of users and items are constantly refined by their neighbors' embeddings, making them inherently aware of high-order relationships.
  • Compared to HOP-Rec:
    • Difference: HOP-Rec also considers high-order connectivity by performing random walks to enrich the training data, but its underlying prediction model remains MF. The high-order information is used to augment the dataset, not to modify the embedding function of the model.
    • Innovation: NGCF integrates high-order connectivity directly into the model's embedding function through embedding propagation layers. This leads to better embeddings and superior performance, as the collaborative signal becomes an intrinsic part of how embeddings are learned.
  • Compared to GC-MC and PinSage:
    • Difference: GC-MC applies GCNs but typically uses only one convolutional layer, thus capturing only first-order neighbors. PinSage uses multiple layers but operates on an item-item graph (capturing item relations) rather than the user-item bipartite graph for direct CF signals.
    • Innovation: NGCF uses multiple embedding propagation layers on the user-item bipartite graph, explicitly modeling high-order connectivities between users and items. Furthermore, NGCF uses a layer-aggregation mechanism (concatenation) to combine embeddings from different propagation layers, which PinSage typically doesn't, allowing NGCF to capture multi-grained representations. NGCF also incorporates an element-wise product ($\odot$) between user and item embeddings in its message construction, making messages dependent on the affinity between the source and target, which GC-MC lacks.
  • Compared to SpectralCF:
    • Difference: SpectralCF also aims to discover connectivity but does so in the spectral domain using eigen-decomposition of the graph adjacency matrix.
    • Innovation: NGCF operates efficiently in the spatial domain with an efficient matrix-form propagation rule, avoiding the high computational complexity of eigen-decomposition that makes SpectralCF impractical for large-scale recommendation.
  • Compared to SVD++ (see Section 4.2.5.1 below):
    • Difference: SVD++ augments user embeddings with implicit feedback from the items they interacted with. It is essentially a first-order model with a specific aggregation mechanism.

    • Innovation: NGCF generalizes SVD++. By setting $L = 1$ and simplifying its components (disabling transformation matrices and non-linear activation), NGCF can recover SVD++. This shows that NGCF's framework is more general and can extend beyond first-order connections to high-order ones.

      In summary, NGCF's core innovation lies in explicitly and effectively injecting collaborative signals derived from high-order graph connectivity directly into the embedding learning process through a scalable message-passing Graph Neural Network architecture.

4. Methodology

4.1. Principles

The core principle of Neural Graph Collaborative Filtering (NGCF) is to overcome the limitation of traditional recommender systems where the collaborative signal (information about behavioral similarities derived from user-item interactions) is not explicitly encoded within the embedding function. Instead, NGCF proposes to integrate the user-item interaction graph structure directly into the embedding process. The theoretical basis is that high-order connectivity within this graph (e.g., paths like User-Item-User or User-Item-User-Item) contains rich semantic information that reveals deeper collaborative filtering (CF) effects.

The intuition is that a user's preference is not only influenced by the items they directly interacted with (first-order connections) but also by the items that users similar to them interacted with (second-order connections), and so on. Similarly, an item's characteristics can be inferred not just from its direct users, but also from the users who interact with similar items. NGCF achieves this by using a message-passing mechanism inspired by Graph Neural Networks (GNNs). It iteratively refines user and item embeddings by propagating information (messages) between connected nodes (users and items) on the bipartite interaction graph. Each propagation layer allows embeddings to absorb information from further 'hops' in the graph, thus explicitly embedding the high-order collaborative signals.

4.2. Core Methodology In-depth (Layer by Layer)

The NGCF framework consists of three main components: an embedding layer, multiple embedding propagation layers, and a prediction layer.

4.2.1. Embedding Layer

The embedding layer serves as the initialization point for user and item representations.

  • Each user $u$ is associated with an initial embedding vector $\mathbf{e}_u \in \mathbb{R}^d$.
  • Each item $i$ is associated with an initial embedding vector $\mathbf{e}_i \in \mathbb{R}^d$.
  • Here, $d$ denotes the embedding size (dimension).
  • These embeddings are stored in an embedding look-up table $\mathbf{E}$:
    $$\mathbf{E} = \left[ \mathbf{e}_{u_1}, \dots, \mathbf{e}_{u_N}, \mathbf{e}_{i_1}, \dots, \mathbf{e}_{i_M} \right]$$
    where $N$ is the total number of users and $M$ is the total number of items. This $\mathbf{E}$ is the initial state for the embeddings and will be optimized end-to-end. Unlike traditional models where these ID embeddings might directly feed into an interaction layer, in NGCF they are further refined by the embedding propagation layers.
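
As an illustration of the embedding layer, the look-up table $\mathbf{E}$ can be realized as a single trainable matrix holding all user and item rows. The sketch below is a hypothetical PyTorch-style fragment (the official code is in TensorFlow); the sizes `n_users`, `n_items`, and `embed_dim` are assumptions chosen to mirror the Gowalla setting.

```python
import torch
import torch.nn as nn

# Gowalla-like sizes with embedding size d = 64 (illustrative).
n_users, n_items, embed_dim = 29858, 40981, 64

# E = [e_u1, ..., e_uN, e_i1, ..., e_iM]: one trainable row per user and per item.
embedding_table = nn.Parameter(torch.empty(n_users + n_items, embed_dim))
nn.init.xavier_uniform_(embedding_table)   # the paper initializes parameters with Xavier

u, i = 0, 5
e_u0 = embedding_table[u]              # user rows occupy indices [0, n_users)
e_i0 = embedding_table[n_users + i]    # item rows occupy indices [n_users, n_users + n_items)
```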

4.2.2. Embedding Propagation Layers

This is the core of NGCF, where collaborative signals are captured and embeddings are refined through message passing on the user-item graph.

4.2.2.1. First-order Propagation

This stage describes how embeddings are refined by considering first-order neighbors (direct connections). It involves two main operations: message construction and message aggregation.

  • Message Construction: For a directly connected user-item pair $(u, i)$, a message embedding $\mathbf{m}_{u \leftarrow i}$ is constructed, representing the information propagated from item $i$ to user $u$. The function $f(\cdot)$ encodes this message, taking $\mathbf{e}_i$ (item embedding) and $\mathbf{e}_u$ (user embedding) as input, and includes a coefficient $\phi_{ui}$ to control the decay factor during propagation on the edge $(u, i)$. The paper implements $f(\cdot)$ as:
    $$\mathbf{m}_{u \leftarrow i} = \frac{1}{\sqrt{\lvert \mathcal{N}_u \rvert \lvert \mathcal{N}_i \rvert}} \left( \mathbf{W}_1 \mathbf{e}_i + \mathbf{W}_2 (\mathbf{e}_i \odot \mathbf{e}_u) \right)$$

    • $\mathbf{m}_{u \leftarrow i}$: The message embedding from item $i$ to user $u$.
    • $\mathcal{N}_u$: The set of first-hop neighbors (items) of user $u$.
    • $\mathcal{N}_i$: The set of first-hop neighbors (users) of item $i$.
    • $\frac{1}{\sqrt{\lvert \mathcal{N}_u \rvert \lvert \mathcal{N}_i \rvert}}$: This is the coefficient $\phi_{ui}$, derived from the graph Laplacian norm. It acts as a discount factor, meaning messages should decay with path length and connectivity strength.
    • $\mathbf{W}_1, \mathbf{W}_2 \in \mathbb{R}^{d' \times d}$: Trainable weight matrices that transform the embeddings. $d'$ is the transformation size.
    • $\mathbf{e}_i, \mathbf{e}_u$: Initial embedding vectors of item $i$ and user $u$, respectively.
    • $\odot$: The element-wise product (Hadamard product) between $\mathbf{e}_i$ and $\mathbf{e}_u$. This term is crucial as it makes the message dependent on the affinity between the item and user, allowing more messages to pass from similar items.
  • Message Aggregation: After messages are constructed, they are aggregated from a node's neighborhood to refine its representation. For user $u$, the aggregation function for its first-layer representation $\mathbf{e}_u^{(1)}$ is:
    $$\mathbf{e}_u^{(1)} = \mathrm{LeakyReLU} \left( \mathbf{m}_{u \leftarrow u} + \sum_{i \in \mathcal{N}_u} \mathbf{m}_{u \leftarrow i} \right)$$

    • $\mathbf{e}_u^{(1)}$: The representation of user $u$ after the first embedding propagation layer.
    • $\mathrm{LeakyReLU}(\cdot)$: The Leaky Rectified Linear Unit activation function, which allows both positive and small negative signals to be encoded.
    • $\sum_{i \in \mathcal{N}_u} \mathbf{m}_{u \leftarrow i}$: The sum of messages propagated from all items $i$ in user $u$'s neighborhood $\mathcal{N}_u$.
    • $\mathbf{m}_{u \leftarrow u}$: A self-connection term, defined as $\mathbf{W}_1 \mathbf{e}_u$. This term ensures that the user's original features (initial embedding) are retained and contribute to the refined representation. The weight matrix $\mathbf{W}_1$ is shared with the first term in message construction.
    • A similar process is applied to obtain $\mathbf{e}_i^{(1)}$ for an item $i$. This explicitly uses first-order connectivity to relate user and item representations.
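
A per-node sketch of the message construction and aggregation above might look as follows. This is a simplified, assumed PyTorch fragment operating on one user at a time (the paper's implementation works in matrix form, described below); `W1`, `W2`, and the toy neighborhood sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 64
W1 = nn.Linear(d, d, bias=False)   # plays the role of W_1
W2 = nn.Linear(d, d, bias=False)   # plays the role of W_2

def message(e_i, e_u, n_u, n_i):
    """m_{u<-i}: decayed sum of a transformed item embedding and an affinity (element-wise) term."""
    coeff = 1.0 / ((n_u * n_i) ** 0.5)           # 1 / sqrt(|N_u| |N_i|)
    return coeff * (W1(e_i) + W2(e_i * e_u))     # W_1 e_i + W_2 (e_i ⊙ e_u)

def aggregate(e_u, neighbor_item_embs, neighbor_item_degrees):
    """e_u^{(1)} = LeakyReLU(m_{u<-u} + sum_i m_{u<-i}), with m_{u<-u} = W_1 e_u."""
    n_u = len(neighbor_item_embs)
    msgs = sum(message(e_i, e_u, n_u, n_i)
               for e_i, n_i in zip(neighbor_item_embs, neighbor_item_degrees))
    return F.leaky_relu(W1(e_u) + msgs)

# Toy usage: a user with two interacted items whose degrees (|N_i|) are 3 and 5.
e_u = torch.randn(d)
items = [torch.randn(d), torch.randn(d)]
e_u_layer1 = aggregate(e_u, items, neighbor_item_degrees=[3, 5])
```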

4.2.2.2. High-order Propagation

To capture high-order connectivity and more complex collaborative signals, NGCF stacks multiple embedding propagation layers.

  • By stacking $L$ layers, a user (or item) can receive messages from its $L$-hop neighbors.

  • For the $l$-th step (layer), the representation of user $u$ is recursively formulated as:
    $$\mathbf{e}_u^{(l)} = \mathrm{LeakyReLU} \left( \mathbf{m}_{u \leftarrow u}^{(l)} + \sum_{i \in \mathcal{N}_u} \mathbf{m}_{u \leftarrow i}^{(l)} \right)$$
    wherein the messages being propagated are defined using the representations from the previous layer $(l-1)$:
    $$\begin{aligned} \mathbf{m}_{u \leftarrow i}^{(l)} &= \frac{1}{\sqrt{\lvert \mathcal{N}_u \rvert \lvert \mathcal{N}_i \rvert}} \left( \mathbf{W}_1^{(l)} \mathbf{e}_i^{(l-1)} + \mathbf{W}_2^{(l)} \big( \mathbf{e}_i^{(l-1)} \odot \mathbf{e}_u^{(l-1)} \big) \right) \\ \mathbf{m}_{u \leftarrow u}^{(l)} &= \mathbf{W}_1^{(l)} \mathbf{e}_u^{(l-1)} \end{aligned}$$

    • $\mathbf{e}_u^{(l)}$: The representation of user $u$ after the $l$-th layer.
    • $\mathbf{m}_{u \leftarrow i}^{(l)}$: Message from item $i$ to user $u$ at layer $l$.
    • $\mathbf{m}_{u \leftarrow u}^{(l)}$: Self-connection message for user $u$ at layer $l$.
    • $\mathbf{W}_1^{(l)}, \mathbf{W}_2^{(l)} \in \mathbb{R}^{d_l \times d_{l-1}}$: Trainable transformation matrices for layer $l$. $d_l$ is the transformation size for layer $l$.
    • $\mathbf{e}_i^{(l-1)}, \mathbf{e}_u^{(l-1)}$: Item and user representations obtained from the previous layer $(l-1)$. These representations already incorporate messages from $(l-1)$-hop neighbors.
    • The term $\frac{1}{\sqrt{\lvert \mathcal{N}_u \rvert \lvert \mathcal{N}_i \rvert}}$ remains the same as in first-order propagation, as it is a structural coefficient. This recursive process allows information to flow across high-order connectivities, explicitly injecting collaborative signals into the representation learning. An analogous process occurs for items.
  • Propagation Rule in Matrix Form: For efficient batch implementation, the layer-wise propagation rule (Equations (5) and (6)) can be expressed in matrix form:
    $$\mathbf{E}^{(l)} = \mathrm{LeakyReLU} \left( (\mathcal{L} + \mathbf{I}) \mathbf{E}^{(l-1)} \mathbf{W}_1^{(l)} + \mathcal{L} \big( \mathbf{E}^{(l-1)} \odot \mathbf{E}^{(l-1)} \big) \mathbf{W}_2^{(l)} \right)$$

    • $\mathbf{E}^{(l)} \in \mathbb{R}^{(N+M) \times d_l}$: The matrix containing the representations for all users and items after $l$ steps of embedding propagation.
    • $\mathbf{E}^{(0)}$: The initial embedding matrix $\mathbf{E}$ from the embedding layer, where $\mathbf{e}_u^{(0)} = \mathbf{e}_u$ and $\mathbf{e}_i^{(0)} = \mathbf{e}_i$.
    • $\mathbf{I}$: An identity matrix of appropriate size, representing the self-connections.
    • $\mathcal{L}$: The Laplacian matrix for the user-item graph. It is formulated using the adjacency matrix $\mathbf{A}$ and degree matrix $\mathbf{D}$:
      $$\mathcal{L} = \mathbf{D}^{-\frac{1}{2}} \mathbf{A} \mathbf{D}^{-\frac{1}{2}} \quad \text{and} \quad \mathbf{A} = \begin{bmatrix} \mathbf{0} & \mathbf{R} \\ \mathbf{R}^\top & \mathbf{0} \end{bmatrix}$$
      • $\mathbf{R} \in \mathbb{R}^{N \times M}$: The user-item interaction matrix.
      • $\mathbf{0}$: An all-zero matrix.
      • $\mathbf{A}$: The adjacency matrix of the combined user-item graph.
      • $\mathbf{D}$: The diagonal degree matrix, where $D_{tt} = \lvert \mathcal{N}_t \rvert$ (the degree of node $t$).
      • The non-zero off-diagonal entries of $\mathcal{L}$ are $\mathcal{L}_{ui} = 1/\sqrt{\lvert \mathcal{N}_u \rvert \lvert \mathcal{N}_i \rvert}$, which corresponds to the $\phi_{ui}$ coefficient used in message construction.
    • $\mathbf{W}_1^{(l)}, \mathbf{W}_2^{(l)}$: Trainable weight matrices for layer $l$.
    • $\odot$: Element-wise product between the two $\mathbf{E}^{(l-1)}$ matrices, which applies the interaction term across all user and item embeddings simultaneously. This matrix form allows for efficient computation without the explicit node sampling typically required in large-scale GCNs.
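
The matrix-form rule lends itself to a compact implementation. The following is a minimal sketch, assuming a toy interaction matrix and dense tensors for readability (a real implementation would keep $\mathcal{L}$ sparse); it builds $\mathbf{A}$, the normalized Laplacian, and applies one propagation layer.

```python
import numpy as np
import scipy.sparse as sp
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy user-item interaction matrix R (n_users x n_items); 1 marks an observed interaction.
n_users, n_items, d = 4, 5, 8
R = sp.random(n_users, n_items, density=0.4, format="csr", data_rvs=np.ones)

# A = [[0, R], [R^T, 0]] and L = D^{-1/2} A D^{-1/2}; self-connections enter via the identity I.
A = sp.bmat([[None, R], [R.T, None]], format="csr")
deg = np.asarray(A.sum(axis=1)).flatten()
with np.errstate(divide="ignore"):
    d_inv_sqrt = np.power(deg, -0.5)
d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.0            # guard isolated nodes in this toy graph
L = sp.diags(d_inv_sqrt) @ A @ sp.diags(d_inv_sqrt)

L = torch.tensor(L.toarray(), dtype=torch.float32)  # dense only for this small example
I = torch.eye(n_users + n_items)

E_prev = torch.randn(n_users + n_items, d)           # E^{(l-1)}
W1 = nn.Linear(d, d, bias=False)                     # W_1^{(l)}
W2 = nn.Linear(d, d, bias=False)                     # W_2^{(l)}

# E^{(l)} = LeakyReLU((L + I) E^{(l-1)} W_1 + L (E^{(l-1)} ⊙ E^{(l-1)}) W_2)
E_next = F.leaky_relu(W1((L + I) @ E_prev) + W2(L @ (E_prev * E_prev)))
```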

4.2.3. Model Prediction

After $L$ embedding propagation layers, multiple representations are obtained for each user $u$ (i.e., $\{ \mathbf{e}_u^{(0)}, \mathbf{e}_u^{(1)}, \dots, \mathbf{e}_u^{(L)} \}$) and for each item $i$ (i.e., $\{ \mathbf{e}_i^{(0)}, \mathbf{e}_i^{(1)}, \dots, \mathbf{e}_i^{(L)} \}$).

  • Since representations from different layers capture different orders of connectivity and contribute uniquely to user preference, they are combined.
  • The paper uses concatenation to form the final embedding for a user and an item:
    $$\mathbf{e}_u^* = \mathbf{e}_u^{(0)} \parallel \dots \parallel \mathbf{e}_u^{(L)}, \qquad \mathbf{e}_i^* = \mathbf{e}_i^{(0)} \parallel \dots \parallel \mathbf{e}_i^{(L)}$$
    • $\parallel$: The concatenation operation. This layer-aggregation mechanism enriches the initial embeddings and allows the model to control the range of propagation by adjusting $L$. It's a simple yet effective choice as it involves no additional parameters.
  • Finally, the user's preference towards a target item is estimated using an inner product between their final embeddings:
    $$\hat{y}_{\mathrm{NGCF}}(u, i) = (\mathbf{e}_u^*)^\top \mathbf{e}_i^*$$
    • $\hat{y}_{\mathrm{NGCF}}(u, i)$: The predicted affinity score for user $u$ and item $i$.
    • $(\cdot)^\top$: Transpose operation. The paper emphasizes embedding function learning and uses a simple inner product as the interaction function, leaving more complex options for future work.
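
Putting the prediction step together, a minimal sketch (with randomly initialized layer outputs standing in for the real propagated embeddings) is:

```python
import torch

# Layer-wise outputs for one user and one item (random stand-ins for the propagated embeddings).
d, L = 64, 3
e_u_layers = [torch.randn(d) for _ in range(L + 1)]   # e_u^{(0)}, ..., e_u^{(L)}
e_i_layers = [torch.randn(d) for _ in range(L + 1)]   # e_i^{(0)}, ..., e_i^{(L)}

e_u_star = torch.cat(e_u_layers)     # e_u^* = e_u^{(0)} || ... || e_u^{(L)}  (length (L+1)*d)
e_i_star = torch.cat(e_i_layers)     # e_i^* = e_i^{(0)} || ... || e_i^{(L)}

y_hat = torch.dot(e_u_star, e_i_star)   # inner-product prediction score
```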

4.2.4. Optimization

To train the NGCF model, the pairwise BPR (Bayesian Personalized Ranking) loss [26] is optimized. BPR is widely used in recommender systems for implicit feedback, assuming that observed interactions should have higher prediction values than unobserved ones.

  • The objective function is:
    $$Loss = \sum_{(u, i, j) \in O} - \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj}) + \lambda \lVert \Theta \rVert_2^2$$
    • $O = \{ (u, i, j) \mid (u, i) \in \mathcal{R}^+, (u, j) \in \mathcal{R}^- \}$: The set of pairwise training data.
      • $\mathcal{R}^+$: The set of observed user-item interactions (positive instances).
      • $\mathcal{R}^-$: The set of unobserved user-item interactions (negative instances), typically generated by negative sampling.
      • $(u, i, j)$: A triplet indicating that user $u$ prefers item $i$ over item $j$.
    • $\sigma(\cdot)$: The sigmoid function, which squashes values between 0 and 1.
    • $\hat{y}_{ui}$: The predicted score for user $u$ and item $i$.
    • $\hat{y}_{uj}$: The predicted score for user $u$ and item $j$.
    • $- \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj})$: The log-loss for the BPR objective, which maximizes the difference between positive and negative item scores.
    • $\lambda$: The coefficient controlling the strength of L2 regularization.
    • $\lVert \Theta \rVert_2^2$: The L2 norm of all trainable model parameters $\Theta$.
    • $\Theta = \{ \mathbf{E}, \{ \mathbf{W}_1^{(l)}, \mathbf{W}_2^{(l)} \}_{l=1}^L \}$: The set of all trainable parameters, including the initial embedding matrix and all weight matrices from the $L$ propagation layers.
  • The model is optimized using mini-batch Adam [17]. For each mini-batch of sampled triplets, the user and item representations are established through $L$ propagation steps, and then the model parameters are updated based on the gradients of the loss function.
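
A hedged sketch of the BPR objective with L2 regularization, assuming the per-triplet scores have already been computed by the model, could look like this (the function and variable names are illustrative, not from the released code):

```python
import torch
import torch.nn.functional as F

def bpr_loss(pos_scores, neg_scores, params, reg_lambda=1e-5):
    """Pairwise BPR loss: sum over triplets of -ln sigma(y_ui - y_uj), plus lambda * ||Theta||_2^2."""
    ranking_loss = -F.logsigmoid(pos_scores - neg_scores).sum()
    l2_reg = reg_lambda * sum(p.pow(2).sum() for p in params)
    return ranking_loss + l2_reg

# Illustrative mini-batch of 1024 sampled (u, i, j) triplets and one dummy parameter tensor.
theta = torch.randn(100, 64, requires_grad=True)
pos = torch.randn(1024)   # y_ui for the observed items
neg = torch.randn(1024)   # y_uj for the sampled negatives
loss = bpr_loss(pos, neg, params=[theta])
```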

4.2.4.1. Model Size

Despite generating an embedding matrix at each layer, NGCF introduces very few additional parameters compared to MF.

  • The additional parameters are primarily the two weight matrices $\{ \mathbf{W}_1^{(l)}, \mathbf{W}_2^{(l)} \}$ for each of the $L$ propagation layers.
  • Total additional parameters: $2L \times d_l \times d_{l-1}$.
  • Since $L$ is typically small (e.g., $< 5$) and $d_l$ (the transformation size) is often the embedding size (much smaller than the number of users/items), this additional cost is negligible. For instance, on a dataset like Gowalla with 64-dim embeddings and 3 layers, NGCF adds only 0.024 million parameters compared to MF's 4.5 million.
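
Under the assumption that $d_l = d_{l-1} = 64$ for all three layers, the arithmetic behind the Gowalla figures quoted above works out as:

$$\underbrace{(29{,}858 + 40{,}981) \times 64}_{\text{MF embedding table}} \approx 4.5 \times 10^{6}, \qquad \underbrace{2 \times 3 \times 64 \times 64}_{2L \, d_l \, d_{l-1}} = 24{,}576,$$

which matches MF's roughly 4.5 million parameters and the roughly 0.024 million extra weights reported for NGCF's propagation layers.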

4.2.4.2. Message and Node Dropout

To combat overfitting, two dropout techniques are employed:

  • Message Dropout: Randomly drops out outgoing messages during propagation.
    • For the $l$-th propagation layer, messages $\mathbf{m}_{u \leftarrow i}^{(l)}$ (Equation 6) are dropped with a probability $p_1$. This means only a partial set of messages contributes to the refined representations, making the model more robust to individual connections.
  • Node Dropout: Randomly blocks a particular node and discards all its outgoing messages.
    • For the $l$-th layer, $(M+N) p_2$ nodes (from the total $N+M$ users and items) are randomly dropped from the Laplacian matrix. This reduces the influence of specific users or items, enhancing robustness.
  • Dropout is only used during training and disabled during testing. Message dropout strengthens representation robustness against single connections, while node dropout addresses the influence of specific nodes.
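
The two dropout schemes can be sketched as follows. This is an assumed, simplified implementation (the actual code operates on the entries of a sparse Laplacian); `p1`, `p2`, and the dense toy tensors are illustrative.

```python
import torch

def message_dropout(messages, p1, training=True):
    """Randomly drop whole propagated messages (rows) with probability p1 (training only)."""
    if not training or p1 == 0.0:
        return messages
    keep = (torch.rand(messages.size(0)) >= p1).float().unsqueeze(1)
    return messages * keep / (1.0 - p1)   # inverted-dropout rescaling

def node_dropout(L_dense, p2, training=True):
    """Randomly block nodes by zeroing their rows and columns in a (dense toy) Laplacian."""
    if not training or p2 == 0.0:
        return L_dense
    keep = (torch.rand(L_dense.size(0)) >= p2).float()
    return L_dense * keep.unsqueeze(0) * keep.unsqueeze(1)

# Illustrative usage on toy tensors:
msgs = torch.randn(10, 64)           # ten d-dimensional messages
L_toy = torch.rand(6, 6)             # toy dense Laplacian over 6 nodes
msgs_dropped = message_dropout(msgs, p1=0.1)
L_dropped = node_dropout(L_toy, p2=0.1)
```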

4.2.5. Discussions

4.2.5.1. NGCF Generalizes SVD++

The paper claims that NGCF can generalize SVD++ [19], a well-known collaborative filtering model that augments user representations with implicit feedback from items they interacted with.

  • To recover a simplified version of SVD++ (termed NGCF-SVD), NGCF is configured as follows:
    • Set the number of propagation layers $L = 1$.
    • Disable the transformation matrices ($\mathbf{W}_1, \mathbf{W}_2$) and the non-linear activation function ($\mathrm{LeakyReLU}$).
    • Under these conditions, the representations $\mathbf{e}_u^{(1)}$ and $\mathbf{e}_i^{(1)}$ after the first layer simplify.
  • The simplified model NGCF-SVD can be formulated as:
    $$\hat{y}_{\mathrm{NGCF\text{-}SVD}} = \left( \mathbf{e}_u + \sum_{i' \in \mathcal{N}_u} \mathcal{P}_{ui'} \mathbf{e}_{i'} \right)^\top \left( \mathbf{e}_i + \sum_{u' \in \mathcal{N}_i} \mathcal{P}_{iu'} \mathbf{e}_{u'} \right)$$
    • $\hat{y}_{\mathrm{NGCF\text{-}SVD}}$: Predicted score of the NGCF-SVD model.
    • $\mathbf{e}_u, \mathbf{e}_i$: Initial user and item embeddings.
    • $\mathcal{N}_u$: Set of items user $u$ interacted with.
    • $\mathcal{N}_i$: Set of users who interacted with item $i$.
    • $\mathcal{P}_{ui'}, \mathcal{P}_{iu'}$: Coefficients controlling the contribution of neighboring item $i'$ to user $u$'s representation, and of neighboring user $u'$ to item $i$'s representation, respectively.
  • By setting $\mathcal{P}_{ui'} = 1/\sqrt{\lvert \mathcal{N}_u \rvert}$ (the decay factor from the message construction) and $\mathcal{P}_{iu'} = 0$, NGCF-SVD can exactly recover the SVD++ model.
  • Furthermore, FISM [16] (a widely used item-based CF model) can also be seen as a special case where $\mathcal{P}_{iu'}$ is always 0, effectively considering only the item-side aggregation. This demonstrates NGCF's generality and its ability to encompass existing first-order models within its framework.
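
For illustration, the simplified NGCF-SVD predictor, and its SVD++ specialization, can be written as a short NumPy sketch. The embeddings and neighborhoods below are random toy values, and the function name is hypothetical.

```python
import numpy as np

def ngcf_svd_score(e_u, e_i, item_neighbor_embs, user_neighbor_embs, p_ui, p_iu):
    """(e_u + sum_{i'} p_ui' e_i')^T (e_i + sum_{u'} p_iu' e_u') -- the simplified NGCF-SVD score."""
    u_side = e_u + sum(p * e for p, e in zip(p_ui, item_neighbor_embs))
    i_side = e_i + sum(p * e for p, e in zip(p_iu, user_neighbor_embs))
    return float(u_side @ i_side)

# SVD++-style configuration: p_ui' = 1/sqrt(|N_u|) and p_iu' = 0, on toy random embeddings.
d = 4
e_u, e_i = np.random.rand(d), np.random.rand(d)
item_nbrs = [np.random.rand(d) for _ in range(3)]   # items that user u interacted with
user_nbrs = [np.random.rand(d) for _ in range(2)]   # users who interacted with item i
score = ngcf_svd_score(e_u, e_i, item_nbrs, user_nbrs,
                       p_ui=[1 / np.sqrt(len(item_nbrs))] * len(item_nbrs),
                       p_iu=[0.0] * len(user_nbrs))
```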

4.2.5.2. Time Complexity Analysis

The computational complexity of NGCF is primarily determined by its layer-wise propagation rule (Equation 7).

  • For the ll-th propagation layer, the main operation is matrix multiplication involving the Laplacian matrix and embedding matrices.
  • The computational complexity of this operation is $O(\lvert \mathcal{R}^+ \rvert d_l d_{l-1})$, where $\lvert \mathcal{R}^+ \rvert$ is the number of non-zero entries in the Laplacian matrix (which corresponds to the number of observed interactions), $d_l$ is the current transformation size, and $d_{l-1}$ is the previous transformation size.
  • For the prediction layer, only an inner product is involved.
  • Therefore, the overall complexity for evaluating NGCF during training for one epoch is:
    $$O \left( \sum_{l=1}^L \lvert \mathcal{R}^+ \rvert d_l d_{l-1} + \sum_{l=1}^L \lvert \mathcal{R}^+ \rvert d_l \right)$$
  • Empirically, the paper notes that for the Gowalla dataset:
    • Training: MF costs about 20s per epoch, while NGCF costs about 80s per epoch.
    • Inference: MF costs about 80s for all testing instances, while NGCF costs about 260s. This shows that while NGCF is more computationally intensive than MF due to the $L$ propagation layers, it remains tractable.

5. Experimental Setup

5.1. Datasets

The experiments were conducted on three publicly accessible, real-world benchmark datasets that vary in domain, size, and sparsity. For all datasets, a 10-core setting was applied, meaning only users and items with at least ten interactions were retained to ensure data quality. For each dataset, 80% of historical interactions per user were used for the training set, and the remaining 20% for the test set. From the training set, 10% of interactions were randomly selected as a validation set for hyperparameter tuning. For each observed user-item interaction (positive instance), negative sampling was performed to pair it with one unobserved item (negative instance).

  • Gowalla:
    • Source: Check-in dataset from Gowalla, a location-based social networking service.
    • Characteristics: Captures user check-ins at various locations.
    • Domain: Location-based social interactions.
  • Yelp2018*:
    • Source: Adopted from the 2018 edition of the Yelp challenge.
    • Characteristics: Local businesses (restaurants, bars) are treated as items.
    • Domain: Business reviews and local services.
  • Amazon-Book:
    • Source: Part of the Amazon-review dataset, specifically focusing on book reviews.

    • Characteristics: User ratings/reviews for books.

    • Domain: E-commerce product reviews (books).

      The following are the results from Table 1 of the original paper:

      Dataset #Users #Items #Interactions Density
      Gowalla 29,858 40,981 1,027,370 0.00084
      Yelp2018* 31,668 38,048 1,561,406 0.00130
      Amazon-Book 52,643 91,599 2,984,108 0.00062

These datasets were chosen because they are widely accessible, cover different domains, and represent various scales and sparsity levels, making them effective for validating the generality and performance of recommender systems. No concrete example of a data sample (e.g., a specific check-in record or review text) was provided in the paper.

5.2. Evaluation Metrics

To evaluate the effectiveness of top-$K$ recommendation and preference ranking, two widely used evaluation protocols are adopted: recall@K and ndcg@K. For the experiments, $K$ is set to 20 by default. The average metrics over all users in the test set are reported.

5.2.1. Recall@K

  • Conceptual Definition: Recall@K measures the proportion of relevant items (items a user actually interacted with in the test set) that are successfully retrieved among the top KK recommended items. It focuses on how many of the "good" items were found within the recommendation list, regardless of their rank within that list.
  • Mathematical Formula:
    $$\text{Recall@K} = \frac{1}{|U_{\text{test}}|} \sum_{u \in U_{\text{test}}} \frac{|\text{RelevantItems}_u \cap \text{TopKRecommendations}_u|}{|\text{RelevantItems}_u|}$$
  • Symbol Explanation:
    • $U_{\text{test}}$: The set of users in the test set.
    • $|\cdot|$: Denotes the cardinality (number of elements) of a set.
    • $\text{RelevantItems}_u$: The set of items that user $u$ interacted with in the test set (ground truth).
    • $\text{TopKRecommendations}_u$: The set of the top $K$ items recommended to user $u$ by the model.
    • $\cap$: Set intersection.
    • $\sum_{u \in U_{\text{test}}}$: Summation over all users in the test set.
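
A minimal NumPy sketch of recall@K as defined above (the item IDs are made up purely for illustration):

```python
import numpy as np

def recall_at_k(relevant, recommended, k=20):
    """Fraction of a user's test items that appear in their top-K recommendation list."""
    top_k = set(recommended[:k])
    return len(set(relevant) & top_k) / len(relevant)

# Average over all test users (here: two toy users).
per_user = [
    recall_at_k(relevant=[3, 7, 9], recommended=[7, 1, 3, 5, 8], k=5),   # 2 of 3 recovered
    recall_at_k(relevant=[2],       recommended=[4, 6, 2, 0, 1], k=5),   # 1 of 1 recovered
]
print(np.mean(per_user))  # recall@5 averaged over users
```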

5.2.2. NDCG@K (Normalized Discounted Cumulative Gain at K)

  • Conceptual Definition: NDCG@K is a measure of ranking quality, considering not only the presence of relevant items but also their position in the ranked list. It assigns higher scores to relevant items that appear earlier (at higher ranks) in the recommendation list. It is "normalized" by comparing the calculated DCG to the ideal DCG (where all relevant items are perfectly ranked at the top) to produce a score between 0 and 1.
  • Mathematical Formula:
    $$\text{DCG@K} = \sum_{j=1}^K \frac{2^{\text{rel}_j} - 1}{\log_2(j+1)}, \qquad \text{IDCG@K} = \sum_{j=1}^{\min(|\text{RelevantItems}_u|,\, K)} \frac{2^{\text{rel}_j} - 1}{\log_2(j+1)}, \qquad \text{NDCG@K} = \frac{1}{|U_{\text{test}}|} \sum_{u \in U_{\text{test}}} \frac{\text{DCG@K}_u}{\text{IDCG@K}_u}$$
  • Symbol Explanation:
    • $\text{rel}_j$: The relevance score of the item at position $j$ in the ranked list. In implicit feedback scenarios, this is typically 1 if the item is relevant (observed in the test set) and 0 otherwise.
    • $j$: The rank position of an item in the recommendation list (from 1 to $K$).
    • $\log_2(j+1)$: A logarithmic discount factor, penalizing relevant items at lower ranks.
    • $\text{DCG@K}_u$: Discounted Cumulative Gain for user $u$ at rank $K$.
    • $\text{IDCG@K}_u$: Ideal Discounted Cumulative Gain for user $u$ at rank $K$, calculated by placing all relevant items in the test set at the top of the list in decreasing order of relevance (which is 1 for all items under implicit feedback).
    • $\min(|\text{RelevantItems}_u|, K)$: Ensures that IDCG@K sums over at most $K$ items, or over all relevant items if there are fewer than $K$.
    • $U_{\text{test}}$: The set of users in the test set.
    • $|\cdot|$: Denotes the cardinality of a set.
    • $\sum_{u \in U_{\text{test}}}$: Summation over all users in the test set.
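
Likewise, a minimal sketch of binary-relevance NDCG@K matching the formula above (the example item IDs are illustrative):

```python
import numpy as np

def ndcg_at_k(relevant, recommended, k=20):
    """Binary-relevance NDCG@K: DCG of the top-K list divided by the ideal DCG."""
    rel_set = set(relevant)
    gains = [1.0 if item in rel_set else 0.0 for item in recommended[:k]]
    dcg = sum(g / np.log2(j + 2) for j, g in enumerate(gains))   # positions j = 0..K-1
    ideal_hits = min(len(rel_set), k)
    idcg = sum(1.0 / np.log2(j + 2) for j in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k(relevant=[3, 7, 9], recommended=[7, 1, 3, 5, 8], k=5))
```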

5.3. Baselines

NGCF was compared against several representative state-of-the-art collaborative filtering methods:

  • MF (Matrix Factorization) [26]:
    • Description: A basic MF model optimized with Bayesian Personalized Ranking (BPR) loss. It models user-item direct interactions as the target value for its interaction function (inner product).
    • Representativeness: A fundamental and widely used baseline in CF.
  • NeuMF (Neural Matrix Factorization) [14]:
    • Description: A state-of-the-art neural CF model that uses multiple hidden layers above element-wise product and concatenation of user/item embeddings to capture non-linear feature interactions. A two-layered architecture is used with consistent hidden dimension size.
    • Representativeness: A strong baseline representing deep learning advancements in CF's interaction function.
  • CMN (Collaborative Memory Network) [5]:
    • Description: A state-of-the-art memory-based model. It constructs user representations by attentively combining memory slots of neighboring users via memory layers. It uses first-order connections to find similar users.
    • Representativeness: A strong memory network based approach that explicitly considers neighbors.
  • HOP-Rec (High-Order Proximity for Implicit Recommendation) [40]:
    • Description: A state-of-the-art graph-based model. It utilizes random walks to derive high-order neighbors, which are then used to enrich the user-item interaction data for training an MF model with BPR.
    • Representativeness: A direct competitor that also attempts to leverage high-order connectivity, but in a different manner.
  • PinSage [42]:
    • Description: An industrial GraphSAGE [8] implementation for item-item graphs (Pinterest image recommendation). The paper adapts it to the user-item interaction graph, using two graph convolution layers with hidden dimension equal to embedding size.
    • Representativeness: A strong GNN-based model, particularly for large-scale recommendation.
  • GC-MC (Graph Convolutional Matrix Completion) [29]:
    • Description: Applies GCN [18] as an encoder for user and item representations, but it primarily considers first-order neighbors with one graph convolution layer.
    • Representativeness: An early and influential GCN-based CF model.
  • SpectralCF [43]:
    • Description: This model was considered but not selected for comparison due to its high computational and resource cost associated with eigen-decomposition of the graph adjacency matrix, making it impractical for large-scale datasets.
    • Representativeness: A spectral GCN approach, excluded due to practical limitations.

Parameter Settings:

  • Embedding size: Fixed to 64 for all models.
  • HOP-Rec: Random walk steps searched in $\{1, 2, 3\}$; learning rate in $\{0.025, 0.020, 0.015, 0.010\}$.
  • Optimizer: Adam for all models except HOP-Rec.
  • Batch size: Fixed at 1024.
  • Hyperparameters (grid search):
    • Learning rate: $\{0.0001, 0.0005, 0.001, 0.005\}$.
    • L2 regularization coefficient ($\lambda$): $\{10^{-5}, 10^{-4}, \dots, 10^{1}, 10^{2}\}$.
    • Dropout ratio (for message dropout $p_1$ and node dropout $p_2$): $\{0.0, 0.1, \dots, 0.8\}$.
  • Initializer: Xavier initializer [6] for model parameters.
  • Early stopping: Applied if recall@20 on the validation data does not improve for 50 successive epochs.
  • NGCF depth ($L$): Set to 3 for modeling third-order connectivity (unless specified otherwise for ablation studies).
  • Default dropout for NGCF: node dropout ratio 0.0, message dropout ratio 0.1.

6. Results & Analysis

6.1. Core Results Analysis

6.1.1. Overall Performance Comparison

The paper conducted a comprehensive comparison of NGCF against several baselines on three datasets. The results, summarized in Table 2, demonstrate NGCF's superior performance across all benchmarks.

The following are the results from Table 2 of the original paper:

Gowalla Yelp2018* Amazon-Book
recall ndcg recall ndcg recall ndcg
MF 0.1291 0.1109 0.0433 0.0354 0.0250 0.0196
NeuMF 0.1399 0.1212 0.0451 0.0363 0.0258 0.0200
CMN 0.1405 0.1221 0.0457 0.0369 0.0267 0.0218
HOP-Rec 0.1399 0.1214 0.0517 0.0428 0.0309 0.0232
GC-MC 0.1395 0.1204 0.0462 0.0379 0.0288 0.0224
PinSage 0.1380 0.1196 0.0471 0.0393 0.0282 0.0219
NGCF-3 0.1569* 0.1327* 0.0579* 0.0477* 0.0337* 0.0261*
%Improv. 11.68% 8.64% 11.97% 11.29% 9.61% 12.50%
p-value 2.01e-7 3.03e-3 5.34e-3 4.62e-4 3.48e-5 1.26e-4

Key Observations:

  • MF's Weakness: MF performs poorly across all datasets, indicating its linear interaction function and lack of explicit collaborative signal encoding result in insufficient embeddings.
  • NeuMF's Improvement: NeuMF consistently outperforms MF, highlighting the benefit of non-linear feature interactions in the interaction function. However, it still doesn't explicitly encode collaborative signals in embeddings.
  • Impact of First-order Neighbors: GC-MC shows improvements over MF and NeuMF in some cases (e.g., Amazon-Book), confirming that incorporating first-order neighbors helps. However, its performance can be mixed (e.g., ndcg@20 on Yelp2018* is lower than NeuMF), possibly due to its failure to fully capture non-linear interactions.
  • Memory Networks and High-order Connectivity: CMN generally performs better than GC-MC, possibly due to its neural attention mechanism for weighting neighbors. HOP-Rec achieves significant improvements, demonstrating the value of high-order connectivity for enriching training data. PinSage also shows good performance, especially on Yelp2018*, as it leverages high-order connectivity in the embedding function.
  • NGCF's Dominance: NGCF-3 (NGCF with 3 propagation layers) consistently achieves the best performance on all datasets, significantly outperforming all baselines.
    • It improves recall@20 over the strongest baselines by 11.68% (Gowalla), 11.97% (Yelp2018*), and 9.61% (Amazon-Book).
    • The p-values (all $< 0.05$) indicate these improvements are statistically significant.
  • Justification for NGCF: The superior performance is attributed to NGCF's ability to explicitly model high-order connectivity through embedding propagation layers, effectively injecting collaborative signals into the embedding learning process. Unlike CMN and GC-MC which are limited to first-order neighbors, NGCF explores deeper connections. Compared to PinSage, NGCF's layer-aggregation mechanism combines multi-grained representations from different layers, providing a richer final embedding. Its advantage over HOP-Rec further validates that explicitly encoding CF in the embedding function is more effective than merely enriching training data.

6.1.2. Performance Comparison w.r.t. Interaction Sparsity Levels

The paper investigates how NGCF performs across different sparsity levels by dividing users into groups based on their number of interactions. The results for ndcg@20 are shown in Figure 4.

The following figure (Figure 4 from the original paper) shows ndcg@20 on different user groups in Gowalla, Yelp2018*, and Amazon-Book datasets.


Key Observations:

  • Overall Superiority in Sparsity: Both NGCF and HOP-Rec consistently outperform other baselines across all user groups and sparsity levels. This indicates that leveraging connectivity information (especially high-order) is highly beneficial for representation learning, particularly for users with few interactions. This suggests NGCF can effectively alleviate the sparsity issue in recommender systems.
  • Greater Impact on Inactive Users: NGCF shows more significant improvements in the groups with fewer interactions (e.g., the <24 and <50 interaction groups in Gowalla, where the improvements over the best baselines are 8.49% and 7.79%, respectively). For highly active users (e.g., the <1014 group in Gowalla), the improvements are still present but less dramatic (e.g., 1.29%). This verifies that embedding propagation is particularly advantageous for relatively inactive users, as it can synthesize collaborative signals from their sparse neighborhoods. (A sketch of this group-wise evaluation follows this list.)
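
A minimal sketch of this group-wise evaluation, assuming hypothetical per-user interaction counts and per-user ndcg@20 scores; only the Gowalla boundaries quoted above (<24, <50, <1014) are used here, whereas the paper forms the groups so that each contains a comparable total number of interactions.

```python
import numpy as np

# Hypothetical per-user statistics: training-interaction counts and test ndcg@20.
user_counts = np.array([5, 12, 30, 45, 80, 200, 950])
user_ndcg = np.array([0.05, 0.08, 0.11, 0.10, 0.13, 0.15, 0.18])

# Group boundaries in the style quoted above for Gowalla.
bins = [24, 50, 1014]
group_ids = np.searchsorted(bins, user_counts, side='right')

for g, upper in enumerate(bins):
    mask = group_ids == g
    if mask.any():
        print(f"users with <{upper} interactions: mean ndcg@20 = {user_ndcg[mask].mean():.4f}")
```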

6.2. Ablation Studies / Parameter Analysis

6.2.1. Effect of Layer Numbers

The paper investigates the impact of the number of embedding propagation layers (L) on NGCF's performance, varying L from 1 to 4.

The following are the results from Table 3 of the original paper:

| Method | Gowalla recall@20 | Gowalla ndcg@20 | Yelp2018* recall@20 | Yelp2018* ndcg@20 | Amazon-Book recall@20 | Amazon-Book ndcg@20 |
| --- | --- | --- | --- | --- | --- | --- |
| NGCF-1 | 0.1556 | 0.1315 | 0.0543 | 0.0442 | 0.0313 | 0.0241 |
| NGCF-2 | 0.1547 | 0.1307 | 0.0566 | 0.0465 | 0.0330 | 0.0254 |
| NGCF-3 | 0.1569 | 0.1327 | 0.0579 | 0.0477 | 0.0337 | 0.0261 |
| NGCF-4 | 0.1570 | 0.1327 | 0.0566 | 0.0461 | 0.0344 | 0.0263 |

Key Observations:

  • Benefits of Depth: Increasing the depth from NGCF-1 to NGCF-2 and NGCF-3 generally enhances recommendation performance across all datasets. NGCF-1 only captures first-order neighbors, while NGCF-2 and NGCF-3 capture second- and third-order connectivity, which carry more complex collaborative signals (user similarity, potential recommendations).
  • Overfitting with Excessive Depth: NGCF-4 shows slight improvements on Amazon-Book but leads to overfitting on Yelp2018* (performance drops compared to NGCF-3). This suggests that excessively deep architectures might introduce noise into the representation learning process. Three layers (NGCF-3) appear to be a good balance for capturing sufficient CF signals without introducing too much noise or overfitting.
  • Consistent Superiority: Even with varying layer numbers, NGCF consistently outperforms other methods (as seen by comparing Table 3 to Table 2), reaffirming the effectiveness of explicitly modeling high-order connectivity.
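
To make the effect of the layer number L concrete, here is an illustrative PyTorch-style sketch (not the authors' released implementation) of how stacked propagation layers and the concatenation-based layer aggregation fit together. The per-layer update inside `forward` is a simplified stand-in for the full propagation rule; all names and sizes are hypothetical.

```python
import torch
import torch.nn as nn

class StackedPropagation(nn.Module):
    """Illustrative only: L simplified propagation layers with concatenation-based aggregation."""

    def __init__(self, num_nodes, dim=64, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(num_nodes, dim)      # E^(0): initial user/item ID embeddings
        self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])

    def forward(self, norm_adj):
        # norm_adj: normalized (num_nodes x num_nodes) user-item adjacency; dense here for brevity.
        e = self.embed.weight
        views = [e]
        for layer in self.layers:
            # Simplified stand-in for the propagation rule: aggregate neighbours, transform, activate.
            e = torch.relu(layer(norm_adj @ e))
            views.append(e)
        # Layer aggregation: concatenate E^(0), ..., E^(L) into the final representation.
        return torch.cat(views, dim=1)

# Hypothetical usage: 10 users + 8 items = 18 nodes, 3 propagation layers.
model = StackedPropagation(num_nodes=18, dim=8, num_layers=3)
final = model(torch.rand(18, 18))
print(final.shape)      # torch.Size([18, 32]) -> dim * (num_layers + 1)
```

So NGCF-1 through NGCF-4 differ only in how many propagated views are concatenated into the final embedding.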

6.2.2. Effect of Embedding Propagation Layer and Layer-Aggregation Mechanism

To understand the specific contributions of the proposed embedding propagation layer design, variants of NGCF-1 (single layer) were tested.

The following are the results from Table 4 of the original paper:

| Method | Gowalla recall@20 | Gowalla ndcg@20 | Yelp2018* recall@20 | Yelp2018* ndcg@20 | Amazon-Book recall@20 | Amazon-Book ndcg@20 |
| --- | --- | --- | --- | --- | --- | --- |
| NGCF-1 | 0.1556 | 0.1315 | 0.0543 | 0.0442 | 0.0313 | 0.0241 |
| NGCF-1_SVD++ | 0.1517 | 0.1265 | 0.0504 | 0.0414 | 0.0297 | 0.0232 |
| NGCF-1_GC-MC | 0.1523 | 0.1307 | 0.0518 | 0.0420 | 0.0305 | 0.0234 |
| NGCF-1_PinSage | 0.1534 | 0.1308 | 0.0516 | 0.0420 | 0.0293 | 0.0231 |

Key Observations:

  • Importance of Representation Interactions: NGCF-1 consistently outperforms all its variants (NGCF-1_SVD++, NGCF-1_GC-MC, NGCF-1_PinSage). This highlights the crucial role of the representation interaction term ($\mathbf{e}_i \odot \mathbf{e}_u$) in the message construction function: it makes each message dependent on the affinity between the source and target embeddings, acting like an attention mechanism that distills more relevant information. The variants, by contrast, rely only on linear transformations of the neighbor embeddings (a sketch contrasting the two message forms follows this list).
  • Significance of Self-Connections and Non-linearity: NGCF-1_SVD++ generally performs worse than NGCF-1_GC-MC and NGCF-1_PinSage. This suggests the importance of retaining original features (via self-connections) and of the non-linear transformation in the propagation layer, both of which NGCF-1_SVD++ omits in order to reduce to an SVD++-like form.
  • Effectiveness of Layer Aggregation: Comparing the single-layer variants in Table 4 with their multi-layer counterparts in Table 2 (e.g., NGCF-1_GC-MC vs. GC-MC, NGCF-1_PinSage vs. PinSage), the variants often perform better. The authors attribute this to the layer-aggregation mechanism: concatenating the representations produced at different layers (including the initial embeddings) yields richer final embeddings than using the last layer alone, a finding consistent with other GNN research.
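
The contrast drawn in the first bullet can be made concrete with a small illustrative sketch (PyTorch, not the released code): NGCF's message adds the affinity-modulated term $W_2(\mathbf{e}_i \odot \mathbf{e}_u)$ on top of the purely linear term $W_1 \mathbf{e}_i$ that the simplified variants roughly reduce to.

```python
import torch
import torch.nn as nn

dim = 64
W1 = nn.Linear(dim, dim, bias=False)   # transforms the neighbour embedding
W2 = nn.Linear(dim, dim, bias=False)   # transforms the element-wise interaction term

def ngcf_message(e_u, e_i, n_u, n_i):
    """Illustrative message from item i to user u: decay * (W1 e_i + W2 (e_i ⊙ e_u))."""
    decay = 1.0 / (n_u * n_i) ** 0.5          # graph-Laplacian-style normalisation
    return decay * (W1(e_i) + W2(e_i * e_u))  # e_i * e_u is the element-wise product

def linear_only_message(e_u, e_i, n_u, n_i):
    """What the simplified variants roughly use: no affinity term."""
    decay = 1.0 / (n_u * n_i) ** 0.5
    return decay * W1(e_i)

e_u, e_i = torch.randn(dim), torch.randn(dim)
print(ngcf_message(e_u, e_i, n_u=5, n_i=12).shape)   # torch.Size([64])
```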

6.2.3. Effect of Dropout

The paper investigates the effect of message dropout ($p_1$) and node dropout ($p_2$) on NGCF's performance. The results are plotted in Figure 5.

The following figure (Figure 5 from the original paper) shows the effect of node dropout and message dropout ratios on recall@20 across different datasets.

Figure 5: Effect of node dropout and message dropout ratios on recall@20.

Key Observations:

  • Node Dropout's Superiority: Node dropout generally offers better performance than message dropout. For example, on Gowalla, a node dropout ratio of 0.2 yields a recall@20 of 0.1514, which is higher than the best message dropout result of 0.1506.
  • Robustness: This suggests that dropping out entire nodes (and all their outgoing messages) makes the representations more robust not only against specific connections but also against the influence of particular users or items. Node dropout appears to be a more effective strategy for preventing overfitting in graph neural networks by promoting more generalized node representations.
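
A minimal sketch of the difference between the two schemes, assuming the interaction graph is stored as a list of (user, item) edges: message dropout removes individual propagated messages independently, while node dropout samples whole users/items and blocks every message attached to them. The edge list, sizes, and function names are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny interaction graph: 5 users, 5 items, edges as (user, item) pairs.
num_users, num_items = 5, 5
edges = np.array([(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)])

def message_dropout(edges, p):
    """Message dropout: keep each edge (i.e., each propagated message) with probability 1 - p."""
    keep = rng.random(len(edges)) >= p
    return edges[keep]

def node_dropout(edges, p):
    """Node dropout: drop whole users/items and block all messages touching them."""
    dropped_users = set(rng.choice(num_users, size=int(p * num_users), replace=False))
    dropped_items = set(rng.choice(num_items, size=int(p * num_items), replace=False))
    keep = [k for k, (u, i) in enumerate(edges)
            if u not in dropped_users and i not in dropped_items]
    return edges[keep]

print(len(message_dropout(edges, p=0.2)), len(node_dropout(edges, p=0.2)))
```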

6.2.4. Test Performance w.r.t. Epoch

The convergence behavior of NGCF compared to MF over training epochs is shown in Figure 6.

The following figure (Figure 6 from the original paper) shows the test performance (recall@20) of each epoch for MF and NGCF.

Figure 6: Test performance (recall@20) of each epoch of MF and NGCF.

Key Observations:

  • Faster Convergence for NGCF: NGCF exhibits faster convergence than MF across all three datasets. This is likely because NGCF explicitly incorporates indirectly connected users and items through embedding propagation when optimizing interaction pairs in each mini-batch (the pairwise objective is sketched after this list).
  • Better Model Capacity: This observation highlights NGCF's superior model capacity and the effectiveness of performing embedding propagation directly in the embedding space. The richer information flow allows NGCF to learn effective representations more quickly.
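
For context on what "optimizing interaction pairs" means here, below is a hedged sketch of the pairwise BPR objective the paper optimizes over mini-batches of (user, observed item, sampled unobserved item) triples; tensor names, the regularization weight, and the batch size are illustrative.

```python
import torch
import torch.nn.functional as F

def bpr_loss(user_e, pos_e, neg_e, reg=1e-5):
    """Pairwise BPR loss over a mini-batch of (user, positive item, negative item) triples."""
    pos_scores = (user_e * pos_e).sum(dim=1)     # inner-product predictions for observed items
    neg_scores = (user_e * neg_e).sum(dim=1)     # predictions for sampled unobserved items
    ranking = -F.logsigmoid(pos_scores - neg_scores).mean()
    l2 = reg * (user_e.pow(2).sum() + pos_e.pow(2).sum() + neg_e.pow(2).sum()) / user_e.size(0)
    return ranking + l2

# Hypothetical mini-batch of 32 triples with 64-d final embeddings.
u, i, j = (torch.randn(32, 64, requires_grad=True) for _ in range(3))
print(bpr_loss(u, i, j))
```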

6.3. Effect of High-order Connectivity

To qualitatively understand how embedding propagation facilitates representation learning, the paper visualizes t-SNE transformed representations of randomly selected users and their relevant items from the Gowalla dataset. Figure 7 compares MF (treated as NGCF-0, i.e., no propagation layers) with NGCF-3.

The following figure (Figure 7 from the original paper) visualizes the learned t-SNE transformed representations derived from MF and NGCF-3.

Figure 7: Visualization of the learned t-SNE transformed representations derived from MF and NGCF-3. Each star represents a user from Gowalla dataset, while the points with the same color denote the relevant items. Best view in color.

Key Observations:

  • Clustering in Embedding Space: With NGCF-3, the representations of users and their relevant items (even those not seen during training for that specific user) are well-reflected in the embedding space. Points with the same colors (representing items consumed by the same users) tend to form distinct clusters. This means that users and their preferred items (and items preferred by similar users) are embedded into close regions of the vector space.
  • Enhanced Closeness with Propagation: Comparing MF (Figure 7a) and NGCF-3 (Figure 7b), it's evident that NGCF-3 causes the embeddings of a user's historical items to be much closer to each other and to the user's own embedding. For example, for users like 12201 and 6880, their respective item clusters are more compact and distinct in NGCF-3.
  • Qualitative Verification: This visualization qualitatively verifies that the proposed embedding propagation layer effectively injects explicit collaborative signals into the representations. The high-order connectivity learned through multiple layers brings related entities closer in the embedding space, which is crucial for accurate recommendations.
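
For readers who want to reproduce this kind of figure from their own trained model, below is a hedged sketch using scikit-learn's TSNE and matplotlib; the embeddings and user-item assignments are random placeholders standing in for learned final representations, not data from the paper.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(42)

# Placeholders standing in for learned final embeddings of 6 sampled users and their items.
user_emb = rng.normal(size=(6, 64))
item_emb = rng.normal(size=(120, 64))
item_owner = rng.integers(0, 6, size=120)     # which sampled user each item belongs to

coords = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(
    np.vstack([user_emb, item_emb]))
user_xy, item_xy = coords[:6], coords[6:]

plt.scatter(item_xy[:, 0], item_xy[:, 1], c=item_owner, cmap='tab10', s=10)
plt.scatter(user_xy[:, 0], user_xy[:, 1], c=np.arange(6), cmap='tab10', marker='*', s=200)
plt.title('t-SNE of user (stars) and item (dots) embeddings')
plt.show()
```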

7. Conclusion & Reflections

7.1. Conclusion Summary

This work makes a significant contribution to model-based collaborative filtering (CF) by explicitly incorporating the collaborative signal into the embedding function itself. The authors introduce Neural Graph Collaborative Filtering (NGCF), a novel framework built upon graph neural networks (GNNs). NGCF leverages the user-item interaction graph to model high-order connectivities through an embedding propagation mechanism. This process enriches user and item embeddings by allowing them to iteratively absorb collaborative signals from their multi-hop neighbors. Extensive experiments on three real-world datasets consistently demonstrate that NGCF achieves state-of-the-art performance, significantly outperforming existing CF methods. The analysis further validates that embedding propagation is crucial for learning higher-quality user and item representations, with the largest gains observed for relatively inactive users, thereby helping to alleviate the data sparsity issue.

7.2. Limitations & Future Work

The authors acknowledge several areas for future improvement and research:

  • Attention Mechanism: Incorporating an attention mechanism [2] to learn variable weights for neighbors during embedding propagation. This would allow the model to dynamically prioritize more important neighbors rather than using fixed (Laplacian-based) weights, potentially improving model generalization and interpretability.
  • Adversarial Learning: Exploring adversarial learning [13] on user/item embeddings and the graph structure to enhance NGCF's robustness. This could make the model less susceptible to noisy or malicious interactions.
  • Exploiting Other Structural Knowledge: Expanding NGCF to leverage other forms of structural information beyond simple user-item interactions. This includes:
    • Cross features [41] for context-aware and semantics-rich recommendation [22, 27].
    • Item knowledge graphs [32] to establish knowledge-aware connectivities between users and items, which could unveil user decision-making processes.
    • Social networks [34] to incorporate social influence into recommendations. More broadly, the authors hope that NGCF's development will facilitate reasoning about users' online behavior, leading to more effective and interpretable recommender systems.

7.3. Personal Insights & Critique

7.3.1. Personal Insights

  • Paradigm Shift in Embedding Learning: NGCF provides a compelling argument for a paradigm shift in embedding learning for CF. Instead of treating embeddings as static attributes and interactions as mere training signals, NGCF demonstrates the power of making embeddings inherently dynamic and context-aware by integrating graph structure. This is a crucial insight: the "signal" should be in the representation itself, not just an artifact of the loss function.
  • Power of High-Order Connectivity: The paper effectively illustrates the semantic richness of high-order connectivity. The example of u1-i2-u2-i4 clearly shows how multi-hop paths can capture nuanced collaborative signals that simpler models miss. This highlights the importance of going beyond direct neighbors.
  • GNNs as a Natural Fit for CF: NGCF reinforces the idea that Graph Neural Networks are a natural and powerful tool for recommender systems. The user-item interaction data is inherently graph-structured, and GNNs are designed to process such data by message passing, making them highly suitable for capturing collaborative filtering effects.
  • Addressing Sparsity: The empirical results demonstrating NGCF's significant benefits for inactive users (those with fewer interactions) are particularly insightful. This suggests embedding propagation can effectively alleviate the sparsity issue by allowing users to indirectly gain information from a wider network of similar users and items.

7.3.2. Critique

  • Complexity of Message Construction: While the element-wise product term ($\mathbf{e}_i \odot \mathbf{e}_u$) in message construction is shown to be effective, its exact theoretical justification beyond "affinity" could be explored more deeply. Is it the optimal interaction? More complex neural interaction functions within the message could be considered, though the paper mentions this as future work for the prediction layer.

  • Fixed Decay Factor: The use of the graph Laplacian norm ($1/\sqrt{|\mathcal{N}_u||\mathcal{N}_i|}$) as a fixed decay factor $\phi_{ui}$ is a standard choice but might not be optimal. Learning this decay factor or making it context-dependent (e.g., using attention) could lead to further improvements, as mentioned in the future work.

  • Scalability for Extremely Large Graphs: While the matrix-form propagation rule avoids node sampling and is more efficient than some GCNs, the adjacency matrix $\mathbf{A}$ and Laplacian matrix $\mathcal{L}$ can still be very large for industrial-scale recommendation systems with billions of users and items. The current formulation still processes the full graph (or at least its edges) in each layer. Further optimizations for handling truly massive, dynamic graphs might be necessary, such as mini-batching over edges or subgraphs.

  • Interpretability of High-Order Paths: While NGCF explicitly captures high-order connectivity, the final concatenated embedding might lose some interpretability regarding which specific paths contribute most to a recommendation. Future work on attention could help in this regard.

  • Hyperparameter Sensitivity: As with many deep learning models, NGCF has several hyperparameters (number of layers, dropout ratios, learning rate, L2 regularization). Optimal performance depends on careful tuning, and the overfitting observed with NGCF-4 highlights this sensitivity.

    Overall, NGCF is a rigorously designed and empirically validated model that significantly advances the field of collaborative filtering by elegantly integrating graph neural networks to enhance embedding learning. It opens many promising avenues for future research in graph-based recommendation.
