
Learning Intents behind Interactions with Knowledge Graph for Recommendation

Published: 02/14/2021
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

KGIN models fine-grained user intents via attentive relation combinations and recursive relation path aggregation, improving long-range dependency modeling and outperforming existing GNN-based recommenders on benchmarks.

Abstract

Knowledge graph (KG) plays an increasingly important role in recommender systems. A recent technical trend is to develop end-to-end models founded on graph neural networks (GNNs). However, existing GNN-based models are coarse-grained in relational modeling, failing to (1) identify user-item relation at a fine-grained level of intents, and (2) exploit relation dependencies to preserve the semantics of long-range connectivity. In this study, we explore intents behind a user-item interaction by using auxiliary item knowledge, and propose a new model, Knowledge Graph-based Intent Network (KGIN). Technically, we model each intent as an attentive combination of KG relations, encouraging the independence of different intents for better model capability and interpretability. Furthermore, we devise a new information aggregation scheme for GNN, which recursively integrates the relation sequences of long-range connectivity (i.e., relational paths). This scheme allows us to distill useful information about user intents and encode them into the representations of users and items. Experimental results on three benchmark datasets show that, KGIN achieves significant improvements over the state-of-the-art methods like KGAT, KGNN-LS, and CKAN. Further analyses show that KGIN offers interpretable explanations for predictions by identifying influential intents and relational paths. The implementations are available at https://github.com/huangtinglin/Knowledge_Graph_based_Intent_Network.


In-depth Reading



1. Bibliographic Information

1.1. Title

Learning Intents behind Interactions with Knowledge Graph for Recommendation

1.2. Authors

The paper is co-authored by:

  • Xiang Wang (National University of Singapore)

  • Tinglin Huang (Zhejiang University)

  • Dingxian Wang (eBay)

  • Yancheng Yuan (The Hong Kong Polytechnic University)

  • Zhenguang Liu (Zhejiang University)

  • Xiangnan He (University of Science and Technology of China)

  • Tat-Seng Chua (National University of Singapore)

    The authors represent a mix of academic institutions and industry research, indicating a blend of theoretical rigor and practical relevance in their research backgrounds. Xiangnan He is a prominent researcher in recommender systems.

1.3. Journal/Conference

The paper was published at WWW '21, which is the Web Conference 2021 (formerly known as World Wide Web Conference). WWW is a highly prestigious and influential conference in the fields of computer science, particularly in areas related to the World Wide Web, including web search, data mining, information retrieval, and recommender systems. Publication at WWW signifies high quality and significant impact within the research community.

1.4. Publication Year

2021

1.5. Abstract

The abstract introduces the growing role of Knowledge Graphs (KGs) in recommender systems, particularly with the trend of using Graph Neural Networks (GNNs). It highlights two key limitations of existing GNN-based models: (1) their coarse-grained relational modeling, failing to identify user-item relations at a fine-grained level of intents, and (2) their inability to exploit relation dependencies to preserve the semantics of long-range connectivity.

To address these issues, the authors propose Knowledge Graph-based Intent Network (KGIN). KGIN models each intent as an attentive combination of KG relations, promoting independence among intents for better model capability and interpretability. Furthermore, it introduces a novel information aggregation scheme for GNNs that recursively integrates relational paths (sequences of relations in long-range connectivity). This mechanism helps distill useful information about user intents into user and item representations. Experimental results on three benchmark datasets demonstrate that KGIN significantly outperforms state-of-the-art methods like KGAT, KGNN-LS, and CKAN. The paper also emphasizes KGIN's ability to provide interpretable explanations for predictions by identifying influential intents and relational paths.

https://arxiv.org/abs/2102.07057v1 The paper is available as a preprint on arXiv (version 1, published on 2021-02-14) and was subsequently published at WWW '21.

https://arxiv.org/pdf/2102.07057v1.pdf

2. Executive Summary

2.1. Background & Motivation

The core problem the paper aims to solve is enhancing the accuracy and interpretability of recommender systems by better leveraging Knowledge Graphs (KGs). Recommender systems help users discover items of interest (e.g., movies, products, music). KGs, which store real-world facts as interconnected entities and relations, have proven valuable for providing rich contextual information and improving recommendations.

Existing Graph Neural Network (GNN)-based recommender models, while effective, suffer from two significant limitations:

  1. Coarse-grained Relational Modeling / Lack of User Intents: Current models treat user-item interactions as a single, undifferentiated relationship (e.g., "interact-with"). However, user behavior is often driven by multiple, distinct underlying reasons or intents. For example, a user might watch a movie because of its director and star, but choose another due to its genre and producer. Ignoring these fine-grained intents limits the model's ability to capture the full complexity of user preferences and provide meaningful explanations.

  2. Insufficient Exploitation of Relation Dependencies / Relational Paths: GNNs typically aggregate information from neighboring nodes. While some GNNs incorporate KG relations as decay factors or attention weights, they primarily focus on node features and often fail to explicitly model the dependencies and sequential semantics embedded in relational paths (sequences of relations connecting distant entities). This means that the rich structural information present in multi-hop connections within a KG is not fully utilized, leading to a loss of holistic semantic understanding of long-range connectivity.

    The problem is important because recommender systems are ubiquitous and crucial for platforms like e-commerce, social media, and entertainment. Improving their accuracy directly enhances user experience and business metrics. Furthermore, explainability is increasingly vital, as users want to understand why an item is recommended. Addressing the limitations of existing GNNs in modeling intents and relational paths can lead to more accurate, interpretable, and powerful recommender systems. The paper's entry point is to explicitly model these fine-grained user intents and preserve the semantics of relational paths within a GNN framework.

2.2. Main Contributions / Findings

The primary contributions of this paper are:

  1. Introduction of User Intent Modeling: The paper proposes to explicitly model user-item relations at a fine-grained level of intents, departing from the coarse-grained interact-with relation used in prior GNN-based models. Each intent is represented as an attentive combination of KG relations, making its semantics interpretable. An independence constraint is introduced to encourage distinct and meaningful intents, enhancing both model capability and interpretability.

  2. Novel Relational Path-aware Aggregation Scheme: A new information aggregation mechanism for GNNs is devised. Unlike node-based aggregators, this scheme treats relational paths as distinct information channels and recursively integrates relation sequences of long-range connectivity. This allows the model to capture relation dependencies and encode the holistic semantics of paths into user and item representations.

  3. Proposed Model KGIN: The paper introduces Knowledge Graph-based Intent Network (KGIN), an end-to-end model that combines the user intent modeling and relational path-aware aggregation components. KGIN refines collaborative information from an intent graph (IG) and knowledge-aware information from the knowledge graph (KG).

  4. Empirical Validation and Interpretability: Extensive experiments on three benchmark datasets (Amazon-Book, Last-FM, Alibaba-iFashion) demonstrate that KGIN achieves significant improvements over state-of-the-art methods (KGAT, KGNN-LS, CKAN). Furthermore, KGIN provides interpretable explanations for predictions by identifying the most influential intents and relational paths driving a recommendation.

    These findings solve the problems of coarse-grained relational modeling and the neglect of relation dependencies in long-range connectivity, leading to more accurate, nuanced, and explainable recommendations.

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To understand this paper, a reader should be familiar with the following foundational concepts:

  • Recommender Systems: Systems that predict user preferences for items and suggest items that users might like. They are fundamental to many online platforms. The paper focuses on knowledge-aware recommendation, which integrates external knowledge into the recommendation process.
  • Implicit Feedback: A type of user preference signal where users do not explicitly state their likes or dislikes (e.g., through ratings). Instead, preferences are inferred from actions like views, clicks, purchases. This is common in real-world recommender systems.
  • Knowledge Graph (KG): A knowledge graph (KG) is a structured representation of information that describes real-world entities and their interrelations in a graph format. It consists of entities (nodes) and relations (edges), often represented as triplets in the form $(h, r, t)$, where $h$ is the head entity, $r$ is the relation, and $t$ is the tail entity. KGs enrich item profiles with attributes, categories, and external facts, which can be leveraged to improve recommendations. For example, (movie, directed_by, director) or (book, authored_by, author).
  • Graph Neural Networks (GNNs): A class of deep learning methods designed to operate on graph-structured data. GNNs learn representations (embeddings) for nodes by iteratively aggregating information from their neighbors.
    • Information Aggregation: The core idea of GNNs is that a node's representation is updated by combining its previous representation with aggregated information from its neighbors. This process can be repeated over multiple layers to capture information from multi-hop neighbors (nodes further away in the graph).
    • Graph Convolutional Networks (GCNs): A specific type of GNN that uses a convolutional operation on graphs. The representation of a node in the next layer is typically a non-linear transformation of the average of its neighbors' representations (including itself).
  • Embeddings: Low-dimensional vector representations of entities (users, items, relations, entities in a KG) that capture their semantic meaning and relationships. These vectors are learned through neural networks and allow for computations like similarity.
  • Matrix Factorization (MF): A traditional collaborative filtering technique that decomposes the user-item interaction matrix into two lower-rank matrices: one for user embeddings and one for item embeddings. The dot product of a user's embedding and an item's embedding predicts their interaction score.
  • Attention Mechanism: A mechanism that allows a neural network to focus on the most relevant parts of the input when making a prediction. In the context of graphs, attention can be used to assign different weights to neighbors or relations during information aggregation, indicating their relative importance.
  • BPR (Bayesian Personalized Ranking) Loss: A widely used pairwise ranking loss function for implicit feedback recommendation. It optimizes the model such that observed (positive) interactions are ranked higher than unobserved (negative) interactions for any given user. The sigmoid function, $\sigma(x) = \frac{1}{1 + e^{-x}}$, is typically used to squash the difference between positive and negative item scores into a probability (a minimal sketch follows this list).
  • Mutual Information (MI): A measure of the statistical dependence between two random variables. It quantifies the amount of information obtained about one random variable by observing the other. Minimizing mutual information between representations can encourage them to be statistically independent.
  • Distance Correlation: A measure of dependence between two random vectors of arbitrary dimension. It is zero if and only if the random vectors are independent. Unlike Pearson correlation, it can capture non-linear dependencies.
    • Distance Covariance (dCov): A measure of the dependence between two random vectors.
    • Distance Variance (dVar): A measure of spread or dispersion for a single random vector, analogous to variance but defined using distances.
  • Hyperparameters: Parameters whose values are set before the learning process begins (e.g., learning rate, embedding size, number of layers, regularization coefficients). They are typically tuned using techniques like grid search.
  • L2 Regularization: A technique used to prevent overfitting in machine learning models by adding a penalty term to the loss function that is proportional to the sum of the squares of the model's weights. This encourages smaller weights and simpler models.
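
As referenced in the BPR bullet above, here is a minimal NumPy sketch tying together MF-style scoring and the BPR loss. The embedding sizes, variable names, and toy data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical MF embeddings: 3 users, 5 items, embedding size d = 4.
rng = np.random.default_rng(0)
user_emb = rng.normal(size=(3, 4))
item_emb = rng.normal(size=(5, 4))

def bpr_loss(u, pos_i, neg_i):
    """BPR pairwise loss for one (user, observed item, unobserved item)
    triple: rank the observed item above the unobserved one."""
    pos_score = user_emb[u] @ item_emb[pos_i]  # MF score = dot product
    neg_score = user_emb[u] @ item_emb[neg_i]
    return -np.log(sigmoid(pos_score - neg_score))

print(bpr_loss(u=0, pos_i=1, neg_i=3))
```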

3.2. Previous Works

The paper categorizes previous knowledge-aware recommender systems into four groups:

  1. Embedding-based Methods:

    • Concept: These methods primarily focus on first-order connectivity (direct user-item pairs and KG triplets). They use KG embedding techniques to learn representations (embeddings) for KG entities and relations. These knowledge-aware embeddings are then used as prior or context information to enhance item representations within traditional recommender frameworks, often Matrix Factorization (MF).
    • Examples:
      • CKE (Collaborative Knowledge Base Embedding) [51]: Applies TransE on KG triplets and feeds the resulting knowledge-aware embeddings into MF.
      • TransE (Translating Embeddings for Modeling Multi-relational Data) [3]: A KG embedding model that represents entities and relations as vectors. For a triplet $(h, r, t)$, it tries to ensure that the embedding of the head entity plus the embedding of the relation is approximately equal to the embedding of the tail entity (i.e., $\mathbf{e}_h + \mathbf{e}_r \approx \mathbf{e}_t$). This models relations as translations in the embedding space (a toy scoring sketch appears after this list).
      • TransH [46]: An extension of TransE that better handles one-to-many, many-to-one, and many-to-many relations. Instead of translating in the original space, TransH first projects entities onto a relation-specific hyperplane and then performs the translation, allowing an entity to play different roles under different relations. For a triplet $(h, r, t)$, it aims for $\mathbf{e}_{h_\perp} + \mathbf{e}_r \approx \mathbf{e}_{t_\perp}$, where $\mathbf{e}_{h_\perp} = \mathbf{e}_h - \mathbf{w}_r^\top \mathbf{e}_h \mathbf{w}_r$ is the projection of $\mathbf{e}_h$ onto the hyperplane with normal vector $\mathbf{w}_r$ (and analogously for $\mathbf{e}_{t_\perp}$).
      • KTUP [4]: Uses TransH on user-item interactions and KG triplets simultaneously for joint learning of user preferences and KG completion.
    • Limitation: These methods largely ignore higher-order connectivity and long-range semantics of paths, which limits their ability to capture complex user-item relationships.
  2. Path-based Methods:

    • Concept: These methods explicitly leverage long-range connectivity by extracting paths that connect target user and item nodes via KG entities. These paths are then used to predict user preferences.
    • Examples:
      • RippleNet [36]: Memorizes item representations along paths rooted at each user and uses them to enhance user representations. It propagates user preferences over the KG through ripple effects.
    • Limitations:
      • Brute-force search for paths can be computationally intensive and requires labor-intensive feature engineering for large graphs.
      • Using meta-path patterns requires domain experts to predefine domain-specific patterns, leading to poor transferability across different domains.
  3. Policy-based Methods:

    • Concept: Inspired by reinforcement learning (RL), these methods design RL agents to learn optimal path-finding policies within the KG. The agent learns to navigate the KG to find relevant entities and relations that explain or contribute to a recommendation.
    • Examples:
      • PGPR (Policy-guided Path Reasoning) [49]: Exploits a policy network to explore items of interest for a target user, providing explainable recommendations.
    • Limitations: Sparse reward signals, huge action spaces, and policy gradient-based optimization make RL-based networks challenging to train and converge to stable solutions.
  4. GNN-based Methods:

    • Concept: These methods build upon the information aggregation mechanism of Graph Neural Networks (GNNs). They typically combine user-item interaction graphs with KGs into a heterogeneous graph and apply GNNs to learn node representations (embeddings) that capture multi-hop connectivity.
    • Examples:
      • KGAT (Knowledge Graph Attention Network) [41]: Combines user-item interactions and KG into a holistic heterogeneous graph and applies an attentive neighborhood aggregation mechanism to generate user and item representations. It treats user-item relationships and KG relations as attentive weights in the adjacency matrix.
      • KGNN-LS (Knowledge-aware Graph Neural Networks with Label Smoothness Regularization) [38]: Converts the KG into user-specific graphs and considers user preferences on KG relations and label smoothness during aggregation to generate user-specific item representations. It models relations as decay factors.
      • CKAN (Collaborative Knowledge-aware Attentive Network) [47]: Built upon KGNN-LS, it uses different neighborhood aggregation schemes for the user-item graph and KG separately to obtain user and item embeddings.
      • R-GCN (Relational Graph Convolutional Networks) [27]: Originally for knowledge graph completion, it views different KG relations as distinct channels of information flow when aggregating neighbors. It can be adapted for recommendation by propagating information through these relational channels.
    • Limitation (addressed by KGIN): Most existing GNN-based methods assume only one relation between users and items and fail to explicitly model hidden user intents or relational dependencies in paths.
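
As referenced in the TransE entry above, a tiny sketch of its translation principle, assuming random toy embeddings (the dimensionality and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
e_h, e_r, e_t = rng.normal(size=(3, 8))  # toy head/relation/tail embeddings

def transe_score(e_h, e_r, e_t):
    """TransE plausibility: a triplet (h, r, t) is plausible when the
    relation acts as a translation, i.e. ||e_h + e_r - e_t|| is small."""
    return np.linalg.norm(e_h + e_r - e_t)

print(transe_score(e_h, e_r, e_t))
```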

3.3. Technological Evolution

The evolution of knowledge-aware recommender systems has progressed from merely incorporating KG embeddings as auxiliary features (Embedding-based methods) to explicitly navigating KG paths to find relevant items (Path-based and Policy-based methods). The most recent and powerful trend is the adoption of Graph Neural Networks (GNNs), which inherently capture multi-hop connectivity and learn node representations in an end-to-end fashion.

  • Early Stage (Embedding-based): Focused on injecting KG information into MF or other basic models by pre-training or jointly training KG embeddings. The KG serves as a source of rich feature vectors.

  • Intermediate Stage (Path-based, Policy-based): Recognized the importance of multi-hop reasoning over KGs. Methods sought to discover relational paths between users and items. This improved explainability and captured more complex semantics, but often faced challenges with path enumeration or RL training stability.

  • Current Stage (GNN-based): Leveraging GNNs to implicitly learn path-like features through message passing and neighborhood aggregation. This offers end-to-end learning and better scalability than explicit path enumeration.

    KGIN fits into this evolution by addressing the shortcomings of current GNN-based methods. It pushes the boundary by introducing fine-grained user intent modeling and relational path-aware aggregation, moving beyond simple node-based aggregation and decay factors for relations. It aims to capture richer relational semantics and provide better interpretability within the GNN paradigm.

3.4. Differentiation Analysis

Compared to the main methods in related work, especially other GNN-based models, KGIN introduces key innovations:

  1. Fine-grained User Intent Modeling vs. Coarse-grained Relation:

    • Previous GNNs (KGAT, KGNN-LS, CKAN, R-GCN): Typically treat the user-item interaction as a single, generic interact-with relation. While KGAT uses attentive weights for user-item graph edges, it doesn't decompose this into multiple underlying intents.
    • KGIN: Explicitly models multiple latent intents behind a user-item interaction. Each intent is defined as an attentive combination of KG relations, making its semantics transparent. This fine-grained modeling allows KGIN to capture diverse reasons for user behavior, leading to more nuanced user preferences.
  2. Relational Path-aware Aggregation vs. Node-based Aggregation:

    • Previous GNNs (KGAT, KGNN-LS, CKAN): Employ node-based aggregation schemes, where information is collected from neighboring nodes. KG relations are often used as decay factors or attention weights for neighbors, controlling the influence of a neighbor but not explicitly preserving the semantics of relation sequences or relation dependencies along a path. R-GCN uses relation-specific transformations but still aggregates information primarily from direct neighbors.
    • KGIN: Devises a novel aggregation scheme that views a relational path as an information channel. It recursively integrates the relation sequences (e.g., $(r_1, r_2, \dots, r_l)$) into the node representations. This relational path-aware aggregation explicitly captures relation dependencies and the holistic semantics of multi-hop paths, which is a significant departure from simply weighting neighbor signals.
  3. Independence of Intents:

    • Previous GNNs: Do not have an explicit mechanism to ensure that different latent factors (if any are implicitly learned) are distinct.
    • KGIN: Introduces an independence constraint (using mutual information or distance correlation) among the learned intent embeddings. This ensures that each intent captures unique information about user preferences, leading to better model capacity and interpretability.
  4. Differentiated Aggregation for Intent Graph and Knowledge Graph:

    • KGIN uses distinct aggregation strategies for the intent graph (modeling user-intent-item relations) and the knowledge graph (modeling item-relation-entity facts). This allows for specialized handling of collaborative signals and item knowledge. CKAN also uses different strategies for user-item graph and KG, but without intent modeling.

      In summary, KGIN advances the state-of-the-art by explicitly incorporating user intents and precisely modeling relational paths, moving beyond the limitations of previous GNN-based models in capturing complex relational semantics and providing interpretable recommendations.

4. Methodology

The proposed Knowledge Graph-based Intent Network (KGIN) aims to enhance recommender systems by explicitly modeling user intents and leveraging relational paths within Knowledge Graphs (KGs). The framework is composed of two primary components: User Intent Modeling and Relational Path-aware Aggregation.

4.1. Principles

The core idea behind KGIN is to move beyond the coarse-grained assumption of a single interact-with relation between users and items in GNN-based recommender systems. Instead, it hypothesizes that user behaviors are driven by multiple underlying intents, which can be linked to combinations of KG relations. Simultaneously, to effectively utilize the rich structural information in KGs, KGIN emphasizes preserving the semantics of relation dependencies and sequences within multi-hop relational paths, rather than just aggregating information from individual neighboring nodes.

The theoretical basis and intuition are:

  1. User Intents: Users often have diverse reasons for interacting with items. By explicitly modeling these intents (e.g., preference for a certain director-genre combination, or a specific star-partner aspect), the model can capture finer-grained preferences, leading to more accurate and personalized recommendations. Associating these intents with KG relations also provides a natural way to interpret why a user might like an item.
  2. Relational Path Semantics: Knowledge Graphs contain valuable long-range connectivity that can reveal complex relationships. A path like user → item → property → value provides more semantic information than merely knowing that the user and the value are somehow connected. By treating relation sequences as distinct information channels and integrating them, KGIN aims to encode this holistic path semantics into embeddings, thereby enriching user and item representations.
  3. Independence for Interpretability: To ensure that each learned intent offers a unique perspective on user behavior and avoids redundancy, KGIN incorporates an independence constraint. This encourages diverse and distinct intent representations, which is crucial for interpretability and model capacity.

4.2. Core Methodology In-depth (Layer by Layer)

The KGIN framework (Figure 3) comprises two key components: User Intent Modeling and Relational Path-aware Aggregation. The model ultimately learns high-quality representations for users and items, which are then used for prediction.

The following figure (Figure 3 from the original paper) illustrates the overall structure of the proposed KGIN framework:

Figure 3: Illustration of the proposed KGIN framework. Best viewed in color. (The figure depicts the overall structure of KGIN, including user intent modeling, intent representations, user representations over the intent graph, entity representations over the knowledge graph, and the final fusion of these representations into user and item embeddings.)

4.2.1. User Intent Modeling

Existing GNN-based studies often simplify user-item relations to a single interact-with type. KGIN challenges this by asserting that user behaviors are influenced by multiple intents. An intent is defined as the reason for a user's choice, reflecting commonalities in user behaviors. For instance, in movie recommendations, intents could be combinations of star and partner, or director and genre.

The set of shared intents across all users is denoted by $\mathcal{P}$. Each user-item interaction $(u, i)$ is decomposed into multiple intent-specific interactions $\{(u, p, i) \mid p \in \mathcal{P}\}$. This transformation results in an intent graph (IG), which is a heterogeneous graph whose user-item edges are typed by intents.
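
A minimal sketch of this decomposition, assuming a toy interaction list and a hypothetical number of intents:

```python
# Every observed user-item pair is decomposed into |P| intent-typed edges.
interactions = [("u1", "i1"), ("u1", "i2"), ("u2", "i1")]  # toy implicit feedback
num_intents = 4  # |P|, a hyperparameter shared across all users

intent_graph = [(u, p, i) for (u, i) in interactions for p in range(num_intents)]
# e.g. ("u1", 0, "i1"), ("u1", 1, "i1"), ...: a heterogeneous graph whose
# user-item edges are typed by intents p in P.
```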

4.2.1.1. Representation Learning of Intents

While intents can be represented as latent vectors, their direct semantics might be opaque. To make them interpretable, KGIN associates each intent $p \in \mathcal{P}$ with a distribution over KG relations. This means an intent embedding is formed as an attentive combination of relation embeddings, where relations deemed more important for that intent receive higher attribution scores.

The intent embedding $\mathbf{e}_p$ for an intent $p$ is calculated as:

$$\mathbf{e}_p = \sum_{r \in \mathcal{R}} \alpha(r, p)\, \mathbf{e}_r$$

Here,

  • $\mathbf{e}_p \in \mathbb{R}^d$ is the embedding vector for intent $p$.

  • $\mathbf{e}_r \in \mathbb{R}^d$ is the ID embedding (initial vector representation) of a KG relation $r$. ID embeddings are basic, learned vector representations for entities or relations, typically initialized randomly and updated during training.

  • $\mathcal{R}$ is the set of all KG relations.

  • $\alpha(r, p)$ is an attention score that quantifies the importance of relation $r$ for intent $p$. A higher score means $r$ contributes more to the semantic definition of $p$.

The attention score $\alpha(r, p)$ is calculated with a softmax function, ensuring that the weights sum to 1 for each intent:

$$\alpha(r, p) = \frac{\exp(w_{rp})}{\sum_{r' \in \mathcal{R}} \exp(w_{r'p})}$$

Where,

  • $w_{rp}$ is a trainable weight specific to relation $r$ and intent $p$. These weights are learned during training, indicating how strongly each relation contributes to defining an intent. The attention here is not personalized per user; it defines the common patterns of intents shared across all users.
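
A minimal NumPy sketch of this intent representation, assuming random toy relation embeddings and the trainable weights $w_{rp}$ held in a hypothetical matrix `w`:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
num_relations, num_intents, d = 6, 4, 8
relation_emb = rng.normal(size=(num_relations, d))  # ID embeddings e_r
w = rng.normal(size=(num_relations, num_intents))   # trainable weights w_{rp}

alpha = softmax(w, axis=0)           # alpha(r, p): each intent's weights sum to 1
intent_emb = alpha.T @ relation_emb  # e_p = sum_r alpha(r, p) e_r, shape (|P|, d)
```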

4.2.1.2. Independence Modeling of Intents

To ensure that different intents carry distinct and informative perspectives on user preference, KGIN introduces an independence modeling module. This module guides the learning process to encourage divergence among intent representations, improving both model capacity and explainability. If intents were highly correlated, they would provide redundant information.

Two implementations for this module are offered:

  • Mutual Information: This approach minimizes the mutual information between the representations of any two different intents, which aligns with contrastive learning principles, where distinct entities are pushed apart in the embedding space. The independence loss using mutual information is formulated as:

    $$\mathcal{L}_{\mathrm{IND}} = \sum_{p \in \mathcal{P}} -\log \frac{\exp(s(\mathbf{e}_p, \mathbf{e}_p) / \tau)}{\sum_{p' \in \mathcal{P}} \exp(s(\mathbf{e}_p, \mathbf{e}_{p'}) / \tau)}$$

    Where,

    • $s(\cdot)$ is a similarity function measuring the association between two intent representations. In KGIN, it is set to cosine similarity, which computes the cosine of the angle between two vectors regardless of their magnitude.
    • Each intent $\mathbf{e}_p$ serves as its own anchor (positive sample), while $\mathbf{e}_{p'}$ for $p' \neq p$ serve as negative samples. This formulation is typical of contrastive learning, where a positive pair is distinguished from negative pairs.
    • $\tau$ is a hyper-parameter representing the temperature in the softmax function. A smaller $\tau$ makes the softmax output sharper, emphasizing larger similarities more strongly.
  • Distance Correlation: This method minimizes the distance correlation between intent representations. Distance correlation measures both linear and non-linear associations, and its coefficient is zero if and only if the variables are independent. The independence loss using distance correlation is formulated as:

    $$\mathcal{L}_{\mathrm{IND}} = \sum_{p, p' \in \mathcal{P},\, p \neq p'} dCor(\mathbf{e}_p, \mathbf{e}_{p'})$$

    Where,

    • $dCor(\cdot)$ is the distance correlation between intents $p$ and $p'$, calculated as:

      $$dCor(\mathbf{e}_p, \mathbf{e}_{p'}) = \frac{dCov(\mathbf{e}_p, \mathbf{e}_{p'})}{\sqrt{dVar(\mathbf{e}_p) \cdot dVar(\mathbf{e}_{p'})}}$$

    • $dCov(\cdot)$ is the distance covariance of the two representations $\mathbf{e}_p$ and $\mathbf{e}_{p'}$.
    • $dVar(\cdot)$ is the distance variance of each intent representation. Minimizing this loss encourages the intent embeddings to be statistically independent, thus making them more distinct and interpretable. The paper notes that both implementations yield similar trends and performance, and reports results using the mutual-information-based loss (Equation (3)).
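
A minimal NumPy sketch of the distance-correlation variant, treating the $d$ coordinates of two intent embeddings as paired scalar samples (a common way to compute a sample $dCor$ between two vectors; the toy sizes and names are assumptions):

```python
import numpy as np

def _centered_dist(x):
    """Double-centered pairwise distance matrix of a 1-D sample vector."""
    d = np.abs(x[:, None] - x[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def dcor(x, y):
    """Sample distance correlation of two equal-length vectors; near zero
    when the paired samples show no (linear or non-linear) dependence."""
    a, b = _centered_dist(x), _centered_dist(y)
    dcov = np.sqrt(max((a * b).mean(), 0.0))      # distance covariance
    dvar_x = np.sqrt(max((a * a).mean(), 0.0))    # distance variances
    dvar_y = np.sqrt(max((b * b).mean(), 0.0))
    return dcov / np.sqrt(dvar_x * dvar_y + 1e-12)

rng = np.random.default_rng(0)
intent_emb = rng.normal(size=(4, 8))  # toy |P| = 4 intents, d = 8

# Independence loss: sum dCor over all distinct intent pairs.
loss_ind = sum(dcor(intent_emb[p], intent_emb[q])
               for p in range(4) for q in range(4) if p != q)
```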

4.2.2. Relational Path-aware Aggregation

This component focuses on learning user and item representations using a GNN-based paradigm, but with a novel relational path-aware aggregation scheme. The authors argue that previous node-based aggregation methods in GNNs are limited because they don't explicitly distinguish the paths from which information originates and fail to preserve relation dependencies and sequences.

The following figure (Figure 2 from the original paper) provides a visual comparison between node-based and relational path-aware aggregation schemes:

Figure 2: An example of the node-based and relational path-aware aggregation schemes, where a (dashed or solid) arrow is an information flow among nodes. Best viewed in color. (Left: node-based neighbor aggregation; right: relational path-aware aggregation, with relational path sequences shown in red.)

The figure shows that node-based aggregation (left) simply mixes signals from all neighbors of different hop counts, while relational path-aware aggregation (right) explicitly considers the sequence of relations (red paths) to preserve the semantic context.

KGIN sets different aggregation strategies for the intent graph (IG) (user-intent-item relationships) and the knowledge graph (KG) (item-relation-entity relationships) to better distill behavioral patterns and item relatedness respectively.

4.2.2.1. Aggregation Layer over Intent Graph

The intent graph (IG) captures collaborative information at a finer-grained level of intents. For a user $u$, KGIN uses her intent-aware history $N_u = \{ (p, i) \mid (u, p, i) \in C \}$ (where $C$ is the set of user-intent-item triplets derived from interactions) to represent the first-order connectivity around $u$.

The representation of user $u$ after the first layer of aggregation, $\mathbf{e}_u^{(1)}$, is created by integrating intent-aware information from historical items: $\mathbf{e}_u^{(1)} = f_{\mathrm{IG}} \Big( \big\{ (\mathbf{e}_u^{(0)}, \mathbf{e}_p, \mathbf{e}_i^{(0)}) \mid (p, i) \in N_u \big\} \Big)$ Here,

  • $\mathbf{e}_u^{(1)} \in \mathbb{R}^d$ is the user $u$'s representation after the first aggregation layer.
  • $f_{\mathrm{IG}}(\cdot)$ is the aggregator function for the intent graph.
  • $\mathbf{e}_u^{(0)}$ is the initial ID embedding of user $u$.
  • $\mathbf{e}_p$ is the embedding of intent $p$, as defined in Section 4.2.1.1.
  • $\mathbf{e}_i^{(0)}$ is the initial ID embedding of item $i$. The set $\{ (\mathbf{e}_u^{(0)}, \mathbf{e}_p, \mathbf{e}_i^{(0)}) \mid (p, i) \in N_u \}$ represents the set of intent-aware connections for user $u$.

The specific implementation of $f_{\mathrm{IG}}(\cdot)$ is given as: $\mathbf{e}_u^{(1)} = \frac{1}{|N_u|} \sum_{(p, i) \in N_u} \beta(u, p)\, \mathbf{e}_p \odot \mathbf{e}_i^{(0)}$ Where,

  • $|N_u|$ is the number of intent-item pairs in user $u$'s history.

  • $\odot$ denotes the element-wise product (Hadamard product). This operation allows the intent embedding to modulate or gate the item embedding, effectively creating an intent-specific message from item $i$.

  • $\beta(u, p)$ is an attention score that differentiates the importance of intent $p$ for user $u$, making the intent contribution personalized.

    The attention score $\beta(u, p)$ is calculated as: $\beta(u, p) = \frac{\exp(\mathbf{e}_p^\top \mathbf{e}_u^{(0)})}{\sum_{p' \in \mathcal{P}} \exp(\mathbf{e}_{p'}^\top \mathbf{e}_u^{(0)})}$ Where,

  • $\mathbf{e}_p^\top \mathbf{e}_u^{(0)}$ computes the dot product between the intent embedding and the user's initial ID embedding, indicating their compatibility or relevance.

  • The softmax function normalizes these scores across all intents for user $u$. This personalized attention ensures that specific intents are more salient for a given user. The use of the element-wise product $\mathbf{e}_p \odot \mathbf{e}_i^{(0)}$ explicitly encodes the first-order intent-aware information into user representations.
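A minimal sketch of this aggregation layer follows, assuming a dense binary interaction matrix for readability (a real implementation would use sparse scatter operations). Since $\beta(u, p)$ and $\mathbf{e}_p$ do not depend on the item, the double sum over $(p, i)$ factorizes into an intent-attentive term times a mean over the user's interacted items.

```python
import torch
import torch.nn.functional as F

def aggregate_ig(user_emb, item_emb, intent_emb, interact):
    """One intent-graph aggregation layer.
    user_emb: (n_users, d); item_emb: (n_items, d); intent_emb: (n_intents, d)
    interact: (n_users, n_items) binary interaction matrix (dense for clarity)."""
    # beta(u, p): personalized softmax attention of user u over intents.
    beta = F.softmax(user_emb @ intent_emb.t(), dim=1)   # (n_users, n_intents)
    user_intent = beta @ intent_emb                      # sum_p beta(u,p) * e_p
    # Mean of item embeddings over each user's history N_u.
    deg = interact.sum(dim=1, keepdim=True).clamp(min=1)
    item_mean = (interact @ item_emb) / deg
    return user_intent * item_mean                       # elementwise product
```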

4.2.2.2. Aggregation Layer over Knowledge Graph

For items, KGIN aggregates information from the Knowledge Graph (KG). An item $i$ can be described by its attributes and connections to other KG entities. $N_i = \{ (r, v) \mid (i, r, v) \in \mathcal{G} \}$ represents the attributes and first-order connectivity around item $i$ within the KG $\mathcal{G}$.

The representation of item $i$ after the first layer of aggregation, $\mathbf{e}_i^{(1)}$, is generated by integrating relation-aware information from connected entities: $\mathbf{e}_i^{(1)} = f_{\mathrm{KG}} \Big( \{ (\mathbf{e}_i^{(0)}, \mathbf{e}_r, \mathbf{e}_v^{(0)}) \mid (r, v) \in N_i \} \Big)$ Here,

  • $\mathbf{e}_i^{(1)} \in \mathbb{R}^d$ is the item $i$'s representation after the first aggregation layer from the KG.
  • $f_{\mathrm{KG}}(\cdot)$ is the aggregator function for the knowledge graph.
  • $\mathbf{e}_i^{(0)}$ is the initial ID embedding of item $i$.
  • $\mathbf{e}_r$ is the ID embedding of relation $r$.
  • $\mathbf{e}_v^{(0)}$ is the initial ID embedding of KG entity $v$. The set $\{ (\mathbf{e}_i^{(0)}, \mathbf{e}_r, \mathbf{e}_v^{(0)}) \mid (r, v) \in N_i \}$ represents the relation-aware connections for item $i$.

The specific implementation of $f_{\mathrm{KG}}(\cdot)$ accounts for the relational context, as KG entities can have different semantics under different relations (e.g., Quentin Tarantino as director vs. star). Instead of using attention mechanisms as decay factors (like previous works), KGIN models the relation as a transformation operator: $\mathbf{e}_i^{(1)} = \frac{1}{|N_i|} \sum_{(r, v) \in N_i} \mathbf{e}_r \odot \mathbf{e}_v^{(0)}$ Where,

  • $|N_i|$ is the number of relation-entity pairs in item $i$'s attributes.
  • $\mathbf{e}_r \odot \mathbf{e}_v^{(0)}$ creates a relational message. The element-wise product here means that the relation $r$ acts as a projection or rotation operator (similar to TransR or RotatE in the KG embedding literature [22, 30]), allowing the message to explicitly capture the meaning that relation $r$ carries when connecting item $i$ to entity $v$. This process is applied analogously to obtain the representation $\mathbf{e}_v^{(1)}$ for any KG entity $v$.
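A matching sketch for the KG aggregator, storing the graph as (head, relation, tail) index triplets and using scatter-adds for the mean; function and variable names are illustrative.

```python
import torch

def aggregate_kg(ent_emb, rel_emb, triplets, n_entities):
    """One KG aggregation layer: e_i <- (1/|N_i|) sum over (r, v) of e_r ⊙ e_v.
    triplets: LongTensor (n_triplets, 3) with rows (head, relation, tail)."""
    head, rel, tail = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    msg = rel_emb[rel] * ent_emb[tail]       # relational message e_r ⊙ e_v
    out = torch.zeros_like(ent_emb)
    out.index_add_(0, head, msg)             # sum messages per head entity
    deg = torch.zeros(n_entities, dtype=ent_emb.dtype, device=ent_emb.device)
    deg.index_add_(0, head, torch.ones(head.numel(), dtype=ent_emb.dtype,
                                       device=ent_emb.device))
    return out / deg.clamp(min=1).unsqueeze(1)   # mean aggregation
```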

4.2.2.3. Capturing Relational Paths

To capture higher-order connectivity and long-range signals, KGIN recursively stacks multiple aggregation layers. The representations of user $u$ and item $i$ at layer $l$ are formulated recursively: $$\mathbf{e}_u^{(l)} = f_{\mathrm{IG}} \Big( \big\{ (\mathbf{e}_u^{(l-1)}, \mathbf{e}_p, \mathbf{e}_i^{(l-1)}) \mid (p, i) \in N_u \big\} \Big), \qquad \mathbf{e}_i^{(l)} = f_{\mathrm{KG}} \Big( \{ (\mathbf{e}_i^{(l-1)}, \mathbf{e}_r, \mathbf{e}_v^{(l-1)}) \mid (r, v) \in N_i \} \Big)$$ Where,

  • $\mathbf{e}_u^{(l)}$, $\mathbf{e}_i^{(l)}$, and $\mathbf{e}_v^{(l)}$ denote the representations of user $u$, item $i$, and entity $v$ at layer $l$, respectively.

  • These representations memorize the relational signals propagated from their $(l-1)$-hop neighbors.

  • $f_{\mathrm{IG}}$ and $f_{\mathrm{KG}}$ are the aggregator functions defined previously, but now operating on the representations from the previous layer.

    Due to the specific element-wise product structure in the aggregators, the representation $\mathbf{e}_i^{(l)}$ (and similarly for users) can be analytically rewritten to reveal how relational paths are captured. For an $l$-hop path $s = i \xrightarrow{r_1} s_1 \xrightarrow{r_2} \cdots s_{l-1} \xrightarrow{r_l} s_l$ rooted at item $i$, its relational path is the sequence of relations $(r_1, r_2, \dots, r_l)$.

The representation $\mathbf{e}_i^{(l)}$ can be expressed as: $\mathbf{e}_i^{(l)} = \sum_{s \in \mathcal{N}_i^l} \frac{\mathbf{e}_{r_1}}{|N_{s_1}|} \odot \frac{\mathbf{e}_{r_2}}{|N_{s_2}|} \odot \cdots \odot \frac{\mathbf{e}_{r_l}}{|N_{s_l}|} \odot \mathbf{e}_{s_l}^{(0)}$ Where,

  • $\mathcal{N}_i^l$ is the set of all $l$-hop paths starting from item $i$.
  • $s_k$ is the $k$-th entity in the path $s$.
  • $|N_{s_k}|$ is the degree of node $s_k$ (the number of neighbors of entity $s_k$). The division by degree acts as a normalization factor, similar to mean aggregation in GNNs.
  • The element-wise product $\odot$ across the relation embeddings $\mathbf{e}_{r_1}, \dots, \mathbf{e}_{r_l}$ explicitly models the interactions among relations along the path. This means the representation directly incorporates the holistic semantics of the relational path itself, not just the features of the end nodes or aggregated neighbor features. This is a crucial distinction from previous GNNs that primarily focused on node features and used relations only as decay factors.
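As a toy illustration of this expansion (entity and relation names are invented, and the degree normalizations are omitted for clarity), the contribution of a single 2-hop relational path is simply the elementwise product of its relation embeddings applied to the end entity's initial embedding:

```python
import torch

dim = 4
rel_emb = {"directed_by": torch.randn(dim), "born_in": torch.randn(dim)}
ent_emb = {"Knoxville": torch.randn(dim)}

# Path: i --directed_by--> Tarantino --born_in--> Knoxville.
# Its term in e_i^{(2)} multiplies the relation embeddings along the path
# into the end entity's initial embedding, encoding the path's holistic semantics.
path_term = rel_emb["directed_by"] * rel_emb["born_in"] * ent_emb["Knoxville"]
```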

4.2.3. Model Prediction

After $L$ layers of aggregation, KGIN obtains representations for user $u$ and item $i$ at each layer $l \in \{0, \dots, L\}$. These layer-specific representations are then summed to form the final representations: $\mathbf{e}_u^* = \mathbf{e}_u^{(0)} + \cdots + \mathbf{e}_u^{(L)}, \qquad \mathbf{e}_i^* = \mathbf{e}_i^{(0)} + \cdots + \mathbf{e}_i^{(L)}$ Where,

  • $\mathbf{e}_u^*$ and $\mathbf{e}_i^*$ are the final embeddings for user $u$ and item $i$, respectively. This summation aggregates information from different hop distances, capturing both local and long-range connectivity in the final user and item embeddings. The intent-aware relationships and KG relation dependencies from paths are encoded within these final representations.

Finally, the prediction score $\hat{y}_{ui}$ (how likely user $u$ would adopt item $i$) is computed using the inner product of their final embeddings: $\hat{y}_{ui} = \mathbf{e}_u^{*\top} \mathbf{e}_i^*$ The inner product (dot product) is a common way to measure the similarity or compatibility between user and item embeddings in recommender systems. A higher score indicates a stronger predicted preference.
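Putting the pieces together, a sketch of the full forward pass, reusing the `aggregate_ig` and `aggregate_kg` sketches above and assuming items occupy the first `n_items` rows of the entity embedding table (since $\mathcal{I} \subset \mathcal{V}$):

```python
import torch

def kgin_forward(user_emb0, ent_emb0, intent_emb, rel_emb, interact, triplets, L=3):
    """L layers of aggregation, layer-sum pooling, inner-product scoring."""
    n_items = interact.size(1)
    users, ents = [user_emb0], [ent_emb0]
    for _ in range(L):
        users.append(aggregate_ig(users[-1], ents[-1][:n_items],
                                  intent_emb, interact))
        ents.append(aggregate_kg(ents[-1], rel_emb, triplets, ent_emb0.size(0)))
    e_user = sum(users)              # e_u^* = e_u^(0) + ... + e_u^(L)
    e_item = sum(ents)[:n_items]     # e_i^* = e_i^(0) + ... + e_i^(L)
    return e_user @ e_item.t()       # scores[u, i] = e_u^*ᵀ e_i^*
```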

4.2.4. Model Optimization

KGIN uses the pairwise Bayesian Personalized Ranking (BPR) loss [26] for optimization. BPR loss is designed for implicit feedback and aims to ensure that a user's observed (positive) items are ranked higher than unobserved (negative) items.

The BPR loss $\mathcal{L}_{\mathrm{BPR}}$ is defined as: $\mathcal{L}_{\mathrm{BPR}} = \sum_{(u, i, j) \in \mathcal{O}} -\ln \sigma(\hat{y}_{ui} - \hat{y}_{uj})$ Where,

  • $\mathcal{O} = \{ (u, i, j) \mid (u, i) \in O^+, (u, j) \in O^- \}$ is the training dataset.

    • $O^+$ is the set of observed feedback (user $u$ interacted with item $i$).
    • $O^-$ is the set of unobserved counterparts (user $u$ did not interact with item $j$, which is sampled as a negative item).
  • $\hat{y}_{ui}$ is the predicted score for the positive item $i$ for user $u$.

  • $\hat{y}_{uj}$ is the predicted score for the negative item $j$ for user $u$.

  • $\sigma(\cdot)$ is the sigmoid function, which squashes its input into the range between 0 and 1. The BPR loss aims to maximize $\sigma(\hat{y}_{ui} - \hat{y}_{uj})$, i.e., to make $\hat{y}_{ui}$ significantly larger than $\hat{y}_{uj}$.

    The overall objective function for KGIN combines the BPR loss with the independence loss and an L2 regularization term: $\mathcal{L}_{\mathrm{KGIN}} = \mathcal{L}_{\mathrm{BPR}} + \lambda_1 \mathcal{L}_{\mathrm{IND}} + \lambda_2 \left\| \Theta \right\|_2^2$ Where,

  • $\mathcal{L}_{\mathrm{KGIN}}$ is the total loss function to be minimized.

  • $\mathcal{L}_{\mathrm{BPR}}$ is the Bayesian Personalized Ranking loss.

  • $\lambda_1$ is a hyperparameter controlling the strength of the independence loss.

  • $\mathcal{L}_{\mathrm{IND}}$ is the independence loss (e.g., using mutual information from Equation (3)).

  • $\lambda_2$ is a hyperparameter controlling the strength of the L2 regularization.

  • $\left\| \Theta \right\|_2^2$ is the L2 regularization term, calculated as the squared L2 norm of all trainable model parameters in $\Theta$.

  • $\Theta = \{ \mathbf{e}_u^{(0)}, \mathbf{e}_v^{(0)}, \mathbf{e}_r, \mathbf{e}_p, \mathbf{w} \mid u \in \mathcal{U}, v \in \mathcal{V}, r \in \mathcal{R}, p \in \mathcal{P} \}$ is the set of all trainable parameters in the model, including initial ID embeddings for users $u$ and KG entities $v$, relation embeddings $\mathbf{e}_r$, intent embeddings $\mathbf{e}_p$, and the attention weights $\mathbf{w}$ for intent definition.
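A sketch of these training objectives under the definitions above; `scores` is the matrix from the forward-pass sketch, `mutual_info_loss` is the earlier independence-loss sketch, and `lam1`/`lam2` stand in for $\lambda_1$/$\lambda_2$:

```python
import torch
import torch.nn.functional as F

def bpr_loss(scores, users, pos_items, neg_items):
    """Pairwise BPR loss over sampled (u, i, j) triplets:
    -ln sigmoid(y_ui - y_uj), summed over the batch."""
    pos = scores[users, pos_items]
    neg = scores[users, neg_items]
    return -F.logsigmoid(pos - neg).sum()

def total_loss(scores, users, pos_items, neg_items, intent_emb, params,
               lam1=1e-4, lam2=1e-5):
    """L_KGIN = L_BPR + lam1 * L_IND + lam2 * ||Theta||_2^2."""
    l2 = sum(p.pow(2).sum() for p in params)     # squared L2 norm of Theta
    return (bpr_loss(scores, users, pos_items, neg_items)
            + lam1 * mutual_info_loss(intent_emb)   # from the sketch above
            + lam2 * l2)
```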

4.2.5. Model Analysis

4.2.5.1. Model Size

The model parameters of KGIN primarily consist of:

  1. ID embeddings: $\mathbf{e}_u^{(0)}$ for users, $\mathbf{e}_v^{(0)}$ for KG entities (which include items since $\mathcal{I} \subset \mathcal{V}$), and $\mathbf{e}_r$ for KG relations.
  2. Intent embeddings: $\mathbf{e}_p$ for each intent.
  3. Attention weights: $\mathbf{w}$ (specifically, the $w_{rp}$ terms used in defining intent embeddings). Notably, KGIN discards nonlinear activation functions and feature transformation matrices in its aggregation scheme. This design choice, supported by recent studies [48] (e.g., LightGCN), simplifies the model and can make GNNs easier to train, as non-linearities can sometimes hinder training stability.

4.2.5.2. Time Complexity

The time complexity of KGIN's training mainly stems from:

  1. User representation computation (IG aggregation): $O(L \cdot |C| \cdot d)$
    • $L$: number of aggregation layers.
    • $|C|$: number of user-intent-item triplets in the intent graph. Since each observed interaction is decomposed over all intents, $|C|$ equals the number of user-item interactions multiplied by $|\mathcal{P}|$.
    • $d$: embedding size.
  2. Entity representation computation (KG aggregation): $O(L \cdot |\mathcal{G}| \cdot d)$
    • $|\mathcal{G}|$: number of KG triplets.
  3. Independence modeling: $O(|\mathcal{P}| \cdot (|\mathcal{P}| - 1) / 2)$
    • $|\mathcal{P}|$: number of user intents. This covers the distance correlation between all unique pairs of intents; for the mutual information based loss (Equation (3)), the softmax denominator runs over all intents for each anchor, giving an $O(|\mathcal{P}|^2)$ cost as well.

      The total time complexity for one training epoch is approximately: $O(L \cdot |C| \cdot d + L \cdot |\mathcal{G}| \cdot d + |\mathcal{P}| \cdot (|\mathcal{P}| - 1) / 2)$ The authors state that KGIN has comparable complexity to KGAT and CKAN under the same experimental settings.

5. Experimental Setup

5.1. Datasets

The experiments are conducted on three benchmark datasets, covering different domains (books, music, fashion outfits):

  1. Amazon-Book: Released by KGAT [41]. This dataset represents book recommendations.

  2. Last-FM: Also released by KGAT [41]. This dataset focuses on music recommendations.

  3. Alibaba-iFashion: Introduced by Chen et al. [8]. This dataset is specific to fashion outfit recommendations, where outfits are items and fashion staffs (e.g., tops, bottoms) constitute their KG attributes.

    To ensure data quality and manageability, the following preprocessing steps were applied:

  • 10-core setting: Users and items with fewer than ten interactions were discarded.

  • KG entity filtering: KG entities involved in fewer than ten triplets were filtered out.

  • Inverse relations were constructed for all canonical relations, effectively doubling the relations and triplets in the KG for most models.

    The following are the results from Table 1 of the original paper:

| | | Amazon-Book | Last-FM | Alibaba-iFashion |
| :--- | :--- | :--- | :--- | :--- |
| User-Item Interaction | #Users | 70,679 | 23,566 | 114,737 |
| | #Items | 24,915 | 48,123 | 30,040 |
| | #Interactions | 847,733 | 3,034,796 | 1,781,093 |
| Knowledge Graph | #Entities | 88,572 | 58,266 | 59,156 |
| | #Relations | 39 | 9 | 51 |
| | #Triplets | 2,557,746 | 464,567 | 279,155 |

Dataset Characteristics and Choice:

  • Amazon-Book: A relatively large dataset with a moderate number of users and items, and a substantial KG with 39 relations and over 2.5 million triplets. This dataset provides rich KG information for testing knowledge-aware models.

  • Last-FM: Fewer users but more items compared to Amazon-Book, with a very high number of interactions. Its KG is smaller in terms of entities and triplets, and notably has only 9 relations. This allows testing model performance on datasets with varying KG densities and relation richness.

  • Alibaba-iFashion: The largest in terms of users and a significant number of items and interactions. Its KG is relatively smaller in terms of triplets compared to Amazon-Book, but has a high number of relations (51). The specific nature of outfit recommendation (outfit includes staff, staff has categories) suggests a more direct and possibly shallower KG structure.

    These datasets are well-suited for validating knowledge-aware recommender systems because they combine explicit user-item interactions with rich knowledge graph data, allowing for evaluation of both recommendation accuracy and how effectively KG information is leveraged.

Data Partitioning: Following prior studies [41, 45], the same data partition strategy is used:

  • For training, each observed user-item interaction is considered a positive instance.
  • For each positive instance, an item that the user did not adopt (i.e., an unobserved item) is randomly sampled to serve as a negative instance. This forms $(user, positive\_item, negative\_item)$ triplets for the pairwise ranking loss, as sketched below.
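A minimal sketch of this sampling scheme, assuming a `user_pos` mapping from each user to the set of items they interacted with (an illustrative data structure, not from the paper's code):

```python
import random

def sample_bpr_triplets(user_pos, n_items, n_samples):
    """Build (user, positive item, negative item) training triplets:
    for each sampled observed pair (u, i), draw an unobserved j at random."""
    triplets, users = [], list(user_pos)
    for _ in range(n_samples):
        u = random.choice(users)
        i = random.choice(tuple(user_pos[u]))
        j = random.randrange(n_items)
        while j in user_pos[u]:          # resample until j is unobserved
            j = random.randrange(n_items)
        triplets.append((u, i, j))
    return triplets
```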

5.2. Evaluation Metrics

The all-ranking strategy [20] is used for evaluation. This means for each user in the test set, all items they have not interacted with are treated as potential negative items, and the relevant items from the test set are treated as positive items. All these items are ranked based on the model's prediction scores.

The performance of top-$K$ recommendation is evaluated using `recall@K` and `ndcg@K`, with $K$ defaulting to 20. The average metrics across all users in the testing set are reported.

  1. Recall@K:

    • Conceptual Definition: Recall@K measures the proportion of relevant items (items the user actually interacted with in the test set) that are successfully retrieved within the top KK recommendations. It focuses on how many of the truly relevant items the model manages to recommend.
    • Mathematical Formula: $\mathrm{Recall@K} = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \frac{|\{\text{recommended items in top } K \text{ for } u\} \cap \{\text{relevant items for } u\}|}{|\{\text{relevant items for } u\}|}$
    • Symbol Explanation:
      • $|\mathcal{U}|$: Total number of users in the test set.
      • $u$: A specific user.
      • $K$: The number of top recommendations considered.
      • $\{\text{recommended items in top } K \text{ for } u\}$: The set of $K$ items recommended by the model for user $u$.
      • $\{\text{relevant items for } u\}$: The set of items user $u$ actually interacted with in the test set (positive items).
  2. NDCG@K (Normalized Discounted Cumulative Gain at K):

    • Conceptual Definition: NDCG@K is a measure of ranking quality that takes into account the position of relevant items in the ranked list. It assigns higher scores to relevant items that appear higher in the list and penalizes relevant items that appear lower. It is "normalized" by comparing it to the ideal DCG (where all relevant items are perfectly ranked at the top).
    • Mathematical Formula: $\mathrm{NDCG@K} = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \frac{\mathrm{DCG@K}_u}{\mathrm{IDCG@K}_u}$ Where $\mathrm{DCG@K}_u$ (Discounted Cumulative Gain for user $u$) is: $\mathrm{DCG@K}_u = \sum_{p=1}^{K} \frac{\mathrm{rel}_p}{\log_2(p+1)}$ And $\mathrm{IDCG@K}_u$ (Ideal Discounted Cumulative Gain for user $u$) is: $\mathrm{IDCG@K}_u = \sum_{p=1}^{\min(K, |\mathrm{REL}_u|)} \frac{1}{\log_2(p+1)}$
    • Symbol Explanation:
      • $|\mathcal{U}|$: Total number of users in the test set.
      • $u$: A specific user.
      • $K$: The number of top recommendations considered.
      • $\mathrm{rel}_p$: The relevance score of the item at position $p$ in the recommended list. For implicit feedback, this is typically 1 if the item is relevant (interacted with in the test set) and 0 otherwise.
      • $p$: The position in the ranked list (starting from 1).
      • $|\mathrm{REL}_u|$: The number of relevant items for user $u$ in the test set. The ideal ranking places all relevant items at the top, so $\mathrm{IDCG@K}_u$ sums over $\min(K, |\mathrm{REL}_u|)$ positions.
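Both metrics are straightforward to compute per user; below is a sketch consistent with the definitions above (binary relevance, ranked positions counted from 1):

```python
import math

def recall_ndcg_at_k(ranked_items, relevant, k=20):
    """recall@K and ndcg@K for a single user.
    ranked_items: all candidate items sorted by predicted score (descending).
    relevant: set of this user's test-set items."""
    hits = [1.0 if item in relevant else 0.0 for item in ranked_items[:k]]
    recall = sum(hits) / max(len(relevant), 1)
    dcg = sum(h / math.log2(p + 2) for p, h in enumerate(hits))  # p from 0
    idcg = sum(1.0 / math.log2(p + 2) for p in range(min(len(relevant), k)))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return recall, ndcg
```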

5.3. Baselines

The proposed KGIN model is compared against several state-of-the-art methods, categorized by their approach:

  1. KG-free:

    • MF (Matrix Factorization) [26]: A classic collaborative filtering model that learns ID embeddings for users and items and predicts interactions via their inner product. It serves as a baseline that uses no KG information.
  2. Embedding-based:

    • CKE (Collaborative Knowledge Base Embedding) [51]: A representative model that incorporates KG embeddings into MF. It uses TransR [22] (or TransE [3] as a base) to learn KG entity embeddings and uses them to supplement item representations within the MF framework. Relations are primarily used as constraints for KG embedding learning.
  3. GNN-based:

    • KGNN-LS (Knowledge-aware Graph Neural Networks with Label Smoothness Regularization) [38]: Converts the KG into user-specific graphs. It considers user preference on KG relations and label smoothness during information aggregation to generate user-specific item representations. It models relations mainly as decay factors.

    • KGAT (Knowledge Graph Attention Network) [41]: A state-of-the-art GNN-based recommender. It applies an attentive neighborhood aggregation mechanism on a holistic graph (combining KG and user-item graph) to generate user and item representations. User-item relationships and KG relations serve as attentive weights in the adjacency matrix.

    • CKAN (Collaborative Knowledge-aware Attentive Network) [47]: Builds upon KGNN-LS. It utilizes different neighborhood aggregation schemes on the user-item graph and KG respectively to obtain user and item embeddings.

    • R-GCN (Relational Graph Convolutional Networks) [27]: A GNN originally for knowledge graph completion. It views various KG relations as distinct channels of information flow for neighbor aggregation. It is adapted here for the recommendation task.

      These baselines are chosen to represent different advancements in knowledge-aware recommendation, ranging from basic MF to embedding-based approaches and various GNN-based models, providing a comprehensive comparison for KGIN.

5.4. Parameter Settings

The implementation of KGIN is in PyTorch. To ensure a fair comparison across all methods, several common settings are fixed:

  • Embedding size ($d$): Fixed at 64.

  • Optimizer: Adam [18].

  • Batch size: 1024.

    A grid search is performed to find optimal settings for each method:

  • Learning rate ($\rho$): Tuned in $\{10^{-4}, 10^{-3}, 10^{-2}\}$.

  • Coefficients of additional constraints ($\lambda$ values): Tuned in $\{10^{-5}, 10^{-4}, \dots, 10^{-1}\}$ (e.g., L2 regularization for all models, independence modeling for KGIN, TransR for CKE and KGAT, label smoothness for KGNN-LS).

  • Number of GNN layers ($L$): Tuned in $\{1, 2, 3\}$ for GNN-based methods.

    Specific settings for baselines:

  • KGNN-LS and CKAN: Neighborhood size set to 16, batch size set to 128.

  • Model initialization: Parameters initialized with Xavier [11].

  • KGAT: Uses pre-trained ID embeddings from MF as initialization.

    For KGIN, the paper notes that using Mutual Information (Equation (3)) and Distance Correlation (Equation (4)) for independence modeling yield similar performance, so results using Mutual Information are reported. Unless otherwise specified, the default settings for KGIN are:

  • Number of user intents ($|\mathcal{P}|$): 4.

  • Number of relational path aggregation layers ($L$): 3. The notation KGIN-3 denotes the model with three aggregation layers.

The following are the results from Table 6 of the original paper:

| | $\rho$ | $d$ | $L$ | $\vert\mathcal{P}\vert$ | $\lambda_1$ | $\lambda_2$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Amazon-Book | $10^{-4}$ | 64 | 3 | 4 | $10^{-5}$ | $10^{-5}$ |
| Last-FM | $10^{-4}$ | 64 | 3 | 4 | $10^{-4}$ | $10^{-5}$ |
| Alibaba-iFashion | $10^{-4}$ | 64 | 3 | 4 | $10^{-4}$ | $10^{-5}$ |

These parameters are crucial for reproducibility, and the authors have provided their code and settings.

6. Results & Analysis

6.1. Core Results Analysis

The experimental results demonstrate KGIN's effectiveness compared to state-of-the-art knowledge-aware recommender models. The performance is measured using recall@20 and ndcg@20.

The following are the results from Table 2 of the original paper:

| | Amazon-Book | | Last-FM | | Alibaba-iFashion | |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| | recall | ndcg | recall | ndcg | recall | ndcg |
| MF | 0.1300 | 0.0678 | 0.0724 | 0.0617 | 0.1095 | 0.0670 |
| CKE | 0.1342 | 0.0698 | 0.0732 | 0.0630 | 0.1103 | 0.0676 |
| KGAT | 0.1487 | 0.0799 | 0.0873 | 0.0744 | 0.1030 | 0.0627 |
| KGNN-LS | 0.1362 | 0.0560 | 0.0880 | 0.0642 | 0.1039 | 0.0557 |
| CKAN | 0.1442 | 0.0698 | 0.0812 | 0.0660 | 0.0970 | 0.0509 |
| R-GCN | 0.1220 | 0.0646 | 0.0743 | 0.0631 | 0.0860 | 0.0515 |
| KGIN-3 | 0.1687* | 0.0915* | 0.0978* | 0.0848* | 0.1147* | 0.0716* |
| %Imp. | 13.44% | 14.51% | 11.13% | 13.97% | 3.98% | 5.91% |

Key Observations and Analysis:

  1. KGIN's Superiority: KGIN-3 consistently outperforms all baseline models across all three datasets and both metrics (recall@20 and ndcg@20). The improvements are significant, especially in ndcg@20, with 14.51% on Amazon-Book, 13.97% on Last-FM, and 5.91% on Alibaba-iFashion over the strongest baselines. This confirms the effectiveness and rationality of KGIN's design.

    • Reasoning: The authors attribute this to KGIN's relational modeling innovations:
      • User Intent Modeling: By uncovering user intents, KGIN better characterizes user-item relationships, leading to more powerful and nuanced user and item representations. Baselines, by ignoring hidden user intents, treat user-item edges as a homogeneous channel.
      • Relational Path Aggregation: KGIN's ability to preserve the holistic semantics of paths and collect more informative signals from KG (compared to node-based GNNs like KGAT, CKAN, KGNN-LS) contributes significantly.
      • Differentiated Aggregation: Applying distinct aggregation schemes to the intent graph (IG) and knowledge graph (KG) allows KGIN to effectively encode both collaborative signals and item knowledge.
  2. Impact of KG Information:

    • CKE (embedding-based) generally performs better than MF (KG-free), indicating that incorporating KG embeddings does improve recommendations. This aligns with previous research on the value of side information.
    • The GNN-based methods (KGAT, KGNN-LS, CKAN) generally outperform CKE on Amazon-Book and Last-FM, suggesting the importance of long-range connectivity modeling via GNNs.
  3. Dataset-Specific Performance:

    • The improvement of KGIN on Amazon-Book is more substantial than on Alibaba-iFashion. This is explained by Amazon-Book having denser and richer interaction and KG data. The KG in Amazon-Book is extracted from Freebase and contains diverse relations, allowing KGIN to fully exploit long-range connectivity.
    • In contrast, Alibaba-iFashion's KG is dominated by first-order connectivity (e.g., outfit-includes-staff). This suggests that KGIN's strengths in leveraging long-range paths are particularly beneficial in KGs with richer multi-hop structures.
  4. Baseline Comparison:

    • KGAT, KGNN-LS, and CKAN perform at similar levels, generally better than R-GCN. This suggests that while R-GCN's relation-specific transformations are useful for KG completion, it might not be optimally designed for user-item relationship modeling in recommendation without specific adaptations.
    • Interestingly, CKE outperforms some GNN-based methods (KGAT, KGNN-LS, CKAN) on Alibaba-iFashion. Possible reasons include: (1) GNNs can be challenging to train due to nonlinear feature transformations, potentially degrading performance if not carefully tuned [14, 48]; (2) TransR (used in CKE) might effectively capture the dominant first-order connectivity in Alibaba-iFashion's KG.

6.2. Ablation Studies / Parameter Analysis

The paper conducts several ablation studies and parameter analyses to investigate the impact of KGIN's design choices.

6.2.1. Impact of Presence of User Intents & KG Relations

To understand the necessity of user intents and KG relations, two variants of KGIN-3 are tested:

  • KGIN-3_w/o I&R: Removes both user intents and KG relations. This effectively turns KGIN into a simplified GNN that only propagates node information without relational semantics.

  • KGIN-3_w/o I: Removes only user intents (sets $|\mathcal{P}| = 0$), but retains KG relation modeling. This variant treats user-item interactions as a single relation.

    The following are the results from Table 3 of the original paper:

| | Amazon-Book | | Last-FM | | Alibaba-iFashion | |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| | recall | ndcg | recall | ndcg | recall | ndcg |
| w/o I&R | 0.1518 | 0.0816 | 0.0802 | 0.0669 | 0.0862 | 0.0530 |
| w/o I | 0.1627 | 0.0870 | 0.0942 | 0.0819 | 0.1103 | 0.0678 |

Analysis:

  • Necessity of Relational Modeling: Comparing KGIN-3_w/o I&R with the full KGIN-3 (Table 2), there's a dramatic reduction in predictive accuracy. This underscores that relational modeling (both user intents and KG relations) is crucial. Without it, the model lacks the semantic information necessary to capture complex relationships.
  • Necessity of User Intents: KGIN-3_w/o I also shows a performance drop compared to KGIN-3, although less severe than w/o I&R. This indicates that while KG relation modeling is beneficial, explicitly modeling user intents at a finer granularity provides additional significant gains. KGIN-3_w/o I still captures KG relations, but its user representations are less refined due to the absence of intent-specific collaborative signals.

6.2.2. Impact of Model Depth

The number of aggregation layers ($L$) determines how far information propagates and thus the length of relational paths captured. Experiments vary $L$ in $\{1, 2, 3\}$.

The following are the results from Table 4 of the original paper:

| | Amazon-Book | | Last-FM | | Alibaba-iFashion | |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| | recall | ndcg | recall | ndcg | recall | ndcg |
| KGIN-1 | 0.1455 | 0.0766 | 0.0831 | 0.0707 | 0.1045 | 0.0638 |
| KGIN-2 | 0.1652 | 0.0892 | 0.0920 | 0.0791 | 0.1162 | 0.0723 |
| KGIN-3 | 0.1687 | 0.0915 | 0.0978 | 0.0848 | 0.1147 | 0.0716 |

Analysis:

  • Benefits of Deeper Models: Increasing model depth from KGIN-1 to KGIN-2 yields substantial improvements across all datasets. KGIN-1 only considers first-order connectivity (user-intent-item and item-relation-entity), while KGIN-2 captures two-hop paths. This shows that exploring longer-range connectivity is crucial for understanding user interests and item relatedness. More information pertinent to user intents is derived from longer paths.
  • Diminishing Returns (or Saturation):
    • On Amazon-Book and Last-FM, KGIN-3 (three layers) further improves performance over KGIN-2. This indicates that higher-order connectivity beyond two hops can still provide complementary information and lead to better node representations.
    • However, on Alibaba-iFashion, KGIN-3 performs slightly worse than KGIN-2. This likely stems from the nature of the Alibaba-iFashion dataset's KG, where most KG triplets represent first-order connectivity (e.g., outfit-includes-staff). Once these primary connections are captured at two hops, adding more layers might introduce noise or lead to over-smoothing without significant new structural information. This observation highlights that the optimal model depth can be dataset-dependent.

6.2.3. Impact of Intent Modeling

6.2.3.1. Impact of the Number of Intents ($|\mathcal{P}|$)

The impact of varying the number of user intents ($|\mathcal{P}|$) in the set $\{1, 2, 4, 8\}$ is analyzed.

The following figure (Figure 4 from the original paper) shows the impact of the number of intents:

Figure 4: Impact of the number of intents ($|\mathcal{P}|$). Best viewed in color.

Analysis:

  • Importance of Multiple Intents: When only one intent is modeled ($|\mathcal{P}| = 1$), KGIN-3 performs poorly on both Amazon-Book and Last-FM. This strongly supports the hypothesis that user behaviors are driven by multiple intents, and explicitly modeling them is beneficial.
  • Optimal Number of Intents:
    • On Amazon-Book, performance generally improves as $|\mathcal{P}|$ increases from 1 to 4, but then slightly degrades at $|\mathcal{P}| = 8$. This suggests an optimal number of intents beyond which adding more may introduce redundancy or make individual intents too fine-grained to be useful, despite independence modeling.
    • On Last-FM, increasing $|\mathcal{P}|$ to 8 continues to improve accuracy. The paper attributes this difference to the characteristics of the KGs. Last-FM has fewer KG relations (9), while Amazon-Book's KG (from Freebase) may contain more noisy or less relevant relations for user behaviors. A smaller, more focused KG might benefit from more fine-grained intent distinctions.

6.2.3.2. Impact of Independence Modeling

An ablation study is performed by disabling the independence modeling module (KGIN-3_w/oInd) and comparing its distance correlation coefficients with the full KGIN-3.

The following are the results from Table 5 of the original paper:

| | Amazon-Book w/ Ind | Amazon-Book w/o Ind | Last-FM w/ Ind | Last-FM w/o Ind | Alibaba-iFashion w/ Ind | Alibaba-iFashion w/o Ind |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| distance correlation | 0.0389 | 0.3490 | 0.0365 | 0.4944 | 0.0112 | 0.3121 |

Analysis:

  • Ensuring Distinct Intents: The results clearly show that KGIN with independence modeling (w/Ind) achieves significantly lower distance correlation coefficients compared to KGIN-3_w/oInd. This demonstrates that the independence modeling module successfully encourages the intent embeddings to be less correlated and more distinct.
  • Interpretability and Capacity: While KGIN-3_w/oInd might achieve comparable recommendation performance (as the paper implies), its intents are more correlated, making them less distinct and thus harder to interpret. The independence modeling is crucial for both interpretability (ensuring each intent captures a unique aspect) and model capacity (avoiding redundant representations).

6.3. Explainability of KGIN (RQ3)

One of KGIN's key strengths is its ability to provide interpretable explanations. This is achieved by:

  1. Inducing Intents: KGIN learns intents as attentive combinations of KG relations, making their semantics explicit.

  2. Instance-wise Explanations: For a specific user-item interaction, KGIN identifies the most influential intent and relational paths based on attention scores, offering a personalized explanation.

    The following figure (Figure 5 from the original paper) provides examples of intent and interaction explanations:

    Figure 5: Explanations for recommended items. The left table lists the top two KG relations and their attention scores for each intent, and the right diagram shows the relational paths and weights connecting a user to a recommended item, highlighting the most influential intent.

    Analysis of Interpretability Examples (Figure 5):

  • Intent Semantics (Left Table): The table shows the top two KG relations and their attention scores for each intent (p1, p2, p3, p4) on Last-FM and Amazon-Book.

    • Amazon-Book Example:
      • P1 is heavily weighted by theater.play.genre (0.4945) and theater.plays.in-this-genre (0.3569). This suggests P1 captures a user's interest in genre and related plays.
      • P3 is weighted by date-of-the-first-performance (0.147) and fictional-universe (0.115). This indicates P3 relates to historical context or specific fictional worlds.
    • Last-FM Example:
      • P1 emphasizes featured_artist (0.7616) and versions (0.1794). This intent likely represents a preference for specific artist's versions of music.
      • The paper notes that in Last-FM, where there are only 9 relations, some relations like version might get high weights in multiple intents. This suggests that these are common factors influencing user behaviors, but their combination with other relations (e.g., featured_artist) defines a specific intent.
    • The independence modeling helps ensure that these intents, even if sharing some common relations, have distinct overall distributions, providing unique angles for explaining user behaviors.
  • Instance-wise Explanations (Right Diagram): The diagram shows an example interaction for user $u_{231}$ and item $i_{21904}$ on Last-FM.

    • KGIN identifies intent P1 as the most influential intent for this specific interaction, based on the attention scores $\beta(u, p)$ (from Equation (8)).

    • The explanation derived is: "User u231u231 selects music i21904i21904 since it matches her interest on the featured artist and certain version." This explanation is directly interpretable by users because it connects the recommendation to specific KG relations (featured artist, version) that define the intent P1.

    • The diagram also shows relational paths (e.g., $u_{231} \xrightarrow{P_1} i_{21904}$, connecting to entities like $v_{61367}$ and $v_{78158}$ via relations like featured_artist and versions), further grounding the explanation in the KG structure.

      This ability to articulate why a recommendation is made, by identifying the underlying intents and the KG relations that compose them, is a significant step towards more transparent and trustworthy recommender systems.

7. Conclusion & Reflections

7.1. Conclusion Summary

This work makes significant advancements in knowledge-aware recommendation, particularly within the Graph Neural Network (GNN) paradigm. The authors successfully identified and addressed two critical limitations of existing GNN-based methods: their coarse-grained relational modeling and their failure to explicitly leverage relation dependencies in long-range connectivity.

The core contribution is the Knowledge Graph-based Intent Network (KGIN), which introduces a novel approach to relational modeling from two key dimensions:

  1. User Intent Modeling: KGIN innovatively uncovers user-item relationships at a fine-grained granularity of intents. Each intent is semantically grounded by being expressed as an attentive combination of KG relations. An independence constraint is incorporated to ensure that these intents are distinct, enhancing both the model's capacity and its interpretability.

  2. Relational Path-aware Aggregation: A new GNN aggregation scheme is proposed that recursively integrates relation sequences from multi-hop paths. This mechanism effectively preserves the holistic semantics of relational paths and their dependencies, enriching the learned user and item representations.

    Extensive experiments on three benchmark datasets demonstrated KGIN's superior performance over state-of-the-art baselines. Crucially, KGIN also provides interpretable explanations by identifying the influential intents and relational paths behind recommendations, which is a major step towards more transparent recommender systems.

7.2. Limitations & Future Work

The authors acknowledge several limitations and propose future research directions:

  1. Sparsity of Supervision: Current KG-based recommendation models, including KGIN, frame the problem as a supervised task primarily relying on historical interactions. This supervision signal can be very sparse, potentially hindering the learning of high-quality representations.

    • Future Work: Explore self-supervised learning in recommendation. This would involve generating auxiliary supervisions through self-supervised tasks to uncover internal relationships among data instances, which could mitigate the sparsity problem.
  2. Biases in Recommendation: The paper suggests that knowledge-aware recommendation could benefit from explicitly addressing biases.

    • Future Work: Introduce causal concepts into knowledge-aware recommendation. This includes causal effect inference, counterfactual reasoning, and deconfounding techniques to discover, amplify, and mitigate biases present in the data and model.
  3. Intent Granularity: While the paper explored the number of intents, it noted that for some datasets, too many intents might impair accuracy (e.g., Amazon-Book with 8 intents vs. 4).

    • Future Work: Further explore the optimal granularity of user intents and potentially dynamic ways to determine it.

7.3. Personal Insights & Critique

KGIN presents a compelling and elegant solution to critical challenges in knowledge-aware recommendation. The explicit modeling of user intents as attentive combinations of KG relations is a significant conceptual leap, offering both performance gains and a clear pathway to interpretability. Previous GNN-based models often treat the black box of "why" a recommendation is made as something to be inferred post-hoc, but KGIN builds it into its core architecture.

The relational path-aware aggregation is also a powerful innovation. By explicitly integrating relation sequences into node representations, it moves beyond simply leveraging the KG as a source of additional features or weighted edges. This deeper semantic understanding of paths is critical for truly harnessing the richness of KGs. The analytical form of $\mathbf{e}_i^{(l)}$ (Equation (11)) nicely demonstrates how relation embeddings interact along paths, providing a strong theoretical grounding for this approach.

Potential Issues/Areas for Improvement:

  1. Scalability of Independence Modeling: While effective, the independence loss using distance correlation involves calculating correlations between all pairs of intents, which scales quadratically with the number of intents, $O(|\mathcal{P}|^2)$. For a very large number of intents, this could become computationally intensive. The mutual information based loss also involves pairwise comparisons.
  2. Defining Intents from Relations: The approach of defining intents as combinations of KG relations is powerful, but the trainable weights $w_{rp}$ (Equation (2)) might still be somewhat abstract. Future work could explore more constrained or prior-driven ways to define intents, perhaps by clustering KG relations or leveraging natural language descriptions of common user behaviors to initialize intent semantics.
  3. Hyperparameter Sensitivity: The model has several hyperparameters ($\lambda_1$, $\lambda_2$, $|\mathcal{P}|$, $L$). Optimal settings vary across datasets (e.g., the optimal $|\mathcal{P}|$ for Amazon-Book vs. Last-FM), suggesting that KGIN can be sensitive to hyperparameter tuning, which is common in complex GNN models.
  4. Complexity of Path Interpretation: While KGIN identifies influential intents and relational paths, presenting this information to end-users in a digestible and actionable way remains a challenge for real-world deployment. How to summarize a multi-hop relational path and its intent for a non-technical user is an open problem in explainable AI.

Transferability and Broader Applications: The core ideas of intent modeling and relational path-aware aggregation are highly transferable beyond recommendation systems:

  • Knowledge Graph Reasoning: The relational path-aware aggregation could be applied to more general KG reasoning tasks, such as KG completion or question answering over KGs, where understanding the sequence of relations is crucial.

  • Explainable AI (XAI): The approach to defining and enforcing independence among interpretable latent factors (intents) could be adapted to other XAI domains where disentangling underlying reasons for model decisions is important.

  • Personalized Content Generation: Understanding fine-grained user intents could inform personalized content generation (e.g., generating product descriptions that highlight aspects relevant to a user's intent).

    Overall, KGIN is a significant contribution that pushes the boundaries of knowledge-aware recommendation by integrating interpretable latent factors (intents) with a more semantically rich GNN aggregation mechanism. Its focus on both performance and explainability makes it a highly relevant and impactful work.
