Comprehending Knowledge Graphs with Large Language Models for Recommender Systems
TL;DR Summary
CoLaKG employs large language models to enrich knowledge graphs by addressing missing facts, capturing high-order connections, and preserving semantic information, significantly enhancing recommendation results on multiple real-world datasets.
Abstract
In recent years, the introduction of knowledge graphs (KGs) has significantly advanced recommender systems by facilitating the discovery of potential associations between items. However, existing methods still face several limitations. First, most KGs suffer from missing facts or limited scopes. Second, existing methods convert textual information in KGs into IDs, resulting in the loss of natural semantic connections between different items. Third, existing methods struggle to capture high-order connections in the global KG. To address these limitations, we propose a novel method called CoLaKG, which leverages large language models (LLMs) to improve KG-based recommendations. The extensive knowledge and remarkable reasoning capabilities of LLMs enable our method to supplement missing facts in KGs, and their powerful text understanding abilities allow for better utilization of semantic information. Specifically, CoLaKG extracts useful information from KGs at both local and global levels. By employing the item-centered subgraph extraction and prompt engineering, it can accurately understand the local information. In addition, through the semantic-based retrieval module, each item is enriched by related items from the entire knowledge graph, effectively harnessing global information. Furthermore, the local and global information are effectively integrated into the recommendation model through a representation fusion module and a retrieval-augmented representation learning module, respectively. Extensive experiments on four real-world datasets demonstrate the superiority of our method.
In-depth Reading
English Analysis
1. Bibliographic Information
- Title: Comprehending Knowledge Graphs with Large Language Models for Recommender Systems
- Authors: Ziqiang Cui, Yunpeng Weng, Xing Tang, Fuyuan Lyu, Dugang Liu, Xiuqiang He, and Chen Ma.
- Affiliations: The authors are affiliated with City University of Hong Kong, Tencent, McGill University & MILA, Shenzhen University, and Shenzhen Technology University. This indicates a collaboration between academic institutions and industry (Tencent), suggesting the research has both theoretical grounding and practical relevance.
- Journal/Conference: The paper is submitted to the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '25). SIGIR is a premier, top-tier international conference in the field of information retrieval, known for its rigorous peer-review process and high impact.
- Publication Year: 2025 (as per the submission details).
- Abstract: The paper proposes a novel method, CoLaKG, to enhance knowledge graph (KG)-based recommender systems using Large Language Models (LLMs). It addresses three key limitations of existing methods: (1) KGs often have missing facts; (2) converting text to IDs loses semantic meaning; and (3) capturing high-order (distant) connections in a KG is difficult. CoLaKG leverages the knowledge and reasoning of LLMs to supplement KGs and understand their semantics. It comprehends KGs at both a local level (by analyzing item-centered subgraphs via prompts) and a global level (by retrieving semantically related items from the entire KG). This information is then integrated into a recommendation model through representation fusion and a retrieval-augmented learning module. Experiments on four datasets show that CoLaKG significantly outperforms existing methods.
- Original Source Link:
- ArXiv: https://arxiv.org/abs/2410.12229v3
- PDF: https://arxiv.org/pdf/2410.12229v3.pdf
- The paper is a preprint on ArXiv, submitted for publication at SIGIR '25.
2. Executive Summary
-
Background & Motivation (Why): Recommender systems are essential for navigating information overload, but they often struggle with data sparsity (not enough user interaction data). To overcome this, many systems incorporate external knowledge using Knowledge Graphs (KGs), which connect items based on their real-world attributes (e.g., a movie's director, genre, actors). However, existing KG-based methods have critical flaws:
- Incomplete Knowledge: KGs are often manually built and incomplete. If a movie is missing its "genre" attribute, it loses a potential connection to other movies of the same genre.
- Semantic Loss: Traditional methods convert textual information (like "horror" and "thriller") into distinct numerical IDs. This process discards the rich semantic relationship between the words, treating them as completely unrelated.
- Limited Scope: These methods, often based on Graph Neural Networks (GNNs), struggle to capture connections between items that are many "hops" away from each other in the KG, a problem known as over-smoothing. The paper proposes to use Large Language Models (LLMs) to address these gaps. LLMs possess vast world knowledge and a deep understanding of text, making them ideal for "filling in the blanks" in a KG and recognizing semantic similarities that ID-based methods miss.
-
Main Contributions / Findings (What): The paper introduces CoLaKG (Comprehending Knowledge Graphs with Large Language Models), a novel framework with several key contributions:
- LLM-Powered KG Comprehension: It is the first to use LLMs to systematically comprehend KGs for recommendations by understanding both their structure and semantics. This helps mitigate the issues of missing facts and semantic loss.
- Dual Local and Global Perspective: CoLaKG analyzes KGs from two viewpoints:
- Local: It prompts an LLM with an item's immediate neighborhood (1-hop and 2-hop connections) to generate a rich, semantically aware description.
- Global: It uses the LLM-generated descriptions to find semantically similar items across the entire KG, even if they are structurally distant. This is achieved via a retrieval mechanism.
- Efficient Decoupled Architecture: The LLM-based comprehension is performed offline as a pre-processing step. The resulting semantic embeddings are then fed into a standard recommendation model. This means the LLM is not needed during the live recommendation (inference) phase, making the system efficient and practical for real-world deployment.
- Superior Performance: Extensive experiments show that CoLaKG significantly outperforms traditional, KG-based, and other LLM-based recommendation models on four diverse datasets.
3. Prerequisite Knowledge & Related Work
-
Foundational Concepts:
- Recommender Systems: Systems that predict a user's interest in an item and suggest items they are likely to enjoy. They are widely used on platforms like Netflix, Amazon, and Spotify.
- Collaborative Filtering (CF): A classic recommendation technique that works on the principle "users who liked similar items in the past will like similar items in the future." It relies solely on the user-item interaction matrix but suffers from data sparsity when there are few interactions.
- Knowledge Graph (KG): A structured representation of facts in the form of a graph. It consists of nodes (entities, e.g., "Titanic," "Leonardo DiCaprio") and edges (relations, e.g., "starred_in"). In recommendations, KGs provide rich side information about items, helping to connect them in meaningful ways (e.g., "Movie A" and "Movie B" are connected because they share the same director).
- Graph Neural Networks (GNNs): A class of neural networks designed to work with graph-structured data. They operate by passing "messages" between connected nodes and aggregating information from their neighbors. In KG-based recommendation, GNNs propagate information across the KG to learn better item and user representations. However, stacking too many GNN layers can lead to the over-smoothing problem, where all node representations become indistinguishable.
- Large Language Models (LLMs): Massive neural networks (like GPT-4 or DeepSeek) trained on vast amounts of text data. They excel at understanding natural language, reasoning, and possess extensive "world knowledge," which can be used to infer missing information or understand semantic nuances.
- Embeddings: Numerical vector representations of entities like users, items, or words. These vectors capture the semantic meaning of the entity, such that similar entities have similar vectors.
-
Previous Works: The paper positions itself within the evolution of knowledge-aware recommendation:
- Embedding-based Methods (e.g., CKE): These methods learn embeddings for entities and relations in the KG and integrate them into the recommendation model. They primarily use the KG's structure but often ignore the textual semantics.
- Path-based Methods (e.g., PER, MCRec): These methods explicitly define and use "meta-paths" (e.g., User → Item → Actor → Item) to capture high-order relationships. Their main drawback is the need for manual, domain-specific design of these paths.
- GNN-based Methods (e.g., KGAT, KGIN): These have become the standard. They automatically learn how to propagate information through the KG. However, as noted, they suffer from ID-based semantic loss and have difficulty capturing global connections. Some recent methods (KGCL, KGRec) use contrastive learning to reduce noise but still operate on the ID-based graph.
- LLMs for Recommendation (e.g., RLMRec, KAR): Recent works have started using LLMs for recommendations. Some use LLMs to generate item descriptions, while others use them as the recommender itself (zero-shot recommendation). However, these methods often don't leverage the structured, task-specific knowledge available in a KG.
-
Differentiation:
CoLaKG is distinct from prior art in a crucial way. Instead of just using the KG's structure or using an LLM in isolation, it creates a synergy: it uses the LLM to "read" and "understand" the KG.
- Compared to GNN-based methods, CoLaKG directly processes text, avoiding semantic loss, and uses its inherent knowledge to complete the KG. Its retrieval mechanism for global context bypasses the over-smoothing limitations of deep GNNs.
- Compared to other LLM-based methods, CoLaKG grounds the LLM's reasoning in a task-relevant KG, mitigating the risk of LLM "hallucinations" (generating plausible but incorrect information) and focusing its power on the specific recommendation task.
4. Methodology (Core Technology & Implementation)
The CoLaKG framework is a two-stage process, as illustrated in Figure 2.

Stage 1: KG Comprehension with LLMs (Offline)
This stage generates semantic embeddings for items and users.
-
4.1.1 Local KG Comprehension (for each item): The goal is to generate a rich semantic representation for each item using its local neighborhood in the KG.
-
Subgraph Extraction: For an item $v$, the model extracts its 1-hop and 2-hop neighbors from the KG.
- 1-hop: All triples $(v, r, e)$ where $v$ is the head entity (e.g., (Titanic, has_genre, Romance)).
- 2-hop: To avoid an explosion of nodes, it randomly samples triples connected to the 1-hop neighbors (e.g., if "James Cameron" is a 1-hop neighbor of "Titanic", a 2-hop triple might be (James Cameron, directed, Avatar)).
-
Prompt Engineering: The extracted triples are converted into natural language and formatted into a prompt for the LLM. As shown in Figure 3, the prompt instructs the LLM to act as an expert, summarize the item's attributes based on the provided 1-hop and 2-hop information, and infer any related details.
(Figure 3: The prompt template used for local KG comprehension; the example shows 1-hop and 2-hop relation information for movie recommendation, such as director and actor associations.)
-
LLM-based Generation: The LLM processes the prompt to generate a comprehensive textual summary of the item. This process can be formulated as $C_v = \mathrm{LLM}(P, D_v, D'_v)$, where:
- $C_v$: The textual comprehension generated by the LLM for item $v$.
- $P$: The system prompt/instruction.
- $D_v$: Text representation of the 1-hop triples.
- $D'_v$: Text representation of the sampled 2-hop triples.
-
Semantic Embedding: The textual summary $C_v$ is converted into a dense vector embedding $s_v$ using a pre-trained sentence embedding model (such as SimCSE).
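A minimal Python sketch of this offline comprehension step is shown below. The helper names `build_item_prompt`, `call_llm` (a stand-in for any LLM API client), and `embed_text` (a stand-in for a sentence encoder such as SimCSE) are assumptions for illustration, and the prompt wording only approximates the template shown in Figure 3.

```python
import random

def build_item_prompt(item_name, one_hop, two_hop, num_two_hop=5):
    """Render 1-hop triples and sampled 2-hop triples into a natural-language prompt.
    `one_hop` and `two_hop` are lists of (head, relation, tail) tuples."""
    sampled = random.sample(two_hop, min(num_two_hop, len(two_hop)))
    one_hop_text = "; ".join(f"({h}, {r}, {t})" for h, r, t in one_hop)
    two_hop_text = "; ".join(f"({h}, {r}, {t})" for h, r, t in sampled)
    return (
        "You are a domain expert. Based on the knowledge-graph facts below, "
        "summarize the item's attributes and infer any related details.\n"
        f"Item: {item_name}\n"
        f"1-hop facts: {one_hop_text}\n"
        f"2-hop facts: {two_hop_text}\n"
        "Summary:"
    )

def comprehend_item(item_name, one_hop, two_hop, call_llm, embed_text):
    """C_v = LLM(P, D_v, D'_v) followed by s_v = Emb(C_v).
    `call_llm` and `embed_text` are caller-supplied callables (LLM client, sentence encoder)."""
    prompt = build_item_prompt(item_name, one_hop, two_hop)
    summary = call_llm(prompt)    # offline LLM call producing the textual comprehension C_v
    return embed_text(summary)    # dense semantic embedding s_v
```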
-
4.1.2 Retrieval-based Global KG Utilization: This component captures long-range semantic relationships across the entire KG.
- Similarity Calculation: Using the semantic embeddings of all items, the model computes the cosine similarity between every pair of items. This creates a fully connected semantic item-item graph whose edge weights represent semantic similarity.
- Neighbor Retrieval: For each item $v$, the model retrieves the top-$k$ most semantically similar items, forming its neighbor set $N_k(v)$. These neighbors can be structurally distant in the original KG but are semantically close.
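A small numpy sketch of this retrieval step follows. The function name and the brute-force cosine-similarity computation (rather than an approximate nearest-neighbor index) are assumptions made here for clarity.

```python
import numpy as np

def retrieve_semantic_neighbors(item_embs: np.ndarray, k: int = 10) -> np.ndarray:
    """item_embs: (num_items, dim) matrix of semantic embeddings s_v.
    Returns a (num_items, k) index array: the top-k most similar items N_k(v)
    for every item v, excluding the item itself."""
    normed = item_embs / np.linalg.norm(item_embs, axis=1, keepdims=True)
    sim = normed @ normed.T            # pairwise cosine similarity matrix
    np.fill_diagonal(sim, -np.inf)     # an item should not retrieve itself
    return np.argsort(-sim, axis=1)[:, :k]
```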
-
4.2 User Preference Comprehension: A similar process is applied to understand user preferences.
- For a user $u$, the model gathers all items they have interacted with and their corresponding 1-hop KG information.
- This information is concatenated into a single text block and fed to the LLM with a prompt asking it to summarize the user's tastes.
- The LLM's output is converted into a user semantic embedding $s_u$.
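By analogy with the item-level step, a rough sketch of the user-side comprehension is given below. The helper names, the prompt text, and the truncation to `max_items` interactions are assumptions, not the paper's exact template.

```python
def comprehend_user(user_items, item_facts, call_llm, embed_text, max_items=30):
    """Build a user prompt from interacted items plus their 1-hop KG facts,
    then return the user semantic embedding s_u.
    `item_facts` maps an item name to a list of (head, relation, tail) triples."""
    lines = []
    for item in user_items[:max_items]:
        facts = "; ".join(f"{r}: {t}" for _, r, t in item_facts.get(item, []))
        lines.append(f"- {item} ({facts})")
    prompt = (
        "Summarize this user's preferences based on the items they interacted with "
        "and those items' attributes:\n" + "\n".join(lines)
    )
    return embed_text(call_llm(prompt))   # user semantic embedding s_u
```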
Stage 2: Retrieval-Augmented Representation Learning (Online Training)
This stage integrates the pre-computed semantic embeddings into the recommendation model.
-
4.3.1 Cross-Modal Representation Alignment: The model needs to combine the traditional ID-based embeddings with the new LLM-derived semantic embeddings ($s_v$ and $s_u$).
- Alignment: Since these embeddings come from different "modalities" and may have different dimensions, a learnable adapter network (a linear projection layer with an ELU activation function) maps the semantic embeddings into the same space as the ID embeddings, i.e., $\hat{s} = \mathrm{ELU}(W s)$.
  - $W$: The learnable weight matrices for alignment.
  - $\hat{s}$: The projected semantic embeddings.
- Fusion: The ID and projected semantic embeddings are fused using simple mean pooling to create initial user and item representations that contain both collaborative and semantic signals.
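A minimal PyTorch sketch of the adapter and the mean-pooling fusion is shown below; the class name and the embedding dimensions (768 for the semantic side, 64 for the ID side) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SemanticAdapter(nn.Module):
    """Projects frozen LLM semantic embeddings into the ID-embedding space
    (linear layer + ELU), then fuses the two views by mean pooling."""
    def __init__(self, sem_dim: int = 768, id_dim: int = 64):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(sem_dim, id_dim), nn.ELU())

    def forward(self, id_emb: torch.Tensor, sem_emb: torch.Tensor) -> torch.Tensor:
        projected = self.proj(sem_emb)      # aligned semantic embedding
        return (id_emb + projected) / 2.0   # mean-pooling fusion of ID and semantic views
```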
-
4.3.2 Item Representation Augmentation with Retrieved Neighbors: The item representations are further enhanced using the globally retrieved neighbors $N_k(v)$.
- Attention Mechanism: An attention mechanism calculates the importance of each neighbor to the central item $v$. The attention scores are computed based on their semantic embeddings, ensuring the aggregation is guided by semantic relevance.
- Aggregation: The final augmented item representation is a combination of the item's own fused embedding and a weighted average of its neighbors' embeddings.
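A sketch of this semantic-attention-guided aggregation is given below. The equal self/neighbor weighting via `alpha` and the dot-product attention form are assumptions; the paper's exact combination may differ.

```python
import torch
import torch.nn.functional as F

def augment_items(item_repr, sem_emb, neighbor_idx, alpha=0.5):
    """item_repr:    (N, d)   fused ID+semantic item representations
    sem_emb:      (N, d_s) frozen semantic embeddings s_v (drive the attention)
    neighbor_idx: (N, k)   indices of the retrieved neighbors N_k(v)"""
    neigh_sem = sem_emb[neighbor_idx]                          # (N, k, d_s)
    scores = torch.einsum("nd,nkd->nk", sem_emb, neigh_sem)    # semantic relevance scores
    attn = F.softmax(scores, dim=1)                            # (N, k) attention weights
    neigh_repr = item_repr[neighbor_idx]                       # (N, k, d)
    aggregated = torch.einsum("nk,nkd->nd", attn, neigh_repr)  # weighted neighbor average
    return alpha * item_repr + (1.0 - alpha) * aggregated
```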
-
4.4-4.5 User-Item Modeling and Training:
- Recommendation Backbone: The final user representations and augmented item representations are used as the initial embeddings for a standard recommendation model. The paper uses LightGCN for its simplicity and effectiveness. LightGCN then performs several layers of message passing on the user-item interaction graph to capture collaborative filtering signals.
- Prediction: The final prediction score is the inner product of the final user and item embeddings learned by LightGCN.
- Training: The model is trained end-to-end (except for the fixed LLM-generated embeddings) using the Bayesian Personalized Ranking (BPR) loss, a standard pairwise loss function that aims to rank observed (positive) items higher than unobserved (negative) items for a given user.
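The scoring and BPR objective on top of the backbone can be summarized by the short sketch below; the LightGCN propagation itself is omitted and the function names are illustrative, not taken from the paper's code.

```python
import torch
import torch.nn.functional as F

def score(user_emb: torch.Tensor, item_emb: torch.Tensor) -> torch.Tensor:
    """Predicted preference: inner product of the final user and item embeddings."""
    return (user_emb * item_emb).sum(dim=-1)

def bpr_loss(user_emb, pos_item_emb, neg_item_emb):
    """Bayesian Personalized Ranking: push the observed (positive) item's score
    above a sampled unobserved (negative) item's score for the same user."""
    diff = score(user_emb, pos_item_emb) - score(user_emb, neg_item_emb)
    return -F.logsigmoid(diff).mean()
```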
5. Experimental Setup
-
Datasets: Experiments were conducted on four real-world datasets from different domains to ensure generalizability. The statistics are transcribed from Table 1 below.
- MovieLens-1M: A popular benchmark for movie recommendations.
- Last-FM: A music recommendation dataset based on user listening habits.
- MIND: A large-scale news recommendation dataset.
- Fund: An industrial dataset for financial fund recommendation.
(Manual Transcription of Table 1) Table 1: Dataset statistics.
| Statistics | MovieLens | Last-FM | MIND | Funds |
| --- | --- | --- | --- | --- |
| # Users | 6,040 | 1,859 | 44,603 | 209,999 |
| # Items | 3,260 | 2,813 | 15,174 | 5,701 |
| # Interactions | 998,539 | 86,608 | 1,285,064 | 1,225,318 |
| Knowledge Graph | | | | |
| # Entities | 12,068 | 9,614 | 32,810 | 8,111 |
| # Relations | 12 | 2 | 14 | 12 |
| # Triples | 62,958 | 118,500 | 307,140 | 65,697 |
-
Evaluation Metrics: The performance of top-N recommendation is evaluated using two standard metrics, Recall@k and NDCG@k, for k = 10 and 20.
- Recall@k:
  - Conceptual Definition: This metric measures the proportion of relevant items (those the user actually interacted with in the test set) that are found in the top-k recommended items. It answers the question: "Out of all the items the user liked, how many did we manage to recommend in the top-k list?"
  - Mathematical Formula: For a single user, $\mathrm{Recall@}k = \frac{|\text{Recommended Items@}k \,\cap\, \text{Relevant Items}|}{|\text{Relevant Items}|}$. The final value is averaged over all users.
  - Symbol Explanation: Recommended Items@k is the set of top-k items suggested by the model. Relevant Items is the set of items from the test set that the user has interacted with.
- Normalized Discounted Cumulative Gain (NDCG@k):
  - Conceptual Definition: NDCG@k is a more sophisticated metric that evaluates the quality of the ranking. It rewards models for placing relevant items higher up in the top-k list: a relevant item at position 1 is more valuable than one at position 10. The score is normalized so that a perfect ranking yields a score of 1.
  - Mathematical Formula: $\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i+1)}$ and $\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}$.
  - Symbol Explanation: $i$ is the rank position; $rel_i$ is the relevance of the item at position $i$ (1 if relevant, 0 otherwise). DCG@k is the Discounted Cumulative Gain. IDCG@k is the Ideal DCG, which is the DCG of a perfect ranking where all relevant items are placed at the top.
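Both metrics can be computed per user as in the short sketch below (binary relevance, log2 position discount); the function names are illustrative, not taken from the paper.

```python
import math

def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the user's test (relevant) items that appear in the top-k recommendations."""
    hits = len(set(ranked_items[:k]) & set(relevant_items))
    return hits / max(len(relevant_items), 1)

def ndcg_at_k(ranked_items, relevant_items, k):
    """DCG of the produced ranking divided by the DCG of an ideal ranking."""
    relevant = set(relevant_items)
    dcg = sum(1.0 / math.log2(pos + 2)          # positions are 0-indexed here
              for pos, item in enumerate(ranked_items[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0
```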
-
Baselines:
CoLaKG is compared against a comprehensive set of 12 baselines from three categories:
- Classical Methods: BPR-MF, NFM, LightGCN. These rely mostly on collaborative filtering signals.
- KG-enhanced Methods: CKE, RippleNet, KGAT, KGIN, KGCL, KGRec. These represent the state-of-the-art in using KGs for recommendation.
- LLM-based Methods: RLMRec, KAR, CLLM4Rec. These represent alternative ways of leveraging LLMs in recommendation.
6. Results & Analysis
-
Core Results: The main results, transcribed from Table 2, show the performance of CoLaKG against all baselines.

(Manual Transcription of Table 2)

| Model | ML R@10 | ML N@10 | ML R@20 | ML N@20 | Last-FM R@10 | Last-FM N@10 | Last-FM R@20 | Last-FM N@20 | MIND R@10 | MIND N@10 | MIND R@20 | MIND N@20 | Funds R@10 | Funds N@10 | Funds R@20 | Funds N@20 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BPR-MF | 0.1257 | 0.3100 | 0.2048 | 0.3062 | 0.1307 | 0.1352 | 0.1971 | 0.1685 | 0.0315 | 0.0238 | 0.0537 | 0.0310 | 0.4514 | 0.3402 | 0.5806 | 0.3809 |
| NFM | 0.1346 | 0.3558 | 0.2129 | 0.3379 | 0.2246 | 0.2327 | 0.3273 | 0.2830 | 0.0495 | 0.0356 | 0.0802 | 0.0458 | 0.4388 | 0.3187 | 0.5756 | 0.3651 |
| LightGCN | 0.1598 | 0.3901 | 0.2512 | 0.3769 | 0.2589 | 0.2799 | 0.3642 | 0.3321 | 0.0624 | 0.0492 | 0.0998 | 0.0609 | 0.4992 | 0.3778 | 0.6353 | 0.4204 |
| CKE | 0.1524 | 0.3783 | 0.2373 | 0.3609 | 0.2342 | 0.2545 | 0.3266 | 0.3001 | 0.0526 | 0.0417 | 0.0822 | 0.0510 | 0.4926 | 0.3702 | 0.6294 | 0.4130 |
| RippleNet | 0.1415 | 0.3669 | 0.2201 | 0.3423 | 0.2267 | 0.2341 | 0.3248 | 0.2861 | 0.0472 | 0.0364 | 0.0785 | 0.0451 | 0.4764 | 0.3591 | 0.6124 | 0.4003 |
| KGAT | 0.1536 | 0.3782 | 0.2451 | 0.3661 | 0.2470 | 0.2595 | 0.3433 | 0.3075 | 0.0594 | 0.0456 | 0.0955 | 0.0571 | 0.5037 | 0.3751 | 0.6418 | 0.4182 |
| KGIN | 0.1631 | 0.3959 | 0.2562 | 0.3831 | 0.2562 | 0.2742 | 0.3611 | 0.3215 | 0.0640 | 0.0518 | 0.1022 | 0.0639 | 0.5079 | 0.3857 | 0.6428 | 0.4259 |
| KGCL | 0.1554 | 0.3797 | 0.2465 | 0.3677 | 0.2599 | 0.2763 | 0.3652 | 0.3284 | 0.0671 | 0.0543 | 0.1059 | 0.0670 | 0.5071 | 0.3877 | 0.6355 | 0.4273 |
| KGRec | 0.1640 | 0.3968 | 0.2571 | 0.3842 | 0.2571 | 0.2748 | 0.3617 | 0.3251 | 0.0627 | 0.0506 | 0.1003 | 0.0625 | 0.5104 | 0.3913 | 0.6467 | 0.4304 |
| RLMRec | 0.1613 | 0.3920 | 0.2524 | 0.3787 | 0.2597 | 0.2812 | 0.3651 | 0.3335 | 0.0619 | 0.0486 | 0.0990 | 0.0602 | 0.4988 | 0.3784 | 0.6351 | 0.4210 |
| KAR | 0.1582 | 0.3869 | 0.2511 | 0.3722 | 0.2532 | 0.2770 | 0.3612 | 0.3324 | 0.0615 | 0.0480 | 0.1002 | 0.0613 | 0.5033 | 0.3812 | 0.6312 | 0.4175 |
| CLLM4Rec | 0.1563 | 0.3841 | 0.2433 | 0.3637 | 0.2571 | 0.2793 | 0.3642 | 0.3268 | 0.0631 | 0.0494 | 0.1012 | 0.0628 | 0.4996 | 0.3791 | 0.6273 | 0.4103 |
| CoLaKG | 0.1699 | 0.4130 | 0.2642 | 0.3974 | 0.2738 | 0.2948 | 0.3803 | 0.3471 | 0.0698 | 0.0562 | 0.1087 | 0.0684 | 0.5273 | 0.4012 | 0.6524 | 0.4392 |

Analysis:
- Consistent Superiority: CoLaKG achieves the best performance on all metrics across all four datasets, demonstrating its effectiveness and robustness.
- Value of KGs: KG-enhanced methods like KGIN and KGRec generally outperform classical CF methods like LightGCN, confirming that KGs provide valuable side information.
- Superiority over LLM Baselines: CoLaKG significantly outperforms other LLM-based methods (RLMRec, KAR, CLLM4Rec). This suggests that CoLaKG's approach of using LLMs to deeply comprehend a task-specific KG is more effective than using LLMs in a more general way (e.g., just for generating item profiles).
-
Validation of Generalizability: Table 3 shows that the semantic information generated by CoLaKG can be plugged into various recommendation backbones (BPR-MF, NFM, LightGCN) and consistently improves their performance. This highlights that the core contribution of CoLaKG, the LLM-powered KG comprehension, is a versatile module that can enhance a wide range of models.

(Manual Transcription of Table 3)

| Model | MovieLens R@20 | MovieLens N@20 | Last-FM R@20 | Last-FM N@20 | MIND R@20 | MIND N@20 |
| --- | --- | --- | --- | --- | --- | --- |
| BPR-MF | 0.2048 | 0.3062 | 0.1971 | 0.1685 | 0.0537 | 0.0310 |
| BPR-MF+Ours | 0.2213 | 0.3255 | 0.2104 | 0.1812 | 0.0609 | 0.3986 |
| NFM | 0.2129 | 0.3379 | 0.3273 | 0.2830 | 0.0802 | 0.0458 |
| NFM+Ours | 0.2285 | 0.3527 | 0.3478 | 0.2996 | 0.0859 | 0.0487 |
| LightGCN | 0.2512 | 0.3769 | 0.3642 | 0.3321 | 0.0998 | 0.0609 |
| LightGCN+Ours | 0.2642 | 0.3974 | 0.3803 | 0.3471 | 0.1087 | 0.0684 |
Ablation Study: This study dissects the
CoLaKGmodel to verify the contribution of each component.(Manual Transcription of Table 4)
| Dataset | Metric | w/o sv | w/o su | w/o Nk(v) | w/o D'v | CoLaKG |
| --- | --- | --- | --- | --- | --- | --- |
| ML | R@20 | 0.2553 | 0.2613 | 0.2603 | 0.2628 | 0.2642 |
| ML | N@20 | 0.3811 | 0.3948 | 0.3902 | 0.3960 | 0.3974 |
| Last-FM | R@20 | 0.3628 | 0.3785 | 0.3725 | 0.3789 | 0.3803 |
| Last-FM | N@20 | 0.3278 | 0.3465 | 0.3403 | 0.3459 | 0.3471 |
| MIND | R@20 | 0.1043 | 0.1048 | 0.1064 | 0.1076 | 0.1087 |
| MIND | N@20 | 0.0640 | 0.0658 | 0.0662 | 0.0671 | 0.0684 |
| Funds | R@20 | 0.6382 | 0.6481 | 0.6455 | 0.6499 | 0.6524 |
| Funds | N@20 | 0.4247 | 0.4351 | 0.4305 | 0.4378 | 0.4392 |

Analysis:
- w/o sv (no item semantic embedding): Removing the LLM-generated item embeddings causes the largest performance drop, confirming that local KG comprehension is the most critical component.
- w/o su (no user semantic embedding): Performance also drops, showing that using the LLM to model user preferences is beneficial.
- w/o Nk(v) (no neighbor augmentation): Removing the retrieval-based global neighbor aggregation leads to a significant drop, proving the importance of capturing global semantic information beyond the local KG subgraph.
- w/o D'v (no 2-hop KG info): Removing the 2-hop relations from the prompt causes a slight performance decrease, suggesting that providing richer context (even if sampled) helps the LLM generate better summaries.
-
Hyperparameter Study: Figure 4 investigates the impact of two key hyperparameters: the number of retrieved neighbors $k$ and the number of sampled 2-hop neighbors.

Analysis:
- The performance generally improves as $k$ increases from 0, peaking at an intermediate value that depends on the dataset. This shows that retrieving global neighbors is beneficial, but retrieving too many can introduce noise and slightly degrade performance. The $k=0$ case is equivalent to the w/o $N_k(v)$ ablation and performs worst.
- The impact of the number of sampled 2-hop neighbors is also positive, showing that including more 2-hop information helps, but the gains diminish after a certain point.
7. Conclusion & Reflections
-
Conclusion Summary: The paper successfully introduces CoLaKG, a novel and effective framework that leverages LLMs to overcome fundamental limitations in KG-based recommender systems. By using LLMs to comprehend KGs at both local (subgraph analysis) and global (semantic retrieval) levels, CoLaKG can reason about missing facts, understand textual semantics, and capture long-range item relationships. Its decoupled architecture ensures efficiency, making it a powerful and practical solution. The strong empirical results across four diverse datasets validate the superiority of this approach.
-
Limitations & Future Work (Author-Stated and Inferred):
- Computational Cost: While inference is efficient, the initial offline stage of querying the LLM for every item and user can be computationally expensive and time-consuming, especially for datasets with millions of items.
- Dependence on LLM Quality: The performance of CoLaKG is inherently tied to the capabilities of the chosen LLM and text embedding model. A less powerful LLM might generate lower-quality summaries.
- Prompt Sensitivity: The design of the prompts is crucial. The current prompts may not be optimal for all types of KGs or domains, and future work could explore automated prompt engineering.
- Static Nature: The current framework processes the KG in a static, offline manner. In real-world systems where new items and users appear constantly (the "cold-start" problem), a mechanism for dynamically updating the semantic embeddings would be needed.
-
Personal Insights & Critique:
- Novelty and Impact: The core idea of using an LLM as a "reasoning engine" to interpret a symbolic knowledge base (the KG) is highly innovative and powerful. It bridges the gap between structured knowledge and the unstructured understanding of LLMs. The decoupled design is a major strength, making it immediately more practical than methods requiring LLMs at inference time.
- Transferability: This approach is highly transferable. It could be applied to any recommendation domain where items have rich textual attributes and relational data, such as e-commerce (product specifications), academic paper recommendation (authors, topics, venues), or even social network recommendations.
- Potential Improvements:
- More Sophisticated Sampling: Instead of random sampling for 2-hop neighbors, a more intelligent, attention-based sampling method could be used to select more relevant 2-hop information to include in the prompt.
- Advanced Fusion: The fusion of ID and semantic embeddings currently uses simple mean pooling. A more complex, attention-based fusion gate could learn to dynamically weigh the importance of collaborative vs. semantic information for different items or users.
- End-to-End Fine-tuning: While costly, exploring methods to fine-tune a smaller, domain-specific language model as part of the end-to-end training loop could potentially yield even better performance.