Synergistic Integration and Discrepancy Resolution of Contextualized Knowledge for Personalized Recommendation
TL;DR Summary
CoCo integrates semantic and behavioral data via adaptive fusion and discrepancy resolution, forming dynamic user-specific contextual embeddings. It outperforms seven state-of-the-art methods, enhancing recommendation accuracy and boosting sales in real-world e-commerce systems.
Abstract
The integration of large language models (LLMs) into recommendation systems has revealed promising potential through their capacity to extract world knowledge for enhanced reasoning capabilities. However, current methodologies that adopt static schema-based prompting mechanisms encounter significant limitations: (1) they employ universal template structures that neglect the multi-faceted nature of user preference diversity; (2) they implement superficial alignment between semantic knowledge representations and behavioral feature spaces without achieving comprehensive latent space integration. To address these challenges, we introduce CoCo, an end-to-end framework that dynamically constructs user-specific contextual knowledge embeddings through a dual-mechanism approach. Our method realizes profound integration of semantic and behavioral latent dimensions via adaptive knowledge fusion and contradiction resolution modules. Experimental evaluations across diverse benchmark datasets and an enterprise-level e-commerce platform demonstrate CoCo's superiority, achieving a maximum 8.58% improvement over seven cutting-edge methods in recommendation accuracy. The framework's deployment on a production advertising system resulted in a 1.91% sales growth, validating its practical effectiveness. With its modular design and model-agnostic architecture, CoCo provides a versatile solution for next-generation recommendation systems requiring both knowledge-enhanced reasoning and personalized adaptation.
1. Bibliographic Information
- Title: Synergistic Integration and Discrepancy Resolution of Contextualized Knowledge for Personalized Recommendation
- Authors: Lingyu Mu, Hao Deng, Haibo Xing, Kaican Lin, Zhitong Zhu, Yu Zhang, Xiaoyi Zeng, Zhengxiao Liu, Zheng Lin, and Jinxin Hu.
- Affiliations: The authors are from the Institute of Information Engineering at the Chinese Academy of Sciences and the Alibaba International Digital Commerce Group. This collaboration between a prominent academic research institution and a major e-commerce company suggests the work is grounded in both rigorous theory and practical, large-scale application.
- Journal/Conference: The paper is formatted for an ACM conference proceeding, though the specific venue is not named. The provided DOI is a placeholder (XXXXXXX.XXXXXXX), indicating it is likely under review or in preparation for submission.
- Publication Year: The paper's ACM reference format lists the year as 2026, and its online A/B test is dated September 2025. The arXiv identifier (2510.14257) corresponds to an October 2025 submission, so the 2026 date is most likely a placeholder for the anticipated proceedings year, and the paper is a recent preprint.
- Abstract: The abstract identifies two key limitations in current Large Language Model (LLM)-based recommendation systems: (1) they use static, universal prompt templates that ignore user diversity, and (2) they only achieve superficial alignment between LLM-generated semantic knowledge and user behavioral data. To solve this, the authors propose CoCo, an end-to-end framework. CoCo dynamically creates user-specific knowledge embeddings using adaptive knowledge fusion and contradiction resolution modules. Experiments on benchmark and enterprise datasets show CoCo outperforms seven state-of-the-art methods by up to 8.58%. Its deployment on a live advertising system led to a 1.91% sales growth, confirming its real-world effectiveness.
- Original Source Link: The paper is available as a preprint on arXiv: https://arxiv.org/abs/2510.14257. As a preprint, it has not yet completed a formal peer-review process for a journal or conference.
2. Executive Summary
- Background & Motivation (Why):
  - Core Problem: Traditional recommender systems (RSs) struggle with issues like recommending items for new users (cold-start) or for niche items with little interaction data (long-tail). While Large Language Models (LLMs) can help by providing "world knowledge," current integration methods are flawed.
  - Identified Gaps: The paper argues that existing LLM-enhanced recommenders suffer from two major weaknesses:
    - Static Prompting: They use a single, fixed prompt template for all users. This "one-size-fits-all" approach fails to capture the diverse and multi-faceted nature of individual user preferences.
    - Superficial Integration: They treat the LLM as a black-box knowledge generator and simply concatenate its output with the recommender's features. This fails to resolve fundamental discrepancies between the LLM's "semantic space" (based on language) and the RS's "behavioral space" (based on clicks, purchases, etc.), which can introduce noise and degrade performance.
  - Innovation: The paper introduces CoCo, a framework that tackles these issues head-on. Its innovation lies in a dual-mechanism approach that enables a dynamic, bidirectional relationship between the LLM and the RS. It personalizes knowledge generation for each user and actively fine-tunes the LLM to align its understanding with user behavior, moving beyond static, one-way information flow.
- Main Contributions / Findings (What):
  - Systematic Analysis: The paper first empirically demonstrates that (i) the value of LLM-generated knowledge heavily depends on how well the prompt aligns with a user's specific characteristics, and (ii) not all knowledge generated by an LLM is beneficial; some can be noisy and harmful.
  - Novel Framework (CoCo): The core contribution is an end-to-end framework named CoCo (Collaboration-Contradiction fusion) that deeply integrates LLMs and RSs.
    - Collaboration Enhancement: A module that dynamically generates personalized prompts for each user by selecting from a learnable pool of "soft prompts." This ensures the LLM's knowledge output is tailored to the individual.
    - Contradiction Elimination: A novel module that evaluates whether the LLM's generated knowledge is actually helping the recommendation. If it is not, the framework uses a parameter-efficient fine-tuning technique (LoRA) to adjust the LLM, progressively aligning its semantic space with the RS's behavioral space.
  - Robust Performance: CoCo significantly outperforms seven cutting-edge baselines across multiple datasets and five different recommender architectures, with up to an 8.58% improvement in accuracy.
  - Proven Practical Value: An online A/B test on a large-scale commercial advertising platform resulted in a 1.91% increase in sales and a 0.64% increase in Gross Merchandise Volume (GMV), validating its real-world business impact.
3. Prerequisite Knowledge & Related Work
- Foundational Concepts:
  - Recommender Systems (RSs): These are algorithms designed to filter information and predict the "rating" or "preference" a user would give to an item. They are crucial for platforms like Amazon, Netflix, and Spotify to handle information overload. A key task is sequential recommendation, where the goal is to predict the next item a user will interact with based on their history.
  - Large Language Models (LLMs): These are massive neural networks (e.g., GPT-3, Qwen) trained on internet-scale text data. Their key strength is "world knowledge": a vast, implicit understanding of facts, concepts, and common sense reasoning, stored within their parameters.
  - LLM-based Recommender Systems (LRSs): This is an emerging paradigm that aims to enhance RSs by leveraging the world knowledge and reasoning abilities of LLMs.
  - Prompting: A prompt is the input text given to an LLM to instruct it on what task to perform.
    - Structured Prompts: Fixed text templates where user/item data is inserted. They are easy to create but rigid.
    - Soft Prompts: Learnable vectors (not human-readable text) that are optimized alongside the main model. They offer more flexibility and can capture nuances that are hard to express in words.
  - Latent Space: An abstract, multi-dimensional space where data points (like users and items) are represented as vectors. A core challenge is bridging the gap between the two spaces discussed in this paper:
    - Behavioral Space: A latent space created by the RS, where representations are learned from user interaction patterns (clicks, purchases).
    - Semantic Space: A latent space created by the LLM, where representations are based on the meaning and context of text.
  - Vector Quantization (VQ): A technique that maps continuous vectors to a finite set of "codebook" vectors. In CoCo, it is used to select the most relevant soft prompts for a user from a discrete candidate pool.
  - LoRA (Low-Rank Adaptation): A parameter-efficient fine-tuning (PEFT) method. Instead of retraining an entire multi-billion parameter LLM, LoRA freezes the original weights and injects small, trainable "low-rank" matrices into its layers. This allows the model to adapt to a new task at low computational cost while largely preserving its original knowledge.
  - Contrastive Loss (InfoNCE): A loss function that trains a model to pull representations of "positive" pairs (e.g., a user and an item they liked) closer together in the latent space, while pushing them away from "negative" pairs (items they didn't interact with).
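
To make the LoRA idea above concrete, here is a minimal NumPy sketch of a single linear layer with a low-rank update. The function name `lora_forward`, the shapes, and the `alpha / r` scaling follow the common LoRA formulation and are illustrative assumptions; none of it is taken from the CoCo implementation.

```python
import numpy as np

def lora_forward(x, w_frozen, a, b, alpha):
    """Linear layer with a LoRA update: y = x W^T + (alpha / r) * x A^T B^T.

    w_frozen: (d_out, d_in) pretrained weight, never updated.
    a: (r, d_in) and b: (d_out, r) are the only trainable matrices.
    """
    r = a.shape[0]
    return x @ w_frozen.T + (alpha / r) * (x @ a.T) @ b.T

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4                   # hypothetical sizes
w = rng.normal(size=(d_out, d_in))
a = rng.normal(scale=0.01, size=(r, d_in))   # small random init
b = np.zeros((d_out, r))                     # zero init: the update starts as a no-op
x = rng.normal(size=(2, d_in))

y = lora_forward(x, w, a, b, alpha=8.0)
trainable = a.size + b.size                  # r * (d_in + d_out)
frozen = w.size                              # d_in * d_out
```

With these toy sizes the trainable matrices hold 512 values against 4096 frozen ones, which is the sense in which the method is parameter-efficient.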
- Previous Works & Technological Evolution:
  - Traditional RSs: Models like SASRec and BERT4Rec rely solely on user-item interaction data, making them vulnerable to data sparsity.
  - LLMs as Feature Encoders: Early LRSs like UniSRec and MoRec used LLMs to convert item text or images into rich feature embeddings. Limitation: This treats the LLM as a static feature extractor and doesn't leverage its powerful reasoning capabilities.
  - Prompt-based Two-Stage Fusion: More recent methods like KAR and R4ec use prompts to ask the LLM for knowledge or reasoning about a user's preferences. This knowledge is then fused with RS features in a second stage. Limitations (addressed by CoCo): They use static, universal prompts and perform a shallow fusion of features, failing to resolve the underlying mismatch between the semantic and behavioral spaces.
- Differentiation: CoCo stands out from prior work in two main ways:
  - Dynamic & Personalized Prompts: Unlike the static templates in KAR and R4ec, CoCo uses a VQ-based mechanism to dynamically select a personalized set of soft prompts for each user, tailoring the knowledge generation process.
  - Deep & Adaptive Fusion: Instead of just concatenating features, CoCo introduces a "contradiction elimination" mechanism. It actively checks if the LLM's knowledge is beneficial and, if not, fine-tunes the LLM itself via LoRA. This creates a feedback loop that deeply aligns the LLM's semantic space with the RS's behavioral space over time.
4. Methodology (Core Technology & Implementation)
The paper's proposed framework, CoCo, is designed to create a synergistic and adaptive loop between an LLM and an RS. It operates through two main phases: Collaboration Enhancement and Contradiction Elimination.

4.1 Overview of CoCo
As the paper's overview figure illustrates, CoCo integrates an RS and an LLM. The collaboration phase focuses on generating personalized semantic knowledge. The contradiction phase focuses on resolving discrepancies between the LLM's knowledge and the user's actual behavior by fine-tuning the LLM.
4.2 Collaboration Enhancement
This phase aims to generate high-quality, personalized semantic knowledge from the LLM.
4.2.1 Personalized Prompt Generation based on Vector Quantization
- Goal: Move beyond static prompts to generate prompts tailored to each user.
- Method:
  - Soft Prompt Pool: Instead of hand-crafting text prompts, the framework creates a candidate pool (or "codebook") of "soft prompts." Each soft prompt is a learnable vector that can capture complex semantic concepts.
  - User-to-Prompt Matching: For a given user, their features are encoded into a user representation vector. The framework then computes the cosine similarity between this user vector and every soft prompt vector in the codebook, producing one similarity score per prompt.
  - Prompt Selection: A subset of prompts is selected for the user: every prompt whose similarity score is above a certain threshold.
  - LLM Input Formulation: The selected personalized prompts, along with a shared general prompt and text from the user's interaction history (e.g., item titles), are concatenated to form the final input for the LLM.
  - Training the Prompt Pool: The soft prompt codebook is trained using a Vector Quantization loss, which encourages the user vector to be close to the selected prompt vectors. The quantization loss is computed against the selected codebook vector (prompt) that is closest to the user vector.
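
The matching and selection steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the names `select_prompts`, `user_vec`, `prompt_pool` and `threshold` are hypothetical, the strict `>` comparison is a guess, and the toy vectors are not from the paper.

```python
import numpy as np

def select_prompts(user_vec, prompt_pool, threshold):
    """Score every soft prompt by cosine similarity and keep those above threshold.

    user_vec: (d,) user representation; prompt_pool: (n, d) soft prompt codebook.
    Returns the selected prompt indices and the full score vector.
    """
    u = user_vec / np.linalg.norm(user_vec)
    p = prompt_pool / np.linalg.norm(prompt_pool, axis=1, keepdims=True)
    scores = p @ u                                  # cosine similarity per prompt
    selected = np.flatnonzero(scores > threshold)   # threshold-based selection
    return selected, scores

# Toy example: three 2-d prompts, one user vector.
pool = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
user = np.array([1.0, 0.0])
selected, scores = select_prompts(user, pool, threshold=0.5)
```

Because selection is threshold-based rather than top-k, different users can receive different numbers of prompts, which is what makes the resulting LLM input user-specific.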
4.2.2 Semantic Decoupling based on Orthogonal Constraint
- Goal: When multiple prompts are fed to the LLM at once, its output is a mixture of different semantic ideas. This step aims to disentangle this mixed output.
- Method: A cross-attention mechanism is used. The personalized prompts act as "queries" to "pull out" the relevant information from the LLM's mixed output.
  - Query: The user's personalized prompt matrix.
  - Key and Value: The mixed semantic output from the LLM.
  - The result is a set of "decoupled" semantic knowledge vectors, each corresponding to one of the input prompts.
- Promoting Diversity: To prevent the soft prompts in the pool from becoming redundant, an orthogonal constraint loss is added. This loss penalizes high similarity between different prompts, encouraging them to be semantically diverse.
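
As a rough illustration of the decoupling step and the diversity penalty, the sketch below uses single-head attention without learned projections and a squared off-diagonal cosine penalty. Both choices are simplifying assumptions, as are the function names `decouple` and `orthogonality_penalty`; the paper's exact attention layout and loss form may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decouple(prompts, llm_out):
    """Cross-attention with prompts (k, d) as queries and the LLM output (t, d) as keys and values.

    Returns one semantic vector per prompt, shape (k, d).
    """
    d = prompts.shape[1]
    attn = softmax(prompts @ llm_out.T / np.sqrt(d), axis=-1)   # (k, t) attention weights
    return attn @ llm_out

def orthogonality_penalty(pool):
    """Sum of squared off-diagonal cosine similarities between soft prompts."""
    p = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    gram = p @ p.T                             # diagonal is 1 after normalization
    off_diag = gram - np.eye(len(pool))
    return float(np.sum(off_diag ** 2))

rng = np.random.default_rng(0)
decoupled = decouple(rng.normal(size=(2, 4)), rng.normal(size=(5, 4)))
```

The penalty is zero for mutually orthogonal prompts and grows as prompts become more similar, which matches the stated goal of keeping the pool semantically diverse.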
4.2.3 Semantic-Behavioral Alignment
- Goal: Fuse the pure semantic knowledge with the user's behavioral representation from the recommender system.
- Method: Another cross-attention module is used, where the behavioral representation attends to a combined representation of both semantic and behavioral information. This allows the model to selectively focus on the most relevant knowledge from the LLM for the final prediction.
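
One possible reading of this alignment step is sketched below: the behavioral vector is the query, and the keys and values are the decoupled semantic vectors stacked with the behavioral vector itself. The function name `align`, the absence of learned projections, and the stacking order are assumptions for illustration only.

```python
import numpy as np

def align(behavior, semantic):
    """Fuse a behavioral vector (d,) with semantic vectors (k, d) via attention.

    Returns the fused vector (d,) and the attention weights (k + 1,).
    """
    memory = np.vstack([semantic, behavior[None, :]])         # combined representation
    logits = memory @ behavior / np.sqrt(behavior.shape[0])   # scaled dot-product scores
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ memory, weights

behavior = np.array([1.0, 0.0])
semantic = np.array([[0.0, 1.0], [1.0, 0.0]])
fused, weights = align(behavior, semantic)
```

In this toy case the semantic vector that points in the same direction as the behavioral vector receives more weight than the orthogonal one, which is the selective focus the text describes.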
4.3 Contradiction Elimination
This is the most novel part of the framework, creating a feedback loop to correct the LLM.
- Goal: Address cases where the LLM's generated knowledge is unhelpful or even harmful to the recommendation task.
- Method:
  - Assess Utility: For each recommendation, the framework compares two scores: (1) the similarity between the original RS output and the target item, and (2) the similarity between the final aligned output (with LLM knowledge) and the target item.
  - Create a Decision Mask: An indicator function turns this comparison into a binary mask that records, for each instance, whether the LLM's knowledge improved the prediction.
  - Conditional Fine-Tuning (Gradient Masking): This mask is used to control the backpropagation of gradients. Gradients are blocked (stop-gradient) for instances where the LLM was helpful. For instances where the LLM was not helpful, gradients are allowed to flow back and update the LLM's parameters.
  - Parameter-Efficient Tuning with LoRA: The gradient updates are applied only to lightweight LoRA matrices added to the LLM, not the entire model. This efficiently "tunes" the LLM, nudging its semantic space to better align with the RS's behavioral space, without the massive cost of full fine-tuning and while preserving its pre-trained world knowledge.
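
The masking logic above can be illustrated without an autograd framework. The sketch below computes the per-instance decision from cosine similarities and then zeroes the gradients of instances where the LLM helped. The names `needs_update` and `mask_llm_gradients`, the use of cosine similarity, and the `<=` tie-breaking are assumptions; in a real training loop the same effect would come from a stop-gradient or detach operation.

```python
import numpy as np

def cosine(a, b):
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return np.sum(a * b, axis=-1)

def needs_update(rs_out, fused_out, target):
    """True where adding LLM knowledge did not improve similarity to the target item."""
    return cosine(fused_out, target) <= cosine(rs_out, target)

def mask_llm_gradients(per_example_grads, update_mask):
    """Keep gradients only for instances flagged for LLM updates; zero out the rest."""
    return per_example_grads * update_mask[:, None]

# Two instances: the LLM helps on the first and hurts on the second.
rs_out = np.array([[1.0, 0.0], [0.0, 1.0]])
fused_out = np.array([[1.0, 1.0], [1.0, 1.0]])
target = np.array([[0.0, 1.0], [0.0, 1.0]])

mask = needs_update(rs_out, fused_out, target)
grads = mask_llm_gradients(np.ones((2, 3)), mask)
```

Only the second instance contributes gradient here, so the LoRA matrices would be updated solely from the case where the semantic knowledge contradicted the behavioral signal.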
4.4 Overall Training Objective
The final loss function is a weighted sum of four components:
- Recommendation loss: The main recommendation loss (InfoNCE).
- Auxiliary loss: An auxiliary loss to ensure the RS representation aligns with the target item.
- Orthogonal constraint loss: The loss to encourage prompt diversity.
- Quantization loss: The loss to train the prompt codebook.
- Balancing weights: Hyperparameters to balance the contribution of each loss term.
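
For readers who want to see the objective in code, here is a small sketch of an InfoNCE term for one user and of a weighted sum over the four components. The cosine scoring, the `temperature` value, the function names, and the decision to leave the recommendation loss unweighted are all assumptions, not details confirmed by the source.

```python
import numpy as np

def info_nce(user, items, pos_index, temperature=0.1):
    """InfoNCE for one user: negative log-softmax of the positive item's score.

    user: (d,) representation; items: (n, d) candidates, one positive and the rest negatives.
    """
    u = user / np.linalg.norm(user)
    v = items / np.linalg.norm(items, axis=1, keepdims=True)
    logits = v @ u / temperature
    logits = logits - logits.max()                    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return float(-log_probs[pos_index])

def total_loss(l_rec, l_aux, l_orth, l_vq, w_aux, w_orth, w_vq):
    """Weighted sum of the four loss components (weights are hypothetical hyperparameters)."""
    return l_rec + w_aux * l_aux + w_orth * l_orth + w_vq * l_vq

user = np.array([1.0, 0.0])
items = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
loss_close = info_nce(user, items, pos_index=0)   # positive item matches the user
loss_far = info_nce(user, items, pos_index=2)     # positive item points away
```

The loss is small when the positive item is the candidate closest to the user and large when it is the farthest, which is the pull-together, push-apart behavior described for InfoNCE earlier.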
5. Experimental Setup
- Datasets:
  - Amazon Product Reviews: Two subsets, Beauty and Toys & Games, which are standard public benchmarks for recommendation research.
  - Industrial Dataset: A large, in-house dataset from a major Southeast Asian e-commerce platform, containing 18 million users and 23 million interactions. Using a real-world dataset strengthens the paper's claims of practical applicability.
- Evaluation Metrics:
  - Recall@5 (R@5):
    - Conceptual Definition: This metric measures the percentage of times the correct "next item" is found within the top 5 items recommended by the model. It evaluates whether the model can retrieve the correct item.
    - Mathematical Formula:
      $$\mathrm{Recall@}K = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \mathbb{I}\left(\mathrm{rank}_u \le K\right)$$
    - Symbol Explanation:
      - $\mathcal{U}$: The set of users in the test set.
      - $\mathbb{I}(\cdot)$: The indicator function, which is 1 if the condition is true and 0 otherwise.
      - $\mathrm{rank}_u$: The rank position of the true target item in the list of recommended items for user $u$.
      - $K$: The cutoff for the recommendation list (in this case, 5).
  - NDCG@5 (Normalized Discounted Cumulative Gain @ 5):
    - Conceptual Definition: This metric evaluates the quality of the ranking of the top 5 recommended items. It not only checks if the correct item is present but also gives more credit if it is ranked higher (e.g., at position 1 vs. position 5). It is a more fine-grained measure of ranking quality than Recall.
    - Mathematical Formula:
      $$\mathrm{NDCG@}K = \frac{\mathrm{DCG@}K}{\mathrm{IDCG@}K}, \qquad \mathrm{DCG@}K = \sum_{i=1}^{K} \frac{rel_i}{\log_2(i+1)}$$
    - Symbol Explanation:
      - $rel_i$: The relevance of the item at rank $i$. In this context, it's 1 if the item is the correct target item, and 0 otherwise.
      - $\log_2(i+1)$: A discount factor that penalizes items ranked lower in the list.
      - DCG@K: Discounted Cumulative Gain, the raw score for the predicted ranking.
      - IDCG@K: Ideal Discounted Cumulative Gain, the DCG score for the perfect ranking (i.e., the target item at rank 1). Normalizing by IDCG ensures the score is between 0 and 1.
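
Both metrics are easy to compute once each test user's target item has a rank. The sketch below assumes the setting described above, with exactly one relevant item per user and 1-based ranks, so IDCG equals 1; the function names and the toy ranks are illustrative.

```python
import math

def recall_at_k(ranks, k):
    """Fraction of users whose target item appears within the top k (ranks are 1-based)."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def ndcg_at_k(ranks, k):
    """Mean NDCG@k with a single relevant item per user.

    With one relevant item, IDCG = 1 / log2(2) = 1, so NDCG reduces to
    1 / log2(rank + 1) for hits and 0 for misses.
    """
    gains = (1.0 / math.log2(r + 1) for r in ranks if r <= k)
    return sum(gains) / len(ranks)

ranks = [1, 3, 10]            # target item ranks for three hypothetical users
r5 = recall_at_k(ranks, 5)    # two of the three users are hits
n5 = ndcg_at_k(ranks, 5)      # (1 + 1/2 + 0) / 3
```

The user with the target at rank 3 counts fully toward Recall@5 but only half toward NDCG@5, which shows why NDCG is the finer-grained of the two.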
- Baselines:
  - Backbone Models: SASRec, BERT4Rec, FDSA, and PinnerFormer are among the five backbones evaluated. These are used to show CoCo's model-agnostic nature.
  - Prompt-based Fusion Baselines: KAR and R4ec, the most direct competitors that also use prompts for knowledge fusion.
  - Other LRS Baselines:
    - Generative Models: TIGER, COBRA.
    - Knowledge-Injected Models: UniSRec, TALLRec, RecFormer.

  This comprehensive set of baselines covers various paradigms in the LRS field.
6. Results & Analysis
- Core Results (RQ1, RQ2):

Table 1: Comparison with Knowledge Fusion Baselines. This table has been transcribed from the paper's text.

| Backbone | Method | Beauty R@5 | Impr. | Beauty N@5 | Impr. | Toys R@5 | Impr. | Toys N@5 | Impr. | Industrial R@5 | Impr. | Industrial N@5 | Impr. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BERT4Rec | base | 0.0203 | - | 0.0124 | - | 0.0116 | - | 0.0069 | - | 0.0418 | - | 0.0282 | - |
| BERT4Rec | KAR | 0.0206 | +1.48% | 0.0127 | +2.42% | 0.0117 | +0.86% | 0.0070 | +1.45% | 0.0423 | +1.20% | 0.0284 | +0.71% |
| BERT4Rec | R4ec | 0.0207 | +1.97% | 0.0129 | +4.03% | 0.0118 | +1.72% | 0.0071 | +2.90% | 0.0427 | +2.15% | 0.0285 | +1.06% |
| BERT4Rec | CoCo (Ours) | 0.0213 | +4.93% | 0.0134 | +8.06% | 0.0121 | +4.31% | 0.0073 | +5.80% | 0.0439 | +5.02% | 0.0291 | +3.19% |
| PinnerFormer | base | 0.0516 | - | 0.0373 | - | 0.0585 | - | 0.0455 | - | 0.0697 | - | 0.0469 | - |
| PinnerFormer | KAR | 0.0522 | +1.16% | 0.0377 | +1.07% | 0.0595 | +1.71% | 0.0458 | +0.66% | 0.0708 | +1.58% | 0.0473 | +0.85% |
| PinnerFormer | R4ec | 0.0536 | +3.88% | 0.0383 | +2.68% | 0.0597 | +2.05% | 0.0460 | +1.10% | 0.0711 | +2.01% | 0.0475 | +1.28% |
| PinnerFormer | CoCo (Ours) | 0.0549 | +6.40% | 0.0405 | +8.58% | 0.0634 | +8.38% | 0.0476 | +4.62% | 0.0752 | +7.89% | 0.0497 | +5.97% |

(Note: The full table with all 5 backbones is in the paper; a representative subset is shown here for brevity.)

Analysis: CoCo consistently and significantly outperforms the base models and the KAR and R4ec baselines across all datasets and all five backbone architectures. For example, with the strong PinnerFormer backbone, CoCo achieves an 8.58% improvement on N@5 for the Beauty dataset, demonstrating its superiority and model-agnosticism.

Table 2: Comparison with Other LRS Paradigms. This table has been transcribed from the paper's text.

| Method | Beauty R@5 | Beauty N@5 | Toys R@5 | Toys N@5 | Industrial R@5 | Industrial N@5 |
|---|---|---|---|---|---|---|
| UniSRec | 0.0329 | 0.0248 | 0.0429 | 0.0292 | 0.0594 | 0.0385 |
| TALLRec | 0.0403 | 0.0295 | 0.0498 | 0.0327 | 0.0655 | 0.0421 |
| Recformer | 0.0439 | 0.0317 | 0.0512 | 0.0345 | 0.0640 | 0.0418 |
| Tiger | 0.0454 | 0.0321 | 0.0521 | 0.0371 | 0.0677 | 0.0446 |
| COBRA | 0.0537 | 0.0395 | 0.0619 | 0.0462 | 0.0716 | 0.0480 |
| PinnerFormer + CoCo | 0.0549 (+2.23%) | 0.0405 (+2.53%) | 0.0634 (+2.42%) | 0.0476 (+3.03%) | 0.0752 (+5.03%) | 0.0497 (+3.54%) |

Analysis: CoCo integrated with PinnerFormer surpasses all other LRS baselines, including strong generative models like COBRA. For instance, on the Industrial Dataset, it shows a 5.03% improvement in R@5 over the best baseline (COBRA). This highlights the benefit of CoCo's deep, bidirectional fusion over other LRS paradigms.

- Ablations / Parameter Sensitivity (RQ4, RQ7):
Hyperparameter Impact:
- Prompt Pool Size (Figure 5): The model's performance peaks at an intermediate pool size. A smaller pool lacks diversity, while a larger pool introduces redundancy and noise, hurting performance. This shows the importance of a moderately sized, high-quality prompt library.
- Selection Threshold (Figure 6): Performance is optimal at an intermediate threshold. A low threshold allows too many irrelevant prompts, introducing noise. A high threshold may filter out too many prompts, leaving some users with insufficient semantic guidance. This demonstrates a trade-off between knowledge richness and knowledge quality.

Ablation Study (Table 3): This table has been transcribed from the paper's text.

| Method | Beauty R@5 | Beauty N@5 | Industrial R@5 | Industrial N@5 |
|---|---|---|---|---|
| CoCo | 0.0549 | 0.0405 | 0.0752 | 0.0497 |
| CoCo w/o Soft Prompts | 0.0531 | 0.0390 | 0.0737 | 0.0487 |
| CoCo w/o Decoupling | 0.0539 | 0.0397 | 0.0738 | 0.0491 |
| CoCo w/o Contradiction Elim. | 0.0543 | 0.0400 | 0.0742 | 0.0493 |

Analysis: Removing any of the three core components of CoCo leads to a drop in performance, confirming their individual importance.
- Removing dynamic soft prompts causes the largest drop, highlighting that personalized knowledge generation is crucial.
- Removing semantic decoupling also hurts performance, showing that disentangling mixed LLM outputs is beneficial.
- Removing contradiction elimination leads to the smallest but still consistent drop on all four metrics, supporting the value of adaptively fine-tuning the LLM to align the semantic and behavioral spaces.
- Online Experiments (RQ8):

Table 4: Online A/B Test Results. This table has been transcribed from the paper's text.

| Method | Advertising Revenue | GMV | CTR |
|---|---|---|---|
| CoCo | +1.91% | +0.64% | +0.53% |

Analysis: The online A/B test provides the strongest validation. A 1.91% lift in advertising revenue and 0.64% in GMV are significant gains in a large-scale e-commerce environment. This indicates that the offline metric improvements translate into tangible business value.
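
As a sanity check on the offline results in this section, the "Impr." figures in Table 1 and the bracketed gains in Table 2 are consistent with plain relative improvement over the reference row. The helper name `relative_improvement` is hypothetical, and the assumption that the paper computes the figures this way is inferred from the numbers rather than stated in the source.

```python
def relative_improvement(base, new):
    """Relative gain of `new` over `base`, in percent."""
    return (new - base) / base * 100.0

# PinnerFormer base vs. PinnerFormer + CoCo, Beauty N@5 (Table 1).
beauty_n5 = round(relative_improvement(0.0373, 0.0405), 2)

# COBRA vs. PinnerFormer + CoCo, Industrial R@5 (Table 2).
industrial_r5 = round(relative_improvement(0.0716, 0.0752), 2)
```

The two values reproduce the reported 8.58% and 5.03%. The online lifts in Table 4 are reported directly as percentages, so they cannot be recomputed the same way.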
7. Conclusion & Reflections
- Conclusion Summary: The paper identifies and addresses critical weaknesses in existing LLM-based recommender systems. The proposed CoCo framework offers a novel solution by (1) moving from static to dynamic, personalized prompt generation, and (2) establishing a feedback loop for deep alignment of semantic and behavioral spaces through adaptive LLM fine-tuning. Its model-agnostic design, strong empirical results on both public and industrial datasets, and positive online A/B test results make it a significant contribution to the field of knowledge-enhanced recommendation.
- Limitations & Future Work: The authors propose two main directions for future research:
  - Scaling Laws: Investigating how the performance of the CoCo framework scales with even larger LLMs.
  - Fundamental Fusion Paradigms: Exploring new, even more deeply integrated architectures for LLMs and RSs.
- Personal Insights & Critique:
  - Novelty and Significance: The "Contradiction Elimination" module is the most innovative aspect of this work. It reframes the LLM from a static knowledge source into a dynamic, learnable component of the recommendation pipeline. This idea of having the RS "teach" the LLM about user behavior is a powerful concept with potential applications beyond recommendation.
  - Practicality: The use of lightweight mechanisms such as VQ for prompt selection and LoRA for fine-tuning is critical. It makes an otherwise computationally prohibitive idea (per-user adaptation of an LLM) feasible for large-scale industrial systems, as supported by the successful online deployment.
  - Potential Weaknesses:
    - Complexity: The framework has many moving parts and hyperparameters (such as the prompt pool size, the selection threshold, and the LoRA rank), which could make it complex to implement and tune for new applications.
    - Computational Overhead: While more efficient than full fine-tuning, the process still involves inference and potential gradient computation through an LLM for each training batch. A detailed analysis of the trade-off between performance gain and increased computational cost/latency would be valuable.
    - Unusual Dating: The ACM reference format lists 2026 although the arXiv ID dates the preprint to 2025. This is unconventional and may be a placeholder or a typographical error, but it does not detract from the technical merit of the work.

Overall, CoCo presents a sophisticated and well-executed solution to a key problem in modern recommender systems. Its design is thoughtful, its experiments are thorough, and its demonstrated real-world impact is compelling. It represents a clear step forward in combining the strengths of recommender systems and large language models.