Data-to-Text Conversion (Personalized Prompts): All raw recommendation data is transformed into input-target
text pairs using a collection of predefined prompt templates. As shown in Figure 2, these templates contain placeholders {}
that are filled with specific user IDs, item IDs, interaction histories, or other metadata.
Figure 2: Data formats and text prompt templates for three recommendation tasks: (a) rating / review / explanation generation, (b) sequential recommendation, and (c) direct recommendation.
For example, a rating prediction task might use the input prompt: "user123 has rated the following items: [item45, item67, ...]. How would user123 rate item89?" The target would be the actual rating, e.g., "5". The authors designed a large collection of such prompts covering five task families (a minimal sketch of the conversion follows the list):
- Rating Prediction: Predict a score (e.g., 1-5).
- Sequential Recommendation: Predict the next item a user will interact with.
- Explanation Generation: Generate a text explaining why a user might like an item.
- Review Summarization: Summarize a long review into a short title.
- Direct Recommendation: Predict whether to recommend an item (yes/no) or rank a list of items.
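As a concrete illustration of the data-to-text step, here is a minimal sketch in Python. The template strings, field names, and the build_example helper are hypothetical stand-ins for the paper's much larger prompt collection, not the authors' actual templates.

```python
from typing import Dict

# Hypothetical prompt templates with {} placeholders (one per task family shown).
TEMPLATES = {
    "rating": (
        "user{user_id} has rated the following items: [{history}]. "
        "How would user{user_id} rate item{item_id}?"
    ),
    "sequential": (
        "Here is the interaction history of user{user_id}: [{history}]. "
        "Which item will the user interact with next?"
    ),
    "direct": "Should item{item_id} be recommended to user{user_id}? Answer yes or no.",
}

def build_example(task: str, fields: Dict[str, str], target: str) -> Dict[str, str]:
    """Fill a template's placeholders to produce one input-target text pair."""
    return {"input": TEMPLATES[task].format(**fields), "target": target}

# A rating-prediction pair whose target is the ground-truth score.
pair = build_example(
    "rating",
    {"user_id": "123", "history": "item45, item67", "item_id": "89"},
    target="5",
)
print(pair["input"])   # user123 has rated the following items: [item45, item67]. ...
print(pair["target"])  # 5
```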
Pretraining with a Unified Objective: The P5 model, based on the T5 encoder-decoder Transformer architecture, is trained on a mixture of these input-target text pairs from all tasks simultaneously.
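The mixture-of-tasks objective can be sketched as a standard sequence-to-sequence fine-tuning loop over the pooled text pairs. The sketch below uses an off-the-shelf t5-small checkpoint and two toy pairs; the learning rate, sampling scheme, and loop length are illustrative assumptions, not the paper's training configuration.

```python
import random
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Input-target pairs pooled from different task families and mixed together.
mixed_pairs = [
    ("How would user123 rate item89?", "5"),                                # rating
    ("Should item7391 be recommended to user23? Answer yes or no.", "yes"), # direct
]

model.train()
for step in range(4):  # toy loop; real pretraining iterates over the full mixture
    inp, tgt = random.choice(mixed_pairs)
    enc = tokenizer(inp, return_tensors="pt")
    labels = tokenizer(tgt, return_tensors="pt").input_ids
    # The same sequence-to-sequence cross-entropy loss applies to every task.
    loss = model(**enc, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```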
Figure 3: The P5 architecture, consisting of a bidirectional text encoder and an autoregressive text decoder. Summed embeddings (Token Emb., Position Emb., Whole-word Emb.) are fed to the encoder, and the decoder generates the recommendation output (e.g., a rating).
The model architecture (Figure 3) works as follows:
- Input Processing: The input text prompt is tokenized. The model uses three types of embeddings:
  - Token Embeddings: Standard embeddings for each sub-word token.
  - Positional Embeddings: Give the model information about the order of tokens in the sequence.
  - Whole-word Embeddings: A special embedding added to tokens that are part of a personalized field (like a user ID user23 or item ID item7391). This helps the model recognize these IDs as single, coherent entities, even if they are broken into multiple sub-word tokens (see the sketch after this list).
- Encoding: A bidirectional Transformer encoder processes the summed embeddings to create a contextualized representation of the input prompt.
- Decoding: An autoregressive Transformer decoder generates the target text token by token, attending to the encoder's output and the tokens it has already generated.
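A minimal sketch of these three steps, assuming a stock t5-small backbone: the whole-word embedding table and the word-grouping heuristic below are illustrative additions rather than the authors' exact implementation, and note that T5 injects positional information through relative position biases inside attention rather than a separate absolute position table.

```python
import torch
import torch.nn as nn
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Illustrative (untrained) whole-word embedding table: every sub-word piece of
# the same whole word (e.g. the pieces of "item7391") shares one index.
whole_word_emb = nn.Embedding(512, model.config.d_model)

prompt = "Should item7391 be recommended to user23? Answer yes or no."
enc = tokenizer(prompt, return_tensors="pt")

# Heuristic grouping: a new whole word starts at each SentencePiece boundary marker.
tokens = tokenizer.convert_ids_to_tokens(enc.input_ids[0])
word_ids, idx = [], 0
for tok in tokens:
    if tok.startswith("▁"):  # "▁" marks the start of a new word
        idx += 1
    word_ids.append(idx)
word_ids = torch.tensor([word_ids])

# Input processing: sum token embeddings with the whole-word embeddings.
token_embs = model.get_input_embeddings()(enc.input_ids)
inputs_embeds = token_embs + whole_word_emb(word_ids)

# Encoding + decoding: the encoder contextualizes the prompt, and the decoder
# generates the target text token by token.
out_ids = model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=enc.attention_mask,
    max_new_tokens=4,
)
print(tokenizer.decode(out_ids[0], skip_special_tokens=True))
```

Because the whole-word table here is randomly initialized, the generated text only demonstrates the data flow; in P5 this embedding is a learned parameter trained jointly with the rest of the model.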