- First, the three RQ codebook embeddings are fused into a single representation, $I_{emb}^{RQ}$:
  $$I_{emb}^{RQ} = \mathrm{Fuse}(RQ_1, RQ_2, RQ_3)$$
  The $\mathrm{Fuse}$ function could be concatenation followed by a linear layer, or simple averaging.
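A minimal PyTorch sketch of this step, assuming the names, embedding dimension, and the choice of concatenation + linear projection (averaging shown as an alternative) since the source leaves the exact Fuse function open:

```python
import torch
import torch.nn as nn

class RQFusion(nn.Module):
    """Fuse the three RQ codebook embeddings into a single vector.

    Hypothetical sketch: concatenation followed by a linear projection;
    simple averaging is a parameter-free alternative. Dimensions are assumed.
    """
    def __init__(self, emb_dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(3 * emb_dim, emb_dim)

    def forward(self, rq1: torch.Tensor, rq2: torch.Tensor, rq3: torch.Tensor) -> torch.Tensor:
        # Concatenate the three codebook embeddings along the feature axis,
        # then project back to the original embedding dimension.
        fused = self.proj(torch.cat([rq1, rq2, rq3], dim=-1))
        # Alternative Fuse: simple averaging, which needs no extra parameters.
        # fused = (rq1 + rq2 + rq3) / 3.0
        return fused
```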
- An adaptive transfer gate $T_g$ is computed by a DNN from context features; it determines how to mix the ID and RQ embeddings:
  $$T_g = \mathrm{DNN}(C, U_p, X)$$
  The intuition is that for warm items the model should rely more on the rich collaborative signal in $I_{emb}^{id}$, while for cold items it should rely more on the generalizable semantic information in $I_{emb}^{RQ}$.
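A hedged sketch of the gate network, assuming $C$, $U_p$, and $X$ are dense context, user, and item feature vectors and that the gate is a scalar squashed by a sigmoid; the hidden size is an arbitrary choice:

```python
import torch
import torch.nn as nn

class TransferGate(nn.Module):
    """Adaptive transfer gate T_g = DNN(C, U_p, X).

    Hypothetical sketch: the input features are treated as dense vectors;
    the MLP width and sigmoid output are assumptions, not specified in the text.
    """
    def __init__(self, ctx_dim: int, user_dim: int, item_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(ctx_dim + user_dim + item_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),  # keeps T_g in (0, 1) so it can act as a mixing weight
        )

    def forward(self, c: torch.Tensor, u_p: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # Output shape: (batch, 1), broadcastable over the embedding dimension.
        return self.mlp(torch.cat([c, u_p, x], dim=-1))
```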
- The two embeddings are fused to create a coarse-grained representation $I_{emb}^{c}$:
  $$I_{emb}^{c} = T_g \cdot I_{emb}^{id} + (1 - T_g) \cdot I_{emb}^{RQ}$$
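In code this is a single gated interpolation; the variable names below are illustrative, carried over from the sketches above:

```python
# t_g: (batch, 1) gate from the DNN; broadcasts over the embedding dimension.
i_emb_c = t_g * i_emb_id + (1.0 - t_g) * i_emb_rq
```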
- This transfer is guided by a directional KL-divergence loss, $L_{trans}$, which enforces asymmetric knowledge transfer (a code sketch follows this list):
  $$L_{trans} = T_g \cdot \mathrm{KL}\!\left(\mathrm{sg}(I_{emb}^{id}),\, I_{emb}^{RQ}\right) + (1 - T_g) \cdot \mathrm{KL}\!\left(I_{emb}^{id},\, \mathrm{sg}(I_{emb}^{RQ})\right)$$
- Symbols & Intuition:
  - $\mathrm{KL}(P, Q)$: the Kullback-Leibler divergence, measuring how distribution $Q$ differs from a reference distribution $P$.
  - $\mathrm{sg}(\cdot)$: the stop-gradient operator. It detaches a variable from the computation graph so no gradients flow back through it, effectively treating the variable as a fixed target.
  - For warm items (where $T_g$ is large), the first term dominates. It pushes the RQ embedding $I_{emb}^{RQ}$ to match the distribution of the ID embedding $I_{emb}^{id}$, thereby transferring collaborative knowledge into the semantic representation.
  - For cold items (where $T_g$ is small), the second term dominates. It pushes the ID embedding $I_{emb}^{id}$ to match the distribution of the RQ embedding $I_{emb}^{RQ}$, thereby injecting semantic knowledge into the sparse ID representation.
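A hedged PyTorch sketch of $L_{trans}$. The source does not say how the embeddings are converted into distributions, so this version assumes a temperature-$\tau$ softmax over the embedding dimensions; `detach()` stands in for the stop-gradient operator:

```python
import torch
import torch.nn.functional as F

def transfer_loss(i_emb_id: torch.Tensor,
                  i_emb_rq: torch.Tensor,
                  t_g: torch.Tensor,
                  tau: float = 1.0) -> torch.Tensor:
    """Directional KL transfer loss L_trans (hypothetical sketch).

    Assumption: each embedding is mapped to a distribution with a
    temperature-tau softmax; the actual distributional form is not
    specified in the text. detach() implements sg(.).
    """
    log_p_id = F.log_softmax(i_emb_id / tau, dim=-1)
    log_p_rq = F.log_softmax(i_emb_rq / tau, dim=-1)
    p_id, p_rq = log_p_id.exp(), log_p_rq.exp()

    # Term 1: KL(sg(id), rq). The id distribution is detached, so gradients
    # only reach the RQ embedding, pulling it toward the id embedding.
    kl_to_rq = F.kl_div(log_p_rq, p_id.detach(), reduction="none").sum(dim=-1)

    # Term 2: KL(id, sg(rq)). The RQ distribution is detached, so gradients
    # only reach the id embedding, pulling it toward the RQ embedding.
    kl_to_id = F.kl_div(log_p_id, p_rq.detach(), reduction="none").sum(dim=-1)

    gate = t_g.squeeze(-1)  # (batch,)
    # Warm items (large T_g) weight term 1; cold items (small T_g) weight term 2.
    return (gate * kl_to_rq + (1.0 - gate) * kl_to_id).mean()
```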