- Effect of Reinforcement Learning: Comparing LREM (Cold Start+RL) with LREM (Cold Start) isolates the impact of the RL stage. RL provides a substantial boost, raising overall HitRate@6000 by 7.18% and Precision@100 by 5.23%. Figure 4 shows that the RL-tuned model produces more accurate and relevant reasoning steps, correcting errors made by the cold-start model and leading to better retrieval. For example, for "Meats That Pair Well with Brandy," the RL model correctly reasons about "Beef, Pork, Duck," whereas the cold-start model gave irrelevant suggestions. (A sketch of how the reported metrics can be computed follows the figure caption below.)
Figure 4: Comparison of the chain-of-thought (CoT) texts and corresponding retrieval results generated by LREM under the Cold Start and Cold Start+RL configurations. Three example groups are shown; in each, the left side gives the cold-start model's output and retrieval results, and the right side shows the improved behavior after reinforcement learning.
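As a point of reference, below is a minimal sketch of one common way to compute the two reported metrics. The exact definitions the paper uses (e.g., how scores are averaged across queries) are not given in this summary, so the function names and data layout are assumptions.

```python
def hit_rate_at_k(retrieved: list[str], relevant: set[str], k: int = 6000) -> float:
    # Assumed definition: fraction of a query's relevant items that
    # appear in the top-k retrieved list (the paper reports HitRate@6000).
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 100) -> float:
    # Assumed definition: fraction of the top-k retrieved items that are
    # relevant (the paper reports Precision@100).
    return len(set(retrieved[:k]) & relevant) / k
```

Per-query scores would then be averaged over the evaluation set to obtain table-level values.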
- Effect of CoT Content: This ablation quantitatively shows that the content of the reasoning chain is crucial.
(Manual transcription of Table 2 from the paper)

| Methods | HitRate@6000 | Precision@100 |
| --- | --- | --- |
| LREM | 34.78 | 68.22 |
| LREM (Empty-CoT) | 31.59 | 64.25 |
| LREM (Random-CoT) | 30.16 | 62.32 |
| LREM (Query-CoT) | 32.54 | 65.63 |
When the CoT is empty (Empty-CoT), the model degenerates into a direct-embedding method and performance drops significantly. When the reasoning slot is filled with random tokens (Random-CoT), the noise harms performance even more. Query-CoT, which by its name presumably fills the slot with the query text itself, recovers some ground over Empty-CoT but still trails the full model. This confirms that the specific, meaningful content of the CoT is what drives the performance gain; a hypothetical sketch of how these ablation inputs could be constructed follows.
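The sketch below illustrates how the four Table 2 variants might be assembled before embedding. LREM's actual prompt format, separator tokens, and tokenizer are not described in this summary, so `build_ablation_input`, the `[COT]` separator, and the query-then-reasoning layout are all illustrative assumptions.

```python
import random

def build_ablation_input(query: str, generated_cot: str, variant: str,
                         vocab: list[str], cot_len: int = 16) -> str:
    """Assemble the text embedded for each Table 2 variant.

    The "[COT]" separator and the query-then-reasoning layout are
    illustrative assumptions, not LREM's documented prompt format.
    """
    if variant == "full":      # LREM: the model-generated reasoning chain
        cot = generated_cot
    elif variant == "empty":   # Empty-CoT: reasoning slot left blank
        cot = ""
    elif variant == "random":  # Random-CoT: same-length noise tokens
        cot = " ".join(random.choices(vocab, k=cot_len))
    elif variant == "query":   # Query-CoT: the query text fills the slot
        cot = query
    else:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"{query} [COT] {cot}".strip()
```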
- Effect of CoT Length: As shown in Figure 5, performance improves when the CoT length increases from 16 to 32 tokens, as a longer chain allows for more complete reasoning. However, performance degrades with even longer chains (48 or 64 tokens), likely because excessively long keyword lists introduce noise and dilute the semantic focus. Although 32 yields the best accuracy, the final model adopts a length of 16 as a balance between retrieval performance and inference efficiency (a decoding sketch that caps the CoT length follows the caption below).
Figure 5: LREM's retrieval performance at different chain-of-thought (CoT) lengths. The vertical axes show HitRate@6000 and Precision@100, and the horizontal axis shows CoT length; performance peaks at a CoT length of 32.
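The length ablation can be approximated by capping the number of reasoning tokens at decode time. Below is a minimal sketch assuming a Hugging Face-style causal LM; the checkpoint name is a placeholder, since LREM's weights are not referenced in this summary.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint name; using LREM weights here is an assumption.
tok = AutoTokenizer.from_pretrained("my-org/lrem")
model = AutoModelForCausalLM.from_pretrained("my-org/lrem")

def generate_cot(query: str, cot_len: int) -> str:
    # Greedily decode a reasoning chain capped at cot_len new tokens
    # (16 / 32 / 48 / 64 in the Figure 5 sweep).
    inputs = tok(query, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=cot_len, do_sample=False)
    new_tokens = out[0, inputs["input_ids"].shape[1]:]
    return tok.decode(new_tokens, skip_special_tokens=True)
```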