Mathematical Formulas & Key Details:
The InfoNCE loss function is defined as:

$$\mathcal{L}_{\mathrm{NCE}}(S(B), \tau) := -\sum_{i=0}^{n} \ln \sigma(S(B), \tau, i, i), \quad \text{where} \quad \sigma(S, \tau, i, j) := \frac{e^{S_{i,j}/\tau}}{\sum_{k=0}^{n} e^{S_{i,k}/\tau}}$$
- B: A batch of n query-document pairs.
- S(B): The similarity matrix, where Si,j is the cosine similarity between the embedding of the i-th query and the embedding of the j-th document.
- n: The batch size. A larger batch size provides more negative examples for contrastive learning.
- τ: The temperature, a hyperparameter that controls the sharpness of the distribution. A small value like the τ=0.05 used here makes the model more sensitive to differences in similarity scores, pushing it to better distinguish between positive and negative samples.
- σ(⋅): The softmax function applied over the similarity scores for a given query against all documents in the batch, scaled by the temperature. The overall loss maximizes the probability assigned to the correct (positive) document, i.e. the diagonal entry Si,i for query i.
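The definitions above can be sketched in plain Python. This is a minimal illustration, not a training-ready implementation: it assumes the similarity matrix is already computed, that the positive document for query i sits on the diagonal (Si,i), and the function name `info_nce_loss` and the toy matrix are made up for the example.

```python
import math

def info_nce_loss(S, tau=0.05):
    """InfoNCE loss over a similarity matrix S (list of lists).

    S[i][j] is the cosine similarity between query i and document j;
    the positive pair for query i is assumed to be document i.
    """
    loss = 0.0
    for i, row in enumerate(S):
        # Temperature-scaled softmax over all documents for query i.
        denom = sum(math.exp(s / tau) for s in row)
        p_positive = math.exp(row[i] / tau) / denom
        # Negative log-likelihood of the positive document.
        loss -= math.log(p_positive)
    return loss

# Toy batch: diagonal (positive) similarities exceed the negatives.
S = [[0.9, 0.1, 0.2],
     [0.0, 0.8, 0.3],
     [0.2, 0.1, 0.7]]
print(info_nce_loss(S, tau=0.05))
print(info_nce_loss(S, tau=1.0))
```

Note how the temperature behaves: at τ=0.05 the softmax is very sharp, so with positives clearly above negatives the loss is near zero, while at τ=1.0 the distribution is softer and the same similarity gaps yield a larger loss. This is why a small τ pushes the model to separate positives from negatives by a wide margin.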