Solution: The paper proposes chain-of-forward training. Periodically during training, the model runs several forward passes in a row, feeding its own predictions back in as input for the next step.
Figure: Schematic of the Epona training process from the paper. Here x denotes an image latent or trajectory; the diagram illustrates the two training components, Chain-of-Forward and the Rectified Flow Loss.
As shown in Figure 4, instead of only predicting $\hat{x}_{T+1}$ from $x_1,\dots,x_T$, the model also predicts $\hat{x}_{T+2}$ from $x_1,\dots,x_T,\hat{x}_{T+1}$. To keep this efficient, it does not run the full diffusion sampling process; instead, it estimates the final denoised latent $\hat{x}^{(0)}$ in a single step from the predicted velocity:
$$\hat{x}^{(0)} = x^{(t)} + t\, v_\Theta\!\left(x^{(t)}, t\right)$$
This one-step prediction is noisy, but that is the point: it exposes the model during training to the same compounding errors it will encounter at inference, making long-horizon rollouts more robust.
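The loop above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `velocity_fn` is a hypothetical stand-in for the conditioned velocity network, and the noisy latent is drawn from a standard normal as a toy substitute for the real noising step.

```python
import numpy as np

def one_step_denoise(velocity_fn, ctx, x_t, t):
    # Single-step estimate of the clean latent, per the rectified-flow
    # identity from the paper: x^(0) = x^(t) + t * v_Theta(x^(t), t).
    v = velocity_fn(ctx, x_t, t)
    return x_t + t * v

def chain_of_forward(velocity_fn, context, extra_steps=2, t=0.5, seed=0):
    # Roll the model forward `extra_steps` times, appending each one-step
    # prediction to the context so training sees its own (noisy) outputs.
    # `velocity_fn(ctx, x_t, t)` is an assumed interface, not the paper's API.
    rng = np.random.default_rng(seed)
    ctx = list(context)
    preds = []
    for _ in range(extra_steps):
        # Toy stand-in for the noisy latent at flow time t.
        x_t = rng.normal(size=ctx[-1].shape)
        x0_hat = one_step_denoise(velocity_fn, ctx, x_t, t)
        preds.append(x0_hat)
        ctx.append(x0_hat)  # self-conditioning: prediction becomes input
    return preds
```

In actual training, each `x0_hat` would feed the rectified flow loss against the ground-truth future frame, so the gradient penalizes drift that accumulates across the chained predictions.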