OnePiece: Bringing Context Engineering and Reasoning to Industrial Cascade Ranking System

Published: 09/23/2025

TL;DR Summary

OnePiece integrates context engineering and multi-step reasoning into industrial ranking systems, enhancing existing Transformer-based models. Key innovations include structured context engineering, block-wise latent reasoning, and progressive multi-task training, leading to significant performance improvements in Shopee's personalized search, including gains in GMV/UU and advertising revenue.

Abstract

Despite the growing interest in replicating the scaled success of large language models (LLMs) in industrial search and recommender systems, most existing industrial efforts remain limited to transplanting Transformer architectures, which bring only incremental improvements over strong Deep Learning Recommendation Models (DLRMs). From a first principle perspective, the breakthroughs of LLMs stem not only from their architectures but also from two complementary mechanisms: context engineering, which enriches raw input queries with contextual cues to better elicit model capabilities, and multi-step reasoning, which iteratively refines model outputs through intermediate reasoning paths. However, these two mechanisms and their potential to unlock substantial improvements remain largely underexplored in industrial ranking systems. In this paper, we propose OnePiece, a unified framework that seamlessly integrates LLM-style context engineering and reasoning into both retrieval and ranking models of industrial cascaded pipelines. OnePiece is built on a pure Transformer backbone and further introduces three key innovations: (1) structured context engineering, which augments interaction history with preference and scenario signals and unifies them into a structured tokenized input sequence for both retrieval and ranking; (2) block-wise latent reasoning, which equips the model with multi-step refinement of representations and scales reasoning bandwidth via block size; (3) progressive multi-task training, which leverages user feedback chains to effectively supervise reasoning steps during training. OnePiece has been deployed in the main personalized search scenario of Shopee and achieves consistent online gains across different key business metrics, including over +2% GMV/UU and a +2.90% increase in advertising revenue.

In-depth Reading

English Analysis

1. Bibliographic Information

1.1. Title

OnePiece: Bringing Context Engineering and Reasoning to Industrial Cascade Ranking System

1.2. Authors

The paper is co-authored by a team from multiple institutions:

  • Sunhao Dai and Jiakai Tang from Renmin University of China.

  • Jiahua Wu, Kun Wang, Yuxuan Zhu, Bingjun Chen, Bangyang Hong, Yu Zhao, Cong Fu, Kangle Wu, Yabo Ni, and Anxiang Zeng from Shopee.

  • Wenjie Wang from the University of California, San Diego.

  • Xu Chen and Jun Xu from Renmin University of China.

  • See-Kiong Ng from the National University of Singapore.

    The affiliations indicate a strong collaboration between academia (Renmin University of China, UCSD, NUS) and industry (Shopee), suggesting a paper with both theoretical grounding and practical deployment relevance.

1.3. Journal/Conference

The paper is published as a preprint on arXiv.

  • Original Source Link: https://arxiv.org/abs/2509.18091
  • PDF Link: https://arxiv.org/pdf/2509.18091v1.pdf
  • Publication Status: As of 2025-09-22T17:59:07.000Z, it is a preprint, indicating it has not yet undergone formal peer review or been published in a specific journal or conference proceedings. However, given the nature of the work and the affiliations, it is likely intended for a top-tier conference in information retrieval, data mining, or artificial intelligence.

1.4. Publication Year

2025 (Published at UTC: 2025-09-22T17:59:07.000Z)

1.5. Abstract

The paper introduces OnePiece, a unified framework designed to integrate large language model (LLM)-style context engineering and multi-step reasoning into industrial cascaded ranking systems, specifically for both retrieval and ranking models. The authors observe that existing industrial efforts primarily transplant Transformer architectures, yielding only incremental improvements over strong Deep Learning Recommendation Models (DLRMs). They argue that the success of LLMs stems from two complementary mechanisms—context engineering (enriching inputs with contextual cues) and multi-step reasoning (iteratively refining outputs)—which remain underexplored in industrial ranking.

OnePiece is built on a pure Transformer backbone and features three key innovations:

  1. Structured Context Engineering: Augments user interaction history with preference anchors (auxiliary item sequences from domain knowledge) and situational descriptors (user profiles, query context), unifying them into a structured tokenized input sequence for both retrieval and ranking.

  2. Block-Wise Latent Reasoning: Equips the model with multi-step refinement of representations, scaling reasoning bandwidth via block size. This allows for iterative enhancement of hidden states.

  3. Progressive Multi-Task Training: Leverages natural user feedback chains (e.g., click, add-to-cart, order) to supervise reasoning steps effectively during training, aligning earlier steps with weak signals and later steps with stronger, sparser signals.

    OnePiece has been deployed in Shopee's main personalized search scenario, demonstrating consistent online gains across key business metrics, including over +2% GMV/UU (Gross Merchandise Volume per Unique User) and a +2.90% increase in advertising revenue. Extensive offline experiments also validate the effectiveness of each core design, showing higher sample efficiency and better scaling with larger training spans compared to baselines.

2. Executive Summary

2.1. Background & Motivation

The core problem the paper addresses is the limited success of current industrial search and recommender systems in fully replicating the breakthroughs seen in large language models (LLMs). While Transformer architectures have been widely adopted, they often provide only incremental improvements over existing strong Deep Learning Recommendation Models (DLRMs). This suggests that merely transplanting architectures is insufficient.

This problem is important because industrial ranking systems are crucial for e-commerce, content platforms, and other digital services, directly impacting user experience and business revenue. Achieving significant performance leaps in these systems could unlock immense value.

The paper identifies specific challenges and gaps:

  1. Limited LLM Mechanism Adoption: The authors argue that the true breakthroughs of LLMs come not just from their Transformer architectures, but from two complementary mechanisms: context engineering and multi-step reasoning. These mechanisms, which significantly expand LLMs' generalization and capability, remain largely underexplored in industrial ranking systems.

  2. Input Context Construction: Current Transformer-based industrial models primarily rely on raw user-item interaction sequences, which lack the rich, structured context of LLM-style prompts. There's a gap in how to effectively enrich context for ranking models to enable reasoning.

  3. Optimization of Multi-Step Reasoning: Unlike LLMs, where chain-of-thought annotations provide explicit supervision for reasoning, industrial ranking systems lack such direct supervision. It's difficult to articulate or supervise the latent decision paths underlying user behaviors, making it challenging to train multi-step reasoning.

    The paper's entry point and innovative idea is to systematically integrate these two underexplored but powerful LLM mechanisms—context engineering and multi-step reasoning—into the specific context of industrial cascaded ranking pipelines, tailoring them to the unique characteristics and constraints of recommendation tasks.

2.2. Main Contributions / Findings

The paper's primary contributions are:

  1. First Deployment of LLM Mechanisms in Industrial Ranking: To the best of the authors' knowledge, this is the first work to explore and successfully deploy context engineering and multi-step reasoning in industrial-scale ranking systems, achieving significant improvements over strong DLRM baselines in both retrieval and ranking tasks.

  2. Proposed OnePiece Framework: The paper introduces OnePiece, a unified framework built on a pure Transformer backbone that integrates:

    • Structured Context Engineering: Augments user interaction history with preference anchors and situational descriptors, unifying them into a structured tokenized input sequence for both retrieval and ranking.
    • Block-Wise Latent Reasoning: Equips the model with multi-step refinement of representations, allowing for adjustable reasoning bandwidth via block size.
    • Progressive Multi-Task Training: A strategy that leverages user feedback chains (e.g., click, add-to-cart, order) to effectively supervise reasoning steps during training, aligning tasks of increasing complexity to successive reasoning steps.
  3. Extensive Offline and Online Validation: The paper conducts comprehensive evaluations, including large-scale A/B testing in Shopee's main personalized search scenario. These experiments validate the effectiveness of each design choice, demonstrate favorable scaling and efficiency properties, and confirm the practicality of deploying OnePiece in real-world industrial environments.

    The key conclusions and findings are:

  • OnePiece significantly outperforms strong baselines like DLRM, HSTU, and ReaRec across various retrieval and ranking metrics offline.
  • Structured context engineering, particularly the preference anchors, provides substantial improvements by enriching user context with domain knowledge and query-specific signals.
  • Block-wise latent reasoning consistently enhances performance by enabling finer-grained preference refinement through multi-step processing.
  • Progressive multi-task training is crucial for effectively supervising intermediate reasoning steps, preventing gradient conflicts, and allowing each reasoning block to develop specialized capabilities for tasks of increasing complexity.
  • OnePiece exhibits higher sample efficiency and better scaling capabilities with increasing training data compared to baselines.
  • Online A/B testing demonstrates consistent and significant business gains, including over +2% GMV/UU and +2.90% advertising revenue, along with improved recall coverage and exclusive contribution, proving its real-world impact and efficiency.

3. Prerequisite Knowledge & Related Work

This section provides foundational knowledge necessary for understanding the OnePiece paper, followed by a summary of related work and a differentiation analysis.

3.1. Foundational Concepts

3.1.1. Transformer Architecture

The Transformer (Vaswani et al., 2017) is a neural network architecture that revolutionized sequence transduction tasks, particularly in natural language processing. It is distinct from recurrent neural networks (RNNs) and convolutional neural networks (CNNs) in its heavy reliance on the attention mechanism to draw global dependencies between input and output.

  • Self-Attention: This mechanism allows the model to weigh the importance of different words in an input sequence when encoding a particular word. For each token, it computes three vectors: a Query (Q), a Key (K), and a Value (V). The attention score is calculated by taking the dot product of the Query vector with all Key vectors, followed by a scaling factor and a softmax function to get weights. These weights are then applied to the Value vectors to produce the output for that token. The core formula for attention is: \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V

    • Q: Query matrix. It contains the query vectors for all tokens in the sequence.
    • K: Key matrix. It contains the key vectors for all tokens.
    • V: Value matrix. It contains the value vectors for all tokens.
    • d_k: Dimension of the Key vectors. This term scales the dot products, preventing them from becoming too large and pushing the softmax function into regions with tiny gradients.
    • \mathrm{softmax}: A function that converts a vector of numbers into a probability distribution, ensuring all elements are between 0 and 1 and sum to 1.
    • QK^T: The dot product between queries and keys, representing how much each query should attend to each key.
  • Multi-Head Self-Attention (MHSA): Instead of performing a single attention function, MHSA linearly projects the Queries, Keys, and Values h times with different learned linear projections. The h parallel attention functions are then applied, and each attention head learns to focus on different parts of the input sequence. The outputs of the h heads are concatenated and linearly transformed to produce the final output. This allows the model to capture diverse contextual information from different representation subspaces.

  • Feed-Forward Network (FFN): After the attention mechanism, each position in the sequence passes through an identical, independently applied position-wise feed-forward network. This typically consists of two linear transformations with a ReLU activation in between.

  • Layer Normalization (LN): Normalizes the inputs across the feature dimension for each sample independently, helping stabilize training. In the original Transformer it is applied after each sub-layer's residual connection (post-norm); many recent models, including OnePiece's backbone, instead apply it before each sub-layer (pre-norm).

  • Positional Embeddings: Since Transformers do not inherently process sequences in order (unlike RNNs), positional embeddings are added to the input embeddings to inject information about the relative or absolute position of tokens in the sequence. These can be learned or fixed (e.g., sinusoidal functions).
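To make the attention formula above concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention; the shapes and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns (seq_len, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # (seq_len, seq_len) attention logits
    weights = softmax(scores, axis=-1)     # each row sums to 1
    return weights @ V                     # weighted combination of values

# Toy example: 4 tokens with 8-dimensional queries/keys/values.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```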

3.1.2. Large Language Models (LLMs)

LLMs are a class of neural networks, typically based on the Transformer architecture with billions of parameters, trained on vast amounts of text data. They exhibit emergent capabilities such as text generation, summarization, translation, and question answering.

  • Scaling Laws: The performance of LLMs often scales predictably with model size, dataset size, and computational budget.
  • Context Engineering (Prompt Engineering): Refers to the art and science of designing effective inputs (prompts) to guide LLMs towards desired outputs. This involves structuring the input, providing examples (in-context learning), and incorporating specific instructions or external knowledge.
  • Multi-Step Reasoning (Chain-of-Thought): A technique where LLMs are prompted to break down complex problems into intermediate steps, explicitly showing their reasoning process. This improves performance on complex reasoning tasks by making the model's "thought process" more explicit and allowing for self-correction.

3.1.3. Industrial Search and Recommender Systems

These systems aim to provide users with relevant items (products, articles, videos, etc.) from a vast corpus.

  • Cascade Ranking Paradigm: A dominant approach in large-scale industrial systems due to its efficiency. It organizes the decision process into multiple stages:

    1. Retrieval Stage: The first stage, which efficiently selects a small set of highly relevant candidate items from a massive corpus (e.g., millions or billions of items). It uses lightweight models and aims for high recall (not missing potentially relevant items).
    2. Pre-ranking Stage (Optional): An intermediate stage that further filters the retrieved candidates to a smaller set (e.g., from thousands to hundreds) using slightly more complex models.
    3. Ranking Stage: The final stage, which takes a refined, smaller set of candidates and uses sophisticated, computationally expensive models to precisely estimate relevance scores and produce a finely ordered list for display to the user. It prioritizes precision.
  • Dual-Tower Architecture (for Retrieval): Common in retrieval, it consists of two independent neural networks (towers): one for encoding the user query/context and another for encoding items. The output of each tower is a dense vector (embedding). Retrieval is performed by finding items whose embeddings are "close" to the user/query embedding in a vector space, typically using dot product or cosine similarity. This allows pre-computation of item embeddings, enabling fast Approximate Nearest Neighbor (ANN) search.

  • Single-Tower Architecture (for Ranking): Common in ranking, it jointly encodes the user context, query, and candidate item within a single network. This allows for rich, explicit interactions between user, query, and item features, leading to more accurate preference prediction. However, it's computationally more expensive as it must be run for each candidate item.

  • Approximate Nearest Neighbor (ANN) Search: A family of algorithms used to find data points that are "closest" (most similar) to a given query point in a high-dimensional space, without exhaustively checking every single point. This is crucial for efficient retrieval from large item corpora. Examples include HNSW (Hierarchical Navigable Small World) used in the paper.
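As a rough illustration of the dual-tower retrieval pattern described above, the sketch below scores a user/query embedding against precomputed item-tower embeddings with a brute-force maximum inner product search; a production system would replace the exhaustive scan with an ANN index such as HNSW. All names and sizes here are assumptions for illustration.

```python
import numpy as np

def top_k_items(user_vec, item_matrix, k=5):
    """Brute-force maximum inner product search.

    user_vec: (d,) embedding from the user/query tower.
    item_matrix: (num_items, d) precomputed item-tower embeddings.
    In production, this exhaustive scan is replaced by ANN search (e.g., HNSW).
    """
    scores = item_matrix @ user_vec          # dot-product relevance scores
    top_idx = np.argsort(-scores)[:k]        # indices of the k highest scores
    return top_idx, scores[top_idx]

rng = np.random.default_rng(1)
items = rng.normal(size=(10_000, 64))        # toy item corpus
user = rng.normal(size=(64,))                # toy user/query embedding
idx, scores = top_k_items(user, items, k=5)
print(idx, scores)
```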

3.1.4. Deep Learning Recommendation Models (DLRMs)

DLRMs (Naumov et al., 2019) are a class of deep learning models widely used in industrial recommendation systems. They typically handle both sparse (categorical, e.g., user ID, item ID) and dense (continuous, e.g., price, age) features.

  • Embeddings: Categorical features are converted into dense embedding vectors.
  • Feature Interactions: DLRMs often employ techniques to model interactions between features, such as cross-network components (e.g., DCN, DCNv2) or attention mechanisms (e.g., DIN-like attention) to focus on relevant historical items.
  • MLP (Multi-Layer Perceptron): A series of fully connected layers used to combine processed features and output the final prediction score.
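The following toy snippet sketches the generic DLRM pattern just described (embed sparse IDs, concatenate with dense features, score with an MLP). It is not Shopee's production model; all layer sizes and names are placeholders.

```python
import torch
import torch.nn as nn

class TinyDLRM(nn.Module):
    """Toy DLRM-style scorer: sparse-feature embeddings + dense features -> MLP."""
    def __init__(self, num_users=1000, num_items=5000, emb_dim=16, num_dense=4):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, emb_dim)
        self.item_emb = nn.Embedding(num_items, emb_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim + num_dense, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, user_ids, item_ids, dense):
        # Concatenate embedded sparse features with dense features, then score.
        x = torch.cat([self.user_emb(user_ids), self.item_emb(item_ids), dense], dim=-1)
        return self.mlp(x).squeeze(-1)       # one logit per (user, item) pair

model = TinyDLRM()
logits = model(torch.tensor([1, 2]), torch.tensor([10, 20]), torch.randn(2, 4))
print(logits.shape)  # torch.Size([2])
```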

3.2. Previous Works

The paper compares OnePiece against several representative baselines and draws inspiration from related work.

  • DLRM (Naumov et al., 2019): This is Shopee's production baseline, a highly optimized hybrid model integrating various state-of-the-art components.
    • Retrieval Mode (Two-Tower): Uses a DSSM (Deep Structured Semantic Models) (Huang et al., 2013) inspired dual-tower design. User context (query, history) is encoded separately from items. Features include DIN-like attention (Zhou et al., 2018) and zero-attention (Ai et al., 2019) for relevance. Lightweight text CNN for keyword features, DCNv2 (Wang et al., 2021) for high-order cross features. Sequential features are aggregated via mean pooling and fused with other features using an MLP.
    • Ranking Mode (Single-Tower): Incorporates candidate items into a single tower with user features. Uses ResFlow (Fu et al., 2024) as backbone, combined with DIN-like target attention and cross-attention across sequential behaviors. DCNv2 again for higher-order interactions, followed by MLP fusion. A SENet (Hu et al., 2017) module supports adaptive feature selection.
  • HSTU (Zhai et al., 2024): A representative generative recommendation framework from Meta. It typically focuses on Interaction History (IH) and Situational Descriptors (SD). For fair comparison, OnePiece aligns its parameter size and adapts an HSTU+PA variant by introducing Preference Anchors (PA) as well.
  • ReaRec (Tang et al., 2025): A reasoning-enhanced recommendation model that formulates user representation modeling as a multi-step reasoning process over item sequences.
    • The vanilla ReaRec supports retrieval with user interaction history. OnePiece adapts its backbone and feature inputs.
    • For ranking, ReaRec is adapted by introducing candidate items with a target-aware attention mask, meaning sequence tokens can attend to the candidate, but candidates are mutually invisible. ReaRec+PA also augments IH and SD with Preference Anchors.
  • CLIP (Radford et al., 2021): A model that learns transferable visual models from natural language supervision using a bidirectional contrastive learning objective. OnePiece draws inspiration from CLIP for its Bidirectional Contrastive Learning (BCL) objective in retrieval mode.

3.3. Technological Evolution

The field of industrial ranking systems has seen a progressive integration of advanced modeling techniques:

  1. Early Systems (Rule-based, Collaborative Filtering): Initial systems relied on hand-crafted rules, content-based filtering, or collaborative filtering (e.g., item-to-item similarity, user-to-user similarity). These were simple but struggled with scalability and cold-start problems.

  2. Feature Engineering & Traditional Machine Learning: Introduction of extensive feature engineering combined with traditional ML models like logistic regression or gradient boosting decision trees (GBDT).

  3. Deep Learning (DLRMs, DSSM): The advent of deep learning brought models like DSSM for retrieval and DLRMs for ranking. These models excel at learning complex feature interactions and representations, handling sparse and dense features efficiently. Architectures like DIN (Deep Interest Network) introduced attention mechanisms to model dynamic user interests.

  4. Sequential Recommendation (RNNs, Transformers): Recognizing the importance of user behavior sequences, RNN-based models (e.g., GRU4Rec) emerged, followed by Transformer-based models like SASRec (Self-Attentive Sequential Recommendation) and BERT4Rec. These models leverage self-attention to capture long-range dependencies in user interaction histories, leading to more personalized recommendations.

  5. LLM-Inspired Architectures (Current Trend): The recent success of LLMs has inspired researchers to transplant Transformer-based architectures into recommendation. However, as OnePiece points out, many of these efforts focus only on the architecture, yielding incremental gains.

    OnePiece fits into this timeline by pushing the LLM-inspired trend further. Instead of just adopting Transformer backbones, it systematically integrates the mechanisms behind LLM success (context engineering and multi-step reasoning), which were largely overlooked in previous Transformer-based recommendation models. This represents a significant step towards more intelligent and adaptive industrial ranking systems.

3.4. Differentiation Analysis

Compared to the main methods in related work, OnePiece presents several core differences and innovations:

  • Beyond Architectural Transplant: Unlike many Transformer-based models (HSTU, SASRec, BERT4Rec) that primarily transplant the architecture, OnePiece explicitly focuses on integrating the mechanisms of context engineering and multi-step reasoning from LLMs. This is a more principled approach to leveraging LLM breakthroughs.
  • Unified Framework for Retrieval and Ranking: OnePiece provides a single, unified framework that seamlessly integrates these LLM mechanisms into both the retrieval and ranking stages of a cascaded pipeline. This contrasts with approaches that typically optimize each stage independently with different model designs.
  • Structured Context Engineering:
    • Most Transformer-based industrial models primarily use raw user-item interaction sequences. OnePiece enriches this by introducing Preference Anchors (PA) (auxiliary item sequences based on domain knowledge, like top-clicked items for a query) and Situational Descriptors (SD) (user profiles, query context).
    • This structured approach provides richer contextual cues than plain interaction history, addressing the "lack of structural richness" challenge compared to LLM-style prompts.
    • While HSTU+PA and ReaRec+PA also incorporate PA, OnePiece's overall context engineering is part of a more integrated system with block-wise reasoning.
  • Block-Wise Latent Reasoning:
    • While ReaRec also employs multi-step reasoning, OnePiece introduces block-wise latent reasoning. This means that instead of recycling a single hidden state across iterations (which might overly compress information), OnePiece iteratively refines a set of hidden states (a block), offering adjustable reasoning bandwidth.
    • This design provides greater flexibility and a better balance between information compression and retention, potentially leading to more expressive representations.
  • Progressive Multi-Task Training:
    • To supervise the multi-step reasoning process effectively without chain-of-thought annotations, OnePiece introduces a progressive multi-task training strategy. It leverages natural user feedback chains (e.g., click, add-to-cart, order) as staged supervision signals, assigning tasks of increasing complexity to successive reasoning blocks.
    • This differs from traditional multi-task learning where all tasks might supervise a single final representation, and from models like ReaRec that might use a single task or simpler supervision for reasoning. This progressive approach provides rich process supervision, helping each reasoning step develop specialized capabilities and mitigating gradient conflicts.
  • Enhanced Inter-Candidate Interaction in Ranking: For the ranking stage, OnePiece explicitly models cross-candidate interactions within small Candidate Item Set (CIS) groups (grouped setwise strategy) by making them jointly visible. This is a significant improvement over pointwise models or even adapted ReaRec where candidates remain mutually invisible. This allows the model to compare candidates directly, which is crucial for fine-grained ranking.
  • Online Deployment and Efficiency: OnePiece is designed and optimized for large-scale industrial deployment, demonstrating significant online gains and superior hardware utilization compared to DLRM baselines, validating its practicality and efficiency in real-world scenarios.

4. Methodology

4.1. Principles

The core idea behind OnePiece is to systematically adapt and integrate two fundamental mechanisms that have driven the success of Large Language Models (LLMs)—context engineering and multi-step reasoning—into the specific domain of industrial cascaded ranking systems. The theoretical basis and intuition are as follows:

  1. Context Engineering: LLMs demonstrate that rich, structured input contexts (prompts) are crucial for eliciting their full capabilities. In ranking systems, traditional input sequences often lack this richness. The principle here is that by augmenting raw user interaction history with preference anchors (domain-specific reference points, e.g., top-clicked items) and situational descriptors (user and query context), the model can be provided with more informative cues. This enriched context helps the model better understand user intent and scenario specifics, analogous to how well-crafted prompts guide an LLM.

  2. Multi-Step Reasoning: LLMs solve complex problems by breaking them down into intermediate, iterative reasoning steps (chain-of-thought). The intuition is that user preference modeling in recommendation is also a complex task that benefits from iterative refinement. Instead of a single-shot prediction, a model can progressively refine its understanding of user preferences and item relevance. OnePiece proposes a block-wise latent reasoning mechanism, where hidden representations are iteratively updated, allowing for a deeper, more nuanced understanding that builds upon previous steps. This addresses the limitation of single-unit reasoning, which might overly compress signals.

  3. Supervision for Reasoning: A key challenge in applying multi-step reasoning to ranking is the lack of explicit "thought process" annotations. OnePiece addresses this by leveraging naturally occurring user feedback chains (e.g., exposure -> click -> add-to-cart -> order) as a form of progressive multi-task supervision. The principle is that these feedback chains represent a natural curriculum of increasing user commitment and task complexity. By assigning tasks of varying complexity to different reasoning steps, the model learns to develop specialized capabilities at each stage, guiding the latent reasoning process effectively.

    By unifying these principles within a Transformer-based backbone, OnePiece aims to enhance context-awareness and reasoning depth across both retrieval and ranking stages of industrial systems, moving beyond incremental architectural adaptations to fundamental capability improvements.

4.2. Core Methodology In-depth (Layer by Layer)

OnePiece is a unified framework combining structured context engineering, block-wise latent reasoning, and a progressive multi-task training strategy. Figure 2 illustrates its overall architecture in both retrieval and ranking modes.

Figure 2 | Overall architecture of the proposed OnePiece framework. Retrieval Mode (a) and Ranking Mode (b) both employ structured context engineering to construct unified input tokens, utilize block-wise latent reasoning to iteratively enhance representations across multiple reasoning steps, and are optimized through a progressive multi-task training strategy.

Both modes utilize structured context engineering to create unified input tokens, which are then processed by a Transformer-based backbone equipped with block-wise latent reasoning to iteratively refine representations. The entire system is optimized using a progressive multi-task training strategy.

4.2.1. Context Engineering

The first step in OnePiece is to transform all heterogeneous inputs into a unified token sequence that can be processed by a Transformer backbone. This is achieved through four complementary token types: Interaction History (IH), Preference Anchors (PA), Situational Descriptors (SD), and Candidate Item Set (CIS). Figure 3 provides a visual representation of this design.

Figure 3 | Context engineering and tokenizer design for input token sequences in OnePiece. Both retrieval and ranking share the same construction of interaction history (IH), preference anchors (PA), and situational descriptors (SD). The key difference is that ranking additionally incorporates candidate item set (CIS) tokens, enabling joint scoring within the single-tower architecture.

Following the problem formulation, a user u has feature representation \mathbf{u}, a query q has \mathbf{q}, and an item \nu has \mathbf{v}. Entity-specific embedding functions \phi_{\mathrm{user}}(\cdot), \phi_{\mathrm{query}}(\cdot), and \phi_{\mathrm{item}}(\cdot) map these entities' features (categorical and continuous) into concatenated embedding vectors. To unify these into the d-dimensional hidden space of the Transformer backbone, lightweight projection layers are used: \mathrm{Proj}_{\mathrm{user}}, \mathrm{Proj}_{\mathrm{query}}, and \mathrm{Proj}_{\mathrm{cand}}, each mapping its input dimension to \mathbb{R}^d. The IH and PA components share a common projection layer, \mathrm{Proj}_{\mathrm{shared}}.

Let's detail each component of the input token sequence:

Interaction History (IH)

The IH component encodes the user's historical item interactions S^u = ( \mathbf{v}_1^u, \ldots, \mathbf{v}_{n_u}^u ) in chronological order. Each item descriptor is embedded using the shared projection layer: \mathbf{z}_t^{\mathrm{IH}} = \mathrm{Proj}_{\mathrm{shared}} \left( \phi_{\mathrm{item}} ( \mathbf{v}_t^u ) \right) \in \mathbb{R}^d .

  • \mathbf{z}_t^{\mathrm{IH}}: The embedding of the t-th item in the user's interaction history.

  • \mathrm{Proj}_{\mathrm{shared}}(\cdot): A shared projection layer that maps the item's raw feature embedding to the model's hidden dimension d.

  • \phi_{\mathrm{item}}(\mathbf{v}_t^u): The raw feature representation of the t-th item \mathbf{v}_t^u, which includes its ID and associated content information.

  • \mathbb{R}^d: The d-dimensional hidden space of the backbone model.

    Temporal information is then incorporated by adding learnable positional embeddings: \mathbf{h}_t^{\mathrm{IH}} = \mathbf{z}_t^{\mathrm{IH}} + \mathbf{p}_t^{\mathrm{IH}}, \quad 1 \leq t \leq n_u ,

  • \mathbf{h}_t^{\mathrm{IH}}: The final token embedding for the t-th interaction, combining content and temporal information.

  • \mathbf{p}_t^{\mathrm{IH}} \in \mathbb{R}^d: The learnable positional embedding for the t-th interaction in the sequence.

Preference Anchors (PA)

Preference Anchors are auxiliary item sequences constructed based on domain knowledge (e.g., top-clicked items under the current query). These anchors provide high-quality reference points, injecting inductive biases and guiding the model towards plausible prediction directions. For a given user u and query q, B anchor groups \mathcal{A}^u = \{A_1^u, \ldots, A_B^u\} are provided, where each group A_b^u = (\mathbf{v}_{b,1}^{\mathrm{PA}}, \ldots, \mathbf{v}_{b,m_b}^{\mathrm{PA}}) contains m_b items. The token embedding for the j-th item in the b-th anchor group is computed similarly to IH items: \mathbf{z}_{b,j}^{\mathrm{PA}} = \mathrm{Proj}_{\mathrm{shared}} \left( \phi_{\mathrm{item}} \left( \mathbf{v}_{b,j}^{\mathrm{PA}} \right) \right) \in \mathbb{R}^d, \quad \mathbf{h}_{b,j}^{\mathrm{PA}} = \mathbf{z}_{b,j}^{\mathrm{PA}} + \mathbf{p}_j^{\mathrm{PA}} .

  • \mathbf{z}_{b,j}^{\mathrm{PA}}: The embedding of the j-th item in the b-th anchor group.

  • \mathbf{h}_{b,j}^{\mathrm{PA}}: The final token embedding for the j-th item in the b-th anchor group.

  • \mathbf{p}_j^{\mathrm{PA}} \in \mathbb{R}^d: The positional embedding for the j-th item within its group.

    To preserve the group structure, each anchor group is wrapped with learnable boundary tokens \mathbf{e}_{\mathrm{BOS}} (Beginning of Sequence) and \mathbf{e}_{\mathrm{EOS}} (End of Sequence), both \in \mathbb{R}^d. The final token sequence for each anchor group is: ( \mathbf{e}_{\mathrm{BOS}}, \ \mathbf{h}_{b,1}^{\mathrm{PA}}, \ldots, \ \mathbf{h}_{b,m_b}^{\mathrm{PA}}, \ \mathbf{e}_{\mathrm{EOS}} ) .

Situational Descriptors (SD)

Situational Descriptors capture non-item information relevant to the ranking task, such as static user features and query-specific information. For the user u with features \mathbf{u}, the embedding is: \mathbf{z}^{\mathrm{U}} = \mathrm{Proj}_{\mathrm{user}} \big ( \phi_{\mathrm{user}} ( \mathbf{u} ) \big ) \in \mathbb{R}^d, \quad \mathbf{h}^{\mathrm{U}} = \mathbf{z}^{\mathrm{U}} + \mathbf{p}_k^{\mathrm{U}} .

  • \mathbf{z}^{\mathrm{U}}: The projected user embedding.

  • \mathbf{h}^{\mathrm{U}} \in \mathbb{R}^d: The final user token embedding.

  • \mathrm{Proj}_{\mathrm{user}}(\cdot): A projection layer for user features.

  • \phi_{\mathrm{user}}(\mathbf{u}): The raw feature representation of the user.

  • \mathbf{p}_k^{\mathrm{U}} \in \mathbb{R}^d: The positional embedding for the user token at position k.

    Similarly, for the query q with features \mathbf{q} (omitted in recommendation scenarios without explicit queries): \mathbf{z}^{\mathrm{Q}} = \mathrm{Proj}_{\mathrm{query}} \left( \phi_{\mathrm{query}} ( \mathbf{q} ) \right) \in \mathbb{R}^d, \quad \mathbf{h}^{\mathrm{Q}} = \mathbf{z}^{\mathrm{Q}} + \mathbf{p}_k^{\mathrm{Q}} .

  • \mathbf{z}^{\mathrm{Q}}: The projected query embedding.

  • \mathbf{h}^{\mathrm{Q}} \in \mathbb{R}^d: The final query token embedding.

  • \mathrm{Proj}_{\mathrm{query}}(\cdot): A projection layer for query features.

  • \phi_{\mathrm{query}}(\mathbf{q}): The raw feature representation of the query.

  • \mathbf{p}_k^{\mathrm{Q}} \in \mathbb{R}^d: The positional embedding for the query token at position k.

Candidate Item Set (CIS, Ranking Mode Only)

In the ranking stage, OnePiece adopts a grouped setwise strategy to balance efficiency and expressiveness. The retrieved candidate set \mathcal{V}' is randomly partitioned into smaller groups of size C (e.g., 12). Each group is processed independently, allowing intra-group interaction among candidates. Given a candidate group C^u = \{ \mathbf{v}_1^{\mathrm{CIS}}, \ldots, \mathbf{v}_C^{\mathrm{CIS}} \}, each candidate item is embedded as: \mathbf{z}_i^{\mathrm{CIS}} = \mathrm{Proj}_{\mathrm{cand}} \left( \phi_{\mathrm{item}} ( \mathbf{v}_i^{\mathrm{CIS}} ) \right) \in \mathbb{R}^d .

  • \mathbf{z}_i^{\mathrm{CIS}}: The projected embedding of the i-th candidate item in the group.

  • \mathrm{Proj}_{\mathrm{cand}}(\cdot): A projection layer for candidate item features.

    Crucially, positional embeddings are deliberately excluded for candidate tokens to prevent the model from learning spurious correlations between position and relevance labels: \mathbf{h}_i^{\mathrm{CIS}} = \mathbf{z}_i^{\mathrm{CIS}}, \quad 1 \leq i \leq C .

  • \mathbf{h}_i^{\mathrm{CIS}}: The final token embedding for the i-th candidate item, which is simply its projected content embedding.

Sequence Packing and Ordering

Let \oplus denote concatenation of token subsequences. The final input sequence to the backbone model is constructed by packing these components according to fixed ordering rules:

  • Retrieval Mode: The input token sequence \boldsymbol{\mathcal{I}}_{\mathrm{retrieval}}^u is constructed as: \boldsymbol{\mathcal{I}}_{\mathrm{retrieval}}^u = \underbrace{ ( \mathbf{h}_1^{\mathrm{IH}}, \dots, \mathbf{h}_{n_u}^{\mathrm{IH}} ) }_{\text{chronological IH}} \oplus \underbrace{ \bigoplus_{b=1}^B \left( \mathbf{e}_{\mathrm{BOS}}, \mathbf{h}_{b,1}^{\mathrm{PA}}, \dots, \mathbf{h}_{b,m_b}^{\mathrm{PA}}, \mathbf{e}_{\mathrm{EOS}} \right) }_{\text{PA groups ordered by business rule}} \oplus \underbrace{ ( \mathbf{h}^{\mathrm{U}}, \ \mathbf{h}^{\mathrm{Q}}, \dots ) }_{\text{SD segment}} .

    • IH tokens are ordered by ascending interaction timestamp.
    • Each PA group is wrapped by BOS/EOS boundary tokens, and groups are ordered by predefined business rules.
    • SD tokens (user, query, etc.) have no temporal ordering and are placed in a segment with distinct positional indices.
  • Ranking Mode: The retrieval-mode sequence is extended by appending candidate item tokens: \boldsymbol{\mathcal{I}}_{\mathrm{rank}}^u = \boldsymbol{\mathcal{I}}_{\mathrm{retrieval}}^u \oplus ( \mathbf{h}_1^{\mathrm{CIS}}, \ldots, \mathbf{h}_C^{\mathrm{CIS}} ) .

    • CIS tokens are appended without positional encodings.
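A minimal sketch of how these packing rules could be realized in code, assuming every component has already been embedded into the model's hidden dimension; the function name and shapes are illustrative, not from the paper's implementation.

```python
import torch

def pack_sequence(ih, pa_groups, sd, cis=None, bos=None, eos=None):
    """Pack token embeddings as [IH] + [BOS PA_b EOS] per group + [SD] (+ [CIS]).

    ih: (n_u, d) interaction-history tokens (already with positional embeddings).
    pa_groups: list of (m_b, d) anchor-group tokens, ordered by business rule.
    sd: (n_sd, d) situational-descriptor tokens.
    cis: optional (C, d) candidate tokens (ranking mode, no positional embeddings).
    bos, eos: (d,) learnable boundary embeddings wrapping each PA group.
    """
    parts = [ih]
    for group in pa_groups:
        parts.append(torch.cat([bos.unsqueeze(0), group, eos.unsqueeze(0)], dim=0))
    parts.append(sd)
    if cis is not None:                      # ranking mode appends the candidate set
        parts.append(cis)
    return torch.cat(parts, dim=0)           # (N, d) packed input for the backbone

d = 32
seq = pack_sequence(
    ih=torch.randn(20, d),
    pa_groups=[torch.randn(5, d), torch.randn(3, d)],
    sd=torch.randn(2, d),
    cis=torch.randn(12, d),
    bos=torch.randn(d), eos=torch.randn(d),
)
print(seq.shape)  # torch.Size([46, 32])
```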

4.2.2. Backbone Architecture

The OnePiece backbone processes the packed token sequence uniformly for both retrieval and ranking.

Transformer-Based Sequential Encoding

Let \boldsymbol{\mathcal{I}} = [ \mathbf{h}_1; \ldots; \mathbf{h}_N ] denote the final input tokens (from Section 3.2), where N is the total sequence length. OnePiece adopts an L-layer bi-directional Transformer (Vaswani et al., 2017) with pre-normalization. Let \mathbf{H}^l = [ \mathbf{h}_1^l; \ldots; \mathbf{h}_N^l ] \in \mathbb{R}^{N \times d} be the hidden states at layer l, with initial input \mathbf{H}^{(0)} = \boldsymbol{\mathcal{I}}. For the l-th layer (1 \leq l \leq L): \mathbf{H}_{\mathrm{attn}}^l = \mathbf{H}^{l-1} + \mathrm{MHSA} \big ( \mathrm{LN} ( \mathbf{H}^{l-1} ) \big ) , \quad \mathbf{H}^l = \mathbf{H}_{\mathrm{attn}}^l + \mathrm{FFN} \big ( \mathrm{LN} ( \mathbf{H}_{\mathrm{attn}}^l ) \big ) ,

  • \mathbf{H}^{l-1}: Input hidden states from the previous layer.

  • \mathrm{LN}(\cdot): Layer Normalization.

  • \mathrm{MHSA}(\cdot): Multi-Head Self-Attention with bi-directional attention. Bi-directional attention allows tokens to attend to all other tokens in the sequence, which is suitable for non-autoregressive tasks like personalized ranking.

  • \mathbf{H}_{\mathrm{attn}}^l: The output of the MHSA sub-layer after adding the residual connection.

  • \mathrm{FFN}(\cdot): Position-wise Feed-Forward Network.

  • \mathbf{H}^l: The final output hidden states of layer l, after the FFN sub-layer and its residual connection.

    The final encoder output \mathbf{H}^L = [ \mathbf{h}_1^L; \ldots; \mathbf{h}_N^L ] serves as the foundation for subsequent reasoning.
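The pre-normalization update above can be sketched with standard PyTorch modules as follows; this is an illustrative single layer, not the paper's backbone configuration.

```python
import torch
import torch.nn as nn

class PreNormTransformerLayer(nn.Module):
    """One pre-norm encoder layer: H = H + MHSA(LN(H)); H = H + FFN(LN(H))."""
    def __init__(self, d=64, n_heads=4, d_ff=256):
        super().__init__()
        self.ln1 = nn.LayerNorm(d)
        self.mhsa = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d)
        self.ffn = nn.Sequential(nn.Linear(d, d_ff), nn.ReLU(), nn.Linear(d_ff, d))

    def forward(self, h, attn_mask=None):
        x = self.ln1(h)
        attn_out, _ = self.mhsa(x, x, x, attn_mask=attn_mask)  # bi-directional if no mask
        h = h + attn_out                                       # residual connection
        h = h + self.ffn(self.ln2(h))                          # FFN sub-layer + residual
        return h

layer = PreNormTransformerLayer()
tokens = torch.randn(2, 46, 64)   # (batch, N, d)
print(layer(tokens).shape)        # torch.Size([2, 46, 64])
```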

Block-Wise Multi-Step Reasoning

OnePiece introduces a block-wise reasoning mechanism to iteratively refine a set of hidden states across multiple steps, providing adjustable reasoning bandwidth. Let M be the block size, which is task-dependent, and let \mathbf{B}_k \in \mathbb{R}^{M \times d} denote the k-th reasoning block. The initial block \mathbf{B}_0 is constructed from the final encoder output \mathbf{H}^L: \mathbf{B}_0 = \mathbf{H}^L [ N - M + 1 : N ] \in \mathbb{R}^{M \times d} .

  • \mathbf{B}_0: The initial reasoning block, extracted from the last M hidden states of the Transformer's final-layer output \mathbf{H}^L.

    For subsequent reasoning steps k \geq 1, the block \mathbf{B}_k is extracted from the output of the previous reasoning step: \mathbf{B}_k = \mathbf{H}_{k-1}^L [ N + ( k - 2 ) M + 1 : N + ( k - 1 ) M ] \in \mathbb{R}^{M \times d} .

  • \mathbf{H}_{k-1}^L: The final output hidden states from the Transformer after step k-1 (this includes the base sequence \boldsymbol{\mathcal{I}} plus all blocks up to \tilde{\mathbf{B}}_{k-1}).

    To distinguish different reasoning steps, Reasoning Position Embeddings (RPE) are introduced. Let \mathbf{E}_{\mathrm{RPE}} \in \mathbb{R}^{K \times d} be a learnable embedding matrix, where K is the maximum number of reasoning steps. The enhanced blocks \tilde{\mathbf{B}}_k are defined as: \tilde{\mathbf{B}}_0 = \mathbf{B}_0 , \quad \tilde{\mathbf{B}}_k = \mathbf{B}_k + \mathbf{1}_M \otimes \mathbf{E}_{\mathrm{RPE}} [ k, : ] \quad \text{for } k \geq 1 ,

  • \mathbf{1}_M \in \mathbb{R}^M: A vector of ones.

  • \otimes: Outer product.

  • \mathbf{E}_{\mathrm{RPE}} [ k, : ]: The k-th row of the RPE matrix, representing the positional embedding for reasoning step k.

    At each step k, the base sequence \boldsymbol{\mathcal{I}} is concatenated with all previous enhanced blocks \tilde{\mathbf{B}}_{<k} and the current block \tilde{\mathbf{B}}_k. This concatenated sequence is then passed through the Transformer backbone with a block-wise causal mask: [ \boldsymbol{\mathcal{I}} ; \tilde{\mathbf{B}}_{<k} ; \tilde{\mathbf{B}}_k ] \xrightarrow{ \mathcal{F}_{\boldsymbol{\theta}} ( \cdot ; \mathcal{M}_k ) } \mathbf{H}_k^L \in \mathbb{R}^{(N + kM) \times d} ,

  • \mathcal{F}_{\boldsymbol{\theta}}: The Transformer update function.

  • \mathcal{M}_k: The block-wise causal mask. As shown in Figure 4(a), this mask ensures that current block tokens \tilde{\mathbf{B}}_k can attend to all base tokens \boldsymbol{\mathcal{I}} and all historical blocks \tilde{\mathbf{B}}_{<k}, but no token can attend to future reasoning block tokens.

Figure 4 | Block-wise reasoning mask and progressive multi-task training. (a) Causal attention mask enables reasoning blocks to attend to input and previous blocks. (b) Progressive training assigns tasks of increasing complexity to successive reasoning steps to provide effective process supervision.

This iterative procedure yields progressively refined reasoning states \tilde{\mathbf{B}}_1, \tilde{\mathbf{B}}_2, \ldots, \tilde{\mathbf{B}}_K.

The block size M is task-dependent:

  • Retrieval Mode: M is set equal to the length of the Situational Descriptor (SD) segment. The user (\mathbf{h}^{\mathrm{U}}) and query (\mathbf{h}^{\mathrm{Q}}) tokens are designated as aggregation blocks, allowing iterative reasoning to reinforce personalization and relevance dimensions.
  • Ranking Mode: M is set equal to C, the number of candidate items in a group. Each block corresponds to all candidate item tokens. The final block \tilde{\mathbf{B}}_K contains the refined representations used for ranking. Randomized candidate grouping is applied during training to encourage robust set-wise reasoning.
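Below is a simplified sketch of the block-wise reasoning loop under stated assumptions: it re-encodes the full prefix at every step (the deployed system uses KV caching instead), uses a generic PyTorch encoder in place of the paper's backbone, and builds the block-wise causal mask explicitly. All shapes and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

def blockwise_causal_mask(N, k, M):
    """Boolean (T, T) attention mask; True = attention NOT allowed (PyTorch convention).

    Base tokens attend only to base tokens; tokens of reasoning block j attend to
    the base sequence and blocks 1..j, never to later blocks.
    """
    T = N + k * M
    group = torch.zeros(T, dtype=torch.long)          # 0 = base token
    for j in range(1, k + 1):
        group[N + (j - 1) * M : N + j * M] = j        # j = block index
    return group.unsqueeze(1) < group.unsqueeze(0)    # disallow attending to later groups

def block_wise_reasoning(encoder, base_tokens, rpe, K, M):
    """Return refined blocks B_1..B_K, each of shape (1, M, d)."""
    N = base_tokens.shape[1]
    h = encoder(base_tokens)                          # H^L over the base sequence
    block = h[:, -M:, :]                              # B_0: last M hidden states
    seq, refined = base_tokens, []
    for k in range(1, K + 1):
        block_k = block + rpe[k]                      # add reasoning position embedding for step k
        seq = torch.cat([seq, block_k], dim=1)        # append the new block to the prefix
        h = encoder(seq, mask=blockwise_causal_mask(N, k, M))
        block = h[:, -M:, :]                          # refined block B_k
        refined.append(block)
    return refined

d, N, M, K = 32, 40, 4, 3
enc = nn.TransformerEncoder(nn.TransformerEncoderLayer(d, nhead=4, batch_first=True), num_layers=2)
blocks = block_wise_reasoning(enc, torch.randn(1, N, d), torch.randn(K + 1, d), K, M)
print([b.shape for b in blocks])   # three blocks of shape (1, 4, 32)
```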

4.2.3. Progressive Multi-Task Training

Building on the block-wise multi-step reasoning, OnePiece obtains intermediate block representations \{ \mathbf{B}_k \}_{k=1}^K. To effectively supervise this trajectory, a progressive multi-task training paradigm is introduced, implementing curriculum learning through gradually increasing task complexity. Figure 4(b) depicts this strategy.

K learning objectives \mathcal{T} = \{ \tau_1, \tau_2, \ldots, \tau_K \} are arranged in a progressive curriculum (e.g., exposure \rightarrow click \rightarrow add-to-cart \rightarrow purchase). Each reasoning step k is assigned to optimize a single task \tau_k, providing structured guidance and enabling the model to gradually align with deeper levels of user preference.

Retrieval Mode

In retrieval, user representations are extracted from reasoning blocks and optimized using a combination of calibrated probability estimation (Binary Cross-Entropy) and bidirectional contrastive learning objectives. For each reasoning block \mathbf{B}_k \in \mathbb{R}^{M \times d}, a step-specific user representation is extracted via layer normalization followed by mean pooling: \mathbf{r}_k = \mathrm{Mean} ( \mathrm{LN} ( \mathbf{B}_k ) ) \in \mathbb{R}^d, \quad k \in \{ 1, 2, \ldots, K \} .

  • \mathbf{r}_k: The user representation derived from reasoning block k.

  • \mathrm{Mean}(\cdot): Mean pooling operation across the block.

  • \mathrm{LN}(\cdot): Layer Normalization.

    For each training instance, a candidate pool \Omega is constructed. For task \tau_k assigned to step k and candidate \nu \in \Omega_{\tau_k}, y_{\nu}^k \in \{0, 1\} is the behavioral label. The candidate pool is partitioned into positive \Omega_{\tau_k}^+ and negative \Omega_{\tau_k}^- sets.

OnePiece employs two complementary learning objectives:

(i) Binary Cross-Entropy Loss (BCE) This provides point-wise calibrated probability estimates for individual user-item pairs: \mathcal{L}_k^{\mathrm{BCE}} = \sum_{\nu \in \Omega_{\tau_k}^+} - \log \sigma ( \langle \mathbf{r}_k, \mathbf{z}_{\nu} \rangle ) + \sum_{\nu \in \Omega_{\tau_k}^-} - \log \big ( 1 - \sigma ( \langle \mathbf{r}_k, \mathbf{z}_{\nu} \rangle ) \big ) ,

  • \mathcal{L}_k^{\mathrm{BCE}}: The BCE loss for reasoning step k and its assigned task \tau_k.
  • \sigma(\cdot): The sigmoid function, \sigma(x) = \frac{1}{1 + e^{-x}}, which squashes values between 0 and 1 so they can be interpreted as probabilities.
  • \langle \mathbf{r}_k, \mathbf{z}_{\nu} \rangle: The inner product (similarity) between the user representation \mathbf{r}_k and the item embedding \mathbf{z}_{\nu}.
  • \mathbf{z}_{\nu} \in \mathbb{R}^d: The item embedding for candidate \nu.

(ii) Bidirectional Contrastive Learning (BCL) Inspired by CLIP, BCL operates at the batch level, enabling global contrastive reasoning across in-batch samples. It has two symmetric components:

  • User-to-Item (U2I) Contrastive Learning: Enables each user representation \mathbf{r}_k to distinguish positive items from negative candidates. \mathcal{L}_k^{\mathrm{U2I}} = \sum_{\nu \in \Omega_{\tau_k}^+} - \log \frac{ \exp ( \langle \mathbf{r}_k, \mathbf{z}_{\nu} \rangle / \eta ) }{ \sum_{\nu^+ \in \Omega_{\tau_k}^+} \exp ( \langle \mathbf{r}_k, \mathbf{z}_{\nu^+} \rangle / \eta ) + \sum_{\nu^- \in \Omega_{\tau_k}^-} \exp ( \langle \mathbf{r}_k, \mathbf{z}_{\nu^-} \rangle / \eta ) } ,

    • \mathcal{L}_k^{\mathrm{U2I}}: The U2I contrastive loss for step k.
    • \eta > 0: The temperature parameter, which scales the logits before the softmax and controls the sharpness of the resulting distribution. A smaller \eta makes the distribution sharper.
  • Item-to-User (I2U) Contrastive Learning: Enables each positive item to identify its corresponding user representation within the batch. Let \mathcal{R}_k = \{ \mathbf{r}_k^{(i)} \}_{i=1}^B be the set of user representations for step k in the current training batch of size B. \mathcal{L}_k^{\mathrm{I2U}} = \sum_{\nu \in \Omega_{\tau_k}^+} - \log \frac{ \exp ( \langle \mathbf{r}_k, \mathbf{z}_{\nu} \rangle / \eta ) }{ \sum_{\mathbf{r}' \in \mathcal{R}_k} \exp ( \langle \mathbf{r}', \mathbf{z}_{\nu} \rangle / \eta ) } .

    • \mathcal{L}_k^{\mathrm{I2U}}: The I2U contrastive loss for step k.

    • \mathbf{r}' \in \mathcal{R}_k: A user representation from the batch.

      The complete BCL objective for step k combines both symmetric components: \mathcal{L}_k^{\mathrm{BCL}} = \mathcal{L}_k^{\mathrm{U2I}} + \mathcal{L}_k^{\mathrm{I2U}} .

The overall retrieval loss aggregates the objectives across all K reasoning steps: \mathcal{L}^{\mathrm{retrieval}} = \sum_{k=1}^K \big ( \mathcal{L}_k^{\mathrm{BCE}} + \mathcal{L}_k^{\mathrm{BCL}} \big ) .
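A minimal sketch of the retrieval-mode objectives for a single user at one reasoning step, assuming item embeddings are given; the in-batch I2U term is omitted for brevity since it requires the other users in the batch. Names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def retrieval_step_loss(r_k, pos_items, neg_items, eta=0.07):
    """BCE + U2I contrastive loss for one reasoning step (single user shown).

    r_k: (d,) user representation from reasoning block k (LayerNorm + mean pooling).
    pos_items: (P, d) embeddings of positive items for task tau_k.
    neg_items: (Q, d) embeddings of negative items.
    """
    pos_logits = pos_items @ r_k                       # <r_k, z_v> for positives
    neg_logits = neg_items @ r_k
    bce = F.binary_cross_entropy_with_logits(pos_logits, torch.ones_like(pos_logits), reduction="sum") \
        + F.binary_cross_entropy_with_logits(neg_logits, torch.zeros_like(neg_logits), reduction="sum")

    # U2I contrastive term: each positive competes with all candidates in the pool.
    all_logits = torch.cat([pos_logits, neg_logits]) / eta
    u2i = (torch.logsumexp(all_logits, dim=0) - pos_logits / eta).sum()
    return bce + u2i   # the I2U term would additionally use the other users in the batch

r_k = torch.randn(64)
loss = retrieval_step_loss(r_k, torch.randn(2, 64), torch.randn(10, 64))
print(loss.item())
```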

Ranking Mode

In the ranking stage, the block size M equals the candidate group size C. Each reasoning block \mathbf{B}_k \in \mathbb{R}^{C \times d} contains C hidden states \{ \mathbf{h}_{i,k} \}_{i=1}^C, where \mathbf{h}_{i,k} \in \mathbb{R}^d is the hidden state for the i-th candidate at reasoning step k. For task \tau_k assigned to step k, candidate-wise logits are computed via a task-specific scoring network: s_{i,k} = \mathrm{MLP}_{\tau_k} ( \mathbf{h}_{i,k} ) \in \mathbb{R}, \quad i \in \{ 1, \ldots, C \} ,

  • s_{i,k}: The predicted score (logit) for candidate i at reasoning step k for task \tau_k.

  • \mathrm{MLP}_{\tau_k}(\cdot): A Multi-Layer Perceptron (MLP) specific to task \tau_k.

    Two complementary learning objectives are employed:

(i) Binary Cross-Entropy Loss (BCE) Provides point-wise probability calibration for individual candidates: \mathcal{L}_k^{\mathrm{BCE}} = \sum_{i=1}^C \left[ - y_i^{\tau_k} \log \sigma ( s_{i,k} ) - ( 1 - y_i^{\tau_k} ) \log ( 1 - \sigma ( s_{i,k} ) ) \right] ,

  • y_i^{\tau_k} \in \{0, 1\}: The behavioral label of candidate i for task \tau_k (e.g., 1 if clicked, 0 otherwise).

(ii) Set Contrastive Learning (SCL) Operates at the set-wise level, enabling each positive candidate to distinguish itself from negative candidates within the group: \mathcal{L}_k^{\mathrm{SCL}} = \sum_{i: y_i^{\tau_k} = 1} - \log \frac{ \exp ( s_{i,k} / \eta ) }{ \sum_{j=1}^C \exp ( s_{j,k} / \eta ) } ,

  • The summation \sum_{i: y_i^{\tau_k} = 1} runs over all positive candidates in the group.

  • The denominator \sum_{j=1}^C \exp ( s_{j,k} / \eta ) includes all candidates in the group, making each positive candidate compete against all others for ranking position.

    The overall ranking loss combines both objectives across all reasoning steps: \mathcal{L}^{\mathrm{ranking}} = \sum_{k=1}^K \left( \mathcal{L}_k^{\mathrm{BCE}} + \mathcal{L}_k^{\mathrm{SCL}} \right) .
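A corresponding sketch of the ranking-mode objectives for one candidate group at one reasoning step; the group size, labels, and temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def ranking_step_loss(scores, labels, eta=1.0):
    """BCE + set contrastive loss (SCL) for one candidate group at one reasoning step.

    scores: (C,) logits s_{i,k} from the task-specific MLP head.
    labels: (C,) binary labels y_i^{tau_k} for the step's task (click, order, ...).
    """
    bce = F.binary_cross_entropy_with_logits(scores, labels.float(), reduction="sum")

    # SCL: each positive candidate competes against all C candidates in the group.
    log_probs = F.log_softmax(scores / eta, dim=0)
    scl = -(log_probs * labels.float()).sum()
    return bce + scl

scores = torch.randn(12)                          # group size C = 12
labels = torch.zeros(12); labels[3] = 1.0         # one positive candidate
print(ranking_step_loss(scores, labels).item())
```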

4.2.4. Time Complexity Analysis

The time complexity analysis for OnePiece considers both the backbone encoder and the reasoning phase.

  • Backbone Encoder (without reasoning): The time complexity of each Transformer layer is O ( N^2 d + N d^2 ), where N is the base sequence length and d is the hidden dimension. With L layers, the total cost is O ( L ( N^2 d + N d^2 ) ).

    • N^2 d: Cost of the attention mechanism (computing QK^T and applying the weights).
    • N d^2: Cost of the linear projections (Q, K, V, and output projection) and the FFN.
  • Reasoning Phase: OnePiece employs KV caching (Key-Value caching) to reuse historical key-value pairs, so each new reasoning step only incurs attention between the M new block tokens and the cached tokens. At reasoning step k:

    1. Compute Q, K, V for the M new block tokens: O ( M d^2 ).
    2. Calculate attention between the M new tokens and all N + (k-1)M cached tokens: O ( M ( N + k M ) d ).
    3. Apply the output projection and FFN: O ( M d^2 ). Therefore, the complexity per layer at step k is O \big ( M ( N + k M ) d + M d^2 \big ). Aggregating over K reasoning steps and L layers, the total additional reasoning cost is: O \big ( L K M ( N d + M K d + d^2 ) \big ) .
    • L: Number of Transformer layers.
    • K: Number of reasoning steps.
    • M: Block size.
    • N: Base sequence length.
    • d: Hidden dimension. This expression shows that the reasoning cost scales linearly with L, K, M, and N, which is crucial for controlling computational overhead in industrial settings.
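To get a feel for the magnitudes, the following back-of-the-envelope calculation plugs illustrative values (not the paper's production configuration) into the two cost expressions above.

```python
def encoder_cost(L, N, d):
    """Backbone cost per forward pass: O(L * (N^2 d + N d^2))."""
    return L * (N**2 * d + N * d**2)

def reasoning_cost(L, K, M, N, d):
    """Additional cost of K reasoning steps with KV caching:
    O(L * K * M * (N d + K M d + d^2))."""
    return L * K * M * (N * d + K * M * d + d**2)

# Illustrative values only: 4 layers, 128-token context, d=256, 3 steps of block size 8.
L, N, d, K, M = 4, 128, 256, 3, 8
base = encoder_cost(L, N, d)
extra = reasoning_cost(L, K, M, N, d)
print(f"base ~{base:.2e} ops, reasoning ~{extra:.2e} ops ({100 * extra / base:.1f}% overhead)")
```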

5. Experimental Setup

5.1. Datasets

The experiments were conducted using 30-day logs from Shopee, a large e-commerce platform operating in Southeast Asia and Latin America, serving billions of users. The dataset includes multi-behavior user interactions.

The following are the results from Table 1 of the original paper:

| #User | #Item | #Query | #Impression | #Click | #Add-to-Cart | #Order |
|-------|-------|--------|-------------|--------|--------------|--------|
| 10M   | 93M   | 12M    | 0.24B       | 60M    | 12M          | 6M     |
  • Source: Shopee E-commerce Platform.
  • Scale: Contains data for 10 million unique users, 93 million unique items, and 12 million unique queries. The total number of impressions is 0.24 billion, with 60 million clicks, 12 million add-to-cart events, and 6 million orders.
  • Characteristics: The dataset captures multi-behavior user interaction, which is essential for progressive multi-task training as it provides diverse feedback signals (impression, click, add-to-cart, order) for different reasoning steps.
  • Domain: E-commerce, specifically personalized search.

5.1.1. Offline Dataset Construction (from Appendix A)

Retrieval Stage: The objective is to retrieve items users may potentially interact with. The training samples focus on impression and click objectives.

  1. Filtering: Session request data where users did not exhibit click behaviors are filtered out.
  2. Positive Samples: $m$ clicked items serve as positive samples for both the impression and click tasks.
  3. Mixed Samples: $n$ exposed but unclicked items serve as positive samples for the impression task but simultaneously as negative samples for the click task.
  4. Additional Negative Samples: $k$ items are sampled from unexposed items within the top-500 results from the ranking stage. These serve as additional negative samples.
  5. Hard Negative Samples: $l$ items from the same category as the clicked items are sampled. These serve as hard negative samples to enhance model convergence and mitigate homogeneous recommendation risks (i.e., recommending too many similar items). The specific values of $m$, $n$, $k$, $l$ are determined by domain experts and empirical validation.
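
A minimal sketch of this retrieval-side sampling logic is given below. Field names such as `clicked`, `exposed_unclicked`, `top500_unexposed`, and `candidate_pool`, as well as the default values of m, n, k, l, are hypothetical placeholders for the production log schema.

```python
import random

def build_retrieval_samples(session, item_category, m=5, n=10, k=20, l=5):
    """Construct impression/click training samples for one session with clicks."""
    clicked = session["clicked"][:m]                      # positives for both tasks
    exposed_unclicked = session["exposed_unclicked"][:n]  # positive for impression, negative for click
    pool = session["top500_unexposed"]
    unexposed = random.sample(pool, min(k, len(pool)))    # additional negatives
    clicked_cats = {item_category[i] for i in clicked}
    hard_pool = [i for i in session["candidate_pool"]
                 if item_category[i] in clicked_cats and i not in clicked]
    hard_negatives = random.sample(hard_pool, min(l, len(hard_pool)))
    samples = []
    for i in clicked:
        samples.append((i, {"impression": 1, "click": 1}))
    for i in exposed_unclicked:
        samples.append((i, {"impression": 1, "click": 0}))
    for i in unexposed + hard_negatives:
        samples.append((i, {"impression": 0, "click": 0}))
    return samples
```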

Ranking Stage: As a downstream stage, ranking requires finer refinement. The focus is on session requests with click behaviors.

  1. Positive Samples: Task-specific interaction types (impression, click, add-to-cart, order) are used as positive samples for their respective tasks.
  2. Negative Samples: Interactions from preceding tasks in the conversion funnel serve as negative samples for each respective task. For example, for the order prediction task, items that were exposed, clicked, and added to the cart but not purchased serve as negative samples.
  3. Augmented Hard Negative Samples: Similar to retrieval, items are randomly sampled from the top-500 ranking results that were not exposed to users. These serve as augmented hard negative samples to improve model performance.
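
A similarly hedged sketch of the ranking-side labeling, following the funnel logic described above; the session field names are again hypothetical stand-ins for the production log schema.

```python
import random

def build_ranking_samples(session, num_hard=20):
    """Per-task labels following the conversion funnel: an item is a positive for a
    task if it reached that stage, and a negative for deeper stages it did not reach."""
    labels = {}
    for item in session["exposed"]:
        labels[item] = {
            "click": int(item in session["clicked"]),
            "add_to_cart": int(item in session["carted"]),
            "order": int(item in session["ordered"]),
        }
    # Augmented hard negatives: unexposed items sampled from the top-500 ranking results.
    pool = session["top500_unexposed"]
    for item in random.sample(pool, min(num_hard, len(pool))):
        labels[item] = {"click": 0, "add_to_cart": 0, "order": 0}
    return labels
```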

5.2. Evaluation Metrics

5.2.1. Offline Evaluation Metrics

Retrieval Stage: The primary concern is the number of clicked items successfully recalled.

  • Recall@K (R@K): Measures the proportion of relevant items (clicked items in this case) that are successfully retrieved within the top $K$ items.
    • Conceptual Definition: Recall quantifies the model's ability to find all relevant items in the corpus. Recall@K specifically measures this within the top $K$ retrieved items. A higher Recall@K indicates that the model is better at bringing relevant items into the candidate pool for subsequent stages.
    • Mathematical Formula: $\mathrm{Recall@K} = \frac{\text{Number of relevant items in top K}}{\text{Total number of relevant items}}$
    • Symbol Explanation:
      • Number of relevant items in top K: The count of items truly relevant to the user's intent that appear within the top $K$ items returned by the retrieval model.
      • Total number of relevant items: The total count of items that are truly relevant to the user's intent in the entire corpus (or ground truth set).
    The paper reports Recall@100 and Recall@500.
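
For reference, a compact illustrative helper for this metric (variable names are arbitrary):

```python
def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the ground-truth (clicked) items found in the top-k retrieved list."""
    if not relevant_items:
        return 0.0
    top_k = set(ranked_items[:k])
    return len(top_k & set(relevant_items)) / len(relevant_items)

# e.g. recall_at_k(retrieved_ids, clicked_ids, k=100) for Recall@100
```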

Ranking Stage: Evaluates the model's ability to precisely estimate preference scores for different types of user feedback.

  • AUC (Area Under the Receiver Operating Characteristic Curve):
    • Conceptual Definition: The ROC curve plots the True Positive Rate (sensitivity) against the False Positive Rate (1 - specificity) at various threshold settings. AUC represents the area under this curve. It quantifies the model's ability to distinguish between positive and negative classes. An AUC of 1.0 means perfect classification, while 0.5 means random classification. In recommendation, it indicates how well the model ranks a randomly chosen positive item higher than a randomly chosen negative item.
    • Mathematical Formula: $\mathrm{AUC} = \frac{\sum_{i=1}^{P} \sum_{j=1}^{N} \left[ \mathbb{I}(s_i > s_j) + 0.5\,\mathbb{I}(s_i = s_j) \right]}{P \cdot N}$
    • Symbol Explanation:
      • $P$: The total number of positive samples (e.g., clicked items).
      • $N$: The total number of negative samples (e.g., unclicked items).
      • $s_i$: The predicted score for the $i$-th positive sample.
      • $s_j$: The predicted score for the $j$-th negative sample.
      • $\mathbb{I}(\cdot)$: An indicator function that returns 1 if the condition is true, and 0 otherwise.
      • The numerator counts the pairs in which a positive sample is ranked higher than a negative sample, with ties counting as 0.5.
  • GAUC (Group AUC / User-wise AUC):
    • Conceptual Definition: AUC can sometimes be misleading if the distribution of positive/negative samples varies greatly across users. GAUC addresses this by calculating the AUC for each user (or query group) separately and then averaging these per-user AUCs, often weighted by the number of impressions or positive samples for that user. This gives a more personalized and often more robust evaluation of ranking performance.
    • Mathematical Formula: $\mathrm{GAUC} = \frac{\sum_{u \in \mathcal{U}} \mathrm{AUC}_u \cdot w_u}{\sum_{u \in \mathcal{U}} w_u}$
    • Symbol Explanation:
      • $\mathcal{U}$: The set of all users.
      • $\mathrm{AUC}_u$: The AUC calculated for user $u$.
      • $w_u$: The weight for user $u$, typically the number of impressions or positive samples generated by user $u$. This ensures that users with more interactions contribute more to the overall GAUC.
    The paper reports AUC and GAUC for three feedback types: click (C-), add-to-cart (A-), and order (O-).
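
A small illustrative implementation of GAUC, using scikit-learn's `roc_auc_score` for the per-user AUCs and weighting each user by their number of impressions, as described above:

```python
from collections import defaultdict
from sklearn.metrics import roc_auc_score

def gauc(user_ids, labels, scores):
    """Impression-weighted average of per-user AUCs; users whose labels are
    all-positive or all-negative are skipped, since AUC is undefined for them."""
    by_user = defaultdict(lambda: ([], []))
    for u, y, s in zip(user_ids, labels, scores):
        by_user[u][0].append(y)
        by_user[u][1].append(s)
    num, den = 0.0, 0.0
    for ys, ss in by_user.values():
        if len(set(ys)) < 2:
            continue
        w = len(ys)  # weight by number of impressions for this user
        num += w * roc_auc_score(ys, ss)
        den += w
    return num / den if den > 0 else 0.0
```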

5.2.2. Online Evaluation Metrics (from Section 5.1.3)

These metrics track business and user engagement indicators in real-world A/B tests.

  • GMV/UU (Gross Merchandise Volume per Unique User):
    • Conceptual Definition: The average total value of goods sold per unique user. It's a key business metric reflecting the overall revenue generated from users.
  • GMV(99.5%)/UU:
    • Conceptual Definition: GMV per user excluding the top 0.5% high-value orders. This metric is used to filter out extreme outliers (e.g., very large, rare purchases) and reflect the stable contributions from regular transactions, providing a more robust measure of typical user spending.
  • AR/UU (Advertising Revenue per Unique User):
    • Conceptual Definition: The average advertising revenue generated per unique user. This metric reflects the effectiveness of the system in converting ad exposures into ad-related revenue.
  • Order/UU:
    • Conceptual Definition: The average number of orders placed per user, capturing transaction frequency.
  • Paid Order/UU:
    • Conceptual Definition: The average number of successfully paid orders per user, counting only completed purchases without refunds. This is a more stringent measure of conversion than Order/UU.
  • CTR (Click-Through-Rate):
    • Conceptual Definition: The ratio of clicked impressions to total impressions. It measures how often users click on items after seeing them, reflecting the attractiveness and relevance of the ranked results.
    • Mathematical Formula: $\mathrm{CTR} = \frac{\text{Number of Clicks}}{\text{Number of Impressions}}$
    • Symbol Explanation:
      • Number of Clicks: Total count of times users clicked on items.
      • Number of Impressions: Total count of times items were displayed to users.
  • CTCVR (Click-to-Conversion Rate):
    • Conceptual Definition: The ratio of successful conversions (e.g., purchases) to total clicks. It measures the effectiveness of transforming user engagement (clicks) into completed transactions, reflecting the quality of the clicked items.
    • Mathematical Formula: $\mathrm{CTCVR} = \frac{\text{Number of Conversions}}{\text{Number of Clicks}}$
    • Symbol Explanation:
      • Number of Conversions: Total count of times users completed a desired action (e.g., order, add-to-cart).
      • Number of Clicks: Total count of times users clicked on items.
  • Buyer:
    • Conceptual Definition: The proportion of unique users who placed at least one order. It indicates the breadth of user conversion.
  • Bad Query Rate:
    • Conceptual Definition: The percentage of queries for which human evaluators judge the recommended content as irrelevant. This serves as an inverse measure of recommendation accuracy and user satisfaction, aiming for lower values.

5.3. Baselines

OnePiece is compared against several representative baselines to demonstrate its superiority:

  • DLRM (Production baseline in Shopee): This is Shopee's highly optimized internal production model, representing a strong industrial benchmark.

    • Retrieval Mode: Uses a two-tower architecture (inspired by DSSM). User context (query, history) is encoded separately from items. It incorporates DIN-like attention (for relevance), zero-attention, lightweight text CNN (for keyword features), DCNv2 (for high-order cross features), and mean pooling for sequential features, all fused via an MLP.
    • Ranking Mode: Uses a single-tower architecture where candidate items are jointly encoded with user features. The backbone is ResFlow (Fu et al., 2024), combined with DIN-like target attention, cross-attention across sequential behaviors, DCNv2 for higher-order interactions, and MLP fusion. A SENet module (Hu et al., 2017) further supports adaptive feature selection for different tasks.
  • HSTU (Zhai et al., 2024): This is a generative recommendation framework proposed by Meta.

    • Core Idea: It typically considers Interaction History (IH) and Situational Descriptors (SD).
    • Adaptation for Comparison: For a fair comparison, its parameter size is aligned with OnePiece. An additional variant, HSTU+PA, is also evaluated, where Preference Anchors (PA) are introduced into its input sequence, consistent with OnePiece's context engineering.
  • ReaRec (Tang et al., 2025): This is a reasoning-enhanced recommendation model that models user representation as a multi-step reasoning process over item sequences.

    • Core Idea: The vanilla ReaRec supports retrieval tasks using user interaction history.
    • Adaptation for Comparison: Its backbone and feature inputs are adapted to match OnePiece. For ranking, it is adapted by introducing candidate items into the input sequence and applying a target-aware attention mask (similar to HSTU's design), where sequence tokens can attend to the candidates but candidates remain mutually invisible. A ReaRec+PA variant is also evaluated, augmenting IH and SD with Preference Anchors.
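
To illustrate the kind of mask this adaptation describes, here is a small NumPy sketch of a target-aware attention mask in which all tokens can attend to the context and each candidate can attend to itself, while candidates remain mutually invisible; the exact mask used in the paper's adaptation may differ in details.

```python
import numpy as np

def target_aware_mask(num_seq, num_cand):
    """Boolean attention mask (True = may attend) for num_seq context tokens
    followed by num_cand candidate tokens: context tokens see everything,
    each candidate sees the context and itself, candidates never see each other."""
    n = num_seq + num_cand
    mask = np.ones((n, n), dtype=bool)
    cand = slice(num_seq, n)
    mask[cand, cand] = np.eye(num_cand, dtype=bool)  # block candidate-to-candidate attention
    return mask

# Example: 6 context tokens followed by 3 candidate tokens.
# mask = target_aware_mask(num_seq=6, num_cand=3)
```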

6. Results & Analysis

This section delves into the experimental results, including overall performance, ablation studies, scaling analysis, and online A/B testing, to validate the effectiveness and practicality of OnePiece.

6.1. Overall Performance

The following are the results from Table 2 of the original paper:

| Model | R@100 | R@500 | C-AUC | C-GAUC | A-AUC | A-GAUC | O-AUC | O-GAUC |
| ----- | ----- | ----- | ----- | ------ | ----- | ------ | ----- | ------ |
| DLRM | 0.458 | 0.679 | 0.856 | 0.851 | 0.893 | 0.843 | 0.931 | 0.854 |
| HSTU | 0.443 | 0.658 | 0.833 | 0.829 | 0.878 | 0.827 | 0.913 | 0.839 |
| HSTU+PA | 0.472 | 0.680 | 0.855 | 0.852 | 0.901 | 0.848 | 0.926 | 0.849 |
| ReaRec | 0.452 | 0.674 | 0.843 | 0.838 | 0.882 | 0.834 | 0.919 | 0.843 |
| ReaRec+PA | 0.485 | 0.701 | 0.862 | 0.863 | 0.908 | 0.851 | 0.927 | 0.851 |
| OnePiece | 0.517 | 0.731 | 0.911 | 0.909 | 0.952 | 0.897 | 0.963 | 0.886 |

(R@100 and R@500 are Retrieval Mode metrics; the AUC/GAUC columns are Ranking Mode metrics.)

Table 2 presents the performance comparison of different models on both retrieval and ranking tasks using 30 days of training data.

  • DLRM as a Strong Baseline: The DLRM baseline, highly optimized within Shopee, shows strong performance, often outperforming the vanilla HSTU and ReaRec. This indicates that DLRM effectively leverages rich feature interactions and various sequential features.
  • Impact of Preference Anchors (PA): Both HSTU+PA and ReaRec+PA consistently outperform their vanilla counterparts across all metrics. For instance, HSTU+PA improves R@100 from 0.443 to 0.472, and ReaRec+PA raises R@100 from 0.452 to 0.485. This strongly confirms that enriching user history with auxiliary preference anchors provides valuable, complementary information, guiding the model towards a better understanding of context-specific user intents. ReaRec+PA generally shows higher robustness than HSTU+PA, likely due to its Transformer backbone with bi-directional attention and reasoning capabilities.
  • OnePiece's Superiority: OnePiece achieves the best overall results across all metrics and tasks.
    • Retrieval: Compared to the strongest baseline ReaRec+PA, OnePiece significantly improves Recall@100 from 0.485 to 0.517 (a relative gain of ~6.6%) and Recall@500 from 0.701 to 0.731 (a relative gain of ~4.3%).

    • Ranking: In ranking, OnePiece boosts C-AUC from 0.862 to 0.911 (a relative gain of ~5.7%), A-AUC from 0.908 to 0.952 (a relative gain of ~4.8%), and O-AUC from 0.927 to 0.963 (a relative gain of ~3.9%). Similar improvements are observed for GAUC metrics.

      These consistent and substantial gains validate OnePiece's core design principles, particularly its novel block-wise latent reasoning and progressive multi-task training strategy, which enable finer-grained preference refinement through multi-step reasoning. This positions OnePiece as a more powerful and unified framework for industrial retrieval and ranking.

6.2. Ablation Study

6.2.1. Context Engineering Ablation

The following are the results from Table 3 of the original paper:

| Version | Model | R@100 | R@500 | C-AUC | C-GAUC | A-AUC | A-GAUC | O-AUC | O-GAUC |
| ------- | ----- | ----- | ----- | ----- | ------ | ----- | ------ | ----- | ------ |
| V1 | IH(ID) | 0.407 | 0.646 | 0.802 | 0.802 | 0.860 | 0.819 | 0.908 | 0.835 |
| V2 | IH(ID+Side Info) | 0.428 | 0.657 | 0.846 | 0.844 | 0.871 | 0.839 | 0.918 | 0.845 |
| V3 | V2+PA(10) | 0.459 | 0.677 | 0.879 | 0.876 | 0.923 | 0.863 | 0.940 | 0.861 |
| V4 | V2+PA(20) | 0.467 | 0.686 | 0.885 | 0.886 | 0.929 | 0.869 | 0.946 | 0.866 |
| V5 | V2+PA(30) | 0.475 | 0.689 | 0.892 | 0.890 | 0.936 | 0.874 | 0.949 | 0.871 |
| V6 | V2+PA(60) | 0.491 | 0.707 | 0.901 | 0.900 | 0.945 | 0.886 | 0.956 | 0.880 |
| V7 | V2+PA(90) | 0.504 | 0.719 | 0.908 | 0.905 | 0.951 | 0.896 | 0.962 | 0.885 |
| V8 | V7+SD | 0.517 | 0.731 | 0.911 | 0.909 | 0.952 | 0.897 | 0.963 | 0.886 |

(R@100 and R@500 are Retrieval metrics; the AUC/GAUC columns are Ranking metrics.)

Table 3 details the ablation study on OnePiece's context engineering design, showing the progressive impact of adding Interaction History (IH), Preference Anchors (PA), and Situational Descriptors (SD).

  • V1 (IH(ID) - Minimal Baseline): Starting with only user interaction sequences composed of raw item IDs and a two-layer Transformer with bi-directional attention, this configuration yields the lowest performance (e.g., R@100 of 0.407, C-AUC of 0.802). This highlights the need for richer features and context.
  • V2 (IH(ID+Side Info) - Adding Item Features): Introducing side information (additional features beyond raw IDs) for each item in the IH sequence leads to a clear improvement across all metrics (e.g., R@100 increases to 0.428, C-AUC to 0.846). This demonstrates the importance of comprehensive item features.
  • V3-V7 (V2+PA(L) - Incorporating Preference Anchors): Gradually increasing the length $L$ of Preference Anchors from 10 to 90 consistently boosts performance. For example, R@100 increases from 0.459 (PA(10)) to 0.504 (PA(90)), and C-AUC from 0.879 to 0.908. This shows a clear scaling effect of PA, where longer auxiliary item sequences provide richer query-specific context, enabling the model to capture more fine-grained user intent. PA introduces query-dependent signals that are absent in plain IH, helping differentiate user preferences under various queries.
  • V8 (V7+SD - Adding Situational Descriptors): Finally, incorporating Situational Descriptors (static user features and query-specific information) yields the best overall results.
    • Retrieval Impact: Recall@100 improves from 0.504 (V7) to 0.517, and R@500 from 0.719 to 0.731. This significant gain suggests that SD provides stronger contextual grounding, which is particularly beneficial for the retrieval stage to find a broader set of relevant items.

    • Ranking Impact: The gains in ranking metrics are marginal (e.g., C-AUC from 0.908 to 0.911). This is because IH already provides rich personalization, and PA captures detailed query-specific preferences. Since ranking focuses on fine-grained comparisons among highly relevant candidates, SD serves as a weaker anchor in this stage.

      In summary, this ablation study clearly demonstrates the effectiveness of OnePiece's structured context engineering. IH captures long-term personalization, PA offers scalable and query-specific anchors, and SD provides stable contextual grounding, all contributing in complementary ways to enrich user-query representations and achieve consistent improvements.

6.2.2. Training Strategy Ablation

The following are the results from Table 4 of the original paper:

| Version | Training Strategy | R@100 | R@500 |
| ------- | ----------------- | ----- | ----- |
| V1 | Causal Mask | 0.464 | 0.671 |
| V2 | Bi-Directional | 0.470 | 0.676 |
| V3 | V2 + 1-Step Reasoning, Click Task on Last Step | 0.490 | 0.708 |
| V4 | V2 + 1-Step Reasoning, Multi-Task on Last Step | 0.495 | 0.714 |
| V5 | V2 + 2-Step Reasoning, Multi-Task on Last Step | 0.510 | 0.726 |
| V6 | V2 + 2-Step Reasoning, Progressive Multi-Task | 0.517 | 0.731 |

Table 4 shows the impact of different training strategies on retrieval performance.

The following are the results from Table 5 of the original paper:

| Version | Training Strategy | C-AUC | C-GAUC | A-AUC | A-GAUC | O-AUC | O-GAUC |
| ------- | ----------------- | ----- | ------ | ----- | ------ | ----- | ------ |
| V1 | Causal Mask | 0.839 | 0.836 | 0.876 | 0.830 | 0.911 | 0.838 |
| V2 | Bi-Directional, CIS Inter-Invisible | 0.860 | 0.859 | 0.903 | 0.848 | 0.920 | 0.847 |
| V3 | Bi-Directional, CIS Inter-Visible | 0.881 | 0.879 | 0.918 | 0.857 | 0.937 | 0.854 |
| V4 | V3 + 1-Step Reasoning, Multi-Task on Last Step | 0.890 | 0.889 | 0.931 | 0.871 | 0.946 | 0.867 |
| V5 | V3 + 2-Step Reasoning, Multi-Task on Last Step | 0.893 | 0.894 | 0.936 | 0.876 | 0.948 | 0.869 |
| V6 | V3 + 3-Step Reasoning, Multi-Task on Last Step | 0.906 | 0.902 | 0.946 | 0.889 | 0.957 | 0.881 |
| V7 | V3 + 3-Step Reasoning, Progressive Multi-Task | 0.911 | 0.909 | 0.952 | 0.897 | 0.963 | 0.886 |

Table 5 details the impact of training strategies on ranking performance.

  • Impact of Bi-Directional Attention (V1 vs. V2): Switching from causal mask (V1) to bi-directional attention (V2) yields significant gains across both tasks. In retrieval, R@100 improves from 0.464 to 0.470. In ranking, C-AUC jumps from 0.839 to 0.860. This validates that bi-directional attention, by allowing tokens to condition on the full context, provides more comprehensive representation information crucial for non-autoregressive recommendation tasks.
  • Impact of Candidate Inter-Visibility (V2 vs. V3 for Ranking): For ranking tasks, enabling Candidate Item Set (CIS) inter-visibility (V3) (allowing candidates to attend to each other) provides a major boost, with C-AUC increasing from 0.860 to 0.881. This confirms that the ability to perform rich comparative reasoning among candidates in a shared latent space is essential for accurate ranking.
  • Impact of Block-Wise Reasoning (V2/V3 to V4-V6): The introduction of the block-wise reasoning mechanism consistently demonstrates cumulative performance gains:
    • Retrieval: Moving from simple bi-directional attention (V2) to 1-step reasoning with click prediction (V3) improves R@100 to 0.490. Using multi-task learning on the final step (V4) further increases R@100 to 0.495. Extending to 2-step reasoning (V5) pushes R@100 to 0.510.
    • Ranking: Starting from CIS inter-visible (V3), 1-step reasoning (V4) improves C-AUC to 0.890. 2-step reasoning (V5) brings C-AUC to 0.893, and 3-step reasoning (V6) achieves C-AUC of 0.906. These results indicate that each additional reasoning step meaningfully contributes to performance by capturing more nuanced user behavioral patterns, enabling increasingly sophisticated preference modeling.
  • Impact of Progressive Multi-Task Training (V5/V6 vs. V6/V7): OnePiece's progressive multi-task training strategy consistently outperforms single-embedding multi-task learning (where all tasks supervise only the final step's embedding).
    • Retrieval: Progressive multi-task (V6) achieves R@100 of 0.517, surpassing the 2-step reasoning with multi-task on the last step (V5) at 0.510.
    • Ranking: Progressive multi-task (V7) achieves C-AUC of 0.911, compared to 3-step reasoning with multi-task on the last step (V6) at 0.906. The key advantage lies in distributing different tasks across multiple reasoning steps, preventing gradient conflicts and encouraging each reasoning step to specialize in extracting task-specific information. The optimal number of reasoning steps also differs by task: retrieval benefits most from two steps (impression-click hierarchy), while ranking benefits from three steps (full conversion funnel), demonstrating the adaptive nature of the progressive framework.
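
To make the progressive assignment concrete, below is a minimal PyTorch-style sketch that supervises each reasoning step with a different funnel task. The specific step-to-task mapping and the use of a plain BCE term are illustrative assumptions; the paper's full objective also includes the contrastive component discussed earlier.

```python
import torch
import torch.nn.functional as F

# Illustrative step-to-task assignment for ranking: earlier reasoning steps are
# supervised by earlier feedback in the conversion funnel (an assumed mapping;
# the paper's progressive strategy follows the user feedback chain).
STEP_TASKS = {0: "click", 1: "add_to_cart", 2: "order"}

def progressive_loss(step_logits, labels):
    """step_logits: list of K tensors, one per reasoning step, each of shape [B];
    labels: dict mapping task name -> tensor of shape [B] with 0/1 targets."""
    total = 0.0
    for k, logits in enumerate(step_logits):
        task = STEP_TASKS[k]
        total = total + F.binary_cross_entropy_with_logits(logits, labels[task].float())
    return total
```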

6.3. Scaling Analysis

6.3.1. Training Data Scaling

Figure 5 | Training convergence curves of different models on retrieval and ranking tasks. The left panel shows Recall@100 in retrieval mode and the right panel shows Click AUC in ranking mode, both plotted against the training data span in days.

Figure 5 illustrates the training convergence curves of OnePiece compared to DLRM and HSTU over increasing training data spans (up to 60 days).

  • Superior Data Efficiency: OnePiece already surpasses both baselines after only 7-10 days of training. This indicates OnePiece's superior data efficiency, attributable to its context-aware and multi-step reasoning architecture, which can extract more value from less data.

  • Stronger Scaling Capabilities: While DLRM and HSTU quickly converge to a plateau, OnePiece continues to improve with longer training spans, and the performance gap widens. By day 60, OnePiece demonstrates a pronounced lead, showing continuous improvement potential. This suggests OnePiece possesses a stronger modeling capacity that can effectively exploit richer behavioral supervision from extended time horizons, indicating superior scaling capabilities compared to baselines.

  • Robust Optimization: The training curves for OnePiece exhibit smooth and stable growth without significant fluctuations, demonstrating robust optimization under its progressive multi-task supervision.

    These results confirm that OnePiece not only achieves higher sample efficiency but also scales more effectively as more training data becomes available, making it suitable for industrial environments with vast and continuously growing datasets.

6.3.2. Reasoning Scaling

The following are the results from Table 6 of the original paper:

| Block Size | C-AUC | C-GAUC | A-AUC | A-GAUC | O-AUC | O-GAUC |
| ---------- | ----- | ------ | ----- | ------ | ----- | ------ |
| M = C = 1 | 0.885 | 0.881 | 0.923 | 0.861 | 0.947 | 0.871 |
| M = C = 4 | 0.913 | 0.911 | 0.951 | 0.896 | 0.961 | 0.885 |
| M = C = 8 | 0.920 | 0.918 | 0.956 | 0.899 | 0.964 | 0.887 |
| M = C = 12 | 0.927 | 0.923 | 0.958 | 0.903 | 0.969 | 0.893 |

Table 6 investigates the impact of the reasoning block size $M$ (which equals the number of candidate items $C$ in ranking mode) on OnePiece's ranking performance, using 60 days of training data.

  • Consistent Improvements: Increasing $M$ from 1 to 12 yields consistent improvements across all evaluated metrics. For example, C-AUC increases from 0.885 at $M=1$ to 0.927 at $M=12$.

  • Largest Initial Gain: The most substantial performance gain occurs when scaling from $M=1$ to $M=4$. This is because pointwise modeling ($M=1$) lacks cross-sample comparisons, meaning candidates are evaluated in isolation. Grouping candidates into blocks ($M>1$) enables the reasoning mechanism to contrast preferences more effectively, directly aligning with the intrinsic nature of ranking.

  • Diminishing Returns: As the block size continues to increase beyond $M=4$, the improvements become smaller yet remain positive (e.g., C-AUC increases from 0.913 at $M=4$ to 0.920 at $M=8$, then to 0.927 at $M=12$). This suggests diminishing returns, possibly because overly large blocks might overload the reasoning medium with redundant information, saturating its representational capacity.

    These findings reveal a trade-off: expanding reasoning bandwidth is beneficial, but there's a point where information redundancy might limit further gains. Selecting an appropriate block size is crucial for maximizing the effectiveness of block-wise reasoning.

6.4. Online A/B Testing

OnePiece was subjected to large-scale online A/B testing on Shopee's production search system to assess its real-world effectiveness. 10% of traffic was allocated for these experiments.

6.4.1. Online Inference Details (from Section 5.1.1)

  • Retrieval Stage: Offline training generates vector representations for the entire item pool. An Approximate Nearest Neighbor (ANN) index is constructed using the Hierarchical Navigable Small World (HNSW) algorithm (Malkov and Yashunin, 2018) to support efficient online retrieval.
  • Ranking Stage: A score fusion strategy is employed to integrate outputs from different tasks: $p_{\mathrm{final}} = \alpha \cdot p_{\mathrm{ctr}}^a \cdot p_{\mathrm{ctcvr}}^b + \beta \cdot p_{\mathrm{ctr}}^a \cdot p_{\mathrm{ctcvr}}^b \cdot \mathrm{price} + \gamma \cdot p_{\mathrm{ctr}} \cdot \mathrm{ecpm}$, where:
    • $p_{\mathrm{final}}$: The final relevance score for an item.
    • $\alpha, \beta, \gamma$: Hyperparameters controlling the importance weights of the respective components, enabling a balance between user experience and business revenue.
    • $p_{\mathrm{ctr}}$: Click-through rate predicted by OnePiece's final reasoning step (logit of the click task).
    • $p_{\mathrm{ctcvr}}$: Click-to-conversion rate predicted by OnePiece's final reasoning step (logit of the order task).
    • $a$, $b$: Exponents modulating the influence of the click and conversion tasks in the final ranking.
    • price: The item's price.
    • ecpm: The item's advertising value component (Effective Cost Per Mille, i.e., cost per thousand impressions).
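
For concreteness, a direct instantiation of this fusion formula; all hyperparameter values below are placeholders, since the production weights are not disclosed.

```python
def fuse_scores(p_ctr, p_ctcvr, price, ecpm,
                alpha=1.0, beta=1.0, gamma=1.0, a=1.0, b=1.0):
    """Final ranking score mixing user-experience and revenue terms, mirroring
    the fusion formula above; all weights here are illustrative placeholders."""
    organic = alpha * (p_ctr ** a) * (p_ctcvr ** b)       # engagement/conversion term
    gmv_term = beta * (p_ctr ** a) * (p_ctcvr ** b) * price  # expected GMV term
    ad_term = gamma * p_ctr * ecpm                         # advertising value term
    return organic + gmv_term + ad_term

# e.g. fuse_scores(p_ctr=0.12, p_ctcvr=0.03, price=15.0, ecpm=2.5)
```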

6.4.2. Overall Performance - Retrieval Mode

The following are the results from Table 7 of the original paper:

| GMV/UU | GMV(99.5%)/UU | Order/UU | Paid Order/UU | CTCVR | Buyer | Bad Query Rate |
| ------ | ------------- | -------- | ------------- | ----- | ----- | -------------- |
| +1.08% | +0.91% | +0.71% | +0.98% | +0.66% | +0.41% | -0.17% |

Table 7 shows the online A/B testing results for OnePiece in retrieval mode, replacing a User-to-Item (U2I) recall route.

  • Consistent Business Gains: GMV/UU increases by +1.08%. GMV(99.5%)/UU also improves by +0.91%, indicating that gains are driven by stable, regular transactions, not just occasional high-value orders.
  • Improved Conversion: Order/UU rises by +0.71%, and Paid Order/UU increases even faster (+0.98%), suggesting higher conversion rates and reduced refunds. Buyer expands by +0.41%, meaning more unique users are completing purchases. CTCVR improves by +0.66%, reflecting better end-to-end conversion from exposure to transaction.
  • Enhanced User Experience: Crucially, Bad Query Rate decreases by 0.17%. This indicates better query relevance and an improved user experience, balancing personalization with relevance. Unlike previous personalized recall strategies that might boost GMV at the expense of relevance, OnePiece achieves balanced improvements.

6.4.3. Overall Performance - Ranking Mode

The following are the results from Table 8 of the original paper:

| GMV/UU | GMV(99.5%)/UU | AR/UU | Order/UU | Buyer | CTR | Bad Query Rate |
| ------ | ------------- | ----- | -------- | ----- | --- | -------------- |
| +1.12% | +0.65% | +2.90% | +0.08% | +0.08% | +0.29% | +0.21% |

Table 8 summarizes the online A/B testing results for OnePiece deployed in the pre-ranking stage.

  • Strong Business Metrics: GMV/UU improves by +1.12%. Most notably, Advertising Revenue per Unique User (AR/UU) shows a substantial boost of +2.90%.

  • Utility Translation: Order/UU and Buyer increase marginally (+0.08%), which is consistent with the score fusion function (Eq. 21) designed to translate order-related utility into GMV and advertising gains.

  • Engagement and Relevance Trade-off: CTR improves by +0.29%, indicating stronger attractiveness of ranked results. However, Bad Query Rate increases by +0.21%. This minor increase is attributed to more advertising slots potentially introducing items less relevant to direct user interests, but it is outweighed by the substantial revenue gains.

    Overall, OnePiece ranking strengthens core business metrics, especially advertising revenue, while achieving a practical trade-off between user experience and business objectives in large-scale industrial systems.

6.4.4. Recall Coverage and Exclusive Contribution

To further evaluate OnePiece in the retrieval stage, its overlap with other recall routes and its exclusive contribution are analyzed.

The following are the results from Table 9 of the original paper:

| Recall Route | STR1 | STR2 | Swing I2I | KPop | S2I |
| ------------ | ---- | ---- | --------- | ---- | --- |
| DLRM | 37.3% | 31.3% | 57.9% | 62.5% | 47.6% |
| OnePiece | 66.2% (+77.6%) | 64.4% (+105.8%) | 76.8% (+32.6%) | 77.2% (+23.5%) | 67.8% (+42.4%) |

Table 9 compares the overlap coverage between DLRM and OnePiece with respect to other recall strategies (e.g., STR1: sparse text recall with user-input keywords; STR2: sparse text recall with rewritten keywords; Swing I2I: graph-based item-to-item personalized recall; KPop: popularity-based recall under keywords; S2I: semantic vector-to-item recall).

  • Higher Recall Coverage: OnePiece consistently achieves substantially higher recall coverage across all other recall routes compared to DLRM. For instance, STR2 coverage more than doubles (+105.8%), and STR1 coverage increases by +77.6%. Significant gains are also observed for Swing I2I (+32.6%), KPop (+23.5%), and S2I (+42.4%).

  • Potential for Unified Model: These improvements suggest that OnePiece has strong potential to replace multiple specialized recall strategies with a single unified model, effectively balancing personalization, popularity, and relevance.

Figure 6 | Exclusive contribution of OnePiece in the retrieval stage (exclusive impression and click shares of OnePiece vs. DLRM).

Figure 6 compares the exclusive contribution of OnePiece and DLRM in terms of impressions and clicks.

  • Substantial Unique Contribution: OnePiece demonstrates a significant increase in unique contributions: its exclusive impression share rises from 3.6% to 9.9% (a 2.8x increase), and its exclusive click share grows from 2.4% to 5.7% (a 2.4x increase).
  • Novel Impressions and Clicks: This indicates that OnePiece not only covers the exposure of other recall routes but also contributes significantly more novel impressions and clicks that are not captured by traditional DLRM-based recall. In essence, OnePiece nearly doubles the independent value over traditional DLRM, enhancing overall recall performance.

6.4.5. Efficiency Analysis

The following are the results from Table 10 of the original paper:

Retrieval Mode

| Method | Infer. Time ↓ | MFU ↑ | MU ↑ |
| ------ | ------------- | ----- | ---- |
| DLRM | 40ms/request | 35% | 30% |
| OnePiece | 30ms/request (-25%) | 80% (+129%) | 50% (+67%) |

Ranking Mode (batch size = 128, KV-Cache enabled)

| Method | Infer. Time ↓ | MFU ↑ | MU ↑ |
| ------ | ------------- | ----- | ---- |
| DLRM | 109ms/batch | 23% | 29% |
| OnePiece (M=1) | 110ms/batch (+0.9%) | 67% (+191%) | 38% (+31%) |
| OnePiece (M=4) | 112ms/batch (+2.8%) | | |
| OnePiece (M=8) | 115ms/batch (+5.5%) | | |
| OnePiece (M=12) | 120ms/batch (+10.1%) | | |

Table 10 presents a computational efficiency comparison between DLRM and OnePiece on a single NVIDIA A30 GPU. MFU denotes Model FLOPs Utilization (ratio of achieved FLOPs to theoretical peak), and MU denotes Memory Utilization (percentage of GPU memory occupied).

  • Enhanced Hardware Utilization in Retrieval Mode:

    • OnePiece achieves a 25% reduction in inference time (30ms vs. 40ms per request) compared to DLRM.
    • It shows a dramatic 129% increase in MFU (from 35% to 80%) and a 67% increase in MU (from 30% to 50%).
    • This indicates that OnePiece's unified Transformer architecture is more compatible with modern GPU parallelization, effectively leveraging tensor computation units that are often underutilized by DLRM's heterogeneous, embedding-heavy design. The efficiency gains stem from streamlined data flow and reduced memory transfer overhead. This is crucial for reducing operational costs in large-scale industrial deployments.
  • Controlled Computational Scaling in Ranking Mode:

    • While OnePiece incurs a modest overhead relative to DLRM at $M=1$ (110ms/batch vs. 109ms/batch), its scaling behavior with increasing block size $M$ (reasoning capacity) is efficient.

    • Inference time increases from 110ms at $M=1$ to 120ms at $M=12$, representing only a 10.1% overhead for a 12x expansion in reasoning capacity. The progressive overhead (0.9%, 2.8%, 5.5%, 10.1%) demonstrates efficient computational amortization, where each additional reasoning block incurs diminishing marginal cost.

    • The MFU shows a dramatic 191% improvement (from 23% to 67%) even at $M=1$. This indicates that OnePiece's architecture inherently aligns better with GPU computational paradigms, regardless of reasoning complexity. This efficiency is partly due to the KV-Caching mechanism enabling efficient batch processing.

    • This controlled scaling offers a favorable efficiency-performance trade-off: as shown in Table 6, C-AUC significantly improves from 0.885 at $M=1$ to 0.927 at $M=12$ (a 4.7% relative improvement).

      These findings establish OnePiece as a practical solution for production systems, offering configurable reasoning depth that allows flexible trade-offs between computational efficiency and model performance.

6.5. Attention Visualization Analysis (from Appendix C)

The attention visualization analyses (Figures 8 and 9) provide insights into how OnePiece processes information and performs multi-step reasoning.


Figure 8 | OnePiece Attention Analysis in Different Modes. The attention maps visualize the attention weights between different input components: Interaction History (I), Preference Anchor (P), Situational Descriptor (S), and Candidate Item Set (C). In these visualizations, the y-axis represents the Query, while the x-axis represents the Key-Value, corresponding to the attention weight matrix commonly used in Transformer-like architectures.

Attention Analysis of Context Input (Figure 8):

  • Layer-wise Evolution: Early layers (e.g., Layer-1 heads in Figure 8(a)-1/4 and 8(b)-1/3) show concentrated or diagonal attention, indicating localized sequential processing within IH tokens or short-span links between SD and CIS. Later layers (e.g., Layer-2 heads in Figure 8(a)-5/8 and 8(b)-5/8) develop multi-region attention patterns, connecting multiple token groups simultaneously, suggesting a transition from localized processing to global integration of information.
  • Head-level Specialization: Within the same layer, different attention heads learn specialized roles. Some heads emphasize intra-component coherence (e.g., IH-IH diagonals in Figure 8(b)-7), focusing on relationships within a single input type. Others prioritize cross-component flows (e.g., SD to IH in Figure 8(a)-5/6; CIS to IH in Figure 8(b)-6), indicating integration of information across different input types. This validates that the model develops hierarchical and diversified reasoning strategies.
  • Mode-Specific Characteristics:
    • Retrieval Mode (Figure 8(a)): The three-token design (IH, PA, SD) fosters structured and compact cross-component attention. For instance, heads in Layer-2 (e.g., 5, 6) link IH with PA to guide long-term preference recall, while another head (e.g., 8) reinforces SD to IH connections, grounding retrieval in situational context. These interactions remain relatively localized, aligning with retrieval's coarse-grained filtering objective.

    • Ranking Mode (Figure 8(b)): The introduction of CIS tokens fundamentally expands the attention space to four-way interactions. Heads in Layer-2 (e.g., 6, 8) show attention flows spanning IH, PA, SD, and CIS simultaneously, enabling joint evaluation of user preference signals against explicit candidate items. IH tokens maintain temporal sequentiality (e.g., 7), while CIS tokens actively integrate with PA and SD for fine-grained candidate comparison (e.g., 4). This highlights the ranking stage's role in nuanced discrimination.

Figure 9 | Attention visualization of multi-step block-wise reasoning in OnePiece. The heatmaps show attention weights between reasoning blocks (y-axis, as queries) and input components together with previous reasoning outputs (x-axis, as keys and values) for (a) retrieval mode with two reasoning steps ($R_1$, $R_2$) and (b) ranking mode with three reasoning steps ($R_1$, $R_2$, $R_3$). Context tokens include Interaction History (I), Preference Anchors (P), Situational Descriptors (S), and Candidate Items (C, ranking only).

Attention Analysis of Multi-Step Block-wise Reasoning (Figure 9): This analysis shows how reasoning blocks progressively query different information sources.

  • Retrieval Mode (Figure 9(a) - two steps $R_1$, $R_2$):
    • R1 (first reasoning block) exhibits strong concentrated attention on Situational Descriptors (S) and moderate attention on Preference Anchors (P), with minimal attention to Interaction History (I). This indicates that initial reasoning prioritizes contextual and query-specific signals for understanding user intent.
    • R2 (second reasoning block) shows a pivotal shift, developing concentrated attention on specific regions within Interaction History (I) while also incorporating information from the previous reasoning block R1.
    • This demonstrates an evolution from a situational-preference focus to selective behavioral pattern recognition, where progressive reasoning transitions from broad contextual understanding to targeted sequential preference extraction.
  • Ranking Mode (Figure 9(b) - three steps $R_1$, $R_2$, $R_3$):
    • The three-step process reveals increasingly sophisticated attention integration with a hierarchical information enhancement pattern.
    • As reasoning progresses, later blocks (R3) demonstrate stronger attention to more recent reasoning outputs (R2), while exhibiting relatively weaker attention to earlier outputs (R1).
    • This suggests that each reasoning step progressively consolidates and refines information from previous steps. More recent reasoning blocks contain higher-level abstractions that effectively subsume earlier insights. The model learns to prioritize the most refined representations, indicating an efficient information compression mechanism where each reasoning step builds upon an increasingly compressed preference understanding to achieve discriminative candidate evaluation.

7. Conclusion & Reflections

7.1. Conclusion Summary

The paper introduces OnePiece, a pioneering unified framework that successfully integrates LLM-style context engineering and multi-step reasoning into industrial cascaded ranking systems. Built upon a pure Transformer backbone, OnePiece incorporates structured context engineering to organize heterogeneous signals (interaction history, preference anchors, situational descriptors, and candidate items) into a tokenized sequence. It equips the model with block-wise latent reasoning capacity for iterative representation refinement and optimizes this process through a progressive multi-task training strategy, leveraging natural user feedback chains.

Extensive offline experiments validate the effectiveness of each design component and demonstrate favorable scaling properties with respect to preference anchor length, training data span, and block size. Crucially, online A/B testing in Shopee's main personalized search scenario confirms significant real-world impact, including an over +2% GMV/UU improvement and a +2.90% advertising revenue increase. OnePiece also exhibits superior efficiency and hardware utilization compared to traditional baselines. These results position OnePiece as a promising new paradigm for building scalable, reasoning-driven ranking models in real-world industrial environments.

7.2. Limitations & Future Work

The authors identify two promising future research directions:

  • Unified Multi-Route Retrieval: Existing industrial multi-route retrieval systems maintain separate models with distinct parameters for various pathways (e.g., I2U, I2I, U2I, Q2I). This is resource-intensive and prone to redundancy. OnePiece's unified architecture, by processing tailored contexts for different retrieval scenarios within a single model, paves the way for "One For All" multi-route retrieval. The empirical evidence from recall coverage and exclusive contribution (Section 5.3) supports the feasibility of such a streamlined system. The future work suggests further exploring how context engineering can be adapted to serve diverse recommendation objectives with a single unified model, reducing system complexity and maintenance overhead.


    Figure 7 | Comparison between existing multi-route retrieval systems and OnePiece-based unified architecture. (a) Traditional multi-route retrieval requires maintaining separate models with distinct parameters for different retrieval pathways (I2U, I2I, U2I, Q2I, etc.), each utilizing specialized architectures and storage systems. (b) OnePiece achieves unified multi-route retrieval through a single model that processes tailored prompts for different retrieval scenarios, enabling "One For All" functionality while reducing system complexity and maintenance overhead.

  • Scalable Latent Reasoning: While OnePiece represents a successful deployment of latent reasoning at industrial scale, the authors acknowledge inherent limitations in its current reasoning scalability. The primary challenge is obtaining sufficient multi-task signals to effectively supervise intermediate reasoning processes. This constraint limits the ability to scale reasoning capabilities further. Future research should explore more effective methodologies for scaling latent reasoning, such as incorporating online user feedback through reinforcement learning to adaptively determine optimal reasoning depth, or developing organic integration between model self-exploration and multi-task supervision processes to enable autonomous reasoning evolution.

7.3. Personal Insights & Critique

OnePiece presents a significant step forward in industrial recommendation systems by moving beyond mere architectural transplants from LLMs to a more fundamental integration of context engineering and multi-step reasoning.

Innovations and Strengths:

  • Principled Adaptation: The core strength lies in its principled approach to adapting LLM mechanisms. By explicitly defining and integrating structured context engineering and block-wise latent reasoning, the paper offers a robust framework that goes beyond superficial application of Transformers.
  • Practicality and Impact: The large-scale deployment at Shopee and the consistent online gains are compelling evidence of OnePiece's practical value and readiness for real-world industrial settings. The efficiency analysis (MFU, MU) further underscores its viability.
  • Comprehensive Evaluation: The combination of extensive offline experiments (ablation, scaling) and online A/B testing provides strong validation for each component and the overall framework.
  • "One For All" Vision: The idea of a unified model for multi-route retrieval is highly appealing. Current systems are often complex ensembles of specialized models. OnePiece's capability to generalize across different recall routes is a game-changer for system simplicity and maintenance.

Potential Issues and Areas for Improvement:

  • Complexity of Progressive Multi-Task Loss: While effective, the progressive multi-task training strategy involves designing specific tasks for each reasoning step and carefully managing their supervision. This might require significant domain expertise and empirical tuning for new scenarios or larger numbers of reasoning steps. The paper's formulation provides a general framework, but practical implementation could be intricate.
  • Dependency on Preference Anchors (PA): PAs, constructed from domain knowledge (e.g., top-clicked items), are shown to be highly effective. However, the quality and construction of these anchors are crucial. In domains where such rich, structured domain knowledge is scarce or difficult to extract, the PA component's effectiveness might diminish. Further research could explore more automated or self-supervised ways to generate high-quality anchors.
  • Interpretability of Latent Reasoning: While block-wise latent reasoning significantly improves performance, the "latent" nature means the intermediate reasoning steps are not directly human-interpretable in natural language, unlike chain-of-thought in LLMs. This might pose challenges in debugging, understanding model failures, or building user trust, particularly in sensitive recommendation contexts. The attention visualizations offer some insight, but a full "why" remains opaque.
  • Generalizability beyond E-commerce Search: While proven effective in Shopee's personalized search, the direct applicability and optimal configurations for other domains (e.g., news recommendation, video streaming) or recommendation types (e.g., cold-start scenarios, diverse recommendations) might require further adaptation and validation. The specific task definitions for progressive multi-task training might need to be re-calibrated.

Transferability and Future Value: The core methodologies of structured context engineering and block-wise latent reasoning are highly transferable.

  • Context Engineering: The idea of augmenting raw user data with various contextual cues (preference anchors, situational descriptors) can be applied to almost any sequential modeling task where rich context improves understanding. This could benefit fraud detection, medical diagnosis (integrating patient history with similar cases and current symptoms), or ad targeting.

  • Block-Wise Latent Reasoning: The concept of iteratively refining representations in a block-wise manner offers a general mechanism for enhancing complex models beyond recommendation. It could be valuable in any domain requiring multi-step decision-making or hierarchical information processing, such as reinforcement learning agents, time-series forecasting with complex dependencies, or scientific discovery pipelines.

  • Progressive Multi-Task Training: This strategy for supervising latent processes via a curriculum of related tasks is a powerful technique for models that perform multi-step computations without explicit human-provided reasoning traces. It could find applications in robotics (training complex motor skills through simpler sub-tasks), drug discovery (optimizing molecule design based on progressively complex biological properties), or any AI system where an internal "thought process" needs to be guided.

    OnePiece sets a new standard for integrating advanced AI paradigms into industrial systems, highlighting that the future of recommendation lies not just in bigger models, but in smarter, more context-aware, and reasoning-capable architectures.
