SPRIGHT: Extracting the Server from Serverless Computing! High-performance eBPF-based Event-driven, Shared-memory Processing
TL;DR Summary
SPRIGHT proposes a novel eBPF-based, event-driven, shared-memory serverless framework to resolve performance overheads and cold starts in existing platforms. By avoiding serialization and leveraging eBPF, SPRIGHT achieves DPDK-level dataplane performance with 10x less CPU and eliminates cold starts by keeping functions warm at negligible cost.
Abstract
Serverless computing promises an efficient, low-cost compute capability in cloud environments. However, existing solutions, epitomized by open-source platforms such as Knative, include heavyweight components that undermine this goal of serverless computing. Additionally, such serverless platforms lack dataplane optimizations to achieve efficient, high-performance function chains that facilitate the popular microservices development paradigm. Their use of unnecessarily complex and duplicate capabilities for building function chains severely degrades performance. 'Cold-start' latency is another deterrent. We describe SPRIGHT, a lightweight, high-performance, responsive serverless framework. SPRIGHT exploits shared memory processing and dramatically improves the scalability of the dataplane by avoiding unnecessary protocol processing and serialization-deserialization overheads. SPRIGHT extensively leverages event-driven processing with eBPF.
In-depth Reading
1. Bibliographic Information
- Title: SPRIGHT: Extracting the Server from Serverless Computing! High-performance eBPF-based Event-driven, Shared-memory Processing
- Authors: Shixiong Qi, Leslie Monis, Ziteng Zeng, Ian-chin Wang, K. K. Ramakrishnan (University of California, Riverside, CA)
- Journal/Conference: ACM SIGCOMM 2022 Conference (SIGCOMM '22). SIGCOMM is a premier, highly competitive, and prestigious conference in the field of computer networking and communications. Publication here signifies a work of high impact and technical rigor.
- Publication Year: 2022
- Abstract: The paper identifies that current serverless platforms, such as Knative, are burdened by heavyweight components that contradict the efficiency goals of serverless computing. These platforms suffer from poor dataplane performance, especially in function chains, due to complex protocols, data serialization, and 'cold-start' latency. The authors introduce SPRIGHT, a lightweight serverless framework that uses shared memory and event-driven processing with the extended Berkeley Packet Filter (eBPF). SPRIGHT avoids unnecessary protocol processing and serialization overheads, achieving high performance with significantly lower CPU usage compared to polling-based solutions like DPDK. By using eBPF to replace heavyweight components, SPRIGHT can keep functions 'warm' with negligible cost, effectively eliminating cold starts. Preliminary results show an order-of-magnitude improvement in throughput and latency over Knative.
- Original Source Link: /files/papers/68e38aa933fab70a3d0ebd31/paper.pdf (Published)
2. Executive Summary
- Background & Motivation (Why):
- Core Problem: The promise of serverless computing—efficiency, low cost, and simplicity—is being undermined by the very platforms built to deliver it. Existing frameworks like Knative are constructed from loosely coupled, heavyweight components (e.g., sidecar proxies, message brokers) that introduce significant overhead.
- Key Gaps:
- Heavyweight Components: Each function pod runs with a dedicated, constantly-running sidecar proxy, which consumes substantial CPU and memory resources even when idle.
- Inefficient Function Chaining: Communicating between functions in a chain involves multiple traversals of the kernel's networking stack, repeated data copying, serialization/deserialization, and protocol processing. This leads to high latency and low throughput, making microservice architectures inefficient.
- 'Cold-Start' Latency: The delay in starting a new function instance from scratch ("cold start") harms application responsiveness, especially for interactive or latency-sensitive workloads.
- Innovation: SPRIGHT fundamentally redesigns the serverless dataplane. Instead of relying on traditional networking between containerized components, it introduces a highly efficient, in-kernel, event-driven communication mechanism based on eBPF and shared memory. This approach "extracts the server" (i.e., the wasteful, always-on components) from serverless computing, making resource usage directly proportional to the workload.
- Main Contributions / Findings (What):
- Shared-Memory Dataplane: A novel dataplane for serverless function chains that uses shared memory for zero-copy data transfer between functions, eliminating protocol processing and serialization overheads for intra-chain communication.
- eBPF-based Event-Driven Proxies: The introduction of `EPROXY` and `SPROXY`, lightweight eBPF programs that replace heavyweight sidecar proxies. They provide essential functionality such as metrics collection in an event-driven manner, consuming CPU only when requests are being processed.
- Elimination of Cold Starts: Because the idle resource cost of SPRIGHT's 'warm' functions is negligible, it becomes practical to keep at least one instance of each function running. This completely sidesteps the cold-start problem without the high cost of traditional 'keep-warm' strategies.
- High Performance and Efficiency: Experimental results demonstrate that SPRIGHT achieves up to 5x higher throughput and 53x lower latency than Knative, while using up to 27x less CPU.
- Secure Multi-Tenancy: SPRIGHT provides isolation between different function chains by using private shared memory pools and eBPF-based message filtering, ensuring security in a shared environment.
3. Prerequisite Knowledge & Related Work
- Foundational Concepts:
- Serverless Computing: A cloud computing model where the cloud provider dynamically manages the allocation and provisioning of servers. Users write and deploy code in the form of functions, and only pay for the exact amount of compute time consumed by those functions in response to events.
- Function Chaining (Microservices): An architectural style where complex applications are composed of small, independent services (functions) that communicate over a network. For example, an e-commerce order might trigger a chain of functions: `validate_order` -> `process_payment` -> `update_inventory`.
- Knative: A popular open-source serverless platform that runs on top of Kubernetes. It provides components for deploying, running, and managing serverless applications. A key component is the `queue-proxy`, a sidecar container that sits alongside the user's function to handle requests and collect metrics.
- eBPF (extended Berkeley Packet Filter): A kernel technology that allows small, safe programs to be run directly inside the Linux kernel without modifying the kernel's source code. It is event-driven: an eBPF program executes only when a specific event occurs (e.g., a packet arrives at a network interface). This makes it extremely efficient for high-performance networking, security, and observability.
- Shared Memory: A technique for inter-process communication (IPC) where multiple processes can access the same block of physical memory. This is the fastest form of IPC because it avoids the overhead of copying data between the kernel and user processes.
- DPDK (Data Plane Development Kit): A set of libraries and drivers for fast packet processing in userspace. It often achieves high performance through polling, where a CPU core continuously checks for new packets. This provides low latency but consumes 100% of the CPU core's capacity, regardless of the traffic load, making it inefficient for intermittent workloads typical in serverless.
- Cold Start: The latency experienced when invoking a serverless function that is not currently running. The platform must first allocate resources, download the code, initialize a container, and start the function runtime before the user's code can execute. This can take seconds and is a major drawback of serverless.
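The shared-memory concept above can be sketched in a few lines. The snippet below is a toy illustration (not SPRIGHT's C implementation): a producer writes a payload into a POSIX shared-memory segment once, and a consumer attached to the same segment reads it in place, avoiding the per-hop copies a socket-based chain would incur.

```python
# Toy sketch of zero-copy IPC via POSIX shared memory. The payload is
# written to the shared segment once; the consumer reads it through a
# memoryview into the same physical pages, copying only when the data
# leaves the pool.
from multiprocessing import shared_memory

payload = b"order:1234;item:widget"

# Producer: allocate a pool and write the payload once.
shm = shared_memory.SharedMemory(create=True, size=4096)
shm.buf[:len(payload)] = payload

# Consumer: attach to the same segment by name and read in place.
peer = shared_memory.SharedMemory(name=shm.name)
view = peer.buf[:len(payload)]          # no copy: a view into the pool
received = bytes(view)                  # copy happens only here, on exit

print(received.decode())

# Cleanup (release the view before closing, then unlink the segment).
del view
peer.close()
shm.close()
shm.unlink()
```

For simplicity both "processes" live in one interpreter here; in a real deployment the consumer would be a separate process attaching by segment name.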
- Previous Works & Differentiation:
- The paper acknowledges existing serverless platforms like AWS Lambda, OpenFaaS, and Knative. It critiques their reliance on standard but inefficient communication patterns (HTTP/REST APIs, message brokers) and on heavyweight components like sidecars (`Envoy`, `queue-proxy`).
- It cites research that has quantified the overhead of the Linux networking stack (data copies, context switches) and of serialization, which supports the motivation for SPRIGHT's kernel-bypass approach for function chains.
- While other works have optimized serverless computing by focusing on faster container runtimes (`Firecracker`, `SOCK`) or smarter resource allocation (`Mu`, `GRAF`), they do not address the fundamental inefficiency of the dataplane: how data moves between functions.
- SPRIGHT's key differentiator is its holistic redesign of the serverless dataplane, synergistically combining eBPF's efficient, event-driven nature with the zero-copy performance of shared memory. This combination directly tackles the root causes of inefficiency in function chaining.
4. Methodology (Core Technology & Implementation)
SPRIGHT's architecture is designed for efficiency by centralizing external communication and creating an ultra-fast path for internal communication.
Figure: System architecture, showing SPRIGHT's components and data flows across the master node and worker nodes. User requests pass through the ingress gateway; data enters shared memory via the eBPF-based EPROXY in the SPRIGHT gateway, then reaches the function containers (each with an SPROXY eBPF program) via a routing table. Control flow is coordinated by the SPRIGHT controller, kubelet, and shared-memory manager, while an autoscaler and metrics server handle performance monitoring and scaling. Different arrows denote metric, packet, descriptor, and control flows.
- Overall Architecture (Image 3):
  - Control Plane: A cluster-wide `SPRIGHT controller` works with Kubernetes's `kubelet` to manage the lifecycle of function chains. When a user requests a new chain, the controller sets up the necessary components on a worker node.
  - Data Plane:
    - `SPRIGHT Gateway`: Each function chain gets its own dedicated, lightweight gateway, which acts as the single entry/exit point for the chain. It performs all necessary network protocol processing (e.g., handling an incoming HTTP request) just once, extracts the raw payload, and places it into a shared memory region.
    - Shared Memory: A memory pool is allocated exclusively for the function chain. All functions within that chain can access this memory directly.
    - Functions with `SPROXY`: Each function is instrumented with a lightweight eBPF program called `SPROXY`, which replaces the traditional heavyweight sidecar.
- Optimizing Intra-Chain Communication (Image 4): This is SPRIGHT's core innovation. Instead of sending data over the network between functions, SPRIGHT uses a zero-copy, event-driven mechanism:
  1. Payload in Shared Memory: The `SPRIGHT Gateway` places the incoming request's payload into the shared memory pool.
  2. Packet Descriptor: The gateway then creates a small (16-byte) packet descriptor, essentially a pointer containing the memory address of the payload and the ID of the next function to be invoked.
  3. `SPROXY` and the eBPF Socket Map: The gateway sends this descriptor via a standard socket call. The attached `SPROXY` eBPF program intercepts the call, reads the destination function ID from the descriptor, looks up the target function's socket in a highly efficient in-kernel data structure called an eBPF socket map, and redirects the descriptor directly to the target function's socket.
  4. Zero-Copy & Kernel Bypass: This entire redirection happens within the kernel, bypassing the TCP/IP stack. The target function receives the descriptor, uses the pointer to read the payload directly from shared memory, processes it, and then repeats the process to invoke the next function in the chain.
Figure: Event-driven processing over shared memory and eBPF in SPRIGHT. The SPRIGHT gateway pod and multiple function pods read and write payloads in shared memory, while descriptors are passed and socket-map lookups performed via eBPF socket hooks. Colored arrows distinguish the metric, packet, descriptor, and socket-map-access flows, illustrating the interaction between user space and kernel space.
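The four steps above can be sketched in miniature. This is an illustrative simulation only: the descriptor field layout (offset, length, function ID) and the dict standing in for the eBPF socket map are assumptions for the sketch, not SPRIGHT's actual structures.

```python
# Illustrative sketch (hypothetical field layout): SPRIGHT's 16-byte
# descriptor carries a pointer to the payload plus the next function's
# ID; an in-kernel socket map redirects it. Here a dict plays the
# socket map and a bytearray plays the shared memory pool.
import struct

POOL = bytearray(4096)                      # stands in for the shared pool
DESC = struct.Struct("<QII")                # offset(8) + length(4) + fn_id(4) = 16 bytes
assert DESC.size == 16

def gateway_put(payload: bytes, next_fn: int) -> bytes:
    """Write the payload into the pool once; emit a descriptor, not the data."""
    POOL[0:len(payload)] = payload
    return DESC.pack(0, len(payload), next_fn)

def sproxy_redirect(desc: bytes, sock_map: dict):
    """Socket-map-style lookup: route the 16-byte descriptor to the target fn."""
    _, _, fn_id = DESC.unpack(desc)
    return sock_map[fn_id](desc)

def fn_upper(desc: bytes) -> bytes:
    """A function reads the payload in place via the descriptor."""
    off, length, _ = DESC.unpack(desc)
    return bytes(POOL[off:off + length]).upper()

SOCK_MAP = {2: fn_upper}                    # fn_id -> "socket" (here: a callable)
desc = gateway_put(b"hello chain", next_fn=2)
result = sproxy_redirect(desc, SOCK_MAP)
print(result)                               # only the tiny descriptor was routed
```

Note how the payload itself never traverses the "socket": only the fixed-size descriptor does, which is what makes the per-hop cost independent of payload size.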
- Event-based (`S-SPRIGHT`) vs. Polling-based (`D-SPRIGHT`): The paper compares its eBPF-based event-driven approach (`S-SPRIGHT`) with a more traditional high-performance alternative using DPDK (`D-SPRIGHT`). While DPDK also uses shared memory, it relies on polling: a CPU core constantly spins, checking for new descriptors. The evaluation shows that `S-SPRIGHT` achieves nearly the same performance as `D-SPRIGHT` with 10x less CPU usage under realistic loads, because its eBPF programs only execute when there is work to do.
- Direct Function Routing (DFR): To further reduce latency, once the `SPRIGHT Gateway` invokes the first function in a chain, it gets out of the way. Subsequent invocations happen directly between functions (Function 1 -> Function 2 -> Function 3) using the `SPROXY` mechanism, managed by a routing table stored in shared memory.
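The polling-versus-event trade-off can be made concrete with a toy model: a polling loop inspects every time slot and wastes a check whenever no descriptor has arrived, while an event-driven consumer wakes only on arrivals. The arrival times and slot count below are made up purely for illustration.

```python
# Toy contrast (not DPDK or eBPF): under bursty, intermittent traffic,
# polling burns "checks" on empty slots; event-driven processing does
# work only when a descriptor actually arrives.
def poll_slots(arrivals: set, total_slots: int):
    """Check every slot, as a polling core would: count wasted checks."""
    work, wasted = 0, 0
    for t in range(total_slots):
        if t in arrivals:
            work += 1
        else:
            wasted += 1                      # CPU spent finding nothing
    return work, wasted

def event_slots(arrivals: set):
    """Wake only on arrival events: no wasted checks at idle."""
    return len(arrivals), 0

arrivals = {3, 40, 41, 97}                   # sparse bursts over 100 slots
p_work, p_wasted = poll_slots(arrivals, 100)
e_work, e_wasted = event_slots(arrivals)
print(p_work, p_wasted, e_work, e_wasted)
```

Both consumers complete the same four units of work, but the poller pays for 96 empty checks; this is the idle-CPU gap the paper measures between `D-SPRIGHT` and `S-SPRIGHT`.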
- Security Domains (Image 5): SPRIGHT ensures isolation between different function chains running on the same node using a two-pronged approach:
  - Private Shared Memory Pools: Each chain is allocated its own private memory pool; a function from `Chain A` cannot access the memory of `Chain B`. This is enforced by the `kubelet` and a dedicated `Shared memory manager` for each chain.
  - `SPROXY` Message Filtering: The `SPROXY` eBPF program can be configured with filtering rules to prevent a malicious or buggy function from sending descriptors to unauthorized functions, even within the same chain.
Figure: Composite chart in three panels: (a) requests per second (RPS) and mean latency (ms) of D-SPRIGHT, S-SPRIGHT, and Knative at different concurrency levels; (b) and (c) CPU usage (%) of D-SPRIGHT and S-SPRIGHT, respectively, versus Knative across concurrency levels. SPRIGHT (both variants) clearly outperforms Knative in throughput and latency while using far less CPU.
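The filtering idea can be sketched as a per-chain allowlist of permitted (source, destination) function edges; the rule shape and names below are hypothetical, not SPRIGHT's actual configuration format.

```python
# Hypothetical sketch of SPROXY-style descriptor filtering: each chain
# carries an allowlist of (src_fn, dst_fn) edges, and any descriptor
# whose edge is not listed is dropped before redirection.
ROUTE_ALLOWLIST = {
    "chain-1": {(1, 2), (2, 3)},            # only Fn1 -> Fn2 -> Fn3 permitted
}

def sproxy_allow(chain: str, src_fn: int, dst_fn: int) -> bool:
    """Return True only if this redirect edge is explicitly allowed."""
    return (src_fn, dst_fn) in ROUTE_ALLOWLIST.get(chain, set())

print(sproxy_allow("chain-1", 1, 2))         # a legitimate hop in the chain
print(sproxy_allow("chain-1", 1, 3))         # skipping Fn2: rejected
print(sproxy_allow("chain-2", 1, 2))         # unknown chain: rejected
```

A default-deny lookup like this mirrors how a compromised function is prevented from injecting descriptors to functions it should never reach.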
- eBPF-based Dataplane Acceleration for External Communication: For traffic outside the function chain (e.g., from the main cluster `Ingress Gateway` to the `SPRIGHT Gateway`), SPRIGHT uses eBPF programs attached to XDP/TC hooks, which are very early points in the kernel's packet-processing path. Redirecting packets at this stage bypasses slower parts of the kernel stack, such as `iptables`, further improving performance.
5. Experimental Setup
- Datasets & Workloads: The authors evaluated SPRIGHT using three representative serverless applications to test different performance characteristics.
  Figure: Architecture of SPRIGHT's two security domains (chain #1 and chain #2). Each domain contains a shared memory pool and its own SPRIGHT gateway; functions (Fn 1, Fn 2, Fn 3) exchange data through the pool. The `kubelet`, as the worker node's control point, coordinates each domain's shared memory manager and SPRIGHT gateway via control flows; green arrows denote the data path and blue dashed arrows the control flow.
  - Online Boutique: A complex microservices application with 10 functions, simulating an e-commerce website. This workload features intricate function call graphs and is used to test performance under heavy, complex load.
  - IoT Motion Detection: A simple two-function chain (`sensor` -> `actuator`) driven by a real-world dataset of motion events. This workload is intermittent and bursty, making it ideal for evaluating the cold-start problem.
  - Parking Image Detection & Charging: A multi-stage pipeline that processes images from a parking lot. This workload is periodic and includes a computationally intensive function (image detection), testing how the framework handles mixed workloads.
- Evaluation Metrics:
- Requests Per Second (RPS): Measures throughput, or how many requests the system can handle.
- Response Time: Measures latency, including mean, 95th percentile, and 99th percentile, to understand both typical and worst-case performance.
- CPU Usage: Measures resource efficiency.
- Baselines:
  - `Knative`: The standard, open-source serverless platform, representing the state-of-the-art but heavyweight approach.
  - `gRPC` mode: A 'server-full' but optimized setup where functions run as standard Kubernetes pods without sidecars and communicate directly using gRPC. This isolates the overhead of the serverless platform components.
  - `D-SPRIGHT`: A version of SPRIGHT using DPDK's polling-based shared memory communication, used to highlight the CPU-efficiency gains of the event-driven `S-SPRIGHT`.
6. Results & Analysis
- Core Results (Online Boutique Workload):
  Figure: Service function chains for the three scenarios: (a) Online Boutique, where services such as frontend, checkout, payment, and recommendation form a complex call graph; (b) motion detection, where a sensor function and an actuator function respond in series to motion-sensor signals; (c) parking, where camera images pass through plate detection, plate search, plate index, and charging functions. Together they illustrate event-driven function-chain invocation.
  - Throughput and Latency: `Knative` became overloaded at 5,000 concurrent users, showing unstable throughput and very high tail latency (95th percentile of 693 ms). In contrast, both `S-SPRIGHT` and `D-SPRIGHT` were stable up to 25,000 concurrent users, achieving ~5x higher RPS than Knative; their 95th-percentile latencies were dramatically lower (11-13 ms at 5K concurrency).
  - CPU Efficiency: This is where the event-driven nature of `S-SPRIGHT` shines. At 25K concurrency, `D-SPRIGHT` (polling-based) consumed over 11 CPU cores; `S-SPRIGHT` delivered nearly identical throughput and latency while consuming only ~3.5 CPU cores, a 3x reduction. `S-SPRIGHT`'s idle CPU usage was near zero, while `D-SPRIGHT` consumed CPU constantly.
- Bypassing Cold Start (IoT and Parking Workloads):
  Figure: Line chart of request rate (req/sec) over time (seconds) for four setups (D-SPRIGHT, S-SPRIGHT, gRPC, Knative). D-SPRIGHT and S-SPRIGHT perform similarly and clearly outperform gRPC and Knative, sustaining roughly 5,500 req/sec; an inset magnifies the 100-150 s interval to highlight their higher throughput.
  - IoT Workload: With its intermittent traffic, this experiment clearly demonstrated the cold-start problem in `Knative`. When a burst of events arrived after an idle period, Knative's response time spiked to as high as 9 seconds. `S-SPRIGHT`, which keeps one function instance 'warm', showed consistently low and stable response times throughout the experiment.
  - CPU Cost of 'Warm' vs. 'Cold': The key finding is that keeping an `S-SPRIGHT` function 'warm' is virtually free in CPU terms while it is idle. In contrast, the CPU cost of `Knative`'s components (such as the `queue-proxy`) even under light load, combined with CPU spikes during cold starts or 'pre-warming', made it far less efficient overall.
  Figure: Composite chart of nine subplots showing, for three models (Knative, gRPC, SPRIGHT), the response-time CDFs, per-interval response times, and CPU usage of six chains (Ch-1 to Ch-6). SPRIGHT outperforms Knative in both response time and CPU usage and is more stable; gRPC falls in between.
  - Parking Workload: Even when `Knative` used a 'pre-warming' strategy to avoid cold starts, `S-SPRIGHT` was more efficient. The CPU spikes required to create and destroy `Knative` function pods were significant; `S-SPRIGHT` achieved 16% lower latency while saving 41% of total CPU cycles over the experiment's duration, simply by keeping its efficient functions alive.
7. Conclusion & Reflections
- Conclusion Summary: The paper convincingly argues that the "server" — heavyweight, always-on components and inefficient networking protocols — is still deeply embedded in current "serverless" platforms. SPRIGHT successfully extracts this server by redesigning the dataplane around eBPF-based event-driven processing and shared memory. This results in a framework that is not only an order of magnitude faster but also vastly more resource-efficient. By making idle functions nearly cost-free, SPRIGHT effectively solves the cold-start problem, a long-standing pain point in serverless computing, bringing the field closer to its original promise.
- Limitations & Future Work: The authors acknowledge several practical limitations:
- Co-location Constraint: All functions within a single chain must be deployed on the same physical node to leverage shared memory. This impacts scheduling flexibility.
- Application Porting: Existing applications based on synchronous HTTP/REST APIs need to be modified to use SPRIGHT's asynchronous, shared-memory I/O.
- Language Support: The current implementation is in C, and support for other popular languages (like Go, Python, Java) is needed for broader adoption.
- Personal Insights & Critique:
- Impact: This paper presents a paradigm shift for high-performance serverless computing. The core idea of replacing heavyweight userspace sidecars with lightweight, in-kernel eBPF programs is powerful and highly relevant, not just for serverless but also for the broader service mesh ecosystem (e.g., Istio, Linkerd), which suffers from similar sidecar overhead.
- Novelty: The creative use of eBPF's socket map for event-driven redirection of descriptors (pointers) instead of full payloads is a key technical insight. It marries the efficiency of eBPF with the performance of shared memory.
- Practicality: While the co-location constraint is a real trade-off, it is a reasonable one for applications where performance is critical. For many microservice applications, tightly coupled services are often scheduled together anyway to reduce network latency. The need for application porting is a hurdle, but the dramatic performance gains could justify the effort for high-throughput systems. Libraries for other languages will be crucial for SPRIGHT's broader adoption.
- Overall: SPRIGHT is a well-engineered and rigorously evaluated system that provides a compelling blueprint for the future of efficient, high-performance cloud-native computing. It demonstrates that by leveraging modern kernel capabilities like eBPF, it is possible to build systems that are both extremely fast and incredibly resource-efficient.