
A Learned Cache Eviction Framework with Minimal Overhead

Published: 01/27/2023

TL;DR Summary

The MAT framework reduces the number of ML predictions per eviction from 63 to 2 by using a heuristic as a filter, while maintaining miss ratios comparable to state-of-the-art ML caching systems, making learned eviction practical for high-throughput environments.

Abstract

Recent work shows the effectiveness of Machine Learning (ML) to reduce cache miss ratios by making better eviction decisions than heuristics. However, state-of-the-art ML caches require many predictions to make an eviction decision, making them impractical for high-throughput caching systems. This paper introduces Machine learning At the Tail (MAT), a framework to build efficient ML-based caching systems by integrating an ML module with a traditional cache system based on a heuristic algorithm. MAT treats the heuristic algorithm as a filter to receive high-quality samples to train an ML model and likely candidate objects for evictions. We evaluate MAT on 8 production workloads, spanning storage, in-memory caching, and CDNs. The simulation experiments show MAT reduces the number of costly ML predictions-per-eviction from 63 to 2, while achieving comparable miss ratios to the state-of-the-art ML cache system. We compare a MAT prototype system with an LRU-based caching system in the same setting and show that they achieve similar request rates.


In-depth Reading


1. Bibliographic Information

1.1. Title

Machine Learning Over Heuristic: a Learned Cache Eviction Framework with Minimal Overhead (also referred to as MAT: Machine learning At the Tail)

1.2. Authors

  • Dongsheng Yang: Princeton University.
  • Daniel S. Berger: Microsoft Research.
  • Kai Li: Princeton University.
  • Wyatt Lloyd: Princeton University.

1.3. Journal/Conference

This paper was published as a preprint on arXiv (arXiv:2301.11886). The authors represent top-tier institutions (Princeton and Microsoft Research) known for significant contributions to systems and networking research.

1.4. Publication Year

2023 (Published UTC: 2023-01-27)

1.5. Abstract

The paper addresses the high computational overhead of state-of-the-art Machine Learning (ML) based caching systems. While ML can significantly reduce cache miss ratios compared to traditional heuristics (simple rules-of-thumb), existing ML caches like LRB require too many predictions per eviction (around 64), making them too slow for high-throughput systems. The authors introduce Machine learning At the Tail (MAT), a framework that uses a traditional heuristic (like LRU) as a "filter." ML is only applied to the small subset of objects that the heuristic identifies as candidates for eviction. Evaluation on 8 production workloads shows MAT reduces predictions-per-eviction from 63 to 2 while maintaining the same low miss ratios as complex ML models.

2. Executive Summary

2.1. Background & Motivation

In computing, a cache is a high-speed data storage layer which stores a subset of data, so that future requests for that data are served faster. Because cache space is limited, when the cache is full and new data arrives, the system must decide which old data to remove. This is called cache eviction.

  • The Core Problem: Traditional eviction methods (heuristics like LRU - Least Recently Used) are incredibly fast but often make sub-optimal decisions, leading to "cache misses" (when the requested data isn't in the cache). Recent ML-based approaches significantly reduce these misses but are computationally "heavy." They often require dozens of ML model predictions for every single eviction, which creates a bottleneck in high-speed systems.
  • Research Gap: Previous work focused either on being fast (heuristics) or being accurate (ML). There was no framework that successfully combined the speed of heuristics with the intelligence of ML without a massive CPU penalty.
  • Innovative Entry Point: The authors observed that heuristics aren't "bad"—they are actually "mostly right." They correctly identify the general group of objects that should be evicted but struggle to pick the exact best one. MAT's insight is to let the heuristic do the "heavy lifting" of filtering and use ML only to "double-check" the final few candidates at the "tail" of the queue.

2.2. Main Contributions / Findings

  1. The MAT Framework: A novel architecture that integrates an ML module into existing heuristic-based caching systems.

  2. Heuristic Filtering: The discovery that using a heuristic as a filter can reduce the number of objects the ML needs to analyze by orders of magnitude.

  3. Dynamic Thresholding: A method to automatically adjust the ML's strictness based on the current workload.

  4. Significant Overhead Reduction: MAT reduces predictions per eviction by 31x and total computational time by 17x compared to the state-of-the-art LRB system.

  5. Graceful Degradation: If the CPU is too busy to run ML, MAT simply falls back to the heuristic, ensuring the system never crashes or hangs due to ML overhead.


3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

  • Cache Miss Ratio: The percentage of requests that cannot be satisfied by the cache. A lower miss ratio is better because it means fewer slow trips to the main database or the internet.
  • Heuristic Algorithms: Simple, fixed rules for management.
    • Least Recently Used (LRU): Discards the items that haven't been accessed for the longest time (a minimal code sketch follows this list).
    • First-In-First-Out (FIFO): Discards the oldest items regardless of how often they are accessed.
    • Least Frequently Used (LFU): Discards items with the lowest access count.
  • Belady’s MIN (The Oracle): An "impossible" algorithm that knows the future. It always evicts the object that will be needed furthest in the future. It provides the theoretical "perfect" performance limit for any cache.
  • Gradient Boosted Decision Trees (GBDT): A type of machine learning model that uses a series of simple decision trees to make a prediction. It is popular in systems research because it is faster than "Deep Learning" (neural networks) while still being very accurate for tabular data.
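
To make the LRU heuristic concrete, here is a minimal Python sketch (an illustrative example, not code from the paper or from Cachelib) of a recency-ordered cache. Evicting the least recently used entry corresponds to evicting from the "tail" of the recency ranking that later sections refer to:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: keys are kept in recency order, with the least
    recently used entry first. Evicting it corresponds to evicting from the
    'tail' of the recency ranking described above."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()   # ordered from least to most recently used

    def get(self, key):
        if key not in self.store:
            return None              # cache miss
        self.store.move_to_end(key)  # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used entry
```

FIFO and LFU differ only in how this ranking is maintained: insertion order for FIFO, access counts for LFU.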

3.2. Previous Works

  • LRB (Learning Relaxed Belady): The previous state-of-the-art. It uses ML to estimate the Time-To-next-Access (TTA) for 64 randomly sampled objects and evicts the one with the highest TTA (a simplified sketch of this sampling loop follows this list).
  • Cachelib: An open-source caching engine from Meta (Facebook) used for high-performance production environments. MAT is built and tested as an extension of this engine.
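
For contrast with MAT's approach described later, the following is a simplified sketch of LRB-style eviction as summarized above; `model.predict` and `features` are placeholder interfaces, not LRB's actual API. The point is that every eviction pays for roughly 64 predictions:

```python
import random

def lrb_style_evict(cached_objects, model, features, sample_size=64):
    """Simplified LRB-style eviction: sample objects at random from the whole
    cache, predict each one's time-to-next-access (TTA), and evict the object
    predicted to be reused furthest in the future."""
    candidates = random.sample(list(cached_objects),
                               min(sample_size, len(cached_objects)))
    ttas = [model.predict(features(obj)) for obj in candidates]  # ~64 predictions
    return candidates[ttas.index(max(ttas))]                     # highest predicted TTA
```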

3.3. Technological Evolution

  1. Phase 1 (Simple Heuristics): LRU and FIFO dominated due to their $O(1)$ complexity (they take the same amount of time regardless of cache size).
  2. Phase 2 (Advanced Heuristics): Algorithms like TinyLFU or ARC tried to balance recency and frequency but remained "static" rules.
  3. Phase 3 (Heavy ML): Researchers applied ML to predict the future, achieving massive miss ratio reductions but sacrificing throughput (speed).
  4. Phase 4 (MAT): This paper represents the current frontier—making ML-based caching practical for real-world, high-speed production environments.

3.4. Differentiation Analysis

Unlike LRB, which samples objects randomly from the entire cache, MAT samples objects specifically from the "tail" of the heuristic’s priority queue. This "sampling at the tail" ensures the ML is only looking at objects that are already "likely" to be good eviction candidates, which is why it needs so few predictions.


4. Methodology

4.1. Principles

The core intuition of MAT is that a heuristic algorithm like LRU acts as an excellent pre-filter. As shown in Figure 1, heuristics maintain a ranking. The "tail" of this ranking contains the items the heuristic is about to throw away.

The following figure (Figure 1 from the original paper) illustrates how these heuristics typically structure their data:

Figure 1: Heuristic cache algorithms maintain the rank of objects in a priority queue. The schematic shows request lookup, insertion, and eviction, and how the object index drives operations at the head and tail of the queue.

The authors performed a study (Table 1) comparing heuristics to the optimal Belady's MIN algorithm. They found that heuristics often evict the "right" objects (those with a high Time-To-next-Access), but they also accidentally evict "wrong" objects (those that will be needed soon).

The following are the results from Table 1 of the original paper, showing the distribution of "good" vs "bad" decisions:

| Workload | LRU (TTA < T) | LRU (TTA > T) | FIFO (TTA < T) | FIFO (TTA > T) | LFUDA (TTA < T) | LFUDA (TTA > T) | LRUK (TTA < T) | LRUK (TTA > T) | Belady's MIN (TTA < T) | Belady's MIN (TTA > T) |
|---|---|---|---|---|---|---|---|---|---|---|
| CDN1 | 55% | 90% | 65% | 90% | 47% | 87% | 363% | 97% | 0% | 100% |
| CDN2 | 47% | 95% | 57% | 95% | 36% | 96% | 487% | 92% | 0% | 100% |
| Wikipedia | 129% | 94% | 196% | 87% | 86% | 95% | 89% | 93% | 0% | 100% |

Key takeaway from the table: For roughly 90% or more of their evictions, heuristics like LRU pick objects that the optimal Belady's MIN would also consider evictable (the $TTA > T$ column). This supports looking for eviction candidates at the tail.

4.2. Core Methodology In-depth (Layer by Layer)

4.2.1. The MAT Architecture

MAT splits the work into two modules:

  1. Heuristic Cache: A standard cache (like LRU) that manages the general flow of data.

  2. ML Module: A background process that handles training and inference (making predictions).

    The interaction happens through two primary functions (sketched in code after Figure 4):

  • RemoveFromTail(): The ML module asks the heuristic for its "worst" item.

  • Insert(obj, rank): If the ML module decides the item is actually "good," it tells the heuristic to put it back into the queue at a specific rank.

    The following figure (Figure 4 from the original paper) shows this integrated data flow:

    Figure 4: Interaction between the ML module and the traditional heuristic cache system. Requests are handled by the heuristic cache; candidate objects are sent to an eviction thread for eviction decisions, while training data flows to a training thread that refines the model.
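
The two calls above can be made concrete with a minimal sketch of the heuristic side, assuming a sorted-list priority queue purely for illustration (the paper names the RemoveFromTail and Insert(obj, rank) operations, but this particular data structure and code are not from the paper):

```python
import bisect

class HeuristicQueue:
    """Illustrative priority queue for the heuristic side of MAT. Lower ranks
    sit near the head (kept longest); the tail holds the heuristic's current
    eviction candidate, which is what the ML module inspects."""

    def __init__(self):
        self._ranks = []   # kept sorted ascending; the last element is the tail
        self._objs = []    # objects aligned with self._ranks

    def RemoveFromTail(self):
        # Hand the worst-ranked object to the ML module for a final check.
        self._ranks.pop()
        return self._objs.pop()

    def Insert(self, obj, rank):
        # Re-insert an object the ML module decided to keep, at an ML-chosen
        # rank (the text above uses the predicted TTA as the rank).
        i = bisect.bisect(self._ranks, rank)
        self._ranks.insert(i, rank)
        self._objs.insert(i, obj)
```

A real implementation would piggyback on the heuristic's existing list or queue structure so that both calls stay cheap, rather than using a Python list.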

4.2.2. The Eviction Algorithm and Dynamic Threshold

The most critical part of MAT is the Eviction Algorithm. Unlike traditional ML caches that look at a fixed number of items, MAT looks for one item that is "bad enough" to be evicted. It uses a Time-To-next-Access (TTA) threshold $T$: if an object's predicted $TTA \ge T$, it is evicted immediately (a Python sketch of the full loop follows the numbered steps below).

The process follows Algorithm 1 from the paper:

  1. Initialization: Set $r = 1$ (the count of candidates checked).
  2. Loop: While $r \le L$ (where $L$ is a safety limit, e.g., 10):
    • Get an object obj from the heuristic's tail: obj = Heuristic.RemoveFromTail().
    • Predict its future access time: TTA = MLModel.Predict(obj).
    • Decision: If $TTA \ge T$, stop and evict this object.
    • Re-insertion: If $TTA < T$ (meaning the ML thinks it's still useful), the counter $r$ increments, and the object is put back: Heuristic.Insert(obj, TTA).
  3. Adjusting the Filter ($T$): To ensure the ML doesn't work too hard or too little, MAT adjusts $T$ dynamically:
    • If it took too many steps to find a candidate (high $r$), it means $T$ is too strict. Decrease it: $T = T \times (1 - \delta)$.
    • If it was too easy (low $r$), it means $T$ is too lenient. Increase it: $T = T \times (1 + \delta)$.
    • Here, $\delta$ is a small adjustment factor (default $1 \times 10^{-4}$).
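
Putting the steps together, the following Python sketch illustrates the eviction loop and threshold adjustment. The HeuristicQueue-style interface and model.Predict(obj) are placeholders, and both the fallback when the safety limit $L$ is exhausted and the exact cutoff for deciding that finding a candidate was "too easy" are assumptions made for illustration rather than details stated in the paper:

```python
def mat_evict(heuristic, model, T, L=10, delta=1e-4):
    """Sketch of MAT's eviction loop: pull candidates from the heuristic's
    tail, evict the first one whose predicted TTA clears the threshold T,
    and nudge T so future evictions need only a couple of predictions."""
    victim = None
    r = 0                                    # number of candidates checked
    while r < L:
        r += 1
        obj = heuristic.RemoveFromTail()     # heuristic's worst-ranked object
        tta = model.Predict(obj)             # one ML prediction
        if tta >= T:
            victim = obj                     # "bad enough": evict immediately
            break
        heuristic.Insert(obj, tta)           # still useful: re-insert by predicted TTA
    if victim is None:
        # Safety limit reached (an assumption here): fall back to evicting the
        # heuristic's own choice, so ML overhead stays bounded.
        victim = heuristic.RemoveFromTail()

    # Dynamic threshold adjustment: many checks mean T was too strict, so loosen
    # it; an immediate hit (r == 1, assumed cutoff) means T may be too lenient.
    if r > 1:
        T *= (1 - delta)
    else:
        T *= (1 + delta)
    return victim, T
```

In steady state, $T$ settles at a value where most evictions finish after one or two predictions, which is consistent with the roughly 2 predictions per eviction reported in the results.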

4.2.3. ML Features and Training

To predict TTA, MAT uses three types of features:

  1. Deltas: The time intervals between the last 32 accesses of an object ($\Delta_i$).

  2. Exponential Decayed Counters (EDC): A score that tracks how recently and frequently an object was accessed, where the score decays over time (see the sketch after this list).

  3. Static Features: Fixed data like object size or type.

    Training Efficiency: Instead of training on every request, MAT only trains on objects that were "eviction candidates" at the tail. This reduces training data volume by 50% to 90%.
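
As an illustration of the second feature type, here is one common way to maintain an exponentially decayed counter. The exact decay formulation and the half_life constant below are assumptions for illustration; the paper's constants are not given in this summary:

```python
def update_edc(edc, time_since_last_access, half_life=1000.0):
    """Exponentially decayed counter: decay the old score according to how
    long the object went untouched, then add 1 for the new access. A cache
    would typically keep several EDCs per object with different half-lives
    so the model sees both short-term and long-term popularity."""
    decay = 2.0 ** (-time_since_last_access / half_life)
    return edc * decay + 1.0
```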


5. Experimental Setup

5.1. Datasets

The authors evaluated MAT using 8 diverse production workloads.

The following is Table 2 from the paper detailing the traces:

| Trace | Type | Object Size (Mean) | Object Size (Max) | Requests (Total) | Requests (Unique) | Requested Bytes (Total) | Requested Bytes (Unique) | Default Cache Size |
|---|---|---|---|---|---|---|---|---|
| CDN1 | CDN | 2 MB | 2 MB | 300 M | 31 M | 585 TB | 60 TB | 4 TB |
| Wikipedia | CDN | 116 KB | 1.3 GB | 200 M | 15 M | 7.9 TB | 1.7 TB | 256 GB |
| Memcachier | In-memory | 4.6 KB | 1 MB | 500 M | 9 M | 1 TB | 40 GB | 1 GB |
| Microsoft | Storage | 445 KB | 6 MB | 200 M | 48 M | 5.1 TB | 2 TB | 512 GB |

(Note: Table abbreviated for clarity; original contains 8 rows including CDN2, CDN3, InMem, and IBM merged.)

5.2. Evaluation Metrics

  1. Byte Miss Ratio: The ratio of requested bytes not found in the cache (computed in the sketch after this list): $ \mathrm{Byte\ Miss\ Ratio} = \frac{\sum \mathrm{Size}(\mathrm{Requested\ Objects\ not\ in\ Cache})}{\sum \mathrm{Size}(\mathrm{All\ Requested\ Objects})} $
  2. Predictions per Eviction: The average number of times the ML model is queried to decide on one eviction.
  3. Software Overhead (Time): The time (in microseconds, $\mu s$) spent building features, running predictions, and training.
  4. Request Processing Rate: The number of requests handled per second (throughput).
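
As a small illustration of the first metric, the sketch below computes the byte miss ratio from a simulated trace; the (size, was_hit) record format is an assumed interface, not one from the paper:

```python
def byte_miss_ratio(trace):
    """trace is a list of (size_in_bytes, was_hit) pairs recorded by a cache
    simulator. The byte miss ratio is the fraction of requested bytes that
    were not served from the cache."""
    missed = sum(size for size, was_hit in trace if not was_hit)
    total = sum(size for size, _ in trace)
    return missed / total if total else 0.0
```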

5.3. Baselines

  • LRU/FIFO/TinyLFU: Standard heuristic algorithms.

  • LRB (Learning Relaxed Belady): The state-of-the-art ML-based cache that MAT aims to optimize.

  • Belady’s MIN: The theoretical optimal (offline) limit.


6. Results & Analysis

6.1. Core Results Analysis

MAT achieves its primary goal: matching ML accuracy with heuristic speed. As shown in Figure 6, MAT's miss ratio (with only 2 predictions) is virtually identical to LRB (with 64 predictions).

The following figure (Figure 6 from the paper) shows this comparison:

Figure 6: Line charts comparing the byte miss ratios of different caching systems (MAT, LRU, etc.) across cache sizes, showing that MAT achieves low miss ratios efficiently across the workloads.

Key observation: In all traces, the "MAT(2)" line (green) overlaps or stays very close to the "LRB(64)" line (red), while significantly outperforming the "LRU" baseline (blue).

6.2. Data Presentation (Overhead)

The following are the results from Table 3 of the original paper, showing the massive reduction in computational cost:

| Metric | LRB (previous SOTA) | MAT (this paper) | Reduction |
|---|---|---|---|
| Number of predictions | 63.2 | 2.0 | 32x |
| Prediction time (µs) | 240 | 6.4 | 38x |
| Feature building time (µs) | 60 | 2.9 | 21x |
| Total time per eviction (µs) | 300 | 9.3 | 32x |

6.3. Prototype Throughput

When implemented in a real system (Cachelib), MAT maintained throughput levels very close to the simple LRU algorithm.

The following are results from Table 5 of the paper:

| Trace | Cache Size | MAT (req/s) | LRU (req/s) | Difference |
|---|---|---|---|---|
| Wikipedia | 512 GB | 30,258 | 31,181 | -3.0% |
| Memcachier | Infinite | 52,883 | 51,380 | +2.9% |

6.4. Tolerance to Slow ML

The authors tested what happens if the ML model is artificially slowed down (e.g., due to high system load). Figure 8 shows that MAT's performance "degrades gracefully"—even with a 1ms delay (huge for a cache), MAT still performs as well as or better than LRU.


7. Conclusion & Reflections

7.1. Conclusion Summary

MAT proves that we do not need to choose between efficiency and accuracy in caching. By using a heuristic as a filter and applying ML "at the tail," MAT achieves state-of-the-art miss ratios at a tiny fraction (roughly 3%) of the computational cost of previous ML methods. It is practical for high-throughput production systems, as demonstrated by a prototype built on Meta's (Facebook's) Cachelib.

7.2. Limitations & Future Work

  • Metadata Storage: While the CPU overhead is low, MAT still needs to store "metadata" (history) for every object in the cache to make predictions. This consumes RAM.
  • Heuristic Dependency: MAT relies on the heuristic being "mostly right." If a workload is so chaotic that a heuristic is 100% wrong, the filter might fail.
  • Future Directions: The authors suggest that because MAT is so efficient, researchers can now afford to use even more "sophisticated" ML models (like neural networks) that were previously too slow to even consider.

7.3. Personal Insights & Critique

  • Inspiration: The "Filter" concept is a powerful systems design pattern. It acknowledges that simple algorithms are great at narrowing down the search space, while complex algorithms are best at final selection. This pattern could be applied to load balancing or network routing.
  • Critique: One area for improvement is the discussion on write amplification. If MAT re-inserts objects into the cache frequently, how does that affect SSD lifespan? While the paper mentions re-insertion into the priority queue, it doesn't deeply analyze the physical storage impact of "saving" an object from eviction.
  • Assumptions: The paper assumes that "past is prologue"—that access patterns are consistent enough for GBDT to learn. In highly adversarial or perfectly random workloads, the ML module might provide no benefit over the heuristic.
