AnyKey: A Key-Value SSD for All Workload Types
TL;DR Summary
AnyKey, a novel KV-SSD design, addresses performance issues in existing systems under workloads with larger keys by optimizing metadata size. It outperforms state-of-the-art KV-SSD designs across various workload types.
Abstract
Key-value solid-state drives (KV-SSDs) are considered as a potential storage solution for large-scale key-value (KV) store applications. Unfortunately, the existing KV-SSD designs are tuned for a specific type of workload, namely, those in which the size of the values are much larger than the size of the keys. Interestingly, there also exists another type of workload, in practice, in which the sizes of keys are relatively large. We re-evaluate the current KV-SSD designs using such unexplored workloads and document their significantly-degraded performance. Observing that the performance problem stems from the increased size of the metadata, we subsequently propose a novel KV-SSD design, called AnyKey, which prevents the size of the metadata from increasing under varying sizes of keys. Our detailed evaluation using a wide range of real-life workloads indicates that AnyKey outperforms the state-of-the-art KV-SSD design under different types of workloads with varying sizes of keys and values.
In-depth Reading
1. Bibliographic Information
1.1. Title
AnyKey: A Key-Value SSD for All Workload Types
1.2. Authors
- Chanyoung Park (Hanyang University, Ansan, Republic of Korea)
- Jungho Lee (Hanyang University, Ansan, Republic of Korea)
- Chun-Yi Liu (Micron Technology Inc., Folsom, CA, USA)
- Kyungtae Kang (Hanyang University, Ansan, Republic of Korea)
- Mahmut Taylan Kandemir (Pennsylvania State University, University Park, PA, USA)
- Wonil Choi (Hanyang University, Ansan, Republic of Korea) - Corresponding author
1.3. Journal/Conference
The paper does not explicitly state the specific journal or conference in the provided text. However, the presence of CCS Concepts, Keywords, and ACM Reference Format suggests publication in a computing or systems conference/journal, likely associated with ACM. Given the technical depth and focus on storage systems, venues like USENIX FAST, USENIX ATC, ACM ASPLOS, or ACM SIGMOD are common for such research. These venues are highly reputable and influential in the fields of operating systems, computer architecture, and storage.
1.4. Publication Year
2025 (published 2025-02-06)
1.5. Abstract
Key-value solid-state drives (KV-SSDs) are a promising storage solution for large-scale key-value (KV) store applications. However, existing KV-SSD designs are primarily optimized for workloads where values are much larger than keys (referred to as high-v/k workloads). The authors discover a significant performance degradation in current KV-SSD designs when subjected to low-v/k workloads, where key sizes are relatively large. This performance issue is attributed to the increased size of metadata, which cannot be fully accommodated in the device-internal DRAM.
To address this, the paper proposes AnyKey, a novel KV-SSD design that prevents metadata size from increasing with varying key sizes. AnyKey achieves this by grouping KV pairs and storing only one key per group in the metadata, while also introducing hash lists and a value log for efficient management and reduced flash accesses. Through extensive evaluation using diverse real-life KV workloads, AnyKey demonstrates superior performance over state-of-the-art KV-SSD designs, particularly in low-v/k scenarios, and its enhanced version (AnyKey+) further improves performance across all workload types.
1.6. Original Source Link
/files/papers/6916de05110b75dcc59adfbc/paper.pdf
This is an internal file path, suggesting it's likely a PDF hosted on a server. Its publication status appears to be "published" given the explicit publication date.
2. Executive Summary
2.1. Background & Motivation
The explosive growth of key-value (KV) store-based applications and services has driven the development of Key-Value Solid-State Drives (KV-SSDs). KV-SSDs aim to integrate the KV store directly into the SSD device, offering several advantages over traditional host-based KV stores using block SSDs:
- Low-latency access: Streamlines the software stack for direct KV requests.
- Improved scalability: Simply adding more KV-SSDs enhances system scalability by reducing CPU and memory requirements on the host.
- Enhanced bandwidth and device lifetime: SSD-aware KV store management (e.g., reduced write amplification).

Despite these potential benefits, the adoption of KV-SSDs has been slow, partly because their performance often lags behind host-based KV stores. The key challenge lies in effectively utilizing the SSD's limited internal resources to meet performance demands.
The core problem identified by the paper is that existing LSM-tree-based KV-SSD designs are predominantly tuned for specific workload types, namely high-v/k workloads where the size of values is significantly larger than keys. The authors observed that in practice, there are also low-v/k workloads with relatively large key sizes. When existing KV-SSDs (like the state-of-the-art PinK) are re-evaluated under these unexplored low-v/k workloads, their performance degrades significantly. This degradation is attributed to an explosion in metadata size, which can no longer fit into the SSD's internal DRAM. Consequently, requests incur additional flash accesses to retrieve metadata, drastically increasing tail latencies and decreasing IOPS.
The paper's innovative idea, or entry point, is to address this metadata scalability issue for low-v/k workloads. By redesigning how KV pairs are indexed and managed within the KV-SSD, AnyKey aims to minimize metadata size and ensure it always resides in DRAM, thereby making KV-SSDs truly versatile for all workload types.
2.2. Main Contributions / Findings
The paper makes the following primary contributions:
- Unveiling the low-v/k Workload Problem: The authors identify and characterize a range of real-world KV workloads with low value-to-key ratios (low-v/k) from various domains. They demonstrate that existing KV-SSD designs exhibit significantly degraded performance under these previously underexplored workloads. This finding highlights a crucial limitation in current KV-SSD research and design.
- Proposing AnyKey for low-v/k Workloads: They propose and evaluate a novel KV-SSD design, AnyKey, specifically targeting low-v/k workloads. The core innovation of AnyKey is to streamline the metadata required for locating KV pairs, making it compact enough to fit entirely within the SSD's internal DRAM. This design significantly improves tail latencies, IOPS, and device lifetime compared to state-of-the-art designs.
- Presenting AnyKey+ for All Workloads: The paper also introduces an enhanced version, AnyKey+, which further improves IOPS for high-v/k workloads by mitigating high overheads associated with LSM-tree management (specifically, "compaction chains"). This enhancement ensures that AnyKey+ outperforms the state-of-the-art across all types of KV workloads, making it a truly universal KV-SSD solution.

The key conclusion is that AnyKey (and its enhanced version AnyKey+) successfully resolves the performance bottleneck of KV-SSDs under low-v/k workloads, which primarily stems from unmanageably large metadata. By intelligently grouping KV pairs and separating values, AnyKey ensures metadata remains DRAM-resident, leading to substantial performance gains and increased storage utilization and device lifetime.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To fully understand the AnyKey paper, a foundational understanding of key-value stores, SSDs, and Log-Structured Merge (LSM) trees is essential.
3.1.1. Key-Value (KV) Store
A key-value (KV) store is a simple database model designed to store, retrieve, and manage associative arrays (also known as dictionaries or hash maps). In this model, data is stored as a collection of key-value pairs, where each key is a unique identifier, and the value is the data associated with that key. The key is used to efficiently retrieve its corresponding value.
- Example: Imagine a contacts list. The key could be a person's name (e.g., "Alice Smith"), and the value could be their phone number, email, and address (e.g., "{'phone': '555-1234', 'email': 'alice@example.com'}").
- Operations: Basic operations include PUT (insert/update a key-value pair), GET (retrieve a value given a key), and DELETE (remove a key-value pair).
- Applications: Widely used in web services, mobile apps, databases, and caches due to their simplicity, scalability, and high performance for specific access patterns.
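To make the PUT/GET/DELETE interface concrete, here is a minimal, dict-backed key-value store sketch in Python. It is purely illustrative (a toy for intuition), not the design evaluated in the paper.

```python
# Minimal in-memory KV store sketch (illustrative only, not the paper's design).
class TinyKVStore:
    def __init__(self):
        self._table = {}                  # key -> value

    def put(self, key, value):            # insert or update a pair
        self._table[key] = value

    def get(self, key, default=None):     # retrieve the value for a key
        return self._table.get(key, default)

    def delete(self, key):                # remove a pair if it exists
        self._table.pop(key, None)

store = TinyKVStore()
store.put("Alice Smith", {"phone": "555-1234", "email": "alice@example.com"})
print(store.get("Alice Smith"))
store.delete("Alice Smith")
```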
3.1.2. Solid-State Drive (SSD)
An SSD is a type of non-volatile storage device that stores persistent data on flash memory cells. Unlike traditional hard disk drives (HDDs) that use spinning platters and read/write heads, SSDs have no moving parts, leading to significantly faster data access, lower latency, and higher durability.
- Flash Memory: The core component of an SSD. Data is stored in NAND flash memory cells.
- Pages: The smallest unit of data that can be written to or read from flash memory. Typically 4KB, 8KB, or 16KB in size.
- Blocks: A collection of multiple pages (e.g., 64, 128, or 256 pages). This is the smallest unit of data that can be erased.
- Erase-before-write: A fundamental characteristic of flash memory. Before new data can be written to a page, the entire block containing that page must first be erased. This makes in-place updates (modifying data directly in its location) inefficient and costly for SSDs.
- Flash Translation Layer (FTL): A software layer within the SSD controller that manages the mapping between logical block addresses (LBAs) from the host and physical page addresses (PPAs) on the flash memory. It handles wear leveling, garbage collection, and bad block management to extend the SSD's lifespan and present a block device interface to the host.
3.1.3. Key-Value SSD (KV-SSD)
A KV-SSD is an SSD that exposes a key-value interface directly to the host system, rather than the traditional block-level interface. Instead of the host managing key-value pairs and translating them into block I/Os, the KV-SSD itself understands and manages key-value requests.
- Advantages:
  - Reduced Host Overhead: Offloads KV store management tasks (like LSM-tree compaction and garbage collection) from the host CPU and DRAM to the SSD controller.
  - Optimized Storage: The SSD controller has intimate knowledge of the flash memory characteristics. By directly managing KV pairs, it can perform SSD-aware optimizations, such as reducing write amplification (WA) and improving data locality.
  - Streamlined I/O Path: Eliminates layers of software stack between the application and the physical storage, leading to lower latency.
- Internal Architecture: KV-SSDs still contain flash memory and a controller. The difference is in the firmware on the controller, which now includes logic to manage KV pairs directly using data structures like hash tables or LSM-trees and their associated metadata.
3.1.4. Log-Structured Merge (LSM) Tree
An LSM-tree is a data structure optimized for write-heavy workloads, commonly used in key-value stores like RocksDB and LevelDB. It is particularly flash-friendly due to its out-of-place update mechanism.
- Structure: An LSM-tree organizes data into multiple sorted components (levels).
  - Memtable (C0): An in-memory component (usually in DRAM) where all new writes are buffered. It is typically a sorted data structure (e.g., a skip list or B-tree).
  - SSTables (L1, L2, ...): When the Memtable becomes full, its contents are flushed to disk (or flash) as an immutable Sorted String Table (SSTable). This becomes the first level (L1). Subsequent SSTables are merged from higher levels to lower levels (L1 merges into L2, L2 into L3, etc.).
- Out-of-Place Updates: Instead of modifying data directly, new versions of data are always written to the Memtable. This means old versions of data on disk are never overwritten; they are eventually cleaned up during compaction. This is highly beneficial for SSDs because it avoids expensive in-place updates that would trigger block erases.
- Compaction: The process of merging SSTables from a higher level with SSTables from the next lower level. During compaction, KV pairs are read from both levels, sorted, and then written out as new, larger SSTables for the lower level. Invalid (overwritten or deleted) KV pairs are discarded during this process, reclaiming space. This is a critical background operation that consumes I/O bandwidth and impacts performance.
- Tail Latencies: An LSM-tree can guarantee bounded tail latencies because its multi-level indexing structure allows for predictable access paths, although compaction operations can introduce latency spikes if not managed carefully.
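The flush-and-merge behavior described above can be sketched in a few lines of Python. This is a generic LSM-tree illustration under simplified assumptions (a tiny memtable capacity, runs as Python lists), not the firmware implementation used in any KV-SSD.

```python
# Generic LSM-tree sketch (illustrative): a sorted in-memory memtable that is
# flushed to immutable sorted runs, plus a merge-based compaction of two runs.
import bisect

class Memtable:
    def __init__(self, capacity=4):
        self.keys, self.vals, self.capacity = [], [], capacity

    def put(self, key, value):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.vals[i] = value                     # in-memory update
        else:
            self.keys.insert(i, key)
            self.vals.insert(i, value)
        return len(self.keys) >= self.capacity       # signal: time to flush

    def flush(self):
        run = list(zip(self.keys, self.vals))        # immutable sorted run (SSTable)
        self.keys, self.vals = [], []
        return run

def compact(newer_run, older_run):
    """Merge two sorted runs; newer entries shadow stale (invalid) older ones."""
    merged = dict(older_run)
    merged.update(newer_run)
    return sorted(merged.items())
```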
3.1.5. Metadata
In the context of storage, metadata is "data about data." For a KV-SSD, metadata includes information necessary to locate KV pairs stored in the flash memory. This typically involves:
- Keys: Or at least prefixes/hashes of keys for indexing.
- Pointers/Addresses: Physical page addresses (PPAs) in flash memory where the actual KV values or KV entities are stored.
- Index Structures: Data structures (like level lists and meta segments in PinK, or level lists and hash lists in AnyKey) that organize these keys and pointers for efficient searching.
- Location: Ideally, metadata should reside in the SSD's fast internal DRAM for quick access. If DRAM is insufficient, metadata must be stored in flash, incurring additional, slower flash accesses.
3.1.6. Tail Latency
Latency is the time delay between a request and its response. Tail latency refers to the latency experienced by a small percentage of requests, typically the slowest ones (e.g., the 95th, 99th, or 99.9th percentile).
- Importance: In large-scale online services, even a small fraction of slow requests can significantly impact user experience or overall system performance. High tail latencies can be caused by various factors, including garbage collection, compaction, or contention for resources.
- Example: A 95th-percentile read latency of 10ms means that 95% of read requests complete within 10ms, but 5% take longer.
3.1.7. IOPS (Input/Output Operations Per Second)
IOPS is a common performance metric that measures the number of distinct input and output operations a storage device or system can handle in one second. Higher IOPS generally indicates better performance for transactional workloads.
3.1.8. Write Amplification (WA)
Write amplification is an undesirable phenomenon in flash memory where the actual amount of data written to the flash media is a multiple of the amount of data originally written by the host.
- Causes: Out-of-place updates, garbage collection (which involves reading valid data from old blocks and rewriting it to new blocks), and wear leveling.
- Impact: High WA reduces the SSD's lifespan (due to a finite number of erase/program cycles per flash cell) and consumes I/O bandwidth, leading to lower performance.
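As a reference point, write amplification is commonly expressed as the following ratio (a standard definition, not one specific to this paper):

$ \text{WA} = \frac{\text{Bytes written to flash (host writes + compaction + GC)}}{\text{Bytes written by the host}} $

For example, if the host writes 10GB of data but compaction and GC cause 30GB of flash writes in total, the write amplification is 3.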
3.1.9. Garbage Collection (GC)
Garbage collection is a background process in SSDs essential for managing flash memory. Due to the erase-before-write constraint, SSDs cannot simply overwrite old data.
- Process: When flash blocks contain a mix of valid (live) and invalid (stale/deleted) pages, GC identifies victim blocks. It then reads all the valid pages from these victim blocks, consolidates them, and rewrites them to new, empty pages in other blocks. Finally, the victim blocks are erased, making them available for new writes.
- Overhead: GC operations consume significant I/O bandwidth (reading and writing valid data) and CPU resources, potentially increasing latency and reducing IOPS for foreground I/Os.
3.2. Previous Works
The paper categorizes existing KV store execution models into three types and discusses the evolution of KV-SSD designs, particularly focusing on LSM-tree-based approaches.
3.2.1. Execution Models of KV Store
- Host + Block Device (Figure 3a):
  - Description: The traditional approach where a KV store (e.g., RocksDB, LevelDB) runs entirely on the host system, using conventional block-based SSDs.
  - Performance Issue: This model suffers from performance bottlenecks due to the full software stack required to generate block I/Os and access the underlying storage.
  - Prior Efforts: Many research efforts [7, 11, 18-20, 44, 46-49, 61] have focused on optimizing host-based KV stores to better leverage block devices. More recent work [12, 16, 23, 35, 37, 54, 56, 60] introduced byte-addressable persistent memory (PM) in the host to reduce block I/O traffic.
- Host + SSD Support (Figure 3b):
  - Description: This model attempts to improve performance by enabling the host-based KV store to leverage specific features or internal structures of the SSD. The SSD is no longer treated as a "black box."
  - Examples: Defining new SSD interfaces that expose device-internal parallelism or transactional persistence capabilities [50], or modifying kernel drivers for Open-Channel SSDs to improve caching and scheduling [63].
- KV-SSD (Figure 3c):
  - Description: The most recent approach, where the KV store logic is moved directly into the SSD controller. The host issues KV requests, and the SSD manages the KV pairs internally.
  - Advantages: Higher performance, better scalability, and more efficient device management due to SSD-aware optimizations [4, 36, 38].
  - Focus of Paper: This paper focuses on improving KV-SSD designs.

The following figure (Figure 3 from the original paper) illustrates the three execution models and the internal architecture of a KV-SSD.

Figure 3. (a-c) The comparison of three execution models for a KV store, and (d) the internal architecture of a KV-SSD.
3.2.2. Existing KV-SSD Designs
KV-SSDs share the same hardware architecture as conventional block SSDs (flash chips, blocks, pages). The key difference lies in how they store and locate KV pairs and their metadata (usually in DRAM). The choice of metadata data structure is critical.
- Hash-based KV-SSD:
  - Description: Early KV-SSD designs often used hash tables [8, 24, 32, 36, 64] for metadata due to their simplicity.
  - Issues:
    - In-place Updates: Hash tables typically require in-place updates for KV pairs. On SSDs, this leads to partial page updates and frequent, costly garbage collection (GC) operations, wasting bandwidth and reducing device lifetime.
    - Hash Collisions: To avoid frequent GC, hash tables might be minimized to fit in DRAM. However, a small hash table increases the likelihood of hash collisions, leading to additional flash accesses to resolve collisions and consequently increasing tail latencies.
  - Alternative Sorted Indexes: Other sorted indexing structures like B-trees [15] and learned indexes [18, 40] also suffer from in-place update issues, making them less suitable for write-heavy workloads on SSDs.
- LSM-tree-based KV-SSD:
  - Description: LSM-trees [13, 30, 31, 43, 51] have gained popularity as an alternative due to their flash-friendly out-of-place update mechanism. They are now the mainstream for KV-SSD designs.
  - PinK as State-of-the-Art: The paper identifies PinK [30, 31] as a state-of-the-art LSM-tree-based KV-SSD design and uses it as the primary baseline for comparison.
  - PinK's Structure (Figure 4):
    - Data Segments: KV pairs are stored directly in flash pages as data segments.
    - Meta Segments: Keys and their physical flash addresses (PPAs) are sorted within meta segments. Upper-level meta segments are kept in DRAM, while lower-level ones are stored in flash.
    - Level Lists: For each meta segment, a level list entry exists, containing the meta segment's first key and its location (DRAM or flash address). Level lists are grouped into multiple levels (L1 to Ln) and sorted.

    The following figure (Figure 4 from the original paper) illustrates the internal structure of PinK.

    Figure 4. The internal structure of PinK.
  - Locating a KV pair (Read Operation in PinK, Figure 4):
    - Given a key (e.g., "PP"), PinK searches level lists from L1 downwards.
    - If an entry in a level list (e.g., L1) is found where "PP" falls into its key range, its corresponding meta segment is accessed (e.g., from DRAM address 0x10).
    - The key "PP" is then searched within this meta segment. If not found (due to overlapping key ranges across levels), the search continues to the next level list.
    - If "PP" is found in a meta segment corresponding to an entry in a lower level list (e.g., L2), the PPA of the data segment containing the target KV pair is obtained.
    - Finally, the data segment at that PPA is read, and the value is extracted.
  - Write Operation in PinK: New write requests are initially buffered in DRAM. When the buffer fills, KV pairs are merged into L1 (L0-L1 compaction), generating new level list entries and meta segments in DRAM, while values are written to data segments in flash.
3.3. Technological Evolution
The evolution of storage systems for key-value stores has progressed from entirely host-managed, block-device-agnostic approaches to increasingly SSD-aware and eventually in-storage computation models.
- Traditional Host-Managed (early 2000s onwards): Initially, KV stores like Berkeley DB, and later NoSQL databases like Cassandra and MongoDB, ran on conventional HDDs or block-based SSDs. The KV store software (e.g., LSM-trees) handled all indexing, compaction, and garbage collection logic. The SSD was a passive block device, abstracted by the FTL. This led to high CPU and DRAM overheads on the host, and I/O patterns (especially random writes) that were often flash-unfriendly.
- SSD-Aware Host-Side Optimizations (mid-2010s): Researchers and developers began optimizing host-based KV stores to be more SSD-aware. This involved modifying LSM-tree compaction strategies (e.g., WiscKey [49] separating keys from values), improving caching, and exploiting SSD parallelism, sometimes through Open-Channel SSDs or custom SSD interfaces. The goal was to reduce write amplification and tail latencies by coordinating host-side KV store logic with SSD internals.
- KV-SSDs (late 2010s onwards): The logical next step was to move the KV store intelligence directly into the SSD controller. Early KV-SSDs primarily used hash tables for indexing, leveraging the SSD's internal DRAM for metadata. However, hash-based designs faced challenges with in-place updates (leading to GC overheads) and hash collisions. The industry quickly shifted towards LSM-tree-based KV-SSDs (e.g., PinK), which are inherently flash-friendly due to out-of-place updates and tiered storage management. These designs promised bounded tail latencies and reduced write amplification.

This paper's work, AnyKey, fits into the latest stage of this evolution. It builds upon the LSM-tree-based KV-SSD paradigm but addresses a critical oversight: the assumption that KV-SSDs would primarily handle high-v/k workloads. AnyKey pioneers optimizations for low-v/k workloads, ensuring that LSM-tree-based KV-SSDs can perform efficiently across a broader spectrum of real-world KV store applications.
3.4. Differentiation Analysis
Compared to PinK and other contemporary LSM-tree-based KV-SSDs, AnyKey introduces fundamental differences in metadata management and LSM-tree compaction strategies, driven by its focus on low-v/k workloads.
- Metadata Representation and Size:
  - PinK: Maintains individual meta segments for each key (or small groups of keys), which store keys and PPAs in sorted order. For high-v/k workloads, where keys are small, these meta segments are compact and can mostly reside in DRAM. However, for low-v/k workloads (large keys), the collective size of meta segments explodes, forcing many to be stored in flash. This incurs multiple flash accesses for metadata lookups, causing severe performance degradation.
  - AnyKey: Drastically reduces metadata size by grouping multiple KV pairs into data segment groups. Instead of storing metadata for each KV pair, AnyKey stores metadata only for each data segment group. This means level list entries contain only the smallest key of the group, the PPA of the first page, and a list of hash values for the first keys in each page within the group. This design ensures the metadata (level lists and hash lists) remains small enough to fit entirely within DRAM, regardless of the individual key sizes.
- Handling Metadata in DRAM vs. Flash:
  - PinK: Dynamically stores meta segments in DRAM or flash based on available DRAM capacity, leading to unpredictable flash accesses for metadata under low-v/k scenarios.
  - AnyKey: Explicitly designs metadata (level lists and hash lists) to be DRAM-resident. By minimizing their size, AnyKey guarantees that KV pair location can almost always be determined without additional flash accesses for metadata.
- Mechanism for Locating KV pairs within a Group:
  - PinK: Directly stores keys in meta segments and uses a binary search (or similar) to find the key within the meta segment.
  - AnyKey: Introduces hash values (e.g., 32-bit xxHash) for keys. KV pairs within a data segment group are sorted by these hash values. To quickly find a KV pair within a group, AnyKey uses a list of hash values for the first keys on each page. This allows for efficient page-level targeting using a smaller metadata footprint than storing actual keys for each page. It also includes hash collision bits to manage rare cases of hash collisions across page boundaries.
- LSM-tree Compaction Efficiency:
  - PinK: During compaction, KV pairs (both keys and values) are read from source levels and rewritten to the destination level. When meta segments are in flash, this involves substantial reads and writes of key data, further exacerbating compaction overhead.
  - AnyKey: Introduces a value log in flash, separating values from keys in data segment groups. During compaction, only keys and pointers (if values are in the value log) are moved, significantly reducing read/write amplification for large values. AnyKey+ further refines this by modifying log-triggered compaction to prevent "compaction chains," where one compaction triggers another, which is a common issue for LSM-trees under specific update patterns and value log interactions.
- Storage Utilization:
  - PinK: Under low-v/k workloads, if meta segments are too large for DRAM, PinK stores them in flash. This effectively means two copies of key information reside in flash (one in meta segments, one in data segments with values), leading to lower storage utilization.
  - AnyKey: By eliminating meta segments in flash and always keeping metadata in DRAM, AnyKey only stores one copy of each unique key's information in flash, thus achieving higher storage utilization.

In essence, AnyKey represents a fundamental architectural shift for KV-SSDs to handle low-v/k workloads efficiently, while AnyKey+ extends this efficiency to cover all workload types by optimizing LSM-tree dynamics.
4. Methodology
4.1. Principles
The core principle behind AnyKey is to fundamentally change how metadata is managed within a KV-SSD to ensure it remains compact enough to always reside in the device-internal DRAM, regardless of the key size. This directly addresses the performance degradation observed in existing KV-SSD designs when faced with low value-to-key ratio (low-v/k) workloads.
The intuition is that existing LSM-tree-based KV-SSDs (like PinK) store metadata that explicitly references every single KV pair in flash. When keys are small, this metadata is manageable. However, as key sizes increase (as in low-v/k workloads), the sheer volume of metadata (specifically, the meta segments in PinK) can exceed DRAM capacity, forcing it into slower flash memory. This results in extra flash accesses for every KV lookup, causing tail latencies to skyrocket and IOPS to plummet.
AnyKey's solution is built on two main ideas:
- Grouping KV pairs: Instead of individual KV pairs, AnyKey manages sets of sorted KV pairs in logical units called data segment groups. Metadata is then maintained only for these groups, rather than for each KV pair. This dramatically reduces the metadata footprint.
- Separating values from keys (for efficiency): While grouping helps metadata size, compaction operations in LSM-trees still involve moving KV pairs. If values are large and physically co-located with keys within data segment groups, compaction would incur significant read/write amplification. To mitigate this, AnyKey introduces a value log to store values separately, allowing compaction to primarily handle smaller key/pointer pairs, thus improving LSM-tree management efficiency and device lifetime.

By implementing these principles, AnyKey aims to ensure that metadata lookups are always DRAM-speed and LSM-tree background operations (like compaction) are optimized, leading to high IOPS and low tail latencies across a wide range of KV workloads.
4.2. Core Methodology In-depth (Layer by Layer)
AnyKey introduces novel data structures and modifies LSM-tree operations to achieve its goals. The internal structure shown in Figure 6 describes the new components.
The following figure (Figure 6 from the original paper) describes the internal structure of AnyKey.

Figure 6. The internal structure of AnyKey.
4.2.1. Minimizing the Size of the Metadata
The primary innovation is to manage KV pairs in sorted groups to drastically reduce the metadata size.
4.2.1.1. Data Segment Group (in the flash)
- Concept: AnyKey combines multiple flash pages to form a data segment group. Within each data segment group, KV pairs are stored in sorted order.
- Implementation: In the current implementation, AnyKey integrates neighboring physical pages within a flash block to form a data segment group. This simplifies access, as the PPAs of pages within a block are sequential. A group can contain 32 8KB flash pages, accommodating 100-5,000 KV pairs under the tested workloads.
- KV Entity: Each KV pair within a data segment group is stored as a KV entity, which comprises:
  (i) The key: the actual key string.
  (ii) The hash value of the key: a small (e.g., 32-bit) hash value of the key. This is crucial for sorting and locating KV pairs efficiently within a group, especially when keys are large, without needing to store the full key in internal lookup structures.
  (iii) The value: the actual value, or a PPA pointer to where the value is stored in a separate value log (explained later).
- Hash-based Sorting: KV pairs within a data segment group are sorted based on the 32-bit hash values of their keys. This allows an efficient binary search to locate a target KV pair once its data segment group is identified.
- Hash Collision Handling: A potential issue is hash collisions, where different keys have the same hash value. Since KV entities with identical hash values are stored consecutively, collisions might span across page boundaries.
  - To handle this, AnyKey reserves 2 bits per page, called hash collision bits.
  - If a page contains KV entities with a hash value that continues into the next page, the bits are set to 01.
  - If a hash value from the previous page continues into the current page, the bits are set to 10.
  - This mechanism helps determine whether an adjacent page needs to be read to find a KV entity with a colliding hash. The paper notes that such collisions are rare (0.075% of groups), so their performance impact is negligible.

The following figure (Figure 7 from the original paper) illustrates how hash collision bits are used when two different keys have the same hash value.

Figure 7. When two different keys have the same hash value.
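The group-construction logic can be sketched as follows. This is a simplified illustration under stated assumptions: `zlib.crc32` stands in for the paper's xxHash, the page capacity is a toy value, and only the hash-sorting and collision-bit bookkeeping are modeled.

```python
# Illustrative sketch of packing KV entities into a data segment group,
# sorted by a 32-bit key hash, with per-page hash-collision bits.
import zlib

PAGE_ENTRIES = 4          # toy capacity: KV entities per flash page (assumption)

def h32(key: bytes) -> int:
    return zlib.crc32(key) & 0xFFFFFFFF      # stand-in for xxHash (32-bit)

def build_group(kv_pairs):
    entities = sorted(((h32(k), k, v) for k, v in kv_pairs), key=lambda e: e[0])
    pages = [entities[i:i + PAGE_ENTRIES]
             for i in range(0, len(entities), PAGE_ENTRIES)]
    first_hashes = [p[0][0] for p in pages]   # hash of the first key on each page
    collision_bits = []
    for i, page in enumerate(pages):
        bits = 0b00
        if i + 1 < len(pages) and page[-1][0] == pages[i + 1][0][0]:
            bits |= 0b01                      # same hash value continues into the next page
        if i > 0 and page[0][0] == pages[i - 1][-1][0]:
            bits |= 0b10                      # hash value continued from the previous page
        collision_bits.append(bits)
    return pages, first_hashes, collision_bits
```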
4.2.1.2. Level Lists (in the DRAM)
- Concept: Similar to existing LSM-tree designs, AnyKey uses level lists to index the data segment groups. However, each level list entry is redesigned to hold group metadata.
- Structure of a Level List Entry: For each data segment group, a level list entry (residing in DRAM) contains:
  (i) The smallest key: the smallest key of the entire data segment group.
  (ii) The PPA of the first page: the physical page address of the very first page in the data segment group. Since pages within a group are contiguous (from neighboring pages in a block), this single PPA allows access to all pages in the group.
  (iii) A list of the hash values of the first keys in the pages: a list of truncated hash values (e.g., the first 16 bits, Hash[0:15]) corresponding to the first key on each page within the data segment group.
- Lookup Process:
  - First, the level lists (sorted by smallest key) are searched to identify the data segment group that might contain the target key.
  - Once a data segment group is identified, the PPA of its first page is known.
  - To find the exact flash page within that group, the hash value of the target key is compared against the list of hash values for the first keys in each page (item (iii) above). This quickly narrows down the search to a specific page.
- Metadata Size Reduction: By keeping metadata at the group level rather than per KV pair and using truncated hash values for intra-group indexing, the collective size of the level lists is significantly reduced, ensuring they fit in DRAM.
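A minimal sketch of this two-step lookup follows. The entry field names (`smallest_key`, `first_page_ppa`, `first_key_hashes16`) are assumptions made for illustration; they mirror the description above rather than the paper's actual struct layout.

```python
# Illustrative level-list lookup: find the candidate group by smallest key,
# then pick the page inside the group using truncated first-key hashes.
import bisect

def find_candidate_entry(level_entries, key):
    """level_entries is sorted by smallest_key; return the entry whose range may hold `key`."""
    smallest_keys = [e["smallest_key"] for e in level_entries]
    i = bisect.bisect_right(smallest_keys, key) - 1
    return level_entries[i] if i >= 0 else None

def target_page_ppa(entry, key_hash32):
    """Pick the page whose first-key hash prefix is the last one <= the target's prefix."""
    prefix16 = key_hash32 >> 16                      # truncated 16-bit hash
    idx = bisect.bisect_right(entry["first_key_hashes16"], prefix16) - 1
    idx = max(idx, 0)
    return entry["first_page_ppa"] + idx             # pages of a group are physically contiguous
```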
4.2.2. Further Reducing Flash Accesses
Even with level lists identifying data segment groups, there's still a possibility that a key falls into a key range indicated by a level list entry, but the key does not actually exist in the corresponding data segment group (due to overlapping key ranges across LSM-tree levels). This would lead to unnecessary flash accesses. To prevent this:
4.2.2.1. Hash Lists (in the DRAM)
- Concept: AnyKey introduces a new metadata type called a hash list for each level list entry.
- Structure: Each hash list stores the hash values of all the keys that actually exist in its corresponding data segment group.
- Purpose: Before accessing the flash to read a data segment group, AnyKey first checks whether the hash value of the target key is present in the hash list for the candidate level list entry. If the hash is not found, it definitively indicates that the KV pair is not in that group, and the search can immediately proceed to the next LSM-tree level without a wasteful flash read.
- Placement: Hash lists are stored in the remaining DRAM space. If DRAM is full, hash lists are not maintained for the lower LSM-tree levels, in which case unnecessary flash accesses might still occur for those levels (similar to other LSM-tree-based designs).
- Implementation: Currently, an integer array with binary search is used for hash lists. While Bloom filters or Cuckoo filters could save DRAM space, PinK also noted their high computational burden on SSD resources, a reason AnyKey avoids them for now.
4.2.3. Improving the Efficiency of the LSM-tree Management
While the above designs improve read latency, LSM-tree compaction can become an overhead if values are large and part of the data segment groups. If values are always co-located with keys and frequently moved during compaction, it would waste device bandwidth and lifetime.
4.2.3.1. Value Log (in the flash)
- Concept: Inspired by designs like WiscKey [49], AnyKey detaches values from keys for LSM-tree management. A separate flash area is designated as a value log.
- Storage: The flash space in AnyKey is divided into two areas: one for data segment groups (storing keys and pointers/small values) and one for the value log (storing large values).
- KV Entity Update: If a value is stored in the value log, the KV entity in the data segment group contains a pointer (PPA) to its location in the value log, rather than the value itself.
- Write Flow: Initially, all values from new writes are written into the value log.
- Compaction Benefit: This separation means that during compaction, only the smaller key/pointer pairs in data segment groups are read and rewritten. The large values in the value log are not moved unless a log-triggered compaction explicitly decides to merge them back into data segment groups (explained below). This significantly reduces write amplification.
- Capacity Management: AnyKey reserves half of the remaining SSD capacity for the value log. If SSD capacity is full, over-provisioned (OP) capacity can be used for the value log.
4.2.4. Detailed Operations
4.2.4.1. Read
Given a key request:
- Level List Search: AnyKey searches the level lists from L1 to Ln sequentially to find an entry whose key range contains the target key.
- Hash List Check: If a candidate level list entry is found, AnyKey checks whether the hash value of the target key exists in the hash list corresponding to that entry (if a hash list is maintained for that level).
  - If the hash value is not found in the hash list, the KV pair does not exist in that data segment group, and the search continues to the next LSM-tree level.
  - If the hash value is found, the target data segment group has been identified.
- Page Identification: Using the PPA of the first page of the data segment group and the list of hash values for the first keys of each page (from the level list entry), AnyKey determines the specific flash page where the target KV entity is likely stored.
- KV Entity Read: This specific flash page is read to retrieve the KV entity.
- Value Retrieval:
  - If the KV entity itself contains the value, the read operation is complete.
  - If the KV entity contains a pointer (PPA) to the value log, AnyKey performs a second flash read to retrieve the actual value from the value log.

Figure 6 illustrates this read process: a request for key "PP" (hash "4791") is shown. If L1 does not have it, the search moves to the next level. If a candidate entry in a lower level is found and its hash is in the hash list, the data segment group is identified. The page at PPA 300 is read. If the KV entity holds a pointer, the value at PPA 901 in the value log is read.
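The end-to-end read flow can be condensed into a self-contained sketch. The data layout here (Python dicts standing in for level lists, hash lists, data segment groups, and the value log) is an assumption for illustration only, and hash collisions are ignored for brevity.

```python
# Self-contained, simplified read-path sketch (data layout is an assumption made
# for illustration; real AnyKey operates on flash pages, not Python dicts).
import bisect

def read(levels, value_log, key, key_hash32):
    for entries in levels:                               # one list of entries per level, L1..Ln
        keys = [e["smallest_key"] for e in entries]      # entries sorted by smallest key
        i = bisect.bisect_right(keys, key) - 1
        if i < 0:
            continue
        entry = entries[i]
        hl = entry.get("hash_list")                      # sorted 32-bit hashes; may be absent
        if hl is not None:
            j = bisect.bisect_left(hl, key_hash32)
            if j == len(hl) or hl[j] != key_hash32:
                continue                                 # definitely not in this group: skip flash read
        entity = entry["group"].get(key_hash32)          # stands in for the targeted flash page read
        if entity is None or entity["key"] != key:
            continue                                     # overlapping key range: try the next level
        if entity.get("value_ppa") is not None:
            return value_log[entity["value_ppa"]]        # second flash read: value log
        return entity["value"]                           # value stored inline in the KV entity
    return None                                          # key does not exist
```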
4.2.4.2. Write
- Buffer in DRAM: A write request (either an update to an existing key or a new key addition) is initially buffered in the device-internal DRAM.
- L0-L1 Compaction: When this DRAM buffer (effectively L0) becomes full, its KV pairs are merged into L1. During this L0-L1 compaction:
  - New data segment groups and level list entries are constructed for L1.
  - All values associated with these KV pairs are written into the value log (only pointers or small values are written into the KV entities in data segment groups).
  - This ensures that new values entering the LSM-tree are placed into the value log first.
4.2.4.3. Compaction
AnyKey employs two types of compaction operations, both of which involve reading KV pairs from source levels, sorting them, and constructing new data segment groups and level list entries for the destination level. A key distinction is that KV entities are sorted based on hash values for insertion into data segment groups.
The following figure (Figure 8 from the original paper) illustrates the two types of compaction operations.

Figure 8. The two types of compaction operations.
- Tree-triggered Compaction:
  - Trigger: Invoked when the size of an LSM-tree level Li (i.e., the collective size of its data segment groups and any values referenced in the value log) reaches a predefined threshold.
  - Purpose: This is the conventional compaction operation required by LSM-trees to maintain their structure and reclaim space from invalidated KV pairs due to out-of-place updates.
  - Process: Reads KV pairs from Li and Li+1, merges them, discards invalid KV pairs, and writes new data segment groups for Li+1. Values are read from wherever they are currently stored (either a data segment group or the value log) and written to the data segment groups of the new Li+1 (potentially moving from the value log to a data segment group at this stage, or remaining in the value log).
- Log-triggered Compaction:
  - Trigger: Invoked when the value log becomes full.
  - Purpose: To secure available space in the value log for ongoing writes.
  - Process: AnyKey selects a target source level Li (initially, the level with the largest volume of its values in the value log). A standard Li-to-Li+1 compaction is performed. Crucially, during this compaction, all values of KV pairs from both Li and Li+1 that are currently in the value log are relocated and merged directly into the data segment groups of the newly created Li+1. This frees up space in the value log.
- Hash List Update: Whenever compaction operations modify the level lists, AnyKey updates the hash lists for the top levels in DRAM accordingly. This overhead is minimal given the I/O-intensive nature of compaction.
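The decision logic for which compaction to run can be sketched as follows. The level/threshold fields are assumptions for illustration; the sketch only chooses a compaction, it does not perform the merge itself.

```python
# Control-flow sketch of the two compaction triggers (fields are assumed for illustration).
def choose_compaction(levels, value_log_used, value_log_capacity):
    # Tree-triggered: some level Li grew past its size threshold -> merge Li into Li+1.
    for i, level in enumerate(levels[:-1]):
        if level["size"] >= level["threshold"]:
            return ("tree-triggered", i, i + 1)
    # Log-triggered: the value log is full -> pick the source level with the most
    # value-log-resident data and fold those values into Li+1's data segment groups.
    if value_log_used >= value_log_capacity:
        i = max(range(len(levels) - 1), key=lambda j: levels[j]["bytes_in_value_log"])
        return ("log-triggered", i, i + 1)
    return None

# Example: level index 0 (L1) exceeded its threshold, so a tree-triggered L1->L2 compaction is chosen.
levels = [{"size": 90, "threshold": 80, "bytes_in_value_log": 10},
          {"size": 300, "threshold": 800, "bytes_in_value_log": 40},
          {"size": 2000, "threshold": 8000, "bytes_in_value_log": 0}]
print(choose_compaction(levels, value_log_used=10, value_log_capacity=100))
```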
4.2.4.4. GC (Garbage Collection)
- Conventional GC: Typically relocates all valid data (page by page) from a victim block before erasing it.
- AnyKey's GC:
  - Relocates data segment groups (units of multiple pages) from a victim block.
  - Updates the new PPA in the corresponding level list entry.
- Efficiency: The paper observes that most victim blocks in AnyKey do not contain valid data and can be erased immediately. This is because data segment groups belonging to a specific LSM-tree level tend to be stored in the same flash block and are relocated together during compaction. This significantly reduces GC overhead.
- Value Log GC: GC is not triggered in the value log itself, as space reclamation there is handled by log-triggered compaction.
4.2.4.5. Range Query
- Challenge: Since KV pairs within a data segment group are sorted by their hash values, a range query (retrieving keys between key_min and key_max) would require re-sorting by actual key to deliver results in lexicographical order.
- Solution: AnyKey maintains extra information for each KV pair in a data segment group: {target page, page offset}. This information is sorted by actual key and stored in the first page of the data segment group.
- Benefit: For a range query, this pre-sorted location information allows KV pairs to be located quickly without any on-the-fly sorting. For longer range queries, AnyKey performs better than PinK because consecutive KV pairs are often co-located in flash pages within a data segment group, reducing the number of flash pages that need to be read. For small range queries, AnyKey's performance is comparable to PinK.
4.2.5. Analysis on the Size of Metadata
The paper conducts a quantitative analysis comparing the metadata sizes of PinK and AnyKey under varying low value-to-key ratios. This analysis assumes a 64GB SSD with 64MB DRAM that is full of KV pairs.
The following are the results from Table 1 of the original paper:
| v/k ratio (value/key) | PinK: Level lists | PinK: Meta segments | PinK: Sum | AnyKey: Level lists | AnyKey: Hash lists | AnyKey: Sum |
| --- | --- | --- | --- | --- | --- | --- |
| 4.0 (160B/40B) | 56MB | 316MB | 372MB | 29MB | 35MB | 64MB |
| 2.0 (120B/60B) | 117MB | 414MB | 531MB | 34MB | 30MB | 64MB |
| 1.0 (80B/80B) | 200MB | 503MB | 703MB | 38MB | 26MB | 64MB |
Table 1. The analysis on the metadata size in PinK and AnyKey under varying (low) value-to-key ratios, assuming that our 64GB SSD is full of KV pairs (Section 5.1).
Observations from Table 1:
- PinK's Metadata Growth: As the value-to-key ratio decreases (meaning key sizes are relatively larger, e.g., from 4.0 to 1.0), the sum of level lists and meta segments in PinK grows dramatically from 372MB to 703MB. This clearly exceeds the assumed 64MB DRAM capacity, forcing a large portion of meta segments into flash.
- AnyKey's Stable Metadata: In contrast, AnyKey's total metadata size (the sum of level lists and hash lists) consistently remains at 64MB. This is precisely the DRAM capacity, demonstrating that AnyKey effectively limits its metadata to fit within DRAM across varying low-v/k scenarios.
- Mechanism: This stability in AnyKey is due to:
  - The significant reduction in level list size by managing metadata per data segment group rather than per KV pair.
  - Hash lists being dynamically generated and stored using only the remaining DRAM space, ensuring the total DRAM footprint for metadata does not exceed the available DRAM capacity.
4.2.6. Design Overhead
4.2.6.1. Storage Capacity Overhead
AnyKey introduces minor storage overheads:
- Hash Values: Each KV entity in a data segment group includes a 32-bit hash value of its key. For 10 million KV pairs in a 64GB SSD, this amounts to 38MB, which is only 0.05% of the total SSD capacity.
- Range Query Information: For efficient range query processing, each data segment group stores {target page, page offset} for all KV pairs within it, sorted by key. Assuming 1,000 KV pairs in a 32-page (8KB/page) group, this information is about 2KB, less than 1% of the group size.
- Overall: These collective overheads are extremely small, especially compared to PinK, which might store two copies of each key in flash (one in meta segments and one in data segments) if metadata overflows DRAM. The paper asserts that AnyKey achieves higher storage utilization than PinK (Section 5.4).
4.2.6.2. Computation Overhead
AnyKey introduces some computational tasks on the SSD controller:
- Hash Value Generation: AnyKey needs to generate a 32-bit hash value for a key when a KV pair is first written and when a read request is issued. Using xxHash [14] on a 1.2 GHz ARM Cortex-A53 core (a common SSD controller CPU), generating a 32-bit hash for a 40-byte key takes 79ns. This is considered a marginal overhead compared to typical flash read/write latencies.
- Sorting KV Entities: During compaction, when a data segment group is constructed, AnyKey needs to sort a set of KV entities based on their hash values. Since the KV entities from the merging groups are already sorted, this is a fast merge-sort process. Merging two groups, each with 8,192 KV entities, takes approximately 118µs on the ARM Cortex-A53 core, which is negligible relative to the overall time-consuming compaction operation.
- Inclusion in Evaluation: The paper explicitly states that all experimental data presented in Section 5 already includes these computation overheads.
4.2.7. An Enhancement to the Compaction Process (AnyKey+)
While base AnyKey excels in low-v/k workloads, an observed issue was decreased IOPS for high-v/k workloads and some low-v/k workloads with relatively large values.
4.2.7.1. Motivational Observations
- Compaction Chains: The problem stems from log-triggered compaction. When the value log fills up, log-triggered compaction is invoked to move values from the value log into data segment groups in an LSM-tree level. If a significant volume of values is merged into the destination level Li+1, this can cause Li+1 itself to exceed its size threshold, immediately triggering a tree-triggered compaction to merge Li+1 into Li+2. This chain of consecutive compaction invocations (a compaction chain) adds substantial overhead.

The following figure (Figure 9a from the original paper) illustrates a problematic situation where a compaction chain occurs.

Figure 9. An illustration of the problematic situations during the compaction process, and an enhancement to avoid them.
- Figure 9a describes this: when the value log fills, log-triggered compaction (1) merges many values into Li+1. If Li+1 then exceeds its threshold, it triggers a tree-triggered compaction (2).
4.2.7.2. Modified Log-Triggered Compaction (AnyKey+)
To mitigate compaction chains, AnyKey+ implements a modified log-triggered compaction:
The following figure (Figure 9b from the original paper) illustrates the modified log-triggered compaction scheme.
- Threshold Monitoring: During a log-triggered Li-to-Li+1 compaction, AnyKey+ continuously monitors whether the size of the target destination level Li+1 is approaching its threshold.
- Early Termination of Value Merging: If Li+1 reaches a certain point (e.g., a fraction θ of its threshold, with θ < 1), AnyKey+ stops merging values from the value log into the data segment groups of Li+1. Any remaining values that were supposed to be merged are instead written back into the value log. This prevents Li+1 from exceeding its threshold and avoids triggering a subsequent tree-triggered compaction.
- Improved Target Level Selection: To compensate for potentially reduced space reclamation in the value log (due to values being written back), AnyKey+ changes its strategy for choosing the target source level Li. Instead of selecting the level with the largest total size of values in the value log, it selects the level with the most "invalid" values in the value log. This is effective because LSM-trees generally accumulate many invalid (overwritten or deleted) values. By targeting levels with more invalid values, AnyKey+ can still reclaim significant free space in the value log while avoiding compaction chains.

This enhancement (AnyKey+) ensures that LSM-tree management remains efficient even under workloads that would typically induce compaction chains, thereby providing high IOPS across all workload types.
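The early-termination rule and the revised source-level selection can be sketched as follows. The θ value and the data layout are assumptions based on the description above, not the paper's exact parameters.

```python
# Sketch of AnyKey+'s early termination during a log-triggered Li -> Li+1 compaction.
THETA = 0.9   # assumed: stop folding values once Li+1 reaches this fraction of its threshold

def fold_values_with_early_stop(dst_level_size, dst_threshold, values_to_fold):
    """Fold value-log values into Li+1's groups, but stop before Li+1 would overflow."""
    folded, written_back = [], []
    size = dst_level_size
    for v in values_to_fold:
        if size + v["size"] > THETA * dst_threshold:
            written_back.append(v)        # goes back into the value log
        else:
            folded.append(v)              # merged into Li+1's data segment groups
            size += v["size"]
    return folded, written_back

def pick_source_level(levels):
    """AnyKey+ targets the level with the most *invalid* values in the value log."""
    return max(range(len(levels) - 1), key=lambda i: levels[i]["invalid_log_bytes"])
```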
5. Experimental Setup
5.1. Datasets
The paper used 14 Key-Value (KV) workloads, explicitly categorized into high value-to-key ratio (high-v/k) and low value-to-key ratio (low-v/k) based on their characteristics.
The following are the results from Table 2 of the original paper:
| Workload | Description | Key (bytes) | Value (bytes) |
| --- | --- | --- | --- |
| KVSSD [36] | The workload used in Samsung. | 16 | 4,096 |
| YCSB [17] | The default key and value sizes of YCSB. | 20 | 1,000 |
| W-PinK [31] | The workload used in PinK. | 32 | 1,024 |
| Xbox [19] | Xbox LIVE Primetime online game. | 94 | 1,200 |
| ETC [6] | General-purpose KV store of Facebook. | 41 | 358 |
| UDB [10] | Facebook storage layer for social graph. | 27 | 127 |
| Cache [58] | Twitter's cache cluster. | 42 | 188 |
| VAR [6] | Server-side browser info. of Facebook. | 35 | 115 |
| Crypto2 [55] | Trezor's KV store for Bitcoin wallet. | 37 | 110 |
| Dedup [21] | DB of Microsoft's storage dedup. engine. | 20 | 44 |
| Cache15 [59] | 15% of the 153 cache clusters at Twitter. | 38 | 38 |
| ZippyDB [10] | Object metadata of Facebook store. | 48 | 43 |
| Crypto1 [9] | BlockStream's store for Bitcoin explorer. | 76 | 50 |
| RTDATA [26] | IBM's real-time data analytics workloads. | 24 | 10 |
Table 2. Our tested KV workloads (sizes are in bytes).
Dataset Characteristics:
- Source: The workloads are derived from real-life applications and services, including:
  - Industry benchmarks (e.g., YCSB).
  - Workloads from major tech companies (e.g., Samsung, Facebook, Twitter, Microsoft, IBM).
  - Specific application domains (e.g., the Xbox LIVE game, Bitcoin wallets, data analytics).
- Key and Value Sizes: The table lists the key and value sizes in bytes for each workload. This is the crucial factor determining the value-to-key ratio.
  - High-v/k Workloads (typical, previously studied): KVSSD, YCSB, W-PinK, Xbox. These generally have small keys (16-94 bytes) and large values (1,000-4,096 bytes), leading to high value-to-key ratios.
  - Low-v/k Workloads (unexplored, focus of the paper): ETC, UDB, Cache, VAR, Crypto2, Dedup, Cache15, ZippyDB, Crypto1, RTDATA. These have more balanced key and value sizes, or even keys larger than values (e.g., Crypto1, RTDATA), resulting in low value-to-key ratios.
- Why these datasets were chosen: These datasets represent a wide spectrum of real-world KV store usage, including both the traditionally studied high-v/k patterns and the newly identified low-v/k patterns that challenge existing KV-SSD designs. Their diversity allows for a comprehensive evaluation of AnyKey's performance across different key/value size distributions and access patterns.
5.2. Evaluation Metrics
The paper uses several standard performance metrics to evaluate the KV-SSD designs.
5.2.1. Tail Latency
- Conceptual Definition: Tail latency measures the performance of the slowest requests, often expressed as a high percentile (e.g., the 95th or 99th percentile). It quantifies the time taken for a certain percentage of requests to complete. High tail latency indicates that a significant portion of users or applications experience slow responses, which is critical in interactive or large-scale services. The paper specifically focuses on the 95th-percentile read tail latency. A 95th-percentile latency of x ms means that 95% of all read requests completed within x ms, while 5% took longer.
- Mathematical Formula: There is no single universal formula for tail latency; it is typically calculated from a distribution of observed latencies. For the $p$-th percentile, if $L = \{l_1 \le l_2 \le \dots \le l_N\}$ is a sorted list of latencies from $N$ requests, the $p$-th percentile latency is the value $l_k$ where $k = \lceil (p/100) \cdot N \rceil$.
- Symbol Explanation:
  - $L$: A sorted list of observed latencies.
  - $N$: The total number of requests.
  - $p$: The desired percentile (e.g., 95 for the 95th percentile).
  - $k$: The index in the sorted list corresponding to the $p$-th percentile.
  - $\lceil \cdot \rceil$: The ceiling function, which rounds a number up to the nearest integer.
5.2.2. IOPS (Input/Output Operations Per Second)
- Conceptual Definition: IOPS quantifies the number of read and write operations a storage device or system can perform per second. It is a measure of throughput, indicating how many discrete storage transactions can be processed within a given time frame. Higher IOPS generally signifies better performance for the transactional workloads common in KV stores.
- Mathematical Formula: $ \text{IOPS} = \frac{\text{Total I/O Operations}}{\text{Total Time (seconds)}} $
- Symbol Explanation:
  - Total I/O Operations: The sum of all read and write operations performed during the measurement period.
  - Total Time (seconds): The duration of the measurement period in seconds.
5.2.3. Device Lifetime (implied by number of page writes)
- Conceptual Definition: Device lifetime refers to the expected operational duration of an SSD before its flash memory cells degrade to an unusable state. Flash memory has a finite number of Program/Erase (P/E) cycles. Each write to a flash page consumes one P/E cycle for the underlying block. Therefore, the total number of page writes (including write amplification from compaction and GC) is a direct indicator of device wear and, consequently, device lifetime. Fewer page writes imply a longer device lifetime. The paper uses the "total number of page writes experienced" as the metric.
- Mathematical Formula: No explicit formula is given, but lifetime is implicitly related to $ \text{Total Page Writes} / \text{P/E Cycles per Block} $.
- Symbol Explanation:
  - Total Page Writes: The cumulative count of all pages written to the flash memory throughout the SSD's operation, including those due to user writes, compaction, and GC.
  - P/E Cycles per Block: The maximum number of times a flash block can be erased and programmed before it becomes unreliable.
5.2.4. Storage Utilization
- Conceptual Definition: Storage utilization measures the efficiency with which the available storage capacity is used to store unique KV pairs. It is the ratio of the actual size of unique user data to the total physical storage capacity. Lower storage utilization indicates that a significant portion of the SSD is occupied by redundant data (e.g., old versions of KV pairs not yet cleaned by GC/compaction) or excessive metadata. Higher storage utilization is desirable, as it means more user data can be stored for a given SSD size.
- Mathematical Formula: $ \text{Storage Utilization} = \frac{\text{Total Size of Unique KV Pairs}}{\text{Total SSD Capacity}} $
- Symbol Explanation:
  - Total Size of Unique KV Pairs: The sum of the sizes of all currently valid and distinct key-value pairs stored in the system.
  - Total SSD Capacity: The total raw storage capacity of the SSD.
5.3. Baselines
The paper primarily compares AnyKey against PinK and also evaluates AnyKey+, an enhanced version of AnyKey.
- PinK [30, 31]: This is cited as the state-of-the-art LSM-tree-based KV-SSD design. It serves as the main baseline because it represents the most advanced existing approach that AnyKey aims to improve upon.
- AnyKey (Base Version): The proposed design focusing on minimizing metadata for low-v/k workloads.
- AnyKey+ (Enhanced Version): An improved version of AnyKey that includes a modified log-triggered compaction mechanism to prevent compaction chains and enhance LSM-tree management efficiency, particularly for high-v/k workloads.
5.4. Testbed & Configuration
The experiments were conducted in a highly controlled and realistic KV-SSD emulation environment.
- Testbed Components:
- Flash Emulator (FEMU [45]): A publicly available and widely used emulator that accurately models
flash memorybehavior, includinglatencycharacteristics offlash reads,writes, anderases. - Host System Driver (uNVMe [2]): A host-side driver used for generating
KV requestsin user space and directly issuing them to the emulatedSSDwithout passing through a conventionalI/O software stack. This setup allows for directKV-SSDinteraction. - Implementation: The authors ported
PinK's publicly available source code [31] into this testbed and modified/re-engineered its data structures to implement bothAnyKeyand . The source code for their implementation is available on GitHub.
- Flash Emulator (FEMU [45]): A publicly available and widely used emulator that accurately models
- Device Configuration:
  - SSD Capacity: 64GB by default (the paper also evaluates a 4TB SSD with 4GB DRAM for scalability in its Section 5.9; 64GB is the default for most experiments).
  - Architecture: Eight channels, each with eight flash chips (similar to PinK's setup).
  - DRAM Capacity: 64MB by default (approximately 0.1% of total SSD capacity, a common ratio [25, 39]); a sensitivity study varies this from 32MB to 96MB.
  - Page Size: 8KB by default; a sensitivity study varies this from 4KB to 16KB.
  - Flash Type & Latencies: A modern TLC flash is assumed, with latencies based on [34]: read times of (56.5, 77.5, 106) µs, write times of (0.8, 2.2, 5.7) ms, and an erase time of 3 ms.
- Workload Configuration:
  - Write Ratio: All tested workloads use a 20% write ratio (20% of operations are writes, 80% are reads).
  - Key Distribution: By default, a Zipfian distribution (with the paper's default skew parameter) is used for key accesses, simulating a skewed pattern in which a few keys are accessed far more frequently than others; a sensitivity study varies the skew parameter. A sketch of such a generator appears after this list.
  - I/O Queue Depth: Set to 64, allowing all 64 underlying flash chips to service I/O requests concurrently and maximizing parallelism.
- Motivation for Emulation: The paper notes the lack of publicly available KV applications that directly interact with KV-SSDs (they typically generate block I/Os). Therefore, KV-SSD research commonly relies on KV generators (such as the FIO engine with uNVMe) to control key/value sizes, read/write ratios, and access distributions.
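The following is a minimal sketch of how such a skewed key-access stream could be generated; the key count, request count, and skew value are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

def zipfian_key_stream(num_keys: int, skew: float, num_requests: int,
                       seed: int = 42) -> np.ndarray:
    """Sample key indices with probability proportional to 1 / rank**skew,
    so a handful of hot keys receive most of the accesses."""
    rng = np.random.default_rng(seed)
    ranks = np.arange(1, num_keys + 1)
    probs = 1.0 / np.power(ranks, skew)
    probs /= probs.sum()
    return rng.choice(num_keys, size=num_requests, p=probs)

# Illustrative parameters: 1M keys, skew 0.99, 100K requests
# (the 20% write ratio would be applied separately when issuing requests).
keys = zipfian_key_stream(num_keys=1_000_000, skew=0.99, num_requests=100_000)
```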
5.5. Execution Procedure
The experiments follow a two-stage process to ensure stable and representative performance data.
- Warm-up Stage:
  - All KV pairs for the target workload are first inserted into the emulated SSD.
  - GC and compaction operations are triggered frequently, filling the SSD and bringing the system into a steady operational state, akin to a long-running production environment.
- Execution Stage (Data Collection):
  - After the warm-up stage, performance data (e.g., latencies, IOPS) are collected.
  - This stage continues until the total size of issued KV requests reaches twice the full SSD capacity (e.g., 128GB for a 64GB SSD).
  - The paper notes that this entire process can take 4-13 hours depending on the workload, underscoring the thoroughness of the evaluation. A conceptual sketch of the procedure follows this list.
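A conceptual sketch of this two-stage procedure is shown below; `workload`, `kv_put`, and `kv_get` are hypothetical placeholders for the uNVMe-based request path, and only the control flow mirrors the description above.

```python
import time

SSD_CAPACITY = 64 * 2**30            # default device size in the evaluation
TARGET_TRAFFIC = 2 * SSD_CAPACITY    # execution stage ends after 2x capacity is issued

def run_benchmark(workload, kv_put, kv_get):
    """Warm up the device, then collect latencies until 2x the SSD capacity is issued."""
    # Warm-up stage: insert all KV pairs so GC/compaction reach a steady state.
    for key, value in workload.all_pairs():
        kv_put(key, value)

    # Execution stage: ~20% writes, Zipfian key accesses; record per-request latency.
    issued_bytes, latencies = 0, []
    for op, key, value in workload.requests():
        start = time.perf_counter()
        kv_put(key, value) if op == "write" else kv_get(key)
        latencies.append(time.perf_counter() - start)
        issued_bytes += len(key) + (len(value) if value else 0)
        if issued_bytes >= TARGET_TRAFFIC:
            break
    return latencies
```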
6. Results & Analysis
The experimental evaluation compares PinK, AnyKey, and AnyKey+ across various workloads and configurations, highlighting AnyKey's effectiveness, especially for low-v/k workloads, and AnyKey+'s overall superior performance.
6.1. Tail Latencies and IOPS
The following figure (Figure 10 from the original paper) shows the Cumulative Distribution Function (CDF) of percentile read latency for representative workloads under the three systems.

Figure 10. The CDF of percentile read latencies for the representative workloads under the three systems.
- Analysis of Figure 10 (Tail Latencies):
  - low-v/k Workloads (Figures 10a-e: RTDATA, Crypto1, ZippyDB, Cache15, Cache): PinK exhibits significantly longer tail latencies (e.g., up to 10ms for Crypto1 and 100ms for RTDATA), validating the paper's initial hypothesis that PinK performs poorly under these workloads. AnyKey and AnyKey+ drastically reduce tail latencies for these workloads; for instance, Crypto1's tail latency drops from ~10ms under PinK to less than 1ms under AnyKey/AnyKey+. This is because AnyKey's metadata (level lists and hash lists) is designed to fit entirely in DRAM, eliminating extra flash accesses for metadata lookups.
  - high-v/k Workloads (Figures 10f-g: W-PinK, KVSSD): AnyKey and AnyKey+ show tail latencies comparable to PinK. This indicates that AnyKey's design, while optimized for low-v/k workloads, does not hurt performance for high-v/k workloads where PinK already performs well, which is important for AnyKey to be a "universal" solution.

The following figure (Figure 11 from the original paper) provides a deeper analysis of the metadata size and its impact on AnyKey and AnyKey+ for low-v/k workloads.
Figure 11. An analysis on the metadata size and its impact on AnyKey and AnyKey+.
- Analysis of Figure 11 (Metadata Size and Flash Accesses):
  - Figure 11a (Metadata Size): For low-v/k workloads, PinK's metadata (especially meta segments) grows significantly, exceeding the 64MB DRAM limit and requiring storage in flash. In contrast, AnyKey's level lists are much smaller, and its hash lists utilize the remaining DRAM space, so the total metadata size consistently fits within DRAM.
  - Figure 11b (Flash Accesses for Reads): PinK often requires 4-7 flash accesses per read for low-v/k workloads (due to metadata in flash). AnyKey and AnyKey+ limit flash accesses to at most 2 in most cases (one for the KV entity in the data segment group and potentially one for the value in the value log). This direct correlation explains the tail latency improvements.

The following figure (Figure 12 from the original paper) shows the IOPS achieved by the three systems.
Figure 12. The IOPS achieved by the three systems.
- Analysis of Figure 12 (IOPS):
  - low-v/k Workloads: AnyKey achieves a substantial 3.15x improvement in IOPS on average compared to PinK. This gain is attributed to:
    - Reduced flash accesses for read requests (as seen in Figure 11b).
    - Decreased compaction overhead (discussed in the next section).
  - high-v/k Workloads: AnyKey's IOPS are mixed, sometimes outperforming PinK and sometimes not. However, AnyKey+ consistently outperforms both PinK and AnyKey for these workloads, demonstrating the effectiveness of AnyKey+'s modified log-triggered compaction in preventing compaction chains that can degrade performance under workloads with larger values. Specifically, AnyKey+ achieves 15% higher IOPS than PinK for high-v/k workloads.
6.2. Analysis of Compaction and GC
The following are the results from Table 3 of the original paper:
| Workload | Compared systems | Compaction reads | Compaction writes | GC reads | GC writes |
|---|---|---|---|---|---|
| Crypto1 (low-v/k) | PinK | 85M | 88M | 340M | 0 |
| | AnyKey | 49M | 26M | 0 | 0 |
| | AnyKey+ | 49M | 26M | 0 | 0 |
| Cache (low-v/k) | PinK | 41M | 45M | 339M | 0 |
| | AnyKey | 64M | 20M | 0 | 0 |
| | AnyKey+ | 48M | 15M | 0 | 0 |
| W-PinK (high-v/k) | PinK | 3M | 9M | 27M | 0 |
| | AnyKey | 25M | 22M | 0 | 0 |
| | AnyKey+ | 20M | 6M | 26K | 26K |
| KVSSD (high-v/k) | PinK | 0.2M | 10M | 15M | 0 |
| | AnyKey | 48M | 50M | 0 | 0 |
| | AnyKey+ | 6M | 9M | 0 | 0 |
Table 3. The analysis on compaction and GC operations.
- Analysis of Table 3 (Compaction and GC Operations):
  - GC Operations (Reads/Writes): AnyKey and AnyKey+ consistently show zero or very few GC reads/writes across all workloads, which is a significant advantage. The reason is that data segment groups belonging to the same LSM-tree level tend to be co-located in flash blocks and are relocated together during compaction. Consequently, victim blocks often become entirely invalid, allowing GC to erase them immediately without the costly process of reading and rewriting valid data. PinK, in contrast, often incurs millions to hundreds of millions of GC reads.
  - Compaction for low-v/k Workloads (e.g., Crypto1, Cache): PinK generates a high number of compaction reads and writes (e.g., 85M reads and 88M writes for Crypto1), primarily because its large meta segments (containing keys) are stored in flash and must be read and rewritten during compaction. AnyKey (and AnyKey+) significantly reduces compaction writes (e.g., to 26M for Crypto1) by eliminating flash-resident meta segments and separating values into the value log. While AnyKey sometimes incurs more compaction reads than PinK (e.g., for Cache), its total write amplification (especially when GC is included) is much lower, as the sketch after this list illustrates.
  - Compaction for high-v/k Workloads (e.g., W-PinK, KVSSD): AnyKey can experience increased compaction overheads (e.g., 25M reads and 22M writes for W-PinK) compared to PinK (3M reads, 9M writes). This is where compaction chains (log-triggered compactions leading to tree-triggered ones) occur, as large values are merged from the value log into data segment groups. AnyKey+ effectively addresses this: for W-PinK, it reduces compaction writes to 6M (from 22M in AnyKey) and reads to 20M (from 25M); for KVSSD, it reduces compaction reads from 48M (AnyKey) to 6M and writes from 50M (AnyKey) to 9M. This demonstrates AnyKey+'s success in preventing compaction chains and optimizing LSM-tree management for high-v/k workloads.
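To connect Table 3 with the lifetime results in the next subsection, the short sketch below (using the Crypto1 page-write counts from the table) totals the background writes each system incurs; lower totals mean less write amplification and less wear.

```python
# Compaction writes + GC writes for Crypto1 (low-v/k), taken from Table 3.
background_writes = {
    "PinK":    88_000_000 + 0,
    "AnyKey":  26_000_000 + 0,
    "AnyKey+": 26_000_000 + 0,
}

for system, writes in background_writes.items():
    share = writes / background_writes["PinK"]
    print(f"{system}: {writes:,} background page writes ({share:.0%} of PinK)")
```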
6.3. Device Lifetime and Storage Utilization
The following figure (Figure 13 from the original paper) shows the total number of page writes experienced.

Figure 13. The total number of page writes experienced.
- Analysis of Figure 13 (Total Page Writes / Device Lifetime):
  - This figure directly reflects the cumulative write amplification from both compaction and GC operations.
  - AnyKey+ reduces the total number of page writes by 50% on average compared to PinK. This is due to its efficient GC (no relocation of valid data) and optimized compaction process (avoiding compaction chains and value movement).
  - A lower number of total page writes directly translates to a longer device lifetime for the SSD.

The following figure (Figure 14 from the original paper) shows the storage utilization of the three systems.
Figure 14. The storage utilizations of the three systems.
- Analysis of Figure 14 (Storage Utilization):
  - low-v/k Workloads: AnyKey and AnyKey+ achieve significantly higher storage utilization than PinK. Under low-v/k workloads, PinK is forced to store large meta segments (containing keys) in flash due to DRAM overflow, so it effectively keeps two copies of key information in flash (one in meta segments, one alongside values in data segments), lowering utilization.
  - AnyKey's Advantage: By eliminating flash-resident meta segments and keeping metadata DRAM-resident, AnyKey and AnyKey+ maintain only one copy of each unique key's information in flash, maximizing storage utilization.
6.4. Sensitivity Analysis
6.4.1. Varying DRAM Sizes
The following figure (Figure 15 from the original paper) shows the read latencies under varying DRAM sizes.

Figure 15. The read latencies under varying DRAM sizes.
- Analysis of Figure 15 (Read Latencies under Varying DRAM Sizes):
  - low-v/k Workloads (Crypto1, ETC): When the DRAM capacity decreases from 96MB to 32MB, the tail latencies for Crypto1 and ETC (whose metadata is large) increase, because even the reduced metadata can no longer be fully accommodated in the smaller DRAM, potentially forcing some hash lists or level list entries to be accessed from flash.
  - high-v/k Workloads (W-PinK): For W-PinK, whose metadata is inherently small, changing the DRAM size from 96MB to 32MB has minimal impact on tail latencies; a small (32MB) DRAM is sufficient to hold its metadata.
  - Conclusion: This validates that AnyKey's performance hinges on keeping metadata in DRAM; if DRAM becomes too constrained for low-v/k workloads, performance can degrade.
6.4.2. Varying Flash Page Sizes
The following figure (Figure 16 from the original paper) shows the read latencies under varying flash page sizes.

Figure 16. The read latencies under varying flash page sizes.
- Analysis of Figure 16 (Read Latencies under Varying Flash Page Sizes):
  - As the flash page size increases (from 4KB to 16KB), tail latencies generally decrease for AnyKey.
  - Reasoning: Assuming a fixed total SSD capacity, a larger page size means fewer total flash pages, which in turn implies fewer data segment groups in AnyKey. Fewer data segment groups result in fewer level list entries (i.e., a smaller metadata size). A smaller metadata footprint further strengthens AnyKey's ability to keep all metadata in DRAM, reducing flash accesses and improving latency. A back-of-the-envelope sketch of this scaling follows the list.
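A rough back-of-the-envelope sketch of this reasoning; the pages-per-group value is an illustrative assumption, not the paper's configuration.

```python
SSD_BYTES = 64 * 2**30   # capacity is held fixed in this sensitivity study

def num_data_segment_groups(page_bytes: int, pages_per_group: int = 4) -> int:
    """One level-list entry per data segment group: fewer, larger pages -> fewer groups."""
    total_pages = SSD_BYTES // page_bytes
    return total_pages // pages_per_group

for page_kb in (4, 8, 16):
    groups = num_data_segment_groups(page_kb * 1024)
    print(f"{page_kb}KB pages -> ~{groups:,} data segment groups (level-list entries)")
```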
6.5. Different Key Distributions
The following figure (Figure 17 from the original paper) illustrates the latency of ETC under varying key distributions.

Figure 17. The latencies of ETC under varying key distributions.
- Analysis of Figure 17 (Latencies under Varying Key Distributions):
  - The Zipfian distribution's skew parameter was varied (a lower value means keys are more evenly distributed, while a higher value means a few keys are accessed very frequently).
  - PinK's Behavior: As the key distribution becomes more even (lower skew), PinK's tail latencies increase, because a more even distribution directs more requests to KV pairs in the lower LSM-tree levels, whose metadata (meta segments) is more likely to be stored in flash (as seen in Section 3.2).
  - AnyKey's Advantage: AnyKey and AnyKey+ exhibit more uniform performance across different key distributions, because they reduce the size of metadata for all LSM-tree levels, ensuring that metadata for keys in lower levels can also be found in DRAM. This consistently low metadata access latency translates to stable tail latencies regardless of how skewed the key distribution is.
6.6. Range Queries
The following figure (Figure 18 from the original paper) shows the latencies of UDB under varying scan lengths.

Figure 18. The latencies of UDB under varying scan lengths.
- Analysis of Figure 18 (Latencies for Range Queries):
  - The experiment used a scan-centric workload based on UDB, varying the scan length (the number of consecutive keys retrieved).
  - AnyKey's Advantage: AnyKey and AnyKey+ show increasing benefits as the scan length increases. This is because AnyKey stores sets of consecutive keys together within the flash pages of a data segment group; when a range query requests multiple consecutive keys, many sub-requests can be serviced from the same (or very few) flash pages and data segment groups, enabling efficient burst reads.
  - PinK's Disadvantage: In PinK, KV pairs are logically sorted in meta segments in DRAM, but their corresponding values in data segments can be scattered across different flash pages, so a range query may need to access a substantial number of disparate flash pages, increasing latency.
  - Small Range Queries: For small range queries, AnyKey's performance is comparable to PinK because it can directly locate target KV pairs using the location information stored in the first page of the data segment group, avoiding sorting overhead.
6.7. The Size of The Value Log
The following figure (Figure 19 from the original paper) shows the impact of varying the sizes of the value log.

Figure 19. The Impact of varying the sizes of the value log.
- Analysis of Figure 19 (Impact of Value Log Size):
  - The value log size was varied from 3.2GB to 9.6GB (5% to 15% of the 64GB SSD).
  - Figure 19a (IOPS):
    - Small Values (ZippyDB): For workloads with relatively small values (like ZippyDB, a low-v/k workload), a small value log (e.g., 3.2GB) is sufficient and already achieves high IOPS; increasing the value log size beyond this point does not improve IOPS significantly. A small value log for such workloads rarely becomes full and thus rarely triggers costly log-triggered compactions.
    - Larger Values (UDB, ETC): As value sizes increase (UDB and ETC are low-v/k but have relatively larger values than ZippyDB), increasing the value log size leads to slightly higher IOPS, since a larger value log reduces the frequency of log-triggered compactions and hence compaction overheads.
  - Figure 19b (Total Page Writes): Similar patterns are observed for the number of page writes. Workloads with small values see little change as the value log grows, while those with larger values show a reduction in page writes with a larger value log, reflecting fewer log-triggered compactions.
  - Comparison with AnyKey- (without a value log): The paper also briefly compares AnyKey+ with a hypothetical AnyKey- system that omits the value log.
    - As the write ratio increases, AnyKey- suffers a significant drop in IOPS, while AnyKey+ maintains high IOPS. This highlights the crucial role of the value log in managing write-intensive workloads by deferring value movement.
    - AnyKey+ may have slightly higher tail latencies than AnyKey- (due to a potential second flash access for values in the value log), but it remains comparable to PinK and offers much better IOPS.
6.8. Design Scalability
The paper asserts that AnyKey's design is applicable to any SSD capacity and that larger SSD capacities yield even greater benefits under low-v/k workloads.
- Example: For a 4TB SSD running Crypto1 (a low-v/k workload), PinK's metadata could swell to 25.2GB, requiring extensive flash accesses. In contrast, AnyKey would limit its metadata to approximately 3.65GB, which would still comfortably fit into a proportionally larger DRAM (e.g., 4GB of DRAM for a 4TB SSD). This demonstrates that PinK's metadata-overflow problem scales with SSD size for low-v/k workloads, whereas AnyKey's solution scales effectively to keep metadata DRAM-resident. A quick numeric check follows.
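Using the figures reported for the 4TB scenario, a quick check (a sketch, keeping the ~0.1% DRAM-to-capacity ratio) of whether each design's metadata fits in DRAM:

```python
capacity_gb = 4 * 1024                 # 4TB SSD
dram_gb = capacity_gb * 0.001          # ~4GB DRAM at the ~0.1% ratio
metadata_gb = {"PinK": 25.2, "AnyKey": 3.65}   # reported metadata footprints under Crypto1

for system, size_gb in metadata_gb.items():
    verdict = "fits in" if size_gb <= dram_gb else "overflows"
    print(f"{system}: {size_gb} GB metadata {verdict} {dram_gb:.1f} GB DRAM")
```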
6.9. Multi-Workload Execution Scenarios
To assess applicability in shared environments, the paper considers multi-workload scenarios using partitioning.
- Setup: A scenario with two co-located workloads (W-PinK and ZippyDB) was evaluated on a 64GB SSD divided into two equal partitions, each managed independently by either PinK or AnyKey.
- Results: Compared to PinK, AnyKey improved the percentile tail latencies of W-PinK by 14% and of ZippyDB by 216%.
- Conclusion: AnyKey's benefits can be realized in multi-tenant environments by partitioning the SSD and running AnyKey on each partition, especially benefiting low-v/k workloads like ZippyDB.
7. Conclusion & Reflections
7.1. Conclusion Summary
The paper successfully identifies a critical blind spot in existing Key-Value SSD (KV-SSD) designs: their sub-optimal performance under low value-to-key ratio (low-v/k) workloads, where key sizes are relatively large. This performance bottleneck was directly linked to the excessive growth of metadata in state-of-the-art designs like PinK, forcing metadata into slower flash memory and incurring prohibitive tail latencies and reduced IOPS.
To address this, the authors propose AnyKey, a novel KV-SSD architecture. AnyKey's core contribution is its innovative metadata management, which organizes KV pairs into data segment groups and maintains metadata only at the group level, using hash values for efficient intra-group indexing. This design ensures that metadata remains compact and entirely DRAM-resident, eliminating flash accesses for metadata lookups. Furthermore, AnyKey introduces a value log to separate values from keys, significantly improving the efficiency of LSM-tree compaction by reducing write amplification.
The enhanced version, AnyKey+, further refines the LSM-tree management by modifying log-triggered compaction to prevent "compaction chains," which can degrade performance under high-v/k workloads. Extensive evaluation using diverse real-world workloads demonstrates that AnyKey dramatically improves tail latencies and IOPS for low-v/k scenarios, while AnyKey+ extends this superior performance to all workload types, outperforming the state-of-the-art across the board in IOPS, device lifetime (reduced page writes), and storage utilization.
7.2. Limitations & Future Work
The paper doesn't explicitly dedicate a section to "Limitations and Future Work," but some can be inferred from the design choices and discussions:
- Computational Overhead for Hash Generation and Sorting: While the paper states that the xxHash generation and merge-sort operations for data segment groups are marginal (79ns and 118µs, respectively, on a Cortex-A53 core), these still add computational load to the SSD controller. For extremely latency-sensitive applications or very small KV pairs where flash access time is minimal, this overhead, however small, might become more noticeable. Future work could explore hardware accelerators for hashing or sorting within the SSD controller.
- Range Query Efficiency for Small Ranges: While AnyKey improves range query performance for longer scans, its performance for very small range queries is only "comparable" to PinK. The overhead of parsing the location information stored in the first page of a data segment group might still be present; further optimizations specific to very short range queries could be explored.
- Fixed Value Log Sizing: The current implementation reserves half of the remaining SSD capacity for the value log. Although a sensitivity study on value log size is performed, a dynamic and adaptive sizing mechanism based on real-time write patterns and value sizes could further optimize storage use and compaction frequency.
- Hash Collision Handling: Although hash collisions are rare (0.075%) and their impact is deemed negligible, the current mechanism (2 hash collision bits per page and potentially reading an adjacent page) is a simple heuristic. For very high-density data segment groups or specific hash functions, more sophisticated yet lightweight collision resolution strategies might be beneficial.
- Real Hardware Validation: The evaluation is based on an emulation environment (FEMU). While FEMU is widely used and accurate, validation on real KV-SSD hardware would provide definitive proof of performance in a production setting.
- Complexity of LSM-tree Management: While AnyKey+ addresses compaction chains, LSM-tree management remains complex. Further advancements in LSM-tree compaction algorithms, perhaps leveraging AI/ML or more dynamic resource allocation within the SSD controller, could yield additional benefits.
7.3. Personal Insights & Critique
The AnyKey paper makes a highly relevant and practical contribution to the KV-SSD landscape. My personal insights and critique are as follows:
- Novelty and Importance: The most significant aspect of AnyKey is its identification and systematic treatment of the low-v/k workload problem. This is a crucial "aha!" moment, as much of KV-SSD research seemed implicitly biased towards high-v/k scenarios. By showing that metadata scalability is the bottleneck, the paper fills a critical gap, making KV-SSDs truly viable for a broader range of real-world applications (e.g., IoT devices, blockchain, specific database indexing where keys are complex identifiers). The paper's re-evaluation of PinK under low-v/k conditions provides compelling evidence for this overlooked problem.
- Elegance of Solution: AnyKey's solution is elegant in its simplicity and effectiveness. Grouping KV pairs and abstracting metadata to the group level is a straightforward idea that yields large reductions in metadata footprint. The separation of keys and values into data segment groups and a value log is a well-established technique (WiscKey [49]), but its integration and refinement within the KV-SSD context, especially with the enhancements for compaction chains, is highly effective.
- Applicability and Transferability: The principles of metadata reduction and value-key separation are not unique to KV-SSDs. These ideas could be adapted to other in-storage computing paradigms or even host-based KV stores that struggle with large metadata structures on NVMe SSDs. For instance, learned indexes that store model parameters as metadata could benefit from similar group-level indexing if their metadata size becomes an issue.
- Potential for Further Optimization:
  - Dynamic Page/Group Sizing: The paper mentions that the number of pages in a data segment group can be configured. An adaptive mechanism that dynamically adjusts group size based on workload characteristics (e.g., key/value size distributions, read/write patterns) or SSD resources could further fine-tune performance.
  - Tiered Value Log: For very large SSDs, the value log itself could be tiered (e.g., hot values in a faster, smaller region; cold values in a slower, larger region) to optimize value access latency, similar to how LSM-trees tier SSTables.
  - Heterogeneous flash memory: With the advent of different flash technologies (e.g., QLC, PLC), AnyKey could potentially allocate metadata and hot data segments to faster/more durable flash types, and colder data/values to denser/cheaper flash.
  - DRAM-less/Hybrid KV-SSDs: The design relies heavily on DRAM. For DRAM-less SSDs or those with limited DRAM, a hybrid metadata approach (e.g., critical metadata in on-chip SRAM, less critical metadata in flash with a very smart caching layer) would be an interesting extension.
- Critique on DRAM usage for hash lists: While hash lists help avoid flash accesses, they consume DRAM. The paper mentions that Bloom filters were considered but rejected due to computational overhead. Re-evaluating this choice with newer SSD controller capabilities (e.g., specialized AI/ML accelerators on-chip) might reveal new trade-offs; Bloom filters could potentially offer a probabilistic way to save DRAM while still significantly reducing unnecessary flash accesses.
- Unverified Assumptions: The claim that the impact of hash collisions is negligible relies on a measured low frequency (0.075%). While plausible, this could be sensitive to the choice of hash function and key distribution. Further analysis of the robustness of this assumption under extreme cases might be valuable.
- Impact on Standardization: KV-SSDs are still an evolving standard. AnyKey's insights could influence future KV-SSD API designs, particularly in how metadata is exposed or optimized for different workload types.

Overall, AnyKey is a robust and well-thought-out contribution that broadens the applicability of KV-SSDs, making them a more compelling solution for diverse modern data storage challenges. Its rigorous evaluation and detailed analysis of performance bottlenecks are commendable.