Terrace: A Hierarchical Graph Container for Skewed Dynamic Graphs
TL;DR Summary
Terrace introduces a hierarchical graph container for skewed dynamic graphs that adapts each vertex's neighbor storage to its degree, enabling efficient batch updates. It outperforms Aspen in insertion throughput for small-to-medium batches and rivals Ligra in query performance.
Abstract
Terrace: A Hierarchical Graph Container for Skewed Dynamic Graphs

Authors: Prashant Pandey (ppandey@berkeley.edu, Lawrence Berkeley National Lab and University of California Berkeley), Brian Wheatman (wheatman@cs.jhu.edu, Johns Hopkins University), Helen Xu (hjxu@mit.edu, Massachusetts Institute of Technology), Aydin Buluc (abuluc@lbl.gov, Lawrence Berkeley National Lab and University of California Berkeley)

ABSTRACT: Various applications model problems as streaming graphs, which need to quickly apply a stream of updates and run algorithms on the updated graph. Furthermore, many dynamic real-world graphs, such as social networks, follow a skewed distribution of vertex degrees, where there are a few high-degree vertices and many low-degree vertices. Existing static graph-processing systems optimized for graph skewness achieve high performance and low space usage by preprocessing a cache-efficient graph partitioning based on vertex degree. In the streaming setting, however, the whole graph is not available upfront, so finding an optimal partitioning is not feasible in the presence of updates. As a result, existing streaming graph-processing systems take a "one-size-fits-all" approach, leaving performance on the table.
In-depth Reading
1. Bibliographic Information
- Title: Terrace: A Hierarchical Graph Container for Skewed Dynamic Graphs
- Authors: Prashant Pandey (Lawrence Berkeley National Lab and University of California Berkeley), Brian Wheatman (Johns Hopkins University), Helen Xu (Massachusetts Institute of Technology), and Aydin Buluc (Lawrence Berkeley National Lab and University of California Berkeley).
- Journal/Conference: Proceedings of the 2021 International Conference on Management of Data (SIGMOD '21). SIGMOD is a premier, highly-selective international forum for researchers, practitioners, and users in database management, data science, and big data. Publication in SIGMOD signifies a high-impact contribution to the field.
- Publication Year: 2021
- Abstract: The paper addresses the challenge of processing streaming graphs, which are graphs that evolve over time through updates. Many real-world graphs, like social networks, exhibit a skewed degree distribution, meaning they have a few vertices with a very high number of connections (high-degree) and many vertices with few connections (low-degree). Existing static graph systems can optimize for this skew, but dynamic systems often use a "one-size-fits-all" data structure, which is inefficient. The authors present Terrace, a system for streaming graphs that uses a hierarchical data structure. Terrace stores a vertex's neighbors in different data structures based on its degree, allowing it to dynamically adapt to graph skewness. Experiments show that Terrace achieves faster batch insertions than the state-of-the-art streaming system Aspen for many batch sizes. For graph query algorithms, Terrace is reported to be 1.7x–2.6x faster than Aspen and performs comparably to Ligra, a leading static graph processing system.
- Original Source Link: The paper was formally published in SIGMOD '21. The provided link is /files/papers/68f5ae85778c7766460eccc1/paper.pdf.
2. Executive Summary
- Background & Motivation (Why):
- Core Problem: Modern applications from social media to cybersecurity increasingly rely on analyzing graphs that change rapidly (dynamic or streaming graphs). Efficiently handling both a continuous stream of updates (edge insertions/deletions) and running complex analytical queries on the latest graph version is a major challenge.
- Identified Gap: Real-world graphs are often "skewed," with a power-law degree distribution. While static (non-changing) graph processing systems exploit this skew to improve performance, existing dynamic systems typically use a single, uniform data structure for all vertices, regardless of their degree. This "one-size-fits-all" approach misses significant optimization opportunities, especially concerning cache performance. Accessing neighbors of low-degree vertices is unnecessarily slow, while the chosen structure might not be ideal for updating high-degree vertices.
- Innovation: The paper introduces a fresh approach by proposing a hierarchical, degree-aware data structure. The core idea is that the best way to store a vertex's neighbors depends on how many neighbors it has. Instead of one structure, Terrace uses a multi-level system, dynamically choosing the most appropriate structure for each vertex as its degree changes.
- Main Contributions / Findings (What):
- A Novel Hierarchical Data Structure Design: The paper proposes a three-level data structure for dynamic graphs that partitions vertices by degree to optimize for both update speed and query locality.
- Implementation of Terrace: The authors built a C++ graph processing system named Terrace based on this design. It uses in-place arrays for low-degree vertices, a shared Packed Memory Array (PMA) for medium-degree vertices, and individual B-trees for high-degree vertices.
- Superior Performance: Extensive experiments demonstrate that Terrace outperforms the state-of-the-art dynamic system Aspen in update throughput for small-to-medium batch sizes and is significantly faster (1.7x–2.6x) for analytical queries. Crucially, its query performance is competitive with, and sometimes better than, Ligra, a highly optimized static graph processing system. This shows that Terrace effectively bridges the performance gap between static and dynamic systems.
3. Prerequisite Knowledge & Related Work
- Foundational Concepts:
- Dynamic (Streaming) Graphs: Unlike static graphs, which are analyzed once, dynamic graphs continuously evolve. Systems processing them must efficiently handle a stream of updates, such as adding or removing edges and vertices, while remaining ready to execute queries.
- Skewed Degree Distribution: A common property of real-world networks where the vast majority of vertices have very few connections, while a small number of "hub" vertices have an enormous number of connections. This follows a power-law distribution. This skewness is both a challenge (high-degree vertices can be bottlenecks) and an opportunity for optimization.
- Graph Representation: How a graph is stored in memory. A common static format is Compressed Sparse Row (CSR), which stores all edges in a single massive array, offering excellent scan performance but being very difficult to update. The paper contrasts this with dynamic structures.
- External-Memory Model: A theoretical model of computation that analyzes algorithms in terms of memory transfers between a small, fast cache and a large, slow main memory. Operations are measured in cache misses, and algorithms that minimize these transfers (by accessing data that is close together in memory, i.e., has good locality) are faster in practice.
- B-tree: A self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. It is optimized for systems that read and write large blocks of data, making it a good fit for the external-memory model and for storing neighbors of high-degree vertices that require frequent updates.
- Packed Memory Array (PMA): An array-based data structure that maintains elements in sorted order while leaving empty spaces (gaps) to facilitate fast insertions and deletions. It keeps elements contiguous for fast scanning (good locality) while providing better update performance than a simple sorted array. The paper uses it for medium-degree vertices; a minimal sketch follows this list.
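To make the PMA idea concrete, here is a minimal, illustrative C++ sketch, not Terrace's implementation: a sorted array with interleaved gaps, where an insertion shifts elements only up to the nearest gap and the whole array is respread when it fills. The class name `SimplePMA`, the doubling policy, and the uniform respread are simplifying assumptions; real PMAs maintain density thresholds over a hierarchy of windows.

```cpp
#include <cstdint>
#include <vector>

class SimplePMA {
  static constexpr uint64_t EMPTY = UINT64_MAX;  // sentinel for a gap
  std::vector<uint64_t> slots_ = std::vector<uint64_t>(16, EMPTY);

 public:
  void insert(uint64_t key) {
    // 1. Find the first occupied slot holding a value >= key.
    size_t ge = slots_.size();
    for (size_t i = 0; i < slots_.size(); ++i)
      if (slots_[i] != EMPTY && slots_[i] >= key) { ge = i; break; }
    // 2. If the slot just before it is a gap, place the key there.
    if (ge > 0 && slots_[ge - 1] == EMPTY) { slots_[ge - 1] = key; return; }
    // 3. Otherwise shift occupied slots right into the nearest gap.
    size_t gap = ge;
    while (gap < slots_.size() && slots_[gap] != EMPTY) ++gap;
    if (gap == slots_.size()) {  // no gap available: grow and respread
      respread(2 * slots_.size());
      insert(key);
      return;
    }
    for (size_t i = gap; i > ge; --i) slots_[i] = slots_[i - 1];
    slots_[ge] = key;
  }

  // Scanning is a single linear pass over contiguous memory -- the
  // cache-friendly property that motivates the PMA for medium degrees.
  template <typename F>
  void for_each(F f) const {
    for (uint64_t v : slots_)
      if (v != EMPTY) f(v);
  }

 private:
  void respread(size_t new_size) {
    std::vector<uint64_t> live;
    for (uint64_t v : slots_)
      if (v != EMPTY) live.push_back(v);
    slots_.assign(new_size, EMPTY);
    size_t stride = new_size / live.size();  // spread gaps uniformly
    for (size_t i = 0; i < live.size(); ++i) slots_[i * stride] = live[i];
  }
};
```

Terrace's PMA additionally packs the neighbor lists of many medium-degree vertices side by side in one shared array, keeping each vertex's neighbors contiguous.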
- Previous Works:
- Static Graph Processing Systems (Ligra, PowerLyra): These systems are designed for graphs that do not change. They perform extensive preprocessing to optimize the graph layout for cache locality and fast queries. For example, Ligra uses CSR, which is highly efficient for traversals but impractical for updates. PowerLyra explicitly partitions vertices based on degree to optimize computations. These systems set a high bar for query performance but cannot handle dynamic updates.
- Dynamic Graph Processing Systems (Stinger, Aspen): These systems are built to handle updates. Stinger uses a blocked adjacency list, while Aspen (the main competitor in this paper) uses a tree-based structure (a C-tree, a compressed purely functional search tree) for everything. Aspen stores the entire graph in a hierarchy of trees, providing good theoretical update guarantees. However, this "one-size-fits-all" tree-based approach leads to poor cache locality, as even accessing the neighbors of a low-degree vertex requires multiple pointer-chasing operations, resulting in more cache misses and slower queries compared to static systems.
- Technological Evolution: The field has seen a split between systems optimized for static analysis (fast queries, slow or no updates) and systems for dynamic updates (fast updates, slower queries). The challenge is to get the "best of both worlds": the query speed of static systems with the update flexibility of dynamic ones.
- Differentiation: Terrace's key innovation is its hybrid, hierarchical approach. Unlike Ligra, it is fully dynamic. Unlike Aspen, it does not use a single data structure for all vertices. By using different, specialized data structures for different degree ranges, it tailors the storage to the specific needs of each vertex, achieving both good update performance (via B-trees for hubs) and excellent query locality (via in-place storage and PMAs for the vast majority of vertices).
4. Methodology (Core Technology & Implementation)
- Principles: The design of Terrace is guided by three core principles derived from analyzing the trade-offs in graph data structures:
  - Balancing Locality and Updatability: Array-based structures offer superior locality and scan speed (data is contiguous in memory), while tree-based structures offer better asymptotic performance for updates (insertions/deletions). Terrace acknowledges that neither is universally superior; the best choice depends on the number of elements (i.e., the vertex degree).
  - Separating Vertices by Degree: For low-degree vertices, the overhead of pointer chasing to an external data structure is significant. Sharing a large data structure among all vertices can also create bottlenecks, where updating a high-degree vertex slows down the entire system. Therefore, high-degree vertices should be isolated in their own dedicated structures.
  - One Size Does Not Fit All: This principle synthesizes the previous two. Terrace rejects the monolithic design of previous dynamic systems and instead adopts a flexible, multi-level hierarchy.
- Steps & Procedures: The Three-Level Hierarchy. Terrace stores a vertex's neighbors across a three-level hierarchy, determined by degree-based cutoffs. Let S be the maximum number of neighbors stored in-place and L be the maximum degree for a vertex to use the shared array structure. (A code sketch of the resulting layout follows the three levels below.)
  - Level 1: In-place Sorted Array (for degrees ≤ S)
    - Structure: For every vertex in the graph, there is a vertex block. This block contains metadata (like the vertex's degree) and a small, fixed-size array to store its first S neighbors.
    - Purpose: The vast majority of vertices in skewed graphs have very low degrees. By storing their neighbors "in-place" within the vertex block itself, Terrace avoids an extra memory lookup (pointer-chasing). If a graph traversal accesses such a vertex, its neighbors are likely already in the same cache line, dramatically reducing cache misses. The paper sets S = 13 for unweighted graphs to fit a vertex block into a single 64-byte cache line.
  - Level 2: Shared Packed Memory Array (PMA) (for degrees between S and L)
    - Structure: A single, global PMA is shared among all "medium-degree" vertices. If a vertex has more than S neighbors, its neighbors beyond the first S (which are still in-place) are stored in this PMA.
    - Purpose: The PMA provides a good compromise between the excellent locality of an array and the need for efficient updates. It keeps the neighbors of a given vertex contiguous, enabling fast scans. By limiting the maximum degree of vertices in the PMA (to L = 2^10 = 1024 in the paper), the cost of updates to the shared structure remains bounded.
  - Level 3: Individual B-trees (for degrees > L)
    - Structure: Each "high-degree" vertex (degree > L) is allocated its own private B-tree. The vertex block for such a vertex contains a pointer to the root of this B-tree. The B-tree stores all neighbors beyond the first S (which remain in-place).
    - Purpose: High-degree vertices are frequently updated and can become a performance bottleneck. B-trees offer efficient logarithmic-time updates. By giving each high-degree vertex its own B-tree, updates to one vertex do not interfere with any other. This isolates the cost of managing these large neighbor lists.
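Putting the three levels together, the following C++ sketch shows a plausible vertex-block layout and the degree-based routing logic. The struct and its field names are a simplified reconstruction for illustration, not Terrace's source; the cutoffs S = 13 and L = 1024 follow the defaults reported in the paper for unweighted graphs.

```cpp
#include <cstdint>

constexpr uint32_t kInPlace = 13;    // S: neighbors stored inside the block
constexpr uint32_t kPMAMax  = 1024;  // L: largest degree served by the PMA

struct BTreeNode;  // private per-vertex B-tree (level 3), elided here

// One vertex block. For unweighted graphs, 4 + 13*4 + 8 = 64 bytes, so the
// block (and thus a low-degree vertex's entire neighbor list) fits in a
// single 64-byte cache line.
struct VertexBlock {
  uint32_t degree;                 // metadata: current degree
  uint32_t inplace[kInPlace];      // level 1: first S neighbors, sorted
  BTreeNode* btree;                // level 3 root, or nullptr
};

enum class Overflow { None, SharedPMA, PrivateBTree };

// Route a vertex's neighbors beyond the first S based on its degree.
inline Overflow overflow_level(const VertexBlock& v) {
  if (v.degree <= kInPlace) return Overflow::None;
  if (v.degree <= kPMAMax)  return Overflow::SharedPMA;
  return Overflow::PrivateBTree;
}
```

A traversal scans the in-place array first and continues into the PMA region or the B-tree only when the degree demands it.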
The figure below illustrates the overall architecture. A primary array of Vertex IDs points to vertex-specific data. Low-degree neighbors are stored directly, while higher-degree neighbors are stored in separate data structures.

Figure 1 (schematic): the high-level design of the graph storage format. A vertex structure tracks where each vertex's neighbors (nghs) are stored, alongside the structures that hold the edges themselves.
The next figure shows a concrete example from the paper. Vertex 0 (degree 2) only uses the in-place level. Vertices 1 and 3 (degree 5) use both the in-place level and the shared PMA. Vertex 2 (degree 10) is a high-degree vertex and uses the in-place level and its own private B-tree.

Figure (example input): a directed graph with 10 vertices; colors indicate the storage level each vertex falls into.

Figure (example layout): the multi-level neighbor storage design, where the neighbors of vertices of different degrees live in different data structures, adapting dynamically to graph skew.
- Implementation Details:
  - Batch Updates: Updates are performed in phases. A batch of incoming edges is first sorted. Then, for each affected vertex, the new neighbors are merged with the existing neighbors. The updated neighbor list is then redistributed across the three levels according to the vertex's new degree. If a vertex's degree crosses a threshold (e.g., from medium to high), its data is migrated from the PMA to a new B-tree (see the sketch after this list).
  - Multi-threading: Terrace uses locks to enable concurrent updates. Since the in-place and B-tree levels are private to each vertex, multiple threads can update different vertices without contention. The shared PMA level requires more sophisticated locking to handle concurrent updates safely.
  - API: Terrace implements the VertexSubset/EdgeMap API from Ligra, allowing many existing high-level graph algorithms to run on Terrace with minimal changes.
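As referenced above, here is a hedged C++ sketch of the batch-insertion phases. The helpers (`degree_of`, `merge_into_vertex`, `migrate_pma_to_btree`) are hypothetical stand-ins for Terrace internals; only the phase structure (sort the batch, merge each vertex's run of new neighbors, migrate on a threshold crossing) follows the description above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Edge { uint32_t src, dst; };

// Hypothetical stand-ins for Terrace internals.
uint32_t degree_of(uint32_t v);
void merge_into_vertex(uint32_t v, const std::vector<uint32_t>& nghs);
void migrate_pma_to_btree(uint32_t v);  // level 2 -> level 3 promotion
constexpr uint32_t kPMAMax = 1024;      // L, the PMA degree cutoff

void batch_insert(std::vector<Edge>& batch) {
  // Phase 1: sort so that edges sharing a source vertex are adjacent and
  // each vertex's new neighbors arrive in sorted order.
  std::sort(batch.begin(), batch.end(), [](const Edge& a, const Edge& b) {
    return a.src != b.src ? a.src < b.src : a.dst < b.dst;
  });

  // Phase 2: merge each vertex's run of new neighbors. This loop is
  // parallelizable across vertices, since levels 1 and 3 are private.
  size_t i = 0;
  while (i < batch.size()) {
    size_t j = i;
    while (j < batch.size() && batch[j].src == batch[i].src) ++j;

    uint32_t v = batch[i].src;
    std::vector<uint32_t> nghs;
    for (size_t k = i; k < j; ++k) nghs.push_back(batch[k].dst);

    bool was_medium = degree_of(v) <= kPMAMax;
    merge_into_vertex(v, nghs);  // redistributes across the levels

    // Phase 3: if the vertex crossed the medium/high cutoff, move its
    // overflow neighbors from the shared PMA into a private B-tree.
    if (was_medium && degree_of(v) > kPMAMax) migrate_pma_to_btree(v);
    i = j;
  }
}
```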
5. Experimental Setup
- Datasets: The evaluation uses a mix of real-world and synthetic graphs to test performance under different conditions. All graphs were symmetrized for a fair comparison with Aspen. The table below (transcribed from Table 5 in the paper) summarizes the datasets.

| Dataset | Vertices | Edges | Avg. Degree |
|---|---|---|---|
| LiveJournal | 4,847,571 | 85,702,474 | 17.8 |
| Orkut | 3,072,627 | 234,370,166 | 76.2 |
| rMAT | 8,388,608 | 563,816,288 | 60.4 |
| Protein | 8,745,543 | 1,309,240,502 | 149.7 |
| Twitter | 61,578,415 | 2,405,026,092 | 39.1 |
- Evaluation Metrics:
  - Throughput (edges per second):
    - Conceptual Definition: This metric measures the speed of updates. It is calculated as the total number of edges inserted or deleted in a batch divided by the time taken to complete the operation. Higher throughput is better. It is a key metric for evaluating the performance of a dynamic graph system.
    - Formula: $\text{Throughput} = \dfrac{|\text{batch}|}{t_{\text{apply}}}$, where $|\text{batch}|$ is the number of edges in the batch and $t_{\text{apply}}$ is the time (in seconds) to apply it. (A worked example follows this list.)
  - Normalized Running Time:
    - Conceptual Definition: This metric is used to compare the query performance of different systems. The running time of a specific algorithm (e.g., BFS) on a system (e.g., Terrace or Aspen) is divided by the running time of the same algorithm on a baseline system (Ligra). A value less than 1.0 means the system is faster than the baseline, while a value greater than 1.0 means it is slower.
    - Formula: $\text{Normalized time} = \dfrac{T_{\text{system}}}{T_{\text{Ligra}}}$.
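For a worked example of the throughput metric, take Terrace's LJ insertion entry for a batch of $10^5$ edges from Table 6 below (the elapsed time here is back-computed from the reported throughput):

```latex
\text{Throughput} = \frac{\text{edges in batch}}{\text{time to apply batch}}
                  = \frac{10^5~\text{edges}}{\approx 2.07 \times 10^{-3}~\text{s}}
                  \approx 4.83 \times 10^{7}~\text{edges/s}
```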
- Baselines:
  - Ligra: A state-of-the-art static graph processing framework. It represents the "gold standard" for query performance, as its data structures are heavily optimized for fast reads. It cannot, however, handle dynamic updates.
  - Aspen: A state-of-the-art streaming/dynamic graph processing framework. It uses a purely functional, tree-based data structure (C-trees) for the entire graph. It is the primary competitor for Terrace as it is designed for the same problem domain.
- Graph Kernels: The evaluation uses a standard set of graph algorithms (kernels) to measure query performance, as transcribed from Table 4 of the paper.

| Graph kernel | Input | Output |
|---|---|---|
| Breadth-first search (BFS) | Source vertex | Array of size \|V\| |
| PageRank (PR) | — | Array of size \|V\| |
| Connected components (CC) | — | Array of size \|V\| |
| Triangle counting (TC) | — | Number of triangles |
| Betweenness centrality (BC) | Source vertex | Array of size \|V\| |
| Single-source shortest paths (SSSP) | Source vertex | Array of size \|V\| |
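Because Terrace implements Ligra's VertexSubset/EdgeMap abstraction (Section 4), these kernels can be written once and run on either system. The C++ sketch below shows BFS in that style; the `EdgeMap` signature and the graph's `map_neighbors` method are simplified stand-ins, not Terrace's actual interface.

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-in for the Ligra-style frontier abstraction.
using VertexSubset = std::vector<uint32_t>;

// Apply update(src, dst) over every out-edge of the frontier; return the
// destinations for which update returned true as the next frontier.
template <typename Graph, typename F>
VertexSubset EdgeMap(const Graph& g, const VertexSubset& frontier, F update) {
  VertexSubset next;
  for (uint32_t src : frontier)
    g.map_neighbors(src, [&](uint32_t dst) {  // scans in-place, PMA, B-tree
      if (update(src, dst)) next.push_back(dst);
    });
  return next;
}

// BFS expressed against the abstraction: each round expands the frontier.
template <typename Graph>
std::vector<int64_t> bfs(const Graph& g, uint32_t source, size_t n) {
  std::vector<int64_t> parent(n, -1);
  parent[source] = source;
  VertexSubset frontier{source};
  while (!frontier.empty()) {
    frontier = EdgeMap(g, frontier, [&](uint32_t src, uint32_t dst) {
      if (parent[dst] != -1) return false;  // already visited
      parent[dst] = src;                    // claim dst for this round
      return true;
    });
  }
  return parent;
}
```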
6. Results & Analysis
- Core Results:

1. Update Throughput: The experiments measure the rate of edge insertions and deletions for various batch sizes.
Figure: batch-insert throughput of Aspen and Terrace on the LJ and Orkut graphs across batch sizes.

The chart and the data (transcribed from Table 6) show that Terrace has significantly higher insertion throughput for small to medium batch sizes (up to 1M edges), outperforming Aspen by up to 3x. For very large batches (10M edges), Aspen's tree-merging approach becomes more efficient and it pulls ahead. The authors argue that the smaller batch sizes are more representative of many real-world streaming scenarios. For deletions, Aspen generally performs better, which the authors attribute to Terrace's deletion logic not being as optimized as its insertion logic.

Manual transcription of Table 6: update throughput (edges per second). T/A is the ratio of Terrace's throughput to Aspen's (e.g., 3.93E5 / 1.25E5 ≈ 3.14 in the first row).

Insertions:

| Batch Size | LJ: Terrace | LJ: Aspen | LJ: T/A | Orkut: Terrace | Orkut: Aspen | Orkut: T/A |
|---|---|---|---|---|---|---|
| 1E1 | 3.93E5 | 1.25E5 | 3.14 | 2.11E5 | 7.28E4 | 1.75 |
| 1E2 | 1.11E6 | 7.11E5 | 1.56 | 8.12E5 | 4.32E5 | 1.11 |
| 1E3 | 5.48E6 | 2.77E6 | 1.98 | 3.25E6 | 1.97E6 | 1.23 |
| 1E4 | 1.96E7 | 6.56E6 | 2.99 | 1.06E7 | 4.93E6 | 1.70 |
| 1E5 | 4.83E7 | 1.57E7 | 3.09 | 2.35E7 | 1.26E7 | 1.70 |
| 1E6 | 4.40E7 | 3.46E7 | 1.27 | 1.71E7 | 2.69E7 | 0.52 |
| 1E7 | 2.82E7 | 1.03E8 | 0.27 | 2.59E7 | 7.76E7 | 0.25 |

Deletions:

| Batch Size | LJ: Terrace | LJ: Aspen | LJ: T/A | Orkut: Terrace | Orkut: Aspen | Orkut: T/A |
|---|---|---|---|---|---|---|
| 1E1 | 1.42E6 | 1.31E5 | 10.86 | 7.49E5 | 1.28E5 | 5.86 |
| 1E2 | 2.41E6 | 7.62E5 | 3.16 | 1.37E6 | 7.55E5 | 1.82 |
| 1E3 | 4.72E6 | 2.98E6 | 1.59 | 1.97E6 | 2.83E6 | 0.69 |
| 1E4 | 5.55E6 | 7.38E6 | 0.75 | 2.52E6 | 7.05E6 | 0.36 |
| 1E5 | 8.68E6 | 1.61E7 | 0.54 | 3.62E6 | 1.46E7 | 0.25 |
| 1E6 | 9.23E6 | 3.43E7 | 0.27 | 4.36E6 | 3.32E7 | 0.13 |
| 1E7 | 6.61E6 | 1.05E8 | 0.06 | 4.62E6 | 1.05E8 | 0.04 |

2. Query Performance: The query performance results demonstrate the key strength of Terrace's design.
Figure: bar chart comparing the normalized running times of Ligra, Aspen, and Terrace across graph algorithms: breadth-first search (BFS), PageRank (PR), single-source betweenness centrality (BC), connected components (CC), single-source shortest paths (SSSP), and triangle counting (TC).

This summary chart shows that, on average, Terrace (purple bars) is significantly faster than Aspen (orange bars) for all common kernels (BFS, PR, BC, CC). More impressively, Terrace's performance is very close to Ligra (green bars), the static baseline. In some cases, like BFS, Terrace is even faster than Ligra. This is a major result, as it shows that Terrace largely eliminates the query performance penalty typically associated with dynamic graph systems. For algorithms not implemented in Aspen (SSSP, TC), Terrace is only moderately slower than Ligra.

The per-kernel performance charts reinforce this finding. For example, in Connected Components (CC) and Betweenness Centrality (BC), Terrace consistently outperforms Aspen and is often faster than or competitive with Ligra.
Figure 10: running time of the connected components (CC) algorithm, normalized to Ligra, comparing Ligra, Aspen, and Terrace across datasets (LJ, Orkut, rMAT, Protein, Twitter).

Figure 9: running time of the BC algorithm across graph datasets, normalized to Ligra; Terrace shows lower normalized running times than Aspen on most datasets.

- Ablations / Parameter Sensitivity: The paper includes experiments to validate the hierarchical design and study the sensitivity to its parameters.
1. Impact of Level Cutoffs: This experiment tests how changing the degree cutoffs (S for in-place and L for PMA) affects performance.

Figure: normalized running time of Terrace under different level-cutoff configurations, for four graph algorithms (BFS, PR, CC, BC). Configurations are labeled S-L, where S and L are the in-place and PMA cutoff values.

The results show that performance is not highly sensitive to the exact choice of cutoffs. While the default configuration (13-10) is generally the best, other reasonable choices perform similarly. This is an important practical result, as it means the system does not require extensive, dataset-specific tuning to perform well.

2. Importance of the Three-Level Hierarchy: This ablation study compares the full three-level Terrace (the bar normalized to 1 in the chart) against two-level variants: one with only the in-place and PMA levels (Inplace+PMA), and one with only the in-place and B-tree levels (Inplace+Btree).
Figure 12: normalized running time of Terrace under different hierarchy configurations (Inplace+PMA, Inplace+Btree, PMA+Btree) across four algorithms (BFS, PR, CC, BC).

The results clearly show that the three-level design is critical.
- The Inplace+Btree variant is much slower for algorithms like PR and CC. This is because it forces medium-degree vertices into B-trees, losing the superior scanning performance of the PMA.
- The Inplace+PMA variant (which is what Terrace effectively becomes when no vertex is high-degree, represented by the green bar) is faster than using B-trees alone, demonstrating the value of the PMA for medium-degree vertices. The full design (purple bar) is better still, confirming that using B-trees for high-degree vertices is beneficial.
- Overall, this experiment validates that each level of the hierarchy plays a distinct and important role.
7. Conclusion & Reflections
- Conclusion Summary: The paper successfully demonstrates that a "one-size-fits-all" approach is suboptimal for dynamic graph data structures, especially on skewed real-world graphs. The proposed system, Terrace, introduces a novel three-level hierarchical data structure that dynamically adapts to a vertex's degree. By using in-place arrays for low-degree vertices, a shared PMA for medium-degree vertices, and individual B-trees for high-degree vertices, Terrace achieves an excellent balance between update speed and query locality. The experimental results provide strong evidence for its effectiveness: Terrace offers both faster updates than the state-of-the-art dynamic system Aspen (for many workloads) and query performance that is competitive with the highly optimized static system Ligra.
- Limitations & Future Work:
  - Suboptimal Deletion Performance: The authors acknowledge that Terrace's batch deletion performance is not as strong as its insertion performance and lags behind Aspen. They identify this as an area for future engineering and optimization.
  - Concurrent Updates and Queries: The evaluation was performed in a phased manner (updates and queries were not run simultaneously). Analyzing performance in a truly concurrent setting, where queries execute while updates are happening, is a direction for future work.
  - Automatic Tuning: While the system is not overly sensitive to its cutoff parameters, automatically tuning them based on the observed degree distribution could yield further performance gains.
- Personal Insights & Critique:
  - Elegant and Practical Idea: The core idea of Terrace is both simple and powerful. The realization that different data structures excel at different scales, applied to vertex degrees, is an elegant solution to a well-known problem. Its strong empirical results suggest it is a very practical design.
  - Bridging a Critical Gap: Terrace makes a significant contribution by narrowing the long-standing performance gap between static and dynamic graph processing systems. This is crucial for applications that require both real-time updates and high-performance analytics.
  - Question of Generality: The design is explicitly tailored for skewed, power-law graphs. While this covers many important real-world cases (like social networks), its performance on graphs with different degree distributions (e.g., more uniform, as in road networks or meshes) might be less impressive. The Protein graph provides some insight, but further evaluation on non-skewed graphs would be interesting.
  - Complexity: The hierarchical design, while effective, introduces more complexity compared to a "one-size-fits-all" system like Aspen. Managing data migration between levels as vertex degrees change adds engineering overhead, as seen in the less-optimized deletion performance. However, the paper makes a strong case that this complexity is a worthwhile trade-off for the significant performance gains.