Terrace: A Hierarchical Graph Container for Skewed Dynamic Graphs
TL;DR Summary
Terrace introduces a hierarchical graph container for skewed dynamic graphs that adapts each vertex's neighbor storage to its degree, enabling efficient batch updates. It outperforms Aspen in insertion throughput for small-to-medium batches and rivals Ligra in query performance.
Abstract
Terrace: A Hierarchical Graph Container for Skewed Dynamic Graphs

Authors: Prashant Pandey (ppandey@berkeley.edu, Lawrence Berkeley National Lab and University of California Berkeley), Brian Wheatman (wheatman@cs.jhu.edu, Johns Hopkins University), Helen Xu (hjxu@mit.edu, Massachusetts Institute of Technology), Aydin Buluc (abuluc@lbl.gov, Lawrence Berkeley National Lab and University of California Berkeley)

ABSTRACT: Various applications model problems as streaming graphs, which need to quickly apply a stream of updates and run algorithms on the updated graph. Furthermore, many dynamic real-world graphs, such as social networks, follow a skewed distribution of vertex degrees, where there are a few high-degree vertices and many low-degree vertices. Existing static graph-processing systems optimized for graph skewness achieve high performance and low space usage by preprocessing a cache-efficient graph partitioning based on vertex degree. In the streaming setting, however, the whole graph is not available upfront, so finding an optimal partitioning is not feasible in the presence of updates. As a result, existing streaming graph-processing systems take a "one-size-fits-all" approach, leaving performance on the table.
In-depth Reading
1. Bibliographic Information
- Title: Terrace: A Hierarchical Graph Container for Skewed Dynamic Graphs
- Authors: Prashant Pandey (Lawrence Berkeley National Lab and University of California Berkeley), Brian Wheatman (Johns Hopkins University), Helen Xu (Massachusetts Institute of Technology), and Aydin Buluc (Lawrence Berkeley National Lab and University of California Berkeley).
- Journal/Conference: Proceedings of the 2021 International Conference on Management of Data (SIGMOD '21). SIGMOD is a premier, highly-selective international forum for researchers, practitioners, and users in database management, data science, and big data. Publication in SIGMOD signifies a high-impact contribution to the field.
- Publication Year: 2021
- Abstract: The paper addresses the challenge of processing streaming graphs, which are graphs that evolve over time through updates. Many real-world graphs, like social networks, exhibit a skewed degree distribution, meaning they have a few vertices with a very high number of connections (high-degree) and many vertices with few connections (low-degree). Existing static graph systems can optimize for this skew, but dynamic systems often use a "one-size-fits-all" data structure, which is inefficient. The authors present Terrace, a system for streaming graphs that uses a hierarchical data structure. Terrace stores a vertex's neighbors in different data structures based on its degree, allowing it to dynamically adapt to graph skewness. Experiments show that Terrace achieves faster batch insertions than the state-of-the-art streaming system Aspen for many batch sizes. For graph query algorithms, Terrace is reported to be 1.7x–2.6x faster than Aspen and performs comparably to Ligra, a leading static graph processing system.
- Original Source Link: The paper was formally published in SIGMOD '21. The provided link is /files/papers/68f5ae85778c7766460eccc1/paper.pdf.
2. Executive Summary
- Background & Motivation (Why):
- Core Problem: Modern applications from social media to cybersecurity increasingly rely on analyzing graphs that change rapidly (dynamic or streaming graphs). Efficiently handling both a continuous stream of updates (edge insertions/deletions) and running complex analytical queries on the latest graph version is a major challenge.
- Identified Gap: Real-world graphs are often "skewed," with a power-law degree distribution. While static (non-changing) graph processing systems exploit this skew to improve performance, existing dynamic systems typically use a single, uniform data structure for all vertices, regardless of their degree. This "one-size-fits-all" approach misses significant optimization opportunities, especially concerning cache performance. Accessing neighbors of low-degree vertices is unnecessarily slow, while the chosen structure might not be ideal for updating high-degree vertices.
- Innovation: The paper introduces a fresh approach by proposing a hierarchical, degree-aware data structure. The core idea is that the best way to store a vertex's neighbors depends on how many neighbors it has. Instead of one structure, Terrace uses a multi-level system, dynamically choosing the most appropriate structure for each vertex as its degree changes.
- Main Contributions / Findings (What):
- A Novel Hierarchical Data Structure Design: The paper proposes a three-level data structure for dynamic graphs that partitions vertices by degree to optimize for both update speed and query locality.
- Implementation of Terrace: The authors built a C++ graph processing system named Terrace based on this design. It uses in-place arrays for low-degree vertices, a shared Packed Memory Array (PMA) for medium-degree vertices, and individual B-trees for high-degree vertices.
- Superior Performance: Extensive experiments demonstrate that Terrace outperforms the state-of-the-art dynamic system Aspen in update throughput for small-to-medium batch sizes and is significantly faster (1.7x–2.6x) for analytical queries. Crucially, its query performance is competitive with, and sometimes better than, Ligra, a highly optimized static graph processing system. This shows that Terrace effectively bridges the performance gap between static and dynamic systems.
3. Prerequisite Knowledge & Related Work
- Foundational Concepts:
- Dynamic (Streaming) Graphs: Unlike static graphs, which are analyzed once, dynamic graphs continuously evolve. Systems processing them must efficiently handle a stream of updates, such as adding or removing edges and vertices, while remaining ready to execute queries.
- Skewed Degree Distribution: A common property of real-world networks where the vast majority of vertices have very few connections, while a small number of "hub" vertices have an enormous number of connections. This follows a power-law distribution. This skewness is both a challenge (high-degree vertices can be bottlenecks) and an opportunity for optimization.
- Graph Representation: How a graph is stored in memory. A common static format is Compressed Sparse Row (CSR), which stores all edges in a single massive array, offering excellent scan performance but being very difficult to update. The paper contrasts this with dynamic structures.
- External-Memory Model: A theoretical model of computation that analyzes algorithms in terms of memory transfers between a small, fast cache and a large, slow main memory. Operations are measured in cache misses, and algorithms that minimize these transfers (by accessing data that is close together in memory, i.e., has good locality) are faster in practice.
- B-tree: A self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. It is optimized for systems that read and write large blocks of data, making it a good fit for the external-memory model and for storing neighbors of high-degree vertices that require frequent updates.
- Packed Memory Array (PMA): An array-based data structure that maintains elements in sorted order while leaving empty spaces (gaps) to facilitate fast insertions and deletions. It keeps elements contiguous for fast scanning (good locality) while providing better update performance than a simple sorted array. The paper uses it for medium-degree vertices; a minimal sketch follows this list.
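To make the PMA idea concrete, here is a minimal, illustrative C++ sketch, not Terrace's implementation: a sorted array with interleaved gaps, where an insertion shifts elements only up to the nearest gap and the whole array is respread when it fills. The class name `SimplePMA`, the doubling policy, and the uniform respread are simplifying assumptions; real PMAs maintain density thresholds over a hierarchy of windows.

```cpp
#include <cstdint>
#include <vector>

class SimplePMA {
  static constexpr uint64_t EMPTY = UINT64_MAX;  // sentinel for a gap
  std::vector<uint64_t> slots_ = std::vector<uint64_t>(16, EMPTY);

 public:
  void insert(uint64_t key) {
    // 1. Find the first occupied slot holding a value >= key.
    size_t ge = slots_.size();
    for (size_t i = 0; i < slots_.size(); ++i)
      if (slots_[i] != EMPTY && slots_[i] >= key) { ge = i; break; }
    // 2. If the slot just before it is a gap, place the key there.
    if (ge > 0 && slots_[ge - 1] == EMPTY) { slots_[ge - 1] = key; return; }
    // 3. Otherwise shift occupied slots right into the nearest gap.
    size_t gap = ge;
    while (gap < slots_.size() && slots_[gap] != EMPTY) ++gap;
    if (gap == slots_.size()) {  // no gap available: grow and respread
      respread(2 * slots_.size());
      insert(key);
      return;
    }
    for (size_t i = gap; i > ge; --i) slots_[i] = slots_[i - 1];
    slots_[ge] = key;
  }

  // Scanning is a single linear pass over contiguous memory -- the
  // cache-friendly property that motivates the PMA for medium degrees.
  template <typename F>
  void for_each(F f) const {
    for (uint64_t v : slots_)
      if (v != EMPTY) f(v);
  }

 private:
  void respread(size_t new_size) {
    std::vector<uint64_t> live;
    for (uint64_t v : slots_)
      if (v != EMPTY) live.push_back(v);
    slots_.assign(new_size, EMPTY);
    size_t stride = new_size / live.size();  // spread gaps uniformly
    for (size_t i = 0; i < live.size(); ++i) slots_[i * stride] = live[i];
  }
};
```

Terrace's PMA additionally packs the neighbor lists of many medium-degree vertices side by side in one shared array, keeping each vertex's neighbors contiguous.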
- Previous Works:
- Static Graph Processing Systems (Ligra, PowerLyra): These systems are designed for graphs that do not change. They perform extensive preprocessing to optimize the graph layout for cache locality and fast queries. For example, Ligra uses CSR, which is highly efficient for traversals but impractical for updates. PowerLyra explicitly partitions vertices based on degree to optimize computations. These systems set a high bar for query performance but cannot handle dynamic updates.
- Dynamic Graph Processing Systems (Stinger, Aspen): These systems are built to handle updates. Stinger uses a blocked adjacency list, while Aspen (the main competitor in this paper) uses a tree-based structure (a C-tree, a compressed purely functional search tree) for everything. Aspen stores the entire graph in a hierarchy of trees, providing good theoretical update guarantees. However, this "one-size-fits-all" tree-based approach leads to poor cache locality, as even accessing the neighbors of a low-degree vertex requires multiple pointer-chasing operations, resulting in more cache misses and slower queries compared to static systems.
- Technological Evolution: The field has seen a split between systems optimized for static analysis (fast queries, slow or no updates) and systems for dynamic updates (fast updates, slower queries). The challenge is to get the "best of both worlds": the query speed of static systems with the update flexibility of dynamic ones.
- Differentiation: Terrace's key innovation is its hybrid, hierarchical approach. Unlike Ligra, it is fully dynamic. Unlike Aspen, it does not use a single data structure for all vertices. By using different, specialized data structures for different degree ranges, it tailors the storage to the specific needs of each vertex, achieving both good update performance (via B-trees for hubs) and excellent query locality (via in-place storage and PMAs for the vast majority of vertices).
4. Methodology (Core Technology & Implementation)
- Principles: The design of Terrace is guided by three core principles derived from analyzing the trade-offs in graph data structures:
  - Balancing Locality and Updatability: Array-based structures offer superior locality and scan speed (data is contiguous in memory), while tree-based structures offer better asymptotic performance for updates (insertions/deletions). Terrace acknowledges that neither is universally superior; the best choice depends on the number of elements (i.e., the vertex degree).
  - Separating Vertices by Degree: For low-degree vertices, the overhead of pointer chasing to an external data structure is significant. Sharing a large data structure among all vertices can also create bottlenecks, where updating a high-degree vertex slows down the entire system. Therefore, high-degree vertices should be isolated in their own dedicated structures.
  - One Size Does Not Fit All: This principle synthesizes the previous two. Terrace rejects the monolithic design of previous dynamic systems and instead adopts a flexible, multi-level hierarchy.
- Steps & Procedures: The Three-Level Hierarchy. Terrace stores a vertex's neighbors across a three-level hierarchy, determined by degree-based cutoffs. Let S be the maximum number of neighbors stored in-place and L be the maximum degree for a vertex to use the shared array structure. (A code sketch of the resulting layout follows the three levels below.)
  - Level 1: In-place Sorted Array (for degrees ≤ S)
    - Structure: For every vertex in the graph, there is a vertex block. This block contains metadata (like the vertex's degree) and a small, fixed-size array to store its first S neighbors.
    - Purpose: The vast majority of vertices in skewed graphs have very low degrees. By storing their neighbors "in-place" within the vertex block itself, Terrace avoids an extra memory lookup (pointer-chasing). If a graph traversal accesses such a vertex, its neighbors are likely already in the same cache line, dramatically reducing cache misses. The paper sets S = 13 for unweighted graphs to fit a vertex block into a single 64-byte cache line.
  - Level 2: Shared Packed Memory Array (PMA) (for degrees between S and L)
    - Structure: A single, global PMA is shared among all "medium-degree" vertices. If a vertex has more than S neighbors, its neighbors beyond the first S (which are still in-place) are stored in this PMA.
    - Purpose: The PMA provides a good compromise between the excellent locality of an array and the need for efficient updates. It keeps the neighbors of a given vertex contiguous, enabling fast scans. By limiting the maximum degree of vertices in the PMA (to L = 2^10 = 1024 in the paper), the cost of updates to the shared structure remains bounded.
  - Level 3: Individual B-trees (for degrees > L)
    - Structure: Each "high-degree" vertex (degree > L) is allocated its own private B-tree. The vertex block for such a vertex contains a pointer to the root of this B-tree. The B-tree stores all neighbors beyond the first S (which remain in-place).
    - Purpose: High-degree vertices are frequently updated and can become a performance bottleneck. B-trees offer efficient logarithmic-time updates. By giving each high-degree vertex its own B-tree, updates to one vertex do not interfere with any other. This isolates the cost of managing these large neighbor lists.
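Putting the three levels together, the following C++ sketch shows a plausible vertex-block layout and the degree-based routing logic. The struct and its field names are a simplified reconstruction for illustration, not Terrace's source; the cutoffs S = 13 and L = 1024 follow the defaults reported in the paper for unweighted graphs.

```cpp
#include <cstdint>

constexpr uint32_t kInPlace = 13;    // S: neighbors stored inside the block
constexpr uint32_t kPMAMax  = 1024;  // L: largest degree served by the PMA

struct BTreeNode;  // private per-vertex B-tree (level 3), elided here

// One vertex block. For unweighted graphs, 4 + 13*4 + 8 = 64 bytes, so the
// block (and thus a low-degree vertex's entire neighbor list) fits in a
// single 64-byte cache line.
struct VertexBlock {
  uint32_t degree;                 // metadata: current degree
  uint32_t inplace[kInPlace];      // level 1: first S neighbors, sorted
  BTreeNode* btree;                // level 3 root, or nullptr
};

enum class Overflow { None, SharedPMA, PrivateBTree };

// Route a vertex's neighbors beyond the first S based on its degree.
inline Overflow overflow_level(const VertexBlock& v) {
  if (v.degree <= kInPlace) return Overflow::None;
  if (v.degree <= kPMAMax)  return Overflow::SharedPMA;
  return Overflow::PrivateBTree;
}
```

A traversal scans the in-place array first and continues into the PMA region or the B-tree only when the degree demands it.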
The figure below illustrates the overall architecture. A primary array of Vertex IDs points to vertex-specific data. Low-degree neighbors are stored directly, while higher-degree neighbors are stored in separate data structures.

Figure 1 (schematic): the high-level design of the graph storage format. A vertex structure tracks where each vertex's neighbors (nghs) are stored, alongside the structures that hold the edges themselves.
The next figure shows a concrete example from the paper. Vertex 0 (degree 2) only uses the in-place level. Vertices 1 and 3 (degree 5) use both the in-place level and the shared PMA. Vertex 2 (degree 10) is a high-degree vertex and uses the in-place level and its own private B-tree.

Figure (example input): a directed graph with 10 vertices; colors indicate the storage level each vertex falls into.

Figure (example layout): the multi-level neighbor storage design, where the neighbors of vertices of different degrees live in different data structures, adapting dynamically to graph skew.
- Implementation Details:
  - Batch Updates: Updates are performed in phases. A batch of incoming edges is first sorted. Then, for each affected vertex, the new neighbors are merged with the existing neighbors. The updated neighbor list is then redistributed across the three levels according to the vertex's new degree. If a vertex's degree crosses a threshold (e.g., from medium to high), its data is migrated from the PMA to a new B-tree (see the sketch after this list).
  - Multi-threading: Terrace uses locks to enable concurrent updates. Since the in-place and B-tree levels are private to each vertex, multiple threads can update different vertices without contention. The shared PMA level requires more sophisticated locking to handle concurrent updates safely.
  - API: Terrace implements the VertexSubset/EdgeMap API from Ligra, allowing many existing high-level graph algorithms to run on Terrace with minimal changes.
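As referenced above, here is a hedged C++ sketch of the batch-insertion phases. The helpers (`degree_of`, `merge_into_vertex`, `migrate_pma_to_btree`) are hypothetical stand-ins for Terrace internals; only the phase structure (sort the batch, merge each vertex's run of new neighbors, migrate on a threshold crossing) follows the description above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Edge { uint32_t src, dst; };

// Hypothetical stand-ins for Terrace internals.
uint32_t degree_of(uint32_t v);
void merge_into_vertex(uint32_t v, const std::vector<uint32_t>& nghs);
void migrate_pma_to_btree(uint32_t v);  // level 2 -> level 3 promotion
constexpr uint32_t kPMAMax = 1024;      // L, the PMA degree cutoff

void batch_insert(std::vector<Edge>& batch) {
  // Phase 1: sort so that edges sharing a source vertex are adjacent and
  // each vertex's new neighbors arrive in sorted order.
  std::sort(batch.begin(), batch.end(), [](const Edge& a, const Edge& b) {
    return a.src != b.src ? a.src < b.src : a.dst < b.dst;
  });

  // Phase 2: merge each vertex's run of new neighbors. This loop is
  // parallelizable across vertices, since levels 1 and 3 are private.
  size_t i = 0;
  while (i < batch.size()) {
    size_t j = i;
    while (j < batch.size() && batch[j].src == batch[i].src) ++j;

    uint32_t v = batch[i].src;
    std::vector<uint32_t> nghs;
    for (size_t k = i; k < j; ++k) nghs.push_back(batch[k].dst);

    bool was_medium = degree_of(v) <= kPMAMax;
    merge_into_vertex(v, nghs);  // redistributes across the levels

    // Phase 3: if the vertex crossed the medium/high cutoff, move its
    // overflow neighbors from the shared PMA into a private B-tree.
    if (was_medium && degree_of(v) > kPMAMax) migrate_pma_to_btree(v);
    i = j;
  }
}
```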
5. Experimental Setup
- Datasets: The evaluation uses a mix of real-world and synthetic graphs to test performance under different conditions. All graphs were symmetrized for a fair comparison with Aspen. The table below (transcribed from Table 5 in the paper) summarizes the datasets.

| Dataset | Vertices | Edges | Avg. Degree |
|---|---|---|---|
| LiveJournal | 4,847,571 | 85,702,474 | 17.8 |
| Orkut | 3,072,627 | 234,370,166 | 76.2 |
| rMAT | 8,388,608 | 563,816,288 | 60.4 |
| Protein | 8,745,543 | 1,309,240,502 | 149.7 |
| Twitter | 61,578,415 | 2,405,026,092 | 39.1 |
- Evaluation Metrics:
  - Throughput (edges per second):
    - Conceptual Definition: This metric measures the speed of updates. It is calculated as the total number of edges inserted or deleted in a batch divided by the time taken to complete the operation. Higher throughput is better. It is a key metric for evaluating the performance of a dynamic graph system.
    - Formula: $\text{Throughput} = \dfrac{|\text{batch}|}{t_{\text{apply}}}$, where $|\text{batch}|$ is the number of edges in the batch and $t_{\text{apply}}$ is the time (in seconds) to apply it. (A worked example follows this list.)
  - Normalized Running Time:
    - Conceptual Definition: This metric is used to compare the query performance of different systems. The running time of a specific algorithm (e.g., BFS) on a system (e.g., Terrace or Aspen) is divided by the running time of the same algorithm on a baseline system (Ligra). A value less than 1.0 means the system is faster than the baseline, while a value greater than 1.0 means it is slower.
    - Formula: $\text{Normalized time} = \dfrac{T_{\text{system}}}{T_{\text{Ligra}}}$.
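For a worked example of the throughput metric, take Terrace's LJ insertion entry for a batch of $10^5$ edges from Table 6 below (the elapsed time here is back-computed from the reported throughput):

```latex
\text{Throughput} = \frac{\text{edges in batch}}{\text{time to apply batch}}
                  = \frac{10^5~\text{edges}}{\approx 2.07 \times 10^{-3}~\text{s}}
                  \approx 4.83 \times 10^{7}~\text{edges/s}
```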
- Baselines:
  - Ligra: A state-of-the-art static graph processing framework. It represents the "gold standard" for query performance, as its data structures are heavily optimized for fast reads. It cannot, however, handle dynamic updates.
  - Aspen: A state-of-the-art streaming/dynamic graph processing framework. It uses a purely functional, tree-based data structure (C-trees) for the entire graph. It is the primary competitor for Terrace as it is designed for the same problem domain.
- Graph Kernels: The evaluation uses a standard set of graph algorithms (kernels) to measure query performance, as transcribed from Table 4 of the paper.

| Graph kernel | Input | Output |
|---|---|---|
| Breadth-first search (BFS) | Source vertex | Array of size \|V\| |
| PageRank (PR) | — | Array of size \|V\| |
| Connected components (CC) | — | Array of size \|V\| |
| Triangle counting (TC) | — | Number of triangles |
| Betweenness centrality (BC) | Source vertex | Array of size \|V\| |
| Single-source shortest paths (SSSP) | Source vertex | Array of size \|V\| |
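Because Terrace implements Ligra's VertexSubset/EdgeMap abstraction (Section 4), these kernels can be written once and run on either system. The C++ sketch below shows BFS in that style; the `EdgeMap` signature and the graph's `map_neighbors` method are simplified stand-ins, not Terrace's actual interface.

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-in for the Ligra-style frontier abstraction.
using VertexSubset = std::vector<uint32_t>;

// Apply update(src, dst) over every out-edge of the frontier; return the
// destinations for which update returned true as the next frontier.
template <typename Graph, typename F>
VertexSubset EdgeMap(const Graph& g, const VertexSubset& frontier, F update) {
  VertexSubset next;
  for (uint32_t src : frontier)
    g.map_neighbors(src, [&](uint32_t dst) {  // scans in-place, PMA, B-tree
      if (update(src, dst)) next.push_back(dst);
    });
  return next;
}

// BFS expressed against the abstraction: each round expands the frontier.
template <typename Graph>
std::vector<int64_t> bfs(const Graph& g, uint32_t source, size_t n) {
  std::vector<int64_t> parent(n, -1);
  parent[source] = source;
  VertexSubset frontier{source};
  while (!frontier.empty()) {
    frontier = EdgeMap(g, frontier, [&](uint32_t src, uint32_t dst) {
      if (parent[dst] != -1) return false;  // already visited
      parent[dst] = src;                    // claim dst for this round
      return true;
    });
  }
  return parent;
}
```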
6. Results & Analysis
- Core Results:

1. Update Throughput: The experiments measure the rate of edge insertions and deletions for various batch sizes.
Figure: batch-insert throughput of Aspen and Terrace on the LJ and Orkut graphs across batch sizes.

The chart and the data (transcribed from Table 6) show that Terrace has significantly higher insertion throughput for small to medium batch sizes (up to 1M edges), outperforming Aspen by up to 3x. For very large batches (10M edges), Aspen's tree-merging approach becomes more efficient and it pulls ahead. The authors argue that the smaller batch sizes are more representative of many real-world streaming scenarios. For deletions, Aspen generally performs better, which the authors attribute to Terrace's deletion logic not being as optimized as its insertion logic.

Manual transcription of Table 6: update throughput (edges per second). T/A is the ratio of Terrace's throughput to Aspen's (e.g., 3.93E5 / 1.25E5 ≈ 3.14 in the first row).

Insertions:

| Batch Size | LJ: Terrace | LJ: Aspen | LJ: T/A | Orkut: Terrace | Orkut: Aspen | Orkut: T/A |
|---|---|---|---|---|---|---|
| 1E1 | 3.93E5 | 1.25E5 | 3.14 | 2.11E5 | 7.28E4 | 1.75 |
| 1E2 | 1.11E6 | 7.11E5 | 1.56 | 8.12E5 | 4.32E5 | 1.11 |
| 1E3 | 5.48E6 | 2.77E6 | 1.98 | 3.25E6 | 1.97E6 | 1.23 |
| 1E4 | 1.96E7 | 6.56E6 | 2.99 | 1.06E7 | 4.93E6 | 1.70 |
| 1E5 | 4.83E7 | 1.57E7 | 3.09 | 2.35E7 | 1.26E7 | 1.70 |
| 1E6 | 4.40E7 | 3.46E7 | 1.27 | 1.71E7 | 2.69E7 | 0.52 |
| 1E7 | 2.82E7 | 1.03E8 | 0.27 | 2.59E7 | 7.76E7 | 0.25 |

Deletions:

| Batch Size | LJ: Terrace | LJ: Aspen | LJ: T/A | Orkut: Terrace | Orkut: Aspen | Orkut: T/A |
|---|---|---|---|---|---|---|
| 1E1 | 1.42E6 | 1.31E5 | 10.86 | 7.49E5 | 1.28E5 | 5.86 |
| 1E2 | 2.41E6 | 7.62E5 | 3.16 | 1.37E6 | 7.55E5 | 1.82 |
| 1E3 | 4.72E6 | 2.98E6 | 1.59 | 1.97E6 | 2.83E6 | 0.69 |
| 1E4 | 5.55E6 | 7.38E6 | 0.75 | 2.52E6 | 7.05E6 | 0.36 |
| 1E5 | 8.68E6 | 1.61E7 | 0.54 | 3.62E6 | 1.46E7 | 0.25 |
| 1E6 | 9.23E6 | 3.43E7 | 0.27 | 4.36E6 | 3.32E7 | 0.13 |
| 1E7 | 6.61E6 | 1.05E8 | 0.06 | 4.62E6 | 1.05E8 | 0.04 |

2. Query Performance: The query performance results demonstrate the key strength of Terrace's design.
Figure: bar chart comparing the normalized running times of Ligra, Aspen, and Terrace across graph algorithms: breadth-first search (BFS), PageRank (PR), single-source betweenness centrality (BC), connected components (CC), single-source shortest paths (SSSP), and triangle counting (TC).

This summary chart shows that, on average, Terrace (purple bars) is significantly faster than Aspen (orange bars) for all common kernels (BFS, PR, BC, CC). More impressively, Terrace's performance is very close to Ligra (green bars), the static baseline. In some cases, like BFS, Terrace is even faster than Ligra. This is a major result, as it shows that Terrace largely eliminates the query performance penalty typically associated with dynamic graph systems. For algorithms not implemented in Aspen (SSSP, TC), Terrace is only moderately slower than Ligra.

The per-kernel performance charts reinforce this finding. For example, in Connected Components (CC) and Betweenness Centrality (BC), Terrace consistently outperforms Aspen and is often faster than or competitive with Ligra.
Figure 10: running time of the connected components (CC) algorithm, normalized to Ligra, comparing Ligra, Aspen, and Terrace across datasets (LJ, Orkut, rMAT, Protein, Twitter).

Figure 9: running time of the BC algorithm across graph datasets, normalized to Ligra; Terrace shows lower normalized running times than Aspen on most datasets.

- Ablations / Parameter Sensitivity: The paper includes experiments to validate the hierarchical design and study the sensitivity to its parameters.
1. Impact of Level Cutoffs: This experiment tests how changing the degree cutoffs (S for in-place and L for PMA) affects performance.

Figure: normalized running time of Terrace under different level-cutoff configurations, for four graph algorithms (BFS, PR, CC, BC). Configurations are labeled S-L, where S and L are the in-place and PMA cutoff values.

The results show that performance is not highly sensitive to the exact choice of cutoffs. While the default configuration (13-10) is generally the best, other reasonable choices perform similarly. This is an important practical result, as it means the system does not require extensive, dataset-specific tuning to perform well.

2. Importance of the Three-Level Hierarchy: This ablation study compares the full three-level Terrace (the bar normalized to 1 in the chart) against two-level variants: one with only the in-place and PMA levels (Inplace+PMA), and one with only the in-place and B-tree levels (Inplace+Btree).
Figure 12: normalized running time of Terrace under different hierarchy configurations (Inplace+PMA, Inplace+Btree, PMA+Btree) across four algorithms (BFS, PR, CC, BC).

The results clearly show that the three-level design is critical.
- The Inplace+Btree variant is much slower for algorithms like PR and CC. This is because it forces medium-degree vertices into B-trees, losing the superior scanning performance of the PMA.
- The Inplace+PMA variant (which is what Terrace effectively becomes when no vertex is high-degree, represented by the green bar) is faster than using B-trees alone, demonstrating the value of the PMA for medium-degree vertices. The full design (purple bar) is better still, confirming that using B-trees for high-degree vertices is beneficial.
- Overall, this experiment validates that each level of the hierarchy plays a distinct and important role.
7. Conclusion & Reflections
- Conclusion Summary: The paper successfully demonstrates that a "one-size-fits-all" approach is suboptimal for dynamic graph data structures, especially on skewed real-world graphs. The proposed system, Terrace, introduces a novel three-level hierarchical data structure that dynamically adapts to a vertex's degree. By using in-place arrays for low-degree vertices, a shared PMA for medium-degree vertices, and individual B-trees for high-degree vertices, Terrace achieves an excellent balance between update speed and query locality. The experimental results provide strong evidence for its effectiveness: Terrace offers both faster updates than the state-of-the-art dynamic system Aspen (for many workloads) and query performance that is competitive with the highly optimized static system Ligra.
- Limitations & Future Work:
  - Suboptimal Deletion Performance: The authors acknowledge that Terrace's batch deletion performance is not as strong as its insertion performance and lags behind Aspen. They identify this as an area for future engineering and optimization.
  - Concurrent Updates and Queries: The evaluation was performed in a phased manner (updates and queries were not run simultaneously). Analyzing performance in a truly concurrent setting, where queries execute while updates are happening, is a direction for future work.
  - Automatic Tuning: While the system is not overly sensitive to its cutoff parameters, automatically tuning them based on the observed degree distribution could yield further performance gains.
- Personal Insights & Critique:
  - Elegant and Practical Idea: The core idea of Terrace is both simple and powerful. The realization that different data structures excel at different scales, applied to vertex degrees, is an elegant solution to a well-known problem. Its strong empirical results suggest it is a very practical design.
  - Bridging a Critical Gap: Terrace makes a significant contribution by narrowing the long-standing performance gap between static and dynamic graph processing systems. This is crucial for applications that require both real-time updates and high-performance analytics.
  - Question of Generality: The design is explicitly tailored for skewed, power-law graphs. While this covers many important real-world cases (like social networks), its performance on graphs with different degree distributions (e.g., more uniform, as in road networks or meshes) might be less impressive. The Protein graph provides some insight, but further evaluation on non-skewed graphs would be interesting.
  - Complexity: The hierarchical design, while effective, introduces more complexity compared to a "one-size-fits-all" system like Aspen. Managing data migration between levels as vertex degrees change adds engineering overhead, as seen in the less-optimized deletion performance. However, the paper makes a strong case that this complexity is a worthwhile trade-off for the significant performance gains.