
Chameleon

Published:01/01/1999
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

Chameleon provides real RISC-V SoC power traces obfuscated with four hiding techniques, enabling precise segmentation and attack analysis. It offers realistic data for evaluating side-channel countermeasures and exposes vulnerabilities in current hiding methods.

Abstract

IACR Transactions on Cryptographic Hardware and Embedded Systems, ISSN 2569-2925, Vol. 2025, No. 3, pp. 389–412. DOI: 10.46586/tches.v2025.i3.389-412

Chameleon: A Dataset for Segmenting and Attacking Obfuscated Power Traces in Side-Channel Analysis
Davide Galli, Giuseppe Chiari and Davide Zoni
DEIB, Politecnico di Milano, Milan, Italy, firstname.lastname@polimi.it

Abstract. Side-channel attacks exploit unintended information leakage emitted by cryptographic devices to extract sensitive data. Hiding techniques are a cost-effective countermeasure designed to obfuscate the side-channel leakage and hinder these attacks. Available open datasets rely on artificial models to simulate hiding effects, preventing a realistic assessment of these countermeasures and, thus, leaving a pressing need for datasets offering real-world, obfuscated side-channel measurements. Chameleon introduces the first comprehensive dataset of real-world, obfuscated power traces collected from a RISC-V-based System-on-Chip. The traces are obfuscated using four state-of-the-art hiding techniques: dynamic frequency scaling, random delay, morphing, and chaffing. Chameleon captures real leakage deformations int…


In-depth Reading

English Analysis

1. Bibliographic Information

  • Title: Chameleon: A Dataset for Segmenting and Attacking Obfuscated Power Traces in Side-Channel Analysis
  • Authors: Davide Galli, Giuseppe Chiari, and Davide Zoni
  • Affiliations: DEIB, Politecnico di Milano, Milan, Italy
  • Journal/Conference: IACR Transactions on Cryptographic Hardware and Embedded Systems (TCHES), Vol. 2025, No. 3, pp. 389–412, as stated in the header of the provided text.
  • Publication Year: 2025.
  • Abstract: The paper introduces Chameleon, the first comprehensive dataset of real-world, obfuscated power traces from a RISC-V-based System-on-Chip (SoC). The obfuscation is achieved using four state-of-the-art hiding countermeasures: dynamic frequency scaling, random delay, morphing, and chaffing. Unlike existing datasets that simulate hiding effects, Chameleon captures leakage deformations from actual hardware implementations, providing a realistic platform for evaluating countermeasures. A key feature is its dual focus on both the segmentation (locating cryptographic operations) and attack stages of side-channel analysis. It provides precise metadata for the start and end of each operation, facilitating research into segmentation, a critical but often overlooked step. The dataset enables researchers to develop and test new attacks, assess the vulnerabilities of current hiding techniques, and advance the field of side-channel evaluation.
  • Original Source Link: /files/papers/68fdc34ae75c708a06cc5942/paper.pdf. This is a local file path; the published version is identified by DOI 10.46586/tches.v2025.i3.389-412.

2. Executive Summary

  • Background & Motivation (Why):

    • Core Problem: Side-channel attacks (SCAs) pose a significant threat to IoT devices by exploiting unintended information leakage, like power consumption. While hiding countermeasures are designed to thwart these attacks by obfuscating the leakage, there is a lack of realistic public datasets to evaluate their effectiveness.
    • Existing Gaps: Prior datasets often rely on artificial data augmentation (e.g., simulated shifts or noise) to mimic hiding effects. This approach fails to capture the true physical deformations and noise introduced by real hardware countermeasures. Furthermore, existing datasets are almost exclusively focused on the attack stage, providing pre-aligned and pre-segmented traces, thereby ignoring the crucial and challenging preliminary step of segmenting (i.e., locating) the cryptographic operation within a noisy, continuous data stream.
    • Innovation: This paper introduces Chameleon, a dataset that addresses these gaps by providing a large collection of real-world power traces collected from a device that natively implements four popular hiding techniques. Its unique contribution is the high-quality metadata that precisely labels the start and end of each cryptographic operation, making it the first dataset designed to systematically benchmark both segmentation and attack methodologies in realistic, obfuscated scenarios.
  • Main Contributions / Findings (What):

    1. A Novel Dataset with Real-World Obfuscation: The paper presents the first power trace dataset collected from a real SoC that implements four hiding methods in hardware and software: Dynamic Frequency Scaling (DFS), Random Delay (RD), Morphing (MRP), and Chaffing (CHF). This provides a more authentic representation of countermeasure effects than simulated models.
    2. Dual-Focus on Segmentation and Attack: Chameleon is specifically designed to address both stages of the SCA pipeline. It provides raw, unaligned traces containing multiple cryptographic operations interleaved with general-purpose tasks, mimicking a real device's activity. Crucially, it includes precise time-sample metadata (pinpoints) for each operation, enabling rigorous research on segmentation algorithms.
    3. Benchmark for Segmentation Methodologies: The authors propose and validate a deep-learning-based segmentation methodology on Chameleon. They demonstrate that it is possible to achieve near-perfect segmentation even on traces obfuscated by complex, real-world countermeasures.
    4. Validation of Modern Attack Viability: The paper claims (though the full results are not in the provided text) that the dataset was validated through successful side-channel attacks. This demonstrates that modern deep-learning attacks can bypass these traditional hiding techniques, highlighting the need for more advanced security measures.

3. Prerequisite Knowledge & Related Work

  • Foundational Concepts:

    • Side-Channel Attack (SCA): An attack that exploits information leaked from a physical device during computation, such as power consumption, electromagnetic (EM) emissions, or timing, rather than attacking the cryptographic algorithm itself. By analyzing these "side channels," an attacker can infer secret data like cryptographic keys.
    • Power Trace: A recording of a device's power consumption over time, captured using an oscilloscope. Variations in the trace can correlate with the data being processed and the operations being performed.
    • Hiding Countermeasures: A class of defenses against SCAs that aim to obscure the correlation between the side-channel leakage and the secret-dependent operations. They do this by introducing noise or temporal desynchronization. The paper focuses on four types:
      • Dynamic Frequency Scaling (DFS): Randomly changing the device's clock frequency to stretch or compress parts of the power trace, misaligning leakage points.
      • Random Delay (RD): Inserting a random number of "do-nothing" instructions between actual program instructions to introduce timing jitter.
      • Morphing (MRP): Using multiple, functionally equivalent versions of an instruction or code block and randomly choosing which one to execute. Each version has a different power profile, confusing template-based attacks.
      • Chaffing (CHF): Executing multiple instances of a cryptographic operation concurrently, where only one uses the correct key (the "wheat") and the others use fake keys (the "chaff"). This floods the side-channel with misleading information.
    • Segmentation: The process of identifying and isolating the specific segment of a long, raw side-channel trace that corresponds to the execution of the target cryptographic operation (e.g., an AES encryption). This is a prerequisite for most SCAs.
    • Advanced Encryption Standard (AES): A widely used symmetric block cipher. Its internal operations, like SubBytes (which uses an S-Box), are common targets for SCAs.
    • RISC-V: An open-standard instruction set architecture (ISA) based on reduced instruction set computer (RISC) principles. It is gaining popularity in both academic research and commercial products, especially for custom SoCs.
    • System-on-Chip (SoC): An integrated circuit that combines all or most components of a computer or other electronic system into a single chip.
    • Convolutional Neural Network (CNN): A type of deep learning model particularly effective at finding patterns in spatial or sequential data, such as images or time-series data like power traces.
  • Previous Works: The paper reviews several existing public SCA datasets, highlighting their limitations:

    • DPAv4, CHES CTF, ASCADv1, ASCADv2, AES_HD, SMAesH, eShard: These datasets are valuable but primarily focus on masking countermeasures, not hiding. Most importantly, ASCADv1 introduces desynchronization artificially via scripts, not from a real hardware implementation.
    • Lack of Real Hiding Techniques: Only two datasets, AES_RD (implementing random delay) and DFS_DESYNCH (implementing DFS), feature a real hiding countermeasure. However, they are limited to a single technique each.
    • Focus on Attack Only: All reviewed datasets provide pre-aligned traces, where each trace contains exactly one cryptographic operation. This completely sidesteps the segmentation problem, which is a major hurdle in real-world attacks.
    • Artificial Scenarios: The traces typically contain only the cryptographic operation, isolated from other tasks. This does not reflect a real system where cryptographic workloads run alongside general-purpose applications, adding more noise and complexity.
  • Differentiation: Chameleon is the first dataset to simultaneously:

    1. Implement four different real-world hiding techniques on a single, modern platform (RISC-V SoC).
    2. Capture traces in a realistic setting with interleaved general-purpose tasks.
    3. Be explicitly designed for segmentation research, providing long, raw traces and precise ground-truth labels for the start and end of each cryptographic operation.

4. Methodology (Core Technology & Implementation)

The core methodology of this paper is the design and creation of the Chameleon dataset.

  • Acquisition Infrastructure (Section 3.1 & Figure 2): The setup is designed for high-fidelity, precisely labeled data acquisition.

    Figure 2: Block diagram of acquisition infrastructure. The diagram shows the oscilloscope (DSO), the host, and the FPGA board with its internal modules (the PINPOINTING UNIT and the COMPUTING PLATFORM), together with the data, control, and trigger signals exchanged between them.

    • Target Device: An AMD Artix-7 FPGA hosts a custom SoC. The SoC contains a 32-bit in-order RISC-V CPU, which has been modified to support the hardware-based hiding countermeasures.
    • Measurement: A shunt resistor is placed in the FPGA's power line. A PicoScope 5244D digital sampling oscilloscope (DSO) measures the voltage drop across this resistor to capture the FPGA's power consumption (W_FPGA). The sampling rate is 125 MSample/s.
    • Synchronization and Labeling: A custom hardware module, the pinpointing unit, is a key innovation. It is synchronized with the DSO via a shared trigger signal. The unit's operation is described in Algorithm 2.
      • It uses an internal counter synchronized with the system clock.
      • When the CPU begins the overall computation, a trigger signal starts both the DSO acquisition and the pinpointing unit's main loop.
      • The CPU asserts a reset signal at the exact moment the cryptographic operation (CO) begins. The pinpointing unit records the current counter value (the start sample).
      • When the CO finishes, the CPU de-asserts the reset signal, and the unit records the final counter value (the end sample).
      • This provides cycle-accurate start and end labels for every single CO within a long trace.
    • Frequency Recording: For the DFS dataset, an additional hardware module (described in Algorithm 3) records the exact sample at which every frequency change occurs and the new frequency value.
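    The pinpointing behavior described above can be mimicked in a few lines of Python. This is a hypothetical software model, not the authors' hardware (their Algorithm 2 and Algorithm 3 are not reproduced in this text): the function name pinpoint, the per-sample signals co_signal and freq_signal, and the choice of one counter tick per DSO sample are all assumptions. A CO start is logged on the rising edge of the CO signal, its (exclusive) end on the falling edge, and, for DFS, every clock-frequency change is logged as a (sample, new frequency) pair.

```python
def pinpoint(co_signal, freq_signal=None):
    """Hypothetical model of the pinpointing unit (assumption: one counter tick per sample).

    co_signal: per-sample 0/1 level, 1 while a cryptographic operation (CO) runs.
    freq_signal: optional per-sample clock frequency (DFS only).
    Returns (pinpoints, freq_log): [(start, end_exclusive), ...] and [(sample, freq), ...].
    """
    pinpoints, freq_log = [], []
    prev_level, start, prev_freq = 0, None, None
    for counter, level in enumerate(co_signal):
        if level and not prev_level:          # rising edge: CO begins
            start = counter
        elif prev_level and not level:        # falling edge: CO has just ended
            pinpoints.append((start, counter))
        prev_level = level
        if freq_signal is not None and freq_signal[counter] != prev_freq:
            freq_log.append((counter, freq_signal[counter]))  # frequency change event
            prev_freq = freq_signal[counter]
    return pinpoints, freq_log

# Toy 13-sample capture with two COs and one frequency change (values in MHz).
co = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
freq = [50] * 5 + [20] * 8
pinpoints, freq_log = pinpoint(co, freq)
print(pinpoints)  # [(3, 7), (9, 12)]
print(freq_log)   # [(0, 50), (5, 20)]
```

    In this sketch a CO that is still running when the capture ends is never closed; a real unit would need an explicit end-of-capture rule.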
  • Hiding Methods Implemented (Section 2.1): Chameleon provides traces for a baseline and four countermeasures. The choice of hardware vs. software implementation reflects realistic design trade-offs.

    • Dynamic Frequency Scaling (DFS): A hardware implementation. A True Random Number Generator (TRNG) continuously selects a new frequency from a pool of 760 options (5 MHz to 100 MHz), causing significant and unpredictable trace deformation.
    • Random Delay (RD): A hardware implementation. The RISC-V CPU pipeline is modified to insert up to 2 random (NOP-like) instructions between every pair of original program instructions, introducing fine-grained jitter.
    • Morphing (MRP): A software implementation. For AES, AddRoundKey is morphed by randomly choosing one of eight equivalent implementations of the XOR operation. SubBytes is morphed by using two different S-Boxes that are periodically refreshed with random masks.
    • Chaffing (CHF): A software implementation leveraging the FreeRTOS real-time operating system. It spawns three threads: one executing AES with the correct key and two "chaff" threads executing AES with fake keys. A randomized thread scheduler interleaves their execution, mixing leakage from correct and incorrect computations.
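    To make the timing effects of RD and DFS concrete, the toy model below stretches a synthetic per-cycle leakage sequence. It is an illustrative assumption, not the Chameleon hardware: RD is modeled as 0 to 2 zero-leakage dummy cycles after every instruction, and DFS as a new frequency drawn every 100 cycles from a hypothetical, evenly spaced 760-value pool between 5 and 100 MHz, with each cycle lasting 125/f samples at the 125 MSample/s rate used in the paper. The 100 MHz clock for the unprotected reference is also an assumption. MRP and CHF change what is executed rather than only when, so they are not modeled here.

```python
import random

def random_delay(cycles, max_dummies, rng):
    """Toy RD model (assumption): after every real instruction, insert between 0 and
    max_dummies dummy cycles whose leakage is modeled as 0.0."""
    out = []
    for leak in cycles:
        out.append(leak)
        out.extend([0.0] * rng.randint(0, max_dummies))
    return out

def dfs_resample(cycles, freqs_mhz, sample_rate_mhz, cycles_per_segment, rng):
    """Toy DFS model (assumption): every cycles_per_segment clock cycles a new clock
    frequency f is drawn; each cycle then lasts sample_rate_mhz / f oscilloscope samples."""
    samples, t, next_sample = [], 0.0, 0.0
    f = freqs_mhz[0]
    for i, leak in enumerate(cycles):
        if i % cycles_per_segment == 0:
            f = rng.choice(freqs_mhz)          # frequency change event
        t += sample_rate_mhz / f               # end time of this cycle, in samples
        while next_sample < t:                 # emit every sample falling inside the cycle
            samples.append(leak)
            next_sample += 1.0
    return samples

rng = random.Random(1)
aes = [rng.random() for _ in range(1000)]           # toy per-cycle leakage of one AES run
pool = [5.0 + 95.0 * k / 759 for k in range(760)]   # hypothetical evenly spaced 5-100 MHz pool

base_len = len(dfs_resample(aes, [100.0], 125.0, 100, rng))  # fixed 100 MHz, 125 MS/s
rd_lengths = [len(random_delay(aes, 2, rng)) for _ in range(5)]
dfs_lengths = [len(dfs_resample(aes, pool, 125.0, 100, rng)) for _ in range(5)]
print(base_len, rd_lengths, dfs_lengths)
```

    The unprotected run always yields 1,250 samples, whereas every RD or DFS run yields a different length, which is the desynchronization that later shows up as a nonzero standard deviation of the AES execution length.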
  • Dataset Structure (Section 3.2 & Figure 3): The dataset is organized into five subdatasets: BASE (no countermeasures), DFS, RD, MRP, and CHF.

    Figure 3: Dataset file system for each hiding method. Frequencies are available only for the DFS dataset.

    • data folder: Contains the raw power traces. Each trace is very long (134,217,550 samples) and contains multiple AES executions interleaved with other tasks.
    • metadata folder: Contains the ground-truth labels.
      • Ciphers: Plaintexts and keys for each AES execution.
      • Pinpoints: The start and end time samples for each AES execution, generated by the pinpointing unit.
      • Frequencies: For the DFS dataset only, this contains the time sample and value of every frequency change.
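    The on-disk file formats are not described in the text above, so the sketch below works on in-memory Python lists. It shows how the pinpoint metadata can be used to cut each AES execution out of a long trace and to derive per-trace statistics of the kind reported next. The helper names cut_cos and trace_stats are hypothetical, and treating the gaps between consecutive COs as general-purpose activity is an assumption.

```python
from statistics import mean

def cut_cos(trace, pinpoints):
    """Return one slice of the trace per (start, end_exclusive) pinpoint."""
    return [trace[start:end] for start, end in pinpoints]

def trace_stats(pinpoints):
    """Per-trace statistics; gaps between consecutive COs count as general-purpose activity."""
    co_lengths = [end - start for start, end in pinpoints]
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(pinpoints, pinpoints[1:])]
    return {
        "n_aes": len(pinpoints),
        "aes_len_mean": mean(co_lengths),
        "gp_len_mean": mean(gaps) if gaps else 0,
    }

trace = list(range(150))                      # stand-in for a 134,217,550-sample trace
pinpoints = [(10, 40), (60, 90), (100, 130)]  # stand-in for the Pinpoints metadata
segments = cut_cos(trace, pinpoints)
stats = trace_stats(pinpoints)
print([len(s) for s in segments], stats)
```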
  • Dataset Statistics (Section 3.3): The authors provide detailed statistics on the dataset.

    (Manual Transcription of Table 1)

    Metric (lengths and durations in samples)   BASE          DFS           RD            MRP           CHF
    Acquisition time                            3h 31m        43h 20m       4h 17m        4h 15m        5h 28m
    # Traces                                    256           256           512           512           1024
    # AES per trace (mean)                      821           898           536           575           262
    # AES per trace (std)                       6.89          7.84          7.47          3.87          2.53
    AES execution length (mean)                 148,192       135,621       220,120       195,599       398,274
    AES execution length (std)                  0.00          13,420.55     5,061.38      4,434.78      88,546.58
    General-purpose application length (mean)  15,276        13,695        30,096        37,789        112,878
    General-purpose application length (std)   36,578.06     33,403.41     76,200.47     36,529.91     32,683.79
    # Frequencies per trace (mean)              1             31,000        1             1             1
    # Frequencies per trace (std)               0             74.36         0             0             0
    Frequency duration (mean)                   134,217,550   4,330         134,217,550   134,217,550   134,217,550
    Frequency duration (std)                    0             10.37         0             0             0

    This table highlights the impact of the countermeasures. RD, MRP, and CHF lengthen the mean AES execution relative to BASE (148,192 samples), most strongly RD (220,120, about 1.5×) and CHF (398,274, about 2.7×), while under DFS the mean is slightly shorter (135,621). The standard deviation of the AES length is zero for BASE but large for DFS and especially CHF, confirming the desynchronization effect. The DFS traces contain on average 31,000 constant-frequency intervals per trace, each lasting about 4,330 samples.
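    As a sanity check on Table 1, the means should roughly fill one trace. Assuming each AES execution is followed by one general-purpose interval (an assumption about the trace layout), #AES × (AES length + general-purpose length) should be close to the 134,217,550-sample trace length, and for DFS the number of frequency intervals times their mean duration should be as well.

```python
TRACE_LEN = 134_217_550  # samples per trace

# (mean #AES per trace, mean AES length, mean general-purpose length), from Table 1
table1 = {
    "BASE": (821, 148_192, 15_276),
    "DFS": (898, 135_621, 13_695),
    "RD": (536, 220_120, 30_096),
    "MRP": (575, 195_599, 37_789),
    "CHF": (262, 398_274, 112_878),
}

coverage = {name: n * (co + gp) / TRACE_LEN for name, (n, co, gp) in table1.items()}
dfs_check = 31_000 * 4_330 / TRACE_LEN  # frequency intervals x mean interval duration

for name, ratio in coverage.items():
    print(f"{name}: {ratio:.4f}")
print(f"DFS frequency intervals: {dfs_check:.4f}")
```

    Every ratio lands between about 0.998 and 1.0001, which supports reading all lengths in Table 1 as sample counts.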

  • Signal-to-Noise Ratio (SNR) Analysis (Section 3.3.1 & Figure 4): The SNR is used to quantify the amount of useful leakage. The analysis confirms that the hiding countermeasures are effective.

    Figure 4: SNRs and SW-SNRs related to the intermediate $\mathrm{sbox}(p[1] \oplus k[1])$ in the first AES round interval on raw traces and traces aligned with sliding window. The curves plot the (SW-)SNR against the sample index for each hiding technique, showing the leakage strength in raw and aligned traces.

    • The SNR of the unprotected BASE traces from Chameleon is already much lower than that of a standard, tightly triggered setup (CW305's SNR), due to the realistic acquisition setup with long captures and environmental noise.
    • For the raw traces of DFS, RD, MRP, and CHF, the SNR is practically flat, showing no discernible leakage peaks. This demonstrates the effectiveness of the countermeasures in hiding the signal.
    • After applying a simple alignment technique (sliding window, SW-SNR), leakage peaks reappear for BASE, RD, and MRP, indicating that these countermeasures can be partially defeated by basic resynchronization.
    • However, for DFS and CHF, even the SW-SNR remains flat, suggesting they are much stronger countermeasures that require more advanced analysis techniques to break.
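    The SNR versus SW-SNR contrast can be reproduced on synthetic data. The sketch below uses the usual per-sample definition (variance of the class means divided by the mean of the class variances) and a simple correlation-based sliding-window alignment against a reference window from the first trace. The paper's exact alignment procedure is not described in this text, so sliding_window_align and every parameter here are assumptions; the code needs NumPy 1.20 or newer for sliding_window_view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def snr(traces, labels):
    """Per-sample SNR: variance of the class means over the mean of the class variances."""
    classes = np.unique(labels)
    means = np.stack([traces[labels == c].mean(axis=0) for c in classes])
    variances = np.stack([traces[labels == c].var(axis=0) for c in classes])
    return means.var(axis=0) / variances.mean(axis=0)

def sliding_window_align(traces, ref_start, ref_len, max_shift):
    """Cut from every trace the window (shifted by at most max_shift samples) that best
    correlates with traces[0][ref_start:ref_start + ref_len]."""
    assert ref_start - max_shift >= 0
    assert ref_start + max_shift + ref_len <= traces.shape[1]
    ref = traces[0, ref_start:ref_start + ref_len]
    ref = (ref - ref.mean()) / np.linalg.norm(ref - ref.mean())
    first = ref_start - max_shift  # start of the leftmost candidate window
    aligned = np.empty((traces.shape[0], ref_len))
    for i, t in enumerate(traces):
        cand = sliding_window_view(t, ref_len)[first:ref_start + max_shift + 1]
        cand = cand - cand.mean(axis=1, keepdims=True)
        corr = cand @ ref / np.linalg.norm(cand, axis=1)  # Pearson correlation per shift
        best = first + int(np.argmax(corr))
        aligned[i] = t[best:best + ref_len]
    return aligned

# Synthetic traces: fixed activity pattern + noise + leakage of a 2-bit value at sample 100,
# each trace then randomly shifted by up to +/-10 samples.
rng = np.random.default_rng(0)
n, length, poi = 1000, 200, 100
pattern = 3.0 * rng.normal(size=length)
labels = rng.integers(0, 4, size=n)
shifts = rng.integers(-10, 11, size=n)
traces = np.empty((n, length))
for i in range(n):
    t = pattern + rng.normal(size=length)
    t[poi] += labels[i]
    traces[i] = np.roll(t, shifts[i])

raw = snr(traces, labels)
sw = snr(sliding_window_align(traces, ref_start=50, ref_len=100, max_shift=25), labels)
print(f"max raw SNR {raw.max():.3f}, max SW-SNR {sw.max():.3f} at index {int(np.argmax(sw))}")
```

    On this toy data the raw SNR is nearly flat because the leakage is spread over 21 possible positions, while the aligned SNR shows a clear peak at the point of interest, mirroring the BASE, RD, and MRP behavior in Figure 4. A pure global shift is easy to undo; DFS and CHF plausibly deform the trace in ways a single shift cannot correct, which fits their SW-SNR staying flat.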

5. Experimental Setup

The paper uses the Chameleon dataset to benchmark a deep-learning-based segmentation methodology.

  • Segmentation Methodology (Section 4 & Figure 5): The goal is to automatically locate the COs in the raw traces.

    Figure 5: Methodology for segmenting cryptographic operations in Chameleon's obfuscated side-channel traces, divided into training and inference pipelines. The diagram follows the full flow from slicing the side-channel traces, through classification and segmentation, to extraction, refinement, and screening.

    • Training Pipeline:
      1. Dataset Building: The raw traces are sliced into fixed-size windows (e.g., 10k samples).
      2. Labeling: Using the pinpoint metadata, each window is labeled as start of a CO (class 0), middle of a CO (class 1), or noise (class 2, corresponding to general-purpose applications).
      3. CNN Training: A CNN classifier is trained on this windowed dataset. The architecture uses convolutional layers, residual blocks, and global average pooling to learn the features of each class.
    • Inference Pipeline:
      1. Sliding Window Classification: A new, unseen raw trace is processed by sliding the window across it and having the trained CNN classify each window.
      2. Screening: The noisy sequence of classifications is refined using a majority voting filter to produce a clean segmentation.
      3. Heuristics: Additional rules are applied to fix errors, such as removing isolated start predictions (likely false positives) or splitting overly long detected intervals (likely missed detections/false negatives).
  • Evaluation Metrics (Section 4.1):

    • Intersection over Union (IoU): Measures the quality of the localization.
      • Conceptual Definition: It quantifies the overlap between a predicted segment and the ground-truth segment. A value of 1 means a perfect match, and 0 means no overlap.
      • Mathematical Formula: $\mathrm{IoU}(P, GT) = \frac{|P \cap GT|}{|P \cup GT|}$
      • Symbol Explanation:
        • $P$: The set of samples in the predicted segmentation interval.
        • $GT$: The set of samples in the ground-truth segmentation interval.
        • $|\cdot|$: The number of elements (samples) in a set.
        • $\cap$: Set intersection.
        • $\cup$: Set union.
    • False Rates:
      • False Positive Rate (FPR): The proportion of predicted COs that do not correspond to any real CO.
      • False Negative Rate (FNR): The proportion of real COs that were not detected by the segmentation algorithm.
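    The IoU formula and the two false rates translate directly into code for half-open sample intervals. How the paper matches predicted and ground-truth COs is not stated in this text; the sketch assumes a greedy one-to-one rule in which a prediction counts as a detection when its best IoU with a not-yet-matched ground-truth interval reaches a hypothetical threshold of 0.5.

```python
def iou(p, gt):
    """IoU of two half-open sample intervals (start, end)."""
    inter = max(0, min(p[1], gt[1]) - max(p[0], gt[0]))
    union = (p[1] - p[0]) + (gt[1] - gt[0]) - inter
    return inter / union

def detection_rates(preds, gts, thr=0.5):
    """Return (FPR, FNR, IoUs of matched pairs) with greedy one-to-one matching (assumed rule)."""
    matched, ious, false_pos = set(), [], 0
    for p in preds:
        free = [j for j in range(len(gts)) if j not in matched]
        best = max(free, key=lambda j: iou(p, gts[j]), default=None)
        if best is not None and iou(p, gts[best]) >= thr:
            matched.add(best)
            ious.append(iou(p, gts[best]))
        else:
            false_pos += 1
    fpr = false_pos / len(preds) if preds else 0.0
    fnr = (len(gts) - len(matched)) / len(gts) if gts else 0.0
    return fpr, fnr, ious

gts = [(1000, 3000), (5000, 7000), (9000, 11000)]
preds = [(1100, 3000), (5000, 7400), (12000, 12500)]
fpr, fnr, ious = detection_rates(preds, gts)
print(fpr, fnr, [round(x, 4) for x in ious])  # FPR = FNR = 1/3, IoUs [0.95, 0.8333]
```

    Here the third prediction overlaps nothing (a false positive) and the third ground-truth CO is never found (a false negative), while the two matched predictions score IoUs of 0.95 and about 0.83.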
  • Configurations (Section 4.2): A separate CNN model was trained for each subdataset. Key hyperparameters were optimized via random search.

    (Manual Transcription of Table 2)

    Hyperparameter      BASE    DFS     RD      MRP     CHF
    Window size         10k     10k     20k     20k     30k
    Learning rate       0.007   0.007   0.007   0.007   0.007
    Batch size          256     128     128     256     128
    Dropout             0.30    0.45    0.40    0.30    0.35
    Weight decay        3e-5    8e-5    5e-5    3e-5    5e-5
    Stride              50      50      100     100     100
    Screening filter    150     400     300     150     160
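    Table 2's window sizes and strides also fix how many CNN classifications a sliding-window pass needs over one 134,217,550-sample trace. This sketch assumes that inference uses the configured stride, that "10k" means 10,000 samples, and that windows never run past the end of the trace.

```python
TRACE_LEN = 134_217_550  # samples per raw trace

# (window size, stride) per subdataset from Table 2, reading "10k" as 10,000 samples
config = {
    "BASE": (10_000, 50),
    "DFS": (10_000, 50),
    "RD": (20_000, 100),
    "MRP": (20_000, 100),
    "CHF": (30_000, 100),
}

def n_windows(trace_len, window, stride):
    """Number of full windows when sliding `window` samples over the trace in steps of `stride`."""
    return (trace_len - window) // stride + 1

windows = {name: n_windows(TRACE_LEN, w, s) for name, (w, s) in config.items()}
print(windows)
```

    Under these assumptions one trace needs about 2.68 million classifications for BASE and DFS and about 1.34 million for RD, MRP, and CHF, which shows why computational efficiency matters for segmentation at this scale.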
  • Baselines: The results are compared to two previous works on SCA segmentation:

    • [CGL+24]: A CNN-based method focused on the random delay countermeasure.
    • [GCZ24]: An extension of the previous work to handle dynamic frequency scaling.

6. Results & Analysis

The paper presents segmentation results to validate the dataset's utility and the proposed segmentation method. Note: The provided text cuts off in this section; the analysis is based on the available data.

  • Core Segmentation Results (Section 4.3): The results demonstrate the high quality of the segmentation achieved by the authors' proposed pipeline.

    (Manual Transcription of Table 3)

    Method       Metric     BASE    DFS     RD      MRP     CHF
    [CGL+24]     IoU (%)    (values not given in the transcription)
                 FNR (%)    0.22    4.40    3.08    0.34    1.22
                 FPR (%)    0.59    6.76    0.00    12.44   0.54
    [GCZ24]      IoU (%)    -       -       -       -       -
                 FNR (%)    0.01    0.06    1.15    0.00    0.67
                 FPR (%)    0.11    0.73    0.13    0.00    0.48
    This work    IoU (%)    89.47   88.23   84.19   91.26   87.96
                 FNR (%)    0.00    0.00    0.00    0.00    0.00
                 FPR (%)    0.00    0.00    0.00    0.00    0.00
    • Interpretation: The method proposed in "This work" achieves perfect detection (0% FNR and 0% FPR) across all five subdatasets, including the highly challenging DFS and CHF cases. The baselines are not error-free: [CGL+24] is clearly the weakest, with 4.40% FNR and 6.76% FPR on DFS and 12.44% FPR on MRP, whereas [GCZ24] never exceeds 1.15% (its FNR on RD).

    • The IoU scores are high (84.19% to 91.26%), indicating that detected COs overlap their ground-truth intervals closely on average. The boundaries are not exact, however: on RD, for example, roughly 16% of the union of predicted and true samples is still mismatched. The cycle-accurate pinpoint labels are what make this boundary-level evaluation possible.

    • The paper also provides confusion matrices for the window classification task.

      Table 4: Test confusion matrices for the segmentation of the Chameleon dataset. For each subdataset (BASE, DFS, RD, MRP, CHF), the matrix reports in percent how the classifier's predicted classes correspond to the true classes.

    • Table 4 Interpretation: These confusion matrices show the per-window classification accuracy of the CNN on the test set. For all subdatasets, the diagonal values are very high (typically >98%), indicating that the model is extremely accurate at distinguishing between the start, middle, and noise classes. This low-level accuracy is what enables the high-level segmentation pipeline to achieve perfect detection rates.
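    A per-window confusion matrix in percent, like those in Table 4, can be computed as below. Normalizing each row by the number of windows of that true class is an assumption about how the paper's percentages are defined; the class ids follow the window-labeling numbering (0 start, 1 middle, 2 noise).

```python
def confusion_percent(truth, pred, n_classes=3):
    """Rows = true class, columns = predicted class, each row normalized to 100%."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(truth, pred):
        counts[t][p] += 1
    return [[100.0 * c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]

truth = [0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
pred = [0, 1, 1, 1, 1, 1, 2, 2, 2, 0]
for row in confusion_percent(truth, pred):
    print(row)
# [50.0, 50.0, 0.0]
# [0.0, 100.0, 0.0]
# [25.0, 0.0, 75.0]
```

    With row normalization, each diagonal entry is the recall of that class, so a high diagonal means few windows of any class are misassigned.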

7. Conclusion & Reflections

  • Conclusion Summary: The paper introduces Chameleon, a new dataset for side-channel analysis. Its primary contribution is providing the community with the first realistic, large-scale collection of power traces obfuscated by four different, natively implemented hiding countermeasures. By capturing real hardware effects and including precise, cycle-accurate metadata for segmentation, Chameleon fills a critical void in SCA research. It moves the field beyond simulated countermeasures and pre-aligned traces, enabling a more rigorous and realistic evaluation of both segmentation algorithms and attack methodologies. The segmentation benchmarks show that these hiding techniques do not prevent a deep-learning pipeline from locating every cryptographic operation, and the abstract reports that modern deep-learning attacks can bypass them, paving the way for a new generation of analysis techniques and, consequently, more robust countermeasures.

  • Limitations & Future Work:

    • Author-Stated: The paper's text cuts off before the conclusion, but the authors imply that "significant work is still needed to develop efficient segmentation methods," suggesting that while their method is highly effective, it may not be the final word, especially regarding computational efficiency or applicability to other, unseen countermeasures.
    • Analysis Limitation: The provided text does not contain Section 5, which discusses the experimental validation of the attack stage. The abstract promises this, stating "modern deep-learning attacks can bypass traditional hiding techniques," but the detailed results and methodology for these attacks are missing from the excerpt.
  • Personal Insights & Critique:

    • Significance: This paper is highly significant. Dataset papers are crucial for scientific progress, as they provide a common ground for comparing methods. Chameleon is not just another dataset; it fundamentally raises the bar for realism in SCA research. The focus on segmentation is particularly commendable: segmentation is a practical and difficult problem that the academic community has largely overlooked by concentrating on pre-aligned traces.
    • Technical Strength: The hardware pinpointing unit is an elegant and powerful piece of engineering. It ensures the ground-truth labels are of the highest possible quality, which is the bedrock of the entire dataset's value.
    • Impact: Chameleon will likely become a standard benchmark for researchers working on countermeasures, attacks on desynchronized traces, and automated SCA tools. It will force the community to confront the full complexity of the SCA pipeline, from raw signal to key recovery.
    • Open Questions: While the segmentation results are impressive, it would be interesting to see how well a single, generalized model trained on all hiding techniques performs, versus the specialized models presented. The missing attack results are the main open question; understanding how the attacks succeed after segmentation (e.g., what model architectures are needed, how much data is required) is the critical next step this dataset enables.
