
Ternary Spike: Learning Ternary Spikes for Spiking Neural Networks

Published: 12/11/2023
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

This study introduces a ternary spike neuron to address the limited information capacity of binary spikes in spiking neural networks. Utilizing values of {-1, 0, 1} enhances information capacity while maintaining the event-driven, multiplication-free advantages; experiments show consistent improvements over state-of-the-art methods on static and dynamic datasets.

Abstract

The Spiking Neural Network (SNN), as one of the biologically inspired neural network infrastructures, has drawn increasing attention recently. It adopts binary spike activations to transmit information, thus the multiplications of activations and weights can be substituted by additions, which brings high energy efficiency. However, in the paper, we theoretically and experimentally prove that the binary spike activation map cannot carry enough information, thus causing information loss and resulting in accuracy decreasing. To handle the problem, we propose a ternary spike neuron to transmit information. The ternary spike neuron can also enjoy the event-driven and multiplication-free operation advantages of the binary spike neuron but will boost the information capacity. Furthermore, we also embed a trainable factor in the ternary spike neuron to learn the suitable spike amplitude, thus our SNN will adopt different spike amplitudes along layers, which can better suit the phenomenon that the membrane potential distributions are different along layers. To retain the efficiency of the vanilla ternary spike, the trainable ternary spike SNN will be converted to a standard one again via a re-parameterization technique in the inference. Extensive experiments with several popular network structures over static and dynamic datasets show that the ternary spike can consistently outperform state-of-the-art methods. Our code is open-sourced at https://github.com/yfguo91/Ternary-Spike.


In-depth Reading

English Analysis

1. Bibliographic Information

1.1. Title

Ternary Spike: Learning Ternary Spikes for Spiking Neural Networks

1.2. Authors

The authors are Yufei Guo, Yuanpei Chen, Xiaode Liu, Weihang Peng, Yuhan Zhang, Xuhui Huang, and Zhe Ma. Their affiliations are with the Intelligent Science & Technology Academy of CASIC, China.

1.3. Journal/Conference

This paper was published on arXiv, which is a preprint server, meaning it has not necessarily undergone peer review by a journal or conference. arXiv is widely used by researchers to rapidly disseminate their work before formal publication.

1.4. Publication Year

The paper was published on December 11, 2023, at 13:28:54 UTC.

1.5. Abstract

The paper addresses a significant challenge in Spiking Neural Networks (SNNs), which are brain-inspired models known for their energy efficiency due to binary spike activations. The authors theoretically and experimentally demonstrate that these binary spikes have limited information capacity, leading to information loss and reduced accuracy. To counteract this, they propose a ternary spike neuron which utilizes $\{-1, 0, 1\}$ spike values instead of the traditional $\{0, 1\}$. This ternary spike neuron is shown to significantly boost information capacity while retaining the crucial event-driven and multiplication-free (addition-based) advantages of binary SNNs. Furthermore, the paper introduces a trainable factor within the ternary spike neuron to allow different layers to learn suitable, distinct spike amplitudes (i.e., $\{-\alpha, 0, \alpha\}$). This adaptive approach better accommodates the varying membrane potential distributions observed across different network layers. To maintain efficiency during inference, the trainable ternary spike SNN is converted back to a standard ternary spike SNN through a re-parameterization technique. Extensive experiments across various network architectures and both static (CIFAR10, CIFAR100, ImageNet) and dynamic (CIFAR10-DVS) datasets demonstrate that the ternary spike consistently achieves state-of-the-art performance. The authors have open-sourced their code.

https://arxiv.org/abs/2312.06372v2 (Publication Status: Preprint)

https://arxiv.org/pdf/2312.06372v2.pdf

2. Executive Summary

2.1. Background & Motivation

Spiking Neural Networks (SNNs) represent a promising "next-generation" neural network architecture, drawing inspiration from the biological brain's energy-efficient information processing. Unlike traditional Artificial Neural Networks (ANNs) that use continuous, real-valued activations, SNNs communicate through discrete, sparse binary spikes (typically 0 or 1). This event-driven, binary nature allows SNNs to replace computationally expensive multiplications with simpler additions, leading to significant energy savings, especially on specialized neuromorphic hardware.

However, the core problem the paper identifies is that this very advantage—the use of binary spike activations—becomes a bottleneck for performance. The binary spike activation maps (i.e., the patterns of 0s and 1s transmitted between neurons) are shown to have a severely limited information capacity. This limitation means they cannot adequately capture and transmit all the necessary information derived from the continuous membrane potentials (internal electrical states) of neurons, leading to information loss. This loss ultimately results in decreased task accuracy when SNNs are applied to complex problems like image recognition, hindering their widespread adoption compared to ANNs.

Furthermore, the paper highlights another overlooked issue: the membrane potential distributions (the range and frequency of internal electrical states) vary significantly across different layers of an SNN. Prior work typically quantizes these diverse membrane potentials into the same fixed spike values (e.g., always 0 or 1), which the authors argue is "unnatural" and suboptimal.

The paper's entry point and innovative idea is to address these two problems by:

  1. Expanding the spike representation: Moving beyond binary spikes to ternary spikes ($\{-1, 0, 1\}$) to directly increase information capacity.
  2. Introducing adaptability: Allowing the spike amplitude to be learned on a layer-wise basis, thereby adapting to the unique membrane potential distributions of each layer.

2.2. Main Contributions / Findings

The paper makes several significant contributions to the field of Spiking Neural Networks:

  • Theoretical and Experimental Proof of Binary Spike Limitation: The authors rigorously prove, both theoretically using information entropy and through experimental analysis, that binary spike activation maps fundamentally lack sufficient information capacity. This limited capacity causes significant information loss during the quantization of membrane potentials, directly contributing to accuracy degradation in SNNs.
  • Proposal of the Ternary Spike Neuron: To address the information loss problem, the paper introduces a novel ternary spike neuron. This neuron transmits information using $\{-1, 0, 1\}$ spikes. Crucially, it is demonstrated that this ternary spike significantly boosts the information capacity compared to binary spikes while fully retaining the key advantages of SNNs: event-driven computation (neurons only activate when membrane potential crosses a threshold) and multiplication-free operations (multiplications can be replaced by additions or subtractions). This represents a new paradigm for spike neurons.
  • Development of a Trainable Ternary Spike Neuron: Recognizing that membrane potential distributions differ across layers, the authors extend the concept to a learnable ternary spike neuron. This version allows the spike magnitude (represented by a factor $\alpha$) to be learned during the training phase, resulting in layer-wise spike amplitudes of $\{-\alpha, 0, \alpha\}$. This adaptive approach enables the SNN to better suit the unique characteristics of each layer. For efficient inference, a re-parameterization technique is employed to fold the learned $\alpha$ factor into the network weights, converting the trainable ternary spikes back to standard ternary spikes (effectively $\{-1, 0, 1\}$ after weight scaling) and thus preserving the addition-based computational advantage.
  • Extensive Experimental Validation: The proposed methods (Ternary Spike and Trainable Ternary Spike) are thoroughly evaluated on a wide range of popular network architectures (e.g., ResNet20, ResNet19, ResNet18, ResNet34) and diverse datasets, including static image datasets (CIFAR10, CIFAR100, ImageNet) and a dynamic event-based dataset (CIFAR10-DVS). The experimental results consistently demonstrate that the Ternary Spike significantly outperforms state-of-the-art SNN models, often achieving substantial accuracy improvements (e.g., ~3% higher top-1 accuracy on ImageNet with ResNet34 and 4 timesteps) while maintaining high energy efficiency.

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To understand the paper, a foundational grasp of neural networks, information theory, and the specifics of Spiking Neural Networks is essential.

3.1.1. Artificial Neural Networks (ANNs)

Artificial Neural Networks (ANNs) are computational models inspired by the structure and function of biological neural networks. They consist of interconnected nodes (neurons) organized in layers. Each connection has a weight, and each neuron applies an activation function to the weighted sum of its inputs to produce an output. ANNs have achieved remarkable success in various fields, but they typically use continuous, real-valued activations and perform numerous multiplications, leading to high computational and energy costs, especially for deep and complex models.

3.1.2. Spiking Neural Networks (SNNs)

Spiking Neural Networks (SNNs) are considered the third generation of neural networks, designed to mimic the brain's event-driven, sparse, and asynchronous communication more closely than ANNs.

  • Binary Spike Activations: Unlike ANNs, SNN neurons communicate by firing discrete, binary electrical impulses called spikes (typically represented as 0 for no spike and 1 for a spike).
  • Event-Driven Computation: Computation in SNNs is event-driven, meaning that neurons only process information and consume energy when a spike occurs. If a neuron does not fire, it remains "silent," saving energy.
  • Multiplication-Free Operations: Because activations are binary (0 or 1), the multiplication of an activation by a weight ($1 \times w$) can be replaced by an addition ($0 + w$) when a spike occurs, and by no operation at all when no spike occurs ($0 \times w = 0$). This replacement of multiplications with additions is a primary source of SNNs' high energy efficiency.
  • Neuromorphic Hardware: SNNs are particularly well-suited for specialized neuromorphic hardware, which is designed to process spike-based information efficiently, further enhancing energy savings.

3.1.3. Information Entropy

Information entropy, a concept from information theory, measures the average level of "surprise," "uncertainty," or "information content" in a random variable. In the context of data representation, a higher entropy value indicates that a system can represent a wider variety of states or carry more information.

  • Representation Capability: The paper uses representation capability, $\mathcal{R}(\mathbf{S})$, to quantify how much information a set of samples $\mathbf{S}$ can encode. It is directly linked to the maximum information entropy, $\mathcal{H}(\mathbf{S})$.
  • Formula for Information Entropy: For a discrete random variable $S$ with possible outcomes $s_1, s_2, \ldots, s_N$ and respective probabilities $p_S(s_1), p_S(s_2), \ldots, p_S(s_N)$, the entropy $\mathcal{H}(S)$ is defined as: $ \mathcal{H}(S) = - \sum_{i=1}^{N} p_S(s_i) \log_2 p_S(s_i) $
    • Symbol Explanation:
      • $\mathcal{H}(S)$: The information entropy of the random variable $S$.
      • $N$: The total number of distinct possible outcomes (samples) for $S$.
      • $s_i$: The $i$-th distinct outcome (sample) of $S$.
      • $p_S(s_i)$: The probability of the $i$-th outcome $s_i$ occurring.
      • $\log_2$: The logarithm base 2, implying that entropy is measured in bits.
  • Maximizing Entropy: Entropy is maximized when all outcomes are equally probable. For $N$ equally probable outcomes, $p_S(s_i) = 1/N$ for all $i$, and the maximum entropy becomes $\log_2(N)$. The paper states this as Proposition 1.
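
As a quick numerical check of Proposition 1 (an illustrative snippet, not from the paper), the following computes the entropy of a few distributions over $N = 3$ outcomes and confirms that the uniform distribution attains the maximum $\log_2 3 \approx 1.585$ bits:

```python
import math

def entropy_bits(probs):
    """Shannon entropy H(S) = -sum_i p_i * log2(p_i), in bits (zero-probability terms contribute 0)."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

for probs in [(0.9, 0.05, 0.05), (0.5, 0.3, 0.2), (1/3, 1/3, 1/3)]:
    print(probs, "->", round(entropy_bits(probs), 3), "bits")
# The uniform distribution (1/3, 1/3, 1/3) yields log2(3) ≈ 1.585 bits,
# the maximum achievable with N = 3 outcomes.
```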

3.1.4. Leaky-Integrate-and-Fire (LIF) Neuron Model

The Leaky-Integrate-and-Fire (LIF) neuron model is a widely used, biologically plausible model for SNNs. It simplifies the complex dynamics of biological neurons into a set of differential or iterative equations.

  • Mechanism:
    1. Integrate: The neuron accumulates incoming synaptic currents, which increase its internal state called membrane potential ($u$).
    2. Leak: Over time, if no inputs are received, the membrane potential leaks (decays) back towards its resting state.
    3. Fire: If the membrane potential reaches a predefined firing threshold ($V_{\mathrm{th}}$), the neuron fires a spike (outputs 1).
    4. Reset: After firing, the membrane potential is reset (typically to 0 or a resting potential), making the neuron temporarily unable to fire again immediately.
  • Iterative Model (Equations 1-3 from paper): The standard iterative LIF model is governed by: $ u^{\mathrm{before}, t} = \left(1 - \frac{dt}{\tau}\right) u^{t-1} + \frac{dt}{\tau} R \cdot I^t $, $ u^t = \begin{cases} 0, & \mathrm{if~} u^{\mathrm{before},t} \ge V_{\mathrm{th}} \\ u^{\mathrm{before},t}, & \mathrm{otherwise} \end{cases} $, and $ o^t = \begin{cases} 1, & \mathrm{if~} u^{\mathrm{before},t} \ge V_{\mathrm{th}} \\ 0, & \mathrm{otherwise} \end{cases} $
    • Symbol Explanation:
      • $u^{\mathrm{before}, t}$: Membrane potential before firing at timestep $t$.
      • $u^t$: Membrane potential after firing/reset at timestep $t$.
      • $o^t$: Output spike at timestep $t$.
      • $u^{t-1}$: Membrane potential at the previous timestep $t-1$.
      • $dt$: Simulation step size.
      • $\tau$: Membrane time constant, influencing the leak rate.
      • $R \cdot I^t$: Charging voltage from input current $I^t$ and resistance $R$.
      • $V_{\mathrm{th}}$: Firing threshold. The paper simplifies the input voltage $\frac{dt}{\tau} R \cdot I^t$ to $\sum_j w_j o_{j,\mathrm{pre}}^t$, the weighted sum of spikes from the previous layer, and abbreviates the leak term $(1 - \frac{dt}{\tau})$ as a constant, also (with slight abuse of notation) denoted $\tau$. This leads to the simplified iterative model used in the paper: $ u^t = \tau u^{t-1} (1 - o^{t-1}) + \sum_j w_j o_{j,\mathrm{pre}}^t $ and $ o^t = \begin{cases} 1, & \mathrm{if~} u^t \geq V_{\mathrm{th}} \\ 0, & \mathrm{otherwise} \end{cases} $
    • Symbol Explanation (simplified model):
      • $u^t$: Membrane potential at timestep $t$.
      • $o^t$: Output spike (0 or 1) at timestep $t$.
      • $\tau$: Leakage factor (simplified from $1 - \frac{dt}{\tau}$), a constant between 0 and 1.
      • $u^{t-1}$: Membrane potential at the previous timestep $t-1$.
      • $(1 - o^{t-1})$: If a spike was fired at $t-1$ ($o^{t-1} = 1$), this term is 0 and the membrane potential is reset to 0; if no spike was fired ($o^{t-1} = 0$), it is 1 and the potential decays to $\tau u^{t-1}$.
      • $w_j$: Weight connecting the $j$-th neuron of the previous layer to the current neuron.
      • $o_{j,\mathrm{pre}}^t$: Binary spike output from the $j$-th neuron of the previous layer at timestep $t$.
      • $V_{\mathrm{th}}$: Firing threshold.
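
To make these dynamics concrete, below is a minimal NumPy sketch of the simplified iterative LIF model (an illustrative re-implementation of the equations above, not code from the authors' repository; names such as `lif_step` are hypothetical):

```python
import numpy as np

def lif_step(u_prev, o_prev, w, o_pre, tau=0.5, v_th=1.0):
    """One timestep of the simplified iterative LIF model.

    u_prev: membrane potential u^{t-1}, shape (n_out,)
    o_prev: previous binary spikes o^{t-1} in {0, 1}, shape (n_out,)
    w:      weight matrix, shape (n_out, n_in)
    o_pre:  binary spikes from the previous layer, shape (n_in,)
    """
    # Leak where no spike was fired at t-1 (reset to 0 where one was), then integrate inputs.
    u = tau * u_prev * (1.0 - o_prev) + w @ o_pre
    # Fire a binary spike wherever the threshold is reached.
    o = (u >= v_th).astype(np.float64)
    return u, o

# Toy usage: 4 input neurons driving 3 LIF neurons for 5 timesteps.
rng = np.random.default_rng(0)
w = rng.uniform(0.2, 0.6, size=(3, 4))
u = np.zeros(3)
o = np.zeros(3)
for t in range(5):
    o_pre = rng.integers(0, 2, size=4).astype(np.float64)
    u, o = lif_step(u, o, w, o_pre)
    print(f"t={t}  u={np.round(u, 3)}  spikes={o}")
```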

3.2. Previous Works

The paper organizes related work into Learning Methods of Spiking Neural Networks and Information Loss in Spiking Neural Networks.

3.2.1. Learning Methods of Spiking Neural Networks

  • ANN-SNN Conversion: This common approach involves training a traditional ANN first and then converting its parameters to an SNN.
    • Principle: It maps ANN activations to SNN average firing rates.
    • Advantages: Often simpler to implement and can achieve high accuracy (approaching the ANN's performance) because ANN training is more established.
    • Deficiencies (as pointed out by the paper):
      1. Rate-coding Limitation: Primarily relies on rate-coding (information encoded in firing rates), ignoring the richer temporal dynamic behaviors of SNNs, making it less suitable for neuromorphic datasets (event-based data like DVS cameras).
      2. High Timesteps: Usually requires a large number of timesteps (simulation steps) to match ANN accuracy, which increases energy consumption and contradicts SNNs' low-power goal.
      3. Accuracy Ceiling: SNN accuracy cannot exceed the original ANN accuracy, limiting its potential.
    • Examples: SpikeNorm (Sengupta et al. 2019), RMP (Han, Srinivasan, and Roy 2020), SpikeConverter (Liu et al. 2022).
  • Direct Training from Scratch: This method involves training SNNs directly using backpropagation with surrogate gradients (a technique to approximate the non-differentiable spiking function).
    • Advantages: Better suited for neuromorphic datasets and can achieve high performance with very few timesteps (sometimes <5), leading to higher energy efficiency.
    • Examples: STBP-tdBN (Zheng et al. 2021), TET (Deng et al. 2022), Dspike (Li et al. 2021b), SEW ResNet (Fang et al. 2021a).
  • Hybrid Learning: Combines elements of both ANN-SNN conversion and direct training.
    • Examples: Hybrid-Train (Rathi and Roy 2020).
  • Paper's Focus: This paper focuses on improving the performance of directly training-based SNNs by addressing the information loss problem.

3.2.2. Information Loss in Spiking Neural Networks

The paper highlights a specific area of research dedicated to mitigating information loss in SNNs, an area it aims to advance.

  • InfLoR-SNN (Guo et al. 2022b): Proposed a membrane potential rectifier to adjust membrane potentials closer to quantization spikes, reducing quantization error.
  • RMP-Loss (Guo et al. 2023a): Used a loss function to adjust membrane potentials to reduce quantization error, similar in idea to InfLoR-SNN.
  • IM-Loss (Guo et al. 2022a): Argued that information entropy of activations could be maximized to reduce information loss, proposing an information maximization loss function.
  • RecDis-SNN (Guo et al. 2022c): Introduced a loss to penalize undesired shifts in membrane potential distributions, aiming for a bimodal distribution which helps mitigate information loss.
  • MT-SNN (Wang, Zhang, and Zhang 2023): Proposed a multiple threshold (MT) algorithm for LIF neurons to partially recover information lost during quantization.
  • Common Limitation of Prior Work: The paper points out that all these prior works, despite their efforts to reduce information loss, still quantize membrane potentials to binary spikes (0 or 1) and ignore the differences in membrane potential distributions along layers.
  • Alternative Ternary Spike: The paper also briefly mentions another ternary spike neuron proposed in (Sun et al. 2022) which uses $\{0, 1, 2\}$ spikes. The key distinction is that this alternative cannot benefit from the multiplication-addition transform (since $2 \times w$ is still a true multiplication), losing a core energy-efficiency advantage of SNNs. This paper's proposed ternary spike, using $\{-1, 0, 1\}$, does retain this advantage.

3.3. Technological Evolution

The field of neural networks has evolved from traditional ANNs, which prioritize accuracy but consume significant power, towards more energy-efficient models. This evolution includes techniques like quantization (reducing the precision of weights/activations), pruning (removing redundant connections), and knowledge distillation (transferring knowledge from a large model to a smaller one). SNNs emerged as a distinct path, seeking biological plausibility and extreme energy efficiency by adopting spike-based communication. Early SNN research focused on biological modeling, then shifted to achieving competitive accuracy on benchmark tasks, often relying on ANN-SNN conversion. More recently, direct training methods have gained prominence for their ability to achieve high accuracy with fewer timesteps and better suitability for dynamic neuromorphic data. Within this direct training paradigm, a critical challenge has been information loss due to the binary nature of spikes. This paper's work represents an advancement within this lineage, pushing beyond binary spikes to ternary spikes and introducing layer-wise adaptability to further close the performance gap between SNNs and ANNs while maintaining efficiency.

3.4. Differentiation Analysis

Compared to the main methods in related work, this paper's approach offers several core differences and innovations:

  1. Increased Information Capacity via Ternary Spikes: Unlike prior works that primarily focused on refining the binary spiking process (e.g., adjusting membrane potentials, thresholds, or loss functions to minimize error within binary quantization), this paper fundamentally alters the spike representation itself by moving from binary $\{0, 1\}$ to ternary $\{-1, 0, 1\}$. This directly and theoretically boosts the information capacity per spike, a novel approach to addressing information loss.

  2. Preservation of SNN Advantages: Crucially, the proposed ternary spike is designed such that it still retains the event-driven and multiplication-free (convertible to addition/subtraction) advantages that are fundamental to SNNs' energy efficiency. This differentiates it from other non-binary spiking schemes (like the $\{0, 1, 2\}$ scheme mentioned in related work) that sacrifice the multiplication-free benefit.

  3. Layer-wise Adaptive Spike Amplitudes: The paper introduces a learnable factor $\alpha$ to create trainable ternary spikes with magnitudes $\{-\alpha, 0, \alpha\}$. This is a significant innovation because it directly addresses the observation that membrane potential distributions vary across different layers. Previous works largely ignored this heterogeneity, applying a uniform quantization scheme. By allowing layer-specific spike amplitudes, the model can better adapt to and represent the unique firing characteristics of each layer.

  4. Re-parameterization for Efficiency: To ensure that the learnable spike amplitudes do not compromise inference efficiency (by reintroducing multiplications), the paper utilizes a re-parameterization technique. This allows the learned $\alpha$ values to be folded into the weights during inference, effectively converting the network back into a standard ternary spike SNN that enjoys multiplication-free operations. This training-inference decoupling maintains the benefits of adaptive learning without incurring runtime overhead.

    In essence, this paper proposes a more expressive and adaptive spiking mechanism that tackles information loss at its source (spike representation) and adapts to network heterogeneity, all while carefully preserving the inherent energy efficiency principles of SNNs.

4. Methodology

4.1. Principles

The core principle behind the proposed method is that the limited information capacity of binary spikes (0 or 1) is a primary bottleneck for the accuracy of Spiking Neural Networks (SNNs). By expanding the possible spike values from binary to ternary (i.e., $\{-1, 0, 1\}$), the information carried by each spike is significantly increased. This directly mitigates information loss during the quantization of membrane potentials. Furthermore, the paper posits that a static, uniform quantization to the same spike values across all layers is suboptimal because membrane potential distributions differ substantially between layers. To address this, learnable spike amplitudes are introduced, allowing each layer to dynamically adjust its output spike magnitude, thus better adapting to its unique membrane potential characteristics. Crucially, these enhancements are designed to maintain the energy-efficient event-driven and multiplication-free (or addition-only) advantages that define SNNs.

4.2. Core Methodology In-depth (Layer by Layer)

4.2.1. Information Loss in Spiking Neural Networks

The paper begins by theoretically justifying the claim that binary spike activation maps suffer from limited information capacity, leading to information loss and decreased accuracy. This justification uses the concept of information entropy.

Theoretical Analysis using Information Entropy: The representation capability, $\mathcal{R}(\mathbf{S})$, of a set of samples $\mathbf{S}$ can be measured by the maximum information entropy, $\mathcal{H}(\mathbf{S})$. The general formula for $\mathcal{R}(\mathbf{S})$ is given by:

$ \mathcal{R}(\mathbf{S}) = \operatorname*{max} \mathcal{H}(\mathbf{S}) = \operatorname*{max} \left( - \sum_{s \in \mathbf{S}} p_{\mathbf{S}}(s) \log p_{\mathbf{S}}(s) \right) $ Where:

  • $\mathcal{R}(\mathbf{S})$: The representation capability of the set $\mathbf{S}$.

  • $\mathcal{H}(\mathbf{S})$: The information entropy of the set $\mathbf{S}$.

  • $s \in \mathbf{S}$: A sample from the set $\mathbf{S}$.

  • $p_{\mathbf{S}}(s)$: The probability of a specific sample $s$ occurring from $\mathbf{S}$.

  • $\log$: The logarithm (implicitly base 2 for bits).

    The paper then presents Proposition 1, which states: Proposition 1: When $p_S(s_1) = p_S(s_2) = \cdots = p_S(s_N)$, $\mathcal{H}(S)$ reaches its maximum, $\log(N)$. Here, $N$ denotes the total number of distinct samples (or states) possible from $S$.

Using this proposition, the authors calculate the representation capability for both binary spike feature maps and real-valued membrane potential maps.

  • Binary Spike Feature Map: Let $\mathbf{F}_B \in \mathbb{B}^{C \times H \times W}$ denote a binary feature map. Each individual binary spike output $o$ can be one of 2 values ($\{0, 1\}$), representing 1 bit of information. For a feature map of size $C \times H \times W$ (Channels $\times$ Height $\times$ Width), the total number of distinct samples (possible states) is $2^{(C \times H \times W)}$. Therefore, its representation capability is: $ \mathcal{R}(\mathbf{F}_B) = \log_2 2^{(C \times H \times W)} = C \times H \times W $
  • Real-valued Membrane Potential Map: Let $\mathbf{M}_R \in \mathbb{R}^{C \times H \times W}$ denote a real-valued membrane potential map. Assuming each real-valued potential requires 32 bits (standard floating-point precision), this corresponds to $2^{32}$ possible samples per potential. For a feature map of size $C \times H \times W$, the total number of distinct samples is $2^{32 \times (C \times H \times W)}$. Its representation capability is: $ \mathcal{R}(\mathbf{M}_R) = \log_2 2^{32 \times (C \times H \times W)} = 32 \times C \times H \times W $ The comparison clearly shows that $\mathcal{R}(\mathbf{M}_R)$ is 32 times greater than $\mathcal{R}(\mathbf{F}_B)$. This significant difference theoretically demonstrates that quantizing real-valued membrane potentials to binary spikes causes excessive information loss. The paper also notes that increasing timesteps in SNNs, which is known to improve accuracy, implicitly increases the representation capability by accumulating spike information over time, thus aligning with this information-theoretic view.

4.2.2. Ternary Spike Neuron Model

To address the identified information loss while preserving SNNs' energy efficiency, the paper proposes a ternary LIF spike neuron. This neuron extends the output spike values from binary $\{0, 1\}$ to ternary $\{-1, 0, 1\}$.

The iterative model for the ternary LIF spike neuron is given by: $ u^t = \tau u^{t-1} (1 - |o^{t-1}|) + \sum_j w_j o_{j,\mathrm{pre}}^t $ and $ o^t = \begin{cases} 1, & \mathrm{if~} u^t \geq V_{\mathrm{th}} \\ -1, & \mathrm{if~} u^t \leq -V_{\mathrm{th}} \\ 0, & \mathrm{otherwise} \end{cases} $ Where:

  • $u^t$: Membrane potential at timestep $t$.
  • $o^t$: Output spike (either -1, 0, or 1) at timestep $t$.
  • $\tau$: Leakage factor (a constant between 0 and 1).
  • $u^{t-1}$: Membrane potential at the previous timestep $t-1$.
  • $|o^{t-1}|$: The absolute value of the output spike from the previous timestep. This term is crucial for the reset mechanism:
    • If a spike was fired at $t-1$ (either $o^{t-1} = 1$ or $o^{t-1} = -1$), then $|o^{t-1}| = 1$. The term $(1 - |o^{t-1}|)$ becomes 0, effectively resetting the membrane potential to 0 before integrating new inputs.
    • If no spike was fired at $t-1$ ($o^{t-1} = 0$), then $|o^{t-1}| = 0$. The term $(1 - |o^{t-1}|)$ becomes 1, allowing the membrane potential to decay to $\tau u^{t-1}$.
  • $w_j$: Weight connecting the $j$-th neuron of the previous layer to the current neuron.
  • $o_{j,\mathrm{pre}}^t$: Ternary spike output from the $j$-th neuron of the previous layer at timestep $t$.
  • $V_{\mathrm{th}}$: Positive firing threshold.
  • $-V_{\mathrm{th}}$: Negative firing threshold. If the membrane potential drops below this, a negative spike is fired.
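
For illustration, a minimal NumPy sketch of this ternary LIF update might look as follows (a reconstruction from the equations above, not the authors' implementation):

```python
import numpy as np

def ternary_lif_step(u_prev, o_prev, w, o_pre, tau=0.5, v_th=1.0):
    """One timestep of the ternary LIF neuron; spikes take values in {-1, 0, 1}.

    u_prev: membrane potential u^{t-1}
    o_prev: previous ternary spikes o^{t-1} in {-1, 0, 1}
    w:      weight matrix, shape (n_out, n_in)
    o_pre:  ternary spikes from the previous layer, shape (n_in,)
    """
    # Reset uses |o^{t-1}|: any non-zero spike (+1 or -1) resets the potential to 0.
    u = tau * u_prev * (1.0 - np.abs(o_prev)) + w @ o_pre
    # Two symmetric thresholds: +V_th fires +1, -V_th fires -1, otherwise stay silent (0).
    o = np.where(u >= v_th, 1.0, np.where(u <= -v_th, -1.0, 0.0))
    return u, o
```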

4.2.2.1. Representation Capacity Improvement of Ternary Spike Neuron

The authors argue that firing ternary spikes significantly increases the representation capacity of SNNs. Let $\mathbf{F}_T \in \mathbb{T}^{C \times H \times W}$ denote a ternary feature map, where $\mathbb{T} = \{-1, 0, 1\}$. Each ternary spike output can be one of 3 values. Thus, for a feature map of size $C \times H \times W$, the total number of distinct samples is $3^{(C \times H \times W)}$. According to Proposition 1, its representation capability is: $ \mathcal{R}(\mathbf{F}_T) = \log_2 3^{(C \times H \times W)} = (C \times H \times W) \log_2 3 $ Since $\log_2 3 \approx 1.585$, the ternary spike feature map has approximately 1.585 times the representation capacity of a binary spike feature map (where $\mathcal{R}(\mathbf{F}_B) = C \times H \times W$). This theoretical improvement indicates enhanced information expressiveness and potential for better performance.
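
The capacity comparison can be verified with a few lines (a simple numerical illustration of the formulas above, using an arbitrary example feature-map size):

```python
import math

def capacity_bits(num_states_per_element, num_elements):
    """R = log2(num_states ** num_elements) = num_elements * log2(num_states)."""
    return num_elements * math.log2(num_states_per_element)

C, H, W = 128, 16, 16            # illustrative feature-map size
n = C * H * W
print("binary  spikes :", capacity_bits(2, n), "bits")        # = n
print("ternary spikes :", capacity_bits(3, n), "bits")        # ≈ 1.585 * n
print("32-bit values  :", capacity_bits(2 ** 32, n), "bits")  # = 32 * n
```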

4.2.2.2. Event-driven and Addition-only Advantages Retaining

The paper emphasizes that the proposed ternary spike neuron retains the critical energy-efficiency advantages of vanilla SNNs:

  • Event-driven Characteristic: Similar to binary SNNs, the ternary spike neuron is event-driven. It only activates and performs computations if its membrane potential exceeds the positive threshold $V_{\mathrm{th}}$ (firing a +1 spike) or drops below the negative threshold $-V_{\mathrm{th}}$ (firing a -1 spike). If the potential stays between $-V_{\mathrm{th}}$ and $V_{\mathrm{th}}$, it remains silent (fires a 0 spike), conserving energy.
  • Multiplication-addition Transform: The core energy-saving mechanism of SNNs is replacing weight-activation multiplications with additions/subtractions.
    • For a binary spike neuron: If a spike is fired ($o = 1$), the operation $ x = 1 \times w $ can be replaced by the addition $ x = 0 + w $.
    • For a ternary spike neuron: If a spike is fired ($o = 1$ or $o = -1$), the operation $ x = 1 \times w $ or $ x = -1 \times w $ can likewise be replaced by an addition or a subtraction: $ x = 0 + w $ or $ x = 0 - w $. This demonstrates that the proposed ternary spike neuron successfully enhances the expression ability of SNNs while fully retaining the event-driven and addition-only (or addition/subtraction-only) advantages, which are key for energy efficiency.
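
The multiplication-addition transform can be sketched in a few lines: because every spike is restricted to $\{-1, 0, 1\}$, a weighted sum reduces to signed accumulation (a toy illustration, not hardware or library code):

```python
def weighted_sum_ternary(weights, spikes):
    """Compute sum_j w_j * o_j with additions/subtractions only,
    valid because each spike o_j is restricted to {-1, 0, 1}."""
    acc = 0.0
    for w, o in zip(weights, spikes):
        if o == 1:        # +1 spike: add the weight
            acc += w
        elif o == -1:     # -1 spike: subtract the weight
            acc -= w
        # o == 0: event-driven, no operation performed
    return acc

print(weighted_sum_ternary([0.5, -0.5, 0.25], [1, 0, -1]))  # 0.5 - 0.25 = 0.25
```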

4.2.3. Trainable Ternary Spike

The paper then addresses a second problem: the "unnatural" practice of quantizing membrane potentials from different layers to the same fixed spike values, despite membrane potential distributions varying significantly across layers. To demonstrate this, the authors provide empirical evidence in Figure 2 (images/2.jpg):

The image is a schematic showing distribution histograms for several cases, arranged in eight subplots (a) through (h), illustrating changes in information capacity during information transmission; these changes relate to the performance of the ternary spike neuron.

The image 2.jpg presents a schematic showing distribution histograms related to information capacity. While the VLM description for this image is generic, the accompanying text in the paper in Section 4.3 specifically refers to "Fig. 2. Membrane potential distributions of different layers of a spiking ResNet20 with 1&2 timesteps on the CIFAR-10 dataset." It shows that these distributions "are very different along layers." This visual evidence supports the claim that a one-size-fits-all quantization is suboptimal.

To tackle this, the paper introduces a trainable ternary spike neuron, where the spike amplitude can be learned during the training phase.

The equations for the trainable ternary spike neuron are: $ u^t = \tau u^{t-1} (1 - |b^{t-1}|) + \sum_j w_j o_{j,\mathrm{pre}}^t $ and $ o^t = \begin{cases} 1 \cdot a, & \mathrm{if~} u^t \geq V_{\mathrm{th}} \\ -1 \cdot a, & \mathrm{if~} u^t \leq -V_{\mathrm{th}} \\ 0 \cdot a, & \mathrm{otherwise} \end{cases} $ Where:

  • $u^t$, $\tau$, $u^{t-1}$, $w_j$, $o_{j,\mathrm{pre}}^t$, and $V_{\mathrm{th}}$ are the same as defined for the ternary spike neuron.

  • $b^{t-1} \in \{-1, 0, 1\}$: The normalized ternary spike value at the previous timestep, before scaling by $a$. It is used in the reset term $(1 - |b^{t-1}|)$ so that the reset logic depends only on whether a non-zero spike was fired, regardless of its amplitude.

  • $o^t = a \cdot b^t$: The actual output spike at timestep $t$, i.e., a normalized ternary spike $b^t \in \{-1, 0, 1\}$ scaled by a trainable factor $a$.

  • $a$: A trainable scalar factor that determines the spike amplitude. This factor is set in a layer-wise manner, meaning each layer learns its own $a$. This allows neurons in different layers to fire spikes of different magnitudes, adapting to their specific membrane potential distributions.

    The paper notes that while the concept of a trainable factor is similar to Real Spike (Guo et al. 2022d), the motivations differ. In this paper, $a$ adjusts the spike amplitude to suit each layer's membrane potential distribution, whereas Real Spike used a trainable factor to learn unshared convolution kernels.
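
A minimal sketch of the trainable ternary firing step during training might look like this (illustrative only; it assumes one learned scalar amplitude per layer, as described above):

```python
import numpy as np

def trainable_ternary_fire(u, a, v_th=1.0):
    """Fire a trainable ternary spike o = a * b, with b in {-1, 0, 1}.

    u: membrane potential (array)
    a: layer-wise trainable amplitude (scalar, learned during training)
    Returns (o, b): the scaled spike used as the layer's output, and the
    normalized ternary spike b used in the reset term (1 - |b|).
    """
    b = np.where(u >= v_th, 1.0, np.where(u <= -v_th, -1.0, 0.0))
    o = a * b
    return o, b
```

During training, gradients with respect to `a` and the membrane potential would be obtained with a surrogate gradient around the thresholding step, as is standard for directly trained SNNs; after training, `a` is folded into the weights as described next.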

4.2.3.1. Re-parameterization Technique

A potential issue with the trainable ternary spike is that the multiplication $a \cdot w$ (when $o = 1 \cdot a$) or $-a \cdot w$ (when $o = -1 \cdot a$) would reintroduce actual multiplications, thus losing the multiplication-free advantage during inference. To circumvent this, the paper adopts a training-inference decoupled technique via re-parameterization. This technique allows the learned $a$ factor to be folded into the network's weights, effectively restoring multiplication-free inference.

The re-parameterization technique is illustrated using a convolution layer: Let $\mathbf{F}$ be the input feature map and $\mathbf{G}$ be the output feature map of a convolution layer. The convolution operation is: $ \mathbf{G} = \mathbf{K} * \mathbf{F} $ Where:

  • $\mathbf{G}$: Output feature map.

  • $\mathbf{K}$: Convolution kernel tensor (weights).

  • $\mathbf{F}$: Input feature map.

  • $*$: Convolution operation.

    In a trainable ternary SNN, the input feature map $\mathbf{F}$ consists of real-valued spikes, i.e., normalized ternary spikes scaled by the trainable factor $a$. So, $\mathbf{F}$ can be expressed as: $ \mathbf{F} = \boldsymbol{a} \cdot \mathbf{B} $ Where:

  • $\boldsymbol{a}$: A tensor representing the layer-wise trainable factor $a$ that scales the spikes. For a given layer, $\boldsymbol{a}$ is effectively a single scalar $a$ broadcast across all spatial dimensions of the input feature map.

  • $\mathbf{B}$: A tensor of the normalized ternary spikes ($\{-1, 0, 1\}$) before scaling by $a$.

    During inference, the trainable factor $\boldsymbol{a}$ can be folded into the convolution kernels $\mathbf{K}$. This transformation creates a new set of convolution kernels $\tilde{\mathbf{K}}$ such that the network can operate on normalized ternary spikes ($\mathbf{B}$) without changing the output feature map $\mathbf{G}$: $ \mathbf{G} = \mathbf{K} * (\boldsymbol{a} \cdot \mathbf{B}) = (\boldsymbol{a} \cdot \mathbf{K}) * \mathbf{B} = \tilde{\mathbf{K}} * \mathbf{B} $ Where:

  • $\tilde{\mathbf{K}} = \boldsymbol{a} \cdot \mathbf{K}$: The transformed convolution kernel tensor, where each original weight $w_{ij}$ in $\mathbf{K}$ is scaled by the layer's learned factor $a$, giving $\tilde{w}_{ij} = a \cdot w_{ij}$.

    This re-parameterization allows the SNN to be trained with real-valued spikes (scaled by $a$) for better learning and adaptation, and then converted into an equivalent network that only emits normalized ternary spikes ($\{-1, 0, 1\}$) during inference. This ensures that the multiplication-free advantage, and thus the energy efficiency of SNNs, is fully retained in the deployment phase.
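
The folding step can be sketched with a 2D convolution in PyTorch. This is an illustrative reconstruction of the re-parameterization described above (not the authors' code) and assumes a single learned amplitude `a` per layer:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
a = 0.75                                              # learned layer-wise spike amplitude
K = torch.randn(8, 4, 3, 3)                           # convolution kernels (out, in, kh, kw)
B = torch.randint(-1, 2, (1, 4, 16, 16)).float()      # normalized ternary spikes in {-1, 0, 1}

# Training-time view: the layer receives scaled spikes F = a * B.
G_train = F.conv2d(a * B, K, padding=1)

# Inference-time view: fold a into the kernels (K_tilde = a * K) so the layer
# only ever sees ternary spikes and remains multiplication-free.
K_tilde = a * K
G_infer = F.conv2d(B, K_tilde, padding=1)

print(torch.allclose(G_train, G_infer, atol=1e-5))    # True: identical outputs
```

By linearity of convolution, the two views produce the same output feature map, which is why the trainable amplitude costs nothing at inference time.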

5. Experimental Setup

5.1. Datasets

The experiments were conducted on a diverse set of datasets, including both static image datasets and a dynamic event-based dataset, to thoroughly evaluate the proposed method's performance across different data modalities.

  • CIFAR-10 (Krizhevsky, Nair, and Hinton 2010):
    • Source: Canadian Institute for Advanced Research (CIFAR).
    • Scale: 60,000 32x32 color images in 10 classes, with 6,000 images per class. 50,000 images for training and 10,000 for testing.
    • Characteristics: Small-scale, low-resolution natural images, commonly used for image classification benchmarks.
    • Domain: Object recognition.
    • Purpose: To evaluate performance on a fundamental image classification task.
  • CIFAR-100 (Krizhevsky, Nair, and Hinton 2010):
    • Source: Canadian Institute for Advanced Research (CIFAR).
    • Scale: 60,000 32x32 color images in 100 classes, with 600 images per class. 50,000 for training and 10,000 for testing.
    • Characteristics: Similar to CIFAR-10 but with a larger number of classes and fewer images per class, making it a more challenging classification task. The classes are grouped into 20 superclasses.
    • Domain: Object recognition.
    • Purpose: To test the method's ability to handle more fine-grained classification problems.
  • ImageNet (Deng et al. 2009):
    • Source: Large-scale hierarchical image database.
    • Scale: Over 14 million images across more than 20,000 categories. The standard ImageNet-1k subset used for benchmarks contains 1.28 million training images and 50,000 validation images across 1,000 object categories.
    • Characteristics: Large-scale, high-resolution natural images, considered a standard and highly challenging benchmark for computer vision models.
    • Domain: Object recognition.
    • Purpose: To validate the method's scalability and effectiveness on complex, real-world image classification.
  • CIFAR10-DVS (Li et al. 2017):
    • Source: Event-stream dataset for object classification, recorded using a Dynamic Vision Sensor (DVS) camera.

    • Scale: Event data corresponding to the CIFAR-10 dataset, capturing asynchronous events (pixel brightness changes) rather than traditional frames.

    • Characteristics: A neuromorphic dataset that captures temporal dynamics of visual information. SNNs are particularly well-suited for this type of data due to their event-driven nature.

    • Domain: Event-based object classification.

    • Purpose: To demonstrate the method's performance on dynamic, biologically inspired data, where SNNs typically show stronger advantages.

      The choice of these datasets is effective for validating the method's performance as they cover a range of complexities and data types, from small static images to large-scale static images and dynamic event streams, allowing for a comprehensive evaluation of the proposed Ternary Spike and Trainable Ternary Spike models. The paper does not provide concrete examples of data samples.

5.2. Evaluation Metrics

The primary evaluation metric reported for classification tasks in this paper is Top-1 Accuracy. For energy efficiency analysis, the paper uses FLOPs, SOPs, and Sign operations, along with their associated energy costs.

5.2.1. Top-1 Accuracy

  • Conceptual Definition: Top-1 Accuracy is a common metric in multi-class classification, representing the proportion of correctly classified instances where the model's highest-confidence prediction matches the true label. It quantifies the overall correctness of the model's single best prediction.
  • Mathematical Formula: $ \text{Top-1 Accuracy} = \frac{\text{Number of Correct Top-1 Predictions}}{\text{Total Number of Samples}} $
  • Symbol Explanation:
    • Number of Correct Top-1 Predictions: The count of instances where the class predicted by the model with the highest probability (its top-1 prediction) is identical to the actual true class label.
    • Total Number of Samples: The total number of instances (e.g., images) in the dataset being evaluated.

5.2.2. Energy Estimation Metrics

The paper quantifies energy consumption using three types of operations:

  • FLOPs (Floating Point Operations):
    • Conceptual Definition: FLOPs measure the number of floating-point arithmetic operations (e.g., additions, multiplications involving real numbers) performed by a model. These are typically associated with layers that do not solely operate on binary or ternary spikes, such as the initial rate-encoding layer in SNNs which converts input pixels into spike trains.
    • Mathematical Formula: There isn't a single universal formula for FLOPs; the count is obtained by summing the floating-point operations of each layer (e.g., for a convolution, approximately $2 \times C_{\text{in}} \times K_h \times K_w \times C_{\text{out}} \times H_{\text{out}} \times W_{\text{out}}$ for multiplications and additions).
    • Symbol Explanation: Specific to the operations performed (e.g., multiplications, additions). The paper assigns an energy cost of 12.5 pJ per FLOP.
  • SOPs (Synaptic Operations):
    • Conceptual Definition: SOPs represent the energy-efficient operations specific to SNNs, where multiplications of binary or ternary spikes by weights are replaced by additions or subtractions. This metric is used for layers that process spikes. The paper calculates SOPs as $s \times T \times A$, where $s$ is the mean sparsity, $T$ is the timestep, and $A$ is the number of additions in an equivalent ANN layer.
    • Mathematical Formula: $ \text{SOPs} = s \times T \times A $
    • Symbol Explanation:
      • $s$: Mean sparsity of spikes (the proportion of non-zero spikes).
      • $T$: Number of timesteps (simulation steps).
      • $A$: The number of additions that would be performed in an equivalent ANN layer. The paper assigns an energy cost of 77 fJ per SOP.
  • Sign Operations:
    • Conceptual Definition: This refers to the operation performed by the LIF neuron to determine if a spike should be fired (i.e., checking if the membrane potential crosses a threshold and assigning a sign). The number of Sign operations is generally much smaller than the number of convolution operations.
    • Mathematical Formula: Not explicitly provided, but implicitly refers to the number of times a neuron's output state is determined based on its membrane potential crossing a threshold.
    • Symbol Explanation: Calculated from the number of LIF neurons and timesteps. The paper assigns an energy cost of 3.7 pJ per Sign operation.
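
Putting these per-operation costs together, a back-of-the-envelope energy estimate can be computed as follows (a sketch based only on the costs quoted above: 12.5 pJ per FLOP, 77 fJ per SOP, 3.7 pJ per Sign operation):

```python
def estimate_energy_uj(flops, sops, signs,
                       e_flop_pj=12.5, e_sop_fj=77.0, e_sign_pj=3.7):
    """Total energy in microjoules from operation counts and per-operation costs."""
    energy_pj = flops * e_flop_pj + sops * (e_sop_fj / 1000.0) + signs * e_sign_pj
    return energy_pj / 1e6  # pJ -> uJ
```

Applied to the operation counts reported later in Table 5, this formula reproduces the totals given there (see Section 6.1.5).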

5.3. Baselines

The paper compares its Ternary Spike and Trainable Ternary Spike methods against numerous state-of-the-art (SoTA) SNN models. These baselines represent leading approaches in both ANN-SNN conversion and direct SNN training, providing a comprehensive comparison of performance.

Common Architectures Used for Baselines:

  • VGG11/VGG16: Deep convolutional networks known for their simplicity and effectiveness in image classification.
  • ResNet18/ResNet19/ResNet20/ResNet34: Residual Networks, which use skip connections to enable the training of much deeper networks, are widely adopted as backbones in SNN research.
  • CIFARNet, PLIFNet, 7B-wideNet: Other specific SNN or ANN architectures used in prior works.

Key Baselines Mentioned:

  • ANN-SNN Conversion Methods:
    • SpikeNorm (Sengupta et al. 2019)
    • RMP (Han, Srinivasan, and Roy 2020)
  • Hybrid Training Methods:
    • Hybrid-Train (Rathi et al. 2020)
    • LTL (Yang et al. 2022)
  • Direct SNN Training Methods (most relevant for comparison):
    • TSSL-BP (Zhang and Li 2020)

    • TL (Wu et al. 2021b) / PTL (Wu et al. 2021c) - Tandem Learning methods

    • PLIF (Fang et al. 2021b) - Incorporates learnable membrane time constants.

    • DSR (Meng et al. 2022) - Differentiation on spike representation.

    • KDSNN (Xu et al. 2023) - Knowledge Distillation for SNNs.

    • Joint A-SNN (Guo et al. 2023b) - Joint training of ANNs and SNNs.

    • Diet-SNN (Rathi and Roy 2020) - Direct Input Encoding.

    • Dspike (Li et al. 2021b) - Differentiable spike for training.

    • STBP-tdBN (Zheng et al. 2021) - Spatio-temporal backpropagation with time-dependent Batch Normalization.

    • TET (Deng et al. 2022) - Temporal Efficient Training.

    • RecDis-SNN (Guo et al. 2022c) - Rectifying Membrane Potential Distribution.

    • RMP-Loss (Guo et al. 2023a) - Regularizing Membrane Potential Distribution.

    • IM-Loss (Guo et al. 2022a) - Information Maximization Loss.

    • Real Spike (Guo et al. 2022d) - Learning real-valued spikes.

    • MPBN (Guo et al. 2023c) - Membrane Potential Batch Normalization.

    • InfLoR-SNN (Guo et al. 2022b) - Reducing Information Loss.

    • SEW ResNet (Fang et al. 2021a) - Deep residual learning in SNNs.

    • OTTT (Xiao et al. 2022) - Online Training Through Time.

    • GLIF (Yao et al. 2022) - Gated Leaky Integrate-and-Fire Neuron.

      These baselines are representative because they cover a wide spectrum of recent advancements in SNN training, including approaches to handle information loss, optimize learning, and improve performance on various datasets and architectures. This allows the authors to position their work effectively within the current research landscape.

6. Results & Analysis

The experimental results demonstrate the effectiveness and efficiency of the proposed Ternary Spike and Trainable Ternary Spike methods across various datasets and network architectures.

6.1. Core Results Analysis

6.1.1. Ablation Study

The ablation study, conducted on the ImageNet dataset using ResNet18 and ResNet34 backbones with different timesteps, systematically evaluates the contribution of the ternary spike and trainable ternary spike components.

The following are the results from Table 1 of the original paper:

Architecture Method Time-step Accuracy
ResNet18 Binary spike 2 58.30%
ResNet18 Ternary spike 2 65.87%
ResNet18 Trainable ternary spike 2 66.40%
ResNet18 Binary spike 4 61.07%
ResNet18 Ternary spike 4 66.90%
ResNet18 Trainable ternary spike 4 67.68%
ResNet34 Binary spike 2 62.81%
ResNet34 Ternary spike 2 69.48%
ResNet34 Trainable ternary spike 2 69.51%
ResNet34 Binary spike 4 63.82%
ResNet34 Ternary spike 4 70.12%
ResNet34 Trainable ternary spike 4 70.74%

Analysis:

  • Impact of Ternary Spike: A significant performance boost is observed when moving from Binary spike to Ternary spike. For instance, with ResNet18 at 4 timesteps, accuracy jumps from 61.07% to 66.90% (a 5.83% absolute improvement). Similarly, for ResNet34 at 4 timesteps, accuracy increases from 63.82% to 70.12% (a 6.30% absolute improvement). This clearly validates the theoretical claim that increasing the information capacity via ternary spikes effectively mitigates information loss and improves task performance.
  • Impact of Trainable Ternary Spike: Further performance gains are achieved by incorporating the trainable ternary spike. For ResNet18 at 4 timesteps, accuracy rises from 66.90% to 67.68% (an additional 0.78% improvement). For ResNet34 at 4 timesteps, it moves from 70.12% to 70.74% (an additional 0.62% improvement). This demonstrates the benefit of allowing layer-wise spike amplitudes to adapt to different membrane potential distributions, further optimizing information representation.
  • Timestep Dependency: As expected in SNNs, increasing the number of timesteps generally leads to higher accuracy, as more temporal information can be processed. However, the ternary spike methods consistently show substantial improvements over binary spikes at both 2 and 4 timesteps, indicating their effectiveness regardless of the timestep budget.

6.1.2. Comparison with SoTA methods on CIFAR-10(100)

The paper compares its methods against numerous state-of-the-art SNNs on CIFAR-10 and CIFAR-100 datasets.

The following are the results from Table 2 of the original paper:

Dataset Method Type Architecture Timestep Accuracy
CIFAR-10 SpikeNorm (Sengupta et al. 2019) ANN2SNN VGG16 2500 91.55%
Hybrid-Train (Rathi et al. 2020) Hybrid training VGG16 200 92.02%
TSSL-BP (Zhang and Li 2020) SNN training CIFARNet 5 91.41%
TL (Wu et al. 2021b) Tandem Learning CIFARNet 8 89.04%
PTL (Wu et al. 2021c) Tandem Learning VGG11 16 91.24%
PLIF (Fang et al. 2021b) SNN training PLIFNet 8 93.50%
DSR (Meng et al. 2022) SNN training ResNet18 20 95.40%
KDSNN (Xu et al. 2023) SNN training ResNet18 4 93.41%
Joint A-SNN (Guo et al. 2023b) SNN training ResNet18 4 95.45%
Diet-SNN (Rathi and Roy 2020) SNN training ResNet20 10 92.54%
Dspike (Li et al. 2021b) SNN training ResNet20 2 93.13%
Dspike (Li et al. 2021b) SNN training ResNet20 4 93.66%
STBP-tdBN (Zheng et al. 2021) SNN training ResNet19 2 92.34%
STBP-tdBN (Zheng et al. 2021) SNN training ResNet19 4 92.92%
TET (Deng et al. 2022) SNN training ResNet19 2 94.16%
TET (Deng et al. 2022) SNN training ResNet19 4 94.44%
RecDis-SNN (Guo et al. 2022c) SNN training ResNet19 2 93.64%
RecDis-SNN (Guo et al. 2022c) SNN training ResNet19 4 95.53%
RMP-Loss (Guo et al. 2023a) SNN training ResNet19 2 95.31%
RMP-Loss (Guo et al. 2023a) SNN training ResNet19 4 95.51%
RMP (Han, Srinivasan, and Roy 2020) ANN2SNN ResNet20 2048 94.96%±0.10
Ternary Spike SNN training ResNet20 1 91.89%
Ternary Spike SNN training ResNet19 2 95.60%±0.09
Ternary Spike SNN training ResNet20 2 94.29%±0.08
Trainable Ternary Spike SNN training ResNet19 1 95.58%±0.08
Trainable Ternary Spike SNN training ResNet19 2 95.80%±0.10
Trainable Ternary Spike SNN training ResNet20 2 94.48%±0.09
Trainable Ternary Spike SNN training ResNet20 4 94.96%±0.10
CIFAR-100 Real Spike (Guo et al. 2022d) SNN training ResNet20 4 67.82%
LTL (Yang et al. 2022) Tandem Learning ResNet20 5 66.60%
Diet-SNN (Rathi and Roy 2020) SNN training ResNet20 31 76.08%
RecDis-SNN (Guo et al. 2022c) SNN training ResNet19 4 74.10%
Dspike (Li et al. 2021b) SNN training ResNet20 2 71.68%
TET (Deng et al. 2022) SNN training ResNet19 2 72.87%
TET (Deng et al. 2022) SNN training ResNet19 4 73.35%
Ternary Spike SNN training ResNet19 1 78.13%±0.11
Ternary Spike SNN training ResNet19 2 79.66%±0.08
Ternary Spike SNN training ResNet20 2 73.00%±0.08
Trainable Ternary Spike SNN training ResNet19 1 78.45%±0.08
Trainable Ternary Spike SNN training ResNet19 2 80.20%±0.10
Trainable Ternary Spike SNN training ResNet20 2 73.41%±0.12
Trainable Ternary Spike SNN training ResNet20 4 74.02%±0.08

Analysis:

  • CIFAR-10:
    • The Trainable Ternary Spike achieves a remarkable 95.80% ±0.10 accuracy with ResNet19 at only 2 timesteps. This surpasses previous state-of-the-art results like RecDis-SNN (95.53% at 4 timesteps) and RMP-Loss (95.51% at 4 timesteps) with fewer timesteps.
    • Even with 1 timestep, Trainable Ternary Spike reaches 95.58% ±0.08 (ResNet19), which is competitive with or better than many methods using 2-4 timesteps.
    • The Ternary Spike (non-trainable) also shows strong performance, e.g., 95.60% ±0.09 (ResNet19, 2 timesteps).
    • Compared to ANN-SNN conversion methods like SpikeNorm (91.55% at 2500 timesteps) or RMP (94.96% at 2048 timesteps), the proposed methods achieve higher accuracy with drastically fewer timesteps, highlighting superior efficiency.
  • CIFAR-100:
    • On the more challenging CIFAR-100 dataset, Trainable Ternary Spike achieves 80.20% ±0.10 with ResNet19 at 2 timesteps, which is a significant improvement over other methods. For example, TET achieves 73.35% at 4 timesteps, and RecDis-SNN achieves 74.10% at 4 timesteps.
    • Ternary Spike also performs very well, reaching 79.66% ±0.08 with ResNet19 at 2 timesteps.
    • This demonstrates that the proposed methods are effective not only for simpler datasets but also for more complex classification tasks.

6.1.3. Comparison with SoTA methods on ImageNet

ImageNet is a much larger and more complex dataset. The ability to perform well here is crucial for validating the scalability of the proposed methods.

The following are the results from Table 3 of the original paper:

Method Type Architecture Timestep Accuracy
STBP-tdBN (Zheng et al. 2021) SNN training ResNet34 6 63.72%
TET (Deng et al. 2022) SNN training ResNet34 6 64.79%
RecDis-SNN (Guo et al. 2022c) SNN training ResNet34 6 67.33%
OTTT (Xiao et al. 2022) SNN training ResNet34 6 65.15%
GLIF (Yao et al. 2022) SNN training ResNet34 4 67.52%
DSR (Meng et al. 2022) SNN training ResNet18 50 67.74%
IM-Loss (Guo et al. 2022a) SNN training ResNet18 6 67.43%
Real Spike (Guo et al. 2022d) SNN training ResNet18 4 63.68%
Real Spike (Guo et al. 2022d) SNN training ResNet34 4 67.69%
RMP-Loss (Guo et al. 2023a) SNN training ResNet18 4 63.03%
RMP-Loss (Guo et al. 2023a) SNN training ResNet34 4 65.17%
MPBN (Guo et al. 2023c) SNN training ResNet18 4 63.14%
MPBN (Guo et al. 2023c) SNN training ResNet34 4 64.71%
InfLoR-SNN (Guo et al. 2022b) SNN training ResNet18 4 64.78%
InfLoR-SNN (Guo et al. 2022b) SNN training ResNet34 4 65.54%
SEW ResNet (Fang et al. 2021a) SNN training ResNet18 4 63.18%
SEW ResNet (Fang et al. 2021a) SNN training ResNet34 4 67.04%
Ternary Spike SNN training ResNet18 4 66.90%±0.19
Ternary Spike SNN training ResNet34 4 70.12%±0.15
Trainable Ternary Spike SNN training ResNet18 4 67.68%±0.13
Trainable Ternary Spike SNN training ResNet34 4 70.74%±0.11

Analysis:

  • The Trainable Ternary Spike achieves 70.74% ±0.11 top-1 accuracy on ImageNet using ResNet34 with only 4 timesteps. This is a significant improvement of approximately 3% compared to other state-of-the-art SNN models, such as RecDis-SNN (67.33% at 6 timesteps), GLIF (67.52% at 4 timesteps), and Real Spike (67.69% at 4 timesteps).
  • Even the Ternary Spike (non-trainable) shows strong performance, reaching 70.12% ±0.15 with ResNet34 at 4 timesteps.
  • For ResNet18, Trainable Ternary Spike achieves 67.68% ±0.13 at 4 timesteps, outperforming many baselines, though DSR (67.74%) marginally surpasses it with 50 timesteps. However, DSR requires a much higher timestep count (50 vs. 4), making the Trainable Ternary Spike significantly more efficient in terms of latency and energy per inference.
  • These results highlight the scalability and robustness of the ternary spike approach for handling large-scale and complex image recognition tasks, indicating its potential for broader real-world applications.

6.1.4. Comparison with SoTA methods on CIFAR10-DVS

This dataset is critical for evaluating SNNs because it is event-based and dynamic, directly benefiting from the temporal processing capabilities of SNNs.

The following are the results from Table 4 of the original paper:

Method Type Architecture Timestep Accuracy
DSR (Meng et al. 2022) SNN training VGG11 20 77.27%
GLIF (Yao et al. 2022) SNN training 7B-wideNet 16 78.10%
STBP-tdBN (Zheng et al. 2021) SNN training ResNet19 10 67.80%
RecDis-SNN (Guo et al. 2022c) SNN training ResNet19 10 72.42%
Real Spike (Guo et al. 2022d) SNN training ResNet19 10 72.85%
Ternary Spike SNN training ResNet19 10 78.40%±0.21
Ternary Spike SNN training ResNet20 10 78.70%±0.17
Trainable Ternary Spike SNN training ResNet19 10 79.80%±0.16
Trainable Ternary Spike SNN training ResNet20 10 79.80%±0.19

Analysis:

  • On the CIFAR10-DVS dataset, both Ternary Spike and Trainable Ternary Spike achieve significantly higher accuracy compared to baselines.
  • The Trainable Ternary Spike reaches 79.80% ±0.16 (ResNet19, 10 timesteps) and 79.80% ±0.19 (ResNet20, 10 timesteps), approaching 80% accuracy. This is a substantial improvement over methods like DSR (77.27% at 20 timesteps), GLIF (78.10% at 16 timesteps), and Real Spike (72.85% at 10 timesteps).
  • The strong performance on this neuromorphic dataset further validates the effectiveness of the proposed methods in handling dynamic, event-based data, which is a key application area for SNNs.

6.1.5. Energy Estimation

The paper provides an energy cost comparison between Binary Spike and Ternary Spike for ResNet20 on CIFAR10 with 2 timesteps.

The following are the results from Table 5 of the original paper:

| Method | #Flops | #Sops | #Sign | Energy |
| :--- | :--- | :--- | :--- | :--- |
| Binary Spike | 3.54M | 71.20M | 0.11M | 50.14uJ |
| Ternary Spike | 3.54M | 79.21M | 0.23M | 51.20uJ |

Analysis:

  • FLOPs (Floating Point Operations): Both models have the same FLOP count (3.54M). These FLOPs come from the initial rate-encoding layer, which still performs real-valued (non-spike) multiply-accumulate operations and is identical in both models.
  • SOPs (Synaptic Operations): The Ternary Spike model performs slightly more SOPs (79.21M) than Binary Spike (71.20M). This is attributed to its higher firing rate (18.27% of spikes are non-zero, versus 16.42% for the binary spike); the ternary spike map is therefore less sparse, and the additional non-zero spikes (+1 or -1) trigger more synaptic operations.
  • Sign Operations: The Ternary Spike model performs roughly twice as many Sign operations (0.23M) as Binary Spike (0.11M). This is expected because the ternary neuron checks two thresholds ($V_{\mathrm{th}}$ and $-V_{\mathrm{th}}$) instead of one ($V_{\mathrm{th}}$); a minimal sketch of this two-threshold firing rule follows this list.
  • Total Energy: Despite the increases in SOPs and Sign operations, the total energy consumption for Ternary Spike (51.20uJ) is only marginally higher than Binary Spike (50.14uJ). This represents an increase of approximately 2.11% (calculated as (51.2050.14)/50.140.0211(51.20 - 50.14) / 50.14 \approx 0.0211).
  • Conclusion: The energy estimation demonstrates that the substantial accuracy gains achieved by the Ternary Spike (as shown in ablation studies and comparisons) come with a very minimal increase in energy consumption. This indicates that the proposed method is highly energy-efficient and maintains the core low-power advantage of SNNs while significantly boosting performance.
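
To make the two-threshold firing and the Sign-operation overhead concrete, the following is a minimal PyTorch-style sketch of a single ternary LIF update. The leak factor `tau`, the threshold `v_th`, and the hard reset gated by $(1 - |o^{t-1}|)$ follow the behaviour described in this analysis, but all names and exact dynamics are illustrative assumptions rather than the authors' released implementation.

```python
import torch

def ternary_lif_step(v, x, tau=0.5, v_th=1.0):
    """One illustrative ternary LIF update step.

    v : membrane potential carried over from the previous timestep
    x : weighted input at this timestep (accumulated with additions/subtractions
        only, since incoming spikes are in {-1, 0, 1})
    Returns the updated membrane potential and a ternary spike in {-1, 0, 1}.
    """
    # Integrate: leaky accumulation of the input.
    v = tau * v + x

    # Fire: two threshold comparisons (hence roughly twice the Sign operations
    # of a binary neuron) yield a spike value in {-1, 0, 1}.
    spike = (v >= v_th).float() - (v <= -v_th).float()

    # Reset: multiplying by (1 - |spike|) zeroes the potential wherever a +1 or
    # -1 spike was emitted and leaves it unchanged where the neuron stayed silent.
    v = v * (1.0 - spike.abs())
    return v, spike


# Toy check: one neuron above +v_th, one below -v_th, one silent.
v0 = torch.zeros(3)
x = torch.tensor([1.2, -1.5, 0.3])
v1, spike = ternary_lif_step(v0, x)
print(spike)  # tensor([ 1., -1.,  0.])
```

Because the emitted spike is still restricted to $\{-1, 0, 1\}$, the downstream weighted sum needs only additions, subtractions, and skipped operations, which is why the energy increase over the binary spike stays small.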

6.2. Ablation Studies / Parameter Analysis

The ablation study presented in Table 1 directly addresses the effectiveness of the model's components:

  • Effectiveness of Ternary Spike: The first step of the ablation shows a consistent and large improvement when switching from binary to ternary spikes. This validates the core hypothesis that increasing the spike's information cardinality (from 2 to 3 states) significantly enhances the SNN's representation capability and reduces information loss; the reported 5-6% accuracy jump is a strong indicator.

  • Effectiveness of Trainable Ternary Spike: The second step shows that adding the trainable amplitude factor yields further, albeit smaller, improvements (around 0.6-0.8%). This supports the authors' premise that membrane potential distributions vary across layers, so layer-wise adaptive spike amplitudes help fine-tune performance (a minimal sketch of this layer-wise amplitude follows the summary at the end of this subsection).

  • Hyper-parameter (timesteps): While not an explicit ablation, the results at 2 and 4 timesteps show that all methods generally improve with more timesteps, yet the Ternary Spike variants retain their lead over the binary spike at both settings. This suggests the benefit of ternary spikes is a fundamental improvement in information encoding rather than an artifact of a particular timestep count. The ability to reach high accuracy with very few timesteps (e.g., 1 or 2 on the CIFAR datasets) is a key advantage for real-time and low-power applications.

    In summary, the ablation study clearly validates the design choices of the proposed method, demonstrating that both the move to ternary spikes and the introduction of trainable amplitudes contribute positively to the overall performance of SNNs.
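
As referenced in the ablation discussion, the trainable variant can be read as scaling the emitted ternary spike by a per-layer learnable amplitude, giving outputs in $\{-\alpha, 0, \alpha\}$. The sketch below is a hypothetical PyTorch module written under that assumption; the class name, the initialization, and the omitted surrogate-gradient handling are illustrative and not taken from the paper's code.

```python
import torch
import torch.nn as nn

class TrainableTernarySpike(nn.Module):
    """Illustrative firing function with a learnable per-layer spike amplitude."""

    def __init__(self, v_th: float = 1.0):
        super().__init__()
        self.v_th = v_th
        # One learnable amplitude per layer, initialized to 1 (the vanilla ternary spike).
        self.alpha = nn.Parameter(torch.ones(1))

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        # Two threshold comparisons give a spike in {-1, 0, 1}; in practice a surrogate
        # gradient would be attached to this non-differentiable step (omitted here).
        spike = (v >= self.v_th).float() - (v <= -self.v_th).float()
        # Training-time output lives in {-alpha, 0, alpha}; at inference alpha can be
        # folded into the following layer's weights so only {-1, 0, 1} spikes remain.
        return self.alpha * spike


fire = TrainableTernarySpike()
print(fire(torch.tensor([1.3, -0.2, -1.7])))  # values in {-alpha, 0, alpha}
```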

7. Conclusion & Reflections

7.1. Conclusion Summary

This paper rigorously demonstrates a fundamental limitation of traditional Spiking Neural Networks (SNNs): their binary spike activation maps possess insufficient information capacity, leading to information loss and compromised accuracy. To address this, the authors successfully propose a novel ternary spike neuron that utilizes {1,0,1}\{-1, 0, 1\} values. This innovation is theoretically proven to significantly increase information capacity while crucially preserving the energy-efficient event-driven and multiplication-free (addition/subtraction-based) characteristics of SNNs.

Furthermore, recognizing the varying membrane potential distributions across different network layers, the paper introduces a trainable ternary spike neuron. This model embeds a learnable factor (α\alpha) to enable layer-wise adaptive spike amplitudes ({α,0,α}\{- \alpha, 0, \alpha\}), allowing SNNs to better model neuronal activity. For inference, a clever re-parameterization technique is employed to fold these learned factors into the network weights, converting the model back to a standard ternary spike SNN and thereby maintaining computational efficiency.
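
The folding step itself reduces to a simple algebraic identity. Assuming the learned amplitude $\alpha_l$ of layer $l$ scales every spike $s^t \in \{-1, 0, 1\}$ that feeds the next linear or convolutional layer, the factor can be absorbed into that layer's weights offline; the notation below is an illustrative reading of the re-parameterization, not the paper's exact formulation:

$$
W_{l+1}\left(\alpha_l\, s^t\right) \;=\; \left(\alpha_l\, W_{l+1}\right) s^t \;=\; \widetilde{W}_{l+1}\, s^t, \qquad \widetilde{W}_{l+1} := \alpha_l\, W_{l+1}.
$$

Once $W_{l+1}$ is replaced by $\widetilde{W}_{l+1}$, inference again operates on plain $\{-1, 0, 1\}$ spikes, so every multiplication degenerates into an addition, a subtraction, or a skipped operation, exactly as in the vanilla ternary SNN.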

Extensive experiments on diverse static (CIFAR10, CIFAR100, ImageNet) and dynamic (CIFAR10-DVS) datasets, using popular architectures, consistently show that both Ternary Spike and Trainable Ternary Spike models achieve state-of-the-art performance with high energy efficiency, often surpassing prior methods by a considerable margin.

7.2. Limitations & Future Work

While the paper presents a highly effective solution, it implicitly points to certain limitations and opens avenues for future work:

  • Increased Energy Consumption (minor): Although the ternary spike maintains the multiplication-free advantage, the energy estimation shows a slight increase (2.11%) in total energy consumption due to higher SOPs (more active spikes) and double the Sign operations (checking two thresholds). For extremely energy-constrained neuromorphic applications, even this small increase might be a consideration, prompting further optimization of ternary spike hardware implementations.
  • Complexity of Training $\alpha$: While the re-parameterization handles inference efficiency, the trainable factor $\alpha$ adds one learnable parameter per layer during training. The paper does not examine how sensitive training is to this factor, nor alternative ways of learning suitable amplitudes (e.g., under more complex optimization landscapes).
  • Beyond Ternary Spikes: The paper establishes the benefit of moving from binary to ternary. This naturally raises the question of whether higher-order spiking (e.g., quaternary, quinary, or even small integer-valued spikes) could yield further benefits, and at what point the energy efficiency advantages of SNNs would be significantly compromised. The boundary between a "spike" (discrete, efficient) and a "quantized activation" (continuous-like, less efficient) becomes blurred.
  • Theoretical bounds of information capacity: While the paper provides a sound theoretical analysis based on a $\log N$-style capacity measure, further exploration of the effective information carried by a real SNN (with non-uniform firing distributions and temporal dependencies) could be a complex but fruitful research direction (a simple counting version of this bound is sketched just after this list).
  • Hardware Implementation Details: The paper discusses the theoretical maintenance of efficiency but does not detail specific hardware considerations for implementing ternary spikes (e.g., how the {1,0,1}\{-1, 0, 1\} signals are physically transmitted and processed on neuromorphic chips compared to {0,1}\{0, 1\}).
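
As a rough counting illustration of the information-capacity bound mentioned two bullets above (and not the paper's exact derivation): a spike map with $N$ elements can take at most $2^N$ distinct binary patterns but $3^N$ ternary ones, so the upper bound on the information it can carry grows from

$$
\log_2 2^N = N \ \text{bits} \qquad \text{to} \qquad \log_2 3^N = N \log_2 3 \approx 1.585\, N \ \text{bits}.
$$

The effective information in a trained network, whose firing statistics are non-uniform and correlated over time, will fall below either bound, which is precisely the open question raised above.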

7.3. Personal Insights & Critique

This paper offers a compelling and intuitively sound solution to a fundamental problem in SNNs. The core insight that binary spikes are inherently information-starved is well-supported both theoretically and experimentally. The move to ternary spikes is a logical next step in enhancing representation capability without abandoning the efficiency paradigm.

Inspirations & Applications:

  • Bridging the SNN-ANN Gap: The substantial accuracy improvements achieved, especially on large-scale datasets like ImageNet, are crucial for making SNNs competitive with ANNs. This work paves the way for SNNs to be deployed in more complex, real-world applications where high accuracy is paramount.
  • Biologically Plausible Learning: The idea of learnable spike amplitudes is fascinating. Biological neurons exhibit complex firing behaviors and signal modulation. Allowing SNN layers to adapt their spike magnitudes might be a step towards more biologically realistic and efficient learning mechanisms beyond simple binary communication.
  • Generalizable Paradigm: The ternary spike and trainable factor concepts appear highly generalizable. They are not tied to a specific SNN architecture or dataset type, suggesting they could be applied across various SNN models and tasks.
  • Future of Low-Power AI: As AI demands grow, the need for energy-efficient hardware and algorithms becomes critical. This paper demonstrates that significant performance gains in SNNs can be achieved with only marginal energy cost increases, reinforcing the potential of SNNs for sustainable AI.

Potential Issues & Areas for Improvement:

  • Optimality of Ternary: While ternary is better than binary, why {1,0,1}\{-1, 0, 1\} specifically? Could other ternary sets (e.g., {C,0,C}\{-C, 0, C\} for some non-unity CC) or even a small set of learned discrete spike values (beyond just scaling {1,0,1}\{-1, 0, 1\}) offer further advantages? The choice of {1,0,1}\{-1, 0, 1\} is elegant because it maintains the addition/subtraction benefit, but exploring broader discrete sets with careful hardware considerations could be interesting.

  • The Reset Mechanism: The use of (1ot1)(1 - |o^{t-1}|) for the reset term is clever. It ensures reset for both positive and negative spikes. However, the precise biological interpretation of a "negative spike" and its reset mechanism might warrant further discussion, especially for truly bio-inspired SNNs.

  • Comparison Fairness (Timesteps): While the paper's results are very strong, some baselines are run with far more timesteps than the proposed method (e.g., DSR on ImageNet with 50 timesteps versus 4 for Ternary Spike). Such comparisons are valid and underscore the efficiency of the proposed method, but for a pure accuracy comparison the timestep disparity should be kept in mind. To its credit, the paper consistently uses low timesteps for its own models.

  • Robustness to Noise: The paper does not specifically discuss the robustness of ternary spikes to noise, especially on neuromorphic hardware, which can be prone to variability. A ternary signal might be slightly more susceptible to noise than a purely binary (on/off) signal if the distinction between -1, 0, and 1 becomes blurred.

    Overall, this work is a substantial contribution to SNN research, offering a practical and theoretically grounded approach to enhancing SNN performance while staying true to their energy-efficient principles. It is likely to inspire further research into alternative spike representations and adaptive SNN learning mechanisms.
