Ternary Spike: Learning Ternary Spikes for Spiking Neural Networks
TL;DR Summary
This study introduces a ternary spike neuron to address the limited information capacity of binary spikes in spiking neural networks. Utilizing values of {-1, 0, 1} enhances information capacity while maintaining the event-driven, multiplication-free advantages, with experiments showing consistent state-of-the-art accuracy on both static and dynamic datasets.
Abstract
The Spiking Neural Network (SNN), as one of the biologically inspired neural network infrastructures, has drawn increasing attention recently. It adopts binary spike activations to transmit information, thus the multiplications of activations and weights can be substituted by additions, which brings high energy efficiency. However, in the paper, we theoretically and experimentally prove that the binary spike activation map cannot carry enough information, thus causing information loss and resulting in accuracy decreasing. To handle the problem, we propose a ternary spike neuron to transmit information. The ternary spike neuron can also enjoy the event-driven and multiplication-free operation advantages of the binary spike neuron but will boost the information capacity. Furthermore, we also embed a trainable factor in the ternary spike neuron to learn the suitable spike amplitude, thus our SNN will adopt different spike amplitudes along layers, which can better suit the phenomenon that the membrane potential distributions are different along layers. To retain the efficiency of the vanilla ternary spike, the trainable ternary spike SNN will be converted to a standard one again via a re-parameterization technique in the inference. Extensive experiments with several popular network structures over static and dynamic datasets show that the ternary spike can consistently outperform state-of-the-art methods. Our code is open-sourced at https://github.com/yfguo91/Ternary-Spike.
Mind Map
In-depth Reading
English Analysis
1. Bibliographic Information
1.1. Title
Ternary Spike: Learning Ternary Spikes for Spiking Neural Networks
1.2. Authors
The authors are Yufei Guo, Yuanpei Chen, Xiaode Liu, Weihang Peng, Yuhan Zhang, Xuhui Huang, and Zhe Ma. Their affiliations are with the Intelligent Science & Technology Academy of CASIC, China.
1.3. Journal/Conference
This paper was published on arXiv, which is a preprint server, meaning it has not necessarily undergone peer review by a journal or conference. arXiv is widely used by researchers to rapidly disseminate their work before formal publication.
1.4. Publication Year
The paper was published on December 11, 2023, at 13:28:54 UTC.
1.5. Abstract
The paper addresses a significant challenge in Spiking Neural Networks (SNNs), which are brain-inspired models known for their energy efficiency due to binary spike activations. The authors theoretically and experimentally demonstrate that these binary spikes have limited information capacity, leading to information loss and reduced accuracy. To counteract this, they propose a ternary spike neuron which utilizes spike values {-1, 0, 1} instead of the traditional {0, 1}. This ternary spike neuron is shown to significantly boost information capacity while retaining the crucial event-driven and multiplication-free (addition-based) advantages of binary SNNs. Furthermore, the paper introduces a trainable factor within the ternary spike neuron to allow different layers to learn suitable, distinct spike amplitudes (i.e., {-a, 0, a}). This adaptive approach better accommodates the varying membrane potential distributions observed across different network layers. To maintain efficiency during inference, the trainable ternary spike SNN is converted back to a standard ternary spike SNN through a re-parameterization technique. Extensive experiments across various network architectures and both static (CIFAR10, CIFAR100, ImageNet) and dynamic (CIFAR10-DVS) datasets demonstrate that the ternary spike consistently achieves state-of-the-art performance. The authors have open-sourced their code.
1.6. Original Source Link
https://arxiv.org/abs/2312.06372v2 (Publication Status: Preprint)
1.7. PDF Link
https://arxiv.org/pdf/2312.06372v2.pdf
2. Executive Summary
2.1. Background & Motivation
Spiking Neural Networks (SNNs) represent a promising "next-generation" neural network architecture, drawing inspiration from the biological brain's energy-efficient information processing. Unlike traditional Artificial Neural Networks (ANNs) that use continuous, real-valued activations, SNNs communicate through discrete, sparse binary spikes (typically 0 or 1). This event-driven, binary nature allows SNNs to replace computationally expensive multiplications with simpler additions, leading to significant energy savings, especially on specialized neuromorphic hardware.
However, the core problem the paper identifies is that this very advantage—the use of binary spike activations—becomes a bottleneck for performance. The binary spike activation maps (i.e., the patterns of 0s and 1s transmitted between neurons) are shown to have a severely limited information capacity. This limitation means they cannot adequately capture and transmit all the necessary information derived from the continuous membrane potentials (internal electrical states) of neurons, leading to information loss. This loss ultimately results in decreased task accuracy when SNNs are applied to complex problems like image recognition, hindering their widespread adoption compared to ANNs.
Furthermore, the paper highlights another overlooked issue: the membrane potential distributions (the range and frequency of internal electrical states) vary significantly across different layers of an SNN. Prior work typically quantizes these diverse membrane potentials into the same fixed spike values (e.g., always 0 or 1), which the authors argue is "unnatural" and suboptimal.
The paper's entry point and innovative idea is to address these two problems by:
- Expanding the spike representation: Moving beyond binary spikes to ternary spikes ({-1, 0, 1}) to directly increase information capacity.
- Introducing adaptability: Allowing the spike amplitude to be learned on a layer-wise basis, thereby adapting to the unique membrane potential distributions of each layer.
2.2. Main Contributions / Findings
The paper makes several significant contributions to the field of Spiking Neural Networks:
- Theoretical and Experimental Proof of Binary Spike Limitation: The authors rigorously prove, both theoretically using information entropy and through experimental analysis, that binary spike activation maps fundamentally lack sufficient information capacity. This limited capacity causes significant information loss during the quantization of membrane potentials, directly contributing to accuracy degradation in SNNs.
- Proposal of the Ternary Spike Neuron: To address the information loss problem, the paper introduces a novel ternary spike neuron. This neuron transmits information using {-1, 0, 1} spikes. Crucially, it is demonstrated that this ternary spike significantly boosts the information capacity compared to binary spikes while fully retaining the key advantages of SNNs: event-driven computation (neurons only activate when the membrane potential crosses a threshold) and multiplication-free operations (multiplications can be replaced by additions or subtractions). This represents a new paradigm for spike neurons.
- Development of a Trainable Ternary Spike Neuron: Recognizing that membrane potential distributions differ across layers, the authors extend the concept to a learnable ternary spike neuron. This version allows the spike magnitude (represented by a factor $a$) to be learned during the training phase, resulting in layer-wise spike amplitudes of {-a, 0, a}. This adaptive approach enables the SNN to better suit the unique characteristics of each layer. For efficient inference, a re-parameterization technique is employed to fold the learned factor into the network weights, converting the trainable ternary spikes back to standard ternary spikes (effectively {-1, 0, 1} after weight scaling) and thus preserving the addition-mostly computational advantage.
- Extensive Experimental Validation: The proposed methods (Ternary Spike and Trainable Ternary Spike) are thoroughly evaluated on a wide range of popular network architectures (e.g., ResNet20, ResNet19, ResNet18, ResNet34) and diverse datasets, including static image datasets (CIFAR10, CIFAR100, ImageNet) and a dynamic event-based dataset (CIFAR10-DVS). The experimental results consistently demonstrate that the Ternary Spike significantly outperforms state-of-the-art SNN models, often achieving substantial accuracy improvements (e.g., ~3% higher top-1 accuracy on ImageNet with ResNet34 and 4 timesteps) while maintaining high energy efficiency.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To understand the paper, a foundational grasp of neural networks, information theory, and the specifics of Spiking Neural Networks is essential.
3.1.1. Artificial Neural Networks (ANNs)
Artificial Neural Networks (ANNs) are computational models inspired by the structure and function of biological neural networks. They consist of interconnected nodes (neurons) organized in layers. Each connection has a weight, and each neuron applies an activation function to the weighted sum of its inputs to produce an output. ANNs have achieved remarkable success in various fields, but they typically use continuous, real-valued activations and perform numerous multiplications, leading to high computational and energy costs, especially for deep and complex models.
3.1.2. Spiking Neural Networks (SNNs)
Spiking Neural Networks (SNNs) are considered the third generation of neural networks, designed to mimic the brain's event-driven, sparse, and asynchronous communication more closely than ANNs.
- Binary Spike Activations: Unlike ANNs, SNN neurons communicate by firing discrete, binary electrical impulses called spikes (typically represented as 0 for no spike and 1 for a spike).
- Event-Driven Computation: Computation in SNNs is event-driven, meaning that neurons only process information and consume energy when a spike occurs. If a neuron does not fire, it remains "silent," saving energy.
- Multiplication-Free Operations: Because activations are binary (0 or 1), the multiplication of an activation by a weight ($1 \times w$) can be simplified to an addition ($0 + w$) if a spike occurs, or no operation at all if no spike occurs ($0 \times w = 0$). This replacement of multiplications with additions is a primary source of SNNs' high energy efficiency.
- Neuromorphic Hardware: SNNs are particularly well-suited for specialized neuromorphic hardware, which is designed to process spike-based information efficiently, further enhancing energy savings.
3.1.3. Information Entropy
Information entropy, a concept from information theory, measures the average level of "surprise," "uncertainty," or "information content" in a random variable. In the context of data representation, a higher entropy value indicates that a system can represent a wider variety of states or carry more information.
- Representation Capability: The paper uses representation capability, $\mathcal{R}(S)$, to quantify how much information a set of samples can encode. It is directly linked to the maximum information entropy, $\max \mathcal{H}(S)$.
- Formula for Information Entropy: For a discrete random variable $S$ with $N$ possible outcomes $s_i$ and their respective probabilities $p_S(s_i)$, the entropy is defined as:
$
\mathcal{H}(S) = - \sum_{i=1}^{N} p_S(s_i) \log_2 p_S(s_i)
$
  - Symbol Explanation:
    - $\mathcal{H}(S)$: The information entropy of the random variable $S$.
    - $N$: The total number of distinct possible outcomes (samples) for $S$.
    - $s_i$: The $i$-th distinct outcome (sample) of $S$.
    - $p_S(s_i)$: The probability of the $i$-th outcome occurring.
    - $\log_2$: The logarithm base 2, implying that entropy is measured in bits.
- Maximizing Entropy: Entropy is maximized when all outcomes are equally probable. For $N$ equally probable outcomes, $p_S(s_i) = 1/N$ for all $i$, and the maximum entropy becomes $\log_2 N$. The paper states this as Proposition 1. A small numerical check is sketched below.
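To make Proposition 1 concrete, here is a minimal Python sketch (not from the paper) that computes the entropy of a uniform and a skewed distribution; it illustrates that binary codes cap at 1 bit per element while ternary codes cap at $\log_2 3 \approx 1.585$ bits.

```python
import numpy as np

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution (zero-probability terms dropped)."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# A uniform distribution over N outcomes reaches the maximum entropy log2(N) (Proposition 1).
for N in (2, 3):
    uniform = np.full(N, 1.0 / N)
    skewed = np.array([0.9] + [0.1 / (N - 1)] * (N - 1))
    print(N, entropy_bits(uniform), np.log2(N), entropy_bits(skewed))
```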
3.1.4. Leaky-Integrate-and-Fire (LIF) Neuron Model
The Leaky-Integrate-and-Fire (LIF) neuron model is a widely used, biologically plausible model for SNNs. It simplifies the complex dynamics of biological neurons into a set of differential or iterative equations.
- Mechanism:
  - Integrate: The neuron accumulates incoming synaptic currents, which increase its internal state called the membrane potential ($u$).
  - Leak: Over time, if no inputs are received, the membrane potential leaks (decays) back towards its resting state.
  - Fire: If the membrane potential reaches a predefined firing threshold ($V_{\mathrm{th}}$), the neuron fires a spike (outputs 1).
  - Reset: After firing, the membrane potential is reset (typically to 0 or a resting potential), making the neuron temporarily unable to fire again immediately.
- Iterative Model (Equations 1-3 from paper):
The standard iterative LIF model is governed by:
$
u^{\mathrm{before}, t} = \left(1 - \frac{dt}{\tau}\right) u^{t-1} + \frac{dt}{\tau} R \cdot I^t
$
$
u^t = \left\{ \begin{array}{ll} 0, & \mathrm{if~} u^{\mathrm{before},t} \ge V_{\mathrm{th}} \\ u^{\mathrm{before},t}, & \mathrm{otherwise} \end{array} \right.
$
$
o^t = \left\{ \begin{array}{ll} 1, & \mathrm{if~} u^{\mathrm{before},t} \ge V_{\mathrm{th}} \\ 0, & \mathrm{otherwise} \end{array} \right.
$
  - Symbol Explanation:
    - $u^{\mathrm{before},t}$: Membrane potential before firing at timestep $t$.
    - $u^t$: Membrane potential after firing/reset at timestep $t$.
    - $o^t$: Output spike at timestep $t$.
    - $u^{t-1}$: Membrane potential at the previous timestep $t-1$.
    - $dt$: Time constant for the simulation step.
    - $\tau$: Membrane time constant, influencing the leak rate.
    - $R \cdot I^t$: Charging voltage from input current $I^t$ and resistance $R$.
    - $V_{\mathrm{th}}$: Firing threshold.
The paper simplifies the input term $\frac{dt}{\tau} R \cdot I^t$ to $\sum_j w_j o_{j,\mathrm{pre}}^t$, the weighted sum of spikes from the previous layer, and absorbs the decay factor $(1 - \frac{dt}{\tau})$ into a single leakage constant, also denoted $\tau$. This leads to the simplified iterative model used in the paper:
$
u^t = \tau u^{t-1} (1 - o^{t-1}) + \sum_j w_j o_{j,\mathrm{pre}}^t
$
$
o^t = \left\{ \begin{array}{ll} 1, & \mathrm{if~} u^t \geq V_{\mathrm{th}} \\ 0, & \mathrm{otherwise} \end{array} \right.
$
  - Symbol Explanation (simplified model):
    - $u^t$: Membrane potential at timestep $t$.
    - $o^t$: Output spike (0 or 1) at timestep $t$.
    - $\tau$: Leakage factor (simplified from $1 - \frac{dt}{\tau}$), a constant between 0 and 1.
    - $u^{t-1}$: Membrane potential at the previous timestep $t-1$.
    - $(1 - o^{t-1})$: This term ensures that if a spike was fired at $t-1$ ($o^{t-1} = 1$), the membrane potential is reset to 0. If no spike was fired ($o^{t-1} = 0$), the potential leaks to $\tau u^{t-1}$.
    - $w_j$: Weight connecting the $j$-th neuron of the previous layer to the current neuron.
    - $o_{j,\mathrm{pre}}^t$: Binary spike output from the $j$-th neuron of the previous layer at timestep $t$.
    - $V_{\mathrm{th}}$: Firing threshold.
3.2. Previous Works
The paper organizes related work into Learning Methods of Spiking Neural Networks and Information Loss in Spiking Neural Networks.
3.2.1. Learning Methods of Spiking Neural Networks
- ANN-SNN Conversion: This common approach involves training a traditional ANN first and then converting its parameters to an SNN.
- Principle: It maps ANN activations to SNN average firing rates.
- Advantages: Often simpler to implement and can achieve high accuracy (approaching the ANN's performance) because ANN training is more established.
- Deficiencies (as pointed out by the paper):
    - Rate-coding Limitation: Primarily relies on rate-coding (information encoded in firing rates), ignoring the richer temporal dynamic behaviors of SNNs, making it less suitable for neuromorphic datasets (event-based data like DVS cameras).
    - High Timesteps: Usually requires a large number of timesteps (simulation steps) to match ANN accuracy, which increases energy consumption and contradicts SNNs' low-power goal.
    - Accuracy Ceiling: SNN accuracy cannot exceed the original ANN accuracy, limiting its potential.
  - Examples: SpikeNorm (Sengupta et al. 2019), RMP (Han, Srinivasan, and Roy 2020), SpikeConverter (Liu et al. 2022).
- Direct Training from Scratch: This method involves training SNNs directly using backpropagation with surrogate gradients (a technique to approximate the non-differentiable spiking function; a brief sketch of such a surrogate follows after this list).
neuromorphic datasetsand can achieve high performance with very few timesteps (sometimes <5), leading to higher energy efficiency. - Examples:
STBP-tdBN(Zheng et al. 2021),TET(Deng et al. 2022),Dspike(Li et al. 2021b),SEW ResNet(Fang et al. 2021a).
- Advantages: Better suited for
- Hybrid Learning: Combines elements of both ANN-SNN conversion and direct training.
- Examples:
Hybrid-Train(Rathi and Roy 2020).
- Examples:
- Paper's Focus: This paper focuses on improving the performance of
directly training-based SNNsby addressing theinformation lossproblem.
3.2.2. Information Loss in Spiking Neural Networks
The paper highlights a specific area of research dedicated to mitigating information loss in SNNs, an area it aims to advance.
- InfLoR-SNN (Guo et al. 2022b): Proposed a membrane potential rectifier to adjust membrane potentials closer to the quantization spikes, reducing quantization error.
- RMP-Loss (Guo et al. 2023a): Used a loss function to adjust membrane potentials to reduce quantization error, similar in idea to InfLoR-SNN.
- IM-Loss (Guo et al. 2022a): Argued that the information entropy of activations could be maximized to reduce information loss, proposing an information maximization loss function.
- RecDis-SNN (Guo et al. 2022c): Introduced a loss to penalize undesired shifts in membrane potential distributions, aiming for a bimodal distribution which helps mitigate information loss.
- MT-SNN (Wang, Zhang, and Zhang 2023): Proposed a multiple threshold (MT) algorithm for LIF neurons to partially recover information lost during quantization.
- Common Limitation of Prior Work: The paper points out that all these prior works, despite their efforts to reduce information loss, still quantize membrane potentials to binary spikes (0 or 1) and ignore the differences in membrane potential distributions along layers.
- Alternative Ternary Spike: The paper also briefly mentions another ternary spike neuron proposed in (Sun et al. 2022) which uses a different set of ternary spike values. The key distinction is that this alternative cannot benefit from the multiplication-addition transform (multiplying a weight by a spike whose magnitude is not 1 remains a genuine multiplication), losing a core energy efficiency advantage of SNNs. This paper's proposed ternary spike, using {-1, 0, 1}, does retain this advantage.
3.3. Technological Evolution
The field of neural networks has evolved from traditional ANNs, which prioritize accuracy but consume significant power, towards more energy-efficient models. This evolution includes techniques like quantization (reducing the precision of weights/activations), pruning (removing redundant connections), and knowledge distillation (transferring knowledge from a large model to a smaller one). SNNs emerged as a distinct path, seeking biological plausibility and extreme energy efficiency by adopting spike-based communication. Early SNN research focused on biological modeling, then shifted to achieving competitive accuracy on benchmark tasks, often relying on ANN-SNN conversion. More recently, direct training methods have gained prominence for their ability to achieve high accuracy with fewer timesteps and better suitability for dynamic neuromorphic data. Within this direct training paradigm, a critical challenge has been information loss due to the binary nature of spikes. This paper's work represents an advancement within this lineage, pushing beyond binary spikes to ternary spikes and introducing layer-wise adaptability to further close the performance gap between SNNs and ANNs while maintaining efficiency.
3.4. Differentiation Analysis
Compared to the main methods in related work, this paper's approach offers several core differences and innovations:
- Increased Information Capacity via Ternary Spikes: Unlike prior works that primarily focused on refining the binary spiking process (e.g., adjusting membrane potentials, thresholds, or loss functions to minimize error within binary quantization), this paper fundamentally alters the spike representation itself by moving from binary to ternary. This directly and theoretically boosts the information capacity per spike, a novel approach to addressing information loss.
- Preservation of SNN Advantages: Crucially, the proposed ternary spike is designed such that it still retains the event-driven and multiplication-free (convertible to addition/subtraction) advantages that are fundamental to SNNs' energy efficiency. This differentiates it from other non-binary spiking schemes (like the one mentioned in related work) that might sacrifice the multiplication-free benefit.
- Layer-wise Adaptive Spike Amplitudes: The paper introduces a learnable factor to create trainable ternary spikes of magnitudes {-a, 0, a}. This is a significant innovation because it directly addresses the observation that membrane potential distributions vary across different layers. Previous works largely ignored this heterogeneity, applying a uniform quantization scheme. By allowing layer-specific spike amplitudes, the model can better adapt to and represent the unique firing characteristics of each layer.
- Re-parameterization for Efficiency: To ensure that the learnable spike amplitudes do not compromise inference efficiency (by reintroducing multiplications), the paper utilizes a re-parameterization technique. This allows the learned values to be folded into the weights during inference, effectively converting the network back into a standard ternary spike SNN that enjoys multiplication-free operations. This training-inference decoupling maintains the benefits of adaptive learning without incurring runtime overhead.

In essence, this paper proposes a more expressive and adaptive spiking mechanism that tackles information loss at its source (the spike representation) and adapts to network heterogeneity, all while carefully preserving the inherent energy efficiency principles of SNNs.
4. Methodology
4.1. Principles
The core principle behind the proposed method is that the limited information capacity of binary spikes (0 or 1) is a primary bottleneck for the accuracy of Spiking Neural Networks (SNNs). By expanding the possible spike values from binary to ternary (i.e., {-1, 0, 1}), the information carried by each spike is significantly increased. This directly mitigates information loss during the quantization of membrane potentials. Furthermore, the paper posits that a static, uniform quantization to the same spike values across all layers is suboptimal because membrane potential distributions differ substantially between layers. To address this, learnable spike amplitudes are introduced, allowing each layer to dynamically adjust its output spike magnitude, thus better adapting to its unique membrane potential characteristics. Crucially, these enhancements are designed to maintain the energy-efficient event-driven and multiplication-free (or addition-only) advantages that define SNNs.
4.2. Core Methodology In-depth (Layer by Layer)
4.2.1. Information Loss in Spiking Neural Networks
The paper begins by theoretically justifying the claim that binary spike activation maps suffer from limited information capacity, leading to information loss and decreased accuracy. This justification uses the concept of information entropy.
Theoretical Analysis using Information Entropy:
The representation capability, $\mathcal{R}(\mathbf{S})$, of a set of samples $\mathbf{S}$ can be measured by the maximum information entropy, $\max \mathcal{H}(\mathbf{S})$. The general formula for $\mathcal{R}(\mathbf{S})$ is given by:
$
\mathcal{R}(\mathbf{S}) = \operatorname*{max} \mathcal{H}(\mathbf{S}) = \operatorname*{max} \left( - \sum_{s \in \mathbf{S}} p_{\mathbf{S}}(s) \log p_{\mathbf{S}}(s) \right)
$
Where:
- $\mathcal{R}(\mathbf{S})$: The representation capability of the set $\mathbf{S}$.
- $\mathcal{H}(\mathbf{S})$: The information entropy of the set $\mathbf{S}$.
- $s$: A sample from the set $\mathbf{S}$.
- $p_{\mathbf{S}}(s)$: The probability of a specific sample $s$ occurring from $\mathbf{S}$.
- $\log$: The logarithm (implicitly base 2 for bits).

The paper then presents Proposition 1, which states:
Proposition 1: When $p_{\mathbf{S}}(s_1) = p_{\mathbf{S}}(s_2) = p_{\mathbf{S}}(s_3) = \cdots = p_{\mathbf{S}}(s_N)$, $\mathcal{H}(\mathbf{S})$ reaches its maximum, $\log_2 N$. Here, $N$ denotes the total number of distinct samples (or states) possible from $\mathbf{S}$.
Using this proposition, the authors calculate the representation capability for both binary spike feature maps and real-valued membrane potential maps.
- Binary Spike Feature Map: Let $\mathbf{F}_B$ denote a binary feature map. Each individual binary spike output can be one of 2 values ({0, 1}), representing 1 bit of information. For a feature map of size $C \times H \times W$ (Channels × Height × Width), the total number of distinct samples (possible states) is $2^{(C \times H \times W)}$. Therefore, its representation capability is: $ \mathcal{R}(\mathbf{F}_B) = \log_2 2^{(C \times H \times W)} = C \times H \times W $
- Real-valued Membrane Potential Map: Let $\mathbf{M}_R$ denote a real-valued membrane potential map. Assuming each real-valued potential requires 32 bits (standard floating-point precision), this corresponds to $2^{32}$ possible samples per potential. For a feature map of size $C \times H \times W$, the total number of distinct samples is $2^{32 \times (C \times H \times W)}$. Its representation capability is:
$
\mathcal{R}(\mathbf{M}_R) = \log_2 2^{32 \times (C \times H \times W)} = 32 \times C \times H \times W
$
The comparison clearly shows that $\mathcal{R}(\mathbf{M}_R)$ is 32 times greater than $\mathcal{R}(\mathbf{F}_B)$. This significant difference theoretically demonstrates that quantizing real-valued membrane potentials to binary spikes causes excessive information loss. The paper also notes that increasing timesteps in SNNs, which is known to improve accuracy, implicitly increases the representation capability by accumulating spike information over time, thus aligning with this information-theoretic view.
4.2.2. Ternary Spike Neuron Model
To address the identified information loss while preserving SNN's energy efficiency, the paper proposes a ternary LIF spike neuron. This neuron extends the output spike values from binary {0, 1} to ternary {-1, 0, 1}.
The iterative model for the ternary LIF spike neuron is given by:
$
u^t = \tau u^{t-1} (1 - |o^{t-1}|) + \sum_j w_j o_{j,\mathrm{pre}}^t
$
$
o^t = \left\{ \begin{array}{ll} 1, & \mathrm{if~} u^t \geq V_{\mathrm{th}} \\ -1, & \mathrm{if~} u^t \leq -V_{\mathrm{th}} \\ 0, & \mathrm{otherwise} \end{array} \right.
$
Where:
- $u^t$: Membrane potential at timestep $t$.
- $o^t$: Output spike (either -1, 0, or 1) at timestep $t$.
- $\tau$: Leakage factor (a constant between 0 and 1).
- $u^{t-1}$: Membrane potential at the previous timestep $t-1$.
- $|o^{t-1}|$: The absolute value of the output spike from the previous timestep. This term is crucial for the reset mechanism:
  - If a spike was fired at $t-1$ (either $o^{t-1} = 1$ or $o^{t-1} = -1$), then $|o^{t-1}| = 1$. The term $(1 - |o^{t-1}|)$ becomes 0, effectively resetting the membrane potential to 0 before integrating new inputs.
  - If no spike was fired at $t-1$ ($o^{t-1} = 0$), then $|o^{t-1}| = 0$. The term $(1 - |o^{t-1}|)$ becomes 1, allowing the membrane potential to leak by $\tau$.
- $w_j$: Weight connecting the $j$-th neuron of the previous layer to the current neuron.
- $o_{j,\mathrm{pre}}^t$: Ternary spike output from the $j$-th neuron of the previous layer at timestep $t$.
- $V_{\mathrm{th}}$: Positive firing threshold.
- $-V_{\mathrm{th}}$: Negative firing threshold. If the membrane potential drops below this, a negative spike is fired.
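To make the dynamics concrete, here is a minimal NumPy sketch of one timestep of the ternary LIF neuron described above; the values of `tau` and `v_th` are illustrative assumptions.

```python
import numpy as np

def ternary_lif_step(u_prev, o_prev, pre_spikes, weights, tau=0.5, v_th=1.0):
    """One timestep of the ternary LIF neuron: reset on |spike|, leak, integrate, fire in {-1, 0, 1}."""
    u = tau * u_prev * (1.0 - abs(o_prev)) + np.dot(weights, pre_spikes)
    if u >= v_th:
        o = 1.0
    elif u <= -v_th:
        o = -1.0
    else:
        o = 0.0
    return u, o

# Toy usage: presynaptic ternary spikes drive one neuron for a few timesteps.
w = np.array([0.7, -0.9, 0.4])
u, o = 0.0, 0.0
for t, pre in enumerate([[1, 0, 1], [0, -1, 0], [1, -1, 1]]):
    u, o = ternary_lif_step(u, o, np.array(pre, dtype=float), w)
    print(t, round(u, 3), o)
```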
4.2.2.1. Representation Capacity Improvement of Ternary Spike Neuron
The authors argue that firing ternary spikes significantly increases the representation capacity of SNNs.
Let $\mathbf{F}_T$ denote a ternary feature map, where each element belongs to {-1, 0, 1}. Each ternary spike output can be one of 3 values. Thus, for a feature map of size $C \times H \times W$, the total number of distinct samples is $3^{(C \times H \times W)}$. According to Proposition 1, its representation capability is:
$
\mathcal{R}(\mathbf{F}_T) = \log_2 3^{(C \times H \times W)} = (C \times H \times W) \log_2 3
$
Since $\log_2 3 \approx 1.585$, the ternary spike feature map has approximately 1.585 times the representation capacity of a binary spike feature map (where $\mathcal{R}(\mathbf{F}_B) = C \times H \times W$). This theoretical improvement indicates enhanced information expressiveness and potential for better performance.
4.2.2.2. Event-driven and Addition-only Advantages Retaining
The paper emphasizes that the proposed ternary spike neuron retains the critical energy-efficiency advantages of vanilla SNNs:
- Event-driven Characteristic: Similar to binary SNNs, the ternary spike neuron is event-driven. It only activates and performs computations if its membrane potential exceeds the positive threshold $V_{\mathrm{th}}$ (firing a +1 spike) or drops below the negative threshold $-V_{\mathrm{th}}$ (firing a -1 spike). If the potential stays between $-V_{\mathrm{th}}$ and $V_{\mathrm{th}}$, it remains silent (fires a 0 spike), conserving energy.
- Multiplication-addition Transform: The core energy-saving mechanism of SNNs is replacing weight-activation multiplications with additions/subtractions.
  - For a binary spike neuron: If a spike is fired ($o = 1$), the operation is $x = 1 \times w$. This multiplication can be replaced by an addition: $x = 0 + w$.
  - For a ternary spike neuron: If a spike is fired ($o = 1$ or $o = -1$), the operation is:
$
x = 1 \times w \quad \mathrm{or} \quad x = -1 \times w
$
    These multiplications can also be replaced by additions or subtractions:
$
x = 0 + w \quad \mathrm{or} \quad x = 0 - w
$
This demonstrates that the proposed ternary spike neuron successfully enhances the expression ability of SNNs while fully retaining the event-driven and addition-only (or addition/subtraction-only) advantages, which are key for energy efficiency.
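A small sketch (not from the paper) of how a ternary-spike layer can be computed with only additions and subtractions: weights are accumulated where the spike is +1, subtracted where it is -1, and zero spikes are skipped entirely.

```python
import numpy as np

def ternary_synaptic_sum(weights, spikes):
    """Accumulate weighted ternary spikes without multiplications:
    add weights where spike == +1, subtract where spike == -1, skip zeros."""
    out = np.zeros(weights.shape[0])
    for j, s in enumerate(spikes):
        if s == 1:
            out += weights[:, j]      # addition instead of 1 * w
        elif s == -1:
            out -= weights[:, j]      # subtraction instead of -1 * w
        # s == 0: event-driven, no work at all
    return out

W = np.random.randn(4, 6)                      # 4 output neurons, 6 presynaptic inputs
spk = np.array([1, 0, -1, 0, 1, 0])            # ternary spike vector
assert np.allclose(ternary_synaptic_sum(W, spk), W @ spk)  # same result as a dense matmul
```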
4.2.3. Trainable Ternary Spike
The paper then addresses a second problem: the "unnatural" practice of quantizing membrane potentials from different layers to the same fixed spike values, despite membrane potential distributions varying significantly across layers. To demonstrate this, the authors provide empirical evidence in Figure 2 (images/2.jpg):
The image is a schematic showing distribution histograms for different cases, comprising eight subplots (a) through (h), and illustrating changes in information capacity during information transmission; these changes relate to the performance of the ternary spike neuron.
The image 2.jpg presents a schematic showing distribution histograms related to information capacity. While the VLM description for this image is generic, the accompanying text in the paper in Section 4.3 specifically refers to "Fig. 2. Membrane potential distributions of different layers of a spiking ResNet20 with 1&2 timesteps on the CIFAR-10 dataset." It shows that these distributions "are very different along layers." This visual evidence supports the claim that a one-size-fits-all quantization is suboptimal.
To tackle this, the paper introduces a trainable ternary spike neuron, where the spike amplitude can be learned during the training phase.
The equations for the trainable ternary spike neuron are:
$
u^t = \tau u^{t-1} (1 - |b^{t-1}|) + \sum_j w_j o_{j,\mathrm{pre}}^t
$
$
o^t = \left\{ \begin{array}{ll} 1 \cdot a, & \mathrm{if~} u^t \geq V_{\mathrm{th}} \\ -1 \cdot a, & \mathrm{if~} u^t \leq -V_{\mathrm{th}} \\ 0 \cdot a, & \mathrm{otherwise} \end{array} \right.
$
Where:
- $u^t$, $\tau$, $u^{t-1}$, $w_j$, $o_{j,\mathrm{pre}}^t$, $V_{\mathrm{th}}$ are the same as defined for the ternary spike neuron.
- $b^{t-1}$: Represents the normalized ternary spike value ({-1, 0, 1}) at the previous timestep before scaling by $a$. It is used in the reset term to ensure the membrane potential reset logic is based on whether a non-zero spike was fired, regardless of its amplitude.
- $o^t$: The actual output spike at timestep $t$, which is a normalized ternary spike scaled by a trainable factor $a$.
- $a$: A trainable factor (scalar) that determines the spike amplitude. This factor is set in a layer-wise manner, meaning each layer learns its own $a$. This allows neurons in different layers to fire spikes of different magnitudes, adapting to their specific membrane potential distributions.

The paper notes that while the concept of a trainable factor is similar to Real Spike (Guo et al. 2022d), the motivations differ. In this paper, $a$ adjusts the spike amplitude to suit membrane potential distributions, whereas Real Spike used a trainable factor to learn unshared convolution kernels.
4.2.3.1. Re-parameterization Technique
A potential issue with the trainable ternary spike is that the multiplication $a \times w$ (when $o^t = a$) or $-a \times w$ (when $o^t = -a$) would reintroduce actual multiplications, thus losing the multiplication-free advantage during inference. To circumvent this, the paper adopts a training-inference decoupled technique via re-parameterization. This technique allows the learned factor $a$ to be folded into the network's weights, effectively restoring multiplication-free inference.
The re-parameterization technique is illustrated using a convolution layer:
Let $\mathbf{F}$ be the input feature map and $\mathbf{G}$ be the output feature map of a convolution layer. The convolution operation is:
$
\mathbf{G} = \mathbf{K} * \mathbf{F}
$
Where:
- $\mathbf{G}$: Output feature map.
- $\mathbf{K}$: Convolution kernel tensor (weights).
- $\mathbf{F}$: Input feature map.
- $*$: Convolution operation.

In a trainable ternary SNN, the input feature map consists of real-valued spikes scaled by the trainable factor $a$. So, $\mathbf{F}$ can be expressed as:
$
\mathbf{F} = \boldsymbol{a} \cdot \mathbf{B}
$
Where:
- $\boldsymbol{a}$: A tensor representing the layer-wise trainable factor that scales the spikes. For a given layer, this is effectively a scalar multiplied across all spatial dimensions of the input feature map.
- $\mathbf{B}$: A tensor representing the normalized ternary spikes ({-1, 0, 1}) before scaling by $\boldsymbol{a}$.

During inference, the trainable factor can be folded into the convolution kernels $\mathbf{K}$. This transformation creates a new set of convolution kernels $\tilde{\mathbf{K}}$ such that the network can operate with normalized ternary spikes ({-1, 0, 1}) without changing the output feature map $\mathbf{G}$:
$
\mathbf{G} = \mathbf{K} * (\boldsymbol{a} \cdot \mathbf{B}) = (\boldsymbol{a} \cdot \mathbf{K}) * \mathbf{B} = \tilde{\mathbf{K}} * \mathbf{B}
$
Where:
- $\tilde{\mathbf{K}}$: The transformed convolution kernel tensor, where each original weight in $\mathbf{K}$ is scaled by the corresponding layer's learned factor $a$, resulting in $\tilde{\mathbf{K}} = \boldsymbol{a} \cdot \mathbf{K}$.

This re-parameterization allows the SNN to be trained with real-valued spikes (scaled by $a$) for better learning and adaptation, but then converted into an equivalent network that only emits normalized ternary spikes ({-1, 0, 1}) during inference. This ensures that the multiplication-free advantage and thus the energy efficiency of SNNs are fully retained in the deployment phase.
5. Experimental Setup
5.1. Datasets
The experiments were conducted on a diverse set of datasets, including both static image datasets and a dynamic event-based dataset, to thoroughly evaluate the proposed method's performance across different data modalities.
- CIFAR-10 (Krizhevsky, Nair, and Hinton 2010):
- Source: Canadian Institute for Advanced Research (CIFAR).
- Scale: 60,000 32x32 color images in 10 classes, with 6,000 images per class. 50,000 images for training and 10,000 for testing.
- Characteristics: Small-scale, low-resolution natural images, commonly used for image classification benchmarks.
- Domain: Object recognition.
- Purpose: To evaluate performance on a fundamental image classification task.
- CIFAR-100 (Krizhevsky, Nair, and Hinton 2010):
- Source: Canadian Institute for Advanced Research (CIFAR).
- Scale: 60,000 32x32 color images in 100 classes, with 600 images per class. 50,000 for training and 10,000 for testing.
- Characteristics: Similar to CIFAR-10 but with a larger number of classes and fewer images per class, making it a more challenging classification task. The classes are grouped into 20 superclasses.
- Domain: Object recognition.
- Purpose: To test the method's ability to handle more fine-grained classification problems.
- ImageNet (Deng et al. 2009):
- Source: Large-scale hierarchical image database.
- Scale: Over 14 million images across more than 20,000 categories. The standard ImageNet-1k subset used for benchmarks contains 1.28 million training images and 50,000 validation images across 1,000 object categories.
- Characteristics: Large-scale, high-resolution natural images, considered a standard and highly challenging benchmark for computer vision models.
- Domain: Object recognition.
- Purpose: To validate the method's scalability and effectiveness on complex, real-world image classification.
- CIFAR10-DVS (Li et al. 2017):
-
Source: Event-stream dataset for object classification, recorded using a Dynamic Vision Sensor (DVS) camera.
-
Scale: Event data corresponding to the CIFAR-10 dataset, capturing asynchronous events (pixel brightness changes) rather than traditional frames.
-
Characteristics: A
neuromorphic dataset that captures temporal dynamics of visual information. SNNs are particularly well-suited for this type of data due to their event-driven nature.
Domain: Event-based object classification.
-
Purpose: To demonstrate the method's performance on dynamic, biologically inspired data, where SNNs typically show stronger advantages.
The choice of these datasets is effective for validating the method's performance as they cover a range of complexities and data types, from small static images to large-scale static images and dynamic event streams, allowing for a comprehensive evaluation of the proposed
Ternary Spike and Trainable Ternary Spike models. The paper does not provide concrete examples of data samples.
5.2. Evaluation Metrics
The primary evaluation metric reported for classification tasks in this paper is Top-1 Accuracy. For energy efficiency analysis, the paper uses FLOPs, SOPs, and Sign operations, along with their associated energy costs.
5.2.1. Top-1 Accuracy
- Conceptual Definition: Top-1 Accuracy is a common metric in multi-class classification, representing the proportion of correctly classified instances where the model's highest-confidence prediction matches the true label. It quantifies the overall correctness of the model's single best prediction.
- Mathematical Formula:
$
\text{Top-1 Accuracy} = \frac{\text{Number of Correct Top-1 Predictions}}{\text{Total Number of Samples}}
$
- Symbol Explanation:
  - Number of Correct Top-1 Predictions: The count of instances where the class predicted by the model with the highest probability (its top-1 prediction) is identical to the actual true class label.
  - Total Number of Samples: The total number of instances (e.g., images) in the dataset being evaluated.
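A tiny illustrative computation (not from the paper) of this metric:

```python
import numpy as np

logits = np.array([[2.1, 0.3, -1.0],   # predicted scores for 3 classes
                   [0.2, 1.5,  0.9],
                   [0.1, 0.4,  2.2]])
labels = np.array([0, 2, 2])

top1 = (logits.argmax(axis=1) == labels).mean()
print(f"Top-1 accuracy: {top1:.2%}")  # 2 of 3 correct -> 66.67%
```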
5.2.2. Energy Estimation Metrics
The paper quantifies energy consumption using three types of operations:
- FLOPs (Floating Point Operations):
  - Conceptual Definition: FLOPs measure the number of floating-point arithmetic operations (e.g., additions and multiplications involving real numbers) performed by a model. These are typically associated with layers that do not solely operate on binary or ternary spikes, such as the initial rate-encoding layer in SNNs, which converts input pixels into spike trains.
  - Mathematical Formula: There isn't a single universal formula for FLOPs; it is calculated by summing the floating-point operations of each layer (for a convolution layer, roughly the multiplications plus additions of its multiply-accumulate operations).
  - Symbol Explanation: Specific to the operations performed (e.g., multiplications, additions). The paper assigns a fixed energy cost per FLOP.
- SOPs (Synaptic Operations):
  - Conceptual Definition: SOPs represent the energy-efficient operations specific to SNNs, where multiplications of binary or ternary spikes by weights are replaced by additions or subtractions. This metric is used for layers that process spikes. The paper calculates SOPs as $s \times T \times A$, where $s$ is the mean sparsity, $T$ is the timestep count, and $A$ is the addition number in the equivalent ANN layer.
  - Mathematical Formula:
$
\text{SOPs} = s \times T \times A
$
  - Symbol Explanation:
    - $s$: Mean sparsity of spikes (the proportion of non-zero spikes).
    - $T$: Number of timesteps (simulation steps).
    - $A$: The number of additions that would be performed in an equivalent ANN layer. The paper assigns a fixed (lower) energy cost per SOP.
- Sign Operations:
  - Conceptual Definition: This refers to the operation performed by the LIF neuron to determine whether a spike should be fired (i.e., checking if the membrane potential crosses a threshold and assigning a sign). The number of Sign operations is generally much smaller than the number of convolution operations.
  - Mathematical Formula: Not explicitly provided, but implicitly refers to the number of times a neuron's output state is determined based on its membrane potential crossing a threshold.
  - Symbol Explanation: Calculated based on the number of LIF neurons and timesteps. The paper assigns a fixed energy cost per Sign operation.
5.3. Baselines
The paper compares its Ternary Spike and Trainable Ternary Spike methods against numerous state-of-the-art (SoTA) SNN models. These baselines represent leading approaches in both ANN-SNN conversion and direct SNN training, providing a comprehensive comparison of performance.
Common Architectures Used for Baselines:
- VGG11/VGG16: Deep convolutional networks known for their simplicity and effectiveness in image classification.
- ResNet18/ResNet19/ResNet20/ResNet34: Residual Networks, which use skip connections to enable the training of much deeper networks, are widely adopted as backbones in SNN research.
- CIFARNet, PLIFNet, 7B-wideNet: Other specific SNN or ANN architectures used in prior works.
Key Baselines Mentioned:
- ANN-SNN Conversion Methods:
SpikeNorm(Sengupta et al. 2019)RMP(Han, Srinivasan, and Roy 2020)
- Hybrid Training Methods:
Hybrid-Train(Rathi et al. 2020)LTL(Yang et al. 2022)
- Direct SNN Training Methods (most relevant for comparison):
-
TSSL-BP(Zhang and Li 2020) -
TL(Wu et al. 2021b) /PTL(Wu et al. 2021c) - Tandem Learning methods -
PLIF(Fang et al. 2021b) - Incorporates learnable membrane time constants. -
DSR(Meng et al. 2022) - Differentiation on spike representation. -
KDSNN(Xu et al. 2023) - Knowledge Distillation for SNNs. -
Joint A-SNN(Guo et al. 2023b) - Joint training of ANNs and SNNs. -
Diet-SNN(Rathi and Roy 2020) - Direct Input Encoding. -
Dspike(Li et al. 2021b) - Differentiable spike for training. -
STBP-tdBN(Zheng et al. 2021) - Spatio-temporal backpropagation with time-dependent Batch Normalization. -
TET(Deng et al. 2022) - Temporal Efficient Training. -
RecDis-SNN(Guo et al. 2022c) - Rectifying Membrane Potential Distribution. -
RMP-Loss(Guo et al. 2023a) - Regularizing Membrane Potential Distribution. -
IM-Loss(Guo et al. 2022a) - Information Maximization Loss. -
Real Spike(Guo et al. 2022d) - Learning real-valued spikes. -
MPBN(Guo et al. 2023c) - Membrane Potential Batch Normalization. -
InfLoR-SNN(Guo et al. 2022b) - Reducing Information Loss. -
SEW ResNet(Fang et al. 2021a) - Deep residual learning in SNNs. -
OTTT(Xiao et al. 2022) - Online Training Through Time. -
GLIF (Yao et al. 2022) - Gated Leaky Integrate-and-Fire Neuron.

These baselines are representative because they cover a wide spectrum of recent advancements in SNN training, including approaches to handle information loss, optimize learning, and improve performance on various datasets and architectures. This allows the authors to position their work effectively within the current research landscape.
6. Results & Analysis
The experimental results demonstrate the effectiveness and efficiency of the proposed Ternary Spike and Trainable Ternary Spike methods across various datasets and network architectures.
6.1. Core Results Analysis
6.1.1. Ablation Study
The ablation study, conducted on the ImageNet dataset using ResNet18 and ResNet34 backbones with different timesteps, systematically evaluates the contribution of the ternary spike and trainable ternary spike components.
The following are the results from Table 1 of the original paper:
| Architecture | Method | Time-step | Accuracy |
|---|---|---|---|
| ResNet18 | Binary spike | 2 | 58.30% |
| ResNet18 | Ternary spike | 2 | 65.87% |
| ResNet18 | Trainable ternary spike | 2 | 66.40% |
| ResNet18 | Binary spike | 4 | 61.07% |
| ResNet18 | Ternary spike | 4 | 66.90% |
| ResNet18 | Trainable ternary spike | 4 | 67.68% |
| ResNet34 | Binary spike | 2 | 62.81% |
| ResNet34 | Ternary spike | 2 | 69.48% |
| ResNet34 | Trainable ternary spike | 2 | 69.51% |
| ResNet34 | Binary spike | 4 | 63.82% |
| ResNet34 | Ternary spike | 4 | 70.12% |
| ResNet34 | Trainable ternary spike | 4 | 70.74% |
Analysis:
- Impact of Ternary Spike: A significant performance boost is observed when moving from Binary spike to Ternary spike. For instance, with ResNet18 at 4 timesteps, accuracy jumps from 61.07% to 66.90% (a 5.83% absolute improvement). Similarly, for ResNet34 at 4 timesteps, accuracy increases from 63.82% to 70.12% (a 6.30% absolute improvement). This clearly validates the theoretical claim that increasing the information capacity via ternary spikes effectively mitigates information loss and improves task performance.
- Impact of Trainable Ternary Spike: Further performance gains are achieved by incorporating the trainable ternary spike. For ResNet18 at 4 timesteps, accuracy rises from 66.90% to 67.68% (an additional 0.78% improvement). For ResNet34 at 4 timesteps, it moves from 70.12% to 70.74% (an additional 0.62% improvement). This demonstrates the benefit of allowing layer-wise spike amplitudes to adapt to different membrane potential distributions, further optimizing information representation.
- Timestep Dependency: As expected in SNNs, increasing the number of timesteps generally leads to higher accuracy, as more temporal information can be processed. However, the ternary spike methods consistently show substantial improvements over binary spikes at both 2 and 4 timesteps, indicating their effectiveness regardless of the timestep budget.
6.1.2. Comparison with SoTA methods on CIFAR-10(100)
The paper compares its methods against numerous state-of-the-art SNNs on CIFAR-10 and CIFAR-100 datasets.
The following are the results from Table 2 of the original paper:
| Dataset | Method | Type | Architecture | Timestep | Accuracy |
|---|---|---|---|---|---|
| CIFAR-10 | SpikeNorm (Sengupta et al. 2019) | ANN2SNN | VGG16 | 2500 | 91.55% |
| CIFAR-10 | Hybrid-Train (Rathi et al. 2020) | Hybrid training | VGG16 | 200 | 92.02% |
| CIFAR-10 | TSSL-BP (Zhang and Li 2020) | SNN training | CIFARNet | 5 | 91.41% |
| CIFAR-10 | TL (Wu et al. 2021b) | Tandem Learning | CIFARNet | 8 | 89.04% |
| CIFAR-10 | PTL (Wu et al. 2021c) | Tandem Learning | VGG11 | 16 | 91.24% |
| CIFAR-10 | PLIF (Fang et al. 2021b) | SNN training | PLIFNet | 8 | 93.50% |
| CIFAR-10 | DSR (Meng et al. 2022) | SNN training | ResNet18 | 20 | 95.40% |
| CIFAR-10 | KDSNN (Xu et al. 2023) | SNN training | ResNet18 | 4 | 93.41% |
| CIFAR-10 | Joint A-SNN (Guo et al. 2023b) | SNN training | ResNet18 | 4 | 95.45% |
| CIFAR-10 | Diet-SNN (Rathi and Roy 2020) | SNN training | ResNet20 | 10 | 92.54% |
| CIFAR-10 | Dspike (Li et al. 2021b) | SNN training | ResNet20 | 2 | 93.13% |
| CIFAR-10 | Dspike (Li et al. 2021b) | SNN training | ResNet20 | 4 | 93.66% |
| CIFAR-10 | STBP-tdBN (Zheng et al. 2021) | SNN training | ResNet19 | 2 | 92.34% |
| CIFAR-10 | STBP-tdBN (Zheng et al. 2021) | SNN training | ResNet19 | 4 | 92.92% |
| CIFAR-10 | TET (Deng et al. 2022) | SNN training | ResNet19 | 2 | 94.16% |
| CIFAR-10 | TET (Deng et al. 2022) | SNN training | ResNet19 | 4 | 94.44% |
| CIFAR-10 | RecDis-SNN (Guo et al. 2022c) | SNN training | ResNet19 | 2 | 93.64% |
| CIFAR-10 | RecDis-SNN (Guo et al. 2022c) | SNN training | ResNet19 | 4 | 95.53% |
| CIFAR-10 | RMP-Loss (Guo et al. 2023a) | SNN training | ResNet19 | 2 | 95.31% |
| CIFAR-10 | RMP-Loss (Guo et al. 2023a) | SNN training | ResNet19 | 4 | 95.51% |
| CIFAR-10 | RMP (Han, Srinivasan, and Roy 2020) | ANN2SNN | ResNet20 | 2048 | 94.96%±0.10 |
| CIFAR-10 | Ternary Spike | SNN training | ResNet20 | 1 | 91.89% |
| CIFAR-10 | Ternary Spike | SNN training | ResNet19 | 2 | 95.60%±0.09 |
| CIFAR-10 | Ternary Spike | SNN training | ResNet20 | 2 | 94.29%±0.08 |
| CIFAR-10 | Trainable Ternary Spike | SNN training | ResNet19 | 1 | 95.58%±0.08 |
| CIFAR-10 | Trainable Ternary Spike | SNN training | ResNet19 | 2 | 95.80%±0.10 |
| CIFAR-10 | Trainable Ternary Spike | SNN training | ResNet20 | 2 | 94.48%±0.09 |
| CIFAR-10 | Trainable Ternary Spike | SNN training | ResNet20 | 4 | 94.96%±0.10 |
| CIFAR-100 | Real Spike (Guo et al. 2022d) | SNN training | ResNet20 | 4 | 67.82% |
| CIFAR-100 | LTL (Yang et al. 2022) | Tandem Learning | ResNet20 | 5 | 66.60% |
| CIFAR-100 | Diet-SNN (Rathi and Roy 2020) | SNN training | ResNet20 | 31 | 76.08% |
| CIFAR-100 | RecDis-SNN (Guo et al. 2022c) | SNN training | ResNet19 | 4 | 74.10% |
| CIFAR-100 | Dspike (Li et al. 2021b) | SNN training | ResNet20 | 2 | 71.68% |
| CIFAR-100 | TET (Deng et al. 2022) | SNN training | ResNet19 | 2 | 72.87% |
| CIFAR-100 | TET (Deng et al. 2022) | SNN training | ResNet19 | 4 | 73.35% |
| CIFAR-100 | Ternary Spike | SNN training | ResNet19 | 1 | 78.13%±0.11 |
| CIFAR-100 | Ternary Spike | SNN training | ResNet19 | 2 | 79.66%±0.08 |
| CIFAR-100 | Ternary Spike | SNN training | ResNet20 | 2 | 73.00%±0.08 |
| CIFAR-100 | Trainable Ternary Spike | SNN training | ResNet19 | 1 | 78.45%±0.08 |
| CIFAR-100 | Trainable Ternary Spike | SNN training | ResNet19 | 2 | 80.20%±0.10 |
| CIFAR-100 | Trainable Ternary Spike | SNN training | ResNet20 | 2 | 73.41%±0.12 |
| CIFAR-100 | Trainable Ternary Spike | SNN training | ResNet20 | 4 | 74.02%±0.08 |
Analysis:
- CIFAR-10:
  - The Trainable Ternary Spike achieves a remarkable 95.80%±0.10 accuracy with ResNet19 at only 2 timesteps. This surpasses previous state-of-the-art results like RecDis-SNN (95.53% at 4 timesteps) and RMP-Loss (95.51% at 4 timesteps) with fewer timesteps.
  - Even with 1 timestep, Trainable Ternary Spike reaches 95.58%±0.08 (ResNet19), which is competitive with or better than many methods using 2-4 timesteps.
  - The Ternary Spike (non-trainable) also shows strong performance, e.g., 95.60%±0.09 (ResNet19, 2 timesteps).
  - Compared to ANN-SNN conversion methods like SpikeNorm (91.55% at 2500 timesteps) or RMP (94.96% at 2048 timesteps), the proposed methods achieve higher accuracy with drastically fewer timesteps, highlighting superior efficiency.
- CIFAR-100:
  - On the more challenging CIFAR-100 dataset, Trainable Ternary Spike achieves 80.20%±0.10 with ResNet19 at 2 timesteps, which is a significant improvement over other methods. For example, TET achieves 73.35% at 4 timesteps, and RecDis-SNN achieves 74.10% at 4 timesteps.
  - Ternary Spike also performs very well, reaching 79.66%±0.08 with ResNet19 at 2 timesteps.
  - This demonstrates that the proposed methods are effective not only for simpler datasets but also for more complex classification tasks.
6.1.3. Comparison with SoTA methods on ImageNet
ImageNet is a much larger and more complex dataset. The ability to perform well here is crucial for validating the scalability of the proposed methods.
The following are the results from Table 3 of the original paper:
| Method | Type | Architecture | Timestep | Accuracy |
|---|---|---|---|---|
| STBP-tdBN (Zheng et al. 2021) | SNN training | ResNet34 | 6 | 63.72% |
| TET (Deng et al. 2022) | SNN training | ResNet34 | 6 | 64.79% |
| RecDis-SNN (Guo et al. 2022c) | SNN training | ResNet34 | 6 | 67.33% |
| OTTT (Xiao et al. 2022) | SNN training | ResNet34 | 6 | 65.15% |
| GLIF (Yao et al. 2022) | SNN training | ResNet34 | 4 | 67.52% |
| DSR (Meng et al. 2022) | SNN training | ResNet18 | 50 | 67.74% |
| IM-Loss (Guo et al. 2022a) | SNN training | ResNet18 | 6 | 67.43% |
| Real Spike (Guo et al. 2022d) | SNN training | ResNet18 | 4 | 63.68% |
| Real Spike (Guo et al. 2022d) | SNN training | ResNet34 | 4 | 67.69% |
| RMP-Loss (Guo et al. 2023a) | SNN training | ResNet18 | 4 | 63.03% |
| RMP-Loss (Guo et al. 2023a) | SNN training | ResNet34 | 4 | 65.17% |
| MPBN (Guo et al. 2023c) | SNN training | ResNet18 | 4 | 63.14% |
| MPBN (Guo et al. 2023c) | SNN training | ResNet34 | 4 | 64.71% |
| InfLoR-SNN (Guo et al. 2022b) | SNN training | ResNet18 | 4 | 64.78% |
| InfLoR-SNN (Guo et al. 2022b) | SNN training | ResNet34 | 4 | 65.54% |
| SEW ResNet (Fang et al. 2021a) | SNN training | ResNet18 | 4 | 63.18% |
| SEW ResNet (Fang et al. 2021a) | SNN training | ResNet34 | 4 | 67.04% |
| Ternary Spike | SNN training | ResNet18 | 4 | 66.90%±0.19 |
| Ternary Spike | SNN training | ResNet34 | 4 | 70.12%±0.15 |
| Trainable Ternary Spike | SNN training | ResNet18 | 4 | 67.68%±0.13 |
| Trainable Ternary Spike | SNN training | ResNet34 | 4 | 70.74%±0.11 |
Analysis:
- The Trainable Ternary Spike achieves 70.74%±0.11 top-1 accuracy on ImageNet using ResNet34 with only 4 timesteps. This is a significant improvement of approximately 3% compared to other state-of-the-art SNN models, such as RecDis-SNN (67.33% at 6 timesteps), GLIF (67.52% at 4 timesteps), and Real Spike (67.69% at 4 timesteps).
- Even the Ternary Spike (non-trainable) shows strong performance, reaching 70.12%±0.15 with ResNet34 at 4 timesteps.
- For ResNet18, Trainable Ternary Spike achieves 67.68%±0.13 at 4 timesteps, outperforming many baselines, though DSR (67.74%) marginally surpasses it with 50 timesteps. However, DSR requires a much higher timestep count (50 vs. 4), making the Trainable Ternary Spike significantly more efficient in terms of latency and energy per inference.
- These results highlight the scalability and robustness of the ternary spike approach for handling large-scale and complex image recognition tasks, indicating its potential for broader real-world applications.
6.1.4. Comparison with SoTA methods on CIFAR10-DVS
This dataset is critical for evaluating SNNs because it is event-based and dynamic, directly benefiting from the temporal processing capabilities of SNNs.
The following are the results from Table 4 of the original paper:
| Method | Type | Architecture | Timestep | Accuracy |
|---|---|---|---|---|
| DSR (Meng et al. 2022) | SNN training | VGG11 | 20 | 77.27% |
| GLIF (Yao et al. 2022) | SNN training | 7B-wideNet | 16 | 78.10% |
| STBP-tdBN (Zheng et al. 2021) | SNN training | ResNet19 | 10 | 67.80% |
| RecDis-SNN (Guo et al. 2022c) | SNN training | ResNet19 | 10 | 72.42% |
| Real Spike (Guo et al. 2022d) | SNN training | ResNet19 | 10 | 72.85% |
| Ternary Spike | SNN training | ResNet19 | 10 | 78.40%±0.21 |
| Ternary Spike | SNN training | ResNet20 | 10 | 78.70%±0.17 |
| Trainable Ternary Spike | SNN training | ResNet19 | 10 | 79.80%±0.16 |
| Trainable Ternary Spike | SNN training | ResNet20 | 10 | 79.80%±0.19 |
Analysis:
- On the CIFAR10-DVS dataset, both Ternary Spike and Trainable Ternary Spike achieve significantly higher accuracy compared to baselines.
- The Trainable Ternary Spike reaches 79.80%±0.16 (ResNet19, 10 timesteps) and 79.80%±0.19 (ResNet20, 10 timesteps), approaching 80% accuracy. This is a substantial improvement over methods like DSR (77.27% at 20 timesteps), GLIF (78.10% at 16 timesteps), and Real Spike (72.85% at 10 timesteps).
- The strong performance on this neuromorphic dataset further validates the effectiveness of the proposed methods in handling dynamic, event-based data, which is a key application area for SNNs.
6.1.5. Energy Estimation
The paper provides an energy cost comparison between Binary Spike and Ternary Spike for ResNet20 on CIFAR10 with 2 timesteps.
The following are the results from Table 5 of the original paper:
| Method | #Flops | #Sops | #Sign | Energy |
|---|---|---|---|---|
| Binary Spike | 3.54M | 71.20M | 0.11M | 50.14uJ |
| Ternary Spike | 3.54M | 79.21M | 0.23M | 51.20uJ |
Analysis:
- FLOPs (Floating Point Operations): Both Binary Spike and Ternary Spike models have the same number of FLOPs (3.54M), indicating that the initial rate-encoding layer (which involves non-spike operations) remains consistent.
- SOPs (Synaptic Operations): The Ternary Spike model has a slightly higher number of SOPs (79.21M) compared to Binary Spike (71.20M). This is attributed to the ternary spike having a higher sparsity value (18.27% non-zero spikes) than the binary spike (16.42%), meaning more non-zero spikes (+1 or -1) are fired, leading to more active synaptic operations.
- Sign Operations: The Ternary Spike model has roughly double the number of Sign operations (0.23M) compared to Binary Spike (0.11M). This is expected because the ternary neuron checks two thresholds ($V_{\mathrm{th}}$ and $-V_{\mathrm{th}}$) instead of just one ($V_{\mathrm{th}}$).
- Total Energy: Despite the increases in SOPs and Sign operations, the total energy consumption for Ternary Spike (51.20uJ) is only marginally higher than Binary Spike (50.14uJ). This represents an increase of approximately 2.11% (calculated as (51.20 - 50.14) / 50.14 ≈ 2.11%).
- Conclusion: The energy estimation demonstrates that the substantial accuracy gains achieved by the Ternary Spike (as shown in the ablation studies and comparisons) come with a very minimal increase in energy consumption. This indicates that the proposed method is highly energy-efficient and maintains the core low-power advantage of SNNs while significantly boosting performance.
6.2. Ablation Studies / Parameter Analysis
The ablation study presented in Table 1 directly addresses the effectiveness of the model's components:
- Effectiveness of Ternary Spike: The first step in the ablation shows a consistent and large improvement when switching from Binary spike to Ternary spike. This validates the core hypothesis that increasing the spike's information cardinality (from 2 states to 3 states) significantly enhances the SNN's representation capability and reduces information loss. The 5-6% accuracy jump is a strong indicator.
- Effectiveness of Trainable Ternary Spike: The second step shows that adding the trainable factor for the Trainable Ternary Spike yields further, albeit smaller, improvements (around 0.6-0.8%). This confirms the authors' premise that membrane potential distributions vary across layers, and allowing for layer-wise adaptive spike amplitudes is beneficial for fine-tuning performance.
- Hyper-parameter timesteps: While not an explicit ablation, the results across 2 and 4 timesteps show that the performance of all methods generally increases with more timesteps. However, the Ternary Spike variants consistently maintain their superior performance over Binary spike at both timestep settings, suggesting that the benefits of ternary spikes are not heavily dependent on a specific timestep count but rather provide a fundamental improvement in information encoding. The ability to achieve high accuracy with very few timesteps (e.g., 1 or 2 on CIFAR datasets) is a key advantage for real-time and low-power applications.

In summary, the ablation study clearly validates the design choices of the proposed method, demonstrating that both the move to ternary spikes and the introduction of trainable amplitudes contribute positively to the overall performance of SNNs.
7. Conclusion & Reflections
7.1. Conclusion Summary
This paper rigorously demonstrates a fundamental limitation of traditional Spiking Neural Networks (SNNs): their binary spike activation maps possess insufficient information capacity, leading to information loss and compromised accuracy. To address this, the authors propose a novel ternary spike neuron that uses {-1, 0, 1} spike values. This innovation is theoretically proven to significantly increase information capacity while crucially preserving the energy-efficient event-driven and multiplication-free (addition/subtraction-based) characteristics of SNNs.
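As a concrete illustration of this firing rule, the minimal sketch below emits {-1, 0, 1} spikes from two symmetric threshold checks. The threshold name `v_th` and the symmetric negative threshold are assumptions made for illustration, and the surrogate gradients needed for training are omitted; this is not the authors' exact implementation.

```python
import torch

def ternary_fire(membrane: torch.Tensor, v_th: float = 1.0) -> torch.Tensor:
    """Emit {-1, 0, +1} spikes via two symmetric threshold checks.

    Fires +1 where the membrane potential reaches +v_th, -1 where it falls
    to -v_th or below, and 0 otherwise. Surrogate gradients used for
    training are omitted; this only sketches the forward firing rule.
    """
    pos = (membrane >= v_th).float()
    neg = (membrane <= -v_th).float()
    return pos - neg

u = torch.tensor([-1.3, -0.2, 0.0, 0.4, 1.7])
print(ternary_fire(u))  # tensor([-1., 0., 0., 0., 1.])
```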
Furthermore, recognizing the varying membrane potential distributions across different network layers, the paper introduces a trainable ternary spike neuron. This model embeds a learnable factor α to enable layer-wise adaptive spike amplitudes {-α, 0, α}, allowing SNNs to better model neuronal activity. For inference, a clever re-parameterization technique folds these learned factors into the network weights, converting the model back to a standard ternary spike SNN and thereby maintaining computational efficiency.
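The re-parameterization can be pictured with a small sketch: because the layer that consumes {-α, 0, α} spikes computes W(αs) = (αW)s, the learned amplitude can be absorbed offline into that layer's weights, after which inference again uses plain {-1, 0, 1} spikes. The helper below is a hypothetical illustration of this folding for a single linear layer, not the authors' code.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fold_alpha_into_consumer(alpha: float, consumer: nn.Linear) -> None:
    """Absorb a learned spike amplitude into the layer that receives the spikes.

    Since consumer(alpha * s) = (alpha * W) s + b for ternary s, scaling the
    weights by alpha lets inference use unit-amplitude {-1, 0, +1} spikes.
    """
    consumer.weight.mul_(alpha)

# Hypothetical usage: fold one layer's learned amplitude before deployment.
alpha = 0.8                   # illustrative learned amplitude, not from the paper
fc = nn.Linear(128, 10)
fold_alpha_into_consumer(alpha, fc)
```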
Extensive experiments on diverse static (CIFAR10, CIFAR100, ImageNet) and dynamic (CIFAR10-DVS) datasets, using popular architectures, consistently show that both Ternary Spike and Trainable Ternary Spike models achieve state-of-the-art performance with high energy efficiency, often surpassing prior methods by a considerable margin.
7.2. Limitations & Future Work
While the paper presents a highly effective solution, it implicitly points to certain limitations and opens avenues for future work:
- Increased Energy Consumption (minor): Although the ternary spike maintains the multiplication-free advantage, the energy estimation shows a slight increase (about 2.11%) in total energy consumption due to higher SOPs (more active spikes) and double the Sign operations (checking two thresholds). For extremely energy-constrained neuromorphic applications, even this small increase might be a consideration, prompting further optimization of ternary spike hardware implementations.
- Complexity of Training α: While the re-parameterization handles inference efficiency, the trainable factor α introduces an additional learning parameter per layer during training. The paper does not delve into the sensitivity of the training process to this factor or into alternative methods for learning optimal amplitudes (e.g., through more complex optimization landscapes).
- Beyond Ternary Spikes: The paper establishes the benefit of moving from binary to ternary. This naturally raises the question of whether higher-order spiking (e.g., quaternary, quinary, or even small integer-valued spikes) could yield further benefits, and at what point the energy efficiency advantages of SNNs would be significantly compromised. The boundary between a "spike" (discrete, efficient) and a "quantized activation" (continuous-like, less efficient) becomes blurred.
- Theoretical Bounds of Information Capacity: While the paper provides a good theoretical analysis using log N, further exploration into the effective information carried in a real SNN (which has non-uniform distributions and temporal dependencies) could be a complex but fruitful research direction.
- Hardware Implementation Details: The paper discusses the theoretical maintenance of efficiency but does not detail specific hardware considerations for implementing ternary spikes (e.g., how {-1, 0, 1} signals are physically transmitted and processed on neuromorphic chips compared to binary {0, 1} spikes).
7.3. Personal Insights & Critique
This paper offers a compelling and intuitively sound solution to a fundamental problem in SNNs. The core insight that binary spikes are inherently information-starved is well-supported both theoretically and experimentally. The move to ternary spikes is a logical next step in enhancing representation capability without abandoning the efficiency paradigm.
Inspirations & Applications:
- Bridging the SNN-ANN Gap: The substantial accuracy improvements achieved, especially on large-scale datasets like ImageNet, are crucial for making SNNs competitive with ANNs. This work paves the way for SNNs to be deployed in more complex, real-world applications where high accuracy is paramount.
- Biologically Plausible Learning: The idea of learnable spike amplitudes is fascinating. Biological neurons exhibit complex firing behaviors and signal modulation. Allowing SNN layers to adapt their spike magnitudes might be a step towards more biologically realistic and efficient learning mechanisms beyond simple binary communication.
- Generalizable Paradigm: The ternary spike and trainable factor concepts appear highly generalizable. They are not tied to a specific SNN architecture or dataset type, suggesting they could be applied across various SNN models and tasks.
- Future of Low-Power AI: As AI demands grow, the need for energy-efficient hardware and algorithms becomes critical. This paper demonstrates that significant performance gains in SNNs can be achieved with only marginal energy cost increases, reinforcing the potential of SNNs for sustainable AI.
Potential Issues & Areas for Improvement:
- Optimality of Ternary: While ternary is better than binary, why stop at ternary specifically? Could other ternary sets (e.g., {-a, 0, a} for some non-unity a) or even a small set of learned discrete spike values (beyond just scaling {-1, 0, 1}) offer further advantages? The choice of {-1, 0, 1} is elegant because it maintains the addition/subtraction benefit, but exploring broader discrete sets with careful hardware considerations could be interesting.
- The Reset Mechanism: The reset design is clever in that the membrane potential is reset whether a positive or a negative spike is fired. However, the precise biological interpretation of a "negative spike" and its reset mechanism might warrant further discussion, especially for truly bio-inspired SNNs.
- Comparison Fairness (Timesteps): While the paper's results are very strong, some comparisons involve baselines that use significantly more timesteps (e.g., DSR on ImageNet with 50 timesteps vs. 4 timesteps for Ternary Spike). Such comparisons are technically valid and highlight the efficiency of the proposed method, but the timestep disparity should be kept in mind when reading them as pure accuracy comparisons. The paper does a good job of consistently using low timesteps for its own methods.
- Robustness to Noise: The paper does not specifically discuss the robustness of ternary spikes to noise, especially on neuromorphic hardware, which can be prone to variability. A ternary signal might be slightly more susceptible to noise than a purely binary (on/off) signal if the distinction between -1, 0, and 1 becomes blurred.
Overall, this work is a substantial contribution to SNN research, offering a practical and theoretically grounded approach to enhancing SNN performance while staying true to their energy-efficient principles. It is likely to inspire further research into alternative spike representations and adaptive SNN learning mechanisms.