spiking-recurrent-cells-fpga-accelerator - SKILL.md Agent Skill

name: spiking-recurrent-cells-fpga-accelerator
description: "FPGA accelerator for Spiking Recurrent Cell (SRC) neurons, offering a trade-off between biological plausibility and hardware cost. Removes costly unary operators (tanh, exp) via piecewise approximations and avoids floating-point arithmetic. Reaches up to 96.31% MNIST accuracy on an Artix-7 FPGA, with energy from 0.45 to 2.2 mJ per digit depending on trace length and weight precision. Activation: FPGA spiking neural network, SRC neuron hardware, neuromorphic FPGA, energy-efficient SNN, VHDL SNN implementation, spiking recurrent cells."

Spiking Recurrent Cells FPGA Accelerator

Energy-efficient FPGA implementation of Spiking Recurrent Cell (SRC) neurons that balances biological plausibility against hardware cost by replacing costly unary operators with piecewise approximations.

Metadata

Source: arXiv:2605.10679
Authors: Pascal Harmeling, Florent De Geeter, Guillaume Drion
Published: 2026-05-11
Subjects: Neural and Evolutionary Computing (cs.NE)

Core Methodology

Key Innovation

Spiking neural networks (SNNs) can consume less energy than ANNs when spiking is sparse and the neuron models are hardware-friendly. However, biologically faithful neuron models are too costly for FPGAs, while simple models (IR/LIF) sacrifice neuronal dynamics. This paper introduces an FPGA accelerator for SRC neurons that strikes a middle ground: mathematical simplifications remove costly unary operators (tanh, exp), and scaling combined with piecewise-defined approximations avoids floating-point arithmetic.
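To make "avoiding floating-point arithmetic through scaling" concrete, here is a minimal sketch of signed fixed-point arithmetic in Python. The 16-bit word with 11 fractional bits and the helper names (to_fixed, from_fixed, fixed_mul, fixed_add) are assumptions for illustration; the paper's actual bit widths are not given here. Real values are scaled by a power of two and stored as integers, so every add and multiply becomes integer logic with saturation.

```python
# Minimal signed fixed-point helpers (hypothetical format, not the paper's exact widths).
FRAC_BITS = 11                          # assumed number of fractional bits
WORD_BITS = 16                          # assumed total word width, including the sign bit
MAX_INT = (1 << (WORD_BITS - 1)) - 1
MIN_INT = -(1 << (WORD_BITS - 1))

def saturate(v: int) -> int:
    """Clamp to the signed WORD_BITS range, as a saturating adder would."""
    return max(MIN_INT, min(MAX_INT, v))

def to_fixed(x: float) -> int:
    """Scale a real value by 2**FRAC_BITS and round to nearest (ties to even)."""
    return saturate(round(x * (1 << FRAC_BITS)))

def from_fixed(q: int) -> float:
    """Convert back to a real value (software checking only, not synthesized)."""
    return q / (1 << FRAC_BITS)

def fixed_add(a: int, b: int) -> int:
    """Add two values that share the same scale."""
    return saturate(a + b)

def fixed_mul(a: int, b: int) -> int:
    """The raw product carries 2*FRAC_BITS fractional bits; an arithmetic right
    shift by FRAC_BITS restores the scale (this rounds toward -infinity)."""
    return saturate((a * b) >> FRAC_BITS)

if __name__ == "__main__":
    a, b = to_fixed(1.5), to_fixed(-0.75)
    print(from_fixed(fixed_mul(a, b)))  # -1.125
    print(from_fixed(fixed_add(a, b)))  # 0.75
```

Power-of-two scaling is the usual choice because rescaling after a multiply is then a wire shift rather than a divider.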

Technical Framework

SRC Neuron Model: Spiking Recurrent Cells offer richer dynamics than LIF/IR models while remaining implementable on hardware
Unary Operator Removal: Costly functions (tanh, exp) are replaced with piecewise-defined approximations using fixed-point arithmetic
Weight Storage in LUT-Registers: Weight matrices computed offline are stored directly in LUT-registers without further adaptation, demonstrating the robustness of SRC neurons
VHDL Implementation: Complete network implemented in VHDL on Artix-7 XC7A200T FPGA
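The source does not spell out which piecewise approximation it uses for exp, so the sketch below shows one common hardware-friendly alternative, an assumption rather than the paper's method: rewrite exp(x) as 2**(x*log2(e)), realize the integer part of the exponent as a right shift, and approximate 2**f on the fractional part f in [0, 1) with the line 1 + f. Everything runs on integers in a hypothetical format with 12 fractional bits, and the function name exp_fixed is illustrative.

```python
import math

FRAC_BITS = 12                                  # assumed fractional bits
ONE = 1 << FRAC_BITS                            # 1.0 in fixed point
LOG2E = round(math.log2(math.e) * ONE)          # log2(e) ~ 1.4427, pre-scaled offline

def exp_fixed(x_q: int) -> int:
    """Approximate exp(x) for fixed-point x_q <= 0 with one multiply, shifts, and adds.

    exp(x) = 2**t with t = x*log2(e) <= 0. Split t = k + f with k = floor(t) <= 0
    and 0 <= f < 1; then 2**t = 2**f >> (-k), and 2**f is approximated by 1 + f
    (a chord that overestimates by at most about 6%).
    """
    if x_q > 0:
        raise ValueError("this sketch only handles x <= 0 (decaying terms)")
    t = (x_q * LOG2E) >> FRAC_BITS              # t in fixed point
    k = t >> FRAC_BITS                          # floor(t), an integer <= 0
    frac = t - (k << FRAC_BITS)                 # fractional part, 0 <= frac < ONE
    mantissa = ONE + frac                       # 2**f ~ 1 + f, in [1, 2)
    return mantissa >> (-k)                     # multiply by 2**k as a right shift

if __name__ == "__main__":
    for x in (0.0, -0.5, -1.0, -3.0):
        approx = exp_fixed(round(x * ONE)) / ONE
        print(f"x={x:5.1f}  exp={math.exp(x):.4f}  approx={approx:.4f}")
```

A finer piecewise fit of 2**f (more segments, or a small table) trades a few extra comparators or LUTs for lower error; the single chord is the cheapest point on that curve.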

Implementation Details

Platform: Artix-7 XC7A200T FPGA, 100 MHz clock
Accuracy: 96.31% on MNIST (220-image spiking trace, 1.7424 ms per digit)
Energy-Accuracy Trade-off:
- 93.32% accuracy at 0.55 mJ per digit (55 images, 5-bit weights)
- 92.89% accuracy at 0.45 mJ per digit (44 images, 4-bit weights)
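Quantization: Synaptic weights can be quantized down to 4 bits; in the reported low-energy configurations, which also shorten the spike trace, accuracy stays within about 3.5 percentage points of the 96.31% result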
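The reported figures can be cross-checked with simple arithmetic. The sketch below derives the time and clock cycles per image at 100 MHz from the 220-image latency, and the energy per image for the two low-energy configurations. The final line extrapolates the 5-bit per-image energy to a 220-image trace; that assumes energy scales linearly with trace length, which is an assumption, although the result matches the 2.2 mJ upper figure quoted in the description.

```python
CLOCK_HZ = 100e6                # Artix-7 clock from the source
LATENCY_S = 1.7424e-3           # per digit, 220-image trace (from the source)
TRACE_LEN = 220

# (accuracy %, energy in mJ per digit, images per digit, weight bits) from the source
configs = [
    (93.32, 0.55, 55, 5),
    (92.89, 0.45, 44, 4),
]

time_per_image_s = LATENCY_S / TRACE_LEN
cycles_per_image = time_per_image_s * CLOCK_HZ
print(f"time per image: {time_per_image_s * 1e6:.2f} us")   # 7.92 us
print(f"cycles per image: {cycles_per_image:.0f}")           # 792

for acc, energy_mj, n_images, bits in configs:
    uj_per_image = energy_mj * 1e3 / n_images                # mJ -> uJ, then per image
    print(f"{bits}-bit, {n_images} images: {uj_per_image:.2f} uJ/image at {acc}%")

# Assumption: energy grows linearly with the number of images in the trace.
uj_per_image_5bit = configs[0][1] * 1e3 / configs[0][2]
print(f"extrapolated 220-image energy: {uj_per_image_5bit * TRACE_LEN / 1e3:.2f} mJ")
```

Both low-energy configurations land near 10 uJ per image, so trace length, more than weight width, appears to dominate the energy budget in these numbers.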

Implementation Guide

Prerequisites

FPGA development tools (Vivado for Xilinx Artix-7)
VHDL programming knowledge
SRC neuron model definition
MNIST or similar dataset for validation

Step-by-Step

Define SRC Neuron Equations: Start with the full SRC model with recurrent connections
Approximate Unary Operators: Replace tanh and exp with piecewise-defined (for example, piecewise linear) approximations
Fixed-Point Conversion: Convert all floating-point operations to fixed-point arithmetic with appropriate bit widths
VHDL Architecture Design:
- Neuron processing units (parallel or sequential)
- Weight memory in LUT-registers
- Spike routing and timing logic
Offline Weight Training: Train weights in software, then quantize them and load them into the FPGA
Synthesis & Place/Route: Target Artix-7 or compatible FPGA
Validation: Compare FPGA output with software reference using spike traces
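The training, quantization, and validation steps can be rehearsed in software before touching the FPGA. This sketch assumes symmetric uniform quantization to a signed b-bit grid with one scale per matrix (the paper's exact scheme is not described here), uses random weights as stand-ins for trained ones, and runs a toy integrate-and-fire layer rather than the SRC model. The spike-agreement rate between the full-precision and quantized runs is a simple validation metric; the same comparison applies to spike traces dumped from the FPGA. All names are hypothetical.

```python
import numpy as np

def quantize_symmetric(w, bits):
    """Map weights to signed integer codes in [-(2**(bits-1) - 1), 2**(bits-1) - 1]
    using a single per-matrix scale; returns (codes, scale)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    codes = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return codes, scale

def run_layer(weights, inputs, threshold=1.0):
    """Toy non-leaky integrate-and-fire layer (not the SRC model).
    inputs: (T, n_in) array of 0/1 spikes; returns (T, n_out) output spikes."""
    v = np.zeros(weights.shape[0])
    out = np.zeros((inputs.shape[0], weights.shape[0]))
    for t, x in enumerate(inputs):
        v += weights @ x                 # integrate weighted input spikes
        fired = v > threshold
        out[t] = fired
        v[fired] = 0.0                   # reset neurons that spiked
    return out

rng = np.random.default_rng(0)
w_float = rng.normal(0.0, 0.3, size=(10, 64))            # stand-in for trained weights
spikes_in = (rng.random((55, 64)) < 0.1).astype(float)   # 55-step input spike trace

ref = run_layer(w_float, spikes_in)                      # full-precision reference
for bits in (5, 4):
    codes, scale = quantize_symmetric(w_float, bits)
    q_out = run_layer(codes * scale, spikes_in)          # same dynamics, quantized weights
    agreement = np.mean(ref == q_out)
    print(f"{bits}-bit weights: spike agreement {agreement:.3f}")
```

On hardware only the integer codes are stored; the scale can be folded into the firing threshold so that no multiplication by it is needed at run time.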

Code Example

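# Simplified recurrent spiking neuron with a piecewise tanh (Python reference).
# Illustrative only: a generic integrate-and-fire update, not the full SRC equations.
import numpy as np

def src_piecewise_tanh(x):
    """Piecewise linear approximation of tanh for hardware implementation.

    Odd, monotonic, and continuous: slope 0.75 for |x| < 1, slope 0.25 for
    1 <= |x| < 2, saturating at +/-1 for |x| >= 2. Both slopes reduce to
    shifts and adds in hardware (0.75 = 1/2 + 1/4).
    """
    x = np.asarray(x, dtype=float)
    a = np.abs(x)
    y = np.where(a < 1.0, 0.75 * a,
                 np.where(a < 2.0, 0.25 * a + 0.5, 1.0))
    return np.sign(x) * y

def src_neuron_step(membrane, input_current, weights, prev_spikes, dt=0.001):
    """One update step for a layer of n neurons with hardware-friendly approximations.

    membrane, input_current, prev_spikes: arrays of shape (n,)
    weights: recurrent weight matrix of shape (n, n)
    """
    # Recurrent input from the previous step's spikes
    recurrent = weights @ prev_spikes

    # Total input
    total = input_current + recurrent

    # Piecewise activation (replaces tanh)
    activation = src_piecewise_tanh(total)

    # Forward-Euler membrane update
    membrane_new = membrane + dt * activation

    # Spike generation: threshold crossing at 1.0
    spikes = (membrane_new > 1.0).astype(float)

    # Reset neurons that spiked
    membrane_new = np.where(spikes > 0.0, 0.0, membrane_new)

    return membrane_new, spikes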

VHDL Implementation Notes

Use block RAM or LUT-registers for weight storage
Implement piecewise functions as lookup tables or comparator chains
Pipeline spike processing for throughput
Clock domain: 100 MHz target for Artix-7
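The two options for piecewise functions can be compared with a bit-accurate software model before writing VHDL. The sketch below assumes a piecewise-linear tanh with breakpoints at |x| = 1 and |x| = 2 (slopes 3/4 and 1/4) in a hypothetical signed 12-bit input format with 8 fractional bits. The comparator chain selects a segment by magnitude and evaluates it with truncating shifts and adds; the lookup table is a ROM with one entry per representable input, filled offline from exact values rounded to nearest. The model reports the worst-case disagreement in least significant bits (LSB). Format and names are assumptions.

```python
FRAC = 8                                # assumed fractional bits
IN_BITS = 12                            # assumed signed input width: range [-8, 8)
ONE = 1 << FRAC

def tanh_pwl_exact(x: float) -> float:
    """Floating-point reference of the piecewise-linear tanh."""
    a = abs(x)
    y = 0.75 * a if a < 1 else (0.25 * a + 0.5 if a < 2 else 1.0)
    return -y if x < 0 else y

def tanh_comparator(x: int) -> int:
    """Comparator chain: two magnitude comparisons, then shift-and-add.
    Odd symmetry: evaluate on |x| and restore the sign."""
    a = abs(x)
    if a < ONE:                         # |x| < 1: 0.75*|x| = |x|/2 + |x|/4
        y = (a >> 1) + (a >> 2)
    elif a < 2 * ONE:                   # 1 <= |x| < 2: 0.25*|x| + 0.5
        y = (a >> 2) + (ONE >> 1)
    else:                               # saturate at 1.0
        y = ONE
    return -y if x < 0 else y

def signed_from_index(idx: int) -> int:
    """Interpret an IN_BITS-bit ROM address as a two's-complement input."""
    return idx - (1 << IN_BITS) if idx >= (1 << (IN_BITS - 1)) else idx

def build_lut() -> list:
    """ROM contents computed offline: exact value, rounded to nearest LSB."""
    return [round(tanh_pwl_exact(signed_from_index(i) / ONE) * ONE)
            for i in range(1 << IN_BITS)]

def tanh_lut(x: int, lut: list) -> int:
    """ROM read: the input's low IN_BITS bits form the address."""
    return lut[x & ((1 << IN_BITS) - 1)]

if __name__ == "__main__":
    lut = build_lut()
    inputs = range(-(1 << (IN_BITS - 1)), 1 << (IN_BITS - 1))
    worst = max(abs(tanh_comparator(x) - tanh_lut(x, lut)) for x in inputs)
    print(f"LUT entries: {len(lut)}, worst-case difference: {worst} LSB")
```

The trade-off is visible in the model: the ROM needs 2**IN_BITS entries but gives exactly rounded outputs, while the comparator chain needs only two comparators and a few adders at the cost of up to 1 LSB of truncation error.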

Applications

Edge AI inference: Low-power SNN deployment on FPGA for IoT devices
Neuromorphic prototyping: Hardware validation of SRC neuron models
Energy-constrained SNNs: Battery-powered neural processing with sub-mJ energy budget
Real-time signal processing: Event-based processing with guaranteed latency

Pitfalls

Offline training only: Weights are computed offline and loaded; no on-chip learning
Limited precision: 4-5 bit quantization introduces accuracy degradation vs. full precision
Model-specific: SRC-specific implementation may not generalize to other neuron models
FPGA resource constraints: LUT-register storage limits network size; larger networks require external memory
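A back-of-the-envelope storage estimate shows why on-chip register storage caps network size and why narrower weights help. The 784-128-10 topology and the 500,000-bit budget below are hypothetical, not the paper's architecture or the device's capacity; substitute the real layer sizes and the register bits actually available for weights. The optional recurrent term adds all-to-all weights inside a layer; whether the accelerated network uses such weights is not stated here.

```python
def weight_bits(layers, bits, recurrent_layers=()):
    """Total weight storage in bits for a feed-forward stack, plus optional
    all-to-all recurrent weights inside the layers listed in recurrent_layers."""
    total = sum(a * b for a, b in zip(layers, layers[1:]))    # feed-forward weights
    total += sum(layers[i] ** 2 for i in recurrent_layers)    # recurrent weights
    return total * bits

LAYERS = [784, 128, 10]          # hypothetical topology
BUDGET_BITS = 500_000            # hypothetical register budget reserved for weights

for bits in (8, 5, 4):
    need = weight_bits(LAYERS, bits)
    verdict = "fits" if need <= BUDGET_BITS else "needs BRAM or external memory"
    print(f"{bits}-bit: {need:,} bits ({need / 8 / 1024:.1f} KiB) -> {verdict}")
```

With these made-up numbers only the 4-bit variant fits, which illustrates how weight precision and network size compete for the same on-chip storage.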

Related Skills

snn-mcu-fullfeature-edge (SNN on microcontrollers)
circuit-level-spiking-neuron-robustness (hardware-robust spiking neurons)
quantized-snn-hardware-optimization (SNN quantization)
edgespike-edge-iot-snn (edge SNN deployment)
neuromorphic-continual-nuclear-ics (neuromorphic continual learning)
snn-fpga-hardware-software-codesign