dual-timescale-memory-spiking-neuron-astrocyte-network-efficient


Spiking Neuron-Astrocyte Network (SNAN) combining STDP-based long-term memory with astrocytic calcium-mediated short-term suppression for efficient navigation. Introduces Topological-Context Memory as a dual-timescale working memory mechanism that reduces median path length by up to 6x and dramatically improves goal completion rates.

By hiyenwong. Updated 6/3/2026

name: dual-timescale-memory-spiking-neuron-astrocyte-network-efficient
version: 1.0
date: 2026-04-23
paper: arXiv:2604.15391
authors: Tsybina, Antonova, Shchanikov, Kulagin, Mikhaylov, Kazantsev, Demin, Gordleeva
description: >
  Spiking Neuron-Astrocyte Network (SNAN) combining STDP-based long-term memory with astrocytic calcium-mediated short-term suppression for efficient navigation. Introduces Topological-Context Memory as a dual-timescale working memory mechanism that reduces median path length by up to 6x and dramatically improves goal completion rates.
tags: [spiking neural network, astrocyte, navigation, working memory, STDP, dual-timescale, neuromorphic]

Dual-Timescale Memory in Spiking Neuron-Astrocyte Network

Summary

This paper introduces a Spiking Neuron-Astrocyte Network (SNAN) for efficient navigation that implements dual-timescale memory by combining two biologically-inspired mechanisms:

  1. Long-term memory (slow timescale): Spike-Timing-Dependent Plasticity (STDP) reinforces successful action sequences after goal achievement, creating permanent pathway modifications.
  2. Short-term memory (fast timescale): Astrocytic calcium transients suppress recently visited states, creating temporary inhibition that biases exploration toward unvisited regions.

This combination produces what the authors call Topological-Context Memory (TCM) — a novel form of working memory that maintains both the graph topology of successful routes and the temporal context of recent visits. The astrocytic modulation naturally balances exploration-exploitation as an emergent property of local state suppression, without explicit reward shaping or external exploration bonuses.

Key Methodology

  1. Network Architecture:

    • Each state in the navigation environment is encoded by a population of spiking neurons (state neurons).
    • Each possible action (movement direction) is encoded by action neuron populations.
    • State-to-action connectivity forms the decision policy.
    • Astrocytes are coupled to state neurons and respond to neural activity via calcium signaling.
  2. Neuron Model:

    • Spiking neurons use the Leaky Integrate-and-Fire (LIF) model for computational efficiency.
    • Membrane potential dynamics follow standard LIF equations with synaptic inputs.
    • Action potentials are generated when membrane potential exceeds threshold.
  3. STDP Learning (Long-Timescale Memory):

    • Reward-modulated STDP is applied upon successful goal completion.
    • Synapses along the successful trajectory are strengthened proportionally to spike-timing correlations.
    • Reinforces state-action pairs that led to the goal, building long-term navigational memory.
    • Learning occurs only after reaching the goal (episodic reinforcement), not continuously.
  4. Astrocytic Calcium Dynamics (Short-Timescale Memory):

    • Astrocytes are activated when their associated state neurons fire.
    • Calcium transients in astrocytes produce inhibitory modulation on the corresponding state neurons.
    • The calcium dynamics operate on a shorter timescale (seconds) compared to STDP (episodes).
    • This creates a temporary suppression of recently visited states.
  5. Dual-Timescale Integration:

    • STDP provides slow, cumulative learning of successful routes (long-term memory).
    • Astrocytic calcium provides fast, decaying suppression of recently visited states (short-term memory).
    • The interplay biases the agent toward unexplored regions while reinforcing known successful paths.
    • The exploration-exploitation trade-off is handled as an emergent property rather than an explicit mechanism.
  6. Navigation Task:

    • Agent navigates grid-based environments from a start position to a goal.
    • At each step, the current state activates corresponding neurons; the most active action neuron determines movement.
    • Upon goal completion, the STDP learning rule updates synaptic weights along the trajectory.
    • Astrocytic suppression operates continuously during each episode.

Core Equations / Model

Leaky Integrate-and-Fire (LIF) Neuron Model

The membrane potential V_i(t) of neuron i evolves as:

tau_m * dV_i/dt = -(V_i - V_rest) + R_m * sum_j w_ij * sum_k alpha(t - t_j^k)

where:

  • tau_m = membrane time constant
  • V_rest = resting potential
  • R_m = membrane resistance
  • w_ij = synaptic weight from neuron j to neuron i
  • alpha(.) = postsynaptic current kernel (e.g., exponential decay)
  • t_j^k = k-th spike time of presynaptic neuron j

Spike emission: if V_i(t) >= V_th, the neuron emits a spike and V_i is held at V_reset for the refractory period tau_ref.
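The LIF dynamics and spike rule above can be sketched with forward-Euler integration. This is a minimal illustration, not the authors' code: the function name `simulate_lif`, the parameter values, and the use of a precomputed input array in place of the weighted kernel sum `sum_j w_ij * sum_k alpha(t - t_j^k)` are all assumptions.

```python
import numpy as np

def simulate_lif(input_current, dt=1e-4, tau_m=20e-3, v_rest=-65e-3,
                 v_th=-50e-3, v_reset=-65e-3, r_m=1e7, tau_ref=2e-3):
    """Forward-Euler integration of one LIF neuron (illustrative parameters).

    input_current: 1-D array with the total synaptic current (A) per time step,
    standing in for the weighted sum of postsynaptic kernels.
    Returns (membrane trace in volts, list of spike times in seconds).
    """
    v = v_rest
    refractory_left = 0.0
    trace = np.empty(len(input_current))
    spikes = []
    for step, i_syn in enumerate(input_current):
        if refractory_left > 0.0:
            # Hold the membrane at the reset value during the refractory period.
            v = v_reset
            refractory_left -= dt
        else:
            # tau_m * dV/dt = -(V - V_rest) + R_m * I_syn
            v += dt / tau_m * (-(v - v_rest) + r_m * i_syn)
            if v >= v_th:
                spikes.append(step * dt)
                v = v_reset
                refractory_left = tau_ref
        trace[step] = v
    return trace, spikes
```

With these assumed values a constant 2 nA input drives the steady-state potential above threshold and produces regular spiking, while 1 nA stays subthreshold.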

STDP Learning Rule (Reward-Modulated)

Weight update upon goal completion:

Delta_w_ij = eta * R * sum_{spike pairs} W(Delta_t)

where:

  • eta = learning rate
  • R = reward signal (1 upon goal, 0 otherwise)
  • Delta_t = t_post - t_pre = spike timing difference
  • W(Delta_t) = STDP window function

Typical STDP window:

W(Delta_t) = A_+ * exp(-Delta_t / tau_+)   if Delta_t > 0  (LTP)
W(Delta_t) = A_- * exp(Delta_t / tau_-)     if Delta_t < 0  (LTD)
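As a rough sketch of the rule above, the window and the reward-gated update can be written as follows. The function names, the parameter values, and the all-to-all pairing of pre- and postsynaptic spikes are assumptions made for illustration; the summary does not say which pairing scheme the paper uses. `a_minus` is taken to be negative so that the `Delta_t < 0` branch depresses the synapse.

```python
import numpy as np

def stdp_window(delta_t, a_plus=0.010, a_minus=-0.005,
                tau_plus=20e-3, tau_minus=20e-3):
    """W(delta_t) for delta_t = t_post - t_pre in seconds (a_minus < 0)."""
    delta_t = np.asarray(delta_t, dtype=float)
    # exp(-|dt|/tau) equals exp(-dt/tau_+) for dt > 0 and exp(dt/tau_-) for dt < 0,
    # and avoids overflow in the branch that np.where discards.
    ltp = a_plus * np.exp(-np.abs(delta_t) / tau_plus)
    ltd = a_minus * np.exp(-np.abs(delta_t) / tau_minus)
    return np.where(delta_t > 0, ltp, np.where(delta_t < 0, ltd, 0.0))

def reward_modulated_update(pre_spikes, post_spikes, reward, eta=0.1):
    """Delta_w = eta * R * sum over all (pre, post) spike pairs of W(t_post - t_pre)."""
    pre = np.asarray(pre_spikes, dtype=float)
    post = np.asarray(post_spikes, dtype=float)
    if pre.size == 0 or post.size == 0:
        return 0.0
    delta_t = post[:, None] - pre[None, :]  # every post spike against every pre spike
    return float(eta * reward * stdp_window(delta_t).sum())
```

Because the update is multiplied by `reward`, calling it with `reward=0` leaves the weight unchanged, which matches the statement that learning happens only on goal completion.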

Astrocytic Calcium Dynamics

Calcium concentration in astrocyte a coupled to state neuron i:

d[Ca2+]_a/dt = -([Ca2+]_a - [Ca2+]_baseline) / tau_Ca + gamma * S_i(t)

where:

  • tau_Ca = calcium decay time constant (short timescale, seconds)
  • [Ca2+]_baseline = baseline calcium concentration
  • gamma = coupling strength between neural activity and astrocyte calcium
  • S_i(t) = spike train of associated neuron i (filtered/convolved)
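The calcium equation above is a leaky integrator and can be stepped with forward Euler. The sketch below is illustrative only: `simulate_calcium`, the parameter values, and the arbitrary calcium units are assumptions, and `spike_drive` stands in for the filtered spike train S_i(t).

```python
import numpy as np

def simulate_calcium(spike_drive, dt=1e-3, tau_ca=2.0, ca_baseline=0.1, gamma=0.5):
    """Integrate d[Ca]/dt = -([Ca] - baseline) / tau_Ca + gamma * S(t).

    spike_drive: 1-D array with the filtered activity S_i(t) per time step.
    Returns the calcium trace, one entry longer than the input (initial value first).
    """
    ca = np.empty(len(spike_drive) + 1)
    ca[0] = ca_baseline
    for k, s in enumerate(spike_drive):
        dca = -(ca[k] - ca_baseline) / tau_ca + gamma * s
        ca[k + 1] = ca[k] + dt * dca
    return ca
```

Driving the astrocyte for one second and then removing the input gives a transient that rises during the visit and relaxes back toward baseline with time constant `tau_ca`, which is the short-term trace of a recent visit.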

Astrocytic Modulation (Inhibitory)

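The astrocyte raises the effective firing threshold of its state neuron:

V_th,i^eff(t) = V_th + beta * ([Ca2+]_a(t) - [Ca2+]_baseline)

or, in an alternative formulation, injects an inhibitory current:

I_astro,i(t) = -beta * ([Ca2+]_a(t) - [Ca2+]_baseline)

where beta = astrocytic modulation strength. Its units differ between the two forms (voltage per unit calcium in the first, current per unit calcium in the second), so the same numerical value cannot be reused. Either form suppresses recently active states, by raising their firing threshold or by hyperpolarizing them.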
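Both forms of the modulation are one-line functions of the calcium excess over baseline. The helper names and the numerical defaults below are hypothetical and chosen only to make the units concrete.

```python
def effective_threshold(ca, v_th=-50e-3, ca_baseline=0.1, beta=20e-3):
    """V_th_eff = V_th + beta * (Ca - baseline); beta in volts per calcium unit."""
    return v_th + beta * (ca - ca_baseline)

def astro_current(ca, ca_baseline=0.1, beta=1e-9):
    """I_astro = -beta * (Ca - baseline); beta in amperes per calcium unit."""
    return -beta * (ca - ca_baseline)
```

At baseline calcium neither function changes the neuron; any elevation makes the threshold less negative or the injected current negative, so a recently active state neuron needs more drive to fire.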

Combined Decision Dynamics

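At each navigation step, action selection follows:

a* = argmax_a [ sum_i w_{a,i} * r_i(t) - beta * ([Ca2+]_{s'(a)}(t) - [Ca2+]_baseline) ]

where i runs over the neurons encoding the current state s, r_i(t) is the firing rate of neuron i, w_{a,i} is the weight from state neuron i to action a, and s'(a) is the state that action a leads to. Written with the calcium of the current state s, the suppression term would be identical for every action and could not change the argmax, so it is taken here at the candidate next state. Suppressing recently visited successors is what biases the agent toward less-recently-visited neighbors.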
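A rate-level sketch of this selection step is given below. It assumes the suppression term is evaluated at the state each action would lead to, since a term that depends only on the current state is constant across actions and cannot affect the argmax. The function name, argument layout, and default values are illustrative.

```python
import numpy as np

def select_action(weights, rates, ca_next, beta=1.0, ca_baseline=0.1):
    """Pick the action with the largest drive minus astrocytic suppression.

    weights: (n_actions, n_state_neurons) state-to-action weights
    rates:   (n_state_neurons,) firing rates of the current state's population
    ca_next: (n_actions,) calcium of the astrocyte at the state each action leads to
    """
    drive = np.asarray(weights) @ np.asarray(rates)
    suppression = beta * (np.asarray(ca_next, dtype=float) - ca_baseline)
    return int(np.argmax(drive - suppression))
```

With `beta=0` the choice reduces to the learned policy alone; raising `beta` lets a high calcium level at a recently visited neighbor override a moderately stronger synaptic drive.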

Implementation Notes

Practical Implementation Steps

  1. Define the environment: Grid world with states, actions (up/down/left/right), start and goal positions.
  2. Initialize neural populations: One population per state and per action. Use LIF neurons with standard parameters.
  3. Initialize astrocytes: One astrocyte per state, with calcium dynamics initialized at baseline.
  4. Set hyperparameters:
    • tau_m ~ 20 ms (membrane time constant)
    • tau_Ca ~ 1-5 s (calcium decay — sets short-term memory window)
    • STDP learning rate eta: small enough for stable learning, large enough for convergence within ~100 episodes
    • Astrocytic modulation strength beta: tune to balance exploration vs exploitation
    • A_+, A_-: STDP amplitudes for potentiation and depression (A_- is negative; typically asymmetric, e.g., A_+ > |A_-|)
  5. Simulation loop:
    • For each episode: agent starts at initial state
    • At each step: activate state neurons, simulate astrocyte calcium, compute action selection, move agent
    • Upon goal: apply reward-modulated STDP to all synapses along the trajectory
    • Track path length and success rate across episodes
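The loop above can be prototyped without spiking dynamics to check the control flow before moving to a simulator. The sketch below is a deliberately simplified, rate-level stand-in, not the paper's model: calcium is stored as the excess over baseline and decays once per step, suppression is read at the candidate next state, and the weight update on reaching the goal (a fixed increment for the last action taken in each visited state) replaces reward-modulated STDP. All names and parameter values are assumptions.

```python
import numpy as np

ACTIONS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # right, down, left, up as (d_row, d_col)

def step(state, action, size):
    """Move on a size x size grid; a move into a wall leaves the state unchanged."""
    r, c = state[0] + ACTIONS[action][0], state[1] + ACTIONS[action][1]
    return (r, c) if 0 <= r < size and 0 <= c < size else state

def run_episode(weights, size, start, goal, beta, eta=0.1, gamma=1.0,
                tau_ca=2.0, step_time=0.1, max_steps=200):
    """Run one episode; return the number of steps, or None if the goal is not reached.

    weights: array of shape (size, size, 4), updated in place on success.
    """
    ca = np.zeros((size, size))              # calcium excess, reset at episode start
    decay = np.exp(-step_time / tau_ca)      # per-step decay of the calcium trace
    state, last_action = start, {}
    for n_steps in range(1, max_steps + 1):
        ca *= decay
        ca[state] += gamma                   # visiting a state activates its astrocyte
        scores = [weights[state[0], state[1], a] - beta * ca[step(state, a, size)]
                  for a in range(4)]
        action = int(np.argmax(scores))      # ties resolve to the first action
        last_action[state] = action
        state = step(state, action, size)
        if state == goal:
            # Stand-in for reward-modulated STDP with R = 1.
            for (r, c), a in last_action.items():
                weights[r, c, a] += eta
            return n_steps
    return None
```

On an empty 4x4 grid with zero initial weights, an agent with `beta=0` keeps choosing the first action and never leaves the top row, while `beta=1.0` steers it away from just-visited cells and walls until it reaches the opposite corner. Running a further episode with `beta=0` on the updated weights then retraces the stored route, which separates the short-term role of calcium from the long-term role of the weights.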

Key Design Considerations

  • Timescale separation is critical: The calcium decay time tau_Ca should be long enough to suppress recently visited states within an episode but short enough to allow revisiting in subsequent episodes. Typically 1-5 seconds of simulated time.
  • STDP requires episodic credit assignment: Since navigation involves multi-step decisions, the reward-modulated STDP must propagate credit across the entire successful trajectory.
  • Population coding improves robustness: Using populations of neurons per state/action rather than single neurons provides noise resistance and richer dynamics.
  • Astrocyte coupling strength matters: Too strong suppression prevents re-visiting necessary waypoints; too weak provides insufficient exploration bias.
  • Suited to neuromorphic hardware: STDP and astrocytic calcium dynamics use only locally available information (pre- and postsynaptic spike times, local calcium). The reward signal R is the one global quantity, a single scalar delivered at goal completion.
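When choosing tau_Ca, it helps to compute how much of a calcium transient is left after a given delay. With no further spikes the calcium equation decays exponentially, so the remaining fraction is exp(-t / tau_Ca). The helper names below are illustrative.

```python
import math

def residual_suppression(elapsed_s, tau_ca_s):
    """Fraction of a calcium transient remaining elapsed_s seconds after the visit."""
    return math.exp(-elapsed_s / tau_ca_s)

def time_to_fraction(fraction, tau_ca_s):
    """Seconds until the transient has decayed to the given fraction of its peak."""
    return -tau_ca_s * math.log(fraction)
```

For example, with tau_Ca = 2 s about 37% of the suppression remains after 2 s, and it takes roughly 9.2 s to fall to 1% of its peak. A state visited early in a long episode is therefore effectively released again well before the episode ends.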

Simulation Frameworks

  • Brian2 spiking neural network simulator (Python) is well-suited for this model
  • NEST simulator for larger-scale implementations
  • Neuromorphic hardware: Loihi, SpiNNaker platforms for energy-efficient deployment

Extensions and Variations

  • Continuous environments: Extend from grid to continuous state-action spaces with place cell encoding
  • Multi-agent scenarios: Astrocytic suppression can coordinate multi-agent exploration
  • Hierarchical navigation: Stack multiple SNAN layers for multi-scale planning
  • Dynamic environments: Astrocyte timescales can adapt to environment change frequency

Results

Key Quantitative Results

  • Path length reduction: SNAN reduces median path length by up to 6x compared to baseline spiking networks without astrocytic modulation.
  • Goal completion rates: Dramatically improved over baseline, especially in complex maze environments.
  • Exploration efficiency: The dual-timescale mechanism achieves near-optimal exploration without explicit exploration bonuses or epsilon-greedy policies.
  • Learning speed: Convergence to near-optimal paths typically within tens of episodes (exact numbers depend on maze complexity).

Comparison Conditions

  • Compared against: (1) SNN with STDP only (no astrocytes), (2) SNN with random exploration, (3) standard reinforcement learning baselines.
  • The astrocytic mechanism provides the critical improvement in exploration efficiency.
  • STDP-only networks suffer from excessive revisiting of states (looping behavior) without the astrocytic short-term suppression.

Key Findings

  1. The dual-timescale mechanism is necessary — neither component alone achieves the same performance.
  2. Topological-Context Memory emerges naturally from the interaction of neural STDP and astrocytic calcium dynamics.
  3. The exploration-exploitation trade-off is mitigated as an emergent consequence, not a designed feature.
  4. The approach is biologically plausible and aligns with known neuron-astrocyte interactions in the brain.
  5. The mechanism generalizes across different maze topologies and complexity levels.

Activation Triggers

Use this skill when:

  • Designing spiking neural networks for navigation or path-planning tasks
  • Implementing biologically-inspired working memory mechanisms
  • Building neuromorphic systems that require exploration-exploitation balance without explicit algorithms
  • Creating SNNs with dual-timescale dynamics (fast inhibition + slow learning)
  • Implementing astrocyte-neuron co-computation models
  • Seeking alternatives to epsilon-greedy or softmax exploration in spiking RL
  • Designing Topological-Context Memory systems for spatial reasoning
  • Implementing reward-modulated STDP for sequential decision-making
  • Building energy-efficient navigation systems for neuromorphic hardware (Loihi, SpiNNaker)
  • Researching neuron-glia interactions in computational neuroscience
Install via CLI
npx skills add https://github.com/hiyenwong/ai_collection --skill dual-timescale-memory-spiking-neuron-astrocyte-network-efficient