dual-timescale-memory-spiking-neuron-astrocyte-network-efficient - SKILL.md Agent Skill

name: dual-timescale-memory-spiking-neuron-astrocyte-network-efficient
version: 1.0
date: 2026-04-23
paper: arXiv:2604.15391
authors: Tsybina, Antonova, Shchanikov, Kulagin, Mikhaylov, Kazantsev, Demin, Gordleeva
description: >
  Spiking Neuron-Astrocyte Network (SNAN) combining STDP-based long-term memory with astrocytic calcium-mediated short-term suppression for efficient navigation. Introduces Topological-Context Memory as a dual-timescale working memory mechanism that reduces median path length by up to 6x and dramatically improves goal completion rates.
tags: [spiking neural network, astrocyte, navigation, working memory, STDP, dual-timescale, neuromorphic]

Dual-Timescale Memory in Spiking Neuron-Astrocyte Network

Summary

This paper introduces a Spiking Neuron-Astrocyte Network (SNAN) for efficient navigation that implements dual-timescale memory by combining two biologically-inspired mechanisms:

Long-term memory (slow timescale): Spike-Timing-Dependent Plasticity (STDP) reinforces successful action sequences after goal achievement, creating permanent pathway modifications.
Short-term memory (fast timescale): Astrocytic calcium transients suppress recently visited states, creating temporary inhibition that biases exploration toward unvisited regions.

This combination produces what the authors call Topological-Context Memory (TCM) — a novel form of working memory that maintains both the graph topology of successful routes and the temporal context of recent visits. The astrocytic modulation naturally balances exploration-exploitation as an emergent property of local state suppression, without explicit reward shaping or external exploration bonuses.

Key Methodology

Network Architecture:
- Each state in the navigation environment is encoded by a population of spiking neurons (state neurons).
- Each possible action (movement direction) is encoded by action neuron populations.
- State-to-action connectivity forms the decision policy.
- Astrocytes are coupled to state neurons and respond to neural activity via calcium signaling.
Neuron Model:
- Spiking neurons use the Leaky Integrate-and-Fire (LIF) model for computational efficiency.
- Membrane potential dynamics follow standard LIF equations with synaptic inputs.
- Action potentials are generated when membrane potential exceeds threshold.
STDP Learning (Long-Timescale Memory):
- Reward-modulated STDP is applied upon successful goal completion.
- Synapses along the successful trajectory are strengthened proportionally to spike-timing correlations.
- Reinforces state-action pairs that led to the goal, building long-term navigational memory.
- Learning occurs only after reaching the goal (episodic reinforcement), not continuously.
Astrocytic Calcium Dynamics (Short-Timescale Memory):
- Astrocytes are activated when their associated state neurons fire.
- Calcium transients in astrocytes produce inhibitory modulation on the corresponding state neurons.
- The calcium dynamics operate on a much shorter timescale (seconds within an episode) than STDP (changes accumulated across episodes).
- This creates a temporary suppression of recently visited states.
Dual-Timescale Integration:
- STDP provides slow, cumulative learning of successful routes (long-term memory).
- Astrocytic calcium provides fast, decaying suppression of recently visited states (short-term memory).
- The interplay biases the agent toward unexplored regions while reinforcing known successful paths.
- The exploration-exploitation trade-off is handled as an emergent property rather than an explicit mechanism.
Navigation Task:
- Agent navigates grid-based environments from a start position to a goal.
- At each step, the current state activates corresponding neurons; the most active action neuron determines movement.
- Upon goal completion, the STDP learning rule updates synaptic weights along the trajectory.
- Astrocytic suppression operates continuously during each episode.

Core Equations / Model

Leaky Integrate-and-Fire (LIF) Neuron Model

The membrane potential V_i(t) of neuron i evolves as:

tau_m * dV_i/dt = -(V_i - V_rest) + R_m * sum_j w_ij * sum_k alpha(t - t_j^k)

where:

tau_m = membrane time constant
V_rest = resting potential
R_m = membrane resistance
w_ij = synaptic weight from neuron j to neuron i
alpha(.) = postsynaptic current kernel (e.g., exponential decay)
t_j^k = k-th spike time of presynaptic neuron j

Spike emission: if V_i(t) >= V_th, emit spike, then V_i -> V_reset for refractory period tau_ref.
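
The LIF dynamics and spike rule above can be sketched with a forward-Euler loop. This is a minimal illustration, not the paper's implementation: the function name `simulate_lif`, all parameter values, and the choice to pass the total synaptic current (the `sum_j w_ij * sum_k alpha(...)` term) in as a precomputed list are assumptions made for the example.

```python
def simulate_lif(input_current, dt=1e-4, tau_m=0.020, v_rest=-0.065,
                 v_th=-0.050, v_reset=-0.065, r_m=1e7, tau_ref=0.002):
    """Forward-Euler integration of one LIF neuron (SI units; values are illustrative).

    input_current: total synaptic current in amperes, one entry per time step.
    Returns (spike_times, voltage_trace).
    """
    v = v_rest
    refractory_left = 0.0
    spike_times, trace = [], []
    for step, i_syn in enumerate(input_current):
        if refractory_left > 0.0:
            # Hold the neuron at V_reset for the refractory period tau_ref.
            v = v_reset
            refractory_left -= dt
        else:
            # tau_m * dV/dt = -(V - V_rest) + R_m * I_syn
            v += dt * (-(v - v_rest) + r_m * i_syn) / tau_m
            if v >= v_th:
                spike_times.append(step * dt)
                v = v_reset
                refractory_left = tau_ref
        trace.append(v)
    return spike_times, trace


# 2 nA drives the steady-state potential 20 mV above rest, past the 15 mV
# gap to threshold, so the neuron fires; 1 nA (10 mV) does not.
spikes_strong, _ = simulate_lif([2e-9] * 5000)
spikes_weak, trace_weak = simulate_lif([1e-9] * 5000)
```

With these assumed values, the subthreshold input settles below `v_th` and never spikes, while the stronger input produces a regular spike train.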

STDP Learning Rule (Reward-Modulated)

Weight update upon goal completion:

Delta_w_ij = eta * R * sum_{spike pairs} W(Delta_t)

where:

eta = learning rate
R = reward signal (1 upon goal, 0 otherwise)
Delta_t = t_post - t_pre = spike timing difference
W(Delta_t) = STDP window function

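Typical STDP window (with A_+ > 0 and A_- < 0, so that pre-before-post pairings potentiate and post-before-pre pairings depress):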

W(Delta_t) = A_+ * exp(-Delta_t / tau_+)   if Delta_t > 0  (LTP)
W(Delta_t) = A_- * exp(Delta_t / tau_-)     if Delta_t < 0  (LTD)
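
A small sketch of the reward-modulated update above. The all-to-all pairing of pre- and postsynaptic spikes, the function names, and the numeric defaults are assumptions for illustration; `a_minus` is taken as negative so that the LTD branch lowers the weight.

```python
import math


def stdp_window(delta_t, a_plus=0.010, a_minus=-0.005,
                tau_plus=0.020, tau_minus=0.020):
    """W(Delta_t) with Delta_t = t_post - t_pre in seconds."""
    if delta_t > 0:                                   # pre before post: LTP
        return a_plus * math.exp(-delta_t / tau_plus)
    if delta_t < 0:                                   # post before pre: LTD
        return a_minus * math.exp(delta_t / tau_minus)
    return 0.0                                        # simultaneous spikes: no change (assumption)


def reward_modulated_dw(pre_spikes, post_spikes, reward, eta=0.1):
    """Delta_w = eta * R * sum over all (pre, post) spike pairs of W(Delta_t)."""
    total = sum(stdp_window(t_post - t_pre)
                for t_pre in pre_spikes
                for t_post in post_spikes)
    return eta * reward * total


# A causal pairing (pre at 10 ms, post at 15 ms) is reinforced only when R = 1.
dw_goal = reward_modulated_dw([0.010], [0.015], reward=1.0)
dw_no_goal = reward_modulated_dw([0.010], [0.015], reward=0.0)
```

Because the reward multiplies the whole sum, no weight changes on episodes that do not reach the goal.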

Astrocytic Calcium Dynamics

Calcium concentration in astrocyte a coupled to state neuron i:

d[Ca2+]_a/dt = -([Ca2+]_a - [Ca2+]_baseline) / tau_Ca + gamma * S_i(t)

where:

tau_Ca = calcium decay time constant (short timescale, seconds)
[Ca2+]_baseline = baseline calcium concentration
gamma = coupling strength between neural activity and astrocyte calcium
S_i(t) = spike train of associated neuron i (filtered/convolved)
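
The calcium equation above can be integrated step by step. In this sketch S_i(t) is treated as an unfiltered train of delta spikes, so each spike adds a jump of size `gamma`; that simplification, the function name, and the parameter values are assumptions, not taken from the paper.

```python
import math


def step_calcium(ca, spiked, dt=0.001, tau_ca=2.0, ca_baseline=0.0, gamma=0.1):
    """Advance [Ca2+]_a by one forward-Euler step of length dt (seconds)."""
    ca += dt * (-(ca - ca_baseline) / tau_ca)   # relaxation toward baseline
    if spiked:
        ca += gamma                             # jump from integrating a delta spike
    return ca


# The state neuron fires 5 spikes, 10 ms apart, then stays silent for 2 s.
ca = 0.0
for step in range(50):
    ca = step_calcium(ca, spiked=(step % 10 == 0))
peak = ca
for step in range(2000):
    ca = step_calcium(ca, spiked=False)
remaining_fraction = ca / peak
```

After one time constant of silence (2 s here), roughly exp(-1), about 37%, of the peak elevation remains, which is what gives the suppression its finite memory window.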

Astrocytic Modulation (Inhibitory)

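The astrocyte's inhibitory effect can be modeled as a shift of the neuron's effective firing threshold:

V_th,i^eff(t) = V_th + beta * ([Ca2+]_a(t) - [Ca2+]_baseline)

or, alternatively, as an inhibitory current added to the membrane equation:

I_astro,i(t) = -beta * ([Ca2+]_a(t) - [Ca2+]_baseline)

where beta = astrocytic modulation strength (its units differ between the two forms: voltage per unit calcium in the first, current per unit calcium in the second). Either form suppresses recently active states, by raising their firing threshold or by reducing their net input.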
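
The two forms of modulation can be compared with a short sketch. The function names and numeric values are assumptions for illustration. For a LIF neuron at steady state, an inhibitory current I shifts the drive needed to reach threshold by R_m * |I|, so the two forms match there when the threshold-form beta equals R_m times the current-form beta; away from steady state they are not identical.

```python
def effective_threshold(v_th, ca, ca_baseline=0.0, beta_v=0.002):
    """V_th^eff = V_th + beta * ([Ca2+] - baseline); beta_v in volts per unit calcium."""
    return v_th + beta_v * (ca - ca_baseline)


def astro_current(ca, ca_baseline=0.0, beta_i=2e-10):
    """I_astro = -beta * ([Ca2+] - baseline); beta_i in amperes per unit calcium."""
    return -beta_i * (ca - ca_baseline)


R_M = 1e7                      # membrane resistance in ohms (assumed value)
V_TH = -0.050                  # unmodulated threshold in volts (assumed value)

# One unit of calcium above baseline raises the threshold by 2 mV ...
threshold_shift = effective_threshold(V_TH, ca=1.0) - V_TH
# ... and the matching current form lowers the steady-state potential by 2 mV.
steady_state_shift = -R_M * astro_current(ca=1.0)
```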

Combined Decision Dynamics

At each navigation step, action selection follows:

a* = argmax_a [ sum_i w_{s->a} * r_i(t) - beta * [Ca2+]_s(t) ]

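where the sum runs over the neurons i of the current state s and r_i(t) presumably denotes their firing rate. Astrocytic suppression is meant to bias the agent toward less recently visited neighbors. As written, however, the calcium term depends only on the current state s and is the same for every action, so it cannot change the argmax by itself; for the bias to act, the suppression has to enter per action, for example through the calcium level of the state that each action leads to.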
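
A sketch of the selection rule under one possible reading: the suppression term is evaluated at the state each action would lead to, so that it can differ between actions. That reading, the function name `select_action`, and the dictionary-based inputs are assumptions for illustration, not the paper's formulation.

```python
def select_action(drive, successor_calcium, beta=1.0):
    """Pick the action with the highest suppressed score.

    drive: action -> summed weighted input to that action population.
    successor_calcium: action -> calcium (above baseline) of the state
        that the action leads to; missing entries count as unvisited (0.0).
    """
    scores = {action: drive[action] - beta * successor_calcium.get(action, 0.0)
              for action in drive}
    return max(scores, key=scores.get)


drive = {"up": 1.0, "right": 0.9}
successor_calcium = {"up": 0.5}          # the cell above was visited recently

with_astrocytes = select_action(drive, successor_calcium)
without_astrocytes = select_action(drive, successor_calcium, beta=0.0)
```

With suppression switched on, the slightly weaker but unvisited direction wins; with `beta=0.0` the choice falls back to the learned weights alone.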

Implementation Notes

Practical Implementation Steps

Define the environment: Grid world with states, actions (up/down/left/right), start and goal positions.
Initialize neural populations: One population per state and per action. Use LIF neurons with standard parameters.
Initialize astrocytes: One astrocyte per state, with calcium dynamics initialized at baseline.
Set hyperparameters:
- tau_m ~ 20 ms (membrane time constant)
- tau_Ca ~ 1-5 s (calcium decay — sets short-term memory window)
- STDP learning rate eta: small enough for stable learning, large enough for convergence within ~100 episodes
- Astrocytic modulation strength beta: tune to balance exploration vs exploitation
- A_+/A_-: STDP amplitude ratios (typically asymmetric, e.g., A_+ > |A_-|)
Simulation loop:
- For each episode: agent starts at initial state
- At each step: activate state neurons, simulate astrocyte calcium, compute action selection, move agent
- Upon goal: apply reward-modulated STDP to all synapses along the trajectory
- Track path length and success rate across episodes
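
The steps above can be wired together in a deliberately simplified, rate-level sketch. It does not simulate spikes: each state-action weight stands in for a synaptic pathway, the reward update is a stand-in for reward-modulated STDP, time is counted in navigation steps, and suppression is applied to the successor state of each action. All names, the 4x4 open grid, and the parameter values are assumptions for illustration.

```python
import math

# Hypothetical action set; (dx, dy) offsets on a square grid.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def run_episode(weights, size, start, goal, beta=1.0, gamma=1.0,
                tau_ca=20.0, max_steps=500):
    """Run one episode; returns (trajectory, reached_goal). tau_ca is in steps."""
    calcium = {}                      # state -> calcium above baseline
    decay = math.exp(-1.0 / tau_ca)   # per-step decay of the linear calcium ODE
    state, trajectory = start, []
    for _ in range(max_steps):
        if state == goal:
            return trajectory, True
        # Astrocyte update: decay everywhere, then a jump at the visited state.
        for s in calcium:
            calcium[s] *= decay
        calcium[state] = calcium.get(state, 0.0) + gamma
        # Score in-bounds actions: learned drive minus successor suppression.
        best_action, best_next, best_score = None, None, -math.inf
        for action, (dx, dy) in ACTIONS.items():
            nxt = (state[0] + dx, state[1] + dy)
            if not (0 <= nxt[0] < size and 0 <= nxt[1] < size):
                continue
            score = weights.get((state, action), 0.0) - beta * calcium.get(nxt, 0.0)
            if score > best_score:    # ties go to the earlier action
                best_action, best_next, best_score = action, nxt, score
        trajectory.append((state, best_action))
        state = best_next
    return trajectory, state == goal


def reinforce(weights, trajectory, reward, eta=0.5, w_max=1.0):
    """Rate-level stand-in for reward-modulated STDP along the trajectory."""
    for key in trajectory:
        weights[key] = min(w_max, weights.get(key, 0.0) + eta * reward)


weights = {}
path_lengths = []
for episode in range(3):
    trajectory, reached = run_episode(weights, size=4, start=(0, 0), goal=(3, 3))
    reinforce(weights, trajectory, reward=1.0 if reached else 0.0)
    path_lengths.append(len(trajectory))
```

In this deterministic toy setting the suppression steers the first episode through unvisited cells to the goal without revisits, and later episodes replay the reinforced route. It shows how the pieces connect and is not expected to reproduce the reported path-length gains.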

Key Design Considerations

Timescale separation is critical: The calcium decay time tau_Ca should be long enough to suppress recently visited states within an episode but short enough that the suppression has decayed before those states need to be revisited in subsequent episodes. Values of 1-5 seconds of simulated time are typical.
STDP requires episodic credit assignment: Since navigation involves multi-step decisions, the reward-modulated STDP must propagate credit across the entire successful trajectory.
Population coding improves robustness: Using populations of neurons per state/action rather than single neurons provides noise resistance and richer dynamics.
Astrocyte coupling strength matters: Too strong suppression prevents re-visiting necessary waypoints; too weak provides insufficient exploration bias.
Suited to neuromorphic hardware: STDP and astrocytic calcium dynamics need only locally available information, and the only non-local quantity is the scalar reward signal R delivered at goal completion.
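
To make the timescale-separation point concrete: with no further spikes, the calcium equation relaxes exponentially, so the fraction of suppression left after a time t is exp(-t / tau_Ca). The helper names below are hypothetical, and the 2 s value is just one point in the 1-5 s range quoted above.

```python
import math


def suppression_remaining(elapsed, tau_ca):
    """Fraction of the calcium elevation left after `elapsed` seconds of silence."""
    return math.exp(-elapsed / tau_ca)


def time_to_fraction(fraction, tau_ca):
    """Seconds of silence until the elevation has decayed to `fraction` of its peak."""
    return -tau_ca * math.log(fraction)


tau_ca = 2.0
half_life = time_to_fraction(0.5, tau_ca)          # tau_ca * ln(2)
left_after_10s = suppression_remaining(10.0, tau_ca)
```

For tau_Ca = 2 s the suppression halves in about 1.4 s and is below 1% after 10 s, so a state visited early in a long episode is effectively available again by the end of it.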

Simulation Frameworks

Brian2 spiking neural network simulator (Python) is well-suited for this model
NEST simulator for larger-scale implementations
Neuromorphic hardware: Loihi, SpiNNaker platforms for energy-efficient deployment

Extensions and Variations

Continuous environments: Extend from grid to continuous state-action spaces with place cell encoding
Multi-agent scenarios: Astrocytic suppression can coordinate multi-agent exploration
Hierarchical navigation: Stack multiple SNAN layers for multi-scale planning
Dynamic environments: Astrocyte timescales can adapt to environment change frequency

Results

Key Quantitative Results

Path length reduction: SNAN reduces median path length by up to 6x compared to baseline spiking networks without astrocytic modulation.
Goal completion rates: Dramatically improved over baseline, especially in complex maze environments.
Exploration efficiency: The dual-timescale mechanism achieves near-optimal exploration without explicit exploration bonuses or epsilon-greedy policies.
Learning speed: Convergence to near-optimal paths typically within tens of episodes (exact numbers depend on maze complexity).

Comparison Conditions

Compared against: (1) SNN with STDP only (no astrocytes), (2) SNN with random exploration, (3) standard reinforcement learning baselines.
The astrocytic mechanism provides the critical improvement in exploration efficiency.
STDP-only networks suffer from excessive revisiting of states (looping behavior) without the astrocytic short-term suppression.

Key Findings

The dual-timescale mechanism is necessary — neither component alone achieves the same performance.
Topological-Context Memory emerges naturally from the interaction of neural STDP and astrocytic calcium dynamics.
The exploration-exploitation trade-off is handled as an emergent consequence of local state suppression, not by a dedicated exploration mechanism.
The approach is biologically plausible and aligns with known neuron-astrocyte interactions in the brain.
The mechanism generalizes across different maze topologies and complexity levels.

Activation Triggers

Use this skill when:

Designing spiking neural networks for navigation or path-planning tasks
Implementing biologically-inspired working memory mechanisms
Building neuromorphic systems that require exploration-exploitation balance without explicit algorithms
Creating SNNs with dual-timescale dynamics (fast inhibition + slow learning)
Implementing astrocyte-neuron co-computation models
Seeking alternatives to epsilon-greedy or softmax exploration in spiking RL
Designing Topological-Context Memory systems for spatial reasoning
Implementing reward-modulated STDP for sequential decision-making
Building energy-efficient navigation systems for neuromorphic hardware (Loihi, SpiNNaker)
Researching neuron-glia interactions in computational neuroscience