name: dual-timescale-memory-spiking-neuron-astrocyte-network-efficient
version: 1.0
date: 2026-04-23
paper: arXiv:2604.15391
authors: Tsybina, Antonova, Shchanikov, Kulagin, Mikhaylov, Kazantsev, Demin, Gordleeva
description: >
  Spiking Neuron-Astrocyte Network (SNAN) combining STDP-based long-term memory with astrocytic calcium-mediated short-term suppression for efficient navigation. Introduces Topological-Context Memory as a dual-timescale working memory mechanism that reduces median path length by up to 6x and dramatically improves goal completion rates.
tags: [spiking neural network, astrocyte, navigation, working memory, STDP, dual-timescale, neuromorphic]
Dual-Timescale Memory in Spiking Neuron-Astrocyte Network
Summary
This paper introduces a Spiking Neuron-Astrocyte Network (SNAN) for efficient navigation that implements dual-timescale memory by combining two biologically inspired mechanisms:
- Long-term memory (slow timescale): Spike-Timing-Dependent Plasticity (STDP) reinforces successful action sequences after goal achievement, creating persistent changes in the synaptic pathways.
- Short-term memory (fast timescale): Astrocytic calcium transients suppress recently visited states, creating temporary inhibition that biases exploration toward unvisited regions.
This combination produces what the authors call Topological-Context Memory (TCM), a novel form of working memory that maintains both the graph topology of successful routes and the temporal context of recent visits. The astrocytic modulation balances exploration and exploitation as an emergent property of local state suppression, without explicit reward shaping or external exploration bonuses.
Key Methodology
Network Architecture:
- Each state in the navigation environment is encoded by a population of spiking neurons (state neurons).
- Each possible action (movement direction) is encoded by action neuron populations.
- State-to-action connectivity forms the decision policy.
- Astrocytes are coupled to state neurons and respond to neural activity via calcium signaling.
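The layout above can be sketched as plain index bookkeeping. This is a hypothetical illustration, not the authors' code: `build_network`, the population size `n_pop`, and the uniform initial weight `w_init` are assumptions, and weights are stored per state neuron and per action population for simplicity.

```python
# Hypothetical layout sketch: population size and initial weight are illustrative choices.
ACTIONS = ("up", "down", "left", "right")


def build_network(n_rows, n_cols, n_pop=10, w_init=0.5):
    """Lay out state/action populations and the state->action weight table."""
    n_states = n_rows * n_cols
    # State s owns neurons [s * n_pop, (s + 1) * n_pop); grid cell (r, c) -> s = r * n_cols + c.
    state_pops = [range(s * n_pop, (s + 1) * n_pop) for s in range(n_states)]
    # Action populations are numbered after all state neurons.
    offset = n_states * n_pop
    action_pops = {a: range(offset + k * n_pop, offset + (k + 1) * n_pop)
                   for k, a in enumerate(ACTIONS)}
    # w[i][a]: plastic weight from state neuron i to action population a (the policy).
    w = [{a: w_init for a in ACTIONS} for _ in range(offset)]
    # One astrocyte per state, coupled only to that state's population.
    astro_to_state = list(range(n_states))
    return state_pops, action_pops, w, astro_to_state
```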
Neuron Model:
- Spiking neurons use the Leaky Integrate-and-Fire (LIF) model for computational efficiency.
- Membrane potential dynamics follow standard LIF equations with synaptic inputs.
- Action potentials are generated when the membrane potential reaches threshold.
STDP Learning (Long-Timescale Memory):
- Reward-modulated STDP is applied upon successful goal completion.
- Synapses along the successful trajectory are strengthened proportionally to spike-timing correlations.
- Reinforces state-action pairs that led to the goal, building long-term navigational memory.
- Learning occurs only after reaching the goal (episodic reinforcement), not continuously.
Astrocytic Calcium Dynamics (Short-Timescale Memory):
- Astrocytes are activated when their associated state neurons fire.
- Calcium transients in astrocytes produce inhibitory modulation on the corresponding state neurons.
- The calcium dynamics operate on a much shorter timescale (seconds) than STDP, whose weight changes persist across episodes.
- This creates a temporary suppression of recently visited states.
Dual-Timescale Integration:
- STDP provides slow, cumulative learning of successful routes (long-term memory).
- Astrocytic calcium provides fast, decaying suppression of recently visited states (short-term memory).
- The interplay biases the agent toward unexplored regions while reinforcing known successful paths.
- The exploration-exploitation trade-off is handled as an emergent property rather than an explicit mechanism.
Navigation Task:
- Agent navigates grid-based environments from a start position to a goal.
- At each step, the current state activates its state neurons; the most active action neuron population determines the movement.
- Upon goal completion, the STDP learning rule updates synaptic weights along the trajectory.
- Astrocytic suppression operates continuously during each episode.
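A minimal grid-world sketch of the task, assuming a text maze where `#` marks walls and bumping into a wall or the border leaves the agent in place. The maze layout and the helpers `parse`, `step`, and `neighbors` are hypothetical; the paper's environments may differ.

```python
# Hypothetical maze: '#' = wall, 'S' = start, 'G' = goal.
MAZE = [
    "S..#",
    ".#..",
    "...G",
]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def parse(maze):
    """Return the set of open cells plus the start and goal coordinates."""
    open_cells, start, goal = set(), None, None
    for r, row in enumerate(maze):
        for c, ch in enumerate(row):
            if ch != "#":
                open_cells.add((r, c))
            if ch == "S":
                start = (r, c)
            elif ch == "G":
                goal = (r, c)
    return open_cells, start, goal


def step(open_cells, state, action):
    """Apply a move; hitting a wall or the border leaves the agent in place."""
    dr, dc = MOVES[action]
    nxt = (state[0] + dr, state[1] + dc)
    return nxt if nxt in open_cells else state


def neighbors(open_cells, state):
    """Map each action to the state it would lead to."""
    return {a: step(open_cells, state, a) for a in MOVES}
```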
Core Equations / Model
Leaky Integrate-and-Fire (LIF) Neuron Model
The membrane potential V_i(t) of neuron i evolves as:
tau_m * dV_i/dt = -(V_i - V_rest) + R_m * sum_j w_ij * sum_k alpha(t - t_j^k)
where:
- tau_m = membrane time constant
- V_rest = resting potential
- R_m = membrane resistance
- w_ij = synaptic weight from neuron j to neuron i
- alpha(.) = postsynaptic current kernel (e.g., exponential decay)
- t_j^k = k-th spike time of presynaptic neuron j
Spike emission: if V_i(t) >= V_th, the neuron emits a spike; V_i is then reset to V_reset and held there for a refractory period tau_ref.
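A forward-Euler sketch of the LIF equation and reset rule above, assuming an exponential kernel alpha(t) = exp(-t / tau_s) for the postsynaptic current. The function `simulate_lif` and all parameter values (ms, mV, MOhm, nA) are illustrative assumptions, not values from the paper.

```python
def simulate_lif(input_spikes, w, t_max=200.0, dt=0.1, tau_m=20.0, tau_s=5.0,
                 v_rest=-65.0, v_reset=-65.0, v_th=-50.0, r_m=10.0, t_ref=2.0):
    """Forward-Euler LIF neuron driven by exponentially decaying synaptic current.

    input_spikes: one list of spike times (ms) per presynaptic neuron j.
    w: matching list of weights w_ij (nA). Returns output spike times (ms).
    """
    # Bin each presynaptic spike to a time-step index, summing weights per bin.
    arrivals = {}
    for j, times in enumerate(input_spikes):
        for t in times:
            k = int(round(t / dt))
            arrivals[k] = arrivals.get(k, 0.0) + w[j]
    v, i_syn, ref_left, out = v_rest, 0.0, 0.0, []
    for k in range(int(t_max / dt)):
        # Exponential kernel: each spike adds its weight, then the current decays with tau_s.
        i_syn += arrivals.get(k, 0.0)
        i_syn -= dt * i_syn / tau_s
        if ref_left > 0:
            ref_left -= dt  # V stays clamped at v_reset during the refractory period
            continue
        # tau_m * dV/dt = -(V - V_rest) + R_m * I_syn
        v += dt * (-(v - v_rest) + r_m * i_syn) / tau_m
        if v >= v_th:
            out.append(k * dt)
            v = v_reset
            ref_left = t_ref
    return out
```

In a full network, Brian2 or NEST would integrate the same equations; the hand-rolled loop only makes the update order explicit.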
STDP Learning Rule (Reward-Modulated)
Weight update upon goal completion:
Delta_w_ij = eta * R * sum_{spike pairs} W(Delta_t)
where:
- eta = learning rate
- R = reward signal (1 upon goal, 0 otherwise)
- Delta_t = t_post - t_pre = spike timing difference
- W(Delta_t) = STDP window function
Typical STDP window (with A_+ > 0 and A_- < 0):
W(Delta_t) = A_+ * exp(-Delta_t / tau_+) if Delta_t > 0 (LTP)
W(Delta_t) = A_- * exp(Delta_t / tau_-) if Delta_t < 0 (LTD)
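The window and the reward gating can be written directly as functions. This sketch pairs every presynaptic spike with every postsynaptic spike (all-to-all pairing), which is one common convention rather than necessarily the paper's; the amplitudes, time constants, and `eta` are illustrative assumptions.

```python
import math


def stdp_window(dt, a_plus=0.01, a_minus=-0.008, tau_plus=20.0, tau_minus=20.0):
    """W(dt) with dt = t_post - t_pre (ms); a_minus < 0 makes the LTD branch depress."""
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)   # pre before post: LTP
    if dt < 0:
        return a_minus * math.exp(dt / tau_minus)  # post before pre: LTD
    return 0.0  # simultaneous spikes: no change (a convention)


def reward_modulated_dw(pre_spikes, post_spikes, reward, eta=0.1):
    """Delta_w = eta * R * sum over all spike pairs of W(t_post - t_pre)."""
    total = sum(stdp_window(t_post - t_pre)
                for t_pre in pre_spikes for t_post in post_spikes)
    return eta * reward * total
```

Because R is 0 on every step except goal completion, the same spike pairs produce no weight change until the episode succeeds.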
Astrocytic Calcium Dynamics
Calcium concentration in astrocyte a coupled to state neuron i:
d[Ca2+]_a/dt = -([Ca2+]_a - [Ca2+]_baseline) / tau_Ca + gamma * S_i(t)
where:
- tau_Ca = calcium decay time constant (short timescale, seconds)
- [Ca2+]_baseline = baseline calcium concentration
- gamma = coupling strength between neural activity and astrocyte calcium
- S_i(t) = spike train of the associated state neuron i (optionally low-pass filtered)
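A forward-Euler sketch of the calcium equation, assuming S_i(t) is an unfiltered train of delta pulses so that each spike adds gamma to [Ca2+]_a instantly. Time is in seconds; `simulate_calcium` and the values of `tau_ca`, `ca_base`, and `gamma` are hypothetical.

```python
def simulate_calcium(spike_times, t_max=10.0, dt=0.001, tau_ca=2.0,
                     ca_base=0.1, gamma=0.05):
    """Euler-integrate d[Ca]/dt = -([Ca] - base) / tau_Ca + gamma * S(t).

    spike_times: neuron spike times in seconds. With delta-pulse S(t), each
    spike increments [Ca] by gamma. Returns the [Ca] trace, one value per step.
    """
    kicks = {}
    for t in spike_times:
        k = int(round(t / dt))
        kicks[k] = kicks.get(k, 0) + 1
    ca, trace = ca_base, []
    for k in range(int(round(t_max / dt))):
        # Relaxation toward baseline plus activity-driven increments.
        ca += -dt * (ca - ca_base) / tau_ca + gamma * kicks.get(k, 0)
        trace.append(ca)
    return trace
```

After a single spike, the excess calcium falls to about 1/e of its peak after tau_Ca, which is what sets the length of the short-term memory window.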
Astrocytic Modulation (Inhibitory)
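The astrocyte modulates neuronal excitability (effective threshold or synaptic efficacy), for example by raising the effective firing threshold:
V_th,i^eff(t) = V_th + beta * ([Ca2+]_a(t) - [Ca2+]_baseline)
or, alternatively, through an inhibitory current that enters the LIF equation as R_m * I_astro,i:
I_astro,i(t) = -(beta / R_m) * ([Ca2+]_a(t) - [Ca2+]_baseline)
where beta = astrocytic modulation strength (the 1/R_m factor gives beta the same units in both forms). The two forms are not identical, but both suppress recently active states: the first raises their firing threshold, the second hyperpolarizes them.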
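To see how either form suppresses a state, the sketch below uses the standard steady-state firing rate of an LIF neuron under constant drive, assuming V_reset = V_rest. Drive and thresholds are in mV above rest; the helper names, `beta`, `theta0`, and the calcium values are illustrative assumptions.

```python
import math


def lif_rate(mu, theta, tau_m=0.020, t_ref=0.002):
    """Steady-state LIF rate (Hz) for constant drive mu = R_m * I and threshold
    theta, both in mV above V_rest, assuming V_reset = V_rest."""
    if mu <= theta:
        return 0.0  # the drive never lifts V to threshold
    # Time to climb from reset to threshold is tau_m * ln(mu / (mu - theta)).
    return 1.0 / (t_ref + tau_m * math.log(mu / (mu - theta)))


def threshold_form(mu, ca, theta0=15.0, beta=100.0, ca_base=0.1):
    """Astrocyte raises the threshold: theta_eff = theta0 + beta * ([Ca] - base)."""
    return lif_rate(mu, theta0 + beta * (ca - ca_base))


def current_form(mu, ca, theta0=15.0, beta=100.0, ca_base=0.1):
    """Astrocyte adds R_m * I_astro = -beta * ([Ca] - base) to the drive."""
    return lif_rate(mu - beta * (ca - ca_base), theta0)
```

At baseline calcium the two forms coincide; as calcium rises both lower the rate, but by different amounts, because the threshold form widens the reset-to-threshold gap while the current form lowers the voltage the neuron relaxes toward.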
Combined Decision Dynamics
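At each navigation step, action selection can be written schematically as:
a* = argmax_a [ sum_{i in P_s} w_{i,a} * r_i(t) - beta * ([Ca2+]_{s'(s,a)}(t) - [Ca2+]_baseline) ]
where P_s is the population encoding the current state s, r_i(t) is the firing rate of neuron i, w_{i,a} is the weight from neuron i to action a, and [Ca2+]_{s'(s,a)} is the calcium level of the astrocyte coupled to the neighboring state s'(s,a) reached by action a. Because the penalty is evaluated at the destination rather than at the current state (which would shift every action's score equally and leave the argmax unchanged), astrocytic suppression biases the agent toward less-recently-visited neighbors.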
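A sketch of the decision rule, assuming the astrocytic penalty is read from the state that each action leads to. `select_action` and its argument layout (`rates`, `w`, `neighbor_of`, `ca`) are hypothetical.

```python
def select_action(rates, w, neighbor_of, ca, beta=1.0, ca_base=0.0):
    """Schematic argmax over actions.

    rates: firing rates r_i of the current state's neurons.
    w[i][a]: weight from state neuron i to action a.
    neighbor_of[a]: state reached by action a; ca[s]: calcium of state s's astrocyte.
    """
    def score(a):
        drive = sum(r * w[i][a] for i, r in enumerate(rates))  # long-term memory
        return drive - beta * (ca[neighbor_of[a]] - ca_base)    # short-term suppression
    return max(neighbor_of, key=score)
```

With equal learned weights the less-recently-visited neighbor wins; a sufficiently strong learned weight overrides the suppression, which is how the two memory timescales interact in the score.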
Implementation Notes
Practical Implementation Steps
- Define the environment: Grid world with states, actions (up/down/left/right), start and goal positions.
- Initialize neural populations: One population per state and per action. Use LIF neurons with standard parameters.
- Initialize astrocytes: One astrocyte per state, with calcium dynamics initialized at baseline.
- Set hyperparameters:
  - tau_m ~ 20 ms (membrane time constant)
  - tau_Ca ~ 1-5 s (calcium decay time; sets the short-term memory window)
  - STDP learning rate eta: small enough for stable learning, large enough for convergence within ~100 episodes
  - Astrocytic modulation strength beta: tune to balance exploration vs exploitation
  - A_+ and A_-: STDP amplitudes (often asymmetric, e.g., A_+ > |A_-|)
- Simulation loop:
  - For each episode: the agent starts at the initial state
  - At each step: activate state neurons, update astrocyte calcium, select an action, move the agent
  - Upon goal: apply reward-modulated STDP to all synapses along the trajectory
  - Track path length and success rate across episodes
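The steps above can be sketched end to end at the level of firing rates rather than spikes. This is a deliberately simplified stand-in, not the authors' model: weights are kept per state rather than per neuron, the reward-modulated STDP update is replaced by adding `eta` to every state-action pair on a successful trajectory, and calcium is assumed to decay fully between episodes. The maze, `run`, `move`, and all parameter values are hypothetical.

```python
import math
import random

# Hypothetical maze: '#' wall, 'S' start, 'G' goal.
MAZE = ["S...#", ".##.#", "...#.", ".#...", "...#G"]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def move(cells, s, a):
    """Deterministic transition; walls and borders leave the agent in place."""
    n = (s[0] + MOVES[a][0], s[1] + MOVES[a][1])
    return n if n in cells else s


def run(maze=MAZE, episodes=30, max_steps=200, beta=2.0, gamma=1.0,
        tau_ca=2.0, dt_step=0.1, eta=0.2, seed=0):
    """Return the path length per episode (None if the goal was not reached)."""
    rng = random.Random(seed)  # only used to break exact ties
    cells, start, goal = set(), None, None
    for r, row in enumerate(maze):
        for c, ch in enumerate(row):
            if ch != "#":
                cells.add((r, c))
            if ch == "S":
                start = (r, c)
            elif ch == "G":
                goal = (r, c)
    w = {s: {a: 0.0 for a in MOVES} for s in cells}  # slow memory: learned policy
    decay = math.exp(-dt_step / tau_ca)  # per-step calcium decay factor
    lengths = []
    for _ in range(episodes):
        ca = dict.fromkeys(cells, 0.0)  # fast memory, assumed fully decayed between episodes
        s, traj = start, []
        while s != goal and len(traj) < max_steps:
            for key in ca:
                ca[key] *= decay  # excess calcium relaxes toward baseline (0 here)
            ca[s] += gamma  # visiting a state drives its astrocyte
            # Learned drive minus astrocytic suppression of the destination state.
            a = max(MOVES, key=lambda act: w[s][act]
                    - beta * ca[move(cells, s, act)] + 1e-9 * rng.random())
            traj.append((s, a))
            s = move(cells, s, a)
        if s == goal:
            for st, act in set(traj):  # stand-in for reward-modulated STDP with R = 1
                w[st][act] += eta
            lengths.append(len(traj))
        else:
            lengths.append(None)
    return lengths
```

On a trivial corridor, `run(maze=["S.G"], episodes=3)` returns `[2, 2, 2]`: suppression of the just-visited start cell makes stepping forward score higher than bumping into the border.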
Key Design Considerations
- Timescale separation is critical: The calcium decay time tau_Ca should be long enough to suppress recently visited states within an episode but short enough to allow revisiting in subsequent episodes. Typically 1-5 seconds of simulated time.
- STDP requires episodic credit assignment: Since navigation involves multi-step decisions, the reward-modulated STDP must propagate credit across the entire successful trajectory.
- Population coding improves robustness: Using populations of neurons per state/action rather than single neurons provides noise resistance and richer dynamics.
- Astrocyte coupling strength matters: suppression that is too strong prevents revisiting necessary waypoints (e.g., the cell leading out of a dead end); suppression that is too weak gives insufficient exploration bias.
- The network is a good fit for neuromorphic hardware because its learning rules are local: STDP and astrocytic calcium dynamics need only information available at each synapse or cell, plus the globally broadcast scalar reward R for the STDP update.
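The timescale-separation constraint can be turned into explicit bounds on tau_Ca. This sketch assumes a fixed duration per navigation step and a fixed pause between episodes (both hypothetical quantities), and asks that at least `keep` of a calcium transient survive for a given number of steps while at most `forget` remains after the pause. The helper names are illustrative.

```python
import math


def residual(t, tau_ca):
    """Fraction of a calcium transient left after t seconds: exp(-t / tau_Ca)."""
    return math.exp(-t / tau_ca)


def tau_ca_bounds(steps_to_remember, step_dt, inter_episode_gap,
                  keep=0.5, forget=0.01):
    """(lower, upper) bounds on tau_Ca in seconds.

    lower: at least `keep` of the transient survives steps_to_remember steps.
    upper: at most `forget` of it remains after the inter-episode gap.
    """
    lower = steps_to_remember * step_dt / math.log(1.0 / keep)
    upper = inter_episode_gap / math.log(1.0 / forget)
    return lower, upper
```

With 10 remembered steps at 0.1 s per step and a 30 s pause, the bounds come out at roughly 1.4 s and 6.5 s, which brackets the 1-5 s range suggested above.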
Simulation Frameworks
- Brian2 spiking neural network simulator (Python) is well-suited for this model
- NEST simulator for larger-scale implementations
- Neuromorphic hardware: Loihi, SpiNNaker platforms for energy-efficient deployment
Extensions and Variations
- Continuous environments: Extend from grid to continuous state-action spaces with place cell encoding
- Multi-agent scenarios: Astrocytic suppression could be used to coordinate multi-agent exploration
- Hierarchical navigation: Stack multiple SNAN layers for multi-scale planning
- Dynamic environments: Astrocyte timescales could be adapted to the frequency of environment changes
Results
Key Quantitative Results
- Path length reduction: SNAN reduces median path length by up to 6x compared to baseline spiking networks without astrocytic modulation.
- Goal completion rates: Dramatically improved over baseline, especially in complex maze environments.
- Exploration efficiency: The dual-timescale mechanism achieves near-optimal exploration without explicit exploration bonuses or epsilon-greedy policies.
- Learning speed: Convergence to near-optimal paths typically within tens of episodes (exact numbers depend on maze complexity).
Comparison Conditions
- Compared against: (1) SNN with STDP only (no astrocytes), (2) SNN with random exploration, (3) standard reinforcement learning baselines.
- The astrocytic mechanism provides the critical improvement in exploration efficiency.
- STDP-only networks suffer from excessive revisiting of states (looping behavior) without the astrocytic short-term suppression.
Key Findings
- The dual-timescale mechanism is necessary: neither component alone achieves the same performance.
- Topological-Context Memory emerges naturally from the interaction of neural STDP and astrocytic calcium dynamics.
- The exploration-exploitation trade-off is balanced as an emergent consequence, not a designed feature.
- The approach is biologically plausible and aligns with known neuron-astrocyte interactions in the brain.
- The mechanism generalizes across different maze topologies and complexity levels.
Activation Triggers
Use this skill when:
- Designing spiking neural networks for navigation or path-planning tasks
- Implementing biologically-inspired working memory mechanisms
- Building neuromorphic systems that require exploration-exploitation balance without explicit algorithms
- Creating SNNs with dual-timescale dynamics (fast inhibition + slow learning)
- Implementing astrocyte-neuron co-computation models
- Seeking alternatives to epsilon-greedy or softmax exploration in spiking RL
- Designing Topological-Context Memory systems for spatial reasoning
- Implementing reward-modulated STDP for sequential decision-making
- Building energy-efficient navigation systems for neuromorphic hardware (Loihi, SpiNNaker)
- Researching neuron-glia interactions in computational neuroscience