name: lstm-forecaster
description: Deep learning time series forecasting using LSTM and GRU networks. Captures long-term dependencies and nonlinear patterns in sequential data. Essential for complex temporal prediction in MCM/ICM.
LSTM/GRU Forecaster
Recurrent Neural Networks (RNN) specialized for time series prediction with memory mechanisms.
When to Use
- Long Sequences: Data with 100+ time steps (e.g., daily data over months/years).
- Nonlinear Dynamics: Complex patterns that ARIMA cannot capture.
- Multivariate Time Series: Multiple related variables evolving together (e.g., weather + energy consumption).
- Long-Term Dependencies: Current value depends on events far in the past.
- Sufficient Data: At least 500 samples for reliable training (preferably 1000+).
When NOT to Use
- Small Samples: < 100 points → Use arima-forecaster or grey-forecaster.
- Simple Trends: Linear or exponential growth → Classical methods are faster and more interpretable.
- Need Interpretability: LSTM is a black box. Use ml-regressor with feature importance if explanation is critical.
- Short Sequences: < 20 time steps → Use simpler models.
Model Comparison
| Model | Strengths | Weaknesses | Best For |
|---|---|---|---|
| LSTM | Captures long-term memory, mitigates vanishing gradients | Slower training, more parameters | Complex sequences, long dependencies |
| GRU | Faster than LSTM, fewer parameters | Slightly less powerful | When speed matters, shorter sequences |
| Simple RNN | Fast, simple | Vanishing gradient problem | Baseline comparison only |
Architecture Overview
LSTM (Long Short-Term Memory)
- Cell State: Long-term memory highway.
- Gates: Forget gate (discard info), Input gate (add info), Output gate (read info).
- Use Case: Default choice for most time series tasks.
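To make the gate roles above concrete, here is a minimal sketch of one LSTM time step for a scalar cell (hidden size 1). It is illustrative only: the function name `lstm_step`, the `params` layout, and the hand-picked weights are assumptions for this example, not part of the forecaster code, and real layers use weight matrices rather than scalars.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell (hidden size 1, illustrative only).

    params maps each of 'f', 'i', 'o', 'g' to a tuple (w_x, w_h, b).
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre('f'))    # forget gate: how much old cell state to keep
    i = sigmoid(pre('i'))    # input gate: how much new information to add
    o = sigmoid(pre('o'))    # output gate: how much cell state to expose
    g = math.tanh(pre('g'))  # candidate values to write into the cell

    c = f * c_prev + i * g   # updated cell state (the "memory highway")
    h = o * math.tanh(c)     # new hidden state read out through the output gate
    return h, c

# Hypothetical hand-picked weights: forget gate near 1, input gate near 0,
# so the cell should hold on to its initial memory and ignore new inputs.
params = {
    'f': (0.0, 0.0, 10.0),
    'i': (0.0, 0.0, -10.0),
    'o': (0.0, 0.0, 10.0),
    'g': (1.0, 0.0, 0.0),
}
h, c = 0.0, 0.8
for x in [5.0, -3.0, 2.0]:
    h, c = lstm_step(x, h, c, params)
print(f"cell state after 3 steps: {c:.4f}")
```

With the forget gate saturated open and the input gate saturated shut, the cell state stays close to its starting value of 0.8, which is the mechanism behind the long-term memory claim.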
GRU (Gated Recurrent Unit)
- Simplified LSTM with 2 gates (reset, update) instead of 3.
- Roughly 25% fewer parameters than an LSTM of the same size → Faster training.
- Use Case: When computational budget is tight or data is limited.
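The 25% figure follows from the gate count: an LSTM layer learns four sets of weights (three gates plus the candidate), a GRU learns three. The sketch below counts parameters using the textbook formulation; the helper names are made up for this example. Keras's default GRU variant keeps a second bias vector, so `model.summary()` may report a slightly larger GRU count than this.

```python
def lstm_param_count(input_dim, units):
    # 4 weight sets: forget, input, output gates + candidate cell values
    return 4 * (units * (input_dim + units) + units)

def gru_param_count(input_dim, units):
    # 3 weight sets: reset, update gates + candidate hidden state
    return 3 * (units * (input_dim + units) + units)

lstm_n = lstm_param_count(input_dim=1, units=64)
gru_n = gru_param_count(input_dim=1, units=64)
ratio = gru_n / lstm_n
print(f"LSTM: {lstm_n}, GRU: {gru_n}, GRU/LSTM ratio: {ratio:.2f}")
```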
Implementation Template
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, GRU, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
import warnings
warnings.filterwarnings('ignore')

class LSTMForecaster:
    """
    LSTM/GRU time series forecaster with automatic preprocessing
    """

    def __init__(self, model_type='lstm', lookback=20, units=50,
                 layers=2, dropout=0.2, random_state=42):
        """
        Args:
            model_type (str): 'lstm' or 'gru'
            lookback (int): Number of past time steps to use as input
            units (int): Number of units per RNN layer
            layers (int): Number of stacked RNN layers
            dropout (float): Dropout rate for regularization
            random_state (int): Random seed
        """
        self.model_type = model_type
        self.lookback = lookback
        self.units = units
        self.layers = layers
        self.dropout = dropout
        self.random_state = random_state
        self.model = None
        self.scaler = MinMaxScaler()
        self.history = None
        # Set random seeds
        np.random.seed(random_state)
        tf.random.set_seed(random_state)

    def create_sequences(self, data, lookback):
        """
        Transform time series into supervised learning format
        Args:
            data (np.array): Time series data (n_samples, n_features)
            lookback (int): Number of past steps to use
        Returns:
            X (np.array): Input sequences (n_samples, lookback, n_features)
            y (np.array): Target values (n_samples,)
        """
        X, y = [], []
        for i in range(len(data) - lookback):
            X.append(data[i:i+lookback])
            y.append(data[i+lookback, 0])  # Predict first feature
        return np.array(X), np.array(y)

    def build_model(self, input_shape):
        """
        Build LSTM or GRU model
        Args:
            input_shape (tuple): (lookback, n_features)
        """
        model = Sequential()
        # Declare the input shape once up front. This avoids passing
        # input_shape=None to later layers, which fails on some Keras versions.
        model.add(keras.Input(shape=input_shape))
        # Choose RNN type
        RNN_Layer = LSTM if self.model_type == 'lstm' else GRU
        # Stacked RNN layers
        for i in range(self.layers):
            # All but the last layer return the full sequence
            return_sequences = (i < self.layers - 1)
            model.add(RNN_Layer(
                units=self.units,
                return_sequences=return_sequences
            ))
            model.add(Dropout(self.dropout))
        # Output layer
        model.add(Dense(1))
        # Compile
        model.compile(
            optimizer=keras.optimizers.Adam(learning_rate=0.001),
            loss='mse',
            metrics=['mae']
        )
        self.model = model
        return model

    def fit(self, data, validation_split=0.2, epochs=100, batch_size=32,
            verbose=1, early_stop_patience=10):
        """
        Train the model
        Args:
            data (pd.Series or np.array): Time series data
            validation_split (float): Fraction for validation
            epochs (int): Maximum training epochs
            batch_size (int): Batch size
            verbose (int): Verbosity level
            early_stop_patience (int): Epochs to wait before early stopping
        """
        # Convert to numpy
        if isinstance(data, pd.Series):
            data = data.values
        # Reshape if 1D
        if len(data.shape) == 1:
            data = data.reshape(-1, 1)
        # Normalize
        data_scaled = self.scaler.fit_transform(data)
        # Create sequences
        X, y = self.create_sequences(data_scaled, self.lookback)
        print(f"Training data shape: X={X.shape}, y={y.shape}")
        # Build model
        self.build_model(input_shape=(self.lookback, data.shape[1]))
        # Callbacks
        callbacks = [
            EarlyStopping(
                monitor='val_loss',
                patience=early_stop_patience,
                restore_best_weights=True,
                verbose=1
            ),
            ReduceLROnPlateau(
                monitor='val_loss',
                factor=0.5,
                patience=5,
                min_lr=1e-6,
                verbose=1
            )
        ]
        # Train
        self.history = self.model.fit(
            X, y,
            validation_split=validation_split,
            epochs=epochs,
            batch_size=batch_size,
            callbacks=callbacks,
            verbose=verbose
        )
        return self

    def predict(self, data, steps=1):
        """
        Multi-step ahead prediction
        Args:
            data (np.array): Recent data (at least lookback points)
            steps (int): Number of future steps to predict
        Returns:
            np.array: Predictions (original scale)
        """
        # Prepare input
        if isinstance(data, pd.Series):
            data = data.values
        if len(data.shape) == 1:
            data = data.reshape(-1, 1)
        # Normalize
        data_scaled = self.scaler.transform(data)
        # Use last lookback points as initial input
        current_sequence = data_scaled[-self.lookback:].reshape(1, self.lookback, -1)
        predictions = []
        # Recursive prediction
        for _ in range(steps):
            # Predict next step
            next_pred = self.model.predict(current_sequence, verbose=0)
            predictions.append(next_pred[0, 0])
            # Update sequence (shift left, append prediction)
            next_pred_full = np.zeros((1, 1, data_scaled.shape[1]))
            next_pred_full[0, 0, 0] = next_pred[0, 0]
            current_sequence = np.concatenate([
                current_sequence[:, 1:, :],
                next_pred_full
            ], axis=1)
        # Inverse transform
        predictions = np.array(predictions).reshape(-1, 1)
        predictions_original = self.scaler.inverse_transform(
            np.concatenate([predictions,
                            np.zeros((len(predictions), data_scaled.shape[1]-1))],
                           axis=1)
        )[:, 0]
        return predictions_original

    def evaluate(self, data):
        """
        Evaluate on test data
        Returns:
            dict: RMSE and MAE
        """
        if isinstance(data, pd.Series):
            data = data.values
        if len(data.shape) == 1:
            data = data.reshape(-1, 1)
        data_scaled = self.scaler.transform(data)
        X, y = self.create_sequences(data_scaled, self.lookback)
        # Predict
        y_pred_scaled = self.model.predict(X, verbose=0)
        # Inverse transform
        y_pred = self.scaler.inverse_transform(
            np.concatenate([y_pred_scaled,
                            np.zeros((len(y_pred_scaled), data.shape[1]-1))],
                           axis=1)
        )[:, 0]
        y_true = self.scaler.inverse_transform(
            np.concatenate([y.reshape(-1, 1),
                            np.zeros((len(y), data.shape[1]-1))],
                           axis=1)
        )[:, 0]
        rmse = np.sqrt(np.mean((y_true - y_pred)**2))
        mae = np.mean(np.abs(y_true - y_pred))
        return {
            'rmse': rmse,
            'mae': mae,
            'predictions': y_pred,
            'actual': y_true
        }

def plot_training_history(history, title='Training History'):
    """Plot loss curves"""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
    # Loss
    ax1.plot(history.history['loss'], label='Training Loss')
    ax1.plot(history.history['val_loss'], label='Validation Loss')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss (MSE)')
    ax1.set_title('Loss Curve')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    # MAE
    ax2.plot(history.history['mae'], label='Training MAE')
    ax2.plot(history.history['val_mae'], label='Validation MAE')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('MAE')
    ax2.set_title('MAE Curve')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.savefig('training_history.png', dpi=300)
    plt.show()

def plot_forecast(historical_data, forecast_values, title='LSTM Forecast'):
    """Visualize forecast"""
    n_hist = len(historical_data)
    n_fore = len(forecast_values)
    plt.figure(figsize=(12, 6))
    # Historical
    plt.plot(range(n_hist), historical_data, 'o-',
             label='Historical', color='blue', linewidth=2)
    # Forecast
    plt.plot(range(n_hist, n_hist + n_fore), forecast_values, '^-',
             label='Forecast', color='red', linewidth=2)
    # Connection line
    plt.plot([n_hist-1, n_hist],
             [historical_data[-1], forecast_values[0]],
             'k--', alpha=0.3)
    plt.title(title, fontsize=14, fontweight='bold')
    plt.xlabel('Time Step')
    plt.ylabel('Value')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.savefig('lstm_forecast.png', dpi=300)
    plt.show()

# --- Usage Example ---
if __name__ == "__main__":
    # Mock Data: Sinusoidal with trend and noise
    np.random.seed(42)
    t = np.linspace(0, 100, 500)
    trend = 0.05 * t
    seasonal = 10 * np.sin(0.5 * t)
    noise = np.random.normal(0, 1, len(t))
    data = trend + seasonal + noise
    # Split data
    train_size = int(0.8 * len(data))
    train_data = data[:train_size]
    test_data = data[train_size:]
    print("=" * 60)
    print("LSTM FORECASTING")
    print("=" * 60)
    # Initialize model
    model = LSTMForecaster(
        model_type='lstm',
        lookback=30,
        units=64,
        layers=2,
        dropout=0.2
    )
    # Train
    model.fit(train_data, epochs=50, batch_size=16, verbose=1)
    # Plot training history
    plot_training_history(model.history)
    # Evaluate on test set
    results = model.evaluate(test_data)
    print(f"\nTest Set Performance:")
    print(f" RMSE: {results['rmse']:.4f}")
    print(f" MAE: {results['mae']:.4f}")
    # Forecast future
    forecast = model.predict(data, steps=50)
    print(f"\nForecast next 50 steps:")
    print(forecast[:10])  # Show first 10
    # Visualize
    plot_forecast(data, forecast, title='LSTM Time Series Forecast')
    # Compare LSTM vs GRU
    print("\n" + "=" * 60)
    print("GRU FORECASTING (for comparison)")
    print("=" * 60)
    gru_model = LSTMForecaster(model_type='gru', lookback=30, units=64)
    gru_model.fit(train_data, epochs=50, batch_size=16, verbose=0)
    gru_results = gru_model.evaluate(test_data)
    print(f"\nGRU Test Performance:")
    print(f" RMSE: {gru_results['rmse']:.4f}")
    print(f" MAE: {gru_results['mae']:.4f}")
Hyperparameter Tuning Guide
Architecture Parameters
- lookback: Input sequence length (10-60). Longer = more context but slower.
- units: Hidden units per layer (32-128). More = more capacity but risk overfitting.
- layers: Number of stacked RNN layers (1-3). More = deeper but harder to train.
- dropout: Regularization (0.1-0.3). Higher = less overfitting but may underfit.
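One cost of a longer lookback that is easy to overlook: every extra step of context removes one training sample, because a series of n points yields only n - lookback windows. The pure-Python sketch below mirrors the windowing idea of `create_sequences` for a univariate series; `make_windows` is a name made up for this illustration.

```python
def make_windows(series, lookback):
    """Slide a window of length `lookback` over the series.

    Each window of past values becomes one input; the value right
    after the window becomes its target.
    """
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    return X, y

series = list(range(10))  # toy series: 0, 1, ..., 9
X, y = make_windows(series, lookback=3)
print(f"{len(series)} points, lookback=3 -> {len(X)} training samples")
print("first window:", X[0], "-> target:", y[0])
```

On a 500-point series, a lookback of 60 leaves 440 windows before the validation split is taken, which matters when data is already scarce.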
Training Parameters
- batch_size: 16-64 for small datasets, 128-256 for large.
- learning_rate: 0.001 is a good default. Use ReduceLROnPlateau to adapt.
- epochs: 50-200. Use EarlyStopping to prevent overfitting.
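The patience rule behind early stopping can be sketched in a few lines. This is a simplified illustration of the idea, not Keras's actual `EarlyStopping` implementation (it ignores options such as `min_delta`); the function name and the loss values are invented for the example.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float('inf')
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs without triggering the rule
    return len(val_losses) - 1, best_epoch

# Made-up validation losses, one per epoch
val_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.69]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"stopped at epoch {stop_epoch}, best weights from epoch {best_epoch}")
```

Note that with patience=3 the run stops before reaching the final, lower loss of 0.69. A patience that is too small can cut training short on a noisy validation curve, which is one reason the template defaults to 10.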
Common Pitfalls
- Not Scaling Data: Neural networks train poorly on unscaled inputs. Normalize to 0-1 or standardize, and fit the scaler on the training data only.
- Too Few Samples: LSTM needs 500+ samples. Use classical methods for small data.
- Overfitting: Always use dropout and early stopping. Monitor val_loss.
- Wrong Lookback: Too short = misses patterns. Too long = overfits noise.
- Ignoring Validation Loss: If val_loss >> train_loss, you're overfitting.
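A related pitfall is leaking test information through the scaler or through a shuffled split. The sketch below shows a chronological split with min-max statistics taken from the training portion only. It uses plain Python instead of `MinMaxScaler` so that it stands alone; the helper names and the toy series are assumptions for illustration.

```python
def fit_minmax(values):
    """Learn scaling statistics (min and max) from the given values."""
    return min(values), max(values)

def apply_minmax(values, lo, hi):
    """Scale values with previously learned statistics."""
    span = (hi - lo) or 1.0  # guard against a constant series
    return [(v - lo) / span for v in values]

series = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 17.0, 21.0, 20.0, 24.0]

# Chronological split: the test set is strictly later in time, no shuffling
split = int(0.8 * len(series))
train, test = series[:split], series[split:]

lo, hi = fit_minmax(train)                 # statistics from training data only
train_scaled = apply_minmax(train, lo, hi)
test_scaled = apply_minmax(test, lo, hi)   # reuse the training statistics

print("train range:", min(train_scaled), "to", max(train_scaled))
print("test scaled:", [round(v, 3) for v in test_scaled])
```

Scaled test values can fall outside 0-1 when the series keeps trending upward. That is expected and is the honest picture of what the model will face on future data.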
Integration Workflow
- Input: Use data-cleaner to handle missing values and outliers.
- Feature Engineering: For multivariate, use pca-analyzer to reduce feature dimensions.
- Comparison: Compare with arima-forecaster to test whether the LSTM actually improves accuracy on nonlinear data.
- Uncertainty: Use Monte Carlo Dropout (predict multiple times with dropout enabled) for confidence intervals.
- Visualization: Use visual-engineer for publication-quality plots.
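The Monte Carlo Dropout step can be sketched on a toy model: keep dropout active at prediction time, repeat the forward pass many times, and summarize the spread. The linear model, its weights, and the function names below are stand-ins invented for this example. With a trained Keras model, the stochastic pass would typically be `model(x, training=True)`, which keeps dropout layers active.

```python
import random
import statistics

def dropout_predict(x, weights, bias, rate, rng):
    """One stochastic forward pass of a toy linear model with input dropout."""
    total = bias
    for xi, wi in zip(x, weights):
        if rng.random() >= rate:             # keep this input with prob 1 - rate
            total += wi * xi / (1.0 - rate)  # inverted-dropout rescaling
    return total

def mc_dropout(x, weights, bias, rate=0.2, n_samples=200, seed=0):
    """Return (mean, lower, upper) over repeated stochastic passes.

    lower/upper are empirical ~2.5% and ~97.5% quantiles of the samples.
    """
    rng = random.Random(seed)
    samples = sorted(dropout_predict(x, weights, bias, rate, rng)
                     for _ in range(n_samples))
    mean = statistics.mean(samples)
    lower = samples[int(0.025 * n_samples)]
    upper = samples[int(0.975 * n_samples) - 1]
    return mean, lower, upper

# Hypothetical inputs and weights
x = [1.0, 2.0, 3.0]
weights = [0.5, -0.2, 0.1]
bias = 0.1
mean, lower, upper = mc_dropout(x, weights, bias)
print(f"prediction: {mean:.3f}, interval: [{lower:.3f}, {upper:.3f}]")
```

The resulting band reflects model uncertainty under dropout and is an approximation, so it is safer to describe it in a paper as an approximate uncertainty interval than as a calibrated confidence interval.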
Output Requirements for Paper
- Model Architecture: "We used a 2-layer LSTM with 64 units per layer and 0.2 dropout."
- Training Details: "Trained for 87 epochs with early stopping (patience=10) on 80/20 train/val split."
- Performance Metrics: "Test RMSE=2.34, MAE=1.87, outperforming ARIMA (RMSE=3.45)."
- Learning Curves: Show training and validation loss over epochs.
- Forecast Plot: Historical data + multi-step forecast.
- Comparison Table: LSTM vs GRU vs ARIMA performance side-by-side.
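A small helper can produce the comparison table directly from test-set predictions. This is a sketch under stated assumptions: the function names are made up, and the numbers in `forecasts` are placeholders, not real results. In practice the lists would come from each model's predictions on the same test set.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def comparison_table(actual, forecasts):
    """Build a markdown table; forecasts maps model name -> predictions."""
    lines = ["| Model | RMSE | MAE |", "|---|---|---|"]
    for name, predicted in forecasts.items():
        lines.append(
            f"| {name} | {rmse(actual, predicted):.2f} | {mae(actual, predicted):.2f} |"
        )
    return "\n".join(lines)

actual = [10.0, 12.0, 14.0, 16.0]
forecasts = {  # placeholder predictions for illustration only
    "LSTM": [10.5, 11.5, 14.5, 15.5],
    "ARIMA": [9.0, 13.0, 13.0, 17.0],
}
table = comparison_table(actual, forecasts)
print(table)
```

All models must be scored on the same held-out points and in the original (unscaled) units, otherwise the rows are not comparable.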
Advanced: Attention Mechanism (Optional)
For O-Prize level analysis, add attention to visualize which past time steps the model focuses on:
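from tensorflow.keras.layers import Input, LSTM, Attention
# lookback, n_features, and units are placeholders for your own settings
input_layer = Input(shape=(lookback, n_features))
# After LSTM layer (return_sequences=True keeps one output per time step)
lstm_output = LSTM(units, return_sequences=True)(input_layer)
# Self-attention: the LSTM outputs serve as both query and value
attention_output, attention_scores = Attention()(
    [lstm_output, lstm_output], return_attention_scores=True)
# Continue with Dense layers...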
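Inspecting the attention weights can suggest which past time steps the model relies on (e.g., "the model pays most attention to events 20 days ago"). Treat this as a heuristic aid to interpretation, not a full explanation of the model.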
Decision Guide: LSTM vs Other Methods
| Scenario | Recommended Method |
|---|---|
| < 100 samples | grey-forecaster or arima-forecaster |
| Linear trend, stationary | arima-forecaster |
| Nonlinear, 500+ samples | lstm-forecaster (this skill) |
| Multivariate time series | lstm-forecaster with multiple input features |
| Need interpretability | arima-forecaster or ml-regressor |
| Highest accuracy at any cost | lstm-forecaster with hyperparameter tuning |
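The table can also be encoded as a small lookup for use in a pipeline. This is only one possible reading of it: the function name, the order in which the rules are checked, and the fallback for cases the table does not cover (for example, nonlinear data with 100-499 samples) are assumptions made for this sketch.

```python
def recommend_method(n_samples, nonlinear=False, need_interpretability=False):
    """Map a scenario to a skill name, following the decision guide.

    Rule order and the final fallback are assumptions, not part of the table.
    """
    if need_interpretability:
        return "arima-forecaster or ml-regressor"
    if n_samples < 100:
        return "grey-forecaster or arima-forecaster"
    if nonlinear and n_samples >= 500:
        return "lstm-forecaster"
    # Not covered explicitly by the table: default to the classical method
    return "arima-forecaster"

print(recommend_method(60))
print(recommend_method(800, nonlinear=True))
print(recommend_method(800, nonlinear=True, need_interpretability=True))
```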