ds-time-series-forecasting - SKILL.md Agent Skill

name: ds-time-series-forecasting
compatibility: opencode
completeness: 95
content-types:
- code
- guidance
- do-dont
- examples
description: '"Implements ARIMA, exponential smoothing, state-space models, LSTM networks and deep learning methods for temporal prediction"'
license: MIT
maturity: stable
metadata:
  domain: coding
  output-format: code
  related-skills: ds-feature-engineering, ds-neural-networks, ds-regression-evaluation
  role: implementation
  scope: implementation
  triggers: time series forecasting, ARIMA, exponential smoothing, LSTM, forecasting time series prediction
archetypes:
- tactical
- generation
anti_triggers:
- brainstorming
- vague ideation
- code golf
- over-engineering
response_profile:
  verbosity: low
  directive_strength: high
  abstraction_level: operational
version: "1.0.0"

Time Series Forecasting

Comprehensive guide to time series forecasting in machine learning and data science workflows.

When to Use This Skill

Forecasting future values of a time-indexed series from its own history
Building machine learning pipelines that include a time series forecasting step
Implementing ARIMA, exponential smoothing, state-space, or LSTM forecasting models
Improving forecast accuracy through model selection and tuning
Learning standard approaches to fitting and evaluating forecasting models

When NOT to Use This Skill

When using pre-built libraries without understanding underlying concepts
For toy problems that don't require time series forecasting rigor
When domain expertise in a specific problem calls for a different approach
If your problem doesn't require the complexity this skill provides

Purpose and Key Concepts

Time series forecasting is a core part of many machine learning workflows. This skill covers:

Theoretical foundations — Mathematical principles and statistical concepts
Practical implementation — Working code examples and patterns
Common pitfalls — Mistakes to avoid and how to recover from them
Best practices — Industry-standard approaches and optimization techniques

Core Workflow

Understand the problem — Clearly define what you're solving for
Select approach — Choose the right technique for your data and constraints
Implement solution — Write clean, tested code following best practices
Validate results — Verify your implementation with tests and validation
Optimize performance — Improve efficiency and accuracy incrementally
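The workflow above can be sketched end to end on a toy series. This is a minimal illustration under stated assumptions, not a recommended model: the helper names (`chronological_split`, `naive_forecast`, `mae`) and the sample values are hypothetical, and the "model" is a last-value baseline that any real forecaster should be compared against.

```python
def chronological_split(values, train_fraction=0.8):
    """Split a sequence in time order (no shuffling)."""
    split_idx = int(len(values) * train_fraction)
    return values[:split_idx], values[split_idx:]


def naive_forecast(train, horizon):
    """Baseline: repeat the last observed training value for every future step."""
    return [train[-1]] * horizon


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy series, used only to exercise the steps.
series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0, 14.0, 16.0]

train, test = chronological_split(series)      # select approach: hold out the most recent 20%
baseline = naive_forecast(train, len(test))    # implement solution: simplest possible forecaster
baseline_mae = mae(test, baseline)             # validate results: score on held-out data only
```

A candidate model is worth optimizing further only if it beats a baseline like this one on the same held-out window.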

Implementation Patterns

Pattern 1: Basic Time Series Forecasting

import pandas as pd
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from sklearn.metrics import mean_absolute_error, mean_squared_error

def basic_forecasting_pipeline(data: pd.Series, forecast_horizon: int = 10) -> dict:
    """
    Demonstrates a basic time series forecasting pipeline using Exponential Smoothing.

    Fits on the first 80% of the series, scores on the remaining 20%, and
    returns forecasts for the `forecast_horizon` steps that follow the training window.
    """
    if data is None or data.empty:
        raise ValueError("Input data cannot be None or empty")

    # Work on a copy so the caller's index is never modified.
    data = data.copy()
    if not isinstance(data.index, pd.DatetimeIndex):
        # Fallback only: assumes daily observations from an arbitrary start date.
        data.index = pd.date_range(start="2020-01-01", periods=len(data), freq="D")

    # Chronological split: time series data must not be shuffled.
    split_idx = int(len(data) * 0.8)
    train_data = data.iloc[:split_idx]
    test_data = data.iloc[split_idx:]

    # Additive trend plus additive weekly seasonality (Holt-Winters).
    model = ExponentialSmoothing(
        train_data,
        trend='add',
        seasonal='add',
        seasonal_periods=7
    )
    fitted_model = model.fit(optimized=True)

    # Forecasts start at the first step after the training window.
    forecasts = fitted_model.forecast(forecast_horizon)
    # Date-based prediction needs an index with a regular (set or inferable) frequency.
    test_forecasts = fitted_model.predict(start=test_data.index[0], end=test_data.index[-1])
    mae = mean_absolute_error(test_data, test_forecasts)
    rmse = np.sqrt(mean_squared_error(test_data, test_forecasts))

    return {
        'forecasts': forecasts,
        'metrics': {'mae': mae, 'rmse': rmse},
        'fitted_model': fitted_model
    }
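To make the recursion behind exponential smoothing concrete, here is a minimal pure-Python sketch of simple exponential smoothing. It is deliberately simpler than Pattern 1: it has no trend or seasonal component, it takes `alpha` as a fixed input instead of optimizing it, and it initializes the level with the first observation, which is an assumption of this sketch and not necessarily how statsmodels initializes its models. The function names are hypothetical.

```python
def smoothed_level(values, alpha):
    """Apply level_t = alpha * y_t + (1 - alpha) * level_{t-1} over the series.

    The level starts at the first observation (an assumption of this sketch).
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]
    for y in values[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def ses_forecast(values, alpha, horizon):
    """Simple exponential smoothing gives a flat forecast at the final level."""
    return [smoothed_level(values, alpha)] * horizon


# alpha close to 1 tracks recent values; alpha close to 0 averages over a long history.
flat_forecast = ses_forecast([10.0, 20.0, 30.0], alpha=0.5, horizon=2)
```

With these inputs the level moves from 10.0 to 15.0 to 22.5, so both forecast steps equal 22.5. The flat forecast is why a trending or seasonal series needs the additional components used in Pattern 1.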

Pattern 2: Production-Ready Time Series Forecasting

import logging
import pandas as pd
import numpy as np
from typing import Dict, Optional
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_absolute_error, mean_squared_error

logger = logging.getLogger(__name__)

class TimeSeriesForecasting:
    """Production-grade time series forecasting class with ARIMA."""

    def __init__(self, order: tuple = (1, 1, 1), forecast_steps: int = 10):
        self.order = order  # ARIMA (p, d, q)
        self.forecast_steps = forecast_steps
        self.model = None
        self.fitted_model = None  # set by fit(); checked before predicting
        self.history = None

    def _validate_data(self, data: pd.DataFrame) -> pd.Series:
        if data is None or data.empty:
            raise ValueError("Input data cannot be None or empty")
        if not isinstance(data.index, pd.DatetimeIndex):
            raise ValueError("Data index must be a DatetimeIndex")
        # Accept a Series or a single-column DataFrame; reject anything wider.
        if isinstance(data, pd.DataFrame):
            if data.shape[1] != 1:
                raise ValueError("Data must contain exactly one column")
            return data.iloc[:, 0]
        return data

    def fit(self, data: pd.DataFrame) -> 'TimeSeriesForecasting':
        """Fit the ARIMA model on historical data."""
        self.history = self._validate_data(data)
        try:
            self.model = ARIMA(self.history, order=self.order)
            self.fitted_model = self.model.fit()
            logger.info(f"Model fitted successfully with order {self.order}")
        except Exception as e:
            logger.error(f"Failed to fit model: {e}")
            raise RuntimeError("Model fitting failed") from e
        return self

    def predict(self, steps: Optional[int] = None) -> pd.Series:
        """Generate future forecasts."""
        if self.fitted_model is None:
            raise RuntimeError("Model must be fitted before prediction")
        horizon = steps if steps is not None else self.forecast_steps
        return self.fitted_model.forecast(steps=horizon)

    def evaluate(self, test_data: pd.DataFrame) -> Dict[str, float]:
        """Evaluate model performance on held-out test data."""
        if self.fitted_model is None:
            raise RuntimeError("Model must be fitted before evaluation")
        test_series = self._validate_data(test_data)
        # Date-based prediction needs an index with a regular (set or inferable) frequency.
        test_forecasts = self.fitted_model.predict(
            start=test_series.index[0],
            end=test_series.index[-1]
        )
        mae = mean_absolute_error(test_series, test_forecasts)
        rmse = np.sqrt(mean_squared_error(test_series, test_forecasts))
        return {'mae': mae, 'rmse': rmse}
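The middle value of the ARIMA `order` tuple `(p, d, q)` is the number of times the series is differenced before the autoregressive and moving-average terms are fitted, so the default `(1, 1, 1)` models first differences. The sketch below illustrates only that differencing and re-integration step. As a loudly stated simplification, it replaces the AR and MA terms with the mean of the differences (a random walk with drift), which is not what ARIMA estimates; the function names are hypothetical.

```python
def difference(values):
    """First differences: y_t - y_{t-1}."""
    return [b - a for a, b in zip(values, values[1:])]


def drift_forecast(values, horizon):
    """Forecast the differences by their mean, then integrate back to the original scale."""
    if len(values) < 2:
        raise ValueError("need at least two observations to difference")
    diffs = difference(values)
    mean_diff = sum(diffs) / len(diffs)
    forecasts = []
    last = values[-1]
    for _ in range(horizon):
        last = last + mean_diff  # undo the differencing one step at a time
        forecasts.append(last)
    return forecasts


# A perfectly linear series: every difference is 2.0, so the forecast continues the line.
future = drift_forecast([1.0, 3.0, 5.0, 7.0], horizon=3)
```

Differencing removes the linear trend, and summing the predicted differences back onto the last observation restores it in the forecast.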

Best Practices

✅ Always validate your implementation on test data
✅ Document your assumptions and methodology
✅ Use version control for reproducibility
✅ Monitor performance metrics in production
✅ Periodically review and update your approach
✅ Test with edge cases and outliers
✅ Log all significant operations for debugging

Common Pitfalls

Constraints

MUST DO

Verify that all data preprocessing steps are fit only on training data, never on validation or test sets
Implement reproducible pipelines with fixed random seeds and deterministic operations where possible
Report model performance with confidence intervals via bootstrapping or cross-validation across multiple runs
Log all experiments with parameters, metrics, and artifacts using MLflow or equivalent tracking system
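One way to satisfy the first and third requirements together is expanding-window cross-validation with scaling statistics taken from each training window only. The sketch below is a minimal pure-Python illustration under stated assumptions: the function names, fold layout, and sample values are hypothetical, and the standardization stands in for whatever preprocessing a real pipeline uses.

```python
from statistics import mean, pstdev


def expanding_window_splits(n_samples, n_folds, test_size):
    """Return (train_indices, test_indices) pairs; each test block follows its training block."""
    first_test_start = n_samples - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("Not enough samples for the requested folds")
    splits = []
    for fold in range(n_folds):
        test_start = first_test_start + fold * test_size
        splits.append((range(0, test_start), range(test_start, test_start + test_size)))
    return splits


def standardize_with_train_stats(train, test):
    """Scale both windows using the mean and standard deviation of the training window only."""
    mu = mean(train)
    sigma = pstdev(train) or 1.0  # guard against a constant training window

    def scale(xs):
        return [(x - mu) / sigma for x in xs]

    return scale(train), scale(test)


# Hypothetical toy series to show the fold loop.
series = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
for train_idx, test_idx in expanding_window_splits(len(series), n_folds=2, test_size=1):
    train = [series[i] for i in train_idx]
    test = [series[i] for i in test_idx]
    scaled_train, scaled_test = standardize_with_train_stats(train, test)
```

Collecting one metric per fold yields the spread of scores across folds, and the test windows never influence the scaling parameters.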

MUST NOT DO

Do not evaluate a model on the same data used for training — always hold out a proper test set
Avoid overfitting to the validation set by limiting hyperparameter search iterations
Never use features that would not be available at prediction time, such as values computed from future observations (look-ahead bias)
Do not report single-run accuracy without statistical significance testing or error bars
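As an illustration of reporting error bars in place of a single number, the sketch below computes a percentile bootstrap interval for the mean absolute error, using a fixed seed for reproducibility. It is a minimal sketch under stated assumptions: the function name and the sample errors are hypothetical, and resampling errors independently ignores any autocorrelation in forecast errors, so a block bootstrap may be more appropriate for real forecasts.

```python
import random


def bootstrap_mae_interval(abs_errors, n_resamples=1000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for the mean of absolute forecast errors."""
    if not abs_errors:
        raise ValueError("abs_errors must not be empty")
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    n = len(abs_errors)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(abs_errors, k=n)) / n for _ in range(n_resamples))
    lower_idx = max(int((1 - confidence) / 2 * n_resamples), 0)
    upper_idx = min(int((1 + confidence) / 2 * n_resamples) - 1, n_resamples - 1)
    return means[lower_idx], means[upper_idx]


# Hypothetical absolute errors from one held-out window.
errors = [1.0, 2.0, 3.0, 4.0]
lower, upper = bootstrap_mae_interval(errors)
```

Reporting the MAE together with `lower` and `upper` shows how much the score could vary from the evaluation sample alone.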

Live References

Authoritative documentation links for this skill's domain. The model follows markdown links at load time to resolve external references and inline content.