joint-multi-tf-v560

name: joint-multi-tf-v560
description: "v5.6.0 joint multi-TF model: single model per symbol with broadcast 1Hour context replaces dual 15Min/1Hour models. Trigger: (1) replacing weighted-voting model aggregation, (2) adding broadcast features to vectorized env, (3) limited training data + worried about overfitting from doubling obs_dim, (4) backtest builder mismatch with newer feature counts."
author: Claude Code
date: 2026-04-10

Joint Multi-TF Architecture v5.6.0

Experiment Overview

Item	Details
Date	2026-04-10
Goal	Replace separate 15Min/1Hour models + weighted voting with single joint model that uses 15Min steps and 1Hour context broadcast features
Environment	GPUVectorizedTradingEnv, NativePPOTrainer, InferenceObservationBuilder, BacktestObservationBuilder
Status	Code complete, awaiting Colab training + walk-forward validation

Context

Two prior architectures had problems:

v5.2.0 dual-model: Train 15Min and 1Hour models independently, combine via aggregate_timeframe_predictions() weighted voting (15Min w=0.15, 1Hour w=0.25). Models never learn to complement each other.
Mar-25 evaluation: Multi-TF was NOT universally better — only CLSK improved (PF 1.29→3.98), CNM degraded (PF 1.84→1.25), MOH degraded, PSTG overtraded.

The user's plan was a "hierarchical" architecture where 15Min refines confidence in the 1Hour prediction. Three options were considered:

Option	Description	obs_dim	Verdict
A	Full dual 51-feature 1H window (separate window)	10,200	Rejected — doubles first-layer params, ~140K crypto bars too few
B	Broadcast 1H summary features	5,800	Chosen
C	Compressed 1H window (different shape)	~7,000	Rejected — needs MLP arch changes or padding hacks
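The obs_dim column follows from flattening a window of bars by the per-bar feature count. A minimal sketch of that arithmetic, assuming the 100-bar window used elsewhere in this document. The helper names and the hidden width of 1024 are illustrative assumptions: 1024 is picked only because it reproduces the roughly 10.4M first-layer figure quoted under Failed Attempts, and it is not read from the trainer config.

```python
WINDOW = 100  # bars per observation window (window=100 elsewhere in this document)
HIDDEN = 1024  # assumption, for illustration only


def obs_dim(features_per_bar: int, window: int = WINDOW) -> int:
    """Flattened observation size: one feature vector per bar in the window."""
    return window * features_per_bar


def first_layer_weights(obs_dim_value: int, hidden: int = HIDDEN) -> int:
    """Weight count of a dense first layer (biases ignored)."""
    return obs_dim_value * hidden


option_a = obs_dim(51 + 51)  # Option A: a second full 51-feature 1H window
option_b = obs_dim(51 + 7)   # Option B: 7 broadcast 1H summary features

for name, dim in (("A", option_a), ("B", option_b)):
    print(f"Option {name}: obs_dim={dim:,}, first-layer weights={first_layer_weights(dim):,}")
```

Under these assumptions Option B adds only 7 × window inputs to the first layer, while Option A adds 51 × window.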

Verified Workflow

1. Add Config Fields (GPUEnvConfig)

# alpaca_trading/gpu/vectorized_env.py
@dataclass(slots=True)
class GPUEnvConfig:
    # ... existing fields ...
    use_hourly_context: bool = False          # v5.6.0: 1Hour broadcast features
    hourly_context_bars_per_hour: int = 4     # 15Min bars per hour

2. Vectorized Precomputation at Init

def _precompute_hourly_context(self):
    """Pre-compute (T, 7) hourly context tensor — fully vectorized, no Python loops."""
    bph = self.config.hourly_context_bars_per_hour
    # NOTE: `prices` is the env's price tensor; how it is obtained is elided in this excerpt.
    p = prices[0] if prices.dim() == 2 else prices  # (T,)
    T = p.shape[0]
    hc = torch.zeros(T, 7, dtype=self.dtype, device=self.device)

    # Returns
    returns = torch.zeros(T, dtype=self.dtype, device=self.device)
    returns[1:] = (p[1:] - p[:-1]) / (p[:-1] + 1e-8)

    # Hour boundary indexing (vectorized)
    bar_indices = torch.arange(T, device=self.device)
    h_end = (bar_indices // bph) * bph     # (T,)
    valid_hour = h_end >= bph
    h_start = (h_end - bph).clamp(min=0)

    # Gather hour-level features
    hour_open = p[h_start]
    hour_close = p[(h_end - 1).clamp(min=0)]
    hc[valid_hour, 0] = (hour_close[valid_hour] - hour_open[valid_hour]) / (hour_open[valid_hour] + 1e-8)

    # Reshape into (n_complete_hours, bph) blocks for per-hour std/range
    n_complete_hours = T // bph
    if n_complete_hours > 0:
        hourly_returns = returns[:n_complete_hours * bph].reshape(n_complete_hours, bph)
        hourly_vol = hourly_returns.std(dim=1)
        hour_idx = ((bar_indices // bph) - 1).clamp(min=0)
        hc[valid_hour, 1] = hourly_vol[hour_idx[valid_hour].clamp(max=n_complete_hours - 1)]

    # Lookback features (16-bar trend, 4/8-bar momentum acceleration)
    p_16_ago = p[(bar_indices - 16).clamp(min=0)]
    hc[:, 2] = (p - p_16_ago) / (p_16_ago + 1e-8)
    hc[:16, 2] = 0.0

    # Cumulative sum trick for rolling RSI proxy
    positive_returns = (returns > 0).float()
    if T >= 16:
        cumsum = positive_returns.cumsum(dim=0)
        rsi_sum = cumsum.clone()
        rsi_sum[16:] = cumsum[16:] - cumsum[:-16]
        hc[16:, 4] = rsi_sum[16:] / 16.0

    # Columns 3 (h1_momentum) and 5 (h1_range) are not shown in this excerpt.

    # Position in hour (always valid)
    hc[:, 6] = ((bar_indices % bph) + 1).float() / bph

    self._obs_hourly_context = hc
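
A loop-based reference can be handy for checking the vectorized version on tiny inputs. The sketch below mirrors only the index-based h1_return and position_in_hour formulas shown above, in plain Python. `hourly_context_reference` is a hypothetical name, not a function in the codebase, and index-based grouping is only valid for gap-free 24/7 data (see the equity-session entry under Failed Attempts).

```python
def hourly_context_reference(prices, bph=4):
    """Loop-based h1_return and position_in_hour, mirroring the index-based formulas."""
    eps = 1e-8
    h1_return, position_in_hour = [], []
    for t in range(len(prices)):
        h_end = (t // bph) * bph  # index of the first bar of the current hour
        if h_end >= bph:  # a complete previous hour exists
            hour_open = prices[h_end - bph]
            hour_close = prices[h_end - 1]
            h1_return.append((hour_close - hour_open) / (hour_open + eps))
        else:
            h1_return.append(0.0)  # same as the zero-initialised tensor
        position_in_hour.append(((t % bph) + 1) / bph)
    return h1_return, position_in_hour


prices = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0]
h1_ret, pos = hourly_context_reference(prices)
print(h1_ret[4], pos[:5])
```

The first four bars have no completed hour and stay at zero. Bars 4 to 7 all see the return of the first hour (100 → 103), and bar 8 sees the return of the second hour (104 → 107).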

3. O(1) Step-Time Gather + Broadcast

def _get_observations(self):
    # ... existing feature gathers ...

    # 12b. Hourly context features (7) — single gather + broadcast
    if self.config.use_hourly_context:
        if hasattr(self, '_obs_hourly_context'):
            hc_feats = self._obs_hourly_context[step_vals]  # (n_envs, 7)
            obs[:, :, feat_idx:feat_idx+7] = hc_feats.unsqueeze(1).expand(-1, W, -1)
            feat_idx += 7

4. Update Feature Count Detection

# alpaca_trading/gpu/inference_obs_builder.py
def get_target_features_from_obs_dim(obs_dim, window=100):
    features = obs_dim // window
    if features == 58:
        return 58  # v5.6.0: 51 base + 7 hourly context
    elif features == 51:
        return 51  # v5.5.0
    # ... etc
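
Because detection keys on obs_dim // window alone, two versions with the same per-bar feature count cannot be told apart. A minimal sketch of a pre-flight collision check follows. `KNOWN_FEATURE_COUNTS` is a hypothetical registry holding only the counts named in this document (the real lookup table has more entries), and both helpers are illustrative, not part of the codebase.

```python
# Hypothetical registry: only the feature counts mentioned in this document.
KNOWN_FEATURE_COUNTS = {
    59: "v2.7.0",
    65: "v5.3.0",
    51: "v5.5.0",
}


def check_new_feature_total(new_total, version, known=KNOWN_FEATURE_COUNTS):
    """Raise if a proposed per-bar feature count already identifies another version."""
    owner = known.get(new_total)
    if owner is not None and owner != version:
        raise ValueError(
            f"{new_total} features already identifies {owner}; "
            f"pick another total for {version}"
        )


def features_from_obs_dim(obs_dim, window=100):
    """Exact-match detection: reject obs_dims that are not a whole number of bars."""
    features, remainder = divmod(obs_dim, window)
    if remainder:
        raise ValueError(f"obs_dim={obs_dim} is not a multiple of window={window}")
    return features


check_new_feature_total(58, "v5.6.0")  # 51 base + 7 hourly context: no collision

try:
    check_new_feature_total(59, "v5.6.0")  # 51 + 8 would collide with v2.7.0
    collided = False
except ValueError:
    collided = True
print("59 collides:", collided)
```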

5. Update Both Inference and Backtest Builders

InferenceObservationBuilder delegates to build_inference_observation() — add new branch:

elif target_features == 58:
    n_base = 6; n_hourly_context = 7  # ... etc

BacktestObservationBuilder needs its own _precompute_hourly_context() and conditional reads in get_obs_at_bar(). Use exactly the same formulas (in numpy) as the GPU vectorized version so that training and backtest observations stay in parity.

6. Notebook Changes

Cell 14 (TRAINING_TIMEFRAMES): ['15Min'] (single TF)
Cell 24 (env_config): n_features=58, use_hourly_context=True, hourly_context_bars_per_hour=4
Cell 32 (loop): No change. It already iterates over (symbol, training_tf) pairs; there is just one TF now.

7. Live Trader / Backtest

No code changes needed. Both auto-detect target_features from model.obs_dim via get_target_features_from_obs_dim(). v5.6.0 models (obs_dim=5800) automatically get hourly context computation.

Failed Attempts (Critical)

Attempt	Why it Failed	Lesson Learned
Python for-loop precomputation	T=140K bars × 7 features = ~1-2s init overhead	Vectorize with `torch.arange + clamp + gather`, even for init code
8 hourly features → 59 total	Collides with v2.7.0 feature count in `get_target_features_from_obs_dim()`	Use 7 features → 58 total to avoid collision
Doubling obs_dim with full dual window	First layer would balloon to 10.4M params (12.6M total) on ~140K crypto bars — overfitting risk	Add features only as broadcast scalars, keep first layer ≤ ~6M params
Trusting `target_features >= 56` checks for backward compat	v5.5.0 (51) was NEVER backtested, so the always-write-7-base-features bug went undetected	Use exact `target_features in (51, 58)` checks rather than `>= N` thresholds
Forgetting reversal probs in BacktestObservationBuilder	v5.5.0/v5.6.0 expect 3 reversal prob features, but builder skipped straight to extended indicators	Match training env feature layout exactly — add reversal probs for `target_features in (51, 58, 65)`
Initial v5.6.0: index-based hourly grouping (`h_end = (t // bph) * bph`)	Correct for crypto 24/7, broken for equity sessions. ~14% of equity h1_* features were cross-session contaminated; position_in_hour drifted from clock-minute alignment after day 1.	Always use `pd.DatetimeIndex.floor('h')` clock-hour IDs + `searchsorted` for time-aligned features. NEVER use bar-index modulo for any feature that has a calendar interpretation.
Recommending BTC-only as Phase 1 "PoC"	N=1 cannot validate an architecture. User correctly pushed back: "whatever you are thinking about dropping stock trading is hugely wrong." Single-asset-class testing cannot distinguish architecture from regime.	Phase 1 = smoke test (single symbol, code-runs check). Phase 2 = multi-asset PoC (4 crypto + 4-5 equity, 50-100M each). Always include WST and CNM as deliberate re-tests of the prior "dead" symbols to isolate the architecture variable.
Reading `self.timestamps` instead of `self._timestamps` in `_precompute_hourly_context`	AttributeError on env init. The training env stores timestamps with leading underscore. Test caught it locally before Colab.	After adding new methods that read instance attributes, `grep -n "self\._\?timestamps" file.py` to verify attribute name.
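
The clock-hour lesson in the table can be illustrated without pandas. The sketch below swaps in stdlib `datetime` and `bisect` as stand-ins for `pd.DatetimeIndex.floor('h')` and `searchsorted`. All function names are hypothetical and this is not the project's actual helper. The synthetic sessions assume 26 bars of 15 minutes starting at 09:30.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def floor_hour(ts):
    """Clock-hour ID for a timestamp (stdlib analogue of DatetimeIndex.floor('h'))."""
    return ts.replace(minute=0, second=0, microsecond=0)


def position_in_hour_clock(ts, bar_minutes=15):
    """Fraction of the clock hour covered once the bar starting at ts has closed."""
    bars_per_hour = 60 // bar_minutes
    return (ts.minute // bar_minutes + 1) / bars_per_hour


def previous_hour_window(timestamps, i):
    """Half-open [lo, hi) index range of bars inside the clock hour before bar i.

    timestamps must be sorted. An empty range (lo == hi) means no bar exists in
    that hour, e.g. at a session open, so the h1 features should be masked.
    """
    hour_start = floor_hour(timestamps[i])
    lo = bisect_left(timestamps, hour_start - timedelta(hours=1))
    hi = bisect_left(timestamps, hour_start)
    return lo, hi


# Two synthetic equity sessions with an overnight gap between them.
timestamps = [
    datetime(2026, 4, day, 9, 30) + timedelta(minutes=15 * k)
    for day in (6, 7)
    for k in range(26)
]

index_pos = [((i % 4) + 1) / 4 for i in range(len(timestamps))]
clock_pos = [position_in_hour_clock(ts) for ts in timestamps]
mismatches = sum(a != b for a, b in zip(index_pos, clock_pos))
print("bars where index-based position disagrees with the clock:", mismatches)
print("window before first bar of day 2:", previous_hour_window(timestamps, 26))
```

In this toy setup the first bar of a session opens at minute 30, so the clock-based position is 0.75 while the bar-index formula reports 0.25. At the first bar of day 2 the clock-based window is empty, whereas index arithmetic would have reached back into bars from the previous session.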

Pre-existing Bugs Discovered During v5.6.0 Work

Bug A: BacktestObservationBuilder hardcoded 7 base + 4 intraday features

BacktestObservationBuilder.get_obs_at_bar() always wrote 7 base + 4 intraday features (53 total) regardless of target_features. For v5.5.0 (51 features), this caused IndexError: index 51 out of bounds. The bug was latent because v5.5.0 models were never backtested before the v5.6.0 work: the system jumped from v5.3.0 (65 features, where the builder works) straight to v5.6.0, which forced the bug to surface.

Fix: Conditional skip of volume_proxy and intraday for target_features in (51, 58). Also added reversal probs (3 features) for target_features in (51, 58, 65).

Bug B: Trainer `checkpoint_interval=0` ZeroDivisionError (v5.6.0 Phase 1, commit 9ffa9e7)

NativePPOTrainer.train() did n_updates % self.config.checkpoint_interval without guarding against checkpoint_interval=0. The quick_test mode in get_auto_config() (line 1957) explicitly sets it to 0 to mean "disabled," but the trainer never honored that intent.

Latent because: production/standard/thorough/extended modes all set checkpoint_interval to 50/75/100. Quick_test was rarely run until v5.6.0 Phase 1.

Fix: Prefixed the modulo check with `self.config.checkpoint_interval > 0 and`, so an interval of 0 short-circuits before the division.
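
A minimal sketch of that guard, written as a standalone function. `should_checkpoint` is a hypothetical name, and comparing the modulo result against 0 is an assumption about the surrounding trainer code, which this document does not show.

```python
def should_checkpoint(n_updates: int, checkpoint_interval: int) -> bool:
    """True when a checkpoint is due; an interval of 0 (or less) means disabled."""
    # The `> 0` test short-circuits, so the modulo never sees a zero divisor.
    return checkpoint_interval > 0 and n_updates % checkpoint_interval == 0


print(should_checkpoint(100, 0), should_checkpoint(100, 50))
```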

Bug C: Trainer buffer floor-vs-iterator-ceil off-by-one (v5.6.0 Phase 1, commit a495b2a)

Buffer pre-allocation used _batches_per_epoch = batch_size // minibatch_size (floor), but ReplayBuffer.get_batches() iterates with range(0, batch_size, minibatch_size) which yields ceil(batch_size / minibatch_size) batches (last batch is partial when sizes don't divide evenly).

Concrete trip: n_envs=1024 × n_steps=256 gives batch_size=262144, and 262144 / 12288 = 21.33. The buffer was sized for 21*10=210 slots, but the iterator performed 22*10=220 writes → IndexError: index 210 out of bounds for dimension 0 with size 210.

Latent because: production/standard/thorough configs use (n_envs, n_steps, minibatch_size) combinations that happen to divide evenly. Quick_test landed on 21.33.

Fix: Use ceil division (_batch_size + minibatch_size - 1) // minibatch_size for buffer allocation. The _active = _gpu_policy_losses[:_batch_idx] slice at line ~1027 ensures unused tail slots don't pollute aggregates.
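
The mismatch can be reproduced with plain integer arithmetic. This sketch uses the quick_test numbers quoted above; the variable names are illustrative and do not come from the trainer.

```python
import math

n_envs, n_steps, minibatch_size = 1024, 256, 12288
batch_size = n_envs * n_steps

floor_batches = batch_size // minibatch_size                        # what the buffer assumed
ceil_batches = (batch_size + minibatch_size - 1) // minibatch_size  # the fix
iterated = len(range(0, batch_size, minibatch_size))                # what the iterator yields

# Size of the final, partial minibatch that the floor count ignores.
last_batch = batch_size - (iterated - 1) * minibatch_size

print(floor_batches, ceil_batches, iterated, last_batch)
```

The integer form `(a + b - 1) // b` matches `math.ceil(a / b)` for positive integers and avoids floating point.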

Lesson

Every feature-count, network-shape, or schedule-knob change should be smoke-tested with quick_test mode before launching production. Quick_test exercises corners of the configuration space (small n_updates, edge-case batch divisibility) that production runs never reach. The cost of running a 5-minute smoke test is negligible compared to the cost of finding these bugs after a 4-hour production run crashes.

Final Parameters

# v5.6.0 GPUEnvConfig
n_features = 58
use_hourly_context = True
hourly_context_bars_per_hour = 4  # 15Min bars per hour
timeframe = '15Min'  # Stepping timeframe (1Hour context is internal)

# Feature breakdown (58 total)
base = 6                  # norm_price, returns, log_returns, vol, momentum, rsi
temporal = 7              # calendar features
markov = 12               # vol/trend/momentum/macro probs
reversal_probs = 3        # always with markov
extended = 14             # MACD, ADX, Stoch, ATR, BB, CCI, Williams, ROC, RSI
multi_window = 9          # 3 windows × 3 features
hourly_context = 7        # NEW: h1_return, h1_vol, h1_trend, h1_momentum, h1_rsi, h1_range, position_in_hour

Key Insights

First layer width dominates network params (86-90%) — adding broadcast features is much cheaper than adding windowed features (no × window multiplier on params, but still gives the model the info every step).
Broadcast pattern is the right level of abstraction for time-series context features — same-value-across-window matches how the model can already use volatility, momentum, multi-window features.
Crypto overtrains at 200M steps (peak at 50-100M per memory) → modest network growth is preferred over full dual obs space.
Feature count avoidance: Always check get_target_features_from_obs_dim() lookup table before choosing a new total. 58 was deliberately chosen to avoid collision with v2.7.0's 59.
Position-in-hour is the cheapest, most useful feature — single broadcast scalar tells the model "you have 25%/50%/75%/100% of the hour's data" so it knows when to commit vs when to wait.

Testing Strategy

Unit tests for _precompute_hourly_context(): Verify h1_return/h1_volatility/position_in_hour on known prices
Output shape tests: (n_envs, 100, 58) for env, (100, 58) for builders
Train/inference parity: Same prices → same hourly context values across env / inference builder / backtest builder
Backward compat: v5.5.0 models (obs_dim=5100) still load and infer correctly
Equity calendar tests (added after Phase 1): synthetic 26-bar/day equity timestamps with overnight gaps. Verify position_in_hour follows clock minute, h1 features masked at session boundaries.
Quick training smoke test on real BTC/USD data (Phase 1, 10M steps on H100)

25 tests in tests/test_hourly_context.py (18 baseline + 7 equity calendar). All pass.

Phase 1 Smoke Test Results (2026-04-11)

What worked end-to-end: precompute, obs assembly, the new mm(1024×5800, 5800×512) first-layer shape live in the policy network, torch.compile + CUDA graphs + FP8 + AMP, deploy zip, manifest, disconnect verification.

What needed fixing (3 bugs surfaced):

Trainer Bug B above (checkpoint_interval=0) — 1-line fix
Trainer Bug C above (buffer off-by-one) — 1-line fix
Equity calendar bug in the new hourly context implementation itself: fixed by clock-aligning hours via the compute_hourly_window_indices() helper, which relies on pd.DatetimeIndex.floor('h') and searchsorted

Validation trajectory: Model learned then unlearned. PF climbed 0.12 → 0.94 over 6 validations, crashed to 0.50 at validation 7. HOLD% peaked at 61% then reverted to 26%. Direction reward briefly went positive (+0.07 at validation 4). Architecture works; 10M steps is far below convergence threshold for crypto.

Verdict: DROP (correct for a smoke test). Goal was "did the code run?" not "is there edge?"

Phase 2 step count revision based on Phase 1 trajectory:

Crypto: thorough mode (100M each) instead of standard (50M). Memory says crypto peaks at 50-100M for v5.3.0; the Phase 1 trajectory shows BTC was still climbing at validation 6 of the 10M-step run, so 50M is borderline.
Equity: standard mode (50M each). Equity has fewer bars per year, so 50M provides more passes over the data.
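
The "more passes" argument is simple division. A rough sketch under loudly stated assumptions: a 4-year history for both asset classes (chosen because it yields about 140K crypto bars, matching the figure quoted earlier), 26 equity bars per session, about 252 trading days per year, and one environment step per bar. The equity bar count is not stated in this document and is purely illustrative.

```python
def passes_over_data(total_steps: int, n_bars: int) -> float:
    """How many times a step budget covers a dataset of n_bars (1 step = 1 bar)."""
    return total_steps / n_bars


YEARS = 4  # assumption: same history length for both asset classes

crypto_bars = 96 * 365 * YEARS  # 24/7 market: 96 fifteen-minute bars per day
equity_bars = 26 * 252 * YEARS  # 26 bars per session, ~252 sessions per year (assumption)

crypto_passes = passes_over_data(100_000_000, crypto_bars)  # thorough mode
equity_passes = passes_over_data(50_000_000, equity_bars)   # standard mode

print(f"crypto: {crypto_bars:,} bars, {crypto_passes:,.0f} passes")
print(f"equity: {equity_bars:,} bars, {equity_passes:,.0f} passes")
```

Under these assumptions equity still sees more passes than crypto even at half the step budget.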

Workflow Pitfall (v5.6.0 Phase 1)

The training notebook extracts the project from a Drive zip (cell-6). After every code change, the zip must be rebuilt and re-uploaded — git pull does NOT work because the extracted directory isn't a git repo. This caused multiple Phase 1 iterations where the user re-ran the notebook and got the same error because the zip was stale.

Long-term fix: replace cell-6 with git clone --branch=stable from GitHub. Eliminates the sync problem permanently.

Short-term hotfix pattern: in-notebook Python overrides applied INSIDE the cell that uses the affected variable, NOT in a separate hotfix cell (which can be skipped or reordered).

References

alpaca_trading/gpu/vectorized_env.py: _precompute_hourly_context(), _get_observations() lines ~1846-1853, _calculate_obs_features() line ~1067
alpaca_trading/gpu/inference_obs_builder.py: build_inference_observation() (target_features=58 branch), BacktestObservationBuilder._precompute_hourly_context()
alpaca_trading/training/model_version.py: v5.6.0 ModelSpec, version detection
notebooks/training.ipynb: Cell 14 (TRAINING_TIMEFRAMES), Cell 24 (env_config)
tests/test_hourly_context.py: 25 tests (18 baseline covering feature count, model version, both builders, env config; 7 equity calendar)
CHANGELOG.md: v5.6.0 entry
See also: multi-timeframe-training skill (the deprecated dual-model approach)