name: joint-multi-tf-v560
description: "v5.6.0 joint multi-TF model: single model per symbol with broadcast 1Hour context replaces dual 15Min/1Hour models. Trigger: (1) replacing weighted-voting model aggregation, (2) adding broadcast features to vectorized env, (3) limited training data + worried about overfitting from doubling obs_dim, (4) backtest builder mismatch with newer feature counts."
author: Claude Code
date: 2026-04-10
Joint Multi-TF Architecture v5.6.0
Experiment Overview
| Item | Details |
|---|---|
| Date | 2026-04-10 |
| Goal | Replace separate 15Min/1Hour models + weighted voting with single joint model that uses 15Min steps and 1Hour context broadcast features |
| Environment | GPUVectorizedTradingEnv, NativePPOTrainer, InferenceObservationBuilder, BacktestObservationBuilder |
| Status | Code complete, awaiting Colab training + walk-forward validation |
Context
Two prior architectures had problems:
- v5.2.0 dual-model: Train 15Min and 1Hour models independently, combine via aggregate_timeframe_predictions() weighted voting (15Min w=0.15, 1Hour w=0.25). Models never learn to complement each other.
- Mar-25 evaluation: Multi-TF was NOT universally better. Only CLSK improved (PF 1.29→3.98); CNM degraded (PF 1.84→1.25), MOH degraded, and PSTG overtraded.
The user's plan was a "hierarchical" architecture where 15Min refines confidence in the 1Hour prediction. Three options were considered:
| Option | Description | obs_dim | Verdict |
|---|---|---|---|
| A | Full dual 51-feature 1H window (separate window) | 10,200 | Rejected — doubles first-layer params, ~140K crypto bars too few |
| B | Broadcast 1H summary features | 5,800 | Chosen |
| C | Compressed 1H window (different shape) | ~7,000 | Rejected — needs MLP arch changes or padding hacks |
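The obs_dim column follows from flattening a window of per-bar features. A rough sketch of that arithmetic, assuming the 100-bar window used by get_target_features_from_obs_dim() and a hypothetical first hidden layer width of 1024 (an illustration value, chosen only because it lines up with the 10.4M first-layer figure quoted under Failed Attempts):

```python
WINDOW = 100   # bars per observation window
HIDDEN = 1024  # ASSUMPTION: first hidden layer width, for illustration only


def obs_dim(features_per_bar: int, window: int = WINDOW) -> int:
    """Flattened size of a (window, features_per_bar) observation."""
    return window * features_per_bar


def first_layer_weights(flat_dim: int, hidden: int = HIDDEN) -> int:
    """Weight count of a dense first layer (biases ignored)."""
    return flat_dim * hidden


option_a = obs_dim(51 + 51)  # Option A: a second full 51-feature 1H window
option_b = obs_dim(51 + 7)   # Option B: 7 broadcast 1H summary features

print(option_a, first_layer_weights(option_a))
print(option_b, first_layer_weights(option_b))
```

Under these assumptions Option B adds 700 inputs to the v5.5.0 baseline of 5,100, while Option A adds 5,100.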
Verified Workflow
1. Add Config Fields (GPUEnvConfig)
# alpaca_trading/gpu/vectorized_env.py
from dataclasses import dataclass

@dataclass(slots=True)  # slots=True requires Python 3.10+
class GPUEnvConfig:
    # ... existing fields ...
    use_hourly_context: bool = False  # v5.6.0: 1Hour broadcast features
    hourly_context_bars_per_hour: int = 4  # 15Min bars per hour
2. Vectorized Precomputation at Init
def _precompute_hourly_context(self):
    """Pre-compute (T, 7) hourly context tensor, fully vectorized (no Python loops).

    Columns: 0 h1_return, 1 h1_vol, 2 h1_trend, 3 h1_momentum, 4 h1_rsi,
    5 h1_range, 6 position_in_hour. Columns 3 and 5 are elided in this excerpt.
    NOTE: hours are grouped by bar index here, which is only correct for 24/7
    crypto data (see Failed Attempts for the clock-hour grouping equities need).
    """
    bph = self.config.hourly_context_bars_per_hour
    # `prices` is the env's price tensor (1-D, or 2-D with time on the last axis);
    # the line that binds it is elided in this excerpt.
    p = prices[0] if prices.dim() == 2 else prices  # (T,)
    T = p.shape[0]
    hc = torch.zeros(T, 7, dtype=self.dtype, device=self.device)

    # Per-bar simple returns (first bar stays 0)
    returns = torch.zeros(T, dtype=self.dtype, device=self.device)
    returns[1:] = (p[1:] - p[:-1]) / (p[:-1] + 1e-8)

    # Hour boundary indexing (vectorized)
    bar_indices = torch.arange(T, device=self.device)
    h_end = (bar_indices // bph) * bph  # (T,) first bar of the current hour block
    valid_hour = h_end >= bph           # True once a full previous hour exists
    h_start = (h_end - bph).clamp(min=0)

    # Gather hour-level features from the previous completed hour
    hour_open = p[h_start]
    hour_close = p[(h_end - 1).clamp(min=0)]
    hc[valid_hour, 0] = (hour_close[valid_hour] - hour_open[valid_hour]) / (hour_open[valid_hour] + 1e-8)

    # Reshape into (n_complete_hours, bph) blocks for per-hour std/range
    n_complete_hours = T // bph
    if n_complete_hours > 0:
        hourly_returns = returns[:n_complete_hours * bph].reshape(n_complete_hours, bph)
        hourly_vol = hourly_returns.std(dim=1)
        hour_idx = ((bar_indices // bph) - 1).clamp(min=0)  # index of the previous hour
        hc[valid_hour, 1] = hourly_vol[hour_idx[valid_hour].clamp(max=n_complete_hours - 1)]

    # Lookback features (16-bar trend, 4/8-bar momentum acceleration)
    p_16_ago = p[(bar_indices - 16).clamp(min=0)]
    hc[:, 2] = (p - p_16_ago) / (p_16_ago + 1e-8)
    hc[:16, 2] = 0.0  # not enough history yet

    # Cumulative sum trick for rolling RSI proxy (share of up-bars in the last 16)
    positive_returns = (returns > 0).float()
    if T >= 16:
        cumsum = positive_returns.cumsum(dim=0)
        rsi_sum = cumsum.clone()
        rsi_sum[16:] = cumsum[16:] - cumsum[:-16]
        hc[16:, 4] = rsi_sum[16:] / 16.0

    # Position in hour (always valid): 1/bph, 2/bph, ..., 1.0
    hc[:, 6] = ((bar_indices % bph) + 1).float() / bph
    self._obs_hourly_context = hc
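For unit tests on known prices, a slow loop-based reference can serve as an oracle for the vectorized tensor. The following is a minimal pure-Python sketch of the same index-based rules for two of the seven columns (h1_return and position_in_hour); the function name hourly_context_reference is hypothetical, and like the code above it groups hours by bar index, so it only matches 24/7 data.

```python
def hourly_context_reference(prices, bph=4):
    """Return (h1_return, position_in_hour) per bar using index-based hours."""
    rows = []
    for t in range(len(prices)):
        h_end = (t // bph) * bph              # first bar of the current hour block
        if h_end >= bph:                      # a full previous hour exists
            hour_open = prices[h_end - bph]
            hour_close = prices[h_end - 1]
            h1_return = (hour_close - hour_open) / (hour_open + 1e-8)
        else:
            h1_return = 0.0                   # no completed hour yet
        position = ((t % bph) + 1) / bph      # 0.25, 0.5, 0.75, 1.0 for bph=4
        rows.append((h1_return, position))
    return rows


rows = hourly_context_reference([100.0, 101.0, 102.0, 104.0, 105.0, 106.0])
print(rows[3])  # last bar of the first hour: no previous hour yet
print(rows[4])  # first bar of the second hour: previous hour went 100 -> 104
```

Comparing each column of the precomputed tensor against such a reference on a short synthetic series is one way to cover the "known prices" unit tests listed under Testing Strategy.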
3. O(1) Step-Time Gather + Broadcast
def _get_observations(self):
    # ... existing feature gathers ...
    # 12b. Hourly context features (7): single gather + broadcast
    if self.config.use_hourly_context:
        if hasattr(self, '_obs_hourly_context'):
            hc_feats = self._obs_hourly_context[step_vals]  # (n_envs, 7)
            # expand() repeats the same 7 values across all W window positions
            obs[:, :, feat_idx:feat_idx+7] = hc_feats.unsqueeze(1).expand(-1, W, -1)
        feat_idx += 7
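The gather + broadcast step can be illustrated without tensors. This is a minimal sketch using nested lists, with a hypothetical helper name broadcast_context and toy sizes (2 envs, window 3, 9 features per bar); it is meant only to show which values land where, not to mirror the env's real implementation.

```python
def broadcast_context(obs, hourly_context, step_vals, feat_idx, n_ctx=7):
    """Write each env's context row into every window position, in place."""
    for env, step in enumerate(step_vals):
        ctx = hourly_context[step]                # one gather per env
        for row in obs[env]:                      # same values across the window
            row[feat_idx:feat_idx + n_ctx] = ctx  # equal-length slice assignment
    return feat_idx + n_ctx                       # next free feature slot


n_envs, W, F = 2, 3, 9                            # toy sizes, not the real 100 x 58
obs = [[[0.0] * F for _ in range(W)] for _ in range(n_envs)]
hourly_context = [[t + k / 10 for k in range(7)] for t in range(5)]  # fake (T=5, 7) table

next_idx = broadcast_context(obs, hourly_context, step_vals=[1, 4], feat_idx=2)
print(obs[0][0])
print(next_idx)
```

Every window position of an env carries the context of that env's current step, which is why the step-time cost is a single indexed read per env regardless of window length.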
4. Update Feature Count Detection
# alpaca_trading/gpu/inference_obs_builder.py
def get_target_features_from_obs_dim(obs_dim, window=100):
    features = obs_dim // window
    if features == 58:
        return 58  # v5.6.0: 51 base + 7 hourly context
    elif features == 51:
        return 51  # v5.5.0
    # ... etc
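Because a new total can silently collide with an older version's count, it may help to treat detection as an exact-match lookup that fails loudly. The sketch below is an assumption-laden illustration: detect_features and KNOWN_FEATURE_COUNTS are hypothetical names, the table holds only the four counts this document mentions, and how the real function handles unknown counts is elided above.

```python
# Feature counts named in this document; the real lookup has more entries.
KNOWN_FEATURE_COUNTS = {
    58: "v5.6.0",  # 51 base + 7 hourly context
    51: "v5.5.0",
    65: "v5.3.0",
    59: "v2.7.0",
}


def detect_features(obs_dim: int, window: int = 100) -> int:
    """Map a flattened obs_dim to an exact, known per-bar feature count."""
    if obs_dim % window != 0:
        raise ValueError(f"obs_dim {obs_dim} is not a multiple of window {window}")
    features = obs_dim // window
    if features not in KNOWN_FEATURE_COUNTS:
        raise ValueError(f"unknown feature count {features} for obs_dim {obs_dim}")
    return features


print(detect_features(5800), KNOWN_FEATURE_COUNTS[detect_features(5800)])
print(detect_features(5100), KNOWN_FEATURE_COUNTS[detect_features(5100)])
```

Checking a candidate total against such a table before committing to it is the kind of test that would flag a 59-feature layout as already taken by v2.7.0.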
5. Update Both Inference and Backtest Builders
InferenceObservationBuilder delegates to build_inference_observation() — add new branch:
elif target_features == 58:
    n_base = 6; n_hourly_context = 7  # ... etc
BacktestObservationBuilder needs _precompute_hourly_context() and conditional reads in get_obs_at_bar(). Use exact same numpy formulas as the GPU vectorized version for parity.
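Parity between the GPU env and the numpy builders is easiest to enforce with an explicit comparison. A minimal sketch, assuming both sides can be exported as nested lists of floats (bars × features); assert_parity is a hypothetical helper, and the tolerances are illustrative guesses to absorb float32 versus float64 differences.

```python
import math


def assert_parity(reference, candidate, rel_tol=1e-5, abs_tol=1e-7):
    """Compare two (bars x features) nested lists element by element."""
    if len(reference) != len(candidate):
        raise AssertionError(f"bar count differs: {len(reference)} vs {len(candidate)}")
    for t, (ref_row, cand_row) in enumerate(zip(reference, candidate)):
        if len(ref_row) != len(cand_row):
            raise AssertionError(f"feature count differs at bar {t}")
        for k, (a, b) in enumerate(zip(ref_row, cand_row)):
            if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
                raise AssertionError(f"mismatch at bar {t}, feature {k}: {a} vs {b}")


env_rows = [[0.04, 0.25], [0.04, 0.50]]                  # e.g. exported from the env
builder_rows = [[0.04 + 1e-9, 0.25], [0.04, 0.50]]       # e.g. from a builder
assert_parity(env_rows, builder_rows)                    # passes within tolerance
```

Reporting the bar and feature index of the first mismatch makes layout bugs (such as a skipped feature group) much quicker to locate than a bare array comparison.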
6. Notebook Changes
Cell 14 (TRAINING_TIMEFRAMES): ['15Min'] (single TF)
Cell 24 (env_config): n_features=58, use_hourly_context=True, hourly_context_bars_per_hour=4
Cell 32 (loop): No change — already iterates over (symbol, training_tf) pairs, just one TF now
7. Live Trader / Backtest
No code changes needed. Both auto-detect target_features from model.obs_dim via get_target_features_from_obs_dim(). v5.6.0 models (obs_dim=5800) automatically get hourly context computation.
Failed Attempts (Critical)
| Attempt | Why it Failed | Lesson Learned |
|---|---|---|
| Python for-loop precomputation | T=140K bars × 7 features = ~1-2s init overhead | Vectorize with torch.arange + clamp + gather, even for init code |
| 8 hourly features → 59 total | Collides with v2.7.0 feature count in get_target_features_from_obs_dim() | Use 7 features → 58 total to avoid collision |
| Doubling obs_dim with full dual window | First layer would balloon to 10.4M params (12.6M total) on ~140K crypto bars — overfitting risk | Add features only as broadcast scalars, keep first layer ≤ ~6M params |
| Trusting target_features >= 56 checks for backward compat | v5.5.0 (51) was NEVER backtested, so the always-write-7-base-features bug went undetected | Use exact target_features in (51, 58) checks rather than >= N thresholds |
| Forgetting reversal probs in BacktestObservationBuilder | v5.5.0/v5.6.0 expect 3 reversal prob features, but builder skipped straight to extended indicators | Match training env feature layout exactly — add reversal probs for target_features in (51, 58, 65) |
| Initial v5.6.0: index-based hourly grouping (h_end = (t // bph) * bph) | Correct for crypto 24/7, broken for equity sessions. ~14% of equity h1_* features were cross-session contaminated; position_in_hour drifted from clock-minute alignment after day 1. | Always use pd.DatetimeIndex.floor('h') clock-hour IDs + searchsorted for time-aligned features. NEVER use bar-index modulo for any feature that has a calendar interpretation. |
| Recommending BTC-only as Phase 1 "PoC" | N=1 cannot validate an architecture. User correctly pushed back: "whatever you are thinking about dropping stock trading is hugely wrong." Single-asset-class testing cannot distinguish architecture from regime. | Phase 1 = smoke test (single symbol, code-runs check). Phase 2 = multi-asset PoC (4 crypto + 4-5 equity, 50-100M each). Always include WST and CNM as deliberate re-tests of the prior "dead" symbols to isolate the architecture variable. |
| Reading self.timestamps instead of self._timestamps in _precompute_hourly_context | AttributeError on env init. The training env stores timestamps with leading underscore. Test caught it locally before Colab. | After adding new methods that read instance attributes, grep -n "self\._\?timestamps" file.py to verify attribute name. |
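The clock-hour lesson can be sketched with the standard library alone. Here datetime.replace stands in for pd.DatetimeIndex.floor('h') and bisect_left stands in for searchsorted; clock_hour_positions is a hypothetical name, and the position formula (clock minute divided by bar length) is an assumption about what "follows clock minute" means, not the project's compute_hourly_window_indices() helper.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def clock_hour_positions(timestamps, bar_minutes=15):
    """Return (hour_id, position_in_hour) per bar from wall-clock time."""
    bars_per_hour = 60 // bar_minutes
    # Floor every timestamp to its clock hour (stand-in for .floor('h')).
    hour_floor = [ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps]
    unique_hours = sorted(set(hour_floor))
    out = []
    for ts, hf in zip(timestamps, hour_floor):
        hour_id = bisect_left(unique_hours, hf)   # stand-in for searchsorted
        position = (ts.minute // bar_minutes + 1) / bars_per_hour
        out.append((hour_id, position))
    return out


# An equity session opens at 09:30, so the first bar is already 3/4 into its clock hour.
session = [datetime(2026, 4, 10, 9, 30) + timedelta(minutes=15 * i) for i in range(4)]
print(clock_hour_positions(session))
```

With bar-index modulo the same four bars would be labelled 0.25, 0.5, 0.75, 1.0 and treated as one hour, even though they straddle the 10:00 boundary.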
Pre-existing Bugs Discovered During v5.6.0 Work
Bug A: BacktestObservationBuilder hardcoded 7 base + 4 intraday features
BacktestObservationBuilder.get_obs_at_bar() always wrote 7 base + 4 intraday features (53 total) regardless of target_features. For v5.5.0 (51 features), this caused IndexError: index 51 out of bounds. The bug was latent because v5.5.0 models were never backtested before v5.6.0 work — the system jumped from v5.3.0 (65 features, builder works) to v5.6.0 (forced the bug to surface).
Fix: Conditional skip of volume_proxy and intraday for target_features in (51, 58). Also added reversal probs (3 features) for target_features in (51, 58, 65).
Bug B: Trainer checkpoint_interval=0 ZeroDivisionError (v5.6.0 Phase 1, commit 9ffa9e7)
NativePPOTrainer.train() did n_updates % self.config.checkpoint_interval without guarding against checkpoint_interval=0. The quick_test mode in get_auto_config() (line 1957) explicitly sets it to 0 to mean "disabled," but the trainer never honored that intent.
Latent because: production/standard/thorough/extended modes all set checkpoint_interval to 50/75/100. Quick_test was rarely run until v5.6.0 Phase 1.
Fix: Added a self.config.checkpoint_interval > 0 check, joined with and, in front of the modulo, so the modulo is never evaluated when the interval is 0.
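The guard pattern in isolation looks roughly like this; should_checkpoint is a hypothetical standalone function used for illustration, not the trainer's actual method.

```python
def should_checkpoint(n_updates: int, checkpoint_interval: int) -> bool:
    """True when a checkpoint is due; an interval of 0 means 'disabled'."""
    # `and` short-circuits, so the modulo never runs with a zero divisor.
    return checkpoint_interval > 0 and n_updates % checkpoint_interval == 0


print(should_checkpoint(100, 0))   # disabled: no ZeroDivisionError
print(should_checkpoint(100, 50))  # due
print(should_checkpoint(101, 50))  # not due
```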
Bug C: Trainer buffer floor-vs-iterator-ceil off-by-one (v5.6.0 Phase 1, commit a495b2a)
Buffer pre-allocation used _batches_per_epoch = batch_size // minibatch_size (floor), but ReplayBuffer.get_batches() iterates with range(0, batch_size, minibatch_size) which yields ceil(batch_size / minibatch_size) batches (last batch is partial when sizes don't divide evenly).
Concrete trip: n_envs=1024 × n_steps=256 = 262144 / 12288 = 21.33 → buffer for 21*10=210 slots, iterator does 22*10=220 writes → IndexError: index 210 out of bounds for dimension 0 with size 210.
Latent because: production/standard/thorough configs use (n_envs, n_steps, minibatch_size) combinations that happen to divide evenly. Quick_test landed on 21.33.
Fix: Use ceil division (_batch_size + minibatch_size - 1) // minibatch_size for buffer allocation. The _active = _gpu_policy_losses[:_batch_idx] slice at line ~1027 ensures unused tail slots don't pollute aggregates.
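The mismatch can be reproduced with plain integers. A small sketch using the numbers from the concrete trip above; the factor of 10 is taken from the text, and reading it as the number of passes per update is an assumption.

```python
def batches_floor(batch_size: int, minibatch_size: int) -> int:
    """Old allocation: drops the final partial minibatch."""
    return batch_size // minibatch_size


def batches_ceil(batch_size: int, minibatch_size: int) -> int:
    """Fixed allocation: counts the final partial minibatch."""
    return (batch_size + minibatch_size - 1) // minibatch_size


batch_size = 1024 * 256   # n_envs x n_steps = 262144
minibatch_size = 12288
passes = 10               # ASSUMPTION: the x10 multiplier from the text

iterated = len(range(0, batch_size, minibatch_size))  # what the iterator yields
print(batches_floor(batch_size, minibatch_size) * passes)  # slots allocated before the fix
print(iterated * passes)                                   # writes actually performed
print(batches_ceil(batch_size, minibatch_size) * passes)   # slots allocated after the fix
```

The ceil form agrees with the range() iterator for any pair of sizes, including ones that divide evenly.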
Lesson
Every feature-count, network-shape, or schedule-knob change should be smoke-tested with quick_test mode before launching production. Quick_test exercises corners of the configuration space (small n_updates, edge-case batch divisibility) that production runs never reach. The cost of running a 5-minute smoke test is negligible compared to the cost of finding these bugs after a 4-hour production run crashes.
Final Parameters
# v5.6.0 GPUEnvConfig
n_features = 58
use_hourly_context = True
hourly_context_bars_per_hour = 4 # 15Min bars per hour
timeframe = '15Min' # Stepping timeframe (1Hour context is internal)
# Feature breakdown (58 total)
base = 6 # norm_price, returns, log_returns, vol, momentum, rsi
temporal = 7 # calendar features
markov = 12 # vol/trend/momentum/macro probs
reversal_probs = 3 # always with markov
extended = 14 # MACD, ADX, Stoch, ATR, BB, CCI, Williams, ROC, RSI
multi_window = 9 # 3 windows × 3 features
hourly_context = 7 # NEW: h1_return, h1_vol, h1_trend, h1_momentum, h1_rsi, h1_range, position_in_hour
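A quick consistency check on the breakdown above can catch layout drift early. This sketch assumes the groups are laid out in the order listed (only the hourly context being last is suggested by the step-time code); FEATURE_GROUPS and the offsets it derives are illustrative names, not project constants.

```python
FEATURE_GROUPS = {
    "base": 6,
    "temporal": 7,
    "markov": 12,
    "reversal_probs": 3,
    "extended": 14,
    "multi_window": 9,
    "hourly_context": 7,
}

n_features = sum(FEATURE_GROUPS.values())
obs_dim = 100 * n_features  # window of 100 bars

# Running start offset of each group, ASSUMING the listed order is the layout order.
offsets, start = {}, 0
for name, width in FEATURE_GROUPS.items():
    offsets[name] = start
    start += width

print(n_features, obs_dim)
print(offsets["hourly_context"])
```

Dropping the hourly_context entry leaves 51 features, which matches the v5.5.0 count.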
Key Insights
- First layer width dominates network params (86-90%), so adding broadcast features is much cheaper than adding windowed features: 7 broadcast features grow obs_dim by 7 × window = 700 (5,100 → 5,800), versus 5,100 more for a full second window, and the model still gets the info every step.
- Broadcast pattern is the right level of abstraction for time-series context features: same-value-across-window matches how the model can already use volatility, momentum, multi-window features.
- Crypto overtrains at 200M steps (peak at 50-100M per memory) → modest network growth is preferred over full dual obs space.
- Feature count avoidance: Always check the get_target_features_from_obs_dim() lookup table before choosing a new total. 58 was deliberately chosen to avoid collision with v2.7.0's 59.
- Position-in-hour is the cheapest, most useful feature: a single broadcast scalar tells the model "you have 25%/50%/75%/100% of the hour's data" so it knows when to commit vs when to wait.
Testing Strategy
- Unit tests for _precompute_hourly_context(): Verify h1_return/h1_volatility/position_in_hour on known prices
- Output shape tests: (n_envs, 100, 58) for env, (100, 58) for builders
- Train/inference parity: Same prices → same hourly context values across env / inference builder / backtest builder
- Backward compat: v5.5.0 models (obs_dim=5100) still load and infer correctly
- Equity calendar tests (added after Phase 1): synthetic 26-bar/day equity timestamps with overnight gaps. Verify position_in_hour follows clock minute, h1 features masked at session boundaries.
- Quick training smoke test on real BTC/USD data (Phase 1, 10M steps on H100)
25 tests in tests/test_hourly_context.py (18 baseline + 7 equity calendar). All pass.
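The synthetic equity calendar used by such tests could be generated along these lines. This is a sketch under stated assumptions: sessions open at 09:30, run on consecutive calendar days, and the helper name synthetic_equity_timestamps is hypothetical rather than taken from the test file.

```python
from datetime import datetime, timedelta


def synthetic_equity_timestamps(first_open, n_days, bar_minutes=15, bars_per_day=26):
    """15Min bars starting at each session open, 26 per day, with overnight gaps."""
    stamps = []
    for day in range(n_days):
        session_open = first_open + timedelta(days=day)
        stamps.extend(
            session_open + timedelta(minutes=bar_minutes * i) for i in range(bars_per_day)
        )
    return stamps


ts = synthetic_equity_timestamps(datetime(2026, 4, 6, 9, 30), n_days=2)
overnight_gap = ts[26] - ts[25]

# With bph=4, index-based grouping puts bars 24..27 in one "hour" block,
# yet bars 24-25 close day 1 and bars 26-27 open day 2.
same_index_block = (24 // 4) == (27 // 4)
print(len(ts), ts[25].time(), overnight_gap, same_index_block)
```

Since 26 is not a multiple of 4, index-based hour blocks straddle the session boundary, which is the cross-session contamination that clock-hour grouping avoids.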
Phase 1 Smoke Test Results (2026-04-11)
What worked end-to-end: precompute, obs assembly, the new mm(1024×5800, 5800×512) first-layer shape live in the policy network, torch.compile + CUDA graphs + FP8 + AMP, deploy zip, manifest, disconnect verification.
What needed fixing (3 bugs surfaced):
- Trainer Bug B above (checkpoint_interval=0): 1-line fix
- Trainer Bug C above (buffer off-by-one): 1-line fix
- Equity calendar bug in own hourly context implementation: now clock-aligned via the compute_hourly_window_indices() helper, which requires pd.DatetimeIndex.floor('h') and searchsorted
Validation trajectory: Model learned then unlearned. PF climbed 0.12 → 0.94 over 6 validations, crashed to 0.50 at validation 7. HOLD% peaked at 61% then reverted to 26%. Direction reward briefly went positive (+0.07 at validation 4). Architecture works; 10M steps is far below convergence threshold for crypto.
Verdict: DROP (correct for a smoke test). Goal was "did the code run?" not "is there edge?"
Phase 2 step count revision based on Phase 1 trajectory:
- Crypto: thorough mode (100M each) instead of standard (50M). Memory says crypto peaks at 50-100M for v5.3.0; the trajectory shows BTC was still climbing at validation 6 of 10M, so 50M is borderline.
- Equity: standard mode (50M each). Equity has fewer bars per year, so 50M provides more passes over the data.
Workflow Pitfall (v5.6.0 Phase 1)
The training notebook extracts the project from a Drive zip (cell-6). After every code change, the zip must be rebuilt and re-uploaded — git pull does NOT work because the extracted directory isn't a git repo. This caused multiple Phase 1 iterations where the user re-ran the notebook and got the same error because the zip was stale.
Long-term fix: replace cell-6 with git clone --branch=stable from GitHub. Eliminates the sync problem permanently.
Short-term hotfix pattern: in-notebook Python overrides applied INSIDE the cell that uses the affected variable, NOT in a separate hotfix cell (which can be skipped or reordered).
References
- alpaca_trading/gpu/vectorized_env.py: _precompute_hourly_context(), _get_observations() lines ~1846-1853, _calculate_obs_features() line ~1067
- alpaca_trading/gpu/inference_obs_builder.py: build_inference_observation() (target_features=58 branch), BacktestObservationBuilder._precompute_hourly_context()
- alpaca_trading/training/model_version.py: v5.6.0 ModelSpec, version detection
- notebooks/training.ipynb: Cell 14 (TRAINING_TIMEFRAMES), Cell 24 (env_config)
- tests/test_hourly_context.py: 18 baseline tests (feature count, model version, both builders, env config), plus the 7 equity calendar tests added after Phase 1
- CHANGELOG.md: v5.6.0 entry
- See also: multi-timeframe-training skill (the deprecated dual-model approach)