omnifinan - SKILL.md Agent Skill

name: omnifinan
description: Guide AI to use the OmniFinan Python library for financial analysis, including multi-agent hedge fund workflow, data fetching (AkShare/YFinance/FRED/DBnomics/SEC EDGAR), macro indicators, technical/fundamental/sentiment/valuation analysis, bull-bear debate, risk management, portfolio decisions, backtesting, visualization, and report pipelines. Use when working with OmniFinan code, running analyses, modifying agents, or adding features to this financial analysis system.

OmniFinan Library Guide

OmniFinan is an AI-driven multi-agent hedge fund analysis system supporting US, HK, China, and crypto workflows. It uses LangGraph for agent orchestration, LangChain for LLM calls, and multiple data providers (AkShare, YFinance, SEC EDGAR, FRED, World Bank, IMF Datamapper, DBnomics, Tavily, Brave).

Project Structure

src/omnifinan/
├── __init__.py              # Exports: MarketType, run_hedge_fund
├── main.py                  # CLI entrypoint -> presentation.cli.run_cli()
├── unified_api.py           # Low-level AkShare/FRED/WorldBank data functions (3800+ lines)
├── data_models.py           # Pydantic models: Price, FinancialMetrics, LineItem, etc.
├── visualize.py             # Plotly charting: StockFigure, macro dashboards
├── backtester.py            # Backtester class for strategy evaluation
├── agents/                  # LangGraph agent nodes
│   ├── graphs.py            # Graph builders: create_trading_graph()
│   ├── state.py             # AgentState TypedDict
│   ├── nodes.py             # Node factories
│   ├── edges.py             # Conditional routing functions
│   ├── prompts.py           # Prompt constants
│   ├── market_data.py       # Data collection agent
│   ├── technicals.py        # Technical analysis agent
│   ├── fundamentals.py      # Fundamental analysis agent
│   ├── macro.py             # Macro analyst agent
│   ├── sentiment.py         # Sentiment analysis agent (uses LLM)
│   ├── valuation.py         # DCF/multiples valuation agent
│   ├── researcher_bull.py   # Bullish thesis generator (uses LLM)
│   ├── researcher_bear.py   # Bearish thesis generator (uses LLM)
│   ├── debate_room.py       # Bull-bear debate judge (uses LLM)
│   ├── risk_manager.py      # Position sizing / risk constraints
│   └── portfolio_manager.py # Final buy/sell/hold decisions (uses LLM)
├── core/
│   ├── config.py            # RuntimeConfig (env vars / YAML / JSON)
│   ├── workflow.py          # run_hedge_fund() orchestration
│   ├── observability.py     # RunTrace for metrics/cost tracking
│   └── experiment.py        # ExperimentRecorder for run comparison
├── data/
│   ├── cache.py             # DataCache: file-based request + dataset cache
│   ├── unified_service.py   # UnifiedDataService: cached provider wrapper
│   ├── symbols.py           # is_crypto_ticker() helper
│   └── providers/
│       ├── base.py          # DataProvider ABC
│       ├── factory.py       # create_data_provider(name)
│       ├── akshare_provider.py
│       ├── yfinance_provider.py
│       ├── sec_edgar_provider.py
│       └── moomoo_options_provider.py   # stock options provider
├── analysis/
│   ├── indicators.py        # XMA, cross_over, cross_under (TA-Lib wrappers)
│   ├── transform.py         # Feature engineering: returns, rolling features
│   ├── options.py           # BS pricing, IV, Greeks, max pain, OI levels, GEX, term structure, skew
│   ├── factor_mining.py     # CustomFactorSpec, add_candidate_factors, IC/RankIC evaluation
│   └── factor_backtest.py   # Cross-sectional factor backtesting, perf_stats
├── research/
│   ├── valuation.py         # dcf_intrinsic_value(), valuation_signal()
│   ├── factors.py           # Qlib-style DSL: ref, mean, std, rank, apply_factor
│   ├── report_pipeline.py   # PDF report -> LLM synthesis
│   └── report_parser.py     # ParsedReport via pypdf
├── llm/
│   ├── client.py            # call_llm(): unified LLM call with cache + retry
│   └── providers.py         # PROVIDER_REGISTRY: gpt/claude/gemini/deepseek
├── presentation/
│   ├── cli.py               # argparse CLI with questionary analyst selector
│   └── api.py               # Flask REST API: POST /analyze, GET /healthz
└── utils/
    ├── analysts.py          # ANALYST_CONFIG registry, get_analyst_nodes()
    ├── holidays.py          # Trading calendar filtering
    ├── normalization.py     # confidence_to_unit()
    ├── progress.py          # Progress tracking
    ├── display.py           # Console output formatting
    ├── scratchpad.py        # Scratchpad for run artifacts
    └── llm.py               # Convenience LLM wrapper

Core Concepts

AgentState

All agents receive and return AgentState:

class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], operator.add]
    data: Annotated[dict[str, Any], merge_dicts]
    metadata: Annotated[dict[str, Any], merge_dicts]

data holds tickers, dates, prices, financial_metrics, macro_indicators, analyst_signals
metadata holds show_reasoning, model_name, provider_api, language, data_service, trace, scratchpad
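
The Annotated reducers decide how a node's partial update is folded into the shared state. The following is a minimal sketch of that behavior, assuming merge_dicts performs a shallow key-wise merge; its real implementation in agents/state.py is not shown in this guide and may treat nested keys differently.

```python
import operator
from typing import Any


def merge_dicts(left: dict[str, Any], right: dict[str, Any]) -> dict[str, Any]:
    """Assumed reducer: shallow merge where keys from `right` win."""
    return {**left, **right}


# Current state and a partial update returned by a node.
state = {"messages": ["start"], "data": {"tickers": ["AAPL"]}}
update = {"messages": ["prices loaded"], "data": {"prices": {"AAPL": []}}}

# Reducers are applied per key: messages are concatenated, dicts are merged.
merged = {
    "messages": operator.add(state["messages"], update["messages"]),
    "data": merge_dicts(state["data"], update["data"]),
}
print(merged)
```

Because messages uses operator.add, a node that returns the full existing message list would append it again; under this reading, returning only new messages is the safer pattern.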

Trading Graph Pipeline

start_node -> market_data_agent -> [analyst agents in parallel] -> investment_debate_start
-> researcher_bull_agent <-> researcher_bear_agent -> debate_room_agent
-> risk_start -> risk_management_agent -> execution_start -> portfolio_management_agent -> END

Analyst agents (run in parallel after market_data_agent):

technical_analyst_agent - trend, momentum, mean reversion, volatility, stat arb
fundamentals_agent - profitability, growth, financial health, valuation ratios
macro_analyst_agent - central bank rates, inflation, employment
sentiment_agent - news/inference-driven sentiment analysis node; data discovery is separate from get_company_news
valuation_agent - DCF, owner earnings, residual income, comparable multiples

Data Providers

Provider	Name strings	Markets	Capabilities
AkShare	`"akshare"`	CN, US, HK	Prices, financials, news, macro (CN+intl)
YFinance	`"yfinance"`, `"yf"`, `"yahoo"`	US, global, crypto	Prices, financials, news
SEC EDGAR	`"sec_edgar"`, `"sec"`	US	Financial metrics, line items from XBRL
DBnomics	n/a (API helper in `unified_api.py`)	Global macro	Macro time series catalog (providers/datasets/series). Use workflow: `/search` → `/datasets/{provider}/{dataset}` → fetch via `series_ids` or provider+dataset+dimensions. Avoid calling `/series` without selectors (can be slow/terminated).
Moomoo (options-only)	`provider="moomoo"`	US stock/index options	Stock option chain fetch only

Ticker normalization includes volatility indices:

VIX / .VIX -> ^VIX
VVIX / .VVIX -> ^VVIX

Default stock-options routing:

provider="auto" routes to moomoo.
Futures option chains are explicitly unavailable in the current provider stack.

Create with: create_data_provider("akshare")

DataProvider ABC

All providers implement:

get_prices(ticker, start_date, end_date, interval) -> list[Price]
get_financial_metrics(ticker, end_date, period, limit) -> list[FinancialMetrics]
search_line_items(ticker, period, limit) -> list[LineItem]
get_company_news_raw(ticker, start_date, end_date, limit) -> list[CompanyNews]
get_insider_trades(ticker, end_date, start_date, limit) -> list[InsiderTrade]
get_market_cap(ticker, end_date) -> float | None
get_macro_indicators(start_date, end_date) -> dict

UnifiedDataService

Wraps a DataProvider with intelligent caching via DataCache. Key behaviors:

Price data: Incremental fetch - backfills gaps, appends new data, avoids redundant downloads
Financial metrics/line items: Refetch when latest report is >30 days stale
Macro indicators: Datasets-first strategy (stored in datasets/macro_indicators_master/, immune to TTL-based cleanup_expired). Uses series-level staleness + subset refresh. When force=False, the service must never auto full-refresh; if subset refresh is unavailable or fails, it logs ERROR and continues with cached payload. Metrics explicitly retired from the active macro interface are removed from outputs rather than emitted as stale placeholders.
Company news:
- A-shares use AkShare raw news
- US/HK use Tavily-first and Brave supplemental search
- results are clustered into integrated events
- cross-verification uses actual publisher/domain weights, not search-engine counts
- public API note: UnifiedDataService.get_company_news(...) returns integrated news events; provider-layer get_company_news_raw(...) is internal raw discovery only
Insider trades: Incremental fetch by filing_date
Crypto: Auto-routes to YFinance for crypto tickers
Options:
- get_stock_option_chain() defaults to provider="auto" (moomoo)
- get_futures_option_chain() returns an explicit "unavailable" result in the current provider stack
- default snapshot mode is previous-business-day close (snapshot_mode="prev_close")

from omnifinan.data.cache import DataCache
from omnifinan.data.providers.factory import create_data_provider
from omnifinan.data.unified_service import UnifiedDataService

service = UnifiedDataService(
    provider=create_data_provider("akshare"),
    cache=DataCache(),
    ttl_seconds=3600,
)
prices = service.get_prices("600519", "2025-01-01", "2025-12-31")
macro = service.get_macro_indicators("2025-01-01", "2025-12-31")
structured = service.get_macro_indicators_structured("2025-01-01", "2025-12-31")
stock_opts = service.get_stock_option_chain("AAPL", expiration="2026-06-19")
fut_opts = service.get_futures_option_chain("ES", expiration="2026-06-19")
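
The incremental price fetch described above (backfill gaps, append new data) comes down to comparing the cached date span with the requested one. The sketch below shows one way to compute the ranges that still need downloading. The helper name and the single-contiguous-span assumption are illustrative; this is not the DataCache implementation.

```python
from datetime import date, timedelta


def missing_ranges(cached_start, cached_end, req_start, req_end):
    """Return the (start, end) date ranges not covered by the cached span."""
    if cached_start is None or cached_end is None:
        return [(req_start, req_end)]  # nothing cached yet
    if req_end < cached_start or req_start > cached_end:
        return [(req_start, req_end)]  # no overlap: fetch the whole request
    gaps = []
    one_day = timedelta(days=1)
    if req_start < cached_start:
        gaps.append((req_start, cached_start - one_day))  # backfill older data
    if req_end > cached_end:
        gaps.append((cached_end + one_day, req_end))  # append newer data
    return gaps


gaps = missing_ranges(
    date(2025, 3, 1), date(2025, 6, 30), date(2025, 1, 1), date(2025, 12, 31)
)
print(gaps)
```

A request fully inside the cached span yields an empty list, so no provider call is needed.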

Macro Data Architecture

Source policy: fixed_sources_with_dbnomics_proxies
Data-source credentials: read from OMNIX_PATH/finn_api.json
- Recommended node keys: FRED.api_key, tavily.api_key, brave.api_key
Master payload stored as dataset with snapshot history
Staleness detection: per-series based on cycle_days * 3 threshold
Refetch cooldown: fetched_at + frequency-based minimum interval
Subset refresh when provider supports get_macro_indicators_subset()
Structured output keys: meta, dimensions, metrics, chart_data
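
To make the per-series staleness rule concrete, here is a minimal sketch. It assumes the threshold is read as "the latest observation is more than cycle_days * 3 days old"; the function name and that exact comparison are assumptions, and the service may also apply the refetch cooldown before acting on it.

```python
from datetime import date


def is_series_stale(last_observation: date, cycle_days: int, today: date) -> bool:
    """Assumed rule: stale once the latest observation is older than 3 cycles."""
    return (today - last_observation).days > cycle_days * 3


today = date(2025, 6, 10)
last_obs = date(2025, 3, 2)  # 100 days before `today`

monthly_stale = is_series_stale(last_obs, cycle_days=30, today=today)
quarterly_stale = is_series_stale(last_obs, cycle_days=90, today=today)
print(monthly_stale, quarterly_stale)
```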

Structured Macro Output

get_macro_indicators_structured() returns:

meta: snapshot_at, source_policy, counts
dimensions: growth, inflation, liquidity, credit, market_feedback
metrics: per-series cards with yoy, mom, qoq, trend_short, trend_medium, volatility
chart_data.long: flattened list for plotting
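
Since downstream consumers rely on these four top-level keys, a quick shape check can catch contract drift early. The helper below is a hypothetical sketch and checks only the key names listed above, not the contents of each section.

```python
REQUIRED_TOP_LEVEL = ("meta", "dimensions", "metrics", "chart_data")


def missing_structured_keys(payload: dict) -> list[str]:
    """Return required keys absent from a structured macro payload."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in payload]
    # chart_data is expected to carry the flattened `long` list for plotting.
    if "chart_data" in payload and "long" not in payload["chart_data"]:
        missing.append("chart_data.long")
    return missing


skeleton = {"meta": {}, "dimensions": {}, "metrics": {}, "chart_data": {"long": []}}
print(missing_structured_keys(skeleton))
```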

Factor Mining Framework

Quantitative factor mining pipeline inspired by Qlib. Supports the full cycle: factor creation -> IC evaluation -> cross-sectional backtest.

Dependencies

scipy is required for Spearman Rank IC computation (evaluate_factors).

Built-in Factor Generation

from omnifinan.analysis.factor_mining import add_candidate_factors

# Input: panel DataFrame with columns [date, symbol, close, high, low, volume]
factored = add_candidate_factors(panel, forward_horizon=5)
# Adds: ret_1, ret_5, ret_20, mom_ma_5_20, mom_ma_20_60,
#        volatility_20, amplitude_1, vol_ratio_20, rev_5, fwd_ret_5

Cross-Sectional Z-Score

from omnifinan.analysis.factor_mining import zscore_by_date

zscored, z_cols = zscore_by_date(factored, ["ret_5", "mom_ma_5_20"])
# z_cols = ["ret_5_z", "mom_ma_5_20_z"]
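
For readers unfamiliar with cross-sectional standardization, the idea is to z-score each factor within a single date across symbols. The stdlib sketch below works on plain row dicts and assumes a population standard deviation; zscore_by_date itself operates on a DataFrame and may use a different ddof or apply clipping.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_by_date_rows(rows, column):
    """Add `<column>_z`: the value standardized within each date's cross-section."""
    by_date = defaultdict(list)
    for row in rows:
        by_date[row["date"]].append(row[column])
    out = []
    for row in rows:
        values = by_date[row["date"]]
        mu, sigma = mean(values), pstdev(values)
        # A constant cross-section has no dispersion; emit 0.0 rather than divide by zero.
        z = (row[column] - mu) / sigma if sigma > 0 else 0.0
        out.append({**row, f"{column}_z": z})
    return out


rows = [
    {"date": "2025-01-02", "symbol": "A", "ret_5": 1.0},
    {"date": "2025-01-02", "symbol": "B", "ret_5": 2.0},
    {"date": "2025-01-02", "symbol": "C", "ret_5": 3.0},
]
scored = zscore_by_date_rows(rows, "ret_5")
print([round(r["ret_5_z"], 4) for r in scored])
```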

IC Evaluation

from omnifinan.analysis.factor_mining import evaluate_factors, daily_ic

# Full report: IC mean/std/IR + Rank IC mean/std/IR for each factor
report = evaluate_factors(zscored, z_cols, label_col="fwd_ret_5")

# Single factor IC time series
ic_ts = daily_ic(zscored, "ret_5_z", label_col="fwd_ret_5", method="spearman")
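
Rank IC for one date is the Spearman correlation between factor values and forward returns, which is the Pearson correlation of their ranks. The sketch below spells that out with the standard library. It is illustrative only; evaluate_factors relies on scipy, and the helper names here are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def rank_ic(factor_values, forward_returns):
    """Spearman rank IC for one date: Pearson correlation of the ranks."""
    return pearson(average_ranks(factor_values), average_ranks(forward_returns))


print(rank_ic([0.1, 0.5, 0.3, 0.9], [0.01, 0.03, 0.02, 0.08]))
```

Averaging this value over dates gives the Rank IC mean; dividing the mean by the standard deviation gives the IR reported per factor.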

Custom Factors

from omnifinan.analysis.factor_mining import apply_custom_factors, CustomFactorSpec

def rolling_sharpe(g, window=20):
    ret = g["close"].pct_change()
    return ret.rolling(window).mean() / ret.rolling(window).std()

specs = [
    CustomFactorSpec(name="sharpe_20", func=rolling_sharpe, kwargs={"window": 20}),
]
extended = apply_custom_factors(factored, specs)

# Or dict shorthand:
extended = apply_custom_factors(factored, {"ret_3": lambda g: g["close"].pct_change(3)})

Factor Backtest

from omnifinan.analysis.factor_backtest import (
    build_cross_sectional_weights, run_daily_backtest, perf_stats,
)

weights = build_cross_sectional_weights(
    df, score_col="mom_ma_20_60_z", quantile=0.2, long_short=True,
)
bt = run_daily_backtest(df, weights, cost_rate=0.001)
stats = perf_stats(bt["net_ret"], bt["equity"])
# Returns: total_return, annual_return, annual_vol, sharpe, max_drawdown, win_rate
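
Three of the statistics listed above can be sketched in a few lines. The sign convention (drawdown reported as a negative fraction) and the strict "greater than zero" rule for wins are assumptions of this sketch; perf_stats may define them differently.

```python
def equity_curve(net_returns, start=1.0):
    """Compound per-period net returns into an equity series."""
    equity, level = [], start
    for r in net_returns:
        level *= 1.0 + r
        equity.append(level)
    return equity


def max_drawdown(equity):
    """Most negative peak-to-trough decline, as a fraction (e.g. -0.25)."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def win_rate(net_returns):
    """Share of periods with a strictly positive net return."""
    return sum(r > 0 for r in net_returns) / len(net_returns)


returns = [0.2, -0.25, 0.1]
equity = equity_curve(returns)
print(equity[-1] - 1.0, max_drawdown(equity), win_rate(returns))
```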

Qlib-Style Factor DSL

from omnifinan.research.factors import ref, mean, std, rank, apply_factor

# Primitives
ref(series, 1)        # shift(1)
mean(series, 5)       # rolling(5).mean()
std(series, 20)       # rolling(20).std()
rank(series, 20)      # rolling(20) rank of latest value

# String expression interface
result = apply_factor("Ref($close,1)", df)
result = apply_factor("Mean($close,5)", df)

LLM Execution Guidance

This section is for LLM runtime orchestration guidance only.

Preferred Data-First API Path

For non-LLM analytics tasks, prefer direct UnifiedDataService APIs instead of full multi-agent orchestration:

get_prices
get_financial_metrics
get_line_items
get_company_news
get_insider_trades
get_macro_indicators
get_macro_indicators_structured
get_stock_option_chain
get_futures_option_chain
get_stock_option_chain_analytics (new; non-LLM)
get_stock_option_gex (new; non-LLM; estimated GEX summary)
- supports gex_expiration (default None):
  - None => full-chain GEX
  - YYYY-MM-DD => GEX for that expiry only
- supports exp_date post-fetch expiry bucket filters:
  - all
  - 0dte
  - Ndte (nearest available DTE bucket, e.g. 7dte)
  - monthly
  - quarterly
  - combinable with +, e.g. 7dte+monthly
get_futures_option_chain_analytics (new; non-LLM)
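
The exp_date bucket expressions accepted by get_stock_option_gex (all, 0dte, Ndte, monthly, quarterly, joined with +) can be validated before making a call. The parser below is a hypothetical helper written for illustration; the service's own parsing rules may be stricter or looser.

```python
import re

_NAMED_BUCKETS = {"all", "monthly", "quarterly"}
_DTE_PATTERN = re.compile(r"^(\d+)dte$")


def parse_exp_date_filter(expr: str) -> list[str]:
    """Split an expression such as '7dte+monthly' into validated bucket tokens."""
    tokens = [part.strip().lower() for part in expr.split("+")]
    for token in tokens:
        if token not in _NAMED_BUCKETS and not _DTE_PATTERN.match(token):
            raise ValueError(f"unsupported expiry bucket: {token!r}")
    return tokens


print(parse_exp_date_filter("7dte+monthly"))
```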

Option Analytics (Non-LLM) Guidance

When the task asks for IV/skew/term-structure/Greeks, use analytics APIs first:

UnifiedDataService.get_stock_option_chain_analytics(...)
- risk_free_rate can be omitted; service resolves from macro yields (us_treasury_2y then us_treasury_10y) before fallback.
- contract_multiplier supports explicit override; default is 100.
- provider iv inputs are normalized to decimal volatility at chain level before Greeks/GEX computation, so values such as 87.4 are interpreted as 87.4%, not 87.4x vol.
UnifiedDataService.get_futures_option_chain_analytics(...)
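
The chain-level IV normalization mentioned above can be pictured as follows. This is a sketch under a loudly stated assumption: the chain is treated as percent-quoted when its median IV exceeds a threshold (5.0 here, an arbitrary illustrative value). The detection rule the service actually uses is not documented in this guide.

```python
from statistics import median


def normalize_chain_iv(ivs, percent_threshold=5.0):
    """Return IVs as decimals, rescaling the whole chain if it looks like percent.

    Assumption: a chain whose median IV exceeds `percent_threshold` is taken
    to be quoted in percentage points and is divided by 100.
    """
    present = [v for v in ivs if v is not None]
    if not present:
        return list(ivs)
    scale = 100.0 if median(present) > percent_threshold else 1.0
    return [None if v is None else v / scale for v in ivs]


print(normalize_chain_iv([87.4, 45.0, None]))
```

Deciding once per chain, rather than per contract, avoids mixing scales when a single contract has an unusually high but genuine decimal IV.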

Market compatibility rule:

China A-share / HK equity tickers do not have options support in the current provider stack.
For those tickers, stock-option APIs return meta.source = "fixed_sources_unavailable" with an explicit meta.error.
The LLM should continue the broader analysis flow without treating this as a fatal error.
Crypto pair symbols are normalized to base asset for options endpoints:
- BTC-USDT, BTC-USD, BTCUSDT -> BTC
- ETH-USDT, ETH-USD, ETHUSDT -> ETH
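
The pair-to-base normalization can be expressed as suffix stripping. The sketch below covers only the three pair shapes listed above; the function name is hypothetical and other quote currencies are deliberately left untouched.

```python
_QUOTE_SUFFIXES = ("-USDT", "-USD", "USDT")  # checked in this order


def crypto_base_asset(symbol: str) -> str:
    """Strip a known quote-currency suffix, e.g. 'BTC-USDT' -> 'BTC'."""
    upper = symbol.strip().upper()
    for suffix in _QUOTE_SUFFIXES:
        if upper.endswith(suffix) and len(upper) > len(suffix):
            return upper[: -len(suffix)]
    return upper


print([crypto_base_asset(s) for s in ("BTC-USDT", "BTC-USD", "BTCUSDT")])
```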

Return contract from analytics APIs:

meta (inherits chain metadata + analytics_version)
data (raw option rows)
raw (provider raw payload)
analytics.summary (option_count, enriched_count, underlying_price, median_iv)
analytics.surface (per-contract normalized metrics with IV/Greeks)
analytics.term_structure (ATM IV by expiry)
analytics.skew_by_expiry (risk_reversal_25d, butterfly_25d, ATM IV)
analytics.smile_by_expiry (IV smile points by strike/moneyness per expiry)
analytics.max_pain (overall and per-expiry max pain strike)
analytics.levels (primary support/resistance from put/call OI walls)
analytics.implied_vs_realized (current_atm_iv, historical_volatility, iv_minus_hv, iv_to_hv_ratio)
analytics.summary.iv_historical_percentile (requires iv_history input)
analytics.errors (explicit calculation issues)
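
A caller consuming this return contract usually needs a few headline fields and must tolerate the explicit "unavailable" case without failing. The helper below is a hypothetical sketch; it uses only key names listed in this guide and assumes analytics.errors is a list.

```python
def summarize_option_analytics(result: dict) -> dict:
    """Pull a few headline fields, tolerating the explicit 'unavailable' case."""
    meta = result.get("meta", {})
    if meta.get("source") == "fixed_sources_unavailable":
        # Non-fatal: report the reason and let the broader analysis continue.
        return {"available": False, "reason": meta.get("error")}
    analytics = result.get("analytics", {})
    summary = analytics.get("summary", {})
    return {
        "available": True,
        "median_iv": summary.get("median_iv"),
        "underlying_price": summary.get("underlying_price"),
        "errors": analytics.get("errors", []),
    }


unavailable = {"meta": {"source": "fixed_sources_unavailable", "error": "no options"}}
print(summarize_option_analytics(unavailable))
```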

get_stock_option_gex(...) return notes:

gex_data.metadata.gamma_flip_price is only populated when net GEX crosses zero inside the internal spot sweep band (0.7x to 1.3x current spot).
If no sign change occurs inside that band, gamma_flip_price remains null; this means “no zero-gamma root found in search band”, not “missing data”.
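
The null-versus-value behavior of gamma_flip_price follows from a bracketed root search. The sketch below sweeps an evenly spaced grid across the 0.7x to 1.3x band and interpolates linearly at the first sign change. The callable net_gex, the grid size, and the interpolation are assumptions made for illustration, not the library's routine.

```python
def find_gamma_flip(net_gex, spot, low=0.7, high=1.3, steps=60):
    """Return the first spot level where net_gex changes sign, else None.

    `net_gex` is a hypothetical callable mapping a spot price to net GEX.
    """
    prev_s = spot * low
    prev_v = net_gex(prev_s)
    for i in range(1, steps + 1):
        s = spot * (low + (high - low) * i / steps)
        v = net_gex(s)
        if prev_v == 0:
            return prev_s
        if prev_v * v < 0:
            # Linear interpolation between the two bracketing grid points.
            return prev_s + (s - prev_s) * (-prev_v) / (v - prev_v)
        prev_s, prev_v = s, v
    return prev_s if prev_v == 0 else None


flip = find_gamma_flip(lambda s: s - 95.0, spot=100.0)
no_flip = find_gamma_flip(lambda s: 1.0, spot=100.0)
print(flip, no_flip)
```

The second call returns None because the profile never crosses zero inside the band, which corresponds to the "no zero-gamma root found" case rather than missing data.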

LLM-as-Glue Fallback Pattern

If a step requires semantic generation but nested runtime model calls are not desired:

Pause at the step and collect exact upstream state.
Generate the structured output in the current LLM context.
Write back to the exact state path expected downstream.
Resume remaining deterministic steps.

Required compatibility write-back path example:

state["data"]["analyst_signals"]["sentiment_agent"][ticker]
fields: signal, confidence, reasoning
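
The write-back step can be sketched as a small helper that creates any missing intermediate dicts and stores the three fields at the expected path. The helper name and the example values are illustrative assumptions; only the path and field names come from this guide.

```python
def write_sentiment_signal(state, ticker, signal, confidence, reasoning):
    """Store a sentiment result at data.analyst_signals.sentiment_agent[ticker]."""
    signals = state.setdefault("data", {}).setdefault("analyst_signals", {})
    signals.setdefault("sentiment_agent", {})[ticker] = {
        "signal": signal,
        "confidence": confidence,
        "reasoning": reasoning,
    }
    return state


state = {"data": {"tickers": ["AAPL"]}}
write_sentiment_signal(state, "AAPL", "bullish", 0.7, "Positive earnings coverage")
print(state["data"]["analyst_signals"]["sentiment_agent"]["AAPL"])
```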

Keep external contracts stable:

Top-level result keys (e.g., decisions, analyst_signals)
Macro structured keys (meta, dimensions, metrics, chart_data)

Orchestration Interfaces (Still Available)

These orchestration interfaces remain supported; use them when task intent is end-to-end execution rather than atomic data/analytics calls.

omnifinan.run_hedge_fund(...) (full multi-agent workflow)
omnifinan.core.workflow.run_hedge_fund(...) (workflow module entry)
omnifinan.backtester.Backtester (backtest runner)
omnifinan.presentation.api.create_app() (REST API entry)
omnifinan.visualize.StockFigure / omnifinan.visualize.create_macro_figure (charting)

Configuration

Environment Variables

Variable	Default	Description
`OMNIFINAN_DATA_PROVIDER`	`akshare`	Data provider name
`OMNIFINAN_MARKET_TYPE`	`china`	Market: us, china, hongkong
`OMNIFINAN_DATA_CACHE_TTL`	`3600`	Cache TTL in seconds
`OMNIFINAN_MODEL_NAME`	`deepseek-chat`	LLM model
`OMNIFINAN_PROVIDER_API`	`deepseek`	LLM provider
`OMNIFINAN_LANGUAGE`	`Chinese`	Output language
`OMNIFINAN_MODEL_TEMPERATURE`	`0.2`	LLM temperature
`OMNIFINAN_DEBATE_ROUNDS`	`1`	Bull-bear debate rounds
`OMNIFINAN_DETERMINISTIC_MODE`	`1`	Enable LLM response caching
`OMNIFINAN_LLM_SEED`	`7`	LLM seed for reproducibility
`OMNIFINAN_ENABLED_ANALYSTS`	(all)	Comma-separated analyst keys
`OMNIFINAN_CONFIG_PATH`	(none)	Path to YAML/JSON config file
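
To show how the defaults in this table combine with the environment, here is a minimal sketch. It is not RuntimeConfig: the function name and the returned dict shape are assumptions, values are left as strings, and the two variables without a concrete default are omitted.

```python
import os

# Defaults copied from the table above; keys are variable names without the prefix.
ENV_DEFAULTS = {
    "DATA_PROVIDER": "akshare",
    "MARKET_TYPE": "china",
    "DATA_CACHE_TTL": "3600",
    "MODEL_NAME": "deepseek-chat",
    "PROVIDER_API": "deepseek",
    "LANGUAGE": "Chinese",
    "MODEL_TEMPERATURE": "0.2",
    "DEBATE_ROUNDS": "1",
    "DETERMINISTIC_MODE": "1",
    "LLM_SEED": "7",
}


def read_env_settings(environ=os.environ):
    """Resolve each OMNIFINAN_* variable, falling back to its documented default."""
    return {
        name.lower(): environ.get(f"OMNIFINAN_{name}", default)
        for name, default in ENV_DEFAULTS.items()
    }


print(read_env_settings({"OMNIFINAN_DEBATE_ROUNDS": "3"})["debate_rounds"])
```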

Config File (YAML/JSON)

data_provider: akshare
market_type: china
data_cache_ttl_seconds: 3600
debate_rounds: 2
deterministic_mode: true
enabled_analysts:
  - technical_analyst
  - fundamentals_analyst
  - macro_analyst
llm:
  model_name: deepseek-chat
  provider_api: deepseek
  temperature: 0.2
  max_retries: 3
  language: Chinese

Available Analyst Keys

technical_analyst - Technical indicators and pattern analysis
fundamentals_analyst - Financial statement analysis
macro_analyst - Macroeconomic indicators analysis
sentiment_analyst - News and insider trading sentiment
valuation_analyst - Intrinsic value estimation

Data Models

Key Pydantic Models

Model	Key Fields
`Price`	open, close, high, low, volume, time, market
`FinancialMetrics`	ticker, report_period, period, currency, market_cap, PE/PB/PS ratios, margins, ROE/ROA, growth rates
`LineItem`	ticker, report_period, net_income, operating_revenue, free_cash_flow (extra="allow")
`CompanyNews`	provider-level raw article row: ticker, title, source, date, url
`IntegratedNewsEvent`	event_id, ticker, headline, published_at, primary_source, weighted_source_score, consensus_passed
`InsiderTrade`	ticker, filing_date, transaction_date, transaction_shares, transaction_value
`MarketType`	US, CHINA, CHINA_SZ, CHINA_SH, HK, UNKNOWN

Runtime Data Paths

All runtime data under OMNIX_PATH/omnifinan/:

request_cache/ - API response cache (hashed JSON files)
datasets/ - Persistent datasets (prices, financials, macro history)
reports/ - Output reports and experiment records
logs/ - Application logs
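
The layout above maps directly onto pathlib. In this sketch the root is passed in explicitly; how the library itself resolves OMNIX_PATH is not covered here, and the function name is hypothetical.

```python
from pathlib import Path


def runtime_dirs(omnix_path: str) -> dict[str, Path]:
    """Build the four runtime directories under <OMNIX_PATH>/omnifinan/."""
    root = Path(omnix_path) / "omnifinan"
    return {
        name: root / name
        for name in ("request_cache", "datasets", "reports", "logs")
    }


dirs = runtime_dirs("/data/omnix")
print(dirs["datasets"])
```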

LLM Integration

from omnifinan.llm.client import call_llm

# Plain text response
text = call_llm(
    prompt="Analyze this data...",
    model_name="deepseek-chat",
    provider_api="deepseek",
)

# Structured Pydantic response
from pydantic import BaseModel
class Analysis(BaseModel):
    signal: str
    confidence: float

result = call_llm(
    prompt="...",
    model_name="deepseek-chat",
    provider_api="deepseek",
    pydantic_model=Analysis,
    deterministic_mode=True,  # enables response caching
    trace=trace,              # optional RunTrace
    scratchpad=scratchpad,    # optional Scratchpad
)

Supported providers: deepseek, openai (gpt), anthropic (claude), google (gemini)

Testing

Run verification tests after changes:

# Core macro logic
pytest tests/test_macro_source_policy.py
pytest tests/test_macro_structured.py
pytest tests/test_macro_visualize.py

# Factor mining and backtest (requires scipy)
pytest tests/test_factor_mining.py
pytest tests/test_factor_backtest.py

# Other test suites
pytest tests/test_agent_graphs.py
pytest tests/test_agent_edges.py
pytest tests/test_data_cache.py
pytest tests/test_llm_client.py
pytest tests/test_runtime_config.py
pytest tests/test_sec_edgar_provider.py
pytest tests/test_symbols.py

Key Constraints (from AGENTS.md)

Source policy: Do not change macro source policy unless explicitly requested
Runtime data: Never write hot data into the repo; use OMNIX_PATH/omnifinan/
Output stability: Preserve unified API structures (meta, dimensions, metrics, chart_data)
Minimal edits: Keep changes deterministic and technically concise
Anti-loop: Avoid repeated full refresh loops when sources have no delta
Local-first: Prefer cached data for repeated analysis/report generation

Adding a New Analyst Agent

Create src/omnifinan/agents/your_analyst.py with function signature:

from omnifinan.agents.state import AgentState

def your_analyst_agent(state: AgentState) -> AgentState:
    # Read from state["data"], state["metadata"]["data_service"]
    # Write signals to state["data"]["analyst_signals"]["your_analyst_agent"][ticker]
    # (agent key first, then ticker, matching the sentiment_agent write-back path)
    return {"messages": state["messages"], "data": {...}, "metadata": state["metadata"]}

Register in src/omnifinan/utils/analysts.py ANALYST_CONFIG
The graph builder (graphs.py) auto-wires registered analysts

Adding a New Data Provider

Create src/omnifinan/data/providers/your_provider.py implementing DataProvider ABC
Register in src/omnifinan/data/providers/factory.py
All 7 abstract methods must be implemented

Ticker Format

China A-shares: 6-digit code (e.g., 600519, 000001)
Hong Kong: 5-digit zero-padded (e.g., 00700)
US: Standard symbols (e.g., AAPL, MSFT)
Crypto: Pair format (e.g., BTC-USD, ETHUSDT) - auto-routed to YFinance
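
The format rules above can be turned into a rough shape-based classifier. This is an illustrative sketch only: the function name and return labels are assumptions, and the real is_crypto_ticker() helper and market detection may use different rules.

```python
import re


def classify_ticker(ticker: str) -> str:
    """Guess the market from the ticker shape described above."""
    t = ticker.strip().upper()
    if re.fullmatch(r"\d{6}", t):
        return "china"
    if re.fullmatch(r"\d{5}", t):
        return "hongkong"
    if t.endswith(("-USD", "-USDT", "USDT")):
        return "crypto"
    return "us"


print([classify_ticker(t) for t in ("600519", "00700", "AAPL", "BTC-USD")])
```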

For detailed macro series reference and structured output schema, see macro-reference.md.