name: mathematical-data-science-foundations
description: Apply mathematical and algorithmic foundations for data science. Use when Codex needs to explain, implement, review, or choose methods involving high-dimensional geometry, concentration, random projections, SVD, PCA, matrix approximation, random walks, Markov chains, streaming algorithms, sketching, sampling, clustering, random graphs, topic models, NMF, HMMs, graphical models, compressed sensing, or wavelets.
Mathematical Data Science Foundations
Use this for algorithmic and mathematical data science questions that require more than an ordinary EDA or modeling workflow.
Domain Context Requirement
Use the Domain Context Contract to decide whether a mathematical method is appropriate for the actual domain object, data scale, resource constraint, success metric, and failure mode. Translate abstract assumptions into domain terms before recommending or implementing a method.
Routing
- High-dimensional data: check concentration, distance behavior, random projection, and scaling assumptions.
- SVD/PCA/matrix methods: check centering, rank choice, reconstruction error, stability, and interpretability.
- Random walks/Markov chains: check state space, transition matrix, stationarity, convergence, and graph interpretation.
- Streaming/sketching/sampling: check memory bound, pass count, approximation error, and failure probability.
- Clustering: check distance metric, scaling, cluster-shape assumptions, initialization, stability, and validation.
- Graphs/networks: check graph construction, edge weights, degree distribution, connectedness, and community assumptions.
- Topic models/NMF/HMMs/graphical models: check factorization assumptions, identifiability, priors, and inference quality.
- Compressed sensing/wavelets: check sparsity assumptions, basis choice, reconstruction objective, and signal/noise behavior.
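As one illustration of the SVD/PCA route above, the following is a minimal sketch of how centering, rank choice, and reconstruction error might be checked together. It assumes NumPy and a dense numeric matrix with rows as observations; the function name `pca_diagnostics`, its `center` flag, and the synthetic data are hypothetical and chosen only for this example.

```python
import numpy as np

def pca_diagnostics(X, k, center=True):
    """Hypothetical helper: explained-energy ratio and relative squared
    reconstruction error of the best rank-k approximation of X."""
    X = np.asarray(X, dtype=float)
    if center:
        X = X - X.mean(axis=0)  # remove column means before the SVD
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    energy = s ** 2  # squared singular values
    explained = energy[:k].sum() / energy.sum()
    X_k = (U[:, :k] * s[:k]) @ Vt[:k]  # rank-k reconstruction
    rel_err = np.linalg.norm(X - X_k, "fro") ** 2 / np.linalg.norm(X, "fro") ** 2
    return explained, rel_err

# Synthetic data (an assumption for illustration): rank-3 signal,
# small noise, and a large constant offset in every column.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 3))
W = rng.normal(size=(3, 20))
X = Z @ W + 0.01 * rng.normal(size=(200, 20)) + 50.0

explained_3, rel_err_3 = pca_diagnostics(X, k=3)
explained_raw_1, _ = pca_diagnostics(X, k=1, center=False)
explained_centered_1, _ = pca_diagnostics(X, k=1)
```

In this sketch the uncentered rank-1 fit looks nearly perfect only because the first singular direction absorbs the column offset, which is the kind of misleading result the centering check is meant to catch. For the truncated SVD the explained ratio and the relative squared error sum to one, so either can serve as the rank-choice diagnostic.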
Review Checklist
- State the relevant domain contract fields: domain object, unit of analysis, target/KPI, success metric, and operational constraint.
- State the mathematical object: vector space, matrix, graph, stream, sequence, or probabilistic model.
- State the algorithmic objective and resource constraint.
- List assumptions needed for correctness or useful approximation.
- Identify measurable diagnostics for each assumption.
- Give a simple baseline or sanity check.
- Explain failure modes in applied data terms.
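The diagnostic and baseline items in the checklist above can be made concrete with a small sketch. The example below assumes NumPy and a Gaussian random projection, and uses the exact pairwise distances as the baseline against which the projected distances are compared. The names `pairwise_sq_dists` and `projection_distortion`, the target dimension, and the synthetic data are hypothetical choices for illustration, not a prescribed procedure.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    sq = (X ** 2).sum(axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.maximum(D, 0.0)  # clip small negative rounding errors

def projection_distortion(X, k, seed=0):
    """Hypothetical diagnostic: min and max ratio of projected to exact
    squared distances under a Gaussian random projection to k dimensions."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R = rng.normal(size=(d, k)) / np.sqrt(k)  # scaled so distances match in expectation
    Y = X @ R
    iu = np.triu_indices(X.shape[0], k=1)  # each unordered pair once
    exact = pairwise_sq_dists(X)[iu]       # baseline: no approximation
    approx = pairwise_sq_dists(Y)[iu]
    ratio = approx / exact                 # assumes no duplicate rows in X
    return ratio.min(), ratio.max()

# Synthetic high-dimensional points (an assumption for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 1000))
lo, hi = projection_distortion(X, k=400)
```

Ratios close to 1 suggest the projection preserves the geometry that a downstream distance-based method relies on. A wide spread is a signal to raise `k` or to revisit the scaling assumptions before trusting results computed in the reduced space.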