
name: dynamo-differential-geometry-analysis
description: Run downstream differential-geometry analysis on a `dynamo` vector-field `AnnData`, including velocity, acceleration, curvature, Jacobian, regulatory-network, ddhodge pseudotime, and state-graph branches. Use when adapting the `403_Differential_geometry.ipynb` tutorial, extending a conventional spliced/unspliced RNA velocity workflow into vector calculus, or choosing among `method`, `mode`, `sampling`, `formula`, `adjmethod`, or `gene_order_method` branches.

Dynamo Differential Geometry Analysis

Goal

Turn a conventional dynamo RNA-velocity dataset into reusable downstream differential-geometry outputs in PCA space. Optionally continue into Jacobian-based regulatory-network analysis, ddhodge pseudotime, kinetic heatmaps, and state graphs. Treat the zebrafish notebook as a worked example, not as the definition of this skill.

Quick Workflow

Inspect whether the user already has a downstream-ready AnnData with velocity_pca and VecFld_pca, or whether raw conventional preprocessing and velocity fitting still need to happen.
If starting from raw spliced / unspliced counts, preprocess with recipe='monocle', fit dynamics, build the embedding, and compute both low-dimensional and PCA-space velocities.
Fit dyn.vf.VectorField(..., basis='pca') before calling acceleration, curvature, Jacobian, or ddhodge on the PCA basis.
Run the stage the user actually needs: scalar/vector geometry, Jacobian and ranking, regulatory-network extraction, or ddhodge pseudotime and state-graph analysis.
Validate the concrete storage keys produced by each stage before interpreting plots or rankings.
Export ranking tables or clean the object only after all downstream calculations that depend on kinetics_heatmap, fate, or vector-field internals are finished.
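
The first workflow step can be sketched as a small check on the keys named above. This is a minimal sketch, not part of dynamo: the function name `entry_stage` and the returned stage labels are hypothetical, and only mapping-style membership tests are used so the same code works on an `AnnData` or on a stand-in object.

```python
from types import SimpleNamespace


def entry_stage(adata):
    """Return a hypothetical label for where the workflow should start."""
    if "velocity_pca" in adata.obsm and "VecFld_pca" in adata.uns:
        return "downstream-ready"   # geometry, Jacobian, and ddhodge can run
    if "velocity_pca" in adata.obsm:
        return "fit-vector-field"   # only VectorField(..., basis='pca') is missing
    if "velocity_S" in adata.layers:
        return "project-velocity"   # run cell_velocities(..., basis='pca') first
    return "bootstrap"              # preprocess and fit dynamics from raw counts


# Stand-ins with AnnData-like attribute names, used only to exercise the function.
raw_like = SimpleNamespace(obsm={}, uns={}, layers={"spliced": None, "unspliced": None})
fitted_like = SimpleNamespace(
    obsm={"X_pca": None, "velocity_pca": None},
    uns={"VecFld_pca": {}},
    layers={"velocity_S": None},
)

print(entry_stage(raw_like))     # bootstrap
print(entry_stage(fitted_like))  # downstream-ready
```

On a real object the call is simply `entry_stage(adata)`; the labels are only a convenience for routing and carry no meaning inside dynamo.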

Interface Summary

Preprocessor.preprocess_adata(adata, recipe='monocle', tkey=None, experiment_type=None) is the conventional preprocessing wrapper.
dyn.tl.reduceDimension(..., reduction_method='umap') is the default embedding path; current source also exposes trimap, diffusion_map, tsne, psl, and sude.
dyn.tl.cell_velocities(..., basis=None) is the low-dimensional and PCA-space velocity entrypoint. In the current runtime, the basis='pca' branch may need an explicit transition_genes argument on smaller subsets.
dyn.vf.VectorField(..., basis='pca', method='SparseVFC', pot_curl_div=False, **kwargs) reconstructs the PCA-space vector field used by downstream differential geometry.
dyn.vf.acceleration(..., basis='pca', method='analytical') and dyn.vf.curvature(..., basis='pca', formula=2, method='analytical') generate per-cell and gene-space geometry outputs.
dyn.vf.jacobian(..., sampling=None, sample_ncells=1000, basis='pca', method='analytical') computes cell-wise Jacobians; dyn.vf.rank_jacobian_genes(..., mode=...) and dyn.vf.build_network_per_cluster(...) turn them into grouped rankings and edge lists.
dyn.ext.ddhodge(..., basis='pca', adjmethod='graphize_vecfld', sampling_method='velocity') creates vector-field pseudotime and potential, while dyn.pl.kinetic_heatmap(..., mode='vector_field' | 'lap' | 'pseudotime') and dyn.pd.state_graph(..., method='vf' | 'markov' | 'naive') consume those outputs.
dyn.export_rank_xlsx(..., rank_prefix='rank') exports ranking tables from .uns, and dyn.cleanup(..., del_prediction=False, del_2nd_moments=False) strips heavyweight internals before save.

Read references/source-grounding.md before documenting narrower branch behavior than the current source supports.
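
Because the signatures above describe the current source and may drift, it can help to compare intended keyword arguments against the installed function before calling it. The sketch below uses only the standard-library `inspect` module; the helper name `unsupported_kwargs` and the stand-in function `curvature_like` are hypothetical and exist only for illustration.

```python
import inspect


def unsupported_kwargs(func, **kwargs):
    """List keyword names that func's signature does not declare.

    Returns an empty list when func accepts **kwargs, because the signature
    alone cannot tell whether such a function uses a given name.
    """
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    accepted = {name for name, p in params.items() if p.kind in keyword_kinds}
    return sorted(name for name in kwargs if name not in accepted)


# Hypothetical stand-in that mirrors the keyword names listed for curvature above.
def curvature_like(adata, basis="pca", formula=2, method="analytical"):
    return None


print(unsupported_kwargs(curvature_like, basis="pca", formula=2))   # []
print(unsupported_kwargs(curvature_like, basis="pca", formulas=2))  # ['formulas']
```

In a runtime with dynamo installed, the same helper could be pointed at the real callable, for example `unsupported_kwargs(dyn.vf.curvature, basis="pca", formula=2)`. Functions that forward `**kwargs`, as the `VectorField` entry above does, always return an empty list here, so the check is a quick screen rather than a guarantee.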

Stage Selection

Use the bootstrap stage when the user starts from raw conventional spliced / unspliced data instead of an already fitted vector field.
Use the geometry-ranking stage when the user wants velocity_S, acceleration, or curvature rankings by cell group.
Use the Jacobian-network stage when the user wants regulator and effector ranking, interaction ranking, or a cluster-specific edge list.
Use the ddhodge-state stage when the user wants vector-field pseudotime, kinetic heatmaps, or cell-state transition graphs.
Keep basis='pca' for vector-field reconstruction, Jacobian, divergence-like ranking, ddhodge, and notebook-style state graphs. Use basis='umap' mainly for display or upstream velocity plots.
Keep method='SparseVFC', curvature(..., formula=2), jacobian(..., method='analytical'), ddhodge(..., adjmethod='graphize_vecfld'), and state_graph(..., method='vf') as the default reusable path unless the user explicitly asks for a different branch.

Read references/stage-selection.md before choosing non-default recipe, method, formula, sampling, mode, adjmethod, or gene_order_method branches.
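
One way to keep the default path explicit while still allowing a requested override is a small lookup table. This is a sketch under stated assumptions: the table name `DEFAULT_BRANCHES` and the helper `resolve_kwargs` are hypothetical, and the values are copied from the defaults listed above rather than read from dynamo.

```python
# Default branch per call, copied from the stage-selection defaults above.
DEFAULT_BRANCHES = {
    "VectorField": {"basis": "pca", "method": "SparseVFC"},
    "curvature": {"basis": "pca", "formula": 2},
    "jacobian": {"basis": "pca", "method": "analytical"},
    "ddhodge": {"basis": "pca", "adjmethod": "graphize_vecfld"},
    "state_graph": {"basis": "pca", "method": "vf"},
}


def resolve_kwargs(func_name, **overrides):
    """Merge explicit user overrides on top of the default branch for one call."""
    kwargs = dict(DEFAULT_BRANCHES[func_name])  # copy so the table is never mutated
    kwargs.update(overrides)
    return kwargs


print(resolve_kwargs("curvature"))
print(resolve_kwargs("ddhodge", adjmethod="naive"))
```

A call site would then read `dyn.vf.curvature(adata, **resolve_kwargs("curvature"))`, which makes any departure from the default branch visible as a named override.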

Input Contract

Expect an AnnData with conventional spliced / unspliced layers if the user wants the full bootstrap path.
Expect adata.obsm['X_pca'], adata.var['use_for_pca'], and adata.layers['velocity_S'] before PCA-space vector-field analysis.
Expect adata.obsm['velocity_pca'] and adata.uns['VecFld_pca'] before calling acceleration, curvature, Jacobian, ddhodge, or state_graph(..., method='vf', basis='pca').
Expect a meaningful grouping column such as adata.obs['Cell_type'] before using ranking helpers or build_network_per_cluster(...).
Expect adata.uns['PCs'] or adata.varm['PCs'] before top_pca_genes(...) or any PCA-basis inverse transform.
Treat notebook-specific zebrafish cell-type labels and genes such as tfec and pnp4a as worked-example defaults, not hard requirements.

If the user only wants upstream preprocessing or only wants pseudotime-derived velocity without conventional kinetics, route to a more appropriate skill instead of forcing this downstream analysis path.
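
The contract above can be checked mechanically before any heavy computation starts. The following is a minimal sketch: the stage labels, `REQUIRED_INPUTS`, `missing_inputs`, and `has_pcs` are hypothetical names, and the keys are the ones listed in this section. Membership tests behave the same on `AnnData` slots and on the plain dictionaries used in the stand-in.

```python
from types import SimpleNamespace

# (slot, key) pairs copied from the contract above; stage labels are hypothetical.
REQUIRED_INPUTS = {
    "fit-vector-field": [("obsm", "X_pca"), ("var", "use_for_pca"), ("layers", "velocity_S")],
    "downstream": [("obsm", "velocity_pca"), ("uns", "VecFld_pca")],
}


def missing_inputs(adata, stage, group=None):
    """Return readable names of required entries that are absent for a stage."""
    missing = [
        f"adata.{slot}[{key!r}]"
        for slot, key in REQUIRED_INPUTS[stage]
        if key not in getattr(adata, slot)
    ]
    if group is not None and group not in adata.obs:
        missing.append(f"adata.obs[{group!r}]")
    return missing


def has_pcs(adata):
    """True when PCA loadings are stored in either location named above."""
    return "PCs" in adata.uns or "PCs" in adata.varm


# Stand-in object for illustration; a real AnnData exposes the same attribute names.
demo = SimpleNamespace(
    obs={"Cell_type": None},
    obsm={"X_pca": None, "velocity_pca": None},
    uns={},
    varm={"PCs": None},
)
print(missing_inputs(demo, "downstream", group="Cell_type"))  # ["adata.uns['VecFld_pca']"]
print(has_pcs(demo))                                          # True
```

An empty list from `missing_inputs(adata, "downstream", group="Cell_type")` means the object satisfies the listed prerequisites; it does not verify that the stored values are themselves valid.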

Minimal Execution Patterns

For the default bootstrap from raw conventional zebrafish-style data:

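import dynamo as dyn

adata = dyn.sample_data.zebrafish()  # raw spliced / unspliced counts, not a fitted object
adata.obs_names_make_unique()  # the worked example has non-unique observation names

pre = dyn.pp.Preprocessor(cell_cycle_score_enable=True)
pre.preprocess_adata(adata, recipe="monocle")  # conventional preprocessing wrapper

dyn.tl.dynamics(adata, cores=1)  # fit gene-wise kinetics
dyn.tl.reduceDimension(adata)  # default embedding path (UMAP)
dyn.tl.cell_velocities(adata)  # low-dimensional velocities on the default basis
dyn.tl.cell_velocities(
    adata,
    basis="pca",
    transition_genes=adata.var.use_for_pca.values,  # explicit genes avoided a PCA-projection failure on small subsets
)

dyn.vf.VectorField(adata, basis="pca", M=50, cores=1)  # PCA-space field used by every later stage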

For geometry and grouped ranking:

dyn.vf.rank_velocity_genes(adata, groups="Cell_type", vkey="velocity_S")  # per-group velocity ranking

dyn.vf.acceleration(adata, basis="pca")  # requires the fitted PCA vector field
dyn.vf.rank_acceleration_genes(adata, groups="Cell_type", akey="acceleration")

dyn.vf.curvature(adata, basis="pca", formula=2)  # formula=2 is the default reusable branch
dyn.vf.rank_curvature_genes(adata, groups="Cell_type", ckey="curvature")

For Jacobian ranking and cluster-specific network extraction:

dyn.pp.top_pca_genes(adata, n_top_genes=100)
genes = adata.var_names[adata.var["top_pca_genes"]][:50].tolist()

dyn.vf.jacobian(
    adata,
    regulators=genes,
    effectors=genes,
    sampling="trn",
    sample_ncells=200,
    basis="pca",
)

full_reg = dyn.vf.rank_jacobian_genes(
    adata,
    groups="Cell_type",
    mode="full_reg",
    abs=True,
    return_df=True,
)

reg_rank = dyn.vf.rank_jacobian_genes(
    adata,
    groups="Cell_type",
    mode="reg",
    abs=True,
    output_values=True,
    return_df=True,
)

edges = dyn.vf.build_network_per_cluster(
    adata,
    cluster="Cell_type",
    cluster_names=["Unknown"],  # zebrafish worked-example label; substitute a group present in adata.obs["Cell_type"]
    full_reg_rank=full_reg,
    genes=genes[:20],
    n_top_genes=10,
    abs=True,
)
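
The reason the call above narrows both the gene list and the number of cells can be seen from a rough size estimate. This is a back-of-the-envelope sketch, assuming one regulator-by-effector matrix is kept per sampled cell and that values are stored as 8-byte floats; both are assumptions, and the "unrestricted" dataset dimensions are hypothetical.

```python
def jacobian_entries(n_cells, n_regulators, n_effectors):
    """Number of stored values if one regulator-by-effector matrix is kept per cell."""
    return n_cells * n_regulators * n_effectors


def as_gib(n_values, bytes_per_value=8):
    """Size in GiB, assuming 8-byte floats (an assumption about the storage dtype)."""
    return n_values * bytes_per_value / 2**30


narrowed = jacobian_entries(200, 50, 50)           # sample_ncells=200, 50 genes on each side
unrestricted = jacobian_entries(4000, 2000, 2000)  # hypothetical dataset without narrowing

print(narrowed, round(as_gib(narrowed), 4))
print(unrestricted, round(as_gib(unrestricted), 1))
```

Under these assumptions the narrowed call stores 500,000 values, while the unrestricted case would need on the order of a hundred GiB, which is why gene selection and cell sampling come first.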

For ddhodge pseudotime, kinetic heatmaps, and state graphs:

dyn.ext.ddhodge(adata, basis="pca", sampling_method="velocity")  # writes adata.obs["pca_ddhodge_potential"], used as tkey below

transition_genes = adata.var_names[adata.var["top_pca_genes"]][:20].tolist()

heat = dyn.pl.kinetic_heatmap(
    adata,
    genes=transition_genes,
    basis="pca",
    mode="pseudotime",
    tkey="pca_ddhodge_potential",
    gene_order_method="maximum",
    save_show_or_return="return",
)

dyn.pd.state_graph(
    adata,
    group="Cell_type",
    basis="pca",
    method="vf",
    sample_num=30,
)

For export and cleanup after ranking:

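import os
os.makedirs("result", exist_ok=True)  # create the output directory in case the export does not
dyn.export_rank_xlsx(adata, path="result/rank_info.xlsx", rank_prefix="rank")
dyn.cleanup(adata, del_prediction=False, del_2nd_moments=False)  # run last: removes adata.uns["kinetics_heatmap"]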

Validation

After bootstrap preprocessing, check these items:

adata.obsm["X_pca"] exists
adata.obsm["X_umap"] exists if you ran the default embedding path
adata.var["use_for_pca"] and adata.var["use_for_dynamics"] exist
adata.layers["velocity_S"] exists after cell_velocities(...)

After PCA velocity and vector-field fitting, check these items:

adata.obsm["velocity_pca"] exists
adata.uns["VecFld_pca"] exists
adata.uns["VecFld_pca"] includes X, Y, and vector-field parameters

After geometry ranking, check these items:

adata.layers["acceleration"] exists
adata.layers["curvature"] exists
adata.obs["acceleration_pca"] and adata.obs["curvature_pca"] exist
ranking tables were written into .uns with rank or rank_abs prefixes

After Jacobian and network analysis, check these items:

adata.uns["jacobian_pca"] exists
adata.uns["jacobian_pca"] includes jacobian_gene, cell_idx, regulators, and effectors
full_reg contains one table per requested group
edges[group_name] is a DataFrame with regulator, target, and weight

After ddhodge, heatmap, and state graph, check these items:

adata.obsp["pca_ddhodge"] exists
adata.obs["pca_ddhodge_potential"] exists
adata.uns["kinetics_heatmap"] exists after kinetic_heatmap(...)
adata.uns["Cell_type_graph"] exists after state_graph(...)
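
The key names in this checklist follow a visible pattern for `basis="pca"` and `group="Cell_type"`. The sketch below generalizes that pattern to other bases and grouping columns, which is an assumption inferred from the keys listed here rather than a documented guarantee; the helper name `expected_state_keys` is hypothetical.

```python
def expected_state_keys(basis="pca", group="Cell_type"):
    """Build the storage keys to look for after ddhodge and state_graph.

    The naming pattern is inferred from the pca / Cell_type keys listed above
    and is an assumption for any other basis or grouping column.
    """
    return {
        "obsp": f"{basis}_ddhodge",
        "obs": f"{basis}_ddhodge_potential",
        "uns": f"{group}_graph",
    }


keys = expected_state_keys()
print(keys["obsp"])  # pca_ddhodge
print(keys["obs"])   # pca_ddhodge_potential
print(keys["uns"])   # Cell_type_graph
```

The `obs` entry is also the value passed as `tkey` to `kinetic_heatmap(..., mode="pseudotime")`, so deriving it once avoids a mismatch between the two calls.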

After cleanup and export, check these items:

the Excel file exists at the requested path
cleanup(...) removed kinetics_heatmap if it existed
cleanup(...) did not remove outputs you still need for downstream interpretation
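
The last two checks can be made concrete by comparing the `.uns` keys before and after cleanup. This is a minimal sketch with a hypothetical helper name, `uns_changes`; it only inspects `.uns`, so anything cleanup removes from other slots is outside its scope. The key lists in the demonstration are illustrative, not recorded output.

```python
def uns_changes(before_keys, after_keys, still_needed=()):
    """Report which .uns keys disappeared and which of those were still wanted."""
    removed = set(before_keys) - set(after_keys)
    return {
        "removed": sorted(removed),
        "lost_but_needed": sorted(removed & set(still_needed)),
    }


# Illustrative key lists; on a real object use list(adata.uns.keys()) around cleanup.
before = ["VecFld_pca", "jacobian_pca", "kinetics_heatmap", "rank_velocity_S"]
after = ["VecFld_pca", "jacobian_pca", "rank_velocity_S"]

report = uns_changes(before, after, still_needed=["jacobian_pca", "rank_velocity_S"])
print(report["removed"])          # ['kinetics_heatmap']
print(report["lost_but_needed"])  # []
```

In practice the snapshot is taken with `before = list(adata.uns.keys())` immediately ahead of `dyn.cleanup(...)`, and a non-empty `lost_but_needed` list signals that cleanup ran too early.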

Constraints

Do not assume dyn.sample_data.zebrafish() is already processed in the current runtime. Reviewer execution saw raw counts with spliced and unspliced layers but no downstream embedding or vector-field outputs.
Call adata.obs_names_make_unique() on the zebrafish worked example before concatenation or heavy preprocessing. Reviewer execution saw non-unique observation names.
Do not assume dyn.tl.cell_velocities(adata, basis='pca') works on small subsets without an explicit transition_genes argument. Reviewer execution needed transition_genes=adata.var.use_for_pca.values to avoid a PCA-projection failure.
Do not treat jacobian(...) as a whole-transcriptome default on large datasets. Narrow with top_pca_genes(...), explicit regulators and effectors, or cell sampling first.
Do not imply that rank_divergence_genes(...) computes geometric divergence in the usual trace sense. Current source ranks diagonal Jacobian elements after jacobian(...).
ddhodge(..., adjmethod='naive') depends on a preexisting transition matrix. Keep adjmethod='graphize_vecfld' unless the user explicitly wants the alternate branch and already has the needed adjacency.
kinetic_heatmap(...) stores results under adata.uns["kinetics_heatmap"], and cleanup(...) removes that key.
VectorField(..., method='dynode') is a real branch but requires an additional backend. Do not recommend it by default.
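
The distinction drawn in the rank_divergence_genes(...) constraint above can be illustrated on a single small matrix. Divergence in the usual sense is the trace of the Jacobian, one scalar per cell, whereas ordering the diagonal entries gives a per-gene ranking. The sketch below uses plain Python lists with hypothetical gene names and values; it does not reproduce dynamo's grouping or averaging.

```python
def divergence(jacobian):
    """Trace of a square Jacobian: the sum of its diagonal entries."""
    return sum(jacobian[i][i] for i in range(len(jacobian)))


def rank_diagonal(jacobian, genes):
    """Order genes by their own diagonal entry, largest first."""
    pairs = [(genes[i], jacobian[i][i]) for i in range(len(genes))]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical 3-gene Jacobian for one cell; entry [i][j] is d f_i / d x_j.
genes = ["g1", "g2", "g3"]
J = [
    [0.5, 0.1, 0.0],
    [0.2, -0.75, 0.3],
    [0.0, 0.4, 0.25],
]

print(divergence(J))            # 0.0
print(rank_diagonal(J, genes))  # [('g1', 0.5), ('g3', 0.25), ('g2', -0.75)]
```

Here the trace is zero even though the diagonal entries differ widely, so a diagonal-based ranking carries information that the scalar divergence does not, and the two should not be described interchangeably.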

Resource Map

Read references/stage-selection.md when choosing among bootstrap, geometry ranking, Jacobian/network, and ddhodge/state-graph stages.
Read references/jacobian-and-network-analysis.md when the user wants Jacobian ranking modes, divergence-like ranking, or cluster-specific regulatory edges.
Read references/pseudotime-and-state-graph.md when the user wants ddhodge, kinetic_heatmap, or state_graph.
Read references/source-grounding.md for inspected signatures, source-level branch evidence, and reviewer-run empirical execution notes.
Read references/source-notebook-map.md to trace 403_Differential_geometry.ipynb into this reusable skill layout.
Read references/compatibility.md when notebook prose and current runtime behavior diverge.
Use assets/acceptance.json for the bounded smoke path used by local acceptance.