name: create-module
description: Author a new Cartography intel module end-to-end (entry point, sync GET/TRANSFORM/LOAD/CLEANUP, declarative data model, integration test, schema docs). Use when the user asks to add a new provider, integration, intel module, or service ingestion to Cartography (e.g. "add a new module for service X", "integrate ServiceY", "create a sync for Z API").
create-module
Build a brand new Cartography intel module from scratch using the modern declarative data model. The module must follow the standard sync pattern (get -> transform -> load -> cleanup) and be exercised by an integration test.
Critical rules
- Use the data model, not handwritten Cypher. Call load()/load_matchlinks() from cartography.client.core.tx, and GraphJob.from_node_schema() for cleanup.
- Sub-resource relationships always point to a tenant-like node (AWSAccount, AzureSubscription, GCPProject, GitHubOrganization, your <Service>Tenant), never to an infrastructure parent.
- Required fields use direct dict access, optional fields use .get() with None default. Do not silently swallow exceptions in get().
- Only standard schema fields: any custom field added to a CartographyNodeSchema/CartographyRelSchema subclass is ignored. See the add-node-type and add-relationship skills.
- Integration tests must call sync(), not individual load() calls. Mock only external boundaries (API clients, credentials).
- All commits use git commit -s (DCO).
Instructions
Step 1 — Lay out the module
cartography/intel/your_service/
├── __init__.py # Entry point: start_your_service_ingestion()
└── users.py # Domain sync (or devices.py, projects.py, etc.)
cartography/models/your_service/
├── tenant.py # Tenant/account schema
└── user.py # Domain schemas
tests/data/your_service/
└── users.py # Mock API payloads
tests/integration/cartography/intel/your_service/
└── test_users.py # End-to-end sync test
The entry point (__init__.py) reads from Config, validates required credentials, builds common_job_parameters, and dispatches to per-domain sync() functions. See references/sync-pattern.md for a copy-paste template.
Step 2 — Wire CLI + Config
In cartography/cli.py:
- Add PANEL_YOUR_SERVICE = "Your Service Options" and register it in MODULE_PANELS.
- Add Typer options inside _build_app().run() (use Annotated[Optional[str], typer.Option(...)] with rich_help_panel=PANEL_YOUR_SERVICE).
- Resolve secrets from os.environ and pass them into cartography.config.Config(...).
In cartography/config.py, extend Config.__init__ with the new fields. Then in your module entry point, validate them and short-circuit with logger.info("... not configured - skipping module") when missing.
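The validate-then-skip behavior of the entry point can be sketched as follows. This is a minimal illustration, not the template from references/sync-pattern.md: the Config dataclass is a stand-in for cartography.config.Config, the field names your_service_api_key and your_service_tenant_id are hypothetical, sync_users is injected only so the sketch runs on its own, and the boolean return value exists purely to make the skip observable here.

```python
import logging
import os
from dataclasses import dataclass
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


@dataclass
class Config:
    """Stand-in for cartography.config.Config (hypothetical field names)."""
    update_tag: int
    your_service_api_key: Optional[str] = None
    your_service_tenant_id: Optional[str] = None


def resolve_secret(env_var_name: Optional[str]) -> Optional[str]:
    """Read a secret from the named environment variable, if one was given."""
    if not env_var_name:
        return None
    return os.environ.get(env_var_name)


def start_your_service_ingestion(
    neo4j_session: Any,
    config: Config,
    sync_users: Callable[..., None],  # injected here only for the sketch
) -> bool:
    """Validate credentials, build job parameters, dispatch to the domain sync."""
    if not config.your_service_api_key or not config.your_service_tenant_id:
        logger.info("YourService credentials not configured - skipping module")
        return False

    common_job_parameters = {
        "UPDATE_TAG": config.update_tag,
        "TENANT_ID": config.your_service_tenant_id,
    }
    sync_users(
        neo4j_session,
        config.your_service_api_key,
        config.your_service_tenant_id,
        config.update_tag,
        common_job_parameters,
    )
    return True
```

In a real module the entry point would import its per-domain sync() functions directly instead of receiving them as an argument.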
Step 3 — Register the module in cartography/sync.py
Add one entry to TOP_LEVEL_MODULES using the lazy wrapper. Do not add a top-level import cartography.intel.your_service to sync.py — that defeats lazy SDK loading and reintroduces the slow-startup problem.
TOP_LEVEL_MODULES = OrderedDict({
    ...
    "your_service": _LazyStage("cartography.intel.your_service", "start_your_service_ingestion"),
    ...
    # `analysis` must remain last
    "analysis": _LazyStage("cartography.intel.analysis", "run"),
})
Pick a sensible position relative to neighbors (cloud providers grouped together, etc.). The provider's heavy SDK imports stay where they are — they only fire when this stage is selected and run.
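To see why a top-level import would defeat the lazy wrapper, here is a rough sketch of the general deferred-import technique. The LazyStage class below is a hypothetical illustration and is not Cartography's _LazyStage implementation; it uses the standard library's json module as the import target so the example runs anywhere.

```python
import importlib
from typing import Any, Callable


class LazyStage:
    """Hypothetical lazy wrapper: the target module is imported on first call."""

    def __init__(self, module_name: str, func_name: str) -> None:
        # Only strings are stored, so constructing a stage imports nothing.
        self.module_name = module_name
        self.func_name = func_name

    def _resolve(self) -> Callable[..., Any]:
        # The module's top-level imports (heavy SDKs included) run only here.
        module = importlib.import_module(self.module_name)
        return getattr(module, self.func_name)

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self._resolve()(*args, **kwargs)


# Building the registry is cheap; the import happens when the stage is run.
stages = {"demo": LazyStage("json", "dumps")}
result = stages["demo"]({"ok": True})
```

An eager `import cartography.intel.your_service` in sync.py would trigger that module's imports at startup regardless of which stages are selected, which is the cost the wrapper is meant to avoid.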
Step 4 — Implement the sync pattern
For each domain (users, devices, projects, ...):
@timeit
def sync(
    neo4j_session: neo4j.Session,
    api_key: str,
    tenant_id: str,
    update_tag: int,
    common_job_parameters: dict[str, Any],
) -> None:
    raw = get(api_key, tenant_id)  # 1. GET: dumb, raises on failure
    data = transform(raw)  # 2. TRANSFORM: shape for ingest
    load_users(neo4j_session, data, tenant_id, update_tag)  # 3. LOAD: data model
    cleanup(neo4j_session, common_job_parameters)  # 4. CLEANUP: GraphJob
get() should be minimal: set timeouts, call response.raise_for_status(), and let errors propagate. AWS get-functions wrap with @aws_handle_regions. See references/sync-pattern.md for the long-form template, error-handling rules, and transform examples.
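The required-versus-optional field rule can be illustrated with a small transform(). This is a sketch under assumptions: the field names (email, display_name, last_login) are invented for the example and do not come from any real API.

```python
from typing import Any


def transform(raw_users: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Shape raw API records for load(); field names are illustrative only."""
    result = []
    for user in raw_users:
        result.append(
            {
                # Required fields: direct access, so a missing key raises KeyError.
                "id": user["id"],
                "email": user["email"],
                # Optional fields: .get() yields None when the key is absent.
                "display_name": user.get("display_name"),
                "last_login": user.get("last_login"),
            }
        )
    return result


sample = [{"id": "u1", "email": "a@example.com", "display_name": "Ada"}]
transformed = transform(sample)
```

Letting the KeyError propagate for a missing required field surfaces bad upstream data immediately instead of loading incomplete nodes.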
Step 5 — Define the data model
Create dataclasses in cartography/models/your_service/. Required for every node:
@dataclass(frozen=True)
class YourServiceUserNodeProperties(CartographyNodeProperties):
    id: PropertyRef = PropertyRef("id")  # REQUIRED
    lastupdated: PropertyRef = PropertyRef("lastupdated", set_in_kwargs=True)  # REQUIRED
    # business properties...
    tenant_id: PropertyRef = PropertyRef("TENANT_ID", set_in_kwargs=True)
The schema picks a label, properties, and the mandatory sub_resource_relationship to your tenant-like node:
@dataclass(frozen=True)
class YourServiceUserSchema(CartographyNodeSchema):
    label: str = "YourServiceUser"
    properties: YourServiceUserNodeProperties = YourServiceUserNodeProperties()
    sub_resource_relationship: YourServiceTenantToUserRel = YourServiceTenantToUserRel()
    other_relationships: OtherRelationships = OtherRelationships([
        YourServiceUserToHumanRel(),
    ])
For advanced node configurations (extra labels, conditional labels, scoped cleanup, one-to-many) see the add-node-type skill. For relationships, MatchLinks, and multi-module patterns see the add-relationship skill. See references/data-model.md for the full reference.
Step 6 — Load + cleanup
def load_users(neo4j_session, data, tenant_id, update_tag):
    load(neo4j_session, YourServiceTenantSchema(), [{"id": tenant_id}], lastupdated=update_tag)
    load(neo4j_session, YourServiceUserSchema(), data, lastupdated=update_tag, TENANT_ID=tenant_id)

def cleanup(neo4j_session, common_job_parameters):
    GraphJob.from_node_schema(YourServiceUserSchema(), common_job_parameters).run(neo4j_session)
If you hand-write a Cypher write query during prototyping, use run_write_query() (managed transaction + retries), never neo4j_session.run().
Step 7 — Integration test
In tests/integration/cartography/intel/your_service/test_users.py, patch only get() and call sync() end-to-end. Assert outcomes (nodes + relationships) using tests.integration.util.check_nodes / check_rels. Do not assert on mock call counts or internal parameters. See references/testing.md for a full template and the test boundary policy.
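The test boundary described above (patch only get(), run sync(), assert on outcomes) can be sketched without a database. Everything here is a stand-in: FakeUsersModule plays the role of a users module, a Python set plays the role of the Neo4j graph, and the membership assertions play the role of check_nodes / check_rels. See references/testing.md for the real template.

```python
from typing import Any
from unittest.mock import patch


class FakeUsersModule:
    """Hypothetical stand-in for a cartography.intel.your_service.users module."""

    @staticmethod
    def get(api_key: str, tenant_id: str) -> list[dict[str, Any]]:
        raise RuntimeError("real API call; must be patched in tests")

    @staticmethod
    def sync(graph: set, api_key: str, tenant_id: str) -> None:
        # A set of tuples stands in for Neo4j so the sketch runs anywhere.
        for user in FakeUsersModule.get(api_key, tenant_id):
            graph.add(("YourServiceUser", user["id"]))
            graph.add((tenant_id, "RESOURCE", user["id"]))


MOCK_USERS = [{"id": "u1"}, {"id": "u2"}]


def run_sync_test() -> set:
    graph: set = set()
    # Patch only the external boundary, then run the whole sync.
    with patch.object(FakeUsersModule, "get", return_value=MOCK_USERS):
        FakeUsersModule.sync(graph, "fake-key", "tenant-1")
    # Assert on outcomes (nodes and relationships), not on mock calls.
    assert {item for item in graph if item[0] == "YourServiceUser"} == {
        ("YourServiceUser", "u1"),
        ("YourServiceUser", "u2"),
    }
    assert ("tenant-1", "RESOURCE", "u1") in graph
    return graph
```

Because the assertions look only at what ended up in the graph, the test keeps passing if sync() is refactored internally, as long as the resulting nodes and relationships stay the same.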
Step 8 — Schema documentation
Add a page at docs/root/modules/your_service/schema.md. Use ### for node names and #### for the "Relationships" subsection, and put indexed/primary fields in bold. If the node has a semantic label, add the standard ontology mapping blockquote (see the enrich-ontology skill).
Step 9 — Optional: analysis jobs
If the module needs post-ingestion enrichment (internet exposure, permission inheritance, cross-resource linking), call run_analysis_job() / run_scoped_analysis_job() at the end of the entry point. See the analysis-jobs skill.
Step 10 — Pre-submission checks
make lint
# integration test for the module:
pytest tests/integration/cartography/intel/your_service/ -x
Sign every commit: git commit -s -m "...". Update the PR description to match .github/pull_request_template.md.
Final checklist
- Entry point validates config and skips cleanly when unconfigured
- CLI panel + Config fields wired, secrets resolved from env vars
- Module registered in cartography/sync.py:TOP_LEVEL_MODULES via _LazyStage, with no top-level import cartography.intel.<service> added to sync.py
- Sync follows GET -> TRANSFORM -> LOAD -> CLEANUP
- All schemas use only standard fields (label, properties, sub_resource_relationship, other_relationships, extra_node_labels, scoped_cleanup)
- Sub-resource relationship targets a tenant-like node
- Required fields use data["x"], optional use data.get("x") with None default
- extra_index=True set on frequently queried fields
- Integration test exercises sync(), asserts nodes + rels with check_nodes/check_rels
- Schema doc added under docs/root/modules/your_service/schema.md
- make lint clean, git commit -s used
Common issues
See the troubleshooting skill for ModuleNotFoundError, PropertyRef validation failed, missing relationships, cleanup misbehavior, and date-handling pitfalls.
References (load on demand)
- references/sync-pattern.md: full templates for __init__.py, sync(), get(), transform(), error-handling rules.
- references/data-model.md: node properties, schema, sub-resource relationships, loading, ECS example.
- references/testing.md: integration test template, check_nodes/check_rels, mocking policy, integration test boundary.
- references/coding-conventions.md: error handling, type hints, logging levels and format, deprecation conventions.