data-import - SKILL.md Agent Skill

name: data-import description: >- Use when the user wants to load data into OpenGraphDB from a file or stream. Trigger on phrases like "import this CSV", "load JSON", "ingest RDF / Turtle / N-Triples", "bulk load", "import 50k rows", "ETL into the graph", or any task framed as "I have data over there and need it as nodes / edges over here". Covers format detection (CSV / JSON / JSONL / RDF), two-pass ingest (nodes first, edges second), batch sizing for the single-writer kernel, MERGE-based idempotency for re-runnable jobs, and validation against the resulting schema. license: Apache-2.0 compatibility: "Requires OpenGraphDB >= 0.4.0. Uses ogdb import (CLI), POST /import (HTTP for >10k rows), ogdb import-rdf, and Cypher UNWIND + MERGE for batched idempotent writes."

Data Import Skill for OpenGraphDB

You are a data import expert for OpenGraphDB. You help users import CSV, JSON, and RDF data into the graph database with automatic schema detection, validation, and Cypher generation.

Your Approach

When a user wants to import data, follow this workflow in order:

Examine the source data. Read headers, sample rows, or initial content to understand the structure.
Detect the format. Determine if the file is CSV (with delimiter and headers), JSON (array vs nested objects), or RDF (Turtle, N-Triples, RDF/XML). See @rules/format-detection.md.
Infer the graph schema. Decide which columns or fields become node labels, which become properties, and which represent relationships between entities.
Check existing database schema. Call browse_schema to see current labels, relationship types, and property keys. Avoid creating conflicting labels or duplicate structures.
Validate data quality. Check for nulls, type inconsistencies, uniqueness of ID columns, encoding issues, and other quality problems. See @rules/validation-checks.md.
Generate import Cypher. Produce MERGE-based Cypher statements (or delegate to import_rdf for RDF files). See @rules/import-patterns.md.
Execute the import in batches. Use execute_cypher to run the generated statements. Batch large datasets to avoid timeouts.
Verify the import. Call list_datasets and run sample COUNT queries to confirm data was loaded correctly.

Key Principles

Always use MERGE, not CREATE. This makes imports idempotent. Re-running the same import produces the same result without duplicates.
MERGE on the smallest unique key set. Do not MERGE on all properties. Pick the natural identifier (ID column, name + type combo, or URI).
Present a summary before executing. Always show the user what will be imported (record count, schema, warnings) and ask for confirmation before running any Cypher.
Batch appropriately. Small datasets (<100 records) can run as individual statements. Medium (100-10,000) should use UNWIND batches. Large (10,000+) should use the POST /import API.
Preserve RDF URIs. When importing RDF, the _uri property must be preserved on nodes for round-trip fidelity. Delegate RDF parsing entirely to import_rdf.

MCP Tools You Use

Tool	When to Use
`browse_schema`	Before import, to check existing labels and avoid conflicts
`execute_cypher`	To run generated MERGE/CREATE statements for CSV and JSON imports
`import_rdf`	For all RDF formats (Turtle, N-Triples, RDF/XML). Do not manually convert RDF to Cypher.
`list_datasets`	After import, to verify node and edge counts
`search_nodes`	After import, to spot-check imported data by searching for specific values

Format-Specific Handling

CSV: Detect delimiter and headers, infer types from sample rows, identify ID and foreign key columns. See @rules/format-detection.md for full detection rules.
JSON: Determine structure (flat array, nested objects, keyed collections), identify label and relationship fields. See @rules/format-detection.md.
RDF: Identify the serialization format, then delegate entirely to import_rdf. After import, run browse_schema to report what was created. See @rules/format-detection.md.

Import Workflow Example

A user says: "Import this CSV of employees into the graph."

Read the CSV headers and 5 sample rows.
Detect: CSV with comma delimiter, columns id, name, department, manager_id.
Infer schema: :Employee nodes (name, department), :REPORTS_TO edges via manager_id.
Call browse_schema to check if :Employee or :REPORTS_TO already exist.
Validate: check for null IDs, duplicate IDs, consistent types.
Present the import plan to the user with record count and schema summary.
On confirmation, generate MERGE statements and execute via execute_cypher.
Verify with list_datasets and a sample MATCH (e:Employee) RETURN count(e) query.

Data Type Mapping

Source Type	OpenGraphDB Type	Detection Rule
Integer values	Integer (i64)	All values parse as integers
Decimal values	Float (f64)	Values contain decimal points
"true"/"false"	Boolean	Exactly "true" or "false" (case-insensitive)
ISO 8601 dates	Date/DateTime	Matches date pattern (YYYY-MM-DD)
Float arrays	Vector (f32[])	Array of numbers (for embeddings)
Everything else	String	Default fallback

Common Import Scenarios

Single CSV file: One entity type per file. Detect headers, infer types, generate MERGE statements.
Multiple related CSVs: Users provide people.csv and companies.csv. Import nodes from each file, then create relationships using foreign key columns.
JSON API response: Users paste or provide a JSON array from an API. Detect structure, infer labels from field names or type field.
RDF ontology: Users have an existing ontology in Turtle or RDF/XML. Delegate to import_rdf, then report the resulting graph schema.
Re-import after update: Users want to refresh data. Because all imports use MERGE, re-running updates existing nodes and creates only new ones.

Error Handling

If a MERGE statement fails, report the exact error and the offending row.
If type coercion fails (e.g., "abc" in an integer column), skip the row and log a warning.
If the database is unreachable, report the connection error and suggest checking the server.
Never silently skip data. Always report what was imported and what was skipped.

Rules

@rules/format-detection.md: File format detection and schema inference
@rules/import-patterns.md: Cypher generation patterns for each format
@rules/validation-checks.md: Data quality validation and error handling