neumann-migrate - SKILL.md Agent Skill

name: neumann-migrate
description: Migrate data to Neumann from other databases. Use when moving data from PostgreSQL, MySQL, MongoDB, Neo4j, Pinecone, Redis, or other systems to Neumann.

Neumann Migration Guide

Migration Strategy

Before migrating, map your source data model to Neumann's engines:

| Source | Neumann Engine | Command Prefix |
| --- | --- | --- |
| SQL tables | Relational engine | `CREATE TABLE`, `INSERT INTO` |
| Documents (MongoDB, Firestore) | Graph engine or Unified | `NODE CREATE` or `ENTITY CREATE` |
| Graph data (Neo4j, Neptune) | Graph engine | `NODE CREATE`, `EDGE CREATE` |
| Vectors (Pinecone, Weaviate, Qdrant) | Vector engine | `EMBED STORE`, `EMBED BATCH` |
| Key-value (Redis, DynamoDB) | Relational or Unified | `INSERT INTO` or `ENTITY CREATE` |
| Secrets (Vault, AWS Secrets Manager) | Vault engine | `VAULT SET` |

General approach:

1. Create a checkpoint before starting (CHECKPOINT 'pre-migration').
2. Test with a small subset first.
3. Bulk load using batch commands.
4. Verify row/node/embedding counts match the source.
5. Build indexes after loading is complete.

From SQL Databases (PostgreSQL, MySQL, SQLite)

Step 1: Recreate the schema

Map source types to Neumann types:

| Source Type | Neumann Type |
| --- | --- |
| `INTEGER`, `SERIAL`, `BIGSERIAL` | `INT` or `BIGINT` |
| `VARCHAR(n)`, `TEXT`, `CHAR(n)` | `VARCHAR(n)` or `TEXT` |
| `FLOAT`, `DOUBLE PRECISION`, `REAL` | `FLOAT` or `DOUBLE` |
| `DECIMAL(p,s)`, `NUMERIC(p,s)` | `DECIMAL(p,s)` |
| `BOOLEAN` | `BOOLEAN` |
| `DATE`, `TIME`, `TIMESTAMP` | `DATE`, `TIME`, `TIMESTAMP` |
| `BYTEA`, `BLOB` | `BLOB` |

CREATE TABLE users (
    id INT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE,
    age INT,
    created_at TIMESTAMP
)
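
The type table can be applied mechanically when a source schema has many columns. The following is a minimal sketch, not part of Neumann itself: `TYPE_MAP`, `map_type`, and `create_table` are hypothetical helper names, and wherever the table offers two targets (for example `INT` or `BIGINT`) the choice made here is an assumption to adjust for your data.

```python
import re

# Choices follow the type table above; where it offers two targets
# (e.g. INT or BIGINT), the pick below is an assumption to adjust.
TYPE_MAP = {
    "INTEGER": "INT", "SERIAL": "INT", "BIGSERIAL": "BIGINT",
    "VARCHAR": "VARCHAR", "CHAR": "VARCHAR", "TEXT": "TEXT",
    "FLOAT": "FLOAT", "REAL": "FLOAT", "DOUBLE PRECISION": "DOUBLE",
    "DECIMAL": "DECIMAL", "NUMERIC": "DECIMAL",
    "BOOLEAN": "BOOLEAN",
    "DATE": "DATE", "TIME": "TIME", "TIMESTAMP": "TIMESTAMP",
    "BYTEA": "BLOB", "BLOB": "BLOB",
}
PARAMETERIZED = {"VARCHAR", "DECIMAL"}  # targets that keep (n) or (p,s)


def map_type(source_type):
    """Translate one source column type, e.g. 'NUMERIC(10,2)' -> 'DECIMAL(10,2)'."""
    match = re.fullmatch(r"\s*([A-Za-z ]+?)\s*(\(.*\))?\s*", source_type)
    if match is None:
        raise ValueError(f"cannot parse type: {source_type!r}")
    base = " ".join(match.group(1).upper().split())
    if base not in TYPE_MAP:
        raise ValueError(f"no mapping for type: {source_type!r}")
    target = TYPE_MAP[base]
    args = match.group(2) or ""
    return target + (args if target in PARAMETERIZED else "")


def create_table(table, columns):
    """Render CREATE TABLE from (name, source_type, constraints) triples."""
    lines = []
    for name, source_type, constraints in columns:
        parts = [name, map_type(source_type)]
        if constraints:
            parts.append(constraints)
        lines.append("    " + " ".join(parts))
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n)"
```

Unknown source types raise an error instead of being guessed, so unmapped columns surface before any data is loaded.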

Step 2: Bulk insert data

Use multi-row INSERT for efficiency:

INSERT INTO users (id, name, email, age) VALUES
    (1, 'Alice', 'alice@example.com', 30),
    (2, 'Bob', 'bob@example.com', 25),
    (3, 'Carol', 'carol@example.com', 35)

For large tables, batch in groups of 500-1000 rows per INSERT statement.
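
One way to produce those batches is to generate the statements from exported rows. The sketch below assumes rows arrive as tuples of numbers and strings; `sql_literal` and `batched_inserts` are hypothetical helpers, and escaping a single quote by doubling it is an assumption borrowed from standard SQL that should be checked against Neumann before use.

```python
from itertools import islice


def sql_literal(value):
    """Render a number or string as a literal for an INSERT statement."""
    if isinstance(value, bool) or not isinstance(value, (int, float, str)):
        raise TypeError(f"unsupported value: {value!r}")
    if isinstance(value, str):
        # Assumption: embedded single quotes are escaped by doubling them.
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def batched_inserts(table, columns, rows, batch_size=500):
    """Yield one multi-row INSERT statement per batch of rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        values = ",\n".join(
            "    (" + ", ".join(sql_literal(v) for v in row) + ")"
            for row in batch
        )
        yield f"INSERT INTO {table} ({', '.join(columns)}) VALUES\n{values}"
```

Because `rows` is consumed lazily, the same function works with a database cursor or a file reader without holding the whole table in memory.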

Step 3: Recreate indexes

CREATE INDEX idx_users_email ON users (email)
CREATE INDEX idx_users_name ON users (name)

Step 4: Verify

SELECT COUNT(*) FROM users
DESCRIBE users
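
Count checks are easier to trust when they are compared programmatically rather than by eye. The sketch below assumes you have already collected per-table counts from the source and from Neumann (for example with `SELECT COUNT(*)` on each side) into two dictionaries; `count_mismatches` is a hypothetical helper.

```python
def count_mismatches(source_counts, target_counts):
    """Return {name: (source, target)} for every object whose counts differ."""
    mismatches = {}
    for name, expected in source_counts.items():
        actual = target_counts.get(name)  # None means the object is missing
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches
```

An empty result means every table in the source has the same row count in the target.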

From Document Databases (MongoDB, Firestore, CouchDB)

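Map each document to either a unified entity or a graph node. Use entities for flat, self-contained documents; use nodes and edges when nested objects or references should become graph relationships.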

Simple documents to entities

ENTITY CREATE 'user-1' { name: 'Alice', email: 'alice@example.com', tags: 'admin,editor' }
ENTITY CREATE 'user-2' { name: 'Bob', email: 'bob@example.com', tags: 'viewer' }

Nested documents to nodes with edges

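For a MongoDB document like { user: "Alice", address: { city: "NYC", zip: "10001" } }, create one node per object and connect them with an edge, using the IDs of the created nodes in place of the placeholders: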

NODE CREATE person { name: 'Alice' }
NODE CREATE address { city: 'NYC', zip: '10001' }
EDGE CREATE 'person-node-id' -> 'address-node-id' : has_address

Batch creation for bulk import

ENTITY BATCH CREATE [
    { key: 'doc-1', name: 'First', category: 'A' },
    { key: 'doc-2', name: 'Second', category: 'B' },
    { key: 'doc-3', name: 'Third', category: 'A' }
]
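
Exported documents can be turned into batch statements of this shape with a small generator. This is a sketch under several assumptions: documents are flat dictionaries with an `_id` field, arrays are flattened to comma-separated strings as in the `tags` example above, single quotes are escaped by doubling, and the default batch size of 500 follows the entity limit given under Batch Import Tips. `literal` and `entity_batches` are hypothetical names.

```python
def literal(value):
    """Render a scalar or flat list as a property value."""
    if isinstance(value, (list, tuple)):
        # Arrays become comma-separated strings, like tags: 'admin,editor' above.
        value = ",".join(str(item) for item in value)
    if isinstance(value, bool) or not isinstance(value, (int, float, str)):
        raise TypeError(f"unsupported property value: {value!r}")
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"  # assumed escaping rule
    return str(value)


def entity_batches(docs, key_field="_id", batch_size=500):
    """Yield ENTITY BATCH CREATE statements, at most batch_size entities each."""
    for start in range(0, len(docs), batch_size):
        rows = []
        for doc in docs[start:start + batch_size]:
            fields = [f"key: {literal(str(doc[key_field]))}"]
            fields += [f"{name}: {literal(value)}"
                       for name, value in doc.items() if name != key_field]
            rows.append("    { " + ", ".join(fields) + " }")
        yield "ENTITY BATCH CREATE [\n" + ",\n".join(rows) + "\n]"
```

Nested objects raise a `TypeError` on purpose: they are better modeled as separate nodes with edges than squeezed into a flat entity.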

Arrays as multiple edges

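For a document like { user: "Alice", friends: ["Bob", "Carol"] }, create one node per person and one edge per array element, using the IDs of the created nodes in place of the placeholders: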

NODE CREATE person { name: 'Alice' }
NODE CREATE person { name: 'Bob' }
NODE CREATE person { name: 'Carol' }
EDGE CREATE 'alice-id' -> 'bob-id' : friends_with
EDGE CREATE 'alice-id' -> 'carol-id' : friends_with
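
The array-to-edges pattern can be sketched as a two-pass conversion: emit the node statements first, then render the edges once the real node IDs are known. The helpers below are hypothetical, and how node IDs are obtained after creation is left out; `node_ids` stands in for whatever mapping from name to node ID your import script records.

```python
def quote(text):
    """Single-quote a string; doubling embedded quotes is an assumed rule."""
    return "'" + str(text).replace("'", "''") + "'"


def friends_to_graph(doc, label="person", edge_type="friends_with"):
    """Return (node statements, edge triples) for a {user, friends} document."""
    friends = doc.get("friends", [])
    names = [doc["user"], *friends]
    nodes = [f"NODE CREATE {label} {{ name: {quote(n)} }}" for n in names]
    edges = [(doc["user"], friend, edge_type) for friend in friends]
    return nodes, edges


def edge_statements(edges, node_ids):
    """Render EDGE CREATE once node_ids maps each name to its node ID."""
    return [
        f"EDGE CREATE {quote(node_ids[src])} -> {quote(node_ids[dst])} : {etype}"
        for src, dst, etype in edges
    ]
```

Keeping edges as name triples until the IDs exist enforces the "nodes before edges" order without hard-coding placeholder IDs.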

From Graph Databases (Neo4j, Amazon Neptune, ArangoDB)

Node mapping

Neo4j Cypher:

CREATE (n:Person {name: "Alice", age: 30})

Neumann equivalent:

NODE CREATE person { name: 'Alice', age: 30 }

Edge mapping

Neo4j Cypher:

MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
CREATE (a)-[:KNOWS {since: 2020}]->(b)

Neumann equivalent:

EDGE CREATE 'alice-node-id' -> 'bob-node-id' : knows { since: 2020 }

Batch import for large graphs

GRAPH BATCH CREATE NODES [
    (:person {name: 'Alice', age: 30}),
    (:person {name: 'Bob', age: 25}),
    (:company {name: 'Acme', industry: 'tech'})
]

GRAPH BATCH CREATE EDGES [
    ('alice-id' -> 'bob-id' : knows {since: 2020}),
    ('alice-id' -> 'acme-id' : works_at {role: 'engineer'})
]
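
For a large export, the batch statements above can be generated in chunks. The following sketch assumes nodes arrive as `(label, properties)` pairs and edges as `(from_id, to_id, type, properties)` tuples with string or numeric property values; labels and edge types are lowercased to mirror the `Person` to `person` and `KNOWS` to `knows` mapping shown earlier. All function names are hypothetical, and the quote-doubling escape is an assumption.

```python
def render_props(properties):
    """Render a property map as {name: 'Alice', age: 30}."""
    parts = []
    for name, value in properties.items():
        if isinstance(value, str):
            value = "'" + value.replace("'", "''") + "'"  # assumed escaping
        parts.append(f"{name}: {value}")
    return "{" + ", ".join(parts) + "}"


def batches(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def node_batches(nodes, batch_size=1000):
    """Yield GRAPH BATCH CREATE NODES statements."""
    for batch in batches(nodes, batch_size):
        rows = [f"    (:{label.lower()} {render_props(props)})"
                for label, props in batch]
        yield "GRAPH BATCH CREATE NODES [\n" + ",\n".join(rows) + "\n]"


def edge_batches(edges, batch_size=1000):
    """Yield GRAPH BATCH CREATE EDGES statements."""
    for batch in batches(edges, batch_size):
        rows = []
        for src, dst, etype, props in batch:
            # Assumption: the property map may be omitted when empty,
            # mirroring the property-less EDGE CREATE form.
            suffix = " " + render_props(props) if props else ""
            rows.append(f"    ('{src}' -> '{dst}' : {etype.lower()}{suffix})")
        yield "GRAPH BATCH CREATE EDGES [\n" + ",\n".join(rows) + "\n]"
```

Run every node batch before the first edge batch so that each edge refers to nodes that already exist.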

Recreate graph indexes and constraints

GRAPH INDEX CREATE NODE PROPERTY name
GRAPH INDEX CREATE LABEL
GRAPH CONSTRAINT CREATE unique_email ON NODE person PROPERTY email UNIQUE

From Vector Databases (Pinecone, Weaviate, Qdrant, Milvus)

Bulk vector import

Use EMBED BATCH for the fastest bulk loading:

EMBED BATCH [
    ('vec-1', [0.12, -0.34, 0.56, 0.78]),
    ('vec-2', [0.23, 0.45, -0.67, 0.89]),
    ('vec-3', [-0.11, 0.33, 0.55, -0.77])
]
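
When vectors come from an export file, the batch statement can be generated and sanity-checked at the same time. This sketch assumes the export is a list of `(key, vector)` pairs and that all vectors loaded together should share one dimension; `embed_batches` is a hypothetical helper, and the default of 1000 vectors per statement follows the limit given under Batch Import Tips.

```python
def embed_batches(vectors, batch_size=1000):
    """Yield EMBED BATCH statements from (key, vector) pairs."""
    if not vectors:
        return
    dimension = len(vectors[0][1])
    for key, vector in vectors:
        # Assumption: mixed dimensions indicate a broken export.
        if len(vector) != dimension:
            raise ValueError(f"{key!r} has {len(vector)} dims, expected {dimension}")
    for start in range(0, len(vectors), batch_size):
        rows = []
        for key, vector in vectors[start:start + batch_size]:
            values = ", ".join(repr(float(x)) for x in vector)
            quoted = "'" + str(key).replace("'", "''") + "'"  # assumed escaping
            rows.append(f"    ({quoted}, [{values}])")
        yield "EMBED BATCH [\n" + ",\n".join(rows) + "\n]"
```

The dimension check runs before any statement is produced, so a malformed vector stops the import before a partial load.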

Collection mapping

If your source uses namespaces or collections, store each embedding in the matching Neumann collection with IN:

EMBED STORE 'doc-1' [0.1, 0.2, 0.3] IN products
EMBED STORE 'doc-2' [0.4, 0.5, 0.6] IN products
EMBED STORE 'doc-3' [0.7, 0.8, 0.9] IN articles

Metadata as entity properties

If vectors have associated metadata, use unified entities:

ENTITY CREATE 'doc-1' { title: 'Widget Manual', category: 'product' } EMBEDDING [0.1, 0.2, 0.3]
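
A record exported with its metadata can be rendered into this form directly. The sketch below assumes each record is a dictionary with `id`, `values`, and `metadata` keys (the shape is an assumption about your export, not a Neumann requirement) and that metadata values are strings or numbers; `entity_with_embedding` is a hypothetical helper.

```python
def literal(value):
    """Quote strings (doubling embedded quotes, an assumed rule); pass numbers through."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def entity_with_embedding(record):
    """Render one ENTITY CREATE ... EMBEDDING statement from an exported record."""
    fields = ", ".join(
        f"{name}: {literal(value)}" for name, value in record["metadata"].items()
    )
    values = ", ".join(repr(float(x)) for x in record["values"])
    return (
        f"ENTITY CREATE {literal(str(record['id']))} "
        f"{{ {fields} }} EMBEDDING [{values}]"
    )
```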

Build index after bulk load

EMBED BUILD INDEX

Always build the index once after all embeddings are loaded, not after each insert.

Verify

COUNT EMBEDDINGS
SHOW VECTOR INDEX
SIMILAR [0.1, 0.2, 0.3] LIMIT 3 METRIC COSINE

Safety Checklist

Checkpoint before migration:

```
CHECKPOINT 'pre-migration'
```

Test with a small subset first. Load 100 rows/nodes/vectors and verify correctness before running the full migration.

Verify counts match:

SELECT COUNT(*) FROM users
GRAPH AGGREGATE COUNT NODES person
COUNT EMBEDDINGS

Use transactions for atomicity (when available):

BEGIN CHAIN TRANSACTION
-- migration commands here
COMMIT CHAIN

Roll back if something goes wrong. List the available checkpoints, then restore the one taken before the migration:

CHECKPOINTS
ROLLBACK TO 'pre-migration'

Build indexes after loading, not during. This applies to both relational indexes (CREATE INDEX) and vector indexes (EMBED BUILD INDEX).
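
The checklist can be encoded as a small script builder so the ordering is enforced rather than remembered. This is a sketch only: `migration_plan` is a hypothetical helper that returns statements as text, and placing the index builds after `COMMIT CHAIN` rather than inside the transaction is an assumption, as is treating `GRAPH INDEX CREATE` like the relational and vector index commands.

```python
# Statements that belong in the index phase, not the load phase.
INDEX_PREFIXES = ("CREATE INDEX", "EMBED BUILD INDEX", "GRAPH INDEX CREATE")


def migration_plan(load_statements, index_statements,
                   checkpoint="pre-migration", use_transaction=False):
    """Return statements in checklist order: checkpoint, load, then indexes."""
    for statement in load_statements:
        if statement.lstrip().upper().startswith(INDEX_PREFIXES):
            raise ValueError(f"index statement in load phase: {statement!r}")
    plan = [f"CHECKPOINT '{checkpoint}'"]
    if use_transaction:
        plan.append("BEGIN CHAIN TRANSACTION")
    plan.extend(load_statements)
    if use_transaction:
        plan.append("COMMIT CHAIN")
    plan.extend(index_statements)  # indexes are built once, after loading
    return plan
```

If a later verification step fails, `ROLLBACK TO 'pre-migration'` restores the state captured by the first statement in the plan.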

Batch Import Tips

Relational: Multi-row INSERT INTO ... VALUES (row1), (row2), ... -- batch 500-1000 rows per statement.
Graph nodes: GRAPH BATCH CREATE NODES [...] -- batch up to 1000 nodes per call.
Graph edges: GRAPH BATCH CREATE EDGES [...] -- batch up to 1000 edges per call.
Vectors: EMBED BATCH [...] -- batch up to 1000 vectors per call.
Entities: ENTITY BATCH CREATE [...] -- batch up to 500 entities per call (each entity may create a node + embedding).
Quote all keys that contain hyphens, colons, or other special characters.
Order matters: Create nodes before edges that reference them. Create tables before inserting rows.
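
If statements are generated out of order (for example, from several export files), a stable sort by phase restores the required sequence. The sketch below is one possible approach; `DEPENDENT_PREFIXES` and `order_statements` are hypothetical names, and the prefix list covers only the commands used in this guide.

```python
# Statements that depend on something created earlier: rows need their
# table, edges need their nodes. Everything else is treated as a definition.
DEPENDENT_PREFIXES = ("INSERT INTO", "EDGE CREATE", "GRAPH BATCH CREATE EDGES")


def phase(statement):
    """Return 0 for definitions (tables, nodes) and 1 for dependents."""
    text = statement.lstrip().upper()
    return 1 if text.startswith(DEPENDENT_PREFIXES) else 0


def order_statements(statements):
    """Stable sort: definitions first, original order kept within each phase."""
    return sorted(statements, key=phase)
```

Because Python's `sorted` is stable, statements within the same phase keep the order in which they were generated.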