name: omop-migration-mapper description: Generates a mapping schema to transform any EMR/healthcare database to OHDSI OMOP CDM format. Use when the user needs to map source healthcare data tables to OMOP using pyomop-migrate, extract a source database schema, define column mappings, resolve concept lookups, validate data quality, and produce a migration-ready JSON mapping file. Don't use for general OMOP queries or post-migration analytics.
Skill purpose
Automate the creation of a production-ready mapping.json file for migrating healthcare data from any source EMR/database to OHDSI OMOP Common Data Model (CDM v5.4 or v6) using the pyomop-migrate tool.
Workflow overview
The skill executes 9 sequential steps: pyomop install → schema extraction → mapping skeleton → column mapping → concept mapping → special-case handling → staged validation → iteration & refinement → handoff.
Step 0: Install pyomop
Input: User confirms they have Python 3.11+ and pip installed.
Action: Run pip install pyomop to install the latest version of the pyomop package, which includes the pyomop-migrate CLI tool.
pip install pyomop
Step 1: Extract source database schema
Input: Source database connection details (type, host, port, user, password, name) provided by user or a pre-extracted schema markdown file.
Action: Run pyomop-migrate --extract-schema to produce a source schema markdown describing all tables, columns, types, and FK relationships.
pyomop-migrate --extract-schema \
--src-dbtype <sqlite|mysql|pgsql> \
--src-name <db_name> \
--src-host <host> \
--src-port <port> \
--src-user <user> \
--src-pw <password> \
--schema-output schema.md
Output: schema.md — human-readable Markdown document describing the source schema.
Decision tree:
- If user provides connection details: Run the extraction command and inspect the output.
- If user provides a schema markdown file: Skip extraction; read and analyze the provided file directly.
- If neither is available: Ask the user for one.
Step 2: Create mapping skeleton
Input: schema.md from Step 1 and an example mapping template. Read assets/mapping.template.json for the JSON structure.
Action: Create a new mapping.json file with top-level keys:
tables(required): list of table-mapping objects.concept(optional): concept lookup definitions.force_text_fields(optional): field names to preserve as text.
For each source table identified in schema.md:
- Create one entry in the
tableslist. - Set
source_table= source table name. - Set
name= target OMOP table name (e.g.,person,condition_occurrence,drug_exposure). - Stub out a
columnsmap (will be filled in Step 3). - Add optional
filtersif rows should be pre-filtered (e.g., exclude deleted records).
Output: A partial mapping.json skeleton with table entries and empty column maps.
Guidance: See references/schema-guides.md for patterns on identifying which source tables map to which OMOP tables (e.g., patient demographics → person, diagnoses → condition_occurrence).
Step 3: Map columns
Input: mapping.json skeleton from Step 2 and source schema schema.md.
Action: For each source table → OMOP table pair, populate the columns map:
{
"source_table": "patient_demographics",
"name": "person",
"columns": {
"person_id": "patient_id",
"year_of_birth": "birth_year",
"gender_concept_id": {"const": 8532},
"race_concept_id": ""
}
}
Column mapping rules:
- Simple copy: Map as a string (
"target": "source_col"). - Constant: Use
{"const": <value>}for hardcoded values (e.g., CDM version, source type). - Null/empty: Use empty string
""to produce empty values (for nullable columns). - Multi-valued: For comma-separated source values, map the field; the loader will use the first element for concept lookups.
Special handling:
- Person ID: If source uses non-numeric person identifiers (e.g., MRN strings), map to
person_source_valueand the loader will resolve numericperson_idviafix_person_idpost-step. - Dates/times: Map as-is; the loader will coerce to
DateorDateTimebased on target column type. - Numerics: Map as-is; the loader will parse to
NumericorIntegerbased on target. - Text fields with complex types (JSON, arrays, lists): Add the target column name to
force_text_fieldsto preserve as text (JSON or comma-joined strings).
Output: Complete mapping.json with all columns entries populated.
Step 4: Define concept mappings
Input: mapping.json from Step 3 and the OMOP concept vocabulary (available via pyomop --vocab <dir> or pre-loaded).
Action: Identify source columns that reference standardized vocabularies (ICD-10, SNOMED, RxNorm, etc.) and add a concept section:
{
"concept": [
{
"table": "condition_occurrence",
"mappings": [
{
"source": "diagnosis_code",
"target": "condition_concept_id"
}
]
}
]
}
The loader will:
- Use
source.diagnosis_codeto look upconcept.concept_idfromconcept.concept_code(orsource_to_concept_map). - Populate the target
condition_concept_idwith the matched concept. - Also populate
condition_source_concept_idif available.
Fallback: If a source code is not found in the vocabulary, record it in a post-migration report and optionally leave *_concept_id as 0 (requires post-processing).
Guidance: See references/decisions.md for common vocabulary mappings by EMR type (Epic, Cerner, etc.).
Step 5: Handle special cases
Input: mapping.json from Step 4 and OMOP CDM schema. Read src/pyomop/cdm6/cdm6_tables.py for authoritative target column names and types.
Action: Review and adjust mappings for common special cases:
Foreign keys (visit, provider, care_site):
- If a source table references a visit/provider/care_site, ensure the mapping includes the FK column (e.g.,
visit_occurrence_id). - If the source does not have these, use
{"const": null}or leave blank.
- If a source table references a visit/provider/care_site, ensure the mapping includes the FK column (e.g.,
Person birth components (year, month, day):
- Source may have a full
date_of_birth; split intoyear_of_birth,month_of_birth,day_of_birthduring migration or pre-process. - Ensure
month_of_birthandday_of_birthare nullable if granularity varies.
- Source may have a full
Gender/race/ethnicity concept IDs:
- Source may store as codes (e.g., 'M', 'F'); map to OMOP standard concept IDs (8532 = Male, 8507 = Female).
- Use the
conceptsection to resolve source codes to concept IDs if vocabulary is available.
Visit types (inpatient, outpatient, ED):
- Map source visit types to OMOP visit concept IDs.
- Typical mappings: office visit → 9202, inpatient → 9201, ED → 9203.
Multiple observation/measurement types in a single table:
- If a source table contains heterogeneous measurements (labs, vitals, custom fields), consider creating multiple target table entries with filters.
Multi-valued coalesce:
- If multiple source columns can populate a single OMOP field, use a pre-migration view or script to coalesce them into a single source column, then map normally.
Output: Updated mapping.json with special cases resolved.
Step 6: Validate with staged migration
Input: mapping.json from Step 5, source DB connection, and target OMOP database (SQLite or test database).
Action: Run a small-scale test load using pyomop-migrate --migrate:
pyomop-migrate --migrate \
--src-dbtype <type> --src-name <db> --src-host <host> --src-user <user> --src-pw <password> \
--mapping mapping.json \
--dbtype sqlite \
--name test_omop.db \
--batch-size 100
Checks to perform (use scripts/validate_mapping.py or manual SQL queries):
- Row counts: Compare source table row counts with imports in target OMOP tables. Expected ratio depends on filters and 1:N mappings (e.g., one patient → many measurements).
- Null checks: Verify required (non-nullable) OMOP columns are populated; flag null counts.
- Concept coverage: Query
condition_occurrence.condition_concept_idfor concept_id = 0 (unmapped codes); list them. - Sample joins: Spot-check 5–10 patient records end-to-end (demographic → condition → measurement) to ensure referential integrity.
- Data type coercion: Confirm dates parse correctly, numbers are numeric, and strings are reasonable lengths.
Decision tree:
- If validation passes: Proceed to Step 7.
- If row counts don't match expectations: Review filters and mapping; re-run migration with
-v(verbose) flag to debug. - If concept mapping is incomplete: Update vocabulary or add fallback concept mappings; re-run Step 4.
- If referential integrity fails: Check FK columns (person_id, visit_occurrence_id) and adjust filters or mapping; re-run Step 6.
Output: Validation report (pass/fail/warnings).
Step 7: Iterate and refine
Input: Validation results from Step 6 and updated mapping.json as needed.
Action:
- If validation revealed issues, update
mapping.json(modify filters, add concept mappings, adjust column mappings). - Re-run
pyomop-migrate --migratewith the updated file. - Re-run validation checks in Step 6.
- Repeat until validation passes.
Record decisions: As you iterate, update references/decisions.md with:
- Which source columns map to which OMOP tables (e.g., "patient.diagnosis_code → condition_occurrence.condition_source_value").
- Vocabulary mappings applied (e.g., "ICD-10 codes resolved via OMOP Athena").
- Filters used (e.g., "Excluded deleted records:
WHERE is_deleted = 0"). - Known issues or caveats (e.g., "Lab unit codes incomplete; will require post-processing").
Step 8: Handoff
Input: Final validated mapping.json and iteration notes from Step 7.
Action:
- Rename
mapping.jsontomapping.<site>.json(e.g.,mapping.epic.json,mapping.cerner.json). - Generate a
migration-report.mdsummary:- Source database schema overview (table + column count).
- OMOP tables targeted and row counts.
- Concept mappings applied.
- Known unmapped source codes (from Step 6 concept coverage check).
- Filters applied (e.g., "Active patients only").
- Caveats and post-migration steps (e.g., "Unit conversions needed for lab results").
- Record location:
references/migration-report.md.
- Archive schema and mapping files in
references/andassets/.
Final deliverables:
mapping.<site>.json— Ready for production migration.migration-report.md— Human-readable summary of decisions and known issues.schema.md— Source schema snapshot for reference.
Error handling
No source database credentials available:
- Ask the user to provide a pre-extracted
schema.mdfile, or provide connection credentials.
Source table not found:
- The extraction step will warn or skip missing tables. Review the source schema and correct the table name in
mapping.json.
Concept codes not in vocabulary:
- Record unmapped codes in the validation report. Suggest user download vocabulary updates from OHDSI Athena or run
pyomop --vocab <dir>.
Referential integrity failure (FK not found):
- Check that person_id, visit_occurrence_id, provider_id are correctly mapped and populated. Adjust filters to exclude orphaned records, or add pre-processing steps.
Data type coercion error (date parse fails):
- Verify source column is in ISO 8601 format (YYYY-MM-DD). If not, pre-process with a SQL view or transformation script.
Validation fails unexpectedly:
- Run
pyomop-migrate --migratewith--verboseor--debugflag for detailed logs. Check target database for partial inserts or constraint violations.
References
- Source extraction guide: See
references/schema-guides.md. - Mapping template: See
assets/mapping.template.json. - Validation script: Run
scripts/validate_mapping.py. - Recurrent patterns: See
references/decisions.mdfor EMR-specific mapping tips (Epic, Cerner, etc.). - External resources: