enterprise-etl-and-data-integration-modernization - SKILL.md Agent Skill

name: enterprise-etl-and-data-integration-modernization
description: Guides agents through operating, hardening, and modernizing enterprise ETL and integration stacks such as Informatica, Talend, DataStage, SSIS, and Matillion. Use when legacy mappings, job orchestration, migration, or coexistence with modern lakehouse patterns must be handled safely.

Enterprise ETL And Data Integration Modernization

Overview

Use this skill when delivery depends on classic enterprise ETL tooling, not only modern code-first platforms. It helps agents reason about mapping logic, scheduler dependencies, restart behavior, metadata export, migration sequencing, and coexistence between legacy ETL platforms and newer Spark, dbt, Airflow, or lakehouse stacks.

When to Use

- working with Informatica, Talend, DataStage, SSIS, or Matillion
- reverse-engineering existing ETL mappings and workflows
- modernizing or migrating legacy ETL jobs
- adding quality, lineage, and operational controls to GUI-driven pipelines
- running hybrid estates where old and new tooling must coexist

Do not assume enterprise ETL logic is simple just because much of it is configured through a UI.

Workflow

Inventory the actual delivery surface. Capture:
- mappings and transformations
- parameter files and environment variables
- scheduler dependencies
- restart and checkpoint behavior
- external scripts and post-load actions
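
The inventory items above can be held in a small record per job, which makes gaps and scheduler ordering easy to check. The following is a minimal sketch, not a format any ETL platform exports: the `JobInventory` class, its field names, and the sample job names are all hypothetical, and treating an empty parameter-file list as a gap is an assumption that may not hold for every job.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class JobInventory:
    """One inventoried ETL job. All field names are illustrative."""
    name: str
    mappings: list[str] = field(default_factory=list)
    parameter_files: list[str] = field(default_factory=list)
    upstream_jobs: list[str] = field(default_factory=list)  # scheduler dependencies
    restart_behavior: str = "unknown"  # e.g. "rerun-from-start" or "checkpoint"
    external_scripts: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Return the inventory fields that still need investigation."""
        missing = []
        if not self.mappings:
            missing.append("mappings")
        if not self.parameter_files:  # assumption: every job has at least one
            missing.append("parameter_files")
        if self.restart_behavior == "unknown":
            missing.append("restart_behavior")
        return missing


def run_order(jobs: list[JobInventory]) -> list[str]:
    """Order jobs so each one comes after its upstream dependencies."""
    graph = {job.name: set(job.upstream_jobs) for job in jobs}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical sample data
jobs = [
    JobInventory("load_orders", mappings=["m_orders"],
                 parameter_files=["orders.par"],
                 upstream_jobs=["stage_orders"],
                 restart_behavior="checkpoint"),
    JobInventory("stage_orders", mappings=["m_stage_orders"]),
]

print(run_order(jobs))
print(jobs[1].gaps())
```

`TopologicalSorter` raises `CycleError` on circular dependencies, which is itself a useful inventory finding.
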
Recover business logic from the platform implementation. Identify:
- joins and filters
- surrogate-key logic
- SCD behavior
- reject handling
- data-quality rules embedded in mappings
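
Once surrogate-key and SCD behavior have been recovered from a mapping, writing them down as plain code makes the rules reviewable outside the GUI. The sketch below shows one common interpretation of Type 2 history (expire the current row, insert a new version), assuming integer surrogate keys assigned as max + 1 and an open-ended `valid_to` sentinel. The column names and the `apply_scd2` helper are hypothetical, and the legacy mapping may differ in exactly these details.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed "open-ended" sentinel; estates vary


def apply_scd2(dimension, incoming, business_key, tracked, as_of):
    """Return a new dimension list with Type 2 changes from `incoming` applied.

    dimension: list of dicts with sk, valid_from, valid_to, is_current
    incoming:  list of dicts from the source extract
    """
    result = [dict(row) for row in dimension]  # copy rows; input stays untouched
    next_sk = max((row["sk"] for row in result), default=0) + 1
    current = {row[business_key]: row for row in result if row["is_current"]}

    for src in incoming:
        existing = current.get(src[business_key])
        if existing and all(existing[col] == src[col] for col in tracked):
            continue  # no tracked column changed: keep the current version
        if existing:  # a tracked column changed: expire the old version
            existing["valid_to"] = as_of
            existing["is_current"] = False
        new_row = {
            business_key: src[business_key],
            **{col: src[col] for col in tracked},
            "sk": next_sk,
            "valid_from": as_of,
            "valid_to": OPEN_END,
            "is_current": True,
        }
        result.append(new_row)
        current[src[business_key]] = new_row
        next_sk += 1
    return result


# Hypothetical sample data
dim = [{"sk": 1, "customer_id": "C1", "city": "Oslo",
        "valid_from": date(2020, 1, 1), "valid_to": OPEN_END,
        "is_current": True}]
src = [{"customer_id": "C1", "city": "Bergen"},
       {"customer_id": "C2", "city": "Turku"}]

out = apply_scd2(dim, src, "customer_id", ["city"], date(2024, 6, 1))
```

Running the recovered rule against a small extract and comparing it with what the legacy job actually produced is a quick way to confirm the logic was read correctly.
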
Make environment and deployment assumptions explicit. Document:
- connection differences by environment
- credential handling
- promotion rules
- metadata dependencies
- non-obvious manual runbooks
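
Connection differences by environment are easier to document when they are computed instead of remembered. The following sketch assumes each environment's parameters have already been parsed into a flat dictionary of strings. The `diff_environments` helper, the parameter names, and the values are hypothetical. Only non-secret settings should go through a comparison like this; credentials belong in whatever secret store the estate uses.

```python
def diff_environments(configs: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return parameters whose values differ, or are missing, across environments."""
    all_keys = set().union(*(cfg.keys() for cfg in configs.values()))
    differences = {}
    for key in sorted(all_keys):
        values = {env: cfg.get(key, "<missing>") for env, cfg in configs.items()}
        if len(set(values.values())) > 1:
            differences[key] = values
    return differences


# Hypothetical sample data
configs = {
    "dev": {"SRC_DSN": "dev-ora", "BATCH_SIZE": "1000", "REJECT_DIR": "/data/rej"},
    "prod": {"SRC_DSN": "prod-ora", "BATCH_SIZE": "1000"},
}

drift = diff_environments(configs)
for name, values in drift.items():
    print(name, values)
```

A parameter that exists in one environment but not another (here `REJECT_DIR`) is often the kind of hidden assumption this step is meant to surface.
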
Plan coexistence or migration safely. Decide:
- what remains on the legacy platform
- what moves to code-first pipelines
- how parity will be validated
- how cutover and rollback will work
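
Parity validation can start with something as simple as comparing the legacy and migrated outputs by business key and a per-row fingerprint. This is a minimal in-memory sketch under stated assumptions: both outputs fit in memory as lists of dictionaries, the key is unique on each side, and values have already been normalized to the same types (the fingerprint uses `repr`, so `10` and `10.0` would count as a mismatch). The function names are hypothetical.

```python
import hashlib


def row_fingerprint(row: dict, columns: list[str]) -> str:
    """Hash the compared columns in a fixed order so both sides hash alike."""
    payload = "\x1f".join(repr(row.get(col)) for col in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compare_outputs(legacy, migrated, key, columns):
    """Compare two row sets by business key and per-row fingerprint."""
    legacy_map = {row[key]: row_fingerprint(row, columns) for row in legacy}
    migrated_map = {row[key]: row_fingerprint(row, columns) for row in migrated}
    shared = legacy_map.keys() & migrated_map.keys()
    return {
        "legacy_count": len(legacy),
        "migrated_count": len(migrated),
        "missing_in_migrated": sorted(legacy_map.keys() - migrated_map.keys()),
        "extra_in_migrated": sorted(migrated_map.keys() - legacy_map.keys()),
        "mismatched": sorted(k for k in shared if legacy_map[k] != migrated_map[k]),
    }


# Hypothetical sample data
legacy = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
migrated = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 4, "amount": 40}]

report = compare_outputs(legacy, migrated, "id", ["amount"])
print(report)
```

Equal row counts alone would pass this sample, which is why the per-key comparison matters. For large tables the same idea is usually pushed down into SQL on each side instead of pulling rows into Python.
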
Add observability and control points around the jobs. Include:
- run metadata
- reconciliation checks
- lineage capture
- failure classification
- repeatable deployment evidence
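
Run metadata, a basic reconciliation check, and failure classification can be combined in one record written per job run. The sketch below is illustrative only: the `RunRecord` fields, the failure classes, and the message markers in `FAILURE_RULES` are hypothetical and would need to be tuned against the real platform's logs.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical failure classes and message markers; adjust to the real logs.
FAILURE_RULES = [
    ("connection", ("timeout", "connection refused", "login failed")),
    ("data_quality", ("constraint", "invalid", "null value")),
    ("resource", ("out of memory", "disk full")),
]


def classify_failure(message: str) -> str:
    """Map a raw error message to a coarse failure class."""
    text = message.lower()
    for label, markers in FAILURE_RULES:
        if any(marker in text for marker in markers):
            return label
    return "unclassified"


@dataclass
class RunRecord:
    job_name: str
    started_at: str
    rows_read: int
    rows_written: int
    rows_rejected: int
    status: str
    failure_class: Optional[str] = None

    def reconciles(self) -> bool:
        """Every row read should be either written or rejected."""
        return self.rows_read == self.rows_written + self.rows_rejected


# Hypothetical sample run
record = RunRecord(
    job_name="load_orders",
    started_at=datetime(2024, 6, 1, 2, 0, tzinfo=timezone.utc).isoformat(),
    rows_read=1000,
    rows_written=990,
    rows_rejected=8,
    status="failed",
    failure_class=classify_failure("Connection refused by source host"),
)

print(json.dumps(asdict(record)))
print("reconciles:", record.reconciles())
```

Emitting the record as JSON keeps it independent of any one scheduler or ETL tool, so legacy and code-first jobs can report into the same place during coexistence.
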

Common Rationalizations

| Rationalization | Reality |
| --- | --- |
| "The ETL tool already handles everything." | GUI tooling still hides logic, dependencies, and failure modes that must be made explicit. |
| "We can just rewrite it later." | Legacy ETL estates become harder to migrate the longer the hidden assumptions remain undocumented. |
| "The mapping is self-explanatory." | Parameter files, scheduler behavior, and reject handling often carry critical business logic. |

Red Flags

- mapping logic depends on undocumented manual steps
- environment promotion is done by hand with no parity validation
- restart or reject behavior is unknown
- migration plans ignore reconciliation and rollback

Verification

- Mapping logic, dependencies, and restart behavior are inventoried
- Hidden environment and scheduler assumptions are documented
- Coexistence or migration boundaries are explicit
- Reconciliation, lineage, and operational controls exist around delivery
- Cutover and rollback plans are defined for migrations