---
name: paimon
description: Apache Paimon streaming lake format expertise for real-time ingestion and lakehouse architectures. Use when the user mentions Paimon, streaming lakehouse, primary-key tables, changelog, Flink CDC into a lake, Paimon Materialized Tables, PyPaimon, deletion vectors, lookup joins, buckets, compaction, or Spark/Flink Paimon tables.
---
# Apache Paimon Expert
Use this skill for Paimon table design, Flink-native streaming ingestion, changelog semantics, compaction, lookup joins, Spark reads, and Iceberg compatibility.
## Current Facts
- Current stable Paimon: 1.4.1. A 1.4.2 release candidate exists; do not recommend it as stable unless the user explicitly wants RC testing.
- PyPaimon: 1.4.1 on PyPI, pure Python package.
- Flink CDC: 3.6.0 is the current CDC line; older 3.5 examples remain useful but should not be described as latest.
- Recommended Flink: Flink 1.20.x or 2.2.x for new work when connector compatibility allows.
- Recommended Spark: verify against the Paimon connector matrix for the selected Paimon version; do not hard-code Spark 3.4.3 for new projects without checking.
- Recent focus areas: PyPaimon, data evolution, Iceberg compatibility, deletion vectors, REST Catalog authorization interfaces, lookup join performance, multimodal/blob storage, and Paimon/Lance integration work.
## How To Use
- Determine table type first: append-only table or primary-key table.
- Determine workload: streaming ingest, CDC upsert, lookup dimension table, batch analytics, or cross-format Iceberg exposure.
- Choose bucket strategy early; bucket count affects write parallelism, small files, and lookup performance.
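To make the bucket decision concrete, here is a minimal sizing sketch. It assumes a target of roughly 1 GiB of data per bucket (an illustrative starting point, not an official Paimon default) and that the bucket count applies to each partition. `bucket_option` is a hypothetical helper that returns a value for the `bucket` table option, falling back to `-1` (dynamic bucket mode for primary-key tables) when the partition size is unknown.

```python
from typing import Optional

# Assumption: ~1 GiB of data per bucket as a starting target; tune per workload.
DEFAULT_TARGET_BUCKET_BYTES = 1 << 30


def bucket_option(partition_bytes: Optional[int],
                  target_bucket_bytes: int = DEFAULT_TARGET_BUCKET_BYTES) -> str:
    """Return a string for Paimon's 'bucket' table option (hypothetical helper).

    Paimon splits each partition into buckets, so size against one partition.
    Unknown size -> '-1', i.e. dynamic bucket mode for primary-key tables.
    """
    if partition_bytes is None:
        return "-1"
    if partition_bytes < 0:
        raise ValueError("partition_bytes must be non-negative")
    # Integer ceiling division; never return 0 because fixed mode needs >= 1 bucket.
    buckets = (partition_bytes + target_bucket_bytes - 1) // target_bucket_bytes
    return str(max(1, buckets))


if __name__ == "__main__":
    print(bucket_option(50 * (1 << 30)))  # 50 GiB per partition -> "50"
    print(bucket_option(None))            # size unknown -> "-1"
```

Fixed buckets are cheap to reason about but hard to change later; dynamic mode trades some write-path overhead for not having to guess the size up front.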
## Design Rules
- Use primary-key tables for upserts, deletes, and CDC; use append-only tables for immutable event logs.
- Include partition fields in primary keys when tables are partitioned.
- Avoid a single bucket (`'bucket' = '1'`) for large tables; choose fixed buckets (`'bucket' = 'N'`, N > 0) or dynamic buckets (`'bucket' = '-1'`) deliberately.
- Choose the `changelog-producer` setting based on downstream needs: `input`, `lookup`, `full-compaction`, or `none`.
- Plan compaction separately from ingestion for high-volume streaming tables.
- Use lookup cache only when dimension-table size and freshness requirements justify it.
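The primary-key, bucket, changelog, and compaction rules above can be captured in a small DDL generator. This is a sketch: `paimon_pk_table_ddl` and `compact_call` are hypothetical helpers, the emitted `CREATE TABLE` assumes Flink SQL with a Paimon catalog already selected, and `'write-only' = 'true'` assumes a separate compaction job (for example through the `sys.compact` procedure) runs alongside ingestion. Verify option names and the procedure signature against the docs for your Paimon and Flink versions.

```python
from typing import Dict, Sequence

CHANGELOG_PRODUCERS = {"none", "input", "lookup", "full-compaction"}


def paimon_pk_table_ddl(name: str,
                        columns: Dict[str, str],
                        primary_key: Sequence[str],
                        partition_by: Sequence[str] = (),
                        bucket: str = "-1",
                        changelog_producer: str = "none",
                        write_only: bool = False) -> str:
    """Render Flink SQL DDL for a Paimon primary-key table (hypothetical helper)."""
    unknown = [k for k in [*primary_key, *partition_by] if k not in columns]
    if unknown:
        raise ValueError(f"key fields not declared as columns: {unknown}")
    # Design rule: partitioned primary-key tables include partition fields in the PK.
    missing = [p for p in partition_by if p not in primary_key]
    if missing:
        raise ValueError(f"partition fields missing from primary key: {missing}")
    if changelog_producer not in CHANGELOG_PRODUCERS:
        raise ValueError(f"unsupported changelog-producer: {changelog_producer}")

    col_lines = [f"  {col} {typ}" for col, typ in columns.items()]
    col_lines.append(f"  PRIMARY KEY ({', '.join(primary_key)}) NOT ENFORCED")
    options = {
        "bucket": bucket,
        "changelog-producer": changelog_producer,
        # 'true' skips compaction in the writer, so a dedicated
        # compaction job must keep the table compacted.
        "write-only": "true" if write_only else "false",
    }
    opt_lines = [f"  '{k}' = '{v}'" for k, v in options.items()]
    partition_clause = (f"PARTITIONED BY ({', '.join(partition_by)})\n"
                        if partition_by else "")
    return (f"CREATE TABLE {name} (\n" + ",\n".join(col_lines) + "\n)\n"
            + partition_clause
            + "WITH (\n" + ",\n".join(opt_lines) + "\n)")


def compact_call(qualified_table: str) -> str:
    """Flink SQL call for a dedicated compaction job (check the signature per version)."""
    return f"CALL sys.compact('{qualified_table}')"


if __name__ == "__main__":
    print(paimon_pk_table_ddl(
        "orders",
        {"dt": "STRING", "order_id": "BIGINT", "amount": "DECIMAL(10, 2)"},
        primary_key=["dt", "order_id"],
        partition_by=["dt"],
        bucket="8",
        changelog_producer="lookup",
        write_only=True,
    ))
    print(compact_call("default.orders"))
```

The example pairs `write-only` ingestion with a separate compaction call, which is one way to keep compaction off the ingestion path; leaving `write-only` off lets the writer compact inline instead.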
## Update Checklist
- Recheck Apache Paimon tags/downloads and PyPI `pypaimon` before changing versions.
- Recheck Flink CDC compatibility for the selected Flink and Paimon releases.
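A minimal pre-upgrade guard for the checklist above, assuming PyPaimon is installed under the distribution name `pypaimon` and that stable releases use plain `X.Y.Z` version strings (anything with a suffix such as `rc1` is treated as non-stable). `is_stable`, `installed_version`, and `check_pin` are hypothetical helpers built on the standard library; they do not query PyPI, so the latest published version still has to be checked by hand.

```python
import re
from importlib import metadata
from typing import Optional


def is_stable(version: str) -> bool:
    """Assumption: stable releases are plain X.Y.Z; suffixes mark pre-releases."""
    return re.fullmatch(r"\d+\.\d+\.\d+", version.strip()) is not None


def installed_version(dist: str = "pypaimon") -> Optional[str]:
    """Installed distribution version, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def check_pin(pinned: str, dist: str = "pypaimon") -> str:
    """Hypothetical guard: refuse RC pins and report drift from the installed version."""
    if not is_stable(pinned):
        return f"refuse: {pinned} is not a stable release"
    current = installed_version(dist)
    if current is None:
        return f"{dist} not installed; pin {pinned}"
    if current == pinned:
        return f"ok: {dist} {current} matches pin"
    return f"drift: installed {current}, pinned {pinned}"


if __name__ == "__main__":
    print(check_pin("1.4.1"))
    print(check_pin("1.4.2rc1"))
```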