unity-catalog-oss - SKILL.md Agent Skill

name: unity-catalog-oss
description: Unity Catalog OSS 0.4.x, the only catalog in this stack. Load when configuring the UC server, creating catalogs/schemas/tables via REST, or wiring a non-Spark engine (DuckDB, Trino) against UC. Covers the REST API surface, credential vending, and the no-JDBC-catalog rule.

Unity Catalog OSS

This stack uses Unity Catalog OSS only. There is no PostgreSQL JDBC catalog path. If you see spark.sql.catalog.iceberg.type=jdbc or spark.sql.catalog.iceberg.jdbc.user anywhere, that's a leftover bug from the upstream lakehouse-stack reference — remove it, don't replicate it.
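
The rule above can be checked mechanically. The following is a minimal sketch, not part of the stack: the function name `find_jdbc_leftovers` is hypothetical, and treating every `spark.sql.catalog.iceberg.jdbc.*` key as a leftover is an assumption extrapolated from the two keys named above.

```python
import re

# Patterns for the leftover PostgreSQL JDBC catalog path.
# ASSUMPTION: any key under spark.sql.catalog.iceberg.jdbc.* counts as a leftover,
# not just the .jdbc.user key named in the text.
_JDBC_PATTERNS = [
    re.compile(r"^spark\.sql\.catalog\.iceberg\.type[\s=]+jdbc\s*$"),
    re.compile(r"^spark\.sql\.catalog\.iceberg\.jdbc\."),
]


def find_jdbc_leftovers(conf_text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) pairs for JDBC-catalog settings."""
    hits = []
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and comments in spark-defaults.conf style files.
        if not line or line.startswith("#"):
            continue
        if any(p.search(line) for p in _JDBC_PATTERNS):
            hits.append((lineno, line))
    return hits
```

Feed it the contents of a Spark config file, for example `find_jdbc_leftovers(open("config/spark/spark-defaults.conf.example").read())`; an empty list means no JDBC catalog settings were found.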

UC OSS runs as a Java server. The compose definition is docker-compose-unity-catalog.yml. Backing store is PostgreSQL. REST API is on localhost:8081.

Endpoints

Endpoint	Purpose
`http://localhost:8081/api/2.1/unity-catalog/catalogs`	List/create catalogs
`http://localhost:8081/api/2.1/unity-catalog/schemas`	List/create schemas
`http://localhost:8081/api/2.1/unity-catalog/tables`	List/create/describe tables
`http://localhost:8081/api/2.1/unity-catalog/iceberg/v1/config`	Iceberg REST catalog (Spark uses this)
`http://localhost:8081/api/2.1/unity-catalog/iceberg/v1/namespaces`	Iceberg REST namespace ops

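UC 0.4.x speaks the Iceberg REST Catalog spec under the /iceberg/v1/* path. Any Iceberg client (Spark, PyIceberg, DuckDB via the iceberg extension) can use it. Configure the client with the base URL http://localhost:8081/api/2.1/unity-catalog/iceberg; the client appends /v1/... itself.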

Spark config

Already wired in config/spark/spark-defaults.conf.example:

spark.sql.catalog.iceberg               org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.iceberg.catalog-impl  org.apache.iceberg.rest.RESTCatalog
spark.sql.catalog.iceberg.uri           http://localhost:8081/api/2.1/unity-catalog/iceberg
spark.sql.catalog.iceberg.warehouse     unity
spark.sql.catalog.iceberg.token         not_used

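In Spark, iceberg.bronze.orders resolves through UC. Behind the scenes Spark issues a GET to the Iceberg REST table endpoint, of the form /iceberg/v1/{prefix}/namespaces/bronze/tables/orders, where the optional prefix is whatever UC returns from /iceberg/v1/config.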

Creating things via REST

# Create the iceberg catalog (one-time)
curl -X POST http://localhost:8081/api/2.1/unity-catalog/catalogs \
  -H "Content-Type: application/json" \
  -d '{"name":"iceberg","comment":"Default Iceberg catalog"}'

# Create a schema
curl -X POST http://localhost:8081/api/2.1/unity-catalog/schemas \
  -H "Content-Type: application/json" \
  -d '{"name":"bronze","catalog_name":"iceberg"}'

# List tables
curl "http://localhost:8081/api/2.1/unity-catalog/tables?catalog_name=iceberg&schema_name=bronze" | jq .
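
For scripted setup, the same create calls can be sketched in Python using only the standard library. This is an illustrative equivalent of the curl commands above, not tooling that ships with the stack; the helper names `build_create_request` and `send` are hypothetical.

```python
import json
import urllib.request

UC_BASE = "http://localhost:8081/api/2.1/unity-catalog"


def build_create_request(resource: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for /catalogs or /schemas."""
    return urllib.request.Request(
        url=f"{UC_BASE}/{resource}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request to UC and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Same payloads as the curl examples above; nothing is sent at this point.
catalog_req = build_create_request(
    "catalogs", {"name": "iceberg", "comment": "Default Iceberg catalog"}
)
schema_req = build_create_request(
    "schemas", {"name": "bronze", "catalog_name": "iceberg"}
)
```

With UC running, call `send(catalog_req)` and then `send(schema_req)`. Building the requests separately from sending them keeps the payloads easy to inspect before anything touches the server.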

Auth: 0.4.x ships with authentication disabled by default for local deployments. Don't add a bearer token until you've wired UC's auth provider; most demos run unauthenticated.

Backing store

UC OSS stores its catalog metadata in PostgreSQL. Connection details are in config/unity-catalog/server.properties. The PostgreSQL instance is the same one used by Airflow / system Postgres on host port 5432. UC creates its tables under a unitycatalog schema.

Schema migrations for UC's tables are auto-applied at startup; you don't manage them.

Credential vending

UC OSS can vend S3 credentials to clients so Spark doesn't need hardcoded S3_ACCESS_KEY/S3_SECRET_KEY. Configure in server.properties:

s3.region=us-east-1
s3.endpoint=http://seaweedfs:8333
s3.access-key=<your-seaweedfs-key>
s3.secret-key=<your-seaweedfs-secret>
s3.path-style-access=true

Clients then ask UC for temporary creds when reading a table — no creds in client config. For demo purposes the current spark-defaults.conf still ships static S3 keys; cleaning this up is a follow-on.
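
To make the vending flow concrete, here is a hedged sketch of the request a client would send for temporary table credentials. The endpoint path `/temporary-table-credentials`, the `table_id` and `operation` body fields, and the allowed operation values are assumptions believed to match the upstream UC OSS REST API; none of them appear in this document, so verify them against your server version. The function name is hypothetical.

```python
import json
import urllib.request

UC_BASE = "http://localhost:8081/api/2.1/unity-catalog"

# ASSUMPTION: endpoint path, body fields, and operation names are not taken
# from this document; check them against the UC OSS version you run.
TEMP_TABLE_CREDS_PATH = "/temporary-table-credentials"
ALLOWED_OPERATIONS = ("READ", "READ_WRITE")


def build_table_credentials_request(
    table_id: str, operation: str = "READ"
) -> urllib.request.Request:
    """Build (but do not send) a request for short-lived table credentials."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(
            f"operation must be one of {ALLOWED_OPERATIONS}, got {operation!r}"
        )
    body = {"table_id": table_id, "operation": operation}
    return urllib.request.Request(
        url=UC_BASE + TEMP_TABLE_CREDS_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The point of the sketch is the shape of the exchange: the client sends a table identifier and an intended operation, and the static keys stay in server.properties on the UC side.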

Other engines

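# DuckDB (Python API); ATTACH options depend on the iceberg extension version
import duckdb

con = duckdb.connect()
con.execute("INSTALL iceberg")
con.execute("LOAD iceberg")
# First ATTACH argument is the warehouse name (as in the Spark config); the REST base URL goes in ENDPOINT.
con.execute(
    "ATTACH 'unity' AS uc (TYPE iceberg, "
    "ENDPOINT 'http://localhost:8081/api/2.1/unity-catalog/iceberg', AUTHORIZATION_TYPE 'none')"
)
con.sql("SELECT * FROM uc.bronze.orders LIMIT 10").show()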

Trino, Dremio: same pattern. Register UC's Iceberg endpoint (the base URL ending in /iceberg; the client adds /v1/ itself) as an Iceberg REST catalog.
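
PyIceberg follows the same pattern. The sketch below mirrors the Spark settings (same base URI, warehouse `unity`) as PyIceberg REST catalog properties. The helper names are hypothetical, and the `token` setting from the Spark config is deliberately left out on the assumption that UC is running unauthenticated.

```python
def uc_rest_properties(
    base_url: str = "http://localhost:8081/api/2.1/unity-catalog",
    warehouse: str = "unity",
) -> dict:
    """Return PyIceberg REST catalog properties pointing at UC's Iceberg endpoint."""
    return {
        "type": "rest",
        # PyIceberg appends /v1/... to this base URI itself.
        "uri": f"{base_url.rstrip('/')}/iceberg",
        "warehouse": warehouse,
    }


def load_uc_catalog(name: str = "uc"):
    """Load the catalog; pyiceberg is imported lazily so the helper above has no dependencies."""
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, **uc_rest_properties())
```

With pyiceberg installed and UC running, `load_uc_catalog().load_table("bronze.orders").scan(limit=10).to_arrow()` reads the first rows of the table.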

Limitations of UC OSS 0.4.x (don't promise users these)

- Auth providers (OAuth, SAML) are partial.
- Lineage events (system tables) are minimal compared to managed Databricks UC.
- Cross-catalog references work; cross-deployment federation does not.

Write-side reality (verified 2026-05-19, v0.4.0 and v0.4.1)

UC OSS's write story is partial and format-specific. What was actually tested:

- Iceberg is read-only. UC's Iceberg REST adapter (/iceberg/v1/...) advertises only GET/HEAD endpoints, with no POST for namespace or table creation. Spark CREATE TABLE against the iceberg catalog fails with UnsupportedOperationException: Server does not support endpoint. UC's native /tables API rejects ICEBERG as a data_source_format entirely (it accepts DELTA, PARQUET, CSV, JSON). Treat the iceberg catalog as read-only: fine for cross-engine reads of tables created elsewhere, not for writes.
- Delta writes work via the UC Spark connector (io.unitycatalog.spark.UCSingleCatalog), with sharp edges:
  - The connector asserts location != null and provider != null in createTable. The standard Spark SQL path populates location automatically; SDP and other non-analyzer paths don't, so you must pass location + provider explicitly (in table_properties for SDP). See [[sdp]] → unity-catalog.md.
  - Credential vending (generateTemporaryPathCredentials) only accepts s3 / gs / abfs URI schemes; s3a:// is rejected. Write locations as s3://; Hadoop resolves via the spark.hadoop.fs.s3.impl → S3AFileSystem mapping.
  - No truncate support: re-creating an existing table errors, so drop it first.
- Bucket config gotcha: UC's ServerProperties.getS3Configurations() loop breaks (silently skipping the bucket) if either the IAM-role group OR the static-creds group has any null field. For SeaweedFS / static-cred mode set a non-null s3.sessionToken.0 (any placeholder) or the bucket won't load and credential vending fails with "S3 bucket configuration not found."
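
The URI-scheme rule for credential vending is easy to enforce before a location ever reaches UC. The following is a minimal sketch under the constraints stated above (s3, gs, abfs accepted; s3a rewritten to s3); the function name `to_vendable_location` is hypothetical and not part of the UC connector.

```python
from urllib.parse import urlsplit, urlunsplit

# Schemes that credential vending accepts, per the notes above.
VENDABLE_SCHEMES = {"s3", "gs", "abfs"}


def to_vendable_location(location: str) -> str:
    """Rewrite s3a:// to s3:// and reject schemes that vending would refuse."""
    parts = urlsplit(location)
    scheme = "s3" if parts.scheme == "s3a" else parts.scheme
    if scheme not in VENDABLE_SCHEMES:
        raise ValueError(
            f"unsupported scheme {parts.scheme!r} in {location!r}; "
            f"expected one of {sorted(VENDABLE_SCHEMES)} (or s3a)"
        )
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))
```

For example, pass `to_vendable_location("s3a://lake/bronze/orders")` as the table location when creating a Delta table, so the stored location uses the s3:// scheme.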

When something's wrong

./lakehouse logs unity-catalog | tail -100 shows the Java server's stdout. Most failures are:

- PostgreSQL not reachable → UC crash-loops.
- server.properties references a SeaweedFS endpoint that's not up yet → table operations fail with S3 errors, catalog ops still succeed.
- Stale schema migrations after a UC OSS version bump → wipe the unitycatalog schema in Postgres and restart UC.
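
Before reading logs, a quick probe of the catalogs endpoint tells you whether the server is answering at all. This is a rough sketch with a hypothetical function name and hypothetical status strings; it only distinguishes "answering", "answering with an HTTP error", and "not reachable", and does not diagnose which of the failures above is the cause.

```python
import urllib.error
import urllib.request


def probe_uc(base_url: str = "http://localhost:8081", timeout: float = 3.0) -> str:
    """Return 'ok', 'http-error', or 'unreachable' for the UC catalogs endpoint."""
    url = f"{base_url.rstrip('/')}/api/2.1/unity-catalog/catalogs"
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return "ok"
    except urllib.error.HTTPError:
        # The server answered, but with a 4xx/5xx status.
        return "http-error"
    except OSError:
        # Covers URLError, connection refused, DNS failures, and timeouts.
        return "unreachable"
```

An `unreachable` result is consistent with UC crash-looping (for example, because PostgreSQL is down), while `ok` with failing table operations points more toward the storage side.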