object-storage


Object storage patterns for MinIO, S3, and RustFS in data platforms.

By cemalcici. Updated 2/10/2026

name: object-storage
description: Object storage patterns for MinIO, S3, and RustFS in data platforms.
allowed-tools: Read, Write, Edit, Glob, Grep, Bash

Object Storage Patterns

Learn to THINK in objects and prefixes, not directories.

⚠️ Core Principles

Flat Namespace

  • No real directories
  • Prefixes simulate hierarchy
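  • Listing is expensive: the store scans keys under a prefix and returns them in pages (up to 1,000 keys per request on S3)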
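To make the flat namespace concrete, here is a minimal sketch that mimics an S3-style delimiter listing over a plain list of keys. The `list_prefix` helper and the sample keys are illustrative assumptions, not part of any client library; real stores do this server-side and paginate the result.

```python
def list_prefix(keys, prefix="", delimiter="/"):
    """Mimic an S3-style delimiter listing over a flat collection of keys.

    Returns (objects, common_prefixes), both sorted.
    """
    objects = []
    common_prefixes = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A deeper key exists: report only the rolled-up "folder".
            common_prefixes.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(common_prefixes)


# Hypothetical keys; there are no directory objects, only full key names.
keys = [
    "raw/source_a/year=2024/month=01/part-0000.parquet",
    "raw/source_a/year=2024/month=02/part-0000.parquet",
    "raw/source_b/part-0000.parquet",
    "raw/_SUCCESS",
]
objects, prefixes = list_prefix(keys, prefix="raw/")
print(objects)
print(prefixes)
```

Every key that matches the prefix has to be visited to produce the "folders", which is why deep or wide listings get slow as a bucket grows.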

Eventual Consistency (sometimes)

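  • S3: Strong read-after-write consistency since December 2020
  • MinIO: Strong consistency
  • Design for it anyway: other S3-compatible stores, caches, and cross-region replication may lag behind writes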
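One way to "design for it" is to poll for a freshly written object with backoff instead of assuming it is visible immediately. The sketch below is a generic helper, not an API of any storage client; `wait_until_visible` and its parameters are hypothetical names, and `exists` is whatever check your client offers (for example a HEAD request wrapped in a function).

```python
import time


def wait_until_visible(exists, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Poll exists() with exponential backoff; return True once it succeeds."""
    for attempt in range(attempts):
        if exists():
            return True
        if attempt < attempts - 1:
            # Delay doubles on each retry: base_delay, 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
    return False


# Simulated store where the object only becomes visible on the third check.
state = {"checks": 0}


def fake_exists():
    state["checks"] += 1
    return state["checks"] >= 3


print(wait_until_visible(fake_exists, base_delay=0.01))
```

On a strongly consistent store the first check succeeds and the helper costs one extra request; on a lagging one it bounds how long a reader waits before giving up.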

Common Patterns

MinIO Setup (Docker)

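# docker-compose.yml
services:
  minio:
    image: minio/minio
    # API on 9000, web console on 9001
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      # Development credentials only; change them for any shared deployment
      MINIO_ROOT_USER: admin
      MINIO_ROOT_PASSWORD: password
    volumes:
      - minio_data:/data

# Named volumes must be declared at the top level
volumes:
  minio_data: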
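A fresh MinIO container starts with no buckets, so the `lakehouse` bucket used in the rest of this document has to be created once. The sketch below assumes a client exposing `bucket_exists` and `make_bucket`, as the `minio` Python package does; `ensure_bucket` itself is a hypothetical helper.

```python
def ensure_bucket(client, name):
    """Create the bucket if it is missing; return True when it was created."""
    if client.bucket_exists(name):
        return False
    client.make_bucket(name)
    return True


# Usage against the compose service above (assumes `pip install minio`):
#
#   from minio import Minio
#   client = Minio("localhost:9000", access_key="admin",
#                  secret_key="password", secure=False)
#   ensure_bucket(client, "lakehouse")
```

Checking first keeps the call safe to repeat from setup scripts, although two concurrent callers could still race between the check and the create.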

Spark Configuration

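from pyspark.sql import SparkSession

# Requires the hadoop-aws package (and its AWS SDK dependency) on the Spark classpath.
# Path-style access is needed because MinIO by default serves buckets
# under the endpoint path rather than as subdomains.
spark = SparkSession.builder \
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000") \
    .config("spark.hadoop.fs.s3a.access.key", "admin") \
    .config("spark.hadoop.fs.s3a.secret.key", "password") \
    .config("spark.hadoop.fs.s3a.path.style.access", "true") \
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
    .getOrCreate()

# Read every Parquet object under the data/ prefix of the lakehouse bucket
df = spark.read.parquet("s3a://lakehouse/data/")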
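Hardcoded keys are fine for a local demo, but the same settings can be assembled from the environment. This is a minimal sketch: the variable names `S3_ENDPOINT`, `S3_ACCESS_KEY`, and `S3_SECRET_KEY` and the `s3a_conf` helper are assumptions chosen for illustration, not a Spark or Hadoop convention.

```python
import os


def s3a_conf(env=os.environ):
    """Build the s3a settings shown above from environment variables."""
    return {
        "spark.hadoop.fs.s3a.endpoint": env.get("S3_ENDPOINT", "http://minio:9000"),
        # Missing credentials raise KeyError rather than falling back silently.
        "spark.hadoop.fs.s3a.access.key": env["S3_ACCESS_KEY"],
        "spark.hadoop.fs.s3a.secret.key": env["S3_SECRET_KEY"],
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }


# Usage with a Spark session (assumes pyspark is installed):
#
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder
#   for key, value in s3a_conf().items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
```

Failing fast on a missing key tends to be easier to debug than an authentication error surfacing later from inside a Spark job.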

Path Conventions

s3://lakehouse/
├── raw/                    # Bronze layer
│   ├── source_a/
│   │   └── year=2024/month=01/
│   └── source_b/
├── processed/              # Silver layer
│   └── cleaned_events/
├── curated/                # Gold layer
│   └── daily_metrics/
└── checkpoints/            # Streaming checkpoints
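Because the layout is only a naming convention, it helps to generate keys in one place instead of formatting strings ad hoc. The sketch below builds a partition prefix matching the tree above; `partition_prefix` and the `LAYERS` tuple are hypothetical names introduced for illustration.

```python
from datetime import date

# Layer prefixes taken from the layout above.
LAYERS = ("raw", "processed", "curated")


def partition_prefix(layer, dataset, day):
    """Return a Hive-style key prefix such as raw/source_a/year=2024/month=01/."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"{layer}/{dataset}/year={day.year:04d}/month={day.month:02d}/"


print(partition_prefix("raw", "source_a", date(2024, 1, 15)))
# → raw/source_a/year=2024/month=01/
```

Zero-padding the month keeps keys in chronological order when they are listed lexicographically.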

Lifecycle Policies

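# MinIO client
from minio import Minio
from minio.commonconfig import ENABLED, Filter
from minio.lifecycleconfig import Expiration, LifecycleConfig, Rule

client = Minio("minio:9000", access_key="admin", secret_key="password", secure=False)

# Set lifecycle to expire objects under raw/ after 30 days.
# set_bucket_lifecycle expects a LifecycleConfig object, not a plain dict.
config = LifecycleConfig(
    [
        Rule(
            ENABLED,
            rule_filter=Filter(prefix="raw/"),
            rule_id="expire-old",
            expiration=Expiration(days=30),
        ),
    ],
)
client.set_bucket_lifecycle("lakehouse", config)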
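Before enabling an expiration rule on existing data, it can be useful to preview roughly which objects it would affect. The sketch below assumes a client whose `list_objects(bucket, prefix=..., recursive=True)` yields objects with `object_name` and a timezone-aware `last_modified`, as the `minio` package does; `expiring_objects` is a hypothetical helper, and the server's own expiry timing may differ from this approximation.

```python
from datetime import datetime, timedelta, timezone


def expiring_objects(client, bucket, prefix, days, now=None):
    """List object names under prefix last modified at least `days` days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [
        obj.object_name
        for obj in client.list_objects(bucket, prefix=prefix, recursive=True)
        # Entries without a timestamp (e.g. rolled-up prefixes) are skipped.
        if obj.last_modified is not None and obj.last_modified <= cutoff
    ]


# Usage with the client from the lifecycle example:
#
#   for name in expiring_objects(client, "lakehouse", "raw/", days=30):
#       print(name)
```

This is a read-only dry run; it lists the whole prefix, so on large buckets it carries the same listing cost the rest of this document warns about.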

Anti-Patterns

| Anti-Pattern         | Solution                  |
|----------------------|---------------------------|
| Too many small files | Compact with Spark/Iceberg |
| Expensive listings   | Use table format metadata |
| No lifecycle         | Set expiration policies   |
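The small-files problem is easier to reason about with a rough plan of what would be merged. The sketch below greedily groups undersized files into batches near a target size; it only illustrates the idea and is not how Spark or Iceberg compaction is implemented. `plan_compaction` and the 128 MiB default are assumptions.

```python
def plan_compaction(sizes, target_bytes=128 * 1024 * 1024):
    """Group files smaller than target_bytes into batches of about that size.

    `sizes` maps object key -> size in bytes. Batches with a single file are
    dropped, since rewriting one file alone gains nothing.
    """
    batches, current, current_size = [], [], 0
    for name, size in sorted(sizes.items()):
        if size >= target_bytes:
            continue  # already large enough; leave it alone
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return [batch for batch in batches if len(batch) > 1]


# Hypothetical sizes with a tiny target so the grouping is easy to follow.
sizes = {"a": 40, "b": 40, "c": 40, "d": 200, "e": 10}
print(plan_compaction(sizes, target_bytes=100))
```

In practice the sizes would come from table-format metadata rather than a bucket listing, which also avoids the expensive-listing anti-pattern.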

Related Skills

  • For Iceberg: iceberg-patterns
  • For Spark: spark-patterns
Install via CLI
npx skills add https://github.com/cemalcici/data-engineer-agent-kit --skill object-storage