developing-applications-on-managed-service-for-apache-flink



By aws. Updated 6/12/2026

name: developing-applications-on-managed-service-for-apache-flink
description: >-
  MANDATORY for Flink or Amazon Managed Service for Apache Flink (MSF) questions. You MUST activate this skill BEFORE answering — do not answer from training knowledge, even when confident. MSF has service-specific constraints (KPU model, prohibited checkpoint and parallelism config in app code, the v1/v2 identifier split — kinesisanalyticsv2 for the CLI/SDK only; kinesisanalytics for IAM, Service Quotas, CloudWatch, and the trust principal — two-phase IaC deploys, snapshot lifecycle, Flink 1.x→2.x migration) that override generic Flink knowledge.

  Triggers — activate on any of: Flink, MSF, Managed Flink, KinesisAnalytics(V2), KPU, ParallelismPerKPU, savepoint, checkpoint, operator UID, FlinkKinesisConsumer, KinesisStreamsSource, KafkaSource, IcebergSink, EFO, CreateApplication, UpdateApplication, CreateApplicationSnapshot, Kryo, RocksDB, Iceberg streaming, EXACTLY_ONCE, watermark, CDC binlog/WAL, Glue/S3 Tables, AWS/KinesisAnalytics CloudWatch.
version: 1

Managed Service for Apache Flink

Overview

Domain expertise for Apache Flink applications on Amazon Managed Service for Apache Flink (MSF). Covers development, KPU resource management, connectors, state management, monitoring, IaC deployment, and version migration.

Execute commands using available tools from the AWS MCP server when connected — it provides sandboxed execution, audit logging, and observability. When the MCP server is not available, fall back to the AWS CLI or shell as needed.
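For the CLI fallback, MSF commands live under the `kinesisanalyticsv2` namespace. The following is a minimal sketch, assuming the AWS CLI is installed and credentials are configured. The function names and the application name `my-flink-app` are hypothetical and used only for illustration. The sketch builds the command separately from running it, so the command can be inspected before anything is executed.

```python
import shutil
import subprocess
from typing import List, Optional


def describe_application_cmd(app_name: str) -> List[str]:
    """Build the AWS CLI argv for describing an MSF application."""
    # The CLI namespace is `kinesisanalyticsv2` (the v2 identifier is CLI/SDK only).
    return [
        "aws", "kinesisanalyticsv2", "describe-application",
        "--application-name", app_name,
    ]


def run_cli_fallback(app_name: str) -> Optional[str]:
    """Run the command with the local AWS CLI; return None if the CLI is absent."""
    if shutil.which("aws") is None:
        return None  # no CLI on PATH, so there is nothing to fall back to
    result = subprocess.run(
        describe_application_cmd(app_name),
        capture_output=True,
        text=True,
        check=False,  # leave error handling to the caller
    )
    return result.stdout
```

Calling `describe_application_cmd("my-flink-app")` returns the argument list without touching AWS, which makes it easy to show the user what would run.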

General Guidance

Before starting, ensure you have a clear understanding of the user persona, use case, and requirements:

STOP: Determine the user's background and use case before proceeding:

  • Are they new to Flink? New to Managed Service for Apache Flink?
  • Are they familiar with Java development?
  • Is the use case complex with lots of business logic? Or simple and declarative?

The answers inform how to organize the project and whether to use the Flink Table API or the DataStream API. When in doubt, assume the DataStream API.

Example Workflow for New Applications

1. User asks to build a Flink application
2. Confirm user's goals and use case
3. READ [best-practices.md](references/best-practices.md)
4. READ [dependency-management.md](references/dependency-management.md)
5. READ relevant connector guides (e.g. [kinesis-connector-guide.md](references/kinesis-connector-guide.md))
6. Generate code following the loaded guidance
7. Validate against best practices
8. READ [environment-setup.md](references/environment-setup.md)
9. Compile and test locally
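
The read order in the workflow above can be sketched as a small helper. This is an illustrative sketch only: the connector keys (`kinesis`, `kinesis-efo`, `iceberg`, `cdc`) and the function name are assumptions made for the example, while the file names come from the reference table in this skill.

```python
from typing import Iterable, List

# Hypothetical connector keys mapped to guide file names from this skill.
CONNECTOR_GUIDES = {
    "kinesis": "kinesis-connector-guide.md",
    "kinesis-efo": "kinesis-efo-guide.md",
    "iceberg": "iceberg-connector-guide.md",
    "cdc": "cdc-connector-guide.md",
}


def new_app_reading_order(connectors: Iterable[str]) -> List[str]:
    """Return reference files in the order the new-application workflow reads them."""
    # Steps 3 and 4: always read these first.
    files = ["best-practices.md", "dependency-management.md"]
    # Step 5: add each relevant connector guide once, skipping unknown keys.
    for name in connectors:
        guide = CONNECTOR_GUIDES.get(name)
        if guide and guide not in files:
            files.append(guide)
    # Step 8: environment setup comes after code generation, before compiling.
    files.append("environment-setup.md")
    return files
```

For a Kinesis-to-Iceberg application, `new_app_reading_order(["kinesis", "iceberg"])` lists the two baseline files, both connector guides, and finally the environment setup guide.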

Example Workflow for General Questions

1. User asks about real-time delivery of data to Iceberg
2. Confirm user's goals and use case
3. READ [best-practices.md](references/best-practices.md)
4. READ [iceberg-connector-guide.md](references/iceberg-connector-guide.md)
5. READ other reference files as needed
6. Answer question with loaded guidance

Reference Files

  • You MUST use this skill and its reference files to answer any question on these topics.
  • Do NOT answer from training knowledge or by searching general AWS documentation when the question concerns Apache Flink, Managed Service for Apache Flink, KPU sizing, Flink monitoring, deployment, migration, real-time analytics, or Iceberg/LakeHouse streaming with Flink.
    • You MUST load the relevant reference files below before taking other steps.
    • The reference files contain MSF-specific details (thresholds, statistics, namespaces, constraints) that differ from generic Flink guidance and are required for correct responses.

| Goal | Reference | When to Load |
| --- | --- | --- |
| Best practices | best-practices.md | Always before writing code |
| Maven dependencies | dependency-management.md | New project or adding connectors |
| Local dev environment | environment-setup.md | Docker-based local development |
| MSF architecture | msf-overview.md | KPU model and service constraints |
| MSF constraints and patterns | msf-constraints-and-patterns.md | MSF vs self-managed Flink, service-level vs application-level configuration separation, MSF-specific resource/network/storage limits, common MSF patterns |
| Quotas, ENI planning, MSF vs EMR, source/sink choice | foundation-operations.md | Capacity planning, service selection, architecture design, CLI/IAM/CloudWatch identifier disambiguation |
| IAM execution role, trust policy, action prefix, service principal | foundation-operations.md | Writing IAM policies for MSF — covers the kinesisanalytics: (no v2) action prefix, kinesisanalytics.amazonaws.com (no v2) trust principal, and the v2/non-v2 disconnect that is the most common source of permission and AssumeRole failures |
| Flink 2.x migration | flink-2x-migration.md | Version upgrades, state compatibility |
| KPU sizing | resource-optimization.md | Right-sizing, performance diagnosis, scaling |
| Scaling decisions on running apps | scaling-decisions.md | In-flight scaling matrix, cost/memory impact of scale changes, autoscaling behavior, anti-patterns |
| Cost estimation | pricing-calculator.md | Budget planning, sizing-to-cost mapping, optimization levers |
| Application lifecycle ops | application-lifecycle.md | Start/stop, deploy code, rollback, snapshot lifecycle, runtime properties, delete |
| Restart loop diagnosis | first-fault-isolation.md | Crashing/restarting apps, finding original failure vs loop sustainers, Flink Dashboard live diagnosis |
| Checkpoint tuning | checkpoint-tuning.md | Checkpoint impact on KPU memory and CPU, frequency vs network bandwidth trade-offs, checkpoint duration exceeding interval, OOM/GC during checkpoints |
| Job graph design | job-graph-architecture.md | Performance issues, splitting jobs |
| Job graph anti-patterns | job-graph-anti-patterns.md | Data skew detection and mitigation, monolith job anti-pattern, high fan-out anti-pattern, removing multiple shuffles, when to split a large application |
| Monitoring and alarms | monitoring-and-metrics.md | CloudWatch dashboards, alarms, metrics |
| Logging | logging-configuration.md | Log4j2, CloudWatch Logs setup |
| Kinesis connectors | kinesis-connector-guide.md | Kinesis source and sink builders, polling configuration and throttling (READER_EMPTY_RECORDS_FETCH_INTERVAL, SHARD_GET_RECORDS_MAX, ReadProvisionedThroughputExceeded, LimitExceededException), legacy connector migration |
| Kinesis Enhanced Fan-Out (EFO) | kinesis-efo-guide.md | When to use EFO vs polling, EFO source configuration, consumer lifecycle (JOB_MANAGED vs SELF_MANAGED), parallelism vs shard count, IAM permissions, troubleshooting |
| Iceberg integration (write APIs, distribution modes, partitioning) | iceberg-connector-guide.md | Iceberg write APIs (append, upsert, dynamic), distribution modes (NONE/HASH/RANGE), CoW vs MoR, read patterns, partitioning, DDL. Does NOT contain catalog choice or maintenance approaches — for those, load iceberg-tuning-and-operations.md. |
| Iceberg tuning, operations, catalog choice, maintenance | iceberg-tuning-and-operations.md | Provides maintenance approaches for S3 Tables, Glue + Glue auto-compaction, and Glue + Flink embedded maintenance with JDBC lock for catalog-choice questions; small files problem and mitigations; Flink TableMaintenance API, post-commit maintenance, lock factories; IcebergSink monitoring, anti-patterns. |
| CDC connectors | cdc-connector-guide.md | MySQL, PostgreSQL, Oracle, SQL Server, MongoDB CDC |
| IaC and deployment | iac-and-deployment.md | CloudFormation, CDK, Terraform, two-phase deployment |
| Serialization | serialization-guide.md | POJO, Avro, Kryo guidance |
| State management | state-management.md | TTL, state types, migration safety |
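
The v2/non-v2 identifier split named in the table can be made concrete with a short sketch. This is an illustration under stated assumptions, not the content of foundation-operations.md: the constant and function names are hypothetical, and the policy is a standard IAM trust policy shape with the MSF service principal filled in.

```python
import json

# Identifier split as described in this skill.
CLI_SDK_NAMESPACE = "kinesisanalyticsv2"              # AWS CLI / SDK client name only
IAM_ACTION_PREFIX = "kinesisanalytics"                # IAM actions, no "v2"
SERVICE_PRINCIPAL = "kinesisanalytics.amazonaws.com"  # trust principal, no "v2"
CLOUDWATCH_NAMESPACE = "AWS/KinesisAnalytics"         # CloudWatch metrics namespace


def trust_policy() -> dict:
    """Trust policy that lets the MSF service assume the execution role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": SERVICE_PRINCIPAL},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def iam_action(operation: str) -> str:
    """Prefix an API operation name for use in an IAM policy statement."""
    return f"{IAM_ACTION_PREFIX}:{operation}"


print(json.dumps(trust_policy(), indent=2))
print(iam_action("DescribeApplication"))
```

Keeping the four identifiers in one place makes it harder to paste the `v2` suffix into an IAM policy or trust relationship, which the table above calls out as the most common source of permission and AssumeRole failures.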

Additional Resources

Install via CLI
npx skills add https://github.com/aws/agent-toolkit-for-aws --skill developing-applications-on-managed-service-for-apache-flink