The Problem: Manual EKS AMI Updates Are a Pain
Every EKS team knows the drill: a new AMI drops, you manually find the ID, read release notes, assess CVEs, draft a PR, wait for approvals, then carefully roll out nodes. It's slow, error-prone, and easy to deprioritize. The result? Outdated nodes, security gaps, and 2 AM surprises.
The Solution: A Fully Automated Pipeline
Suryansh639 built a pipeline that runs twice daily (9 AM and 9 PM UTC) via EventBridge. It detects new EKS-optimized AMIs by querying AWS SSM Parameter Store (/aws/service/eks/optimized-ami/1.34/amazon-linux-2023/recommended/image_id), compares the result against the current AMI in your Git repo, and triggers a Step Functions workflow if the two differ.
Phase 1: Detection
A Lambda fetches the latest AMI ID from SSM and checks your GitHub repository (your source of truth). No new AMI? The Lambda exits silently. Difference detected? The Step Functions state machine kicks off.
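To make the detection step concrete, here is a minimal Python sketch of the comparison. It is not the repo's actual Lambda code: the function names are hypothetical, and it assumes the manifest pins the AMI as a literal ami-... ID. The SSM parameter name is passed in rather than hard-coded, because AWS's published AL2023 paths usually include an architecture and variant segment (for example x86_64/standard), so check the exact path in your account.

```python
import re
from typing import Optional

# An AMI ID is "ami-" followed by 8 (legacy) to 17 hex characters.
AMI_ID_PATTERN = re.compile(r"\bami-[0-9a-f]{8,17}\b")


def fetch_latest_ami(parameter_name: str) -> str:
    """Read the recommended AMI ID from SSM Parameter Store."""
    import boto3  # imported lazily so the pure helpers below run without AWS access

    ssm = boto3.client("ssm")
    return ssm.get_parameter(Name=parameter_name)["Parameter"]["Value"]


def current_ami_in_manifest(manifest_text: str) -> Optional[str]:
    """Return the first AMI ID found in the manifest text, or None."""
    match = AMI_ID_PATTERN.search(manifest_text)
    return match.group(0) if match else None


def needs_update(latest_ami: str, manifest_text: str) -> bool:
    """True when the manifest pins a different AMI than SSM recommends.

    A manifest with no AMI ID at all also counts as a difference.
    """
    return current_ami_in_manifest(manifest_text) != latest_ami
```

In a real detector, needs_update returning False would be the "exit silently" branch, and True would be the point where the state machine execution is started.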
Phase 2: AI Analysis + Pull Request
Step Functions orchestrates three Lambdas in sequence:
- bedrock-analyzer: Fetches actual release notes from awslabs/amazon-eks-ami and sends them to Amazon Bedrock (Claude 3.5 Haiku) with a prompt that asks for a JSON response containing risk_score (1-10), recommendation (APPROVE/REJECT), summary, and pr_description (full markdown).
- gitops-updater: Uses GitHub App credentials from Secrets Manager to create a branch, update the Karpenter EC2NodeClass YAML with the new AMI ID, and open a PR with the Bedrock analysis as the description.
- send-notification: Emails the team via SNS with the PR link and AI summary.
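Since the analyzer asks the model for JSON, it is worth validating the reply before anything downstream trusts it. The sketch below is one way to do that in Python. The field names come from the description above, but the function name parse_analysis and the strict integer check on risk_score are assumptions, not the repo's code.

```python
import json

REQUIRED_FIELDS = ("risk_score", "recommendation", "summary", "pr_description")
ALLOWED_RECOMMENDATIONS = {"APPROVE", "REJECT"}


def parse_analysis(raw_text: str) -> dict:
    """Parse the model's reply and reject anything outside the expected shape."""
    analysis = json.loads(raw_text)  # raises a ValueError subclass on invalid JSON
    if not isinstance(analysis, dict):
        raise ValueError("analysis must be a JSON object")

    missing = [field for field in REQUIRED_FIELDS if field not in analysis]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")

    score = analysis["risk_score"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 10:
        raise ValueError("risk_score must be an integer from 1 to 10")

    if analysis["recommendation"] not in ALLOWED_RECOMMENDATIONS:
        raise ValueError("recommendation must be APPROVE or REJECT")

    return analysis
```

Every failure surfaces as a ValueError, so a caller can catch one exception type and fail the Step Functions task instead of opening a PR with a malformed description.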
The human's only job: read the AI analysis, check the one-line YAML diff, and merge (to approve) or close (to reject).
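The one-line YAML diff mentioned above is easiest to guarantee if the updater touches nothing but the AMI ID. Here is a hedged Python sketch of that idea. It assumes the EC2NodeClass pins exactly one AMI by literal ID (for example under spec.amiSelectorTerms), and replace_ami_id is a hypothetical helper, not the repo's function.

```python
import re

# An AMI ID is "ami-" followed by 8 (legacy) to 17 hex characters.
AMI_ID_PATTERN = re.compile(r"\bami-[0-9a-f]{8,17}\b")


def replace_ami_id(manifest_text: str, new_ami: str) -> str:
    """Swap the single AMI ID in the manifest, leaving all other text intact."""
    if not AMI_ID_PATTERN.fullmatch(new_ami):
        raise ValueError(f"not a valid AMI ID: {new_ami!r}")

    updated, count = AMI_ID_PATTERN.subn(new_ami, manifest_text)
    if count != 1:
        # Zero or several IDs means the file is not what we expect; refuse to guess.
        raise ValueError(f"expected exactly one AMI ID in manifest, found {count}")
    return updated
```

A text substitution is used here instead of parsing and re-serializing the YAML so that comments, key order, and whitespace survive, which keeps the PR diff to the single changed line.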
Phase 3: GitOps Deployment
Once merged:
- ArgoCD detects the commit on main and auto-syncs the updated EC2NodeClass manifest to the EKS cluster.
- Karpenter provisions new EC2 nodes with the updated AMI, then gracefully drains old nodes.
- Zero downtime. No kubectl needed.
What the PR Looks Like
The PR description includes the AI's analysis, e.g.:
## EKS AMI Update — ami-04b406d4e6eaca578
**AI Risk Score: 2/10 — APPROVE**
### What changed
- Go updated to 1.25.9
- Kernel updated to 6.12.79-101.147.amzn2023
- No new CVEs introduced
### CVE Assessment
No critical or high-severity CVEs. Two previously known CVEs patched.
Your reviewer doesn't need to dig through release notes — the AI already did.
Deployment: Single CloudFormation Stack
The entire solution deploys from one CloudFormation template. It provisions:
- 4 Lambda functions (detector, analyzer, PR creator, notifier)
- 5 IAM roles (least-privilege per function)
- Amazon Bedrock Guardrail
- Step Functions state machine
- EventBridge rule
- SNS topic + subscription
- Secrets Manager entry
Deploy with:
aws cloudformation create-stack \
  --stack-name eks-ami-update \
  --template-body file://cloudformation-template.yaml \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameters \
    ParameterKey=NotificationEmail,Value=your@email.com \
    ParameterKey=GitHubAppId,Value= \
    ParameterKey=GitHubAppInstallationId,Value= \
    ParameterKey=GitHubAppPrivateKey,Value=$(base64 -i app.pem | tr -d '\n') \
    ParameterKey=GitHubRepoOwner,Value= \
    ParameterKey=GitHubRepoName,Value= \
    ParameterKey=GitHubFilePath,Value=karpenter-configs/clusters/your-cluster/nodeclass.yaml \
    ParameterKey=GitHubBranch,Value=main \
    ParameterKey=EKSVersion,Value=1.34
Stack creation takes about 2-3 minutes. Once it finishes, confirm the SNS subscription from the email sent to your notification address.
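If you would rather script the deployment than paste a long CLI command, the same call can be made with boto3. This is a rough sketch under assumptions: the parameter keys are the ones in the command above, while the helper names (encode_private_key, stack_parameters, create_stack) are made up for illustration.

```python
import base64


def encode_private_key(pem_bytes: bytes) -> str:
    """Base64-encode the PEM as one line, matching the base64 + tr pipeline above."""
    return base64.b64encode(pem_bytes).decode("ascii")


def stack_parameters(values: dict) -> list:
    """Convert {"Key": "value"} pairs into the list shape the CloudFormation API expects."""
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in values.items()
    ]


def create_stack(template_body: str, values: dict) -> str:
    """Create the stack and return its ID. Requires AWS credentials."""
    import boto3  # imported lazily so the helpers above run without AWS access

    cfn = boto3.client("cloudformation")
    response = cfn.create_stack(
        StackName="eks-ami-update",
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_NAMED_IAM"],
        Parameters=stack_parameters(values),
    )
    return response["StackId"]
```

You would call create_stack with the contents of cloudformation-template.yaml and a dict such as {"NotificationEmail": "your@email.com", "EKSVersion": "1.34", ...}, filling in your own GitHub App values.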
Prerequisites
- Existing EKS cluster (v1.34+)
- Karpenter installed and configured
- ArgoCD installed with auto-sync enabled
- GitHub repository for Karpenter configs
- GitHub App installed on that repo (App ID, Installation ID, Private Key)
- Amazon Bedrock enabled in your region (Claude 3.5 Haiku access)
- AWS CLI + kubectl configured
Testing the Pipeline
Trigger the detector manually:
aws lambda invoke \
  --function-name eks-ami-detector \
  --payload '{}' \
  --cli-binary-format raw-in-base64-out \
  /tmp/response.json && cat /tmp/response.json
You should get an SNS email with the risk analysis and PR link within minutes.
Common Issues
- SNS subscription not confirmed: Check your spam folder for the confirmation email.
- GitHub App auth failure: Verify the App is installed on the correct repo with read/write permissions.
- Bedrock access denied: Enable Claude 3.5 Haiku in Bedrock console.
- ArgoCD not syncing: Ensure spec.syncPolicy.automated is set.
- Step Functions failures: Check CloudWatch Logs for IAM or missing secret issues.
Why This Architecture Works
- GitHub PRs as approval gate: Engineers already live in GitHub. No new tool, built-in commenting, permanent audit trail.
- AI analysis on real release notes: The prompt uses actual notes from awslabs/amazon-eks-ami, not hallucinated data.
- Karpenter over managed node groups: No drain/cordon scripts needed; Karpenter handles node lifecycle automatically.
- Least-privilege IAM: Five separate roles, each with minimal permissions.
- Bedrock Guardrail: Content filtering on AI output.
What's Next
The author suggests adding:
- Slack notifications instead of (or in addition to) SNS email
- Dry-run mode (log analysis without opening PR)
- Multi-cluster support (different approval thresholds per environment)
- Custom risk criteria (PCI-DSS, SOC 2)
- Automatic REJECT on critical CVEs (skip PR entirely, alert team)
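Two of these ideas (per-environment thresholds and skipping the PR on a REJECT) could share one small decision function. The sketch below is purely illustrative: the environment names, the threshold numbers, and next_action itself are all hypothetical, and nothing like it exists in the pipeline as described.

```python
# Hypothetical ceilings on the AI risk score (1-10) for each environment.
MAX_RISK_BY_ENVIRONMENT = {"dev": 8, "staging": 6, "prod": 4}


def next_action(analysis: dict, environment: str) -> str:
    """Return "open_pr" or "alert_only" for an already-validated analysis."""
    if analysis["recommendation"] == "REJECT":
        # Skip the PR entirely and just alert the team.
        return "alert_only"
    if analysis["risk_score"] > MAX_RISK_BY_ENVIRONMENT[environment]:
        # Too risky for this environment even though the model approved.
        return "alert_only"
    return "open_pr"
```

With these example thresholds, an APPROVE with a risk score of 5 would open a PR for staging but only alert for prod.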
Get the Code
Fork the repo: suryansh639/sample-eks-ami-gitops-pipeline. The CloudFormation template, Lambda code, and example Karpenter configs are all there.
The Right Split
The goal wasn't to remove humans — it was to remove the boring part. AI reads release notes, writes PR description. Human decides. Automation executes. Your nodes get updated on time, every time, with a full audit trail and no 2 AM surprises.
