Suspending and Resuming the EKS Clusters
Objective
Shut down an idle cluster to save costs without deleting cluster data, with the ability to fully recover.
This script is not intended for use on production clusters. It is designed for non-production clusters, particularly during the bootstrapping process for idle clusters.
Background
When the script is executed, the following resources are suspended:
- EKS Controller Nodes
- EKS Karpenter Nodes
- EC2 NAT Nodes
- Load Balancers and IP Addresses
These resources are considered non-essential during periods when the cluster is not actively in use and can be safely suspended to reduce AWS charges to ~$3 a day.
Prerequisites
Before running the suspend or resume script, the following must be completed:
Set up AWS Root Profile: Ensure you have configured an AWS root profile that the script will use to interact with the necessary AWS services.
Update the
.kube/config.user.yaml
: Runpf-update-kube
to ensure your Kubernetes configuration is up to date and ready for cluster operations.
Suspending a Cluster
You are about to suspend the cluster. Please note that the cluster’s certificates will expire in 90 days. If the cluster is not resumed before then, the certificates will not be renewed automatically, and the cluster may require manual intervention to recover.
To avoid this, ensure that you run the resume script before the 90-day limit to allow the cluster to gracefully renew its certificates and restore its normal operation.
To suspend the AWS resources:
- Run
pf-eks-suspend
. - The script will prompt you to select an EKS cluster that has been set up.
- Once the cluster is confirmed, the script proceeds to shut down the defined resources associated with the selected cluster.
Resuming a Cluster
To resume the suspended resources:
- Run
pf-eks-resume
. - The script will prompt you to select an EKS cluster that has previously been suspended.
- The script will validate that the cluster was suspended using
pf-eks-suspend
by checking for thepanfactum.com/suspended=true
tag. If this tag is not found, the resume process will fail, ensuring only properly suspended clusters are resumed.