Kubernetes StatefulSet
Provides a production-hardened instance of a Kubernetes StatefulSet with the following enhancements:
- Automatic headless service creation
- Standardized resource labels
- Pod and container security hardening
- Persistent volume creation and mounting with automatic integrations with the
- PVC Autoresizer and Velero
- Temporary directory mounting
- ConfigMap and Secret mounting
- Downward-API integrations
- Service account configuration with default permissions
- Integration with the Panfactum bin-packing scheduler
- High-availability scheduling constraints
- Readiness and liveness probe configurations
- Automatic reloading via the Reloader
- Vertical pod autoscaling
- Pod disruption budget
- Toleration switches for the various Panfactum node classes
Usage
Basics
This module follows the basic workload deployment patterns describe in this guide.
Horizontal Autoscaling
By default, this module does not have horizontal autoscaling built-in. If you wish to add horizontal autoscaling via the HPA (or similar controller), you should set ignore_replica_count
to true
to prevent this module from overriding the replica count set via horizontal autoscaling.
Persistence
One of the core use cases for a StatefulSet is the ability to persistent data across pod restarts through the use of Persistent Volume Claims (PVCs).
You can configure the StatefulSet’s PVCs via the volume_mounts
input. This input is a map of names (arbitrary) to configuration values for each volume that should mounted to every pod in the StatefulSet.
The configuration values are as follows:
storage_class
: The Storage Class to use for the volume. To learn more about the available storage class options, see our guide.initial_size_gb
: The size of the volume when it is first created.increase_gb
: How much the volume will grow every time it is autoscaled by the PVC autoresizer.increase_threshold_percent
: When free storage drops below this percent on the volume, the volume will be autoscaled.size_limit_gb
: The maximum size the volume is allowed to grow to.mount_path
: Absolute path inside each container that the volume is mounted to.backups_enabled
: Whether the PVC snapshots will be created when Velero backups are created (by default hourly).
PVCs can only be autoscaled every six hours (AWS limitation), so you must choose autoscaling parameters that ensure autoscaling can keep pace with your data growth rate.
You can configure the retention policy of the volumes through the volume_retention_policy
input.
Providers
The following providers are needed by this module:
kubectl (2.1.3)
kubernetes (2.34.0)
pf (0.0.7)
random (3.6.3)
time (0.10.0)
Required Inputs
The following input variables are required:
containers
Description: A list of container configurations for the pod
Type:
list(object({ name = string # A unique name for the container within the pod init = optional(bool, false) # Iff true, the container will be an init container image_registry = string # The URL for a container image registry (e.g., docker.io) image_repository = string # The path to the image repository within the registry (e.g., library/nginx) image_tag = string # The tag for a specific image within the repository (e.g., 1.27.1) image_prepull_enabled = optional(bool, true) # Whether the image will be prepulled to nodes when the nodes are first created (speeds up startup times) image_pin_enabled = optional(bool, true) # Whether the image should be pinned to every node regardless of whether the container is running or not (speeds up startup times) command = list(string) # The command to be run as the root process inside the container working_dir = optional(string, null) # The directory the command will be run in. If left null, will default to the working directory set by the image image_pull_policy = optional(string, "IfNotPresent") # Sets the container's ImagePullPolicy minimum_memory = optional(number, 100) #The minimum amount of memory in megabytes maximum_memory = optional(number, null) #The maximum amount of memory in megabytes memory_limit_multiplier = optional(number, 1.3) # memory limits = memory request x this value minimum_cpu = optional(number, 10) # The minimum amount of cpu millicores maximum_cpu = optional(number, null) # The maximum amount of cpu to allow (in millicores) privileged = optional(bool, false) # Whether to allow the container to run in privileged mode run_as_root = optional(bool, false) # Whether to run the container as root uid = optional(number, 1000) # user to use when running the container if not root linux_capabilities = optional(list(string), []) # Default is drop ALL read_only = optional(bool, true) # Whether to use a readonly file system env = optional(map(string), {}) # Environment variables specific to the container liveness_probe_command = optional(list(string), null) # Will run the specified command as the liveness probe if type is exec liveness_probe_port = optional(number, null) # The number of the port for the liveness_probe liveness_probe_type = optional(string, null) # Either exec, HTTP, or TCP liveness_probe_route = optional(string, null) # The route if using HTTP liveness_probes liveness_probe_scheme = optional(string, "HTTP") # HTTP or HTTPS readiness_probe_command = optional(list(string), null) # Will run the specified command as the ready check probe if type is exec (default to liveness_probe_command) readiness_probe_port = optional(number, null) # The number of the port for the ready check (default to liveness_probe_port) readiness_probe_type = optional(string, null) # Either exec, HTTP, or TCP (default to liveness_probe_type) readiness_probe_route = optional(string, null) # The route if using HTTP ready checks (default to liveness_probe_route) readiness_probe_scheme = optional(string, null) # Whether to use HTTP or HTTPS (default to liveness_probe_scheme) ports = optional(map(object({ # Keys are the port names, and the values are the port configuration. port = number # Port on the backing pods that traffic should be routed to service_port = optional(number, null) # Port to expose on the service. defaults to port protocol = optional(string, "TCP") # One of TCP, UDP, or SCTP expose_on_service = optional(bool, true) # Whether this port should be listed on the StatefulSet's service })), {}) }))
name
Description: The name of this workload
Type: string
namespace
Description: The namespace the cluster is in
Type: string
volume_mounts
Description: A mapping of names to configuration for PersistentVolumeClaims used by the StatefulSet
Type:
map(object({ storage_class = optional(string, "ebs-standard-retained") access_modes = optional(list(string), ["ReadWriteOnce"]) initial_size_gb = optional(number, 1) # The initial size of the volume when first created size_limit_gb = optional(number, null) # The maximum number of GB that this volume will scale to increase_threshold_percent = optional(number, 20) # Dropping below this percent of free storage will trigger an automatic increase in storage size increase_gb = optional(number, 1) # The number of GB to increase the volume by when it needs to scale up mount_path = string # Where in the containers to mount the volume backups_enabled = optional(bool, true) # True iff velero should make snapshot backups of the volumes }))
Optional Inputs
The following input variables are optional (have default values):
arm_nodes_enabled
Description: Whether to allow pods to schedule on arm64 nodes
Type: bool
Default: true
az_anti_affinity_required
Description: Whether to prevent pods from being scheduled on the same availability zone
Type: bool
Default: false
az_spread_preferred
Description: Whether to enable topology spread constraints to spread pods across availability zones (with ScheduleAnyways)
Type: bool
Default: false
az_spread_required
Description: Whether to enable topology spread constraints to spread pods across availability zones (with DoNotSchedule)
Type: bool
Default: true
burstable_nodes_enabled
Description: Whether to allow pods to schedule on burstable nodes
Type: bool
Default: false
cilium_required
Description: True iff the Cilium CNI is required to be installed on a node prior to scheduling on it
Type: bool
Default: true
common_env
Description: Key pair values of the environment variables for each container
Type: map(string)
Default: {}
common_env_from_config_maps
Description: Environment variables that are sourced from existing Kubernetes ConfigMaps. The keys are the environment variables names and the values are the ConfigMap references.
Type:
map(object({ config_map_name = string key = string }))
Default: {}
common_env_from_secrets
Description: Environment variables that are sourced from existing Kubernetes Secrets. The keys are the environment variables names and the values are the Secret references.
Type:
map(object({ secret_name = string key = string }))
Default: {}
common_secrets
Description: Key pair values of secrets to add to the containers as environment variables
Type: map(string)
Default: {}
config_map_mounts
Description: A mapping of ConfigMap names to their mount configuration in the containers of the Pod
Type:
map(object({ mount_path = string # Where in the containers to mount the ConfigMap optional = optional(bool, false) # Whether the pod can launch if this ConfigMap does not exist }))
Default: {}
controller_nodes_enabled
Description: Whether to allow pods to schedule on EKS Node Group nodes (controller nodes)
Type: bool
Default: false
controller_nodes_required
Description: Whether the pods must be scheduled on a controller node
Type: bool
Default: false
dns_policy
Description: The DNS policy for the pods
Type: string
Default: "ClusterFirst"
extra_annotations
Description: A map of extra annotations that will be added to the StatefulSet (not the pods)
Type: map(string)
Default: {}
extra_labels
Description: A map of extra labels that will be added to the StatefulSet (not the pods)
Type: map(string)
Default: {}
extra_pod_annotations
Description: Annotations to add to the pods in the Pod
Type: map(string)
Default: {}
extra_pod_labels
Description: Extra pod labels to use
Type: map(string)
Default: {}
extra_tolerations
Description: Extra tolerations to add to the pods
Type:
list(object({ key = optional(string) operator = string value = optional(string) effect = optional(string) }))
Default: []
host_anti_affinity_required
Description: Whether to prefer preventing pods from being scheduled on the same host
Type: bool
Default: true
ignore_replica_count
Description: Whether to ignore changes to the replica count. When this is true, ‘replicas’ will ONLY be used at initial Deployment creation. Useful when implementing horizontal autoscaling.
Type: bool
Default: false
instance_type_anti_affinity_required
Description: Whether to enable anti-affinity to prevent pods from being scheduled on the same instance type. Defaults to true iff sla_target == 3.
Type: bool
Default: null
linkerd_enabled
Description: True iff the Linkerd sidecar should be injected into the pods
Type: bool
Default: true
linkerd_required
Description: True iff the Linkerd CNI is required to be installed on a node prior to scheduling on it
Type: bool
Default: true
max_unavailable
Description: Controls how many pods are allowed to be unavailable in the StatefulSet under the Pod Disruption Budget
Type: number
Default: 1
mount_owner
Description: The ID of the group that owns the mounted volumes
Type: number
Default: 1000
node_image_cached_enabled
Description: Whether to add the container images to the node image cache for faster startup times
Type: bool
Default: true
node_preferences
Description: Node label preferences for the pods
Type: map(object({ weight = number, operator = string, values = list(string) }))
Default: {}
node_requirements
Description: Node label requirements for the pods
Type: map(list(string))
Default: {}
panfactum_scheduler_enabled
Description: Whether to use the Panfactum pod scheduler with enhanced bin-packing
Type: bool
Default: true
pod_management_policy
Description: The StatefulSets pod management policy
Type: string
Default: "OrderedReady"
pod_version_labels_enabled
Description: Whether to add version labels to the Pod. Useful for ensuring pods do not get recreated on frequent updates.
Type: bool
Default: true
priority_class_name
Description: The priority class to use for pods in the StatefulSet
Type: string
Default: null
pull_through_cache_enabled
Description: Whether to use the ECR pull through cache for the deployed images
Type: bool
Default: true
replicas
Description: The desired number of pods in the StatefulSet
Type: number
Default: 1
restart_policy
Description: The pod restart policy
Type: string
Default: "Always"
secret_mounts
Description: A mapping of Secret names to their mount configuration in the containers of the Pod
Type:
map(object({ mount_path = string # Where in the containers to mount the Secret optional = optional(bool, false) # Whether the pod can launch if this Secret does not exist }))
Default: {}
service_ip
Description: If provided, the StatefulSet’s service will be statically bound to this IP address. Must be within the Service IP CIDR range for the cluster.
Type: string
Default: null
service_load_balancer_class
Description: Iff service_type == LoadBalancer, the loadBalancerClass to use.
Type: string
Default: "service.k8s.aws/nlb"
service_name
Description: If provided, the StatefulSet’s service will have this name. If not provided, will default to name.
Type: string
Default: null
service_public_domain_names
Description: Iff service_type == LoadBalancer, the public domains names that this service will be accessible from.
Type: list(string)
Default: []
service_type
Description: The type of the StatefulSet’s Service.
Type: string
Default: "ClusterIP"
spot_nodes_enabled
Description: Whether to allow pods to schedule on spot nodes
Type: bool
Default: true
termination_grace_period_seconds
Description: The number of seconds to wait for graceful termination before forcing termination
Type: number
Default: 30
tmp_directories
Description: A mapping of temporary directory names (arbitrary) to their configuration
Type:
map(object({ mount_path = string # Where in the containers to mount the temporary directories size_mb = optional(number, 100) # The number of MB to allocate for the directory node_local = optional(bool, false) # If true, the temporary storage will come from the node rather than a PVC }))
Default: {}
unhealthy_pod_eviction_policy
Description: Whether to allow unhealthy pods to be evicted. See https://kubernetes.io/docs/tasks/run-application/configure-pdb/#unhealthy-pod-eviction-policy.
Type: string
Default: "AlwaysAllow"
update_type
Description: The type of update that the StatefulSEt should use
Type: string
Default: "RollingUpdate"
volume_retention_policy
Description: The persistentVolumeClaimRetentionPolicy to use of the StatefulSet
Type:
object({ when_deleted = optional(string, "Retain") when_scaled = optional(string, "Retain") })
Default:
{ "when_deleted": "Retain", "when_scaled": "Retain"}
voluntary_disruption_window_cron_schedule
Description: The times when disruption windows should start
Type: string
Default: "0 0/4 * * *"
voluntary_disruption_window_enabled
Description: Whether to confine voluntary disruptions of pods in this module to specific time windows
Type: bool
Default: false
voluntary_disruption_window_seconds
Description: The length of the disruption window in seconds
Type: number
Default: 900
voluntary_disruptions_enabled
Description: Whether to enable voluntary disruptions of pods in this module.
Type: bool
Default: true
vpa_enabled
Description: Whether to enable the vertical pod autoscaler
Type: bool
Default: true
Outputs
The following outputs are exported:
headless_service_name
Description: The name of the headless service where StatefulSet pods are registered
labels
Description: The labels assigned to all resources in this deployment
match_labels
Description: The labels unique to this deployment that can be used to select the pods in this deployment
service_account_name
Description: The service account used for the pods
service_name
Description: The name of the service for the deployment
Usage
No notes