Panfactum Monitoring Stack
This module provides a deployment of the kube-prometheus-stack. It includes:
-
Grafana
-
Alert Manager
Providers
The following providers are needed by this module:
-
aws (5.70.0)
-
helm (2.12.1)
-
kubectl (2.0.4)
-
kubernetes (2.27.0)
-
pf (0.0.3)
-
random (3.6.0)
-
time (0.10.0)
-
vault (3.25.0)
Required Inputs
The following input variables are required:
eks_cluster_name
Description: The name of the EKS cluster.
Type: string
grafana_domain
Description: The domain on which to expose Grafana.
Type: string
vault_domain
Description: The domain of the Vault instance running in the cluster.
Type: string
Optional Inputs
The following input variables are optional (have default values):
additional_tracked_resource_labels
Description: Kubernetes resource labels to include in metric labels
Type: list(string)
Default: []
additional_tracked_resources
Description: Additional Kubernetes resources to track in kube-state-metrics
Type: list(string)
Default: []
alertmanager_local_storage_initial_size_gb
Description: Number of GB to use for the local alertmanager storage (before autoscaled)
Type: number
Default: 2
alertmanager_log_level
Description: The log level for the alertmanager pods
Type: string
Default: "warn"
alertmanager_storage_class_name
Description: The storage class to use for local alertmanager storage
Type: string
Default: "ebs-standard"
aws_iam_ip_allow_list
Description: A list of IPs that can use the service account token to authenticate with AWS API
Type: list(string)
Default: []
enhanced_ha_enabled
Description: Whether to add extra high-availability scheduling constraints at the trade-off of increased cost
Type: bool
Default: true
grafana_basic_auth_enabled
Description: Whether to enable username and password authentication. Should only be enabled during debugging.
Type: bool
Default: true
grafana_db_recovery_directory
Description: The name of the directory in the backup bucket that contains the Grafana PostgreSQL backups and WAL archives
Type: string
Default: null
grafana_db_recovery_mode_enabled
Description: Whether to enable recovery mode for the Grafana PostgreSQL database
Type: bool
Default: false
grafana_db_recovery_target_time
Description: If provided, will recover the Grafana PostgreSQL database to the indicated target time in RFC 3339 format rather than to the latest data.
Type: string
Default: null
grafana_log_level
Description: The log level for the grafana pods
Type: string
Default: "error"
ingress_enabled
Description: Whether or not to enable the ingress for routing public traffic to prometheus stack components
Type: bool
Default: false
kube_api_server_monitoring_enabled
Description: Whether to enable monitoring of the API server
Type: bool
Default: false
kube_prometheus_stack_version
Description: The version of the kube-prometheus-stack to deploy
Type: string
Default: "58.5.3"
loki_chart_version
Description: The version of the grafana/loki helm chart to deploy
Type: string
Default: "6.6.2"
loki_storage_class_name
Description: The storage class to use for local loki storage
Type: string
Default: "ebs-standard"
metrics_retention_resolution_1h
Description: Number of days 1h metrics resolution should be kept
Type: number
Default: 1825
metrics_retention_resolution_5m
Description: Number of days 5m metrics resolution should be kept
Type: number
Default: 90
metrics_retention_resolution_raw
Description: Number of days the raw metrics resolution should be kept
Type: number
Default: 15
monitoring_enabled
Description: Whether to add active monitoring to the deployed systems
Type: bool
Default: false
monitoring_etcd_enabled
Description: Whether to monitor the Kubernetes API server's etcd instances. Only enable for debugging purposes as it contains a huge amount of metrics.
Type: bool
Default: false
panfactum_scheduler_enabled
Description: Whether to use the Panfactum pod scheduler with enhanced bin-packing
Type: bool
Default: true
prometheus_default_scrape_interval_seconds
Description: The default interval between prometheus scrapes (in seconds)
Type: number
Default: 60
prometheus_local_storage_initial_size_gb
Description: Number of GB to use for the local prometheus storage (before autoscaled)
Type: number
Default: 2
prometheus_log_level
Description: The log level for the prometheus pods
Type: string
Default: "warn"
prometheus_operator_log_level
Description: The log level for the prometheus operator pods
Type: string
Default: "warn"
prometheus_storage_class_name
Description: The storage class to use for local prometheus storage
Type: string
Default: "ebs-standard"
pull_through_cache_enabled
Description: Whether to use the ECR pull through cache for the deployed images
Type: bool
Default: true
thanos_bucket_web_domain
Description: Domain to host the Thanos bucket web UI on. If not provided, will be on the same subdomain as grafana but with the thanos-bucket identifier (thanos-bucket.<grafana-subdomain>)
Type: string
Default: null
thanos_bucket_web_enable
Description: Whether to enable the web dashboard for the Thanos bucket analyzer which can show debugging information about your metrics data
Type: bool
Default: true
thanos_chart_version
Description: The version of the bitnami/thanos helm chart to deploy
Type: string
Default: "15.4.7"
thanos_compactor_disk_storage_gb
Description: Number of GB for the ephemeral thanos compactor disk. See https://thanos.io/tip/components/compact.md/#disk
Type: number
Default: 100
thanos_compactor_storage_class_name
Description: The storage class to use for the thanos compactor local storage
Type: string
Default: "ebs-standard"
thanos_image_version
Description: The version of thanos images to use
Type: string
Default: "v0.35.0"
thanos_log_level
Description: The log level for the thanos pods
Type: string
Default: "warn"
thanos_ruler_storage_class_name
Description: The storage class to use for the thanos ruler local storage
Type: string
Default: "ebs-standard"
thanos_store_gateway_storage_class_name
Description: The storage class to use for the thanos store gateway local storage
Type: string
Default: "ebs-standard"
vpa_enabled
Description: Whether the VPA resources should be enabled
Type: bool
Default: true
Outputs
The following outputs are exported:
bucket_web_url
Description: n/a
grafana_admin_password
Description: n/a
grafana_db_recovery_directory
Description: The name of the directory in the backup bucket that contains the Grafana PostgreSQL backups and WAL archives
namespace
Description: n/a
thanos_query_frontend_url
Description: n/a
Usage
No notes