kube_airbyte
Alpha
Direct

Airbyte

This module deploys Airbyte onto a Kubernetes cluster with a focus on AWS infrastructure, though it can be adapted for other cloud providers.

Scope and Connectors

This module only deploys the core Airbyte engine components required for the platform to function. It does not include or configure any source or destination connectors, which must be installed separately after deployment. The Airbyte platform provides a connector catalog within its user interface where administrators can install the specific connectors needed for their data integration workflows.

To install connectors:

  1. After deployment, log in to the Airbyte UI using the credentials provided
  2. Navigate to the “Sources” or “Destinations” section
  3. Search for and install the required connectors from the catalog

For custom connector development, this module includes the Connector Builder Server component, which provides a development environment for creating and testing custom connectors to meet specialized integration needs.

If you need to pre-install specific connectors or automate connector configuration, consider implementing additional Terraform modules that interact with the Airbyte API after core deployment is complete.
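
As a minimal sketch of that approach, a follow-on Terragrunt module could read this module's airbyte_url output and pass it to whatever Terraform code talks to the Airbyte API. The airbyte_connectors directory and its airbyte_url input are hypothetical names used only for illustration; only the dependency output comes from this module.

    # airbyte_connectors/terragrunt.hcl (hypothetical follow-on module, fragment only)
    dependency "airbyte" {
      config_path = "../kube_airbyte"
    }

    inputs = {
      # Hypothetical input; the real name depends on how your module is written
      airbyte_url = dependency.airbyte.outputs.airbyte_url
    }

Declaring the dependency this way also makes Terragrunt apply kube_airbyte before the connector module during run-all operations.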

Usage

  1. Create a new directory adjacent to your aws_eks module called kube_airbyte.

  2. Add a terragrunt.hcl file to the directory that looks like this:

    include "panfactum" {
      path   = find_in_parent_folders("panfactum.hcl")
      expose = true
    }

    terraform {
      source = include.panfactum.locals.pf_stack_source
    }

    dependency "vault" {
      config_path = "../kube_vault"
    }

    inputs = {
      vault_domain = dependency.vault.outputs.vault_domain

      # Must be a domain available to the cluster
      # Example: airbyte.prod.panfactum.com
      domain = "REPLACE_ME"

      # Must be an email address that you have access to
      # Example: james@panfactum.com
      admin_email = "REPLACE_ME"
    }

  3. Run pf-tf-init to enable the required providers.

  4. Run terragrunt apply.
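
Any of the optional inputs listed below can be set in the same inputs block. The following fragment is a sketch only: the bucket ARN is a placeholder, and "INFO" is assumed to be an accepted log level.

    inputs = {
      vault_domain = dependency.vault.outputs.vault_domain
      domain       = "REPLACE_ME"
      admin_email  = "REPLACE_ME"

      # Optional overrides (illustrative values)
      worker_replicas          = 2
      log_level                = "INFO" # assumed to be a valid level; the default is "WARN"
      connected_s3_bucket_arns = ["arn:aws:s3:::example-airbyte-destination"] # placeholder ARN
    }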

Authentication

The module uses Vault for authentication when ingress is enabled, providing secure access to the Airbyte UI.

Providers

The following providers are needed by this module:

Required Inputs

The following input variables are required:

admin_email

Description: Email for the admin user when auth is enabled

Type: string

domain

Description: The domain to access Airbyte (e.g., airbyte.example.com)

Type: string

vault_domain

Description: The domain where Vault is accessible

Type: string

Optional Inputs

The following input variables are optional (have default values):

airbyte_edition

Description: The edition of Airbyte to deploy (community or enterprise)

Type: string

Default: "community"

airbyte_helm_version

Description: The version of the Airbyte Helm chart to deploy

Type: string

Default: "1.5.1"

airbyte_version

Description: The version of Airbyte to deploy (for image caching)

Type: string

Default: "1.5.1"

arm_nodes_enabled

Description: Whether to allow scheduling on arm nodes

Type: bool

Default: true

aws_iam_ip_allow_list

Description: List of IPs to allow for AWS IAM access

Type: list(string)

Default: []

burstable_nodes_enabled

Description: Whether to allow scheduling on burstable nodes

Type: bool

Default: true

connected_s3_bucket_arns

Description: List of S3 bucket ARNs that Airbyte will use as connector destinations

Type: list(string)

Default: []

connector_builder_min_cpu_millicores

Description: The minimum amount of cpu millicores for connector builder containers

Type: number

Default: 25

connector_min_builder_memory_mb

Description: Memory request for connector builder containers

Type: number

Default: 300

controller_nodes_enabled

Description: Whether to allow scheduling on controller nodes

Type: bool

Default: false

cron_min_cpu_millicores

Description: The minimum amount of cpu millicores for cron containers

Type: number

Default: 25

cron_min_memory_mb

Description: Memory request for cron containers

Type: number

Default: 368

db_backup_directory

Description: Directory to store database backups (if enabled)

Type: string

Default: "initial"

db_recovery_directory

Description: Directory to restore database from (if recovery mode enabled)

Type: string

Default: null

db_recovery_mode_enabled

Description: Whether to enable recovery mode for the database

Type: bool

Default: false

db_recovery_target_time

Description: Target recovery time for the database (if recovery mode enabled)

Type: string

Default: null
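
The four database backup and recovery inputs above are typically set together. The fragment below is a hedged sketch of a point-in-time restore: the timestamp is a placeholder, its RFC 3339 format is an assumption, and writing new backups to a different directory than the one being restored from is also an assumption that you should confirm against your backup setup.

    inputs = {
      # ...required inputs omitted...

      db_recovery_mode_enabled = true
      db_recovery_directory    = "initial"              # the default db_backup_directory
      db_recovery_target_time  = "2025-01-01T00:00:00Z" # placeholder; format is assumed
      db_backup_directory      = "post-recovery"        # assumed: must differ from the recovery directory
    }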

helm_timeout_seconds

Description: The timeout in seconds for Helm operations

Type: number

Default: 600

ingress_enabled

Description: Whether to enable the ingress for Airbyte

Type: bool

Default: true

jobs_cpu_min_millicores

Description: The minimum amount of cpu millicores for jobs containers

Type: number

Default: 100

jobs_env_env

Description: Additional environment variables for Airbyte jobs configuration (e.g. SYNC_JOB_MAX_ATTEMPTS, JOB_MAIN_CONTAINER_MEMORY_LIMIT, etc.) https://docs.airbyte.com/operator-guides/configuring-airbyte#jobs

Type: map(string)

Default: {}
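
For example, the two variables named in the description could be passed as follows. Because the type is map(string), numeric values are quoted; the values shown are illustrative, not recommendations.

    inputs = {
      # ...required inputs omitted...

      jobs_env_env = {
        SYNC_JOB_MAX_ATTEMPTS           = "3"
        JOB_MAIN_CONTAINER_MEMORY_LIMIT = "2Gi"
      }
    }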

jobs_min_memory_mb

Description: Memory request for jobs containers

Type: number

Default: 1024

jobs_sync_job_retries_complete_failures_backoff_base

Description: Defines the exponential base of the backoff interval between failed attempts in which no data was synchronized.

Type: number

Default: 2

jobs_sync_job_retries_complete_failures_backoff_max_interval_s

Description: Defines the maximum backoff interval in seconds between failed attempts in which no data was synchronized.

Type: number

Default: 3600

jobs_sync_job_retries_complete_failures_backoff_min_interval_s

Description: Defines the minimum backoff interval in seconds between failed attempts in which no data was synchronized.

Type: number

Default: 60
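
To make the interaction of the three backoff settings concrete, the following Terraform locals compute an illustrative schedule. This assumes the interval grows as min_interval * base^n and is capped at max_interval; that formula is an assumption about how Airbyte applies these values, not something this module documents, and the local names are hypothetical.

    locals {
      backoff_base  = 2    # default of jobs_sync_job_retries_complete_failures_backoff_base
      backoff_min_s = 60   # default of ..._backoff_min_interval_s
      backoff_max_s = 3600 # default of ..._backoff_max_interval_s

      # Assumed formula: min(max, min * base^n) for n = 0, 1, ..., 6
      backoff_schedule_s = [
        for n in range(7) :
        min(local.backoff_max_s, local.backoff_min_s * pow(local.backoff_base, n))
      ]
    }

Under that assumption and the default values, the schedule is 60, 120, 240, 480, 960, 1920, and 3600 seconds, with the last value capped from 3840.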

jobs_sync_job_retries_complete_failures_max_successive

Description: Defines the max number of successive attempts in which no data was synchronized before failing the job.

Type: number

Default: 3

jobs_sync_job_retries_complete_failures_max_total

Description: Defines the max number of attempts in which no data was synchronized before failing the job.

Type: number

Default: 30

jobs_sync_job_retries_partial_failures_max_successive

Description: Defines the max number of successive attempts in which some data was synchronized before failing the job.

Type: number

Default: 3

jobs_sync_job_retries_partial_failures_max_total

Description: Defines the max number of attempts in which some data was synchronized before failing the job.

Type: number

Default: 30

jobs_sync_max_timeout_days

Description: Defines the number of days a sync job will execute for before timing out.

Type: number

Default: 1

license_key

Description: License key for Airbyte Enterprise

Type: string

Default: ""
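
A sketch of switching to the enterprise edition is shown below. It assumes the license key is stored in a sops-encrypted secrets.yaml file next to terragrunt.hcl under a key named airbyte_license_key; both the file name and the key name are hypothetical.

    locals {
      # sops_decrypt_file and get_terragrunt_dir are Terragrunt built-in functions
      secrets = yamldecode(sops_decrypt_file("${get_terragrunt_dir()}/secrets.yaml"))
    }

    inputs = {
      # ...required inputs omitted...

      airbyte_edition = "enterprise"
      license_key     = local.secrets.airbyte_license_key # hypothetical key name
    }

Reading the key from an encrypted file keeps it out of version control in plain text.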

log_level

Description: The log level for Airbyte components

Type: string

Default: "WARN"

monitoring_enabled

Description: Whether to enable monitoring for Airbyte

Type: bool

Default: false

namespace

Description: The namespace to deploy Airbyte into

Type: string

Default: "airbyte"

node_image_cached_enabled

Description: Whether to enable node image caching

Type: bool

Default: true

panfactum_scheduler_enabled

Description: Whether to enable the Panfactum scheduler

Type: bool

Default: true

pg_initial_storage_gb

Description: The initial storage for PostgreSQL in GB

Type: number

Default: 20

pg_max_cpu_millicores

Description: The maximum amount of cpu to allocate to the postgres pods (in millicores)

Type: number

Default: 10000

pg_max_memory_mb

Description: The maximum amount of memory to allocate to the postgres pods (in Mi)

Type: number

Default: 128000

pg_min_cpu_millicores

Description: The minimum amount of cpu to allocate to the postgres pods (in millicores)

Type: number

Default: 50

pg_min_cpu_update_millicores

Description: The CPU settings for Postgres won’t be updated until the recommendations from the VPA (if enabled) differ from the current settings by at least this many millicores. This prevents autoscaling thrash.

Type: number

Default: 250

pg_min_memory_mb

Description: The minimum amount of memory to allocate to the postgres pods (in Mi)

Type: number

Default: 500

pgbouncer_max_cpu_millicores

Description: The maximum amount of cpu to allocate to the pgbouncer pods (in millicores)

Type: number

Default: 10000

pgbouncer_max_memory_mb

Description: The maximum amount of memory to allocate to the pgbouncer pods (in Mi)

Type: number

Default: 32000

pgbouncer_min_cpu_millicores

Description: The minimum amount of cpu to allocate to the pgbouncer pods (in millicores)

Type: number

Default: 15

pgbouncer_min_memory_mb

Description: The minimum amount of memory to allocate to the pgbouncer pods (in Mi)

Type: number

Default: 25

pod_annotations

Description: Additional pod annotations to add to all pods

Type: map(string)

Default: {}

pod_min_sweeper_memory_mb

Description: Memory request for pod sweeper containers

Type: number

Default: 32

pod_sweeper_min_cpu_millicores

Description: The minimum amount of cpu millicores for pod sweeper containers

Type: number

Default: 10

pull_through_cache_enabled

Description: Whether to enable pull-through cache for container images

Type: bool

Default: true

server_min_cpu_millicores

Description: The minimum amount of cpu millicores for server containers

Type: number

Default: 50

server_min_memory_mb

Description: Memory request for server containers

Type: number

Default: 512

sla_target

Description: SLA target level (1-3) affecting high availability settings

Type: number

Default: 1

spot_nodes_enabled

Description: Whether to allow scheduling on spot nodes

Type: bool

Default: true

temporal_db_max_conns

Description: Maximum number of connections for Temporal database (SQL_MAX_CONNS)

Type: number

Default: 100

temporal_db_max_idle_conns

Description: Maximum number of idle connections for Temporal database (SQL_MAX_IDLE_CONNS)

Type: number

Default: 20

temporal_min_cpu_millicores

Description: The minimum amount of cpu millicores for temporal containers

Type: number

Default: 150

temporal_min_memory_mb

Description: Memory request for temporal containers

Type: number

Default: 512

vpa_enabled

Description: Whether to enable Vertical Pod Autoscaler

Type: bool

Default: true

wait

Description: Whether to wait for resources to be created before completing

Type: bool

Default: true

webapp_min_cpu_millicores

Description: The minimum amount of cpu millicores for webapp containers

Type: number

Default: 50

webapp_min_memory_mb

Description: Memory request for webapp containers

Type: number

Default: 128

worker_min_cpu_millicores

Description: The minimum amount of cpu millicores for worker containers

Type: number

Default: 100

worker_min_memory_mb

Description: Memory request for worker containers

Type: number

Default: 512

worker_replicas

Description: Number of worker replicas

Type: number

Default: 1

workload_api_min_server_memory_mb

Description: Memory request for workload API server containers

Type: number

Default: 325

workload_api_server_min_cpu_millicores

Description: The minimum amount of cpu millicores for workload API server containers

Type: number

Default: 25

workload_launcher_min_cpu_millicores

Description: The minimum amount of cpu millicores for workload launcher containers

Type: number

Default: 25

workload_min_launcher_memory_mb

Description: Memory request for workload launcher containers

Type: number

Default: 350

Outputs

The following outputs are exported:

airbyte_config_secret

Description: The name of the Airbyte configuration secret

airbyte_url

Description: The URL to access Airbyte

database_credentials_secret

Description: The name of the secret containing database credentials

ingress_domain

Description: The domain configured for Airbyte ingress

jobs_labels

Description: Labels applied to the jobs pods

namespace

Description: The namespace where Airbyte is deployed

server_labels

Description: Labels applied to the server pods

server_service_name

Description: The name of the Airbyte server service

server_service_port

Description: The port of the Airbyte server service

service_account_name

Description: The name of the Kubernetes service account used by Airbyte pods

temporal_labels

Description: Labels applied to the temporal pods

temporal_service_name

Description: The name of the Airbyte temporal service

temporal_service_port

Description: The port of the Airbyte temporal service

webapp_labels

Description: Labels applied to the webapp pods

webapp_service_name

Description: The name of the Airbyte webapp service

webapp_service_port

Description: The port of the Airbyte webapp service

worker_labels

Description: Labels applied to the worker pods
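
As an illustration of how these outputs can be combined, a module running in the same cluster could build an in-cluster address for the Airbyte server from the service name, namespace, and port. The airbyte_internal_url input is hypothetical, and the fragment assumes plain HTTP and the default cluster.local cluster domain.

    dependency "airbyte" {
      config_path = "../kube_airbyte"
    }

    locals {
      airbyte = dependency.airbyte.outputs
    }

    inputs = {
      # Hypothetical input of the consuming module
      airbyte_internal_url = "http://${local.airbyte.server_service_name}.${local.airbyte.namespace}.svc.cluster.local:${local.airbyte.server_service_port}"
    }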

Maintainer Notes

None.