Argo Workflow Specification
The primary purpose of this submodule is to create a Workflow spec (workflow_spec output) that ensures compatibility with the Panfactum deployment of Argo Workflows (deployed via kube_argo) and applies sensible defaults for usage in the overall Panfactum Stack.
In particular, this module takes care of the following:
- Assigns AWS permissions to the Workflow's ServiceAccount that allows the Workflow's templates to store artifacts and logs in S3.
- Assigns the necessary Kubernetes permissions to the Workflow's ServiceAccount that allows the Workflow to launch and record its status with the Argo Workflow controller.
- Sets up basic Workflow parallelism, timeouts, and retry configurations
- Assigns defaults for pods created by the Workflow (affinities, tolerations, volumes, security context, scheduler, etc.)
- Provides recommended container defaults (container_defaults output)
- Disables pod disruptions (overridable)
However, the core Workflow logic (defined via templates, arguments, and entrypoint) is left completely up to the user to define. For more information on the available values for these fields, see the official Argo documentation.
Usage
For a basic introduction to using this module, see our guide on creating Workflows.
Below we cover some more advanced patterns that you will likely find useful when working with Workflows in the Panfactum Stack.
Defining the Workflow DAG
Workflows are most useful when you need to define a multistep operation. Argo Workflows provides several means to do this, but we recommend the following for almost all use cases:
- DAG Templates: For defining the execution graph across multiple other templates
- ContainerSet Template: For defining an execution graph inside a single pod
ContainerSets can run faster since all steps share the same execution context and can share disk space. However, they have a couple of drawbacks (see the sketch after this list):
- All containers in the ContainerSet will consume Kubernetes resources (CPU and memory requests) even when they are not running.
- They can only be used to orchestrate containers and not other types of templates.
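For reference, here is a minimal sketch of what a ContainerSet template could look like when passed to this module (images and commands are placeholders, and the containerSet block follows the Argo ContainerSet schema; this is illustrative, not a definitive recipe):
module "workflow_spec" {
  source = "github.com/Panfactum/stack.git//packages/infrastructure/wf_spec?ref=edge.24-11-13" # pf-update

  entrypoint = "build"
  templates = [
    {
      name = "build"
      containerSet = {
        containers = [
          {
            name    = "step-one"
            image   = "some-repo/some-image:some-tag" # Placeholder image
            command = ["/bin/step-one"]               # Placeholder command
          },
          {
            name         = "step-two"
            image        = "some-repo/some-image:some-tag" # Placeholder image
            command      = ["/bin/step-two"]               # Placeholder command
            dependencies = ["step-one"]                    # Runs only after step-one completes
          }
        ]
      }
    }
  ]
}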
DAGs offer the most flexibility as they can define a graph of any set of templates, but every template in the execution graph will require creating its own Pod. Generally, this isn't an issue, but it does add some additional time (to create each pod) and can cause issues when trying to share large amounts of data between steps.
If you do need to share large amounts of data between pods in a DAG, you can use the volume_mounts module input to create a temporary PVC that will be mounted to each pod (see the sketch below). However, you will need to take care to ensure that only one Pod is running at a time, as a PVC can only be mounted to a single Pod at a time.
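For example, a shared scratch volume could be declared as in the following sketch (the volume name, size, and mount path are arbitrary placeholders):
module "workflow_spec" {
  source = "github.com/Panfactum/stack.git//packages/infrastructure/wf_spec?ref=edge.24-11-13" # pf-update

  # Creates a temporary PVC that is mounted at /scratch in every pod of the Workflow
  volume_mounts = {
    scratch = {
      mount_path = "/scratch"
      size_gb    = 10 # Placeholder size; adjust to the amount of data you need to share
    }
  }
}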
Overriding the Template Defaults
When using the wf_spec module, all templates will be provided a default configuration based on the module inputs. For example, if you provide config_map_mounts, all pods in the Workflow will have volumes configured to include the specified ConfigMaps.
Occasionally, you may want to override the defaults for a given template. This can be done by explicitly providing the relevant parameter to the template:
module "workflow_spec" {
  source = "github.com/Panfactum/stack.git//packages/infrastructure/wf_spec?ref=edge.24-11-13" # pf-update

  # By default, all pods will have this ConfigMap included as a volume
  config_map_mounts = {
    my-config-map = {
      mount_path = "/tmp/my-config-map"
    }
  }

  templates = [
    {
      volumes = [] # This overrides the default and ensures that the pod for this template will have no volumes
      container = {
        image        = "some-repo/some-image:some-tag"
        command      = ["/bin/some-command"]
        volumeMounts = [] # Since the pod has no volumes, the container cannot have any volume mounts either
      }
    }
  ]
}
Critically, all overrides are shallow-merged by key. This makes it possible to drop defaults (as in the example above), but if you wanted to add a new volume, you would need to explicitly concatenate it with the original defaults:
module "workflow_spec" {
  source = "github.com/Panfactum/stack.git//packages/infrastructure/wf_spec?ref=edge.24-11-13" # pf-update

  templates = [
    {
      volumes = concat(
        module.workflow_spec.volumes, # The default volumes (note the self-reference)
        [{...}]                       # The new volume configurations to add
      )
    }
  ]
}
Note that OpenTofu supports self-references (i.e., using a module's outputs as part of its inputs) since values are lazily evaluated. As a result, we have given the wf_spec module outputs such as tolerations, volumes, env, etc., that can be used as building blocks for overrides.
Parameterizing Workflows and Templates
Often you will want to supply inputs to Workflows to adjust how they behave. Argo calls these inputs "parameters" and provides documentation on this functionality here.1
Note that both Workflows as a whole and their individual templates can be parameterized, although the syntax is slightly different. A Workflow has arguments and a template has inputs:
module "workflow_spec" {
  source = "github.com/Panfactum/stack.git//packages/infrastructure/wf_spec?ref=edge.24-11-13" # pf-update

  # These will show up in the Argo web UI and you can pass them in when
  # creating a Workflow from a WorkflowTemplate programmatically
  arguments = {
    parameters = [
      {
        name        = "workflow-foo"
        description = "Some input"
        default     = "bar"
      }
    ]
  }

  entrypoint = "dag"
  templates = [
    {
      name = "first"
      inputs = {
        parameters = [
          {
            name = "baz"
          }
        ]
      }
      container = {
        image = "some-repo/some-image:some-tag"
        command = [
          "/bin/some-command",
          "{{workflow.parameters.workflow-foo}}", # Will be replaced at execution time with the Workflow-level parameter
          "{{inputs.parameters.baz}}"             # Will be replaced at execution time with the template-level input
        ]
      }
    },
    {
      name = "dag",
      dag = {
        tasks = [
          # Executes the "first" template and passes in "42" for the "baz" input
          {
            name     = "first"
            template = "first"
            arguments = {
              parameters = [{
                name  = "baz"
                value = "42"
              }]
            }
          }
        ]
      }
    }
  ]
}
Oftentimes, you may want to set the same parameters on a Workflow and each of its templates and automatically pass through the values. We provide a convenience utility to do this via the passthrough_parameters input. For example:
module "workflow_spec" {
  source = "github.com/Panfactum/stack.git//packages/infrastructure/wf_spec?ref=edge.24-11-13" # pf-update

  passthrough_parameters = [
    {
      name        = "foo"
      description = "Some input"
      default     = "bar"
    }
  ]

  entrypoint = "dag"
  templates = [
    {
      name = "first"
      container = {
        image = "some-repo/some-image:some-tag"
        command = [
          "/bin/some-command",
          "{{inputs.parameters.foo}}"
        ]
      }
    },
    {
      name = "dag",
      dag = {
        tasks = [
          {
            name     = "first"
            template = "first"
          }
        ]
      }
    }
  ]
}
is equivalent to
module "workflow_spec" {
  source = "github.com/Panfactum/stack.git//packages/infrastructure/wf_spec?ref=edge.24-11-13" # pf-update

  arguments = {
    parameters = [
      {
        name        = "foo"
        description = "Some input"
        default     = "bar"
      }
    ]
  }

  entrypoint = "dag"
  templates = [
    {
      name = "first"
      inputs = {
        parameters = [
          {
            name    = "foo"
            default = "{{workflow.parameters.foo}}"
          }
        ]
      }
      container = {
        image = "some-repo/some-image:some-tag"
        command = [
          "/bin/some-command",
          "{{inputs.parameters.foo}}"
        ]
      }
    },
    {
      name = "dag",
      inputs = {
        parameters = [
          {
            name    = "foo"
            default = "{{workflow.parameters.foo}}"
          }
        ]
      },
      dag = {
        tasks = [
          {
            name     = "first"
            template = "first"
            arguments = {
              parameters = [
                {
                  name  = "foo"
                  value = "{{inputs.parameters.foo}}"
                }
              ]
            }
          }
        ]
      }
    }
  ]
}
This can be helpful when you want to reference a template defined in one Workflow from a completely separate Workflow, as described here. This ensures that regardless of how a template is executed, it will have the same parameterization capabilities.
Conditional DAG Nodes (Tasks)
When using a DAG Template, you can conditionally execute certain nodes by replacing the dependencies field for a particular task with a depends field that allows enhanced depends logic (see the sketch below). We recommend using this pattern instead of continueOn or Lifecycle hooks as it is more powerful and less error-prone.
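As an illustration, the following sketch (template names, images, and commands are placeholders) runs a cleanup task after a build task finishes, regardless of whether the build succeeded, using the depends expression syntax:
module "workflow_spec" {
  source = "github.com/Panfactum/stack.git//packages/infrastructure/wf_spec?ref=edge.24-11-13" # pf-update

  entrypoint = "dag"
  templates = [
    {
      name = "build"
      container = {
        image   = "some-repo/some-image:some-tag" # Placeholder image
        command = ["/bin/build"]                  # Placeholder command
      }
    },
    {
      name = "cleanup"
      container = {
        image   = "some-repo/some-image:some-tag" # Placeholder image
        command = ["/bin/cleanup"]                # Placeholder command
      }
    },
    {
      name = "dag"
      dag = {
        tasks = [
          {
            name     = "build"
            template = "build"
          },
          {
            # Runs after "build" finishes, whether it succeeded or failed
            name     = "cleanup"
            template = "cleanup"
            depends  = "build.Succeeded || build.Failed"
          }
        ]
      }
    }
  ]
}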
Retries and Timeouts
By default, Workflows created by the wf_spec module will automatically retry template execution on failure. You can tune the behavior by adjusting the following inputs:
- retry_backoff_initial_duration_seconds
- retry_backoff_max_duration_seconds
- retry_max_attempts (set to 0 to disable retries)
- retry_expression
- retry_policy
Retries are done on a per-template basis and the above inputs set the default behavior for each template; the entire Workflow will never retry.
Additionally, you can set a timeout for the entire Workflow by tuning the active_deadline_seconds input. Individual templates can also have timeouts by setting the activeDeadlineSeconds field in each template. However, note that the template-level timeout is reset on every retry. A combined example is sketched below.
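For example, the following sketch (values, image, and command are illustrative) tunes the retry defaults and sets both a Workflow-level and a template-level timeout:
module "workflow_spec" {
  source = "github.com/Panfactum/stack.git//packages/infrastructure/wf_spec?ref=edge.24-11-13" # pf-update

  # Retry defaults applied to every template
  retry_max_attempts                     = 3
  retry_backoff_initial_duration_seconds = 10
  retry_backoff_max_duration_seconds     = 300

  # Timeout for the entire Workflow (in seconds)
  active_deadline_seconds = 3600

  entrypoint = "example"
  templates = [
    {
      name                  = "example"
      activeDeadlineSeconds = 600 # Template-level timeout; reset on every retry
      container = {
        image   = "some-repo/some-image:some-tag"
        command = ["/bin/some-command"]
      }
    }
  ]
}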
Concurrency Controls
There are three different levels of concurrency controls that you can use (see the sketch after this list):
- pod_parallelism: If set, limits how many pods in a single Workflow instance can be running at once.
- workflow_parallelism: The number of instances of this Workflow that can be running at once. See mutexes and semaphores for Workflows for more information.
- Template-level mutexes and semaphores: Allows blocking individual templates from running. The same mutex / semaphore can be used across many Workflows. Useful for locking access to certain resources (e.g., deploying to an environment).
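A minimal sketch combining these controls might look like the following (the mutex name, image, and command are arbitrary placeholders, and the synchronization block follows the Argo template schema):
module "workflow_spec" {
  source = "github.com/Panfactum/stack.git//packages/infrastructure/wf_spec?ref=edge.24-11-13" # pf-update

  pod_parallelism      = 5 # At most 5 pods of a single Workflow instance run concurrently
  workflow_parallelism = 1 # At most 1 instance of this Workflow runs at any given time

  entrypoint = "deploy"
  templates = [
    {
      name = "deploy"
      # Only one template holding this mutex can run at a time, across all Workflows
      synchronization = {
        mutex = {
          name = "deploy-to-production" # Placeholder mutex name
        }
      }
      container = {
        image   = "some-repo/some-image:some-tag"
        command = ["/bin/some-command"]
      }
    }
  ]
}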
Saving and Loading Artifacts
Argo Workflows has a feature called Artifacts for saving files and directories across template executions.
We already set up the default artifact behavior to save artifacts to S3, so you can start using artifacts in your Workflows immediately, as sketched below.
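The following sketch (template, task, file, and image names are placeholders) shows one template writing an output artifact and a second template consuming it via a DAG task, using the standard Argo artifact fields:
module "workflow_spec" {
  source = "github.com/Panfactum/stack.git//packages/infrastructure/wf_spec?ref=edge.24-11-13" # pf-update

  # The default root filesystem is read-only, so provide a writable scratch directory for the artifact
  tmp_directories = {
    tmp = {
      mount_path = "/tmp"
    }
  }

  entrypoint = "dag"
  templates = [
    {
      name = "generate"
      container = {
        image   = "some-repo/some-image:some-tag"
        command = ["/bin/generate-report"] # Placeholder command that writes /tmp/report.txt
      }
      outputs = {
        artifacts = [
          {
            name = "report"
            path = "/tmp/report.txt" # Uploaded to S3 automatically when the template finishes
          }
        ]
      }
    },
    {
      name = "consume"
      inputs = {
        artifacts = [
          {
            name = "report"
            path = "/tmp/report.txt" # Downloaded from S3 before the container starts
          }
        ]
      }
      container = {
        image   = "some-repo/some-image:some-tag"
        command = ["/bin/process-report"] # Placeholder command
      }
    },
    {
      name = "dag"
      dag = {
        tasks = [
          {
            name     = "generate"
            template = "generate"
          },
          {
            name     = "consume"
            template = "consume"
            depends  = "generate"
            arguments = {
              artifacts = [
                {
                  name = "report"
                  from = "{{tasks.generate.outputs.artifacts.report}}"
                }
              ]
            }
          }
        ]
      }
    }
  ]
}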
Adding Additional AWS Permissions
Each pod that gets created in the Workflow will run with the same ServiceAccount by default. This ServiceAccount will be assigned AWS permissions via IRSA.
You can assign additional permissions to the ServiceAccount's IAM Role via the extra_aws_permissions input:
data "aws_iam_policy_document" "permissions" {
  statement {
    sid       = "Admin"
    effect    = "Allow"
    actions   = ["*"] # Replace with your desired actions
    resources = ["*"] # Replace with your desired permissions
  }
}

module "workflow_spec" {
  source = "github.com/Panfactum/stack.git//packages/infrastructure/wf_spec?ref=edge.24-11-13" # pf-update

  extra_aws_permissions = data.aws_iam_policy_document.permissions.json
}
Adding Additional Kubernetes Permissions
Each pod that gets created in the Workflow will run with the same ServiceAccount by default. This ServiceAccount can be assigned additional Kubernetes permissions by leveraging the service_account_name output:
# Can use Role instead, if desired
resource "kubernetes_cluster_role_binding" "permissions" {
  metadata {
    generate_name = "extra-permissions"
    labels        = module.workflow_spec.labels
  }
  subject {
    kind      = "ServiceAccount"
    name      = module.workflow_spec.service_account_name
    namespace = local.namespace
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin" # Replace with your desired ClusterRole
  }
}

module "workflow_spec" {
  source = "github.com/Panfactum/stack.git//packages/infrastructure/wf_spec?ref=edge.24-11-13" # pf-update
}
Using the Panfactum devShell
We make the Panfactum devShell available as a container image that can be run in a workflow. The specific image tag that is compatible with your version of the Panfactum stack can be sourced from the outputs of the kube_constants submodule. The below code snippet shows an example:
module "constants" {
  source = "github.com/Panfactum/stack.git//packages/infrastructure/kube_constants?ref=edge.24-11-13" # pf-update
}

module "pull_through" {
  source = "github.com/Panfactum/stack.git//packages/infrastructure/aws_ecr_pull_through_cache_addresses?ref=edge.24-11-13" # pf-update

  pull_through_cache_enabled = var.pull_through_cache_enabled
}

module "example_wf" {
  source = "github.com/Panfactum/stack.git//packages/infrastructure/wf_spec?ref=edge.24-11-13" # pf-update

  name             = "example"
  namespace        = var.namespace
  eks_cluster_name = var.eks_cluster_name

  entrypoint = "example"
  templates = [
    {
      name = "example"
      container = {
        image   = "${module.pull_through.ecr_public_registry}/${module.constants.panfactum_image}:${module.constants.panfactum_image_version}"
        command = ["/some-command-here"]
      }
    }
  ]

  # pf-generate: pass_vars
  pf_stack_version = var.pf_stack_version
  pf_stack_commit  = var.pf_stack_commit
  environment      = var.environment
  region           = var.region
  pf_root_module   = var.pf_root_module
  is_local         = var.is_local
  extra_tags       = var.extra_tags
  # end-generate
}
Adding Custom Scripts
You do not need to build a new container image to create and run custom scripts as long as you have an image with a shell installed. Instead, you can mount scripts directly from your IaC. This can make it much faster to iterate on Workflow logic.
The below snippet shows an example of mounting and running a custom script on top of the Panfactum devShell:
# Attach the scripts to a ConfigMap so we can mount them in the workflow spec
resource "kubernetes_config_map" "wf_scripts" {
  metadata {
    name      = "example-scripts"
    namespace = var.namespace
  }
  data = {
    "example.sh" = file("${path.module}/example.sh") # This assumes you have an "example.sh" script file in this module
  }
}

module "example_wf" {
  source = "github.com/Panfactum/stack.git//packages/infrastructure/wf_spec?ref=edge.24-11-13" # pf-update

  name             = "example"
  namespace        = var.namespace
  eks_cluster_name = var.eks_cluster_name

  entrypoint = "example"
  templates = [
    {
      name = "example"
      container = {
        image   = "${module.pull_through.ecr_public_registry}/${module.constants.panfactum_image}:${module.constants.panfactum_image_version}"
        command = ["/scripts/example.sh"] # Execute the mounted script
      }
    }
  ]

  # This will mount the ConfigMap at mount_path inside each container; all the keys of the ConfigMap are file names
  # and the values are the file contents.
  config_map_mounts = {
    "${kubernetes_config_map.wf_scripts.metadata[0].name}" = {
      mount_path = "/scripts"
    }
  }

  # pf-generate: pass_vars
  pf_stack_version = var.pf_stack_version
  pf_stack_commit  = var.pf_stack_commit
  environment      = var.environment
  region           = var.region
  pf_root_module   = var.pf_root_module
  is_local         = var.is_local
  extra_tags       = var.extra_tags
  # end-generate
}
Retaining Workflows and Workflow Pods
When a Workflow completes, it isn't deleted immediately. Instead, Workflow objects will be automatically deleted from the Kubernetes cluster based on the following inputs:
- workflow_delete_seconds_after_completion
- workflow_delete_seconds_after_failure
- workflow_delete_seconds_after_success
When a Workflow is deleted, all the other Kubernetes objects that it owns are also deleted (e.g., Pods, Artifacts, etc.).
If you want to delete pods earlier, you can set pod_delete_delay_seconds to some lower value; however, pods can never outlive the Workflow.
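For instance, the following sketch (values are illustrative) retains failed Workflows longer than successful ones and deletes pods a few minutes after completion:
module "workflow_spec" {
  source = "github.com/Panfactum/stack.git//packages/infrastructure/wf_spec?ref=edge.24-11-13" # pf-update

  # Delete successful Workflows after 1 hour, but keep failed ones for a day for debugging
  workflow_delete_seconds_after_success = 3600
  workflow_delete_seconds_after_failure = 86400

  # Delete pods 5 minutes after the Workflow completes (pods never outlive the Workflow itself)
  pod_delete_delay_seconds = 300
}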
Composing Workflows
A common pattern is to compose multiple smaller Workflows into a larger Workflow. We provide guidance on implementing that pattern here.
Providers
The following providers are needed by this module:
- kubectl (2.0.4)
- kubernetes (2.27.0)
- pf (0.0.3)
- random (3.6.0)
Required Inputs
The following input variables are required:
eks_cluster_name
Description: The name of the EKS cluster that contains the service account.
Type: string
entrypoint
Description: Name of the template that will be used as the first node in this workflow
Type: string
name
Description: The name of this Workflow
Type: string
namespace
Description: The namespace the cluster is in
Type: string
templates
Description: A list of workflow templates. See https://argo-workflows.readthedocs.io/en/stable/fields/#template.
Type: any
Optional Inputs
The following input variables are optional (have default values):
active_deadline_seconds
Description: Duration in seconds relative to the workflow start time which the workflow is allowed to run before the controller terminates the Workflow
Type: number
Default: 86400
archive_logs_enabled
Description: Whether logs should be archived and made available in the Argo web UI
Type: bool
Default: true
arguments
Description: The arguments to set for the Workflow
Type:
object({
artifacts = optional(list(any), [])
parameters = optional(list(any), [])
})
Default:
{
"artifacts": [],
"parameters": []
}
arm_nodes_enabled
Description: Whether to allow Pods to schedule on arm64 nodes
Type: bool
Default: true
burstable_nodes_enabled
Description: Whether to allow Pods to schedule on burstable nodes
Type: bool
Default: false
cluster_workflow_template_ref
Description: Name is the resource name of the ClusterWorkflowTemplate template (https://argo-workflows.readthedocs.io/en/stable/cluster-workflow-templates/)
Type: string
Default: null
common_env
Description: Key pair values of the environment variables for each container
Type: map(string)
Default: {}
common_env_from_config_maps
Description: Environment variables that are sourced from existing Kubernetes ConfigMaps. The keys are the environment variables names and the values are the ConfigMap references.
Type:
map(object({
config_map_name = string
key = string
}))
Default: {}
common_env_from_secrets
Description: Environment variables that are sourced from existing Kubernetes Secrets. The keys are the environment variables names and the values are the Secret references.
Type:
map(object({
secret_name = string
key = string
}))
Default: {}
common_secrets
Description: Key pair values of secrets to add to the containers as environment variables
Type: map(string)
Default: {}
config_map_mounts
Description: A mapping of ConfigMap names to their mount configuration in the containers of the Workflow
Type:
map(object({
mount_path = string # Where in the containers to mount the ConfigMap
optional = optional(bool, false) # Whether the Pod can launch if this ConfigMap does not exist
}))
Default: {}
controller_node_required
Description: Whether the Pods must be scheduled on a controller node
Type: bool
Default: false
default_container_image
Description: The default container image to use
Type: string
Default: "docker.io/library/busybox:1.36.1"
default_resources
Description: The default container resources to use
Type:
object({
requests = optional(object({
memory = optional(string, "100Mi")
cpu = optional(string, "50m")
}), { memory = "100Mi", cpu = "50m" })
limits = optional(object({
memory = optional(string, "100Mi")
cpu = optional(string, null)
}), { memory = "100Mi" })
})
Default:
{
"limits": {
"memory": "100Mi"
},
"requests": {
"cpu": "50m",
"memory": "100Mi"
}
}
delete_artifacts_on_deletion
Description: Change the default behavior to delete artifacts on workflow deletion
Type: bool
Default: false
disruptions_enabled
Description: Whether disruptions should be enabled for Pods in the Workflow
Type: bool
Default: false
dns_policy
Description: The DNS policy for the Pods
Type: string
Default: "ClusterFirst"
extra_aws_permissions
Description: Extra JSON-encoded AWS permissions to assign to the Workflow's service account
Type: string
Default: "{}"
extra_labels
Description: Extra labels to assign to all resources in this workflow
Type: map(string)
Default: {}
extra_pod_annotations
Description: Annotations to add to the Pods in the Workflow
Type: map(string)
Default: {}
extra_pod_labels
Description: Extra Pod labels to use
Type: map(string)
Default: {}
extra_tolerations
Description: Extra tolerations to add to the Pods
Type:
list(object({
key = optional(string)
operator = string
value = optional(string)
effect = optional(string)
}))
Default: []
extra_workflow_labels
Description: Extra labels to add to the Workflow object
Type: map(string)
Default: {}
hooks
Description: Hooks to add to the Workflow
Type: any
Default: {}
ip_allow_list
Description: A list of IPs that can use the service account token to authenticate with AWS API
Type: list(string)
Default: []
linux_capabilities
Description: Extra linux capabilities to add to containers by default
Type: list(string)
Default: []
mount_owner
Description: The ID of the group that owns the mounted volumes
Type: number
Default: 1000
node_preferences
Description: Node label preferences for the Pods
Type: map(object({ weight = number, operator = string, values = list(string) }))
Default: {}
node_requirements
Description: Node label requirements for the Pods
Type: map(list(string))
Default: {}
on_exit
Description: A template reference which is invoked at the end of the workflow, irrespective of the success, failure, or error of the primary template.
Type: string
Default: null
panfactum_scheduler_enabled
Description: Whether to use the Panfactum Pod scheduler with enhanced bin-packing
Type: bool
Default: true
passthrough_parameters
Description: Workflow parameters that should automatically pass through to every template in the Workflow
Type:
list(object({
default = optional(string)
description = optional(string)
enum = optional(list(string))
globalName = optional(string)
name = string
value = optional(string)
}))
Default: []
pod_delete_delay_seconds
Description: The number of seconds after Workflow completion that Pods will be deleted
Type: number
Default: 180
pod_parallelism
Description: Limits the max total parallel pods that can execute at the same time in a workflow
Type: number
Default: null
priority
Description: Priority is used if controller is configured to process limited number of workflows in parallel. Workflows with higher priority are processed first.
Type: number
Default: null
priority_class_name
Description: The default priority class to use for Pods in the Workflow
Type: string
Default: null
privileged
Description: Whether the generated containers run with elevated privileges
Type: bool
Default: false
pull_through_cache_enabled
Description: Whether to use the ECR pull through cache for the deployed images
Type: bool
Default: true
read_only
Description: Whether the generated containers default to read-only root filesystems
Type: bool
Default: true
retry_backoff_initial_duration_seconds
Description: The initial number of seconds to wait before the next retry in an exponential backoff strategy
Type: number
Default: 30
retry_backoff_max_duration_seconds
Description: The maximum number of seconds to wait before the next retry in an exponential backoff strategy
Type: number
Default: 3600
retry_expression
Description: Expression is a condition expression for when a node will be retried. If it evaluates to false, the node will not be retried and the retry strategy will be ignored.
Type: string
Default: null
retry_max_attempts
Description: The maximum number of allowable retries
Type: number
Default: 5
retry_policy
Description: The policy that determines when the Workflow will be retried
Type: string
Default: "Always"
run_as_root
Description: Whether to enable running as root in the Pods
Type: bool
Default: false
secret_mounts
Description: A mapping of Secret names to their mount configuration in the containers of the Workflow
Type:
map(object({
mount_path = string # Where in the containers to mount the Secret
optional = optional(bool, false) # Whether the Pod can launch if this Secret does not exist
}))
Default: {}
spot_nodes_enabled
Description: Whether to allow Pods to schedule on spot nodes
Type: bool
Default: true
suspend
Description: Whether this workflow is suspended
Type: bool
Default: false
tmp_directories
Description: A mapping of temporary directory names (arbitrary) to their configuration
Type:
map(object({
mount_path = string # Where in the containers to mount the temporary directories
size_mb = optional(number, 100) # The number of MB to allocate for the directory
node_local = optional(bool, false) # If true, the temporary storage will come from the host node rather than a PVC
}))
Default: {}
uid
Description: The UID to use for the user in the Pods
Type: number
Default: 1000
volume_mounts
Description: A mapping of names to configuration for temporary PersistentVolumeClaims used by all Pods in the Workflow
Type:
map(object({
storage_class = optional(string, "ebs-standard")
access_modes = optional(list(string), ["ReadWriteOnce"])
size_gb = optional(number, 1) # The size of the volume in GB
mount_path = string # Where in the containers to mount the volume
}))
Default: {}
workflow_annotations
Description: Annotations to add to the Workflow object
Type: map(string)
Default: {}
workflow_delete_seconds_after_completion
Description: The number of seconds after workflow completion that the Workflow object will be deleted
Type: number
Default: 3600
workflow_delete_seconds_after_failure
Description: The number of seconds after workflow failure that the Workflow object will be deleted
Type: number
Default: 3600
workflow_delete_seconds_after_success
Description: The number of seconds after workflow success that the Workflow object will be deleted
Type: number
Default: 3600
workflow_parallelism
Description: Number of concurrent instances of this Workflow allowed to be running at any given time
Type: number
Default: 1
Outputs
The following outputs are exported:
affinity
Description: The affinity added to each Pod by default
arguments
Description: The arguments to the workflow
aws_role_arn
Description: The ARN of the AWS role used by the Workflow's Service Account
aws_role_name
Description: The name of the AWS role used by the Workflow's Service Account
container_defaults
Description: Default options for every container spec
container_security_context
Description: The security context to be applied to each container in each Pod generated by this Workflow
env
Description: The environment variables to be added to each container in each Pod generated by this Workflow
generate_name
Description: The prefix for generating Workflow names from this spec
labels
Description: The default labels assigned to all resources in this Workflow
match_labels
Description: The labels unique to this deployment that can be used to select the Pods in this Workflow
name
Description: The non-prefix name of the Workflow spec (should be used for naming derived resources like WorkflowTemplates)
service_account_name
Description: The default service account used for the Pods
template_parameters
Description: The default parameters set on each template
tolerations
Description: Tolerations added to each Pod by default
volume_mounts
Description: The volume mounts to be applied to the main container in each Pod generated by this Workflow
volumes
Description: The volume specification to be applied to all pods generated by this Workflow
workflow_spec
Description: The specification for the Workflow
Maintainer Notes
No notes
Footnotes
1. Technically, there are two types of inputs: parameters (values) and artifacts (files), but we will just focus on parameters in this section.