# Creating Workflows

## Objective

Demonstrate how to use the `wf_spec` submodule to create Argo Workflow resources.
## Prerequisites
You should ensure you have completed the following guides before continuing:
- Installing the Argo Workflow Engine
- Developing First-party IaC Modules
- Argo's Getting Started Guide for Argo Workflows
## Background
While Argo Workflows offer incredible power and flexibility, these benefits come at the cost of increased complexity compared to simpler but less robust tools. Building a Workflow from scratch can be a daunting endeavor due to the number of configuration knobs available.

We have created a submodule, `wf_spec`, that aims to simplify the process by applying sane defaults that are compatible with the Panfactum Stack architecture.
`wf` is shorthand for "workflow," but why `spec`?

Well, a Workflow only defines a single run of the workflow execution graph. This is not very useful when you want to run the same execution graph many times. Thus, instead of creating Workflows directly, Workflows are typically created via Workflow Templates or Cron Workflows.[^1] `wf_spec` creates the "specification" of the Workflow that is added to a resource such as a Workflow Template.
We show a concrete example below.
## The Basics

This is a very simple example workflow that demonstrates the basics of using our `wf_spec` submodule:[^2]
module "workflow_spec" {
source = "${var.pf_module_source}wf_spec${var.pf_module_ref}"
name = "example"
namespace = local.namespace
eks_cluster_name = var.eks_cluster_name
entrypoint = "entry"
templates = [
{
name = "entry"
dag = {
tasks = [
{
name = "first-task"
template = "first-template"
},
{
name = "second-task"
template = "second-template"
dependencies = [ "first-task" ]
}
]
}
},
{
name = "first-template"
container = {
image = "docker.io/library/busybox:stable"
command = ["/bin/sh", "-c", "echo first-template"]
}
},
{
name = "second-template"
container = {
image = "docker.io/library/busybox:stable"
command = ["/bin/sh", "-c", "echo second-template"]
}
tolerations = concat(module.workflow_spec.tolerations, [
{ key = "foo", operator = "Exists", effect = "NoSchedule" }
])
}
]
}
resource "kubectl_manifest" "workflow_template" {
yaml_body = yamlencode({
apiVersion = "argoproj.io/v1alpha1"
kind = "WorkflowTemplate"
metadata = {
name = "example"
namespace = local.namespace
labels = module.workflow_spec.labels
}
spec = module.workflow_spec.workflow_spec
})
server_side_apply = true
force_conflicts = true
}
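As noted in the Background, the same specification can instead be attached to a Cron Workflow to run the execution graph on a schedule. A hypothetical variant of the resource above (the `schedule` value is a placeholder):

```hcl
# Embed the same spec output in a CronWorkflow; its workflowSpec field
# takes the same specification as a WorkflowTemplate's spec field.
resource "kubectl_manifest" "cron_workflow" {
  yaml_body = yamlencode({
    apiVersion = "argoproj.io/v1alpha1"
    kind       = "CronWorkflow"
    metadata = {
      name      = "example-cron"
      namespace = local.namespace
      labels    = module.workflow_spec.labels
    }
    spec = {
      schedule     = "0 * * * *" # hourly; placeholder value
      workflowSpec = module.workflow_spec.workflow_spec
    }
  })

  server_side_apply = true
  force_conflicts   = true
}
```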
A few critical points to notice:
- In the example, `module.workflow_spec` instantiates the Panfactum `wf_spec` module, but critically this module does not create a Workflow; it only creates the specification for a Workflow.

  However, the above example does create a WorkflowTemplate resource via `kubectl_manifest.workflow_template`. This WorkflowTemplate uses the `workflow_spec` output of the `module.workflow_spec` module for its `spec` field.

- The `templates` array defines all the nodes in the workflow's execution graph. There are many different types of Templates, but the most common are:

  - A Container Template (for running a single container in a Pod)
  - A DAG Template (for defining an execution graph of other templates)
  - A Container Set Template (for defining an execution graph of containers within a single Pod)
  - A Kubernetes Resource Template (for performing CRUD operations against Kubernetes resources)
  - A Suspend Template (for suspending a Workflow)

- You might want to override one or more pod-level defaults for a template (for example, providing certain tolerations for some pods but not others). See the `second-template` template above as an example.

  Note that OpenTofu supports self-references (i.e., using a module's outputs as a part of its inputs) since values are lazily evaluated. As a result, we have given the `wf_spec` module outputs such as `tolerations`, `volumes`, `env`, etc., that can be used as building blocks for overrides (see the sketch after this list).

  Finally, note that any overrides are shallow-merged by key, so if `concat` had not been used in the example, the default tolerations would have been dropped.

- Defining the execution graph of a Workflow is typically done via a DAG (Directed Acyclic Graph) template. This template references other templates and defines the order and conditions in which to execute them.

- The `entrypoint` defines which template from the `templates` array will be initially executed when the Workflow is created.

  Our example above starts by executing the DAG template (`entry`), which in turn runs the `first-template` Container template and then the `second-template` Container template.
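The override pattern extends beyond tolerations. Here is a minimal sketch adding an extra environment variable to one template's container, assuming the module's `env` output is a list of Kubernetes `{ name, value }` maps:

```hcl
{
  name = "second-template"
  container = {
    image   = "docker.io/library/busybox:stable"
    command = ["/bin/sh", "-c", "echo $EXTRA_VAR"]
    # Reuse the module's default env vars as building blocks by
    # concatenating them with template-specific additions.
    env = concat(module.workflow_spec.env, [
      { name = "EXTRA_VAR", value = "example" }
    ])
  }
}
```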
## Field Casing in Templates

When defining templates for your workflow spec, you must use camelCase instead of snake_case, as we pass the `templates` input directly through to the generated Kubernetes manifest (and Kubernetes expects camelCase).
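For instance, a template-level scheduling hint would use the Kubernetes field name `nodeSelector`, not `node_selector`. A purely illustrative template entry:

```hcl
{
  name = "pinned-template"
  container = {
    image   = "docker.io/library/busybox:stable"
    command = ["/bin/sh", "-c", "echo pinned"]
  }
  # camelCase, because this object is passed through verbatim to the manifest
  nodeSelector = {
    "kubernetes.io/arch" = "amd64"
  }
}
```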
## Resource Requests and Limits

The VPA unfortunately does not work for pods generated by Workflows. As a result, you must explicitly set the resource requests and limits for each container.

You can also use the `default_resources` input of `wf_spec` to set a sensible default for all containers.
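For example, a per-container setting uses the standard Kubernetes `resources` block (the values here are placeholders):

```hcl
{
  name = "first-template"
  container = {
    image   = "docker.io/library/busybox:stable"
    command = ["/bin/sh", "-c", "echo first-template"]
    # Standard Kubernetes resources block (camelCase, passed through verbatim)
    resources = {
      requests = {
        cpu    = "100m"
        memory = "100Mi"
      }
      limits = {
        memory = "130Mi"
      }
    }
  }
}
```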
## Advanced Workflow Patterns

For more details and examples of common workflow patterns that you might wish to deploy, see the documentation for `wf_spec`.
## Workflow Checklist

Below is a set of questions that we recommend asking when you create a Workflow using `wf_spec` (a consolidated sketch showing several of these inputs follows the list):

- Can your pods run on `arm64` nodes? If not, did you set `arm_nodes_enabled` to `false`?[^3]
- Can your pods tolerate disruptions? If not, set `disruptions_enabled` to `false` to avoid all voluntary disruptions. Additionally, you might want to set `spot_nodes_enabled` to `false` to avoid the (small) chance of spot node preemption.[^4]
- Are your pods a good fit for burstable spot (T-family) instances? If so, set `burstable_nodes_enabled` to `true` to save additional money.
- How many times do you want each node in the workflow to retry on failure? Have you set `retry_max_attempts` to the appropriate value? Have you tuned the retry backoff parameters, `retry_backoff_initial_duration_seconds` and `retry_backoff_max_duration_seconds`?
- Can multiple instances of this workflow be running at once? If so, have you increased `workflow_parallelism`?
- Do you want to limit the number of active pods that can be running at a single time in a workflow? If so, have you set `pod_parallelism`?
- Have you set the appropriate resource requests and limits for all your containers?
- Have you set an appropriate timeout for the workflow via `active_deadline_seconds`?
- Do the pods in the workflow need to communicate with the AWS API? Have you added the necessary permissions via `extra_aws_permissions`?
- Do the pods in the workflow need to communicate with the Kubernetes API? Have you added the necessary permissions to the ServiceAccount (referenced via the `wf_spec.service_account_name` output)?
- Do your workflow containers need to write to disk? If so, have you set `tmp_directories`?
- Do your workflow containers need access to a common set of environment variables? If so, have you set `common_env` and/or `common_secrets`?
- Do your workflow containers need access to mounted Secrets or ConfigMaps? If so, have you set `config_map_mounts` and/or `secret_mounts`?
- Can your workflow containers run as user `1000`? If not, have you explicitly set `uid`? If the containers need to run as root, have you set `run_as_root` to `true`?
- Do you need to parameterize the Workflow? If so, did you add `arguments`?
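For orientation, here is a hypothetical sketch that sets several of the inputs above. The values are placeholders, and the exact shapes of inputs such as `tmp_directories` and `common_env` are assumptions; consult the `wf_spec` module documentation for the authoritative interface.

```hcl
module "workflow_spec" {
  source = "${var.pf_module_source}wf_spec${var.pf_module_ref}"

  name             = "example"
  namespace        = local.namespace
  eks_cluster_name = var.eks_cluster_name

  # Scheduling: suppose these pods cannot run on arm64 or spot nodes
  arm_nodes_enabled  = false
  spot_nodes_enabled = false

  # Retries: up to three attempts with a capped exponential backoff
  retry_max_attempts                     = 3
  retry_backoff_initial_duration_seconds = 10
  retry_backoff_max_duration_seconds     = 300

  # Fail the workflow if it runs for longer than one hour
  active_deadline_seconds = 3600

  # Writable scratch space and shared environment variables for all
  # containers (input shapes assumed for illustration)
  tmp_directories = { "scratch" = { mount_path = "/scratch" } }
  common_env      = { LOG_LEVEL = "info" }

  entrypoint = "entry"
  templates  = [/* ... as in the example above ... */]
}
```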
## Examples

- The Argo project provides 100+ examples of various Workflow patterns.
- We provide many prebuilt WorkflowTemplates that you can examine for best practices.
[^1]: Similar to how Pods are created and managed by a higher-level resource like a Deployment.

[^2]: This example obviously omits some basic module setup such as deploying a `kube_namespace` and defining input variables.

[^3]: Running on `arm64` nodes will save about 20% of your workflow costs, so it might be worth it to create a multi-platform image. See our guide.

[^4]: Note that both of these will significantly increase the cost of running this workflow. If the workflow runs very often, you may want to architect your workflow to be able to tolerate the occasional disruption so you can at least keep `spot_nodes_enabled` as `true`.