Creating Workflows
Demonstrates how to use the `wf_spec` submodule to create Argo Workflow resources.
Prerequisites
You should ensure you have completed the following guides before continuing:
- Installing the Argo Workflow Engine
- Developing First-party IaC Modules
- Argo’s Getting Started Guide for Argo Workflows
This guide is NOT meant to be a tutorial on how to use Argo Workflows, nor an overview of all available capabilities. For that, you should see Argo's Getting Started Tutorial.
This guide is meant to be a short introduction to how to effectively create and deploy Argo Workflows as a part of the Panfactum Stack. It assumes basic familiarity with the core Argo Workflow concepts.
Background
While Argo Workflows offers incredible power and flexibility, these benefits come at the cost of increased complexity compared to simpler but less robust tools. Building a Workflow from scratch can be a daunting endeavor due to the number of configuration knobs available.
We have created a submodule, `wf_spec`, that aims to simplify the process by applying sane defaults that are compatible with the Panfactum Stack architecture.
`wf` is shorthand for "workflow," but why `spec`?
Well, a Workflow only defines a single run of the workflow execution graph. This is not very useful when you want to run the same execution graph many times. Thus, instead of being created directly, Workflows are typically created via Workflow Templates or Cron Workflows.[^1]
`wf_spec` creates the "specification" of the Workflow that is added to a resource such as a Workflow Template. We show a concrete example below.
The Basics
This is a very simple example workflow that demonstrates the basics of using our `wf_spec` submodule:[^2]
module "workflow_spec" {
source = "${var.pf_module_source}wf_spec${var.pf_module_ref}"
name = "example"
namespace = local.namespace
entrypoint = "entry"
templates = [
{
name = "entry"
dag = {
tasks = [
{
name = "first-task"
template = "first-template"
},
{
name = "second-task"
template = "second-template"
dependencies = [ "first-task" ]
}
]
}
},
{
name = "first-template"
container = {
image = "docker.io/library/busybox:stable"
command = ["/bin/sh", "-c", "echo first-template"]
}
},
{
name = "second-template"
container = {
image = "docker.io/library/busybox:stable"
command = ["/bin/sh", "-c", "echo second-template"]
}
tolerations = concat(module.workflow_spec.tolerations, [
{ key = "foo", operator = "Exists", effect = "NoSchedule" }
])
}
]
}
resource "kubectl_manifest" "workflow_template" {
yaml_body = yamlencode({
apiVersion = "argoproj.io/v1alpha1"
kind = "WorkflowTemplate"
metadata = {
name = "example"
namespace = local.namespace
labels = module.workflow_spec.labels
}
spec = module.workflow_spec.workflow_spec
})
server_side_apply = true
force_conflicts = true
}
A few critical points to notice:

- In the example, `module.workflow_spec` instantiates the Panfactum `wf_spec` module, but critically this module does not create a Workflow; it only creates the specification for a workflow.
- However, the above example does create a WorkflowTemplate resource via `kubectl_manifest.workflow_template`. This WorkflowTemplate uses the `workflow_spec` output of the `module.workflow_spec` module for its `spec` field.
- The `templates` array defines all the nodes in the workflow's execution graph. There are many different types of Templates, but the most common are:
  - A Container Template (for running a single container in a Pod)
  - A DAG Template (for defining an execution graph of other templates)
  - A Container Set Template (for defining an execution graph of containers within a single Pod)
  - A Kubernetes Resource Template (for performing CRUD operations against Kubernetes resources)
  - A Suspend Template (for suspending a Workflow)

  The HTTP Template is currently broken (issue). Instead, use a Container Template to call `curl` (or an equivalent alternative).
- You might want to override one or more pod-level defaults for a template (for example, to provide certain tolerations for some pods but not others). See the `second-template` template above as an example.
  - Note that OpenTofu supports self-references (i.e., using a module's outputs as part of its inputs) since values are lazily evaluated. As a result, we have given the `wf_spec` module outputs such as `tolerations`, `volumes`, `env`, etc., that can be used as building blocks for overrides.
  - Finally, note that any overrides are shallow-merged by key, so if `concat` had not been used in the example, the default tolerations would have been dropped.
- Defining the execution graph of a Workflow is typically done via a DAG (Directed Acyclic Graph) template. This template references other templates and defines the order and conditions in which to execute them.
- The `entrypoint` defines which template from the `templates` array will be initially executed when the Workflow is created. Our example above starts by executing the DAG template (`entry`), which in turn runs the `first-template` Container template and then the `second-template` Container template.
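Because `wf_spec` only produces a specification, the same output can back other resource kinds. Below is a minimal sketch of embedding the `workflow_spec` output in a CronWorkflow so the same execution graph runs on a schedule; the `schedule` and `concurrencyPolicy` values are illustrative assumptions, not recommendations:

```hcl
# A minimal sketch: reusing the workflow_spec output from the example above
# in a CronWorkflow so the execution graph runs on a schedule.
resource "kubectl_manifest" "cron_workflow" {
  yaml_body = yamlencode({
    apiVersion = "argoproj.io/v1alpha1"
    kind       = "CronWorkflow"
    metadata = {
      name      = "example-cron"
      namespace = local.namespace
      labels    = module.workflow_spec.labels
    }
    spec = {
      schedule          = "0 * * * *" # illustrative: hourly, on the hour
      concurrencyPolicy = "Forbid"    # illustrative: skip a run while one is active
      workflowSpec      = module.workflow_spec.workflow_spec
    }
  })

  server_side_apply = true
  force_conflicts   = true
}
```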
Field Casing in Templates
When defining templates for your workflow spec, you must use camelCase instead of snake_case, because we pass the `templates` input directly through to the generated Kubernetes manifest (and Kubernetes expects camelCase).
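For instance, this hypothetical template mounts a volume; every field that flows through to the manifest uses the Kubernetes casing (`volumeMounts`, `mountPath`), not `volume_mounts` or `mount_path`:

```hcl
# Inside module "workflow_spec" (a sketch; assumes a volume named
# "scratch" already exists on the workflow spec)
templates = [
  {
    name = "writer"
    container = {
      image   = "docker.io/library/busybox:stable"
      command = ["/bin/sh", "-c", "echo hello > /scratch/hello.txt"]
      # camelCase, because this maps directly onto the Kubernetes manifest:
      volumeMounts = [
        {
          name      = "scratch"
          mountPath = "/scratch"
        }
      ]
    }
  }
]
```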
Resource Requests and Limits
The VPA unfortunately does not work for pods generated by Workflows. As a result, you must explicitly set the resource requests and limits for each container.
You can also use the `default_resources` input of `wf_spec` to set a sensible default for all containers.
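Here is a minimal sketch of both approaches. The values are illustrative, and the exact shape of the `default_resources` input should be confirmed against the `wf_spec` module documentation:

```hcl
module "workflow_spec" {
  source = "${var.pf_module_source}wf_spec${var.pf_module_ref}"
  # ... other inputs ...

  # Applied to every container that does not set its own resources
  # (illustrative values; see the wf_spec docs for the canonical shape)
  default_resources = {
    requests = { cpu = "100m", memory = "100Mi" }
    limits   = { memory = "100Mi" }
  }

  templates = [
    {
      name = "big-task"
      container = {
        image   = "docker.io/library/busybox:stable"
        command = ["/bin/sh", "-c", "echo big-task"]

        # Per-container override (camelCase Kubernetes shape)
        resources = {
          requests = { cpu = "500m", memory = "512Mi" }
          limits   = { memory = "512Mi" }
        }
      }
    }
  ]
}
```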
Advanced Workflow Patterns
For more details and examples on common workflow patterns that you might wish to deploy, see the documentation for `wf_spec`.
Workflow Checklist
Below is a set of questions that we recommend asking when you create a Workflow using `wf_spec` (a configuration sketch illustrating several of these inputs follows the list):

- Can your pods run on `arm64` nodes? If not, did you set `arm_nodes_enabled` to `false`?[^3]
- Can your pods tolerate disruptions? If not, set `disruptions_enabled` to `false` to avoid all voluntary disruptions. Additionally, you might want to set `spot_nodes_enabled` to `false` to avoid the (small) chance of spot node preemption.[^4]
- Are your pods a good fit for burstable spot (T-family) instances? If so, set `burstable_nodes_enabled` to `true` to save additional money.
- How many times do you want each node in the workflow to retry on failure? Have you set `retry_max_attempts` to the appropriate value? Have you tuned the retry backoff parameters, `retry_backoff_initial_duration_seconds` and `retry_backoff_max_duration_seconds`?
- Can multiple instances of this workflow be running at once? If so, have you increased `workflow_parallelism`?
- Do you want to limit the number of active pods that can be running at a single time in a workflow? If so, have you set `pod_parallelism`?
- Have you set the appropriate resource requests and limits for all your containers?
- Have you set an appropriate timeout for the workflow via `active_deadline_seconds`?
- Do the pods in the workflow need to communicate with the AWS API? Have you added the necessary permissions via `extra_aws_permissions`?
- Do the pods in the workflow need to communicate with the Kubernetes API? Have you added the necessary permissions to the ServiceAccount (referenced via the `wf_spec.service_account_name` output)?
- Do your workflow containers need to write to disk? If so, have you set `tmp_directories`?
- Do your workflow containers need access to a common set of environment variables? If so, have you set `common_env` and/or `common_secrets`?
- Do your workflow containers need access to mounted Secrets or ConfigMaps? If so, have you set `config_map_mounts` and/or `secret_mounts`?
- Can your workflow containers run as user `1000`? If not, have you explicitly set `uid`? If the containers need to run as root, have you set `run_as_root` to `true`?
- Do you need to parameterize the Workflow? If so, did you add `arguments`?
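To make the checklist concrete, here is a hypothetical configuration touching several of the items above. All values are illustrative assumptions; confirm input shapes (for example, `tmp_directories`) against the `wf_spec` module documentation:

```hcl
module "workflow_spec" {
  source = "${var.pf_module_source}wf_spec${var.pf_module_ref}"

  name       = "example"
  namespace  = local.namespace
  entrypoint = "entry"

  # Scheduling: assume this image is amd64-only and cannot tolerate disruptions
  arm_nodes_enabled   = false
  disruptions_enabled = false
  spot_nodes_enabled  = false

  # Retries and timeouts (illustrative values)
  retry_max_attempts      = 3
  active_deadline_seconds = 3600 # fail the workflow after one hour

  # Runtime environment (illustrative values)
  common_env = {
    LOG_LEVEL = "info"
  }
  tmp_directories = {
    "scratch" = { mount_path = "/scratch" } # hypothetical shape; see wf_spec docs
  }

  templates = [
    # ... as in the example above ...
  ]
}
```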
Examples
- The Argo project provides 100+ examples of various Workflow patterns.
- We provide many prebuilt WorkflowTemplates that you can examine for best practices.
Footnotes
[^1]: Similar to how Pods are created and managed by a higher-level resource like a Deployment.
[^2]: This example obviously omits some basic module setup such as deploying a `kube_namespace` and defining input variables.
[^3]: Running on `arm64` nodes will save about 20% of your workflow costs, so it might be worth it to create a multi-platform image. See our guide.
[^4]: Note that both of these will significantly increase the cost of running this workflow. If the workflow runs very often, you may want to architect your workflow to be able to tolerate the occasional disruption so you at least keep `spot_nodes_enabled` as `true`.