Internal Cluster Networking
Objective
Install the basic Kubernetes cluster networking primitives via the kube_cilium module.
Background
In the Panfactum stack, we use Cilium to handle all of the L3/L4 networking in our Kubernetes cluster.
In this guide, we won't go into detail about the underlying design decisions and network concepts, so we recommend reviewing the concept documentation for more information.
Deploy Cilium
Deploy the Infrastructure Module
- Create a new directory adjacent to your aws_eks module called kube_cilium.
- Add a terragrunt.hcl to that directory that looks like this.
- For now, set vpa_enabled to false. We will enable it when we install the autoscalers.
- Add a module.yaml that enables the aws, kubernetes, and helm providers.
- Run terragrunt apply.
- If the deployment succeeds, you should see the various cilium pods running:
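One way to check the pods from the command line is sketched below. It is a minimal helper, not part of the Panfactum tooling: the function name cilium_pods_not_running is hypothetical, and it assumes that every Cilium pod has "cilium" somewhere in its name. It searches all namespaces so that it does not depend on where the module installs the pods.

```shell
# Print every pod whose name contains "cilium" and whose STATUS is not Running.
# Assumption: Cilium pod names contain the string "cilium".
cilium_pods_not_running() {
  # With -A, the columns are: NAMESPACE NAME READY STATUS RESTARTS AGE
  kubectl get pods -A --no-headers \
    | awk '$2 ~ /cilium/ && $4 != "Running" { print $1 "/" $2 " is " $4 }'
}
```

Calling cilium_pods_not_running prints nothing when every matching pod reports Running; any line it does print names a pod worth inspecting with kubectl describe.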
Remove the Old Networking Stack
Unfortunately, EKS automatically installs a networking stack with no way to disable it via infrastructure-as-code. We need to remove those components because they conflict with Cilium's operation.
- Run kubectl -n kube-system delete ds aws-node. This removes the AWS VPC CNI (Cilium already contains similar functionality).
- Run kubectl -n kube-system delete ds kube-proxy. This removes kube-proxy (Cilium contains a replacement).
- You must terminate every node in the cluster to ensure each node's networking configuration can be rebuilt using only Cilium. You can manually terminate all nodes simultaneously in the AWS web console. [1]
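If you prefer the command line to the web console for the last step, the sketch below lists the EC2 instance IDs behind the current nodes. The function name node_instance_ids is hypothetical, and the sketch assumes every node is an EC2 instance whose spec.providerID has the form aws:///<zone>/<instance-id>; nodes of any other kind would produce values that are not instance IDs.

```shell
# Print the EC2 instance ID of every node in the cluster, one per line.
# Assumption: each node's spec.providerID looks like aws:///<zone>/<instance-id>.
node_instance_ids() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.spec.providerID}{"\n"}{end}' \
    | awk -F/ 'NF { print $NF }'
}

# Example usage (destructive, so it is left commented out):
# aws ec2 terminate-instances --instance-ids $(node_instance_ids)
```

Review the printed IDs before passing them to aws ec2 terminate-instances, since that command terminates every instance it is given.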
Run Network Tests
Cilium comes with a companion CLI tool that is bundled with the Panfactum devenv. We will use it to verify that Cilium is working as intended:
- Run cilium connectivity test.
- Wait about 20-30 minutes for the test to complete.
- If everything completes successfully, you should receive a message like this:
  ✅ All 46 tests (472 actions) successful, 18 tests skipped, 0 scenarios skipped.
- Unfortunately, the test does not clean up after itself. You should run kubectl delete ns cilium-test to remove the test resources.
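Because the cleanup is easy to forget, the test and the cleanup can be wrapped together. The sketch below is one way to do that; the function name run_connectivity_test is hypothetical, and it assumes the cilium CLI exits with a non-zero status when the test fails.

```shell
# Run the connectivity test, then delete the test namespace whether or not
# the test passed. Returns the exit status of the test itself.
run_connectivity_test() {
  cilium connectivity test
  status=$?
  # cilium-test is the namespace named in the cleanup step above.
  # --ignore-not-found keeps the delete from failing if it is already gone.
  kubectl delete ns cilium-test --ignore-not-found
  return "$status"
}
```

Running run_connectivity_test leaves the cluster clean either way, while the return status still tells you whether the test itself succeeded.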
Next Steps
Now that basic networking is working within the cluster, we will configure your storage drivers.
Footnotes
[1] It will take a few minutes for EKS to automatically launch the replacement nodes.