
Cluster Networking

Background Information

Networking is an extremely deep and broad topic. We have collated a comprehensive set of resources that we strongly recommend to every engineer looking to understand how data moves between subsystems in their cluster:

  • Basic Networking Concepts

    • Networking Fundamentals (Video Playlist): Core networking concepts at each layer of the OSI model

    • TCP/IP Illustrated (Textbook): The reference book for any core networking protocol

  • Applied Networking on Linux

    • Iptables Tutorial (Online Textbook): An overview of how networking was originally designed to work on Linux systems (while this technology is not usually used anymore, the concepts described here remain pervasive when discussing Linux networking)

    • Nftables Tutorial (Article): A brief overview of how nftables replaces iptables on modern Linux distributions

    • Netfilter docs (Website): A documentation hub that provides references to just about everything you might want to know about iptables / nftables

    • eBPF Intro (Article): An overview of the new Linux technology that most low-level tooling is adopting as a replacement for static configuration tools like iptables / nftables

  • Networking on Kubernetes

    • Life of a Packet (Video): Great high-level overview of how data moves through the discrete components of the Kubernetes stack

    • A visual guide to Kubernetes networking fundamentals (Article): A condensed and visual reference guide covering the major concepts in the prior video

    • CNI Explained in 7 Minutes (Video): A brief overview of the motivations behind the container network interface (CNI) specification

    • Why replace iptables with eBPF (Article): Blog post providing an overview of the motivations for using eBPF in current-generation networking implementations

Layer 3 / 4

Layers 3 and 4 (L3/4) of the OSI model are where most software-defined networking (SDN) begins. In Kubernetes, this refers specifically to the IP and TCP protocols.

The Panfactum stack replaces the default networking stack provided by EKS with Cilium by Isovalent. The main motivating factor for this decision is to take advantage of eBPF-based networking rather than the Kubernetes default, iptables.

This provides several benefits:

  • Allows for implementation of Network Policies (see the sketch after this list)

  • Reduces operational complexity by consolidating several runtime components that would otherwise need to be individually managed and integrated

  • Significantly improves networking performance per dollar

  • Allows for more intelligent load balancing and system resiliency

  • Adds TCP/IP observability that would otherwise be impossible
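
To make the first benefit above concrete, here is a minimal sketch of a standard Kubernetes NetworkPolicy that Cilium can enforce. The namespace, labels, and port are hypothetical and only illustrate the shape of such a policy:

```yaml
# Hypothetical example: only pods labeled app=frontend may reach the "api"
# pods on TCP port 8080; all other ingress traffic to the "api" pods is dropped.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: example
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```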

IP Address Management (IPAM)

By default, Kubernetes clusters run their own network (a discrete CIDR block) on top of the network infrastructure provided by the underlying hosts. This provides the flexibility necessary to schedule many workloads (pods) on a single node.

However, EKS clusters run the AWS VPC CNI which assigns pods IP addresses from the containing VPC. That CNI has several downsides, and Cilium provides an alternative implementation with the same functionality, so we use Cilium's implementation instead.
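
For illustration, the sketch below shows the Cilium Helm values that typically enable ENI-based IPAM. The exact value names vary between Cilium versions, so treat this as an assumption to verify against the chart documentation for your release:

```yaml
# Hedged sketch: Cilium Helm values for AWS ENI IPAM, so pods receive
# routable IP addresses from the containing VPC (verify against your
# Cilium chart version).
ipam:
  mode: eni
eni:
  enabled: true
routingMode: native   # pod IPs are VPC-routable, so no overlay tunnel is needed
```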

Replacing kube-proxy

Kube-proxy is not actually a proxy but rather a daemon running on each node that programs the routing rules mapping Kubernetes Service IPs to their backing Endpoints. Unfortunately, it is iptables-based and not natively integrated with Cilium. While Cilium is typically thought of as a CNI (focused on pod / container networking), it also offers a replacement for this part of the default Kubernetes networking setup. We use this in the Panfactum stack to bring the benefits of eBPF to all parts of the networking infrastructure.
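
As a rough illustration, the Helm values below sketch how Cilium's kube-proxy replacement is commonly enabled. The API server endpoint is a placeholder, and value names may differ between Cilium versions:

```yaml
# Hedged sketch: enable Cilium's eBPF-based replacement for kube-proxy.
kubeProxyReplacement: true
# With kube-proxy removed, Cilium must be told how to reach the API server;
# the host below is a hypothetical EKS cluster endpoint.
k8sServiceHost: my-cluster.example.eks.amazonaws.com
k8sServicePort: 443
```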

Service Mesh

At higher layers of the OSI model there exists an additional category of technologies that sit atop the L3/L4 stack: service meshes.

Put abstractly, a service mesh is a system component that mediates network communication between workloads in your cluster. Unfortunately, the Kubernetes ecosystem lacks concrete standards for what a service mesh specifically ought to do, but some progress is being made with the Gateway API.¹

That said, the core motivation for a service mesh is to centrally implement best practices for layer 7 communications (e.g., http, grpc, etc.) in your cluster.

This not only ensures consistency but also removes the burden from each application developer to implement best practices like mTLS, metrics reporting, etc.

How a Service Mesh works

In Kubernetes, a service mesh is typically implemented as a central controller that injects sidecar containers into each pod. All traffic entering or leaving a pod is routed through its sidecar container. The sidecar containers then become the only containers that communicate directly with one another and thus create a mesh that captures and orchestrates all network traffic.

Service mesh diagram
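
For example, Linkerd (discussed below) performs this injection via an admission webhook that watches for an annotation. The sketch below shows a hypothetical namespace opting all of its pods into the mesh:

```yaml
# Hypothetical namespace: the linkerd.io/inject annotation tells Linkerd's
# admission webhook to add its proxy sidecar to every pod created here.
apiVersion: v1
kind: Namespace
metadata:
  name: example-app
  annotations:
    linkerd.io/inject: enabled
```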

There are many implementations of this paradigm. The most popular ones are:

  • Linkerd: Simple and powerful, but with fewer features than the alternatives

  • Istio: Extremely complicated, but allows deep customization

  • Consul: Originally developed by HashiCorp for their Kubernetes alternative, Nomad

Do You Need A Service Mesh?

This can be a divisive topic as many Kubernetes veterans have suffered from poorly implemented service mesh deployments. Moreover, a service mesh is not absolutely essential to the functioning of most clusters.

However, we believe the complexities are overstated, and the benefits are numerous. For an additional 15 minutes of setup time (if using the Panfactum stack), you automatically get the following without any additional work:

  • Automatic mTLS: Never worry about manually implementing HTTPS or managing certificates while also ensuring the highest level of security for all network communications.

  • Locality-aware traffic routing: Improve performance by ensuring traffic is always routed to the fastest-responding server. Significantly reduce cross-AZ network charges.

  • Circuit breaking: Reroute traffic away from unhealthy or resource-constrained pods to improve overall cluster resiliency.

  • Observability: Collect granular data about network throughput and latency that can be used to power observability tooling.

  • Tapping: Gain the ability to introspect network traffic in real time to improve overall debugging ability.

Additionally, most service meshes come with many more advanced capabilities that are entirely optional but nice to have if you ever need to use them: per-route traffic splitting, multi-cluster networking, etc.
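
As one hedged example of what per-route traffic splitting can look like, the sketch below uses a Gateway API HTTPRoute to send a small share of a Service's traffic to a canary backend. The Service names, weights, and port are hypothetical, and how routes attach to a mesh (and which resource versions it supports) should be confirmed in your mesh's documentation:

```yaml
# Hypothetical canary split: 90% of traffic to the stable backend, 10% to the canary.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-canary-split
  namespace: example
spec:
  parentRefs:
    - name: api        # hypothetical parent Service (mesh-style attachment)
      kind: Service
      group: ""        # empty string selects the core API group
      port: 8080
  rules:
    - backendRefs:
        - name: api-stable
          port: 8080
          weight: 90
        - name: api-canary
          port: 8080
          weight: 10
```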

Ultimately, you do not need a service mesh, but you should choose to use one if you are working with Kubernetes in a professional setting. As a result, we include one in the Panfactum stack.

Linkerd

In the Panfactum stack, we use Linkerd as our service mesh technology. Linkerd has several benefits over the alternatives.

You can see our implementation in the kube_linkerd terraform module.

New Release Model

In February 2024, Buoyant, the maintainers of Linkerd, announced a change to how they would be supporting Linkerd. The change caused some consternation as it removed the availability of stable releases for non-paying customers.

While we are monitoring the situation, there are a few key takeaways:

  • The Linkerd project’s license has not changed and is still OSS.
  • The “edge” releases continue to be shipped and will continue to contain all of the code in the stable releases. Edge releases simply do not provide any semantic versioning, come much more frequently (weekly), and do not receive back-ported bugfixes.
  • We have updated the Panfactum stack to use the edge release channels and perform our own stability testing. With the exception of security issues, we will only update the release versions when we release new major versions of the Panfactum stack.
  • While mildly inconvenient, we hope that this provides the Linkerd maintainers a pathway to fund continued development and maintenance of an excellent project.

Buoyant has posted a more detailed update here.

Footnotes

  1. Prior attempts such as the Service Mesh Interface have been discontinued.