Inbound Networking
Objective
Deploy the necessary components to allow inbound network traffic to workloads running in the cluster.
Background
Like internal networking, inbound networking has several moving parts. We won't cover them in detail within this guide, but we do in our concept documentation.
Deploy ExternalDNS
ExternalDNS is the most popular Kubernetes controller for synchronizing DNS records from internal Service and Ingress resources to external DNS servers like AWS Route53.
We provide a module to deploy it: kube_external_dns
Let's deploy it now:
- Create a new directory adjacent to your aws_eks module called kube_external_dns.
- Add a terragrunt.hcl to that directory that looks like this.
- The syntax for route53_zones is the same as it was for kube_cert_issuers. You may reference that guide for more information, but unless you have a specific reason not to, simply copy that input into this module.
- Run pf-tf-init to enable the required providers.
- Run terragrunt apply.
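The directory step above can be sketched as a small shell helper. This is only an illustration: scaffold_module is a hypothetical function name, not part of the Panfactum tooling, and the terragrunt.hcl it creates is an empty placeholder that you still need to fill in from the module's example.

```shell
# Hypothetical helper: create a new module directory next to an existing one.
# Usage: scaffold_module <existing_module_dir> <new_module_name>
scaffold_module() {
  existing_dir="$1"
  new_name="$2"
  # "Adjacent" means sharing the same parent directory as the existing module.
  parent="$(dirname "$existing_dir")"
  mkdir -p "$parent/$new_name"
  # Create an empty terragrunt.hcl only if none exists, so nothing is overwritten.
  [ -e "$parent/$new_name/terragrunt.hcl" ] || : > "$parent/$new_name/terragrunt.hcl"
  # Print the new directory so the caller can cd into it.
  echo "$parent/$new_name"
}
```

After filling in terragrunt.hcl, you would run pf-tf-init and terragrunt apply from the printed directory, as described in the steps above.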
We will test that this works once we set up our first inbound networking resource.
Deploy the AWS Load Balancer Controller
Otherwise known as the ALB controller, the AWS Load Balancer Controller provisions AWS load balancers for our Kubernetes Services of type LoadBalancer.
As the cluster nodes are running in private subnets, the first step to providing inbound networking is deploying a public gateway in your VPC for inbound traffic to connect with before it is forwarded on to your Kubernetes nodes. AWS load balancers are the perfect gateways:
- Highly available, distributed across all your public AZs
- Highly scalable, able to handle any amount of traffic
- Built-in protection against DoS attacks
- Support for PROXY protocol to preserve IP headers
We provide a module to deploy it: kube_aws_lb_controller
Let's deploy it now:
- Create a new directory adjacent to your kube_external_dns module called kube_aws_lb_controller.
- Add a terragrunt.hcl to that directory that looks like this.
- Select the public subnets that you want AWS load balancers to be able to use. We suggest passing in all the public subnets created in your aws_vpc deployment.
- Run pf-tf-init to enable the required providers.
- Run terragrunt apply.
We will see it in action in the following section.
Deploy the Ingress System
The AWS load balancer will not route requests directly to our workloads. Instead, they will be mediated by a Kubernetes Ingress resource.
Operating at OSI layer 7 (e.g., HTTP), the ingress system adds several key capabilities:
- Logging for all inbound traffic in a standard format
- Public TLS termination
- Request routing based on domain, pathname, and HTTP headers
- Compression of large responses
- Rate-limiting
- Standard security headers
- Web application firewall engine
Additionally, it allows you to only use one AWS load balancer instead of one for each service, saving significant costs.
The most popular ingress controller for Kubernetes is the Ingress-Nginx Controller which uses NGINX as the underlying proxy.
We provide a module to deploy it: kube_ingress_nginx.
Deploy the NGINX-Ingress Controller
- Create a new directory adjacent to your kube_external_dns module called kube_ingress_nginx.
- Add a terragrunt.hcl to that directory that looks like this.
  - For ingress_domains, select all the domains that the cluster can use to serve traffic. Each domain listed here must have been configured in the kube_external_dns and kube_cert_issuers modules.
  - The dhparam is the Diffie-Hellman key used to power perfect forward secrecy in your TLS connections. You can generate it with this command: openssl dhparam 4096 2> /dev/null
    This can take several minutes as it depends on the entropy generated by your computer. This is a secret, so ensure that you use sops to save it.
  - The ingress_timeout_seconds is the maximum number of seconds that NGINX will wait on upstream servers to return a response before a server error is returned. In general, long-lived requests create reliability and resiliency problems, so we recommend keeping this to 60 seconds or less.
  - If you need to support legacy TLSv1.2 clients, set tls_1_2_enabled to true. We recommend using only TLSv1.3 if possible.
- Run pf-tf-init to enable the required providers.
- Run terragrunt apply.
- In k9s, notice that a service (:svc) of type LoadBalancer was created. This shows the ALB controller in action. It automatically provisioned a new AWS Network Load Balancer and configured it to route traffic across the NGINX pods.
- Log into the AWS web console. Notice that the load balancer resource does indeed exist.
- Select the target group bound to port 443. Notice that this automatically routes traffic to the IP addresses of the NGINX pods running in the cluster.
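For the dhparam input described above, a minimal sketch of generating the value and sanity-checking the result might look like the following. The function names generate_dhparam and looks_like_dhparam are hypothetical, and the check assumes openssl writes its usual PEM output with "DH PARAMETERS" begin and end markers; it only inspects those markers and does not validate the parameters themselves.

```shell
# Hypothetical helper: write DH parameters to a file.
# Usage: generate_dhparam <output_file> [bits]   (defaults to 4096 bits)
generate_dhparam() {
  bits="${2:-4096}"
  # Same command as in the guide; progress output on stderr is discarded.
  openssl dhparam "$bits" 2> /dev/null > "$1"
}

# Hypothetical helper: succeed only if the file starts and ends with the
# PEM markers that openssl emits for DH parameters.
looks_like_dhparam() {
  head -n 1 "$1" | grep -q '^-----BEGIN DH PARAMETERS-----$' &&
    tail -n 1 "$1" | grep -q '^-----END DH PARAMETERS-----$'
}
```

For example, generate_dhparam dhparam.pem followed by looks_like_dhparam dhparam.pem confirms that the file is ready to be encrypted with sops before it is referenced from terragrunt.hcl.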
Deploy the Vault Ingress
While the NGINX-ingress controllers are successfully running, they will not process any traffic until you create an Ingress resource. An Ingress instructs NGINX how to respond to incoming traffic and what workloads to forward requests to.
Currently, the system has one workload that definitely needs inbound connectivity: the Vault cluster.
Let's set that up now:
- Return to the kube_vault module you deployed in the Vault guide section.
- Change ingress_enabled from false to true.
- Run terragrunt apply.
- In k9s, notice that you have your first Ingress resource.
- ExternalDNS should recognize the new Ingress resource and set up your public DNS records appropriately. Verify that by running the following command, replacing prod.panfactum.com with your domain: delv @1.1.1.1 vault.prod.panfactum.com
  You should receive a response like this:
    ; fully validated
    vault.prod.panfactum.com. 60 IN A 18.223.233.91
    vault.prod.panfactum.com. 60 IN A 52.14.249.23
    vault.prod.panfactum.com. 60 IN RRSIG A 13 4 60 20240326215221 20240326195121 42332 prod.panfactum.com. FDoA4LYqJw7KdTTzgcQb1JG74amZE3mf0HafZ06Z7GmWlLw3qWUSll9x KOl8XcMMr+XOLO7Zi4JjbdGn0CUjVg==
  Note that the IP addresses listed are the IPs assigned to the AWS load balancer in front of NGINX. The load balancer will forward TCP traffic on to NGINX, which will in turn forward HTTP traffic on to the active Vault instance.
- Let's see that in action. To start capturing logs from all the NGINX servers, run: stern . -n ingress-nginx
  Now visit your Vault cluster in your web browser (use the domain you queried in the previous step). You should now see the Vault login page. Additionally, you should see log entries for each resource needed to load the UI:
    ingress-nginx-controller-c87487976-gd9cn controller {"tls.version": "TLSv1.3", "tls.cipher": "TLS_AES_256_GCM_SHA384", "http.url": "/v1/sys/seal-status", "http.version": "HTTP/2.0", "http.status_code": "200", "http.method": "GET", "http.referer": "", "http.origin": "", "http.host": "vault.prod.panfactum.com", "http.useragent":"Mozilla/5.0 (X11; Linux x86_64; rv:122.0) Gecko/20100101 Firefox/122.0", "time":"2024-03-26T20:58:54+00:00", "remote_addr": "X.X.X.X", "remote_user": "", "response_length": 667, "duration": 0.003, "request_id": "b2718b569ed881072fbe7682a2cc635d", "request_length": 29, "response_content_type": "application/json", "x_forwarded_for": "X.X.X.X"}
    ingress-nginx-controller-c87487976-gd9cn controller {"tls.version": "TLSv1.3", "tls.cipher": "TLS_AES_256_GCM_SHA384", "http.url": "/v1/sys/health?standbycode=200&sealedcode=200&uninitcode=200&drsecondarycode=200&performancestandbycode=200", "http.version": "HTTP/2.0", "http.status_code": "200", "http.method": "GET", "http.referer": "", "http.origin": "", "http.host": "vault.prod.panfactum.com", "http.useragent":"Mozilla/5.0 (X11; Linux x86_64; rv:122.0) Gecko/20100101 Firefox/122.0", "time":"2024-03-26T20:58:54+00:00", "remote_addr": "X.X.X.X", "remote_user": "", "response_length": 638, "duration": 0.004, "request_id": "dca72e8c51d359bcfcd4c702c53d85a8", "request_length": 90, "response_content_type": "application/json", "x_forwarded_for": "X.X.X.X"}
    ingress-nginx-controller-c87487976-gd9cn controller {"tls.version": "TLSv1.3", "tls.cipher": "TLS_AES_256_GCM_SHA384", "http.url": "/v1/sys/seal-status", "http.version": "HTTP/2.0", "http.status_code": "200", "http.method": "GET", "http.referer": "", "http.origin": "", "http.host": "vault.prod.panfactum.com", "http.useragent":"Mozilla/5.0 (X11; Linux x86_64; rv:122.0) Gecko/20100101 Firefox/122.0", "time":"2024-03-26T20:59:04+00:00", "remote_addr": "X.X.X.X", "remote_user": "", "response_length": 667, "duration": 0.020, "request_id": "7296f3e527c7dab1e2868b81e6252c32", "request_length": 29, "response_content_type": "application/json", "x_forwarded_for": "X.X.X.X"}
    ingress-nginx-controller-c87487976-gd9cn controller {"tls.version": "TLSv1.3", "tls.cipher": "TLS_AES_256_GCM_SHA384", "http.url": "/v1/sys/health?standbycode=200&sealedcode=200&uninitcode=200&drsecondarycode=200&performancestandbycode=200", "http.version": "HTTP/2.0", "http.status_code": "200", "http.method": "GET", "http.referer": "", "http.origin": "", "http.host": "vault.prod.panfactum.com", "http.useragent":"Mozilla/5.0 (X11; Linux x86_64; rv:122.0) Gecko/20100101 Firefox/122.0", "time":"2024-03-26T20:59:04+00:00", "remote_addr": "X.X.X.X", "remote_user": "", "response_length": 638, "duration": 0.020, "request_id": "29e496dbb0d04aecf00e1bf6e7bf7b5c", "request_length": 90, "response_content_type": "application/json", "x_forwarded_for": "X.X.X.X"}
    ingress-nginx-controller-c87487976-gd9cn controller {"tls.version": "TLSv1.3", "tls.cipher": "TLS_AES_256_GCM_SHA384", "http.url": "/v1/sys/health?standbycode=200&sealedcode=200&uninitcode=200&drsecondarycode=200&performancestandbycode=200", "http.version": "HTTP/2.0", "http.status_code": "200", "http.method": "GET", "http.referer": "", "http.origin": "", "http.host": "vault.prod.panfactum.com", "http.useragent":"Mozilla/5.0 (X11; Linux x86_64; rv:122.0) Gecko/20100101 Firefox/122.0", "time":"2024-03-26T20:59:05+00:00", "remote_addr": "X.X.X.X", "remote_user": "", "response_length": 638, "duration": 0.003, "request_id": "c7eefe2a1d3d155136618ff6bc3f1d9f", "request_length": 90, "response_content_type": "application/json", "x_forwarded_for": "X.X.X.X"}
Notice that cert-manager has successfully provisioned a public TLS certificate and NGINX has picked it up to allow communication over HTTPS (using TLSv1.3).
- Moreover, notice that NGINX properly secures the site in a standard way by setting security headers for the browser. You can verify this either directly via the command line (curl -I <your_vault_address>) or using a site such as https://securityheaders.com. This is accomplished by our kube_ingress module. kube_vault uses it internally, and you can use it directly in your projects.
- Finally, you no longer need to use the finicky kubectl port-forward to connect with Vault. Let's update the address in your configuration files:
  - Update VAULT_ADDR in your .env.
  - In your region's region.yaml file, add or update the vault_addr key to the public address.
  - To verify this works as expected, re-apply the vault_core_resources module.
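If you want to compare the DNS answer from the delv check above against the addresses shown for the load balancer in the AWS console, a small filter can pull out just the A records. This is a sketch: a_records is a hypothetical helper name, and it assumes the answer uses the zone-file style shown above (name, TTL, class, type, value).

```shell
# Hypothetical helper: read dig/delv-style output on stdin and print only
# the IPv4 addresses from A records. Comment lines and RRSIG records are
# skipped because their third and fourth fields are not "IN" and "A".
a_records() {
  awk '$3 == "IN" && $4 == "A" { print $5 }'
}
```

For example, piping the output of delv @1.1.1.1 vault.prod.panfactum.com into a_records would print one load balancer IP per line.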
Deploy the Bastion
While the Ingress system will allow you to publicly expose HTTP endpoints, you still need a way to communicate with other internal systems using other protocols. For example, you might want to connect over TCP with databases running in the cluster.
For that reason, we will deploy an SSH bastion host to proxy connections to your backend resources over raw TCP. This will allow you to use any protocol over the wire such as the PostgreSQL message format. 1
We provide a bastion deployment module: kube_bastion.
This host uses certificate authentication with Vault so that, unlike SSH setups you might have used in the past, you do not need to manually manage static SSH keys. We will see that in action in a moment.
Deploy the Bastion Module
Let's deploy the bastion now:
- Create a new directory adjacent to your kube_ingress_nginx module called kube_bastion.
- Add a terragrunt.hcl to that directory that looks like this.
  - For bastion_domains, select the domain names that you want to be able to access the bastion hosts at.
  - Vault will issue SSH certificates that allow users in your organization to connect to private network resources. Those certificates are valid for ssh_cert_lifetime_seconds. We recommend setting this to a fairly low value (8 hours) as long-lived certificates would allow de-provisioned users to continue to access the private network. 2
- Run pf-tf-init to enable the required providers.
- Run terragrunt apply.
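Because ssh_cert_lifetime_seconds is expressed in seconds, the 8-hour recommendation above works out to 8 × 60 × 60 = 28800. The conversion can be sketched as follows; hours_to_seconds is a hypothetical helper used only for illustration.

```shell
# Hypothetical helper: convert a whole number of hours to seconds.
hours_to_seconds() {
  echo $(( $1 * 60 * 60 ))
}

hours_to_seconds 8   # prints 28800
```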
Note that this will deploy a second AWS NLB. We keep the bastion NLB separate to ensure you have a secondary ingress mechanism should the primary NLB fail.
Configure Bastion Connectivity
We provide two CLI utilities for working with the bastion:
- pf-update-ssh: Sets up the bastion connectivity settings that you will commit to your repo for your team to share
- pf-tunnel: Establishes a tunnel through one of the bastions using dynamically generated, individual credentials
Now that the bastion is running, let's configure connectivity:
- Run pf-update-ssh to scaffold your ssh_dir directory (default: .ssh).
- Switch to that directory.
- Copy the config.example.yaml file to config.yaml.
- Update the values to the correct values for your setup. See the reference docs for more information.
- Run pf-update-ssh --build to generate the known_hosts and connection_info files for your project. Additionally, a state.lock file is used to help determine when you need to rebuild. These files should be committed to version control as they do not contain any sensitive information and can be shared with everyone in your organization.
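The copy step above can be made safe to repeat with a small guard. This is a sketch under the assumption that the scaffolded directory contains config.example.yaml as described; copy_example_config is a hypothetical helper, not one of the provided CLI utilities.

```shell
# Hypothetical helper: copy config.example.yaml to config.yaml inside the
# given directory (default: .ssh) without overwriting an existing config.yaml.
copy_example_config() {
  dir="${1:-.ssh}"
  if [ -e "$dir/config.yaml" ]; then
    echo "config.yaml already exists in $dir; leaving it untouched" >&2
    return 0
  fi
  cp "$dir/config.example.yaml" "$dir/config.yaml"
}
```

In the workflow above, you would run pf-update-ssh first, then copy_example_config .ssh, edit the values in config.yaml, and finish with pf-update-ssh --build.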
Test Bastion Connectivity
Everything should now be in place to use the bastion to proxy connections. Let's verify that it is working as intended.
- We expose an internal service called nginx-status that prints some realtime metrics about the NGINX instance. This service is available at the address nginx-status.ingress-nginx:18080. We cannot access it via the public internet, so we must use the bastion to connect.
- We will open a tunnel to the service bound to your localhost:3030 that will route connections through the bastion. Replacing <bastion_name> with the name you used in your config.yaml, run: pf-tunnel -b <bastion_name> -r nginx-status.ingress-nginx:18080 -l 3030
  - -b / --bastion: Selects the bastion by name.
  - -r / --remote-address: Selects the remote address. You must specify the port.
  - -l / --local-port: Selects the local port to bind to.
- In a separate terminal session, run curl localhost:3030/nginx_status. NGINX should return a result like this:
    Active connections: 1
    server accepts handled requests
     28680 28680 13962
    Reading: 0 Writing: 1 Waiting: 0
- Notice that the SSH keys were automatically generated in the configuration directory. The _signed.pub is the certificate that was signed by Vault that allows you temporary access to the bastion host. It will expire after the ssh_cert_lifetime_seconds you configured for the kube_bastion module. These files are secret and automatically ignored from version control. This time it used the root vault token you set in your .env file, but in the future it will use your organization's SSO, which we will configure in a later section.
- For fun, restart the bastion instances by running: kubectl rollout restart deployment -n bastion
  Notice that the tunnel recovers gracefully even during disruption to the underlying nodes.
- Close the tunnel with ^C.
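To turn the status page from the test above into numbers you can script against, you could filter the response with awk. This is a sketch: active_connections and total_requests are hypothetical helper names, and the parsing assumes the four-line layout shown in the sample response.

```shell
# Hypothetical helper: print the number of active connections from
# NGINX status output read on stdin.
active_connections() {
  awk '/^Active connections:/ { print $3 }'
}

# Hypothetical helper: print the total number of handled requests, which is
# the third counter on the third line of the status output.
total_requests() {
  awk 'NR == 3 { print $3 }'
}
```

With the tunnel open, piping the output of curl -s localhost:3030/nginx_status into active_connections would print the current connection count.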
While this was a particularly trivial test, this functionality will become important when you need to access private network resources such as databases without manually maintaining certificates or IP whitelists.
Next Steps
Now that the core functionality of the cluster is live, let's install a handful of maintenance controllers that will ensure things continue to operate smoothly.
Footnotes
1. We do not want to use kubectl port-forward for this purpose; that was just a stop-gap measure during the bootstrapping process. For one, you may choose to make the Kubernetes API server private in a subsequent guide. Additionally, you do not want to burden the API server with heavy traffic spikes as this could disrupt the entire cluster. Finally, kubectl port-forward connects directly to a single pod which is prone to service disruptions as pods restart and move around the cluster. The bastion will use the highly available service infrastructure to ensure connections are preserved even if the underlying pod changes.
2. Access to the private network is not the only security gate for accessing private systems in the Panfactum stack, but short-lived credentials are an important part of defense-in-depth.