Debugging
Common Issues
Not authenticated to Image Registry
BuildKit uses your local client's credentials to push and pull images from image registries like ECR. If you are not authorized to interact with a registry, you may receive an HTTP 401 error like below:
unexpected status from HEAD request to https://555555555.dkr.ecr.us-east-2.amazonaws.com/v2/website/blobs/sha256:64c920b430dfe10fb711d1be3a7ec01c0a18174df3d05c1effc85ea137c463b1: 401 Unauthorized
The Panfactum setup should automatically authenticate you with ECR, but the following issues can occur:
-
We use the AWS profile that you use for authenticating with the Kubernetes cluster where BuildKit is located. If that profile does not have push and pull access to ECR, you may receive the above error.
-
You may be trying to interact with other private registries such as ghcr.io. See these docs for information on how to properly set up authentication.
Build context terminated
The image building process requires a persistent connection between the buildctl
client and the buildkitd
server.
Occasionally, an intermediate system might break the connection (e.g., an SSH bastion restarting) or the BuildKit
server itself might be closed since they run on spot instances.
In this case, you will receive an error message like this at some point in the build logs:
error: failed to receive status: rpc error: code = Unavailable desc = error reading from server: EOF
This should happen very infrequently. If this occurs many times to you, please report a bug.
When this does occur, you just need to resubmit the build job.
Immutable images
If you have configured your ECR repo to have image immutability, that means that new images cannot be pushed to a tag that has already been used. 1
You may see an error like this if you attempt to do so:
failed commit on ref "manifest-sha256:7425b7e75a781cf30c1ab65315f6e58a0938b380b752bd0d4e234258e54e1d81": unexpected status from PUT request to https://5555555555555.dkr.ecr.us-east-2.amazonaws.com/v2/repo/manifests/test-arm64: 400 Bad Request
Specifically, the indicator is the 400 HTTP error code.
Generally, this error is desirable; you should avoid accidentally changing build artifacts such as images once they are created. This can impact the behavior of live systems that depend on these artifacts.
If you really need to replace the image, a superuser can use the AWS API (either via the web console or CLI) to delete the image.
Errors on the BuildKit server
There is a known issue where BuildKit does not always properly clean up build containers when a build is canceled mid-flight. This results in many error logs like the following:
time="2024-07-01T12:56:32Z" level=error msg="failed to kill process in container id 981xvmuvj4jb2j1iqcpn27vle: buildkit-runc did not terminate successfully: exit status 1: container not running\n"
time="2024-07-01T12:56:32Z" level=error msg="failed to kill process in container id 981xvmuvj4jb2j1iqcpn27vle: buildkit-runc did not terminate successfully: exit status 1: container not running\n"
time="2024-07-01T12:56:32Z" level=error msg="failed to kill process in container id 981xvmuvj4jb2j1iqcpn27vle: buildkit-runc did not terminate successfully: exit status 1: container not running\n"
This is annoying but relatively harmless. It should not disrupt your build operations.
Footnotes
-
Note that you can continue to push the same image to the same tag without error. ↩