Proposal to publish multi-arch buildpacks #288

`text/0000-manifest-list-buildpacks.md` (133 additions, 0 deletions)
# Publish multi-arch buildpacks

## Summary

To enable end users to create arm64 images with Paketo buildpacks, I am proposing that Paketo start publishing multi-arch buildpack images for all buildpacks. @dmikusa put together [this](https://github.com/dmikusa/paketo-arm64) excellent guide for building arm64 containers with Paketo buildpacks, but it puts a heavy burden on end users to create and maintain copies of Paketo buildpacks. This proposal is to start publishing manifest list images that support both x86_64 and arm64 architectures, using `docker manifest` or similar tooling.

For simplicity, I will refer to x86_64 as amd64 throughout this document.

## Motivation

The number one question I get about Paketo buildpacks is whether they support Arm. It is safe to say that workloads are generally moving to Arm processors, and end users are already evaluating how to build containers on Arm. As an end user, it is difficult for me to seriously consider building Arm images with Paketo buildpacks without putting engineering effort into building Arm versions of the Paketo buildpacks I may need. While the guide above is definitely helpful, many of its steps could be built into the GitHub Actions workflows that create and release buildpacks for general consumption. I believe Arm support was the number one requested roadmap item for 2023, and this proposal is an attempt to jump-start that process.


## Detailed Explanation

Right now, binaries are created on/for amd64 and then packaged into buildpacks using pack. It should be equally easy to create binaries for arm64 and package them into buildpacks the same way. These architecture-specific buildpacks can then be pushed to a registry with a platform-specific tag, as shown in the examples below.

* `paketobuildpacks/ca-certificates:3.6-amd64`
* `paketobuildpacks/ca-certificates:3.6-arm64`
**Member:**

As I'm reading about `docker buildx`, a lot of the examples seem to push a single image with multiple tags from a single Dockerfile. It looks like `docker buildx imagetools create` does the same thing, but with multiple source images that already exist on a registry.

Is it possible to avoid the intermediate step of publishing buildpacks to the `:<version>-<arch>` tag? It seems unnecessary once we've created a single manifest/image for all platforms. Could we create the images on a local registry and just push the single multi-arch image?

**Author:**

I agree that pushing the arch-specific tags is messy and suspect it could be avoided with a local registry. Since creating buildpacks doesn't require pulling other images, the entire workflow should be able to run on amd64 (a GitHub runner).

I'll do some testing to validate this when I get a chance.

**Member:**

Circling back on this, what's the outcome? Can we go for publishing the single multi-arch image?


The last step would be to create a manifest list image using `docker manifest` commands (or similar) and push it to the registry.

Users would pull the manifest list image such as: `paketobuildpacks/ca-certificates:3.6`

As an example, this is what the manifest might look like for `paketobuildpacks/ca-certificates:3.6`:

```
# crane manifest paketobuildpacks/ca-certificates:3.6 | jq .
{
"schemaVersion": 2,
"mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
"manifests": [
{
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"size": 12345,
"digest": "sha256:abcdefchijklmnodka0204kdkbjladkj02jfjbe2458801ekdbbks024f555kdkd",
"platform": {
"architecture": "amd64",
"os": "linux"
}
},
{
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"size": 12345,
"digest": "sha256:3333defchijklmnodka0204kdkbjladkj02jfjbe2458801ekdbbks024f555kdkd",
"platform": {
"architecture": "arm64",
"os": "linux"
}
}
]
}
#
```
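For context, this is roughly how a client chooses the per-platform image from such a manifest list. The struct below mirrors only the fields shown in the example JSON, and the selection logic is a simplified sketch (real registries and runtimes also handle variants, annotations, etc.):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// manifestList mirrors the relevant fields of the manifest list JSON above.
type manifestList struct {
	Manifests []struct {
		Digest   string `json:"digest"`
		Platform struct {
			Architecture string `json:"architecture"`
			OS           string `json:"os"`
		} `json:"platform"`
	} `json:"manifests"`
}

// pickDigest returns the digest of the entry matching the requested
// os/architecture pair, or an error if no entry matches.
func pickDigest(raw []byte, goos, goarch string) (string, error) {
	var ml manifestList
	if err := json.Unmarshal(raw, &ml); err != nil {
		return "", err
	}
	for _, m := range ml.Manifests {
		if m.Platform.OS == goos && m.Platform.Architecture == goarch {
			return m.Digest, nil
		}
	}
	return "", fmt.Errorf("no manifest for %s/%s", goos, goarch)
}

func main() {
	raw := []byte(`{"manifests":[
		{"digest":"sha256:amd64digest","platform":{"architecture":"amd64","os":"linux"}},
		{"digest":"sha256:arm64digest","platform":{"architecture":"arm64","os":"linux"}}]}`)
	digest, err := pickDigest(raw, "linux", "arm64")
	if err != nil {
		panic(err)
	}
	fmt.Println(digest) // prints sha256:arm64digest
}
```

This is why publishing a single manifest list tag is enough: an amd64 user and an arm64 user pull the same `paketobuildpacks/ca-certificates:3.6` reference and transparently get different image digests.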

## Rationale and Alternatives

The upstream pack project is already creating multi-arch images for pack and the lifecycle using `docker buildx`, which produces manifest list images. An alternative approach is to create and publish separate images tagged by architecture; that is, the steps mentioned above without the final manifest list image. I think this is an anti-pattern at this point, given that multi-arch images are already the norm. I also suspect end users would end up creating their own manifest list images anyway. The biggest driver for this proposal is to avoid end users having to clone and rebuild Paketo buildpacks.

## Implementation
**Contributor:**

I think that this RFC is missing information about stacks and builders. Both are really necessary to provide first-class support for Paketo users. I'd be OK calling it out of scope if we don't want this RFC to get bigger, but I still think we should mention it, because it's part of the overall picture to say that we support arm64.

**Comment:**

Great point and I completely agree. I picked up where you left off on multi-arch stacks and will then look at builders. Once we have those in place I think we can move forward with this RFC.


### Changes to packit

`Os` and `Architecture` string fields can be added to the [`Dependency`](https://github.com/paketo-buildpacks/packit/blob/v2/postal/buildpack.go#L15) struct in `packit/postal` to allow buildpacks to interrogate and use the new metadata when determining what external dependencies to download. This will allow `buildpack.toml` files to be updated to add in the `os` and `architecture` metadata for all dependencies. They would all be set to something like `os=linux` and `architecture=amd64` initially.

Some logic can be added to the [Resolve](https://github.com/paketo-buildpacks/packit/blob/18202009038b0df285ba0fb7d8b43abbf60d3ed0/postal/service.go#L89) method of `packit/postal` to pick a version that matches the running os and architecture (e.g. Go's `runtime.GOOS` and `runtime.GOARCH`) whenever the `os` and `architecture` fields have been specified.
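As a rough sketch of the proposed packit changes, using the field names `Os` and `Architecture` from this proposal. The helper below is hypothetical and is not packit's actual `Resolve` implementation; it only illustrates the filtering step:

```go
package main

import "fmt"

// Dependency mirrors the fields this proposal would add to packit/postal's
// Dependency struct; only the relevant fields are shown here.
type Dependency struct {
	ID           string
	Version      string
	URI          string
	Os           string // proposed new field
	Architecture string // proposed new field
}

// resolveForPlatform keeps only the dependencies matching the given
// os/architecture. Entries that omit the new fields pass through
// unchanged, preserving backwards compatibility with existing
// buildpack.toml files.
func resolveForPlatform(deps []Dependency, goos, goarch string) []Dependency {
	var out []Dependency
	for _, d := range deps {
		if d.Os != "" && d.Os != goos {
			continue
		}
		if d.Architecture != "" && d.Architecture != goarch {
			continue
		}
		out = append(out, d)
	}
	return out
}

func main() {
	deps := []Dependency{
		{ID: "go", Version: "1.18.4", Os: "linux", Architecture: "amd64"},
		{ID: "go", Version: "1.18.4", Os: "linux", Architecture: "arm64"},
		{ID: "go", Version: "1.18.4"}, // legacy entry with no platform metadata
	}
	for _, d := range resolveForPlatform(deps, "linux", "arm64") {
		fmt.Println(d.ID, d.Version, d.Architecture)
	}
}
```

Letting entries without the new fields pass through means existing `buildpack.toml` files keep working until they are updated with explicit `os`/`architecture` metadata.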

### Changes to `buildpack.toml` files

There are primarily two types of buildpacks: those that run without any external dependencies, such as [ca-certificates](https://github.com/paketo-buildpacks/ca-certificates), and those that download and install external dependencies, such as [azul-zulu](https://github.com/paketo-buildpacks/azul-zulu). For the former, no changes should be needed other than compiling for amd64 and arm64. The latter will require some changes to `buildpack.toml`.

The spec for [`buildpack.toml`](https://github.com/buildpacks/spec/blob/main/buildpack.md) makes no mention of `os` or `architecture`, which means adding those fields to entries in the `metadata.dependencies` list seems like a perfectly reasonable option. [This](https://github.com/paketo-buildpacks/rfcs/blob/decouple-dependencies/text/0000-decouple-dependencies.md) RFC proposes adding `os` and `arch` fields to the versions in the dependency table. I suggest sticking with the same field names used in the image manifest (`os` and `architecture`) for consistency, but the general idea is the same.

Here is an example of what the dependencies would look like.

```toml
[[metadata.dependencies]]
cpe = "cpe:2.3:a:golang:go:1.18.4:*:*:*:*:*:*:*"
id = "go"
licenses = ["BSD-3-Clause"]
name = "Go"
stacks = ["*"]
strip-components = 1
version = "1.18.4"
os = "linux"
architecture = "amd64"
purl = "pkg:generic/[email protected]?checksum=c9b099b68d93f5c5c8a8844a89f8db07eaa58270e3a1e01804f17f4cf8df02f5&download_url=https://go.dev/dl/go1.18.4.linux-amd64.tar.gz"
sha256 = "c9b099b68d93f5c5c8a8844a89f8db07eaa58270e3a1e01804f17f4cf8df02f5"
source = "https://go.dev/dl/go1.18.4.linux-amd64.tar.gz"
source_sha256 = "c9b099b68d93f5c5c8a8844a89f8db07eaa58270e3a1e01804f17f4cf8df02f5"
uri = "https://go.dev/dl/go1.18.4.linux-amd64.tar.gz"


[[metadata.dependencies]]
cpe = "cpe:2.3:a:golang:go:1.18.4:*:*:*:*:*:*:*"
id = "go"
licenses = ["BSD-3-Clause"]
name = "Go"
stacks = ["*"]
strip-components = 1
version = "1.18.4"
os = "linux"
architecture = "arm64"
purl = "pkg:generic/[email protected]?checksum=35014d92b50d97da41dade965df7ebeb9a715da600206aa59ce1b2d05527421f&download_url=https://go.dev/dl/go1.18.4.linux-arm64.tar.gz"
sha256 = "35014d92b50d97da41dade965df7ebeb9a715da600206aa59ce1b2d05527421f"
source = "https://go.dev/dl/go1.18.4.linux-arm64.tar.gz"
source_sha256 = "35014d92b50d97da41dade965df7ebeb9a715da600206aa59ce1b2d05527421f"
uri = "https://go.dev/dl/go1.18.4.linux-arm64.tar.gz"
```

### Github actions and workflows

I haven't looked at the GitHub workflows for all buildpacks, but I believe [jam](https://github.com/paketo-buildpacks/jam) is being used to package buildpacks. I did see skopeo being used to push images ([cpython](https://github.com/paketo-buildpacks/cpython/blob/main/.github/workflows/push-buildpackage.yml#L62)).

My proposal is to update jam to create buildpacks for amd64 and arm64. I haven't used jam yet, so I don't have specific code changes for it and would appreciate thoughts on how this could be added.
**Member:**

As I'm thinking more about this, it's not obvious to me what changes, if any, are required to `jam pack` to support building multi-arch buildpacks. `jam pack` doesn't compile the buildpack; it just packages it. It also doesn't create OCI images; `pack buildpack package` does that.

Expanding on this in more detail:

The current workflow to create a buildpack is:

1. Compile the build and run binaries (e.g. `go build`).
2. Run either `jam pack` (for packit buildpacks) or `packager` (for libbuildpack buildpacks). This primarily creates a tgz of the buildpack files. For packit buildpacks, this also does some extra steps like setting the buildpack version and optionally downloading dependencies. I'm not sure what other steps `packager` takes.
3. Run `pack buildpack package` to turn that tgz into an OCI image.

Currently, `jam pack` creates a single tarball of a single buildpack with no concept of multi-architecture. You can optionally filter dependencies by `--stack` to only include the stack-specific dependencies in the resultant metadata (and downloaded dependencies), but given that stacks can be multi-architecture, this isn't really sufficient to filter dependencies by architecture.

It's not really obvious to me what it would mean for `jam pack` to support packaging up a buildpack in a multi-architecture way. It doesn't compile the binaries, and it doesn't produce OCI images. Filtering the dependencies by architecture would be an optimization to reduce metadata (and the resultant buildpack size if downloading dependencies), but it's not a necessary step for creating the buildpack.

I suppose you could modify `jam pack` to package multiple versions of `build` and `run` for each target architecture and create a wrapper around them to invoke the correct one at run-time. But it seems just as easy to invoke `jam pack` multiple times, once per architecture, to create architecture-specific tarballs.

I don't know if `pack buildpack package` already supports multi-arch images. I took a quick look at the code, and it only seems to be aware of platform, which only contains OS (i.e. linux or windows). This leads me to think either we'd have to make changes to pack, or we'd have to create separate docker images for each architecture and bundle them together. Perhaps that's what `docker buildx imagetools create` does; I'm not sure. Perhaps @dmikusa or @dashaun have more info on how to build multi-arch buildpacks in a single manifest?

**Author:**

You are correct that pack does not support multi-arch yet, and the new changes being introduced will allow users to create manifest images after they have already been built.

This is an example of creating multi-arch builders. It creates and publishes arch-specific tags, but it is meant for demonstration and testing purposes, so it gives an idea.

Creating multi-arch buildpacks should be simpler and may not even require a local registry because no other images are pulled in. That should allow you to create the arch-specific buildpack images locally and then create a manifest from there and push the manifest.

Let's say you compile a buildpack for amd64 in one folder (`amd64`) and for arm64 in another (`arm64`). You then create local buildpack images for each arch, such as `mybuildpack:0.0.1-amd64` and `mybuildpack:0.0.1-arm64`. Then you can just create and push a manifest image as shown below, and it would work on both architectures.

`docker buildx imagetools create -t mybuildpack:0.0.1 mybuildpack:0.0.1-amd64 mybuildpack:0.0.1-arm64`

There is one issue that needs to be considered, though: at this time, pack seems to set the architecture to amd64. I had to work around this issue when creating the multi-arch builder here, but I just wanted to call it out.

**Member:**

Yes, that is exactly what I was thinking. I don't think we need to make any changes to `jam pack` to achieve this.

It would look something like the following:

1. For each arch in [amd64, arm64]:
   1. Compile the buildpack executables for the target architecture via `go build`; put them in an arch-specific temp directory.
   2. Package the buildpack with `jam pack` into the arch-specific temp directory.
   3. Create an arch-specific OCI image via `pack buildpack package`, passing in the contents of the arch-specific temp directory.
   4. Upload the image to the local docker daemon/container-runtime/engine (whatever the correct terminology is here) with an arch-specific tag.
2. Create a single multi-arch image via `docker buildx imagetools create` and the arch-specific tagged images previously uploaded to the local daemon.
3. Publish the single image to the relevant registries.
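The loop above can be sketched as code that assembles the per-arch command sequence. Every flag shown is an illustrative placeholder rather than a verified jam/pack invocation; consult the real CLI documentation before scripting this:

```go
package main

import "fmt"

// buildCommands assembles the per-arch command sequence described above.
// All tool flags here are illustrative placeholders only.
func buildCommands(name, version string, arches []string) []string {
	var cmds []string
	var archTags []string
	for _, arch := range arches {
		dir := fmt.Sprintf("out/%s", arch)
		tag := fmt.Sprintf("%s:%s-%s", name, version, arch)
		archTags = append(archTags, tag)
		cmds = append(cmds,
			// 1. cross-compile the buildpack executables for the target arch
			fmt.Sprintf("GOOS=linux GOARCH=%s go build -o %s/bin/ ./...", arch, dir),
			// 2. package the buildpack files into a tarball
			fmt.Sprintf("jam pack --output %s/buildpack.tgz", dir),
			// 3. turn the tarball into an arch-specific OCI image with an arch-specific tag
			fmt.Sprintf("pack buildpack package %s --path %s/buildpack.tgz", tag, dir),
		)
	}
	// 4. combine the arch-specific images into one manifest list and push it
	combine := fmt.Sprintf("docker buildx imagetools create -t %s:%s", name, version)
	for _, t := range archTags {
		combine += " " + t
	}
	return append(cmds, combine)
}

func main() {
	for _, c := range buildCommands("mybuildpack", "0.0.1", []string{"amd64", "arm64"}) {
		fmt.Println(c)
	}
}
```

Generating the command list per arch keeps the per-architecture steps identical, so adding a third architecture later would only mean extending the `arches` slice.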

**Contributor:**

My plan was to not actually publish the intermediate images either: have them live in the Docker daemon, then make the multi-arch index image and publish that. We push to the docker daemon by default, so that's easy. If you're not pushing to it already, you can combine multiple docker image archive files into an index image pretty easily (kind of similar to that PR I sent to jam recently), but that would need to be done in code. I'm not sure it would be any more efficient, so we'll be going the daemon route.

**Contributor (@dmikusa, Jun 29, 2023):**

More of an implementation detail (not sure it needs to go into the RFC), but skopeo won't work with multi-arch images in OCI image files, which get used in a number of places, so jam may need some tooling for that. Not sure if the commands I added for publishing the stacks will work with buildpacks as well? Maybe 🤔

On the pipeline-builder side, we use crane, which has similar shortcomings when it comes to multi-arch images in OCI image files. We may need a similar tool, although I'm not sure, because we don't use OCI image files as much. All in all, it could be a good chance for both pipelines to remove some 3rd-party tools and rally around jam as our solution.

**Contributor:**

More thoughts: pack is adding a `manifest` command for generating manifest images. This is part of their first step towards multi-arch support. If it is ready in time, it could be something we utilize instead of docker. Not sure what folks think about this?


These can then be pushed using skopeo with the above-mentioned architecture-specific tags. Finally, some `docker manifest` commands can be used to create the manifest list image and push it to the registry.
**Member (@robdimsdale, Apr 27, 2023):**

We probably can't use skopeo to publish multi-architecture images; see this PR to jam, which adds support for multi-arch stacks to jam directly. We would probably have to do something similar for multi-arch buildpacks, i.e. `jam publish-buildpack`.

**Contributor:**

+1


Users would have the same experience using the manifest list images on amd64, while allowing those who want/need arm64 images to benefit as well.


## Prior Art

As mentioned above, @dmikusa put together [this](https://github.com/dmikusa/paketo-arm64) excellent guide for building arm64 images with Paketo buildpacks. This repo, based on the aforementioned example, is publishing arm versions of Paketo builders and buildpacks. While it is great to see community involvement in pushing this forward, I think it is time for the Paketo maintainers to agree on a path forward so the community can start contributing towards that goal.


## Unresolved Questions and Bikeshedding
### Testing on arm64

As discussed [here](https://github.com/actions/runner-images/issues/5631), GitHub Actions currently only supports amd64 runners. This means that unit and integration tests cannot be run on arm64.

There are two ways to approach this issue. The first is to only run tests for amd64 and label all arm64 artifacts as experimental. The second is to use self-hosted or third-party arm64 runners so everything can be tested on both architectures. Given that Paketo buildpacks are used for production workloads, I think having arm64 runners will be necessary.
**Member:**

We can do both - start with experimental support for arm images for buildpacks that don't require dependencies compiled for arm64 (see my comment above) while we work on getting arm64 runners.

For buildpacks that don't have dependencies that need compiling, we should be able to cross-compile the buildpacks themselves and drop in arm64 dependencies from upstream, and distribute them as "experimental" before we are truly able to test them.

That seems like a helpful path forward - the community gets easy access to arm64 buildpacks and we get feedback on them, while we work on getting the necessary prerequisites (i.e. arm64 runners) to be confident in labeling the buildpacks production ready.

**Comment:**

+1 to start this endeavour with an experimental flag at first.

**Contributor:**

We should absolutely cross-compile the buildpacks. I have been doing this in the Arm64 scripts I have, and it works perfectly.

We do have access to arm runners through BuildJet but we should reserve these for things where it is absolutely necessary like compiling dependencies. They are not free like the standard GitHub runners.

**Contributor (@dmikusa, Apr 27, 2023):**

I also don't think we need to run all of the test suites on both architectures or use an experimental label. There's very little that is architecture specific about a buildpack. It is really just the code that installs dependencies. That code can be mocked out in tests so we can be confident functionality works correctly regardless of the architecture.

We could potentially look at a small suite of smoke tests for both architectures that we run before releasing composites, or possibly use some sort of test flags to only run a subset of tests on Arm64. I think we should start without doing this, though, and see how things go; there will be a period of experimental/beta support for Arm64 as we develop it. We can just watch to see if any platform-specific issues crop up and react accordingly.

**Member:**

Honestly, I'm mostly concerned about the integration / end-to-end testing of the arm64-compiled dependencies. We already run into issues just with the amd64-compiled dependencies, and we have fairly decent test suites for those. I would not be confident calling the arm64-compiled-dependency buildpacks production-ready without integration testing them.

**Contributor:**

I think we're on the same page, but just trying to clarify how I'm thinking about this more.

I think we should treat buildpacks and dependencies as two different things. At the layer of buildpacks, there's an interface between the buildpack and the dependency: it expects files in a certain layout, and it expects certain executables to exist. As long as the contract is the same per arch, which it generally should be, then testing buildpacks once should be enough. Even if it's not, you can very likely mock or stub those differences, and the arch on which those tests run is irrelevant.

I agree that for the cases where we build and publish dependencies that we should do more testing on those dependencies. If we compile from source we should run test suites that provide some basic level of fitness on those binaries. That's going to require arm runners.

If we download upstream binaries, like with Java or Go, I don't think we need to do that. IMO, that's out of our scope. The publisher of those binaries owns that responsibility.

**Member:**

Yes and no. For the buildpacks that don't use libpak (i.e. everything except java/utilities), in order to get the current level of integration test coverage and confidence we have for amd64, we have to get each and every buildpack working on arm64. The integration tests for each buildpack use the buildpack before it. For example, the integration tests for pip-install run a pack build with the pip-install buildpack but also with cpython and pip. Sure, we could just unit-test the buildpack on arm64, but I don't feel comfortable with that reduced level of support.

If you have a different level of comfort for the java buildpacks - which would make sense because you don't really have integration tests for the java buildpacks in the same way as the other Paketo buildpacks - then I'd understand that, but I don't think we can extrapolate that to become a general assertion for all Paketo buildpacks.

**Contributor:**

I'm OK with agreeing to disagree on this point; I think we can do that and still move forward with this effort. Each language family team needs to make the cost/effort versus benefit evaluation and act accordingly. I'm confident each team will make the right choices for their user bases, so perhaps we can put the following takeaways into the RFC:

1. The project has ARM64 runners for teams to use as needed.
2. Each language team will need to decide what "production-ready" means. Things for each team to consider: confidence in the code, automated releases for timely updates, and feature parity with x86.

Anything else come to mind as a good guideline?

**Member:**

That sounds perfectly reasonable. I don't think I have much else to add to the takeaways.