Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal to publish multi-arch buildpacks #288

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jpena-r7
Copy link

Readable

Checklist

  • I have viewed, signed, and submitted the Contributor License Agreement.
  • I have linked issue(s) that this PR should close using keywords or the Github UI (See docs)
  • I have added an integration test, if necessary.
  • I have reviewed the styleguide for guidance on my code quality.
  • I'm happy with the commit history on this PR (I have rebased/squashed as needed).

@jpena-r7 jpena-r7 requested a review from a team as a code owner April 26, 2023 23:33
@loewenstein
Copy link

I haven't really been following the existing arm64 work in the cloud native buildpacks and paketo communities. One thing I am wondering about in the context of this RFC is what the longer-term picture is? Is this RFC a stepping stone towards pack build --multi-arch (or whatever the flag for this would be) producing multi architecture application images with Paketo buildpacks - I assume that's what we want in the end.


My proposal is to update jam to create buildpacks for amd64 and arm64. I haven't used jam yet, so I don't have specific code changes for it and would appreciate thoughts on how this could be added.

These can then be pushed using skopeo with the above mentioned architecture-specific tags. Finally some `docker manifest` commands can be used to create the manifest list image and then push it to the registry.
Copy link
Member

@robdimsdale robdimsdale Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably can't use skopeo to publish multi-architecture images - see this PR to jam which adds support for multi-arch stacks to jam directly. We would probably have to do something similar for multi-arch buildpacks - i.e. jam publish-buildpack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


As discussed [here](https://github.com/actions/runner-images/issues/5631), github actions currently only supports amd64 runners. This means that unit and integration tests cannot be run on arm64.

There are two ways to approach this issue. The first is to only run tests for amd64 and label all arm64 artifacts as experimental. The second is to use self-hosted or third-party arm64 runners so everything can be tested on both architectures. Given that paketo bulidpacks are used for production workloads, I think having arm64 runners will be necessary.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do both - start with experimental support for arm images for buildpacks that don't require dependencies compiled for arm64 (see my comment above) while we work on getting arm64 runners.

For buildpacks that don't have dependencies that need compiling, we should be able to cross-compile the buildpacks themselves and drop-in arm64 dependencies from upstream, and distribute them as "experimental" before we are truly able to test them.

That seems like a helpful path forward - the community gets easy access to arm64 buildpacks and we get feedback on them, while we work on getting the necessary prerequisites (i.e. arm64 runners) to be confident in labeling the buildpacks production ready.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to start this endeavour with an experimental flag at first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should absolutely cross compile the Buildpacks. I have been doing this in the Arm64 scripts I have and it works perfectly.

We do have access to arm runners through BuildJet but we should reserve these for things where it is absolutely necessary like compiling dependencies. They are not free like the standard GitHub runners.

Copy link
Contributor

@dmikusa dmikusa Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't think we need to run all of the test suites on both architectures or use an experimental label. There's very little that is architecture specific about a buildpack. It is really just the code that installs dependencies. That code can be mocked out in tests so we can be confident functionality works correctly regardless of the architecture.

We could potentially look at a small suite of smoke tests for both architectures that we run before release of composites or possibly using some sort of test flags to only run a subset of tests on Arm64. I think we should start without doing this though and see how things go, there will be a period of experimental/beta support for Arm64 as we develop it. We can just watch to see if any platform specific issues crop up and react accordingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I'm mostly concerned about the integration / end-to-end testing of the arm64 compiled dependencies. We already run into issues just with the amd64 compiled dependencies and we have fairly decent test suites for those. I would not be confident calling the arm64-compiled-dependency buildpacks production-ready without integration testing them

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're on the same page, but just trying to clarify how I'm thinking about this more.

I think we should treat buildpacks and dependencies as two different things. At the layer of buildpacks, there's an interface between the buildpack and the dependency. It expects files in a certain layout and it expects certain executables to exist. As long as the contract is the same per arch, which it generally should be, then testing buildpacks once should be enough. Even if it's not, you can very like mock or stub those differences and the arch on which those tests runs is irrelevant.

I agree that for the cases where we build and publish dependencies that we should do more testing on those dependencies. If we compile from source we should run test suites that provide some basic level of fitness on those binaries. That's going to require arm runners.

If we download upstream binaries, like with Java or Go, I don't think we need to do that. IMO, that's out of our scope. The publisher of those binaries owns that responsibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and no. For the buildpacks that don't use libpak (i.e. everything except java/utilities), in order to get the current level of integration test coverage and confidence we have for amd64, we have to get each and every buildpack working on arm64. The integration tests for each buildpack use the buildpack before it. For example, the integration tests for pip-install run a pack build with the pip-install buildpack but also with cpython and pip. Sure, we could just unit-test the buildpack on arm64, but I don't feel comfortable with that reduced level of support.

If you have a different level of comfort for the java buildpacks - which would make sense because you don't really have integration tests for the java buildpacks in the same way as the other Paketo buildpacks - then I'd understand that, but I don't think we can extrapolate that to become a general assertion for all Paketo buildpacks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with agreeing to disagree on this point, I think we can do that and still move forward with this effort. Each language family team needs to make the cost/effort versus benefit evaluation and act accordingly. I'm confident each team will make the right choices for their user bases, so perhaps we can put the following takeaways into the RFC:

  1. The project has ARM64 runners for teams to use as needed.
  2. Each language team will need to decide what "production-ready" means. Things for each team to consider: confidence in the code, automated release for timely updates, and feature parity with x86.

Anything else come to mind as a good guideline?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds perfectly reasonable. I don't think I have much else to add to the takeaways.

@dmikusa
Copy link
Contributor

dmikusa commented Apr 27, 2023

One thing I am wondering about in the context of this RFC is what the longer-term picture is? Is this RFC a stepping stone towards pack build --multi-arch (or whatever the flag for this would be) producing multi architecture application images with Paketo buildpacks - I assume that's what we want in the end.

@loewenstein the current work upstream is adding a lower level feature like Docker manifest so you can compose two images into an index.

Down the road that will be more automatic. The normal pack commands will be enhanced to support building multi arch images.

In the meantime Paketo will have to fill in the gaps in our pipelines. Maybe with some additional tooling too. That's what we're doing with the stack builds now, enhancing jam to publish a multi arch image for the stack.

@dmikusa
Copy link
Contributor

dmikusa commented Apr 28, 2023

@jpena-r7 Thanks so much for writing this up. Excellent to pull all of this information together and help get everyone focused on this topic.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented May 2, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: jpena-r7 / name: Jerico Pena (cbc8ed5)
  • ✅ login: jericop / name: Jerico Pena (f71a028)

@robdimsdale
Copy link
Member

@jpena-r7 FYI looks like you made changes under the @jericop account which hasn't signed a CLA. I'm not sure if you want to recreate them under your Rapid 7 account or sign the CLA on the other account.


I haven't looked at github workflows for all buildpacks, but I believe [jam](https://github.com/paketo-buildpacks/jam) is being used to package buildpacks. I did see skopeo being used to push images ([cpython](https://github.com/paketo-buildpacks/cpython/blob/main/.github/workflows/push-buildpackage.yml#L62)).

My proposal is to update jam to create buildpacks for amd64 and arm64. I haven't used jam yet, so I don't have specific code changes for it and would appreciate thoughts on how this could be added.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I'm thinking more about this, it's not obvious to me what, if any, changes are required to jam pack to support building multi-arch buildpacks. jam pack doesn't compile the buildpack; it just packages it. It also doesn't create OCI images - pack buildpack package does this.

Expanding on this in more detail:

The current workflow to create a buildpack is:

  1. Compile the build and run binaries (e.g. go build).
  2. Run either jam pack (for packit buildpacks) or packager (for libbuildpack buildpacks). This primarily creates a tgz of the buildpack files. For packit buildpacks, this also does some extra steps like setting the buildpack version and optionally downloading dependencies. I'm not sure what other steps packager takes.
  3. pack buildpack package to turn that tgz into an OCI image.

Currently, jam pack creates a single tarball of a single buildpack with no concept of multi-architecture. You can optionally filter dependencies by --stack to only include the stack-specific dependencies in the resultant metadata (and downloaded dependencies), but given that stacks can be multi-architecture, this isn't really sufficient to filter dependencies by architecture.

It's not really obvious to me what it would mean for jam pack to support packaging up a buildpack in a multi-architecture way. It doesn't compile the binaries, and it doesn't produce OCI images. Filtering the dependencies by architecture would be an optimization to reduce metadata (and resultant buildpack size if downloading dependencies), but it's not a necessary step for creating the buildpack.

I suppose you could make modifications to jam pack to support packaging multiple versions of build and run for each target architecture and create a wrapper around them to invoke the correct one at run-time. But it seems just as easy to invoke jam pack multiple times - once per architecture - to create architecture-specific tarballs.

I don't know if pack buildpack package already supports multi-arch images. I took a quick look at the code, and it only seems to be aware of platform, which only contains OS (i.e. linux or windows). This leads me to think either we'd have to make changes to pack, or we'd have to create separate docker images for each architecture, and bundle them together. Perhaps that's what docker buildx imagetools create does - I'm not sure. Perhaps @dmikusa or @dashaun have more info on how to build multi-arch buildpacks in a single manifest?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct that pack does not support multi-arch yet, and the new changes being introduced will allow users to create manifest images after they have already been built.

This is an example of creating multi-arch builders. It creates and publishes arch-specific tags, but it is meant for demonstration purposes and testing so it gives an idea.

Creating multi-arch buildpacks should be simpler and may not even require a local registry because no other images are pulled in. That should allow you to create the arch-specific buildpack images locally and then create a manifest from there and push the manifest.

Let's say you compile a buildpack for amd64 in one folder amd64 and arm64 in another folder arm64. You then create local buildpack images for each arch such as mybuildpack:0.0.1-amd64 and mybuildpack:0.0.1-arm64. Then you can just create and push a manifest image as shown below and it would work on both architectures.

docker buildx imagetools create -t mybuildpack:0.0.1 mybuildpack:0.0.1-amd64 mybuildpack:0.0.1-amd64

There is one issue that needs to be considered though. At this time pack seems to set the architecture to amd64. I had to work around this issue when creating the multi-arch builder here, but I just wanted to call this out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is exactly what I was thinking. I don't think we need to make any changes to jam pack to achieve this.

It would look something like the following:

  1. for arch in [amd64, arm64]

    1. compile buildpack executables for target architecture via go build; put them in arch-specific temp directory
    2. package buildpack with jam pack into arch-specific temp directory
    3. create arch-specific OCI image via pack buildpack package, passing in contents of arch-specific temp directory
    4. upload image to local docker daemon/container-runtime/engine (whatever the correct terminology is here) with arch-specific tag
  2. create single multi-arch image via docker builds imagetools create and the arch-specific tagged images previously uploaded to local daemon

  3. publish single image to relevant registries

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My plan was to not actually publish the intermediate images either. Have them live in the Docker daemon, then make the multi-arch index image and publish that. We push to the docker daemon by default, so that's easy. If you're not pushing to it already you can combine multiple docker image archive files into an index image pretty easily (kind of similar to that PR I send to jam recently, but that would need to be done in code. I'm not sure it would be any more efficient, so we'll be going the daemon route.

Copy link
Contributor

@dmikusa dmikusa Jun 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More of an implementation detail, not sure it needs to go into the RFC, but skopeo won't work with multi-arch images in OCI image files which get used in a number of places so jam may need some tooling for that. Not sure if the commands I added for publishing the stacks will work with buildpacks as well? Maybe 🤔

On the pipeline-builder side, we use crane, and has similar shortcomings when it comes to multi-arch images in OCI image files. We may need a similar tool, although I'm not sure because we don't use OCI image files as much. All in all, it could be a good chance for both pipelines to remove some 3rd party tools and rally around jam as our solution though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More thoughts. Pack is adding a manifest command for generating manifest images. This is part of their first step towards multi-arch support. If it is ready in time, it could be something we utilize instead of docker. Not sure what folks think about this?

Right now binaries are created on/for amd64 and then packaged up into buildpacks using pack. It should be easy to create binaries for arm64 which can then be packaged up into buildpacks using pack. These architecture-specific buildpacks can be pushed to a registry with a platform specific tag as shown in the examples below.

* `paketobuildpacks/ca-certificates:3.6-amd64`
* `paketobuildpacks/ca-certificates:3.6-arm64`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I'm reading about docker buildx, a lot of the examples seem to push a single image with multiple tags from a single Dockerfile. It looks like docker buildx imagetools create does the same thing but with multiple source images that already exist on a registry.

Is it possible to avoid the intermediate step of publishing buildpacks to the :<version>-<arch> tag? It seems unnecessary once we've created a single manifest/image for all platforms. Could we create the images on a local registry and just push the single multi-arch image?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that pushing the arch-specific tags is messy and suspect it could be avoided with a local registry. Since creating buildpacks doesn't requiring pulling other images, the entire workflow should be able to run on amd64 (github runner).

I'll do some testing to validate this when I get a chance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Circling back on this, what's the outcome? Can we go for publishing the single multi-arch image?

@jericop
Copy link

jericop commented May 2, 2023

@jpena-r7 FYI looks like you made changes under the @jericop account which hasn't signed a CLA. I'm not sure if you want to recreate them under your Rapid 7 account or sign the CLA on the other account.

I plan to start contributing from @jericop going forward so I signed the CLA. Sorry for the confusion.

Comment on lines +63 to +64
[This](https://github.com/buildpacks/rfcs/blob/main/text/0096-remove-stacks-mixins.md) rfc captures a lot of the concepts implemented in the spec changes. The following [PR](https://github.com/buildpacks/spec/pull/336/files#) to the buildpack spec introduces two new environment variables that will be set by the lifecycle during the detect and build phases, `CNB_TARGET_OS` and `CNB_TARGET_ARCH`. These changes are available in version `0.12` of the platform api as outlined [here](https://github.com/buildpacks/spec/pull/335). These changes were partially implemented in [this](https://github.com/buildpacks/lifecycle/issues/743) issue, but they have not yet made into pack ([here](https://github.com/buildpacks/pack/blob/v0.30.0-pre1/builder/config_reader.go)).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should acknowledge stack removal but mark it as out of scope for this RFC, and just target the existing stack-based implementation. There's a whole ton of work that needs to happen for stack removal and some of it is just going to take time as we transition from one to the other.

I think if we focus on the stack-based implementation we have now, a.) we can simplify this RFC, b.) we can start work immediately on it, and c.) we can still implement because we just assume that a stack will have the require arch or the build wouldn't even run (I believe pack build would fail cause it couldn't download the right stack images, or it would fallback to amd64 images if on a Mac which permits both).

Comment on lines +67 to +69
Once all upstream changes to pack have been fully implemented and the newest version of pack and the lifecycle are being used the values in `CNB_TARGET_OS` and `CNB_TARGET_ARCH` will be set by paketo in the builder config toml. This means the changes I'm proposing to `libpak` and `packit` below can just read those environment variables and match any dependencies to the appropriate os and architecture.

Since there are a lot of moving pieces to this project, I suggest the change from using stacks to targets (run and build images) be done separately. This would mean that the `CNB_TARGET_OS` and `CNB_TARGET_ARCH` environment variables would need to be set in the function/method that reads the buildpack. At build time buildpacks should attempt to read those environment variables if set, but otherwise fall back to determining os and architecture from another source such as `GOOS` and `GOARCH` environment variables. This should ultimately match whatever is used to set those values in the builder too.
Copy link
Contributor

@dmikusa dmikusa Jun 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I think we should just skip the CNB_TARGET_* stuff and use GOARCH to match the right architecture for dependencies where it is required. This should be reliable as the stack that gets picked will have the right architecture. Then the build will use GOARCH and install deps for that arch.

I also think getting into things like GOOS in this PR is probably not necessary either and it can be more complicated than just the OS. There are a handful of other CNB_TARGET_* variables that can be used for matching compatibility with a binary. All of that may need to be taken into consideration, depending on the binaries being installed. For now, I think we are best to bury our heads in the sand and just stick with the stack approach. When we do the work for removing stacks, then we can discuss how to add in the metadata required for matching, which I think is going to have to include some sort of wildcard/pattern.

One thing that might be worth discussing/adding in regards to GOARCH is what to do for a non-arch-specific dependency. I do not believe GOARCH has a value defined for that, so we might need to define one (or the lack of one). Not sure if folks have a preference, we could use * or say that the lack of an architecture value on the dependency means it is not dependent on the architecture? or something else? Thoughts on that?


The upstream pack project is already creating multi-arch images for pack and the lifecycle using `docker buildx`, which creates manifest list images. An alternative approach is to create and publish separate images that are tagged by architecture. This would be the steps I mentioned above without the final manifest list image. I think this is an anti-pattern at this point given that multi-arch images are already a norm. I also suspect end users would end up creating their own manifest list images anyway. I think the biggest driver for this is to avoid end users cloning and rebuilding paketo buildpacks.

## Implementation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this RFC is missing information about stacks and builders. Both are really necessary to provide first-class support for Paketo users. I'd be OK calling it out-of-scope, if we don't want this RFC to get bigger, but I still think we should mention it because it's part of the overall picture to say that we support arm64.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point and I completely agree. I picked up where you left off on multi-arch stacks and will then look at builders. Once we have those in place I think we can move forward with this RFC.

@ForestEckhardt ForestEckhardt marked this pull request as draft August 3, 2023 13:43
Comment on lines +75 to +77
`Os` and `Architecture` string fields can be added to the [`Dependency`](https://github.com/paketo-buildpacks/packit/blob/v2/postal/buildpack.go#L15) struct in `packit/postal` to allow buildpacks to interrogate and use the new metadata when determining what external dependencies to download. This will allow `buildpack.toml` files to be updated to add in the `os` and `architecture` metadata for all dependencies. They would all be set to something like `os=linux` and `architecture=amd64` initially.

Some logic can be added to the [Resolve](https://github.com/paketo-buildpacks/packit/blob/18202009038b0df285ba0fb7d8b43abbf60d3ed0/postal/service.go#L89) method of `packit/postal` to pick a version that matches the `GOOS` and `GOARCH` environment variables whenever the `os` and `architecture` fields have been specified.
Copy link
Member

@sophiewigmore sophiewigmore Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the working group today (3/5), will this section be removed/modified? Also the Changes to buildpack.toml files section? Should we allow this to be specified in #297 maybe, and just allude that some type of buildpack.toml fields will be added and supported in packit/libpak as a result?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the packit-specific code recommendations can be captured in an issue also

@loewenstein-sap
Copy link

With work being in progress already to fulfill this not yet accepted RFC... Should we make sure this actually get's merged as a basis of that ongoing work? WDYT @paketo-buildpacks/steering-committee?

@dmikusa
Copy link
Contributor

dmikusa commented Dec 11, 2024

With work being in progress already to fulfill this not yet accepted RFC... Should we make sure this actually get's merged as a basis of that ongoing work? WDYT @paketo-buildpacks/steering-committee?

I'm unsure of what we actually need to put into an RFC.

I could see us putting together an RFC that signals the intent to publish multi-arch images across Paketo. It would really just be a signal to users that we intend to do this across the whole project. IMO, it's probably better to just keep writing blog posts about our progress. Those are just as easy to site when folks ask questions about multi-arch support.

I don't think there's any technical coordination that needs to happen to publish multi-arch images though. Upstream has made tooling available to publish multi-arch images. We've done it for a number of buildpacks and it works fine. I don't see any reason why folks would not use that functionality in their implementations, but I suppose we could include that in an RFC.

Maybe there are some details with the stacks/builders that would be helpful to work out in an RFC. I would defer to that team though. From what I've observed, they're already working on it and I don't want to add a process if it's not needed. If everyone is in agreement on how we proceed, then we probably don't need an RFC and we just focus on the work.

That said, happy to help push an RFC forward if folks think it's helpful. Just let me know.

@loewenstein-sap
Copy link

I'm unsure of what we actually need to put into an RFC.

Sure. All I am saying is that this is a draft PR for an RFC - so either we get it merged in one shape or another or we close it with a comment that it doesn't need an RFC.

Either way, I would suggest we at least create a tracking issue for all the work involved, similarly to the one I created for adopting Ubuntu Noble. That way we can ping all maintainers and track overall progress in a central place.

@dmikusa
Copy link
Contributor

dmikusa commented Dec 11, 2024

@loewenstein +1 to both those points. I'd prefer we close this RFC-draft. Then use the issue to track the works that's been done and what's still to be completed.

@dmikusa
Copy link
Contributor

dmikusa commented Dec 11, 2024

If anyone objects to closing this RFC-draft. Please let us know, or we'll close this in one week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants