Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(workspace): don't delete directory on every reconciliation #276

Merged
merged 1 commit into from
Jun 28, 2024

Conversation

arththebird
Copy link
Contributor

@arththebird arththebird commented Jun 28, 2024

Description of your changes

Stop removing the Workspace directory on every reconciliation. This is no longer needed as the bug in go-getter was fixed in v1.7.5 and go-getter was upgraded to latest by Renovate in this PR.

Not removing the workspace directory also results in terraform init not being run every time when there's no change because the checksum is the same. This effectively allow using max-reconcile-rate > 0 while using the plugin cache (issues can still happen if a workspace is applying while another one is being init, but it happens much less frequently than before when there's no changes).

Fixes #275
Fixes #230

I have:

  • Run make reviewable to ensure this PR is ready for review.

How has this code been tested

  1. Deploy the provider locally using make local-deploy
  2. Deploy a workspace using a remote repository as the module source
  3. Check the creation time of the .terraform directory in the workspace directory and notice that it doesn't get deleted on every reconciliation
  4. Check the provider logs and notice that the workspace is no longer being init on every reconciliation
2024-06-28T18:01:04.009Z	DEBUG	provider-terraform	Reconciling	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:01:08.702Z	DEBUG	provider-terraform	Checksums match - skip running terraform init	{"request": "standard-metadata"}
2024-06-28T18:01:08.908Z	DEBUG	provider-terraform	External resource is up to date	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:03:00.169Z"}
2024-06-28T18:03:00.180Z	DEBUG	provider-terraform	Reconciling	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:03:04.900Z	DEBUG	provider-terraform	Checksums match - skip running terraform init	{"request": "standard-metadata"}
2024-06-28T18:03:05.157Z	DEBUG	provider-terraform	External resource is up to date	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:03:46.505Z"}
2024-06-28T18:03:46.519Z	DEBUG	provider-terraform	Reconciling	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:03:51.150Z	DEBUG	provider-terraform	Checksums match - skip running terraform init	{"request": "standard-metadata"}
2024-06-28T18:03:51.337Z	DEBUG	provider-terraform	External resource is up to date	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:03:53.990Z"}
2024-06-28T18:03:53.996Z	DEBUG	provider-terraform	Reconciling	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:03:59.063Z	DEBUG	provider-terraform	Checksums match - skip running terraform init	{"request": "standard-metadata"}
2024-06-28T18:03:59.263Z	DEBUG	provider-terraform	External resource is up to date	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:04:59.361Z"}
2024-06-28T18:04:59.376Z	DEBUG	provider-terraform	Reconciling	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:05:04.305Z	DEBUG	provider-terraform	Checksums match - skip running terraform init	{"request": "standard-metadata"}
2024-06-28T18:05:04.568Z	DEBUG	provider-terraform	External resource is up to date	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:07:00.865Z"}
2024-06-28T18:07:00.786Z	DEBUG	provider-terraform	Reconciling	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:07:05.777Z	DEBUG	provider-terraform	Checksums match - skip running terraform init	{"request": "standard-metadata"}
2024-06-28T18:07:05.993Z	DEBUG	provider-terraform	External resource is up to date	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:08:51.454Z"}
2024-06-28T18:08:51.464Z	DEBUG	provider-terraform	Reconciling	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:08:56.196Z	DEBUG	provider-terraform	Checksums match - skip running terraform init	{"request": "standard-metadata"}
2024-06-28T18:08:56.409Z	DEBUG	provider-terraform	External resource is up to date	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:10:35.639Z"}
2024-06-28T18:10:35.647Z	DEBUG	provider-terraform	Reconciling	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:10:40.364Z	DEBUG	provider-terraform	Checksums match - skip running terraform init	{"request": "standard-metadata"}
2024-06-28T18:10:40.554Z	DEBUG	provider-terraform	External resource is up to date	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:11:09.485Z"}
2024-06-28T18:11:09.491Z	DEBUG	provider-terraform	Reconciling	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:11:14.273Z	DEBUG	provider-terraform	Checksums match - skip running terraform init	{"request": "standard-metadata"}
2024-06-28T18:11:14.547Z	DEBUG	provider-terraform	External resource is up to date	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:12:55.419Z"}
2024-06-28T18:12:55.427Z	DEBUG	provider-terraform	Reconciling	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:13:00.382Z	DEBUG	provider-terraform	Checksums match - skip running terraform init	{"request": "standard-metadata"}
2024-06-28T18:13:00.609Z	DEBUG	provider-terraform	External resource is up to date	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:14:51.656Z"}
2024-06-28T18:14:51.661Z	DEBUG	provider-terraform	Reconciling	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:14:57.491Z	DEBUG	provider-terraform	Checksums match - skip running terraform init	{"request": "standard-metadata"}
2024-06-28T18:14:57.656Z	DEBUG	provider-terraform	External resource is up to date	{"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:16:48.223Z"}

@Upbound-CLA
Copy link

Upbound-CLA commented Jun 28, 2024

CLA assistant check
All committers have signed the CLA.

@arththebird arththebird changed the title feat(workspace): don't delete workspace on every reconciliation feat(workspace): don't delete directory on every reconciliation Jun 28, 2024
@arththebird arththebird changed the title feat(workspace): don't delete directory on every reconciliation fix(workspace): don't delete directory on every reconciliation Jun 28, 2024
Copy link
Collaborator

@bobh66 bobh66 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

@bobh66
Copy link
Collaborator

bobh66 commented Jun 28, 2024

I'm wondering why the CI pipeline didn't run on this PR? I'm reasonably sure the fix is fine but I'd feel a lot better if the CI jobs had run.

@bobh66 bobh66 merged commit ae93265 into upbound:main Jun 28, 2024
1 check passed
@ytsarev
Copy link
Member

ytsarev commented Jun 28, 2024

Hi @bobh66, we have an issue with the multiple runner label selection from yesterday in reusable workflows. It should be fixed after the merge of upbound/official-providers-ci#209

@arththebird thanks a ton for this contribution, just amazing! 🥇

@arththebird arththebird deleted the feat/keep-workspace-dir branch July 2, 2024 15:11
@arththebird
Copy link
Contributor Author

@arththebird thanks a ton for this contribution, just amazing! 🥇

My pleasure, thank you for the fast review and merge 🙌

Not sure how the release cycle for this provider is setup, I'm wondering when can we expect this to be released officially?

@bobh66 bobh66 mentioned this pull request Jul 2, 2024
2 tasks
@ytsarev
Copy link
Member

ytsarev commented Jul 8, 2024

@arththebird I had an internal chat with @turkenf, we target the release for this Thursday 👍

@project-administrator
Copy link

Installed the latest release.
provider-terraform is still deleting the workspace and re-initing it on every reconciliation. I don't have any directories under the "/tf" inside the provider pod with the modification date older than 1 hour.

@bobh66
Copy link
Collaborator

bobh66 commented Jul 11, 2024

@project-administrator do the provider logs indicate that terraform init is being run every reconciliation? Is it possible that go-getter is updating the timestamp of the directory on every reconciliation loop?

@project-administrator
Copy link

project-administrator commented Jul 11, 2024

@bobh66 checked that and verified that checksums never match:

2024-07-11T18:11:14.815Z	DEBUG	provider-terraform	Checksums don't match so run terraform init:	{"request": "demo-dev", "old": "ccbd114f2f61a213c57f3fc023222b3d", "new": "4812c08f3f95a371c9c2e5a6cdbaf9e1"}

So now it's the checksum calculation taking smth into account that is not there.
I wonder if it might be either the crossplane-provider-config.tf which is generated by the Crossplane's providerconfigs.tf.upbound.io resource, or few files which are downloaded by the terraform run?..

I can see that GenerateChecksum uses the following command:

/usr/bin/find . -path ./.git -prune -o -path ./.terraform -prune -o -type f -exec /usr/bin/md5sum {} +

With our config it always picks up more files than needed:

19e9bd65283e39a75bd15d532eeb4a3a  ./.git-credentials
c115cb3d433d8fc4304272d851cc4694  ./crossplane-provider-config.tf
d1b0e16cfb7f8fa5004dc6494c0bbc73  ./.terraform/terraform.tfstate
60715e2ecfcf0d2885edd52f0a506cf5  ./.terraform/environment
9a11bb6c5701d7300a3c6ed8c33dc6dd  ./.terraform.lock.hcl
0507597169bd025f95803b0d1713f943  ./rds-combined-ca-bundle.pem
84a4bf3e807ebcd712ed17a7a9fd576b  ./rds-combined-ca-bundle.crt

None of these are present in the remote git repo...

@arththebird
Copy link
Contributor Author

arththebird commented Jul 11, 2024

I don't have any directories under the "/tf" inside the provider pod with the modification date older than 1 hour.

I see the same thing, but I think @bobh66 theory about go-getter updating the timestamp makes sense because if I check the .terraform directory timestamp (which is not in the remote git repo), it doesn't get updated. My understanding is that deleting this folder is what was causing the checksum diff on every loop in the previous version.

@project-administrator
Copy link

project-administrator commented Jul 11, 2024

These files should be ignored when checksum is calculated.
But they are being taken into consideration and this breaks the checksum calculation:

19e9bd65283e39a75bd15d532eeb4a3a  ./.git-credentials
c115cb3d433d8fc4304272d851cc4694  ./crossplane-provider-config.tf
d1b0e16cfb7f8fa5004dc6494c0bbc73  ./.terraform/terraform.tfstate
60715e2ecfcf0d2885edd52f0a506cf5  ./.terraform/environment
9a11bb6c5701d7300a3c6ed8c33dc6dd  ./.terraform.lock.hcl

@project-administrator
Copy link

Also, does it make sense to calculate the checksum of the URL address only for the remote repository module source? Why do we need to fetch files each time to calculate the checksum?
I mean, if the URL has not changed, then it must be the same TF workspace source, therefore there is no need to run terraform init if it has been run once before.

@bobh66
Copy link
Collaborator

bobh66 commented Jul 11, 2024

Also, does it make sense to calculate the checksum of the URL address only for the remote repository module source? Why do we need to fetch files each time to calculate the checksum? I mean, if the URL has not changed, then it must be the same TF workspace source, therefore there is no need to run terraform init if it has been run once before.

We need to pull the content of the repo (using git fetch/pull) to determine if the content of the repo has changed. We can't assume that the repo is static.

@project-administrator
Copy link

project-administrator commented Jul 11, 2024

We need to pull the content of the repo (using git fetch/pull) to determine if the content of the repo has changed. We can't assume that the repo is static.

Do you think it's a good idea to introduce an optional parameter to control how the checksum is calculated? (repo URL vs repo contents).
For example, we're always using the remote repo git URL with a tag. So there can't be any change unless the tag is updated, and that would be a different URL (as the tag is a part of the URL)

@bobh66
Copy link
Collaborator

bobh66 commented Jul 13, 2024

@project-administrator it's an interesting point - with the existing implementation if the remote URL changes it will now git pull the new repo into the existing repo directory, having removed only the .git directory. We would need to store the current remote URL somewhere (it is in .git/config) and compare it with the current spec.forProvider.module. If they are different then the entire directory should be removed so a clean git pull can occur.

Since we don't require the use of a ref it's entirely possible for people to use a repo that has changes being pushed into it. We have no way of knowing that happens so we need to run the equivalent of git fetch and git pull on every reconciliation.

I wonder if a source of RemoteOnce or something similar would be appropriate - it says to pull the remote content one time and never pull it again (assuming that it exists in the container). The existing Remote value is the equivalent of RemoteAlways - it's basically a pullPolicy for the remote repo.

Though that still doesn't help with the remote URL changing - that we would still need to check for explicitly.

@ytsarev - any thoughts?

@ytsarev
Copy link
Member

ytsarev commented Jul 14, 2024

@bobh66 I like the idea. We can create something like remoteSourcePullPolicy symmetric to the well-known https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy

@bobh66
Copy link
Collaborator

bobh66 commented Jul 14, 2024

Thanks @ytsarev - IfNotPresent and Always seem to make the most sense. I'm not sure that Never applies to our use case. Are there any other possible scenarios? I'm wondering what happens if the module changes when source is Remote but the pull policy is IfNotPresent - I assume we automatically assume the new module content is not present and pull it?

@bobh66
Copy link
Collaborator

bobh66 commented Jul 15, 2024

Also should we save the current Remote module value in status just to make it easier to compare with the current spec.module so we know if it changed without needing to read .git/config? It seems like that might be a more robust solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants