Use shared caches in the deploymentconfig controller #9002
Conversation
I've kept the runnable controller bits around to make the diff easier to review - will remove once this is good to merge.
What's the impact on memory usage?
On May 24, 2016, at 10:17 AM, Michail Kargakis wrote: @deads2k @smarterclayton I still need the ratelimiting queue and shared caches to land with the rebase.
For a shared cache? Should be near zero additional.
Except no one else would be sharing these caches yet. So I'm making sure ...
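(For context, this is the idea behind the "near zero additional" answer: with a shared informer the objects are cached once and every consumer reads the same store. A minimal sketch using today's upstream client-go; the 2016 origin code wires its own OpenShift caches, and the ReplicationController resource here is purely illustrative.)

package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// One shared factory means one watch and one in-memory cache per resource,
	// no matter how many controllers read from it: a second consumer adds only
	// its own event handlers, not another copy of the objects.
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	rcInformer := factory.Core().V1().ReplicationControllers()

	// Two "controllers" backed by the same underlying store.
	listerA := rcInformer.Lister()
	listerB := rcInformer.Lister()

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	a, _ := listerA.List(labels.Everything())
	b, _ := listerB.List(labels.Everything())
	fmt.Printf("both listers read the same cache: %d == %d objects\n", len(a), len(b))
}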
Maybe not applicable here yet, but just to get the idea down: we could build an index from deploymentConfigs using their automatic image triggers so that the image change controller could do a lookup of triggered image stream tag name -> triggered deployment configs. That controller currently does a full DC list/scan for every image stream update.
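(A rough sketch of that kind of index using client-go's cache.Indexer; the deployment config type and its trigger field below are simplified stand-ins, not the real API.)

package main

import (
	"fmt"

	"k8s.io/client-go/tools/cache"
)

// deploymentConfig is a simplified stand-in for the real DeploymentConfig;
// TriggeredTags holds "namespace/stream:tag" values from automatic image
// change triggers (field names are illustrative).
type deploymentConfig struct {
	Namespace, Name string
	TriggeredTags   []string
}

const byTriggeredTag = "byTriggeredTag"

// triggeredTagIndexFunc returns the image stream tags a config is triggered by.
func triggeredTagIndexFunc(obj interface{}) ([]string, error) {
	dc := obj.(*deploymentConfig)
	return dc.TriggeredTags, nil
}

func main() {
	// Primary key is namespace/name; the secondary index lets the image change
	// controller look up triggered configs directly instead of listing every DC.
	indexer := cache.NewIndexer(
		func(obj interface{}) (string, error) {
			dc := obj.(*deploymentConfig)
			return dc.Namespace + "/" + dc.Name, nil
		},
		cache.Indexers{byTriggeredTag: triggeredTagIndexFunc},
	)

	indexer.Add(&deploymentConfig{
		Namespace:     "demo",
		Name:          "frontend",
		TriggeredTags: []string{"demo/ruby:latest"},
	})

	// Which deployment configs are triggered by demo/ruby:latest?
	hits, _ := indexer.ByIndex(byTriggeredTag, "demo/ruby:latest")
	fmt.Println(len(hits))
}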
The image change trigger controller is going to be made generic, but we ...
That controller should also requeue image streams when a deploymentconfig with an ICT is created. Currently, if a deploymentconfig with an ICT is created, its image will be resolved in the next relist of the image stream (if no new image is pushed in the meantime).
[test]
	// osClient provides access to OpenShift resources.
	osClient osclient.Interface
	// oc provides access to OpenShift resources.
	oc osclient.Interface
Can you tighten these down to just the namespacers you need?
Sure.
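(The "namespacers" request is about holding only the narrow client interfaces the controller actually calls instead of the whole osclient.Interface. A generic sketch of the pattern; the interface and method names below are hypothetical, not origin's real client types.)

package example

// deploymentConfig is a hypothetical stand-in for the resource being managed.
type deploymentConfig struct {
	Namespace, Name string
}

// deploymentConfigUpdater names the single capability this controller needs
// from the client: updating a deployment config in a given namespace.
// (Name and signature are illustrative only.)
type deploymentConfigUpdater interface {
	UpdateDeploymentConfig(namespace string, dc *deploymentConfig) (*deploymentConfig, error)
}

// DeploymentConfigController depends on the narrow interface rather than the
// full client, which documents exactly what it touches and makes it easy to
// substitute a fake in tests.
type DeploymentConfigController struct {
	dn deploymentConfigUpdater
}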
[test]
This must be real.
Now an integration flake, [test]
Integration and conformance flaked on yum again; I cannot see why the origin check failed.
[test]
@smarterclayton @mfojtik @ironcladlou any other comments from you?
[test]
@stevekuznetsov do you have any idea what's going on https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_check/2222/consoleFull ?
		c.queue.AddRateLimited(key)
		return
	}
	utilruntime.HandleError(err)
So by default, errors that are neither fatal nor transient are treated as fatal? You sure you want to default in that direction? I'm ok with it, but sometimes you'll be getting API errors (update errors, for instance) that are likely to be worth a retry.
They are not treated as fatal because we don't forget them. HandleError currently only logs, but I guess I should explicitly log here. Update conflicts should be transientErrors, but it seems that's not the case currently.
They are not treated as fatal because we don't forget them. HandleError currently only logs, but I guess I should explicitly log here. Update conflicts should be transientErrors, but it seems that's not the case currently.
But you don't re-queue either, so you won't retry until the object changes again, which may not happen for a very long time.
But you don't re-queue either, so you won't retry until the object changes again, which may not happen for a very long time.
How does the periodic relist work here? Won't we requeue everything that's in etcd every two minutes?
How does the periodic relist work here? Won't we requeue everything that's in etcd every two minutes?
We want to push that time out longer. A lot longer, because the cost of doing that in a large cluster is extremely high.
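(For reference, the relist interval being discussed is the informer resync period: on every resync the informer replays each cached object through the event handlers, which typically re-queues every key. A sketch using today's client-go packages; the two-minute value only mirrors the number above, and the 2016 origin wiring differs.)

package example

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// startQueueFedByInformer wires an informer-backed cache to a rate-limited
// workqueue and returns the queue once the cache has synced.
func startQueueFedByInformer(client kubernetes.Interface, stop <-chan struct{}) workqueue.RateLimitingInterface {
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

	// The resync period controls how often the informer replays every cached
	// object through the handlers below; each replay re-queues every key, so a
	// short period means reprocessing the whole cache on that interval. In a
	// large cluster you would push this out much further.
	factory := informers.NewSharedInformerFactory(client, 2*time.Minute)

	enqueue := func(obj interface{}) {
		if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
			queue.Add(key)
		}
	}
	factory.Core().V1().ReplicationControllers().Informer().AddEventHandler(
		cache.ResourceEventHandlerFuncs{
			AddFunc: enqueue,
			// Resyncs are delivered as updates with old == new.
			UpdateFunc: func(old, new interface{}) { enqueue(new) },
		},
	)

	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	return queue
}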
Then what's the difference between current transient errors and other non-fatal errors? Should we redefine error handling of the controller in this pull?
Then what's the difference between current transient errors and other non-fatal errors? Should we redefine error handling of the controller in this pull?
If we need to do that to prevent bad requeuing behavior, yes.
One more question/comment, then lgtm.
Failure is #9457
Oh, almost. The workqueue keeps track of the number of failures. If you exceed some number of retries, forget the entry and don't requeue.
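(That is the standard bounded-retry loop around the rate-limited workqueue. A minimal sketch under the assumption of a workqueue.RateLimitingInterface field named queue; the maxRetries of 10 matches the change adopted later in this thread, and the package paths are today's client-go, not the 2016 vendoring.)

package example

import (
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	"k8s.io/client-go/util/workqueue"
)

// maxRetries mirrors the "retried up to 10 times" behavior adopted below.
const maxRetries = 10

type DeploymentConfigController struct {
	queue workqueue.RateLimitingInterface
}

// handleErr is a sketch of the bounded-retry pattern described above.
func (c *DeploymentConfigController) handleErr(err error, key interface{}) {
	if err == nil {
		// Success: clear the key's failure history so future errors start
		// from a fresh backoff.
		c.queue.Forget(key)
		return
	}

	if c.queue.NumRequeues(key) < maxRetries {
		// Still under the retry budget: requeue with increasing backoff.
		c.queue.AddRateLimited(key)
		return
	}

	// Retry budget exhausted: report the error and drop the key; it will not
	// be processed again until the object changes (or the next relist).
	utilruntime.HandleError(err)
	c.queue.Forget(key)
}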
So far, we retry transient errors forever. Feels like a separate issue. WDYT?
Opened #9468. If there are no other comments, I would love to have this merged by today because it's blocking a handful of other pulls.
	}
	glog.V(4).Infof("Updated the status for %q (observed generation: %d)", deployutil.LabelForDeploymentConfig(config), config.Status.ObservedGeneration)
	return nil
}

func (c *DeploymentConfigController) handleErr(err error, key interface{}) {
	if err == nil {
You can have non-nil errors that aren't fatal or transient; you should decide how they're treated. I'd probably default to transient.
OK, went ahead and removed the transient check - now everything non-fatal will be retried up to 10 times. We can do more in #9468.
lgtm [merge]
Evaluated for origin test up to cdbabcb
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/5186/)
yum flake (#8571), re[merge]
Flaked on #9480 [merge]
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/5230/) (Image: devenv-rhel7_4433)
[merge]
Evaluated for origin merge up to cdbabcb
🎉
@deads2k @smarterclayton @mfojtik @ironcladlou
Fixes #7272
Fixes #9053
Required for #6233 and #8691