
Use shared caches in the deploymentconfig controller #9002

Merged
merged 3 commits into openshift:master from 0xmichalis:refactor-dc-controller-to-use-caches on Jun 22, 2016

Conversation

@0xmichalis
Contributor

0xmichalis commented May 24, 2016

@0xmichalis
Contributor Author

I've kept the runnable controller bits around to make the diff easier to review - will remove once this is good to merge.

@smarterclayton
Contributor

What's the impact on memory usage?

On May 24, 2016, at 10:17 AM, Michail Kargakis wrote:

@deads2k @smarterclayton @mfojtik @ironcladlou

I still need the ratelimiting queue and shared caches to land with the rebase


@deads2k
Contributor

deads2k commented May 24, 2016

What's the impact on memory usage?

For a shared cache? should be near zero additional.

@smarterclayton
Contributor

Except no one else would be sharing these caches yet. So I'm making sure
we have the other work lined up.

On May 24, 2016, at 10:29 AM, David Eads wrote:

What's the impact on memory usage?

For a shared cache? should be near zero additional.


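For later readers, here is a minimal sketch of what "shared caches" buys, written against present-day client-go rather than the caching framework Origin used in 2016; the factory, the ReplicationController informer, and the 30-minute resync (relist) period are illustrative, not this PR's code. The point is that one factory means one watch connection and one in-memory store per resource type, and each additional controller only registers an event handler against it, which is why the marginal memory cost is near zero.

package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// One factory: one watch and one store per resource type, shared by every
	// consumer. The resync (relist) period here is purely illustrative.
	factory := informers.NewSharedInformerFactory(client, 30*time.Minute)
	rcInformer := factory.Core().V1().ReplicationControllers().Informer()

	// Two consumers of the same cache: each controller only adds an event
	// handler, so another consumer costs almost no additional memory.
	rcInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) { fmt.Println("controller A saw an add") },
	})
	rcInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) { fmt.Println("controller B saw an add") },
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	cache.WaitForCacheSync(stop, rcInformer.HasSynced)
}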

@ironcladlou
Contributor

Maybe not applicable here yet, but just to get the idea down: we could build an index from deploymentConfigs using their automatic image triggers so that the image change controller could do a lookup of triggered image stream tag name -> triggered deployment configs. That controller currently does a full DC list/scan for every image stream update.

@smarterclayton
Contributor

Image change trigger controller is going to be made generic, but we
definitely want to do that.

On May 24, 2016, at 10:50 AM, Dan Mace wrote:

Maybe not applicable here yet, but just to get the idea down: we could build an index from deploymentConfigs using their automatic image triggers so that the image change controller could do a lookup of triggered image stream tag name -> triggered deployment configs. That controller currently does a full DC list/scan for every image stream update.



@0xmichalis
Contributor Author

Maybe not applicable here yet, but just to get the idea down: we could build an index from deploymentConfigs using their automatic image triggers so that the image change controller could do a lookup of triggered image stream tag name -> triggered deployment configs. That controller currently does a full DC list/scan for every image stream update.

That controller should also requeue image streams when a deploymentconfig with an ICT is created. Currently, if a deploymentconfig with an ICT is created, its image will only be resolved on the next relist of the image stream (assuming no new image is pushed in the meantime).
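A rough sketch of the index ironcladlou describes, using client-go's cache.Indexer; the deploymentConfig stand-in type, the byTriggeredISTag index name, and the tag format are hypothetical, not the controller's actual code.

package main

import (
	"fmt"

	"k8s.io/client-go/tools/cache"
)

// deploymentConfig is a hypothetical stand-in for the real API type; only the
// fields the index needs are modeled.
type deploymentConfig struct {
	Namespace string
	Name      string
	// Image stream tags wired up as automatic image change triggers,
	// e.g. "demo/ruby:latest".
	TriggeredISTags []string
}

const byTriggeredISTag = "byTriggeredImageStreamTag"

func main() {
	indexer := cache.NewIndexer(
		// Key each DC by namespace/name.
		func(obj interface{}) (string, error) {
			dc := obj.(*deploymentConfig)
			return dc.Namespace + "/" + dc.Name, nil
		},
		cache.Indexers{
			// Index each DC under every image stream tag that triggers it.
			byTriggeredISTag: func(obj interface{}) ([]string, error) {
				return obj.(*deploymentConfig).TriggeredISTags, nil
			},
		},
	)

	if err := indexer.Add(&deploymentConfig{
		Namespace:       "demo",
		Name:            "frontend",
		TriggeredISTags: []string{"demo/ruby:latest"},
	}); err != nil {
		panic(err)
	}

	// On an image stream update, look up affected DCs directly instead of
	// listing and scanning every DC in the cluster.
	dcs, _ := indexer.ByIndex(byTriggeredISTag, "demo/ruby:latest")
	fmt.Printf("%d deployment config(s) triggered by demo/ruby:latest\n", len(dcs))
}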

@0xmichalis
Contributor Author

[test]

// osClient provides access to OpenShift resources.
osClient osclient.Interface
// oc provides access to OpenShift resources.
oc osclient.Interface
Contributor

Can you tighten these down to just the namespacers you need?

Contributor Author

Sure.
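For illustration, "tightening down to the namespacers" amounts to depending on a narrow interface instead of the full osclient.Interface. The interface and field names below are hypothetical stand-ins, not Origin's real client types; the sketch only shows the shape of the change.

package controller

// deploymentConfig stands in for the real API object.
type deploymentConfig struct {
	Namespace string
	Name      string
}

// deploymentConfigsInterface is the per-namespace handle the controller uses.
type deploymentConfigsInterface interface {
	UpdateStatus(dc *deploymentConfig) (*deploymentConfig, error)
}

// deploymentConfigsNamespacer is the narrow "namespacer" dependency: just the
// accessor the controller calls, rather than the whole client interface.
type deploymentConfigsNamespacer interface {
	DeploymentConfigs(namespace string) deploymentConfigsInterface
}

// DeploymentConfigController keeps only the narrow dependency, which shrinks
// its API surface and makes it trivial to substitute a fake in tests.
type DeploymentConfigController struct {
	dcClient deploymentConfigsNamespacer
}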

@0xmichalis
Contributor Author

[test]

@0xmichalis
Contributor Author

FAILURE after 0.348s: test/cmd/volumes.sh:56: executing 'oc set volumes dc/test-deployment-config --list' expecting success: the command returned the wrong error code
There was no output from the command.
Standard error from the command:
The connection to the server 127.0.0.1:28443 was refused - did you specify the right host or port?

This must be real.

@openshift-bot added and then removed the needs-rebase label on May 31, 2016
@openshift-bot added the needs-rebase label on Jun 16, 2016
@openshift-bot removed the needs-rebase label on Jun 16, 2016
@0xmichalis changed the title from "[WIP] Use caches in the deploymentconfig controller" to "Use shared caches in the deploymentconfig controller" on Jun 16, 2016
@0xmichalis
Contributor Author

now integration flake, [test]

@0xmichalis
Contributor Author

Integration and conformance flaked on yum again; I cannot see why the origin check failed.

@0xmichalis
Contributor Author

[test]

@0xmichalis
Contributor Author

@smarterclayton @mfojtik @ironcladlou any other comments from you?

@0xmichalis
Contributor Author

[test]

@0xmichalis
Contributor Author

c.queue.AddRateLimited(key)
return
}
utilruntime.HandleError(err)
Contributor

So by default, errors that are neither fatal nor transient are treated as fatal? You sure you want to default in that direction? I'm OK with it, but sometimes you'll be getting API errors (update errors, for instance) that are likely to be worth a retry.

Contributor Author

They are not treated as fatal because we don't forget them. HandleError currently only logs, but I guess I should explicitly log here. Update conflicts should be transientErrors, but it seems that's not the case currently.

Contributor

They are not treated as fatal because we don't forget them. HandleError currently only logs, but I guess I should explicitly log here. Update conflicts should be transientErrors, but it seems that's not the case currently.

But you don't re-queue either, so you won't retry until the object changes again, which may not happen for a very long time.

Contributor Author

But you don't re-queue either, so you won't retry until the object changes again, which may not happen for a very long time.

How does the periodic relist work here? Won't we requeue everything that's in etcd every two minutes?

Contributor

How does the periodic relist work here? Won't we requeue everything that's in etcd every two minutes?

We want to push that time out longer. A lot longer, because the cost of doing that in a large cluster is extremely high.

Contributor Author

Then what's the difference between current transient errors and other non-fatal errors? Should we redefine error handling of the controller in this pull?

Contributor

Then what's the difference between current transient errors and other non-fatal errors? Should we redefine error handling of the controller in this pull?

If we need to do that to prevent bad requeuing behavior, yes.

@deads2k
Contributor

deads2k commented Jun 21, 2016

One more question/comment, then lgtm.

@stevekuznetsov
Contributor

Failure is #9457

@deads2k
Contributor

deads2k commented Jun 21, 2016

One more question/comment, then lgtm.

Oh, almost. The workqueue keeps track of the number of failures. If you exceed some number of retries, forget the entry and don't requeue.

@0xmichalis
Contributor Author

Oh, almost. The workqueue keeps track of the number of failures. If you exceed some number of retries, forget the entry and don't requeue.

So far, we retry transient errors forever. Feels like a separate issue. WDYT?

@0xmichalis
Contributor Author

So far, we retry transient errors forever. Feels like a separate issue. WDYT?

Opened #9468. If there are no other comments, I would love to have this merged by today because it's blocking a handful of other pulls.

}
glog.V(4).Infof("Updated the status for %q (observed generation: %d)", deployutil.LabelForDeploymentConfig(config), config.Status.ObservedGeneration)
return nil
}

func (c *DeploymentConfigController) handleErr(err error, key interface{}) {
if err == nil {
Contributor

You can have non-nil errors that aren't fatal or transient; you should decide how they're treated. I'd probably default to transient.

Contributor Author

OK, went ahead and removed the transient check; now everything non-fatal will be retried up to 10 times. We can do more in #9468.
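The pattern that resolution describes looks roughly like the sketch below. The DeploymentConfigController type, the queue, AddRateLimited, glog, and utilruntime.HandleError all appear in the diff excerpts above; the maxRetries constant, the pared-down struct, and the modern import paths are assumptions added to keep the sketch self-contained, so this is not the PR's exact code.

package controller

import (
	"github.com/golang/glog"
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	"k8s.io/client-go/util/workqueue"
)

// maxRetries is illustrative; the thread above settled on retrying
// non-fatal errors up to 10 times.
const maxRetries = 10

// DeploymentConfigController is reduced to the one field handleErr needs.
type DeploymentConfigController struct {
	queue workqueue.RateLimitingInterface
}

func (c *DeploymentConfigController) handleErr(err error, key interface{}) {
	if err == nil {
		// Success: clear the key's failure history so any future errors
		// start from a fresh back-off.
		c.queue.Forget(key)
		return
	}

	if c.queue.NumRequeues(key) < maxRetries {
		glog.V(2).Infof("Error syncing deployment config %v: %v", key, err)
		// Requeue with rate-limited back-off; the workqueue tracks how many
		// times this key has already failed.
		c.queue.AddRateLimited(key)
		return
	}

	// Too many failures: surface the error, drop the key, and wait for a new
	// event (or the periodic relist) to bring it back.
	utilruntime.HandleError(err)
	c.queue.Forget(key)
}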

@deads2k
Contributor

deads2k commented Jun 21, 2016

lgtm [merge]

@openshift-bot
Contributor

Evaluated for origin test up to cdbabcb

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/5186/)

@0xmichalis
Contributor Author

yum flake (#8571), re[merge]

@0xmichalis
Contributor Author

Flaked on #9480

[merge]

@openshift-bot
Contributor

openshift-bot commented Jun 22, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/5230/) (Image: devenv-rhel7_4433)

@mfojtik
Contributor

mfojtik commented Jun 22, 2016

[merge]

@openshift-bot
Contributor

Evaluated for origin merge up to cdbabcb

@openshift-bot merged commit e5ff67e into openshift:master on Jun 22, 2016
@0xmichalis deleted the refactor-dc-controller-to-use-caches branch on June 22, 2016 at 12:23
@0xmichalis
Contributor Author

🎉
