
Use shared caches in the deploymentconfig controller #9002

Merged
merged 3 commits into openshift:master from 0xmichalis:refactor-dc-controller-to-use-caches on Jun 22, 2016

Conversation

@0xmichalis
Contributor

0xmichalis commented May 24, 2016

@0xmichalis
Contributor Author

I've kept the runnable controller bits around to make the diff easier to review - will remove once this is good to merge.

@smarterclayton
Contributor

What's the impact on memory usage?

On May 24, 2016, at 10:17 AM, Michail Kargakis wrote:

@deads2k @smarterclayton @mfojtik @ironcladlou

I still need the ratelimiting queue and shared caches to land with the rebase


@deads2k
Contributor

deads2k commented May 24, 2016

What's the impact on memory usage?

For a shared cache? should be near zero additional.

@smarterclayton
Contributor

Except no one else would be sharing these caches yet. So I'm making sure
we have the other work lined up.

On May 24, 2016, at 10:29 AM, David Eads wrote:

What's the impact on memory usage?

For a shared cache? should be near zero additional.


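For later readers, here is a minimal sketch of what "shared caches" buys, written against present-day client-go rather than the caching framework Origin used in 2016; the factory, the ReplicationController informer, and the 30-minute resync (relist) period are illustrative, not this PR's code. The point is that one factory means one watch connection and one in-memory store per resource type, and each additional controller only registers an event handler against it, which is why the marginal memory cost is near zero.

package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// One factory: one watch and one store per resource type, shared by every
	// consumer. The resync (relist) period here is purely illustrative.
	factory := informers.NewSharedInformerFactory(client, 30*time.Minute)
	rcInformer := factory.Core().V1().ReplicationControllers().Informer()

	// Two consumers of the same cache: each controller only adds an event
	// handler, so another consumer costs almost no additional memory.
	rcInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) { fmt.Println("controller A saw an add") },
	})
	rcInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) { fmt.Println("controller B saw an add") },
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	cache.WaitForCacheSync(stop, rcInformer.HasSynced)
}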

@ironcladlou
Contributor

Maybe not applicable here yet, but just to get the idea down: we could build an index from deploymentConfigs using their automatic image triggers so that the image change controller could do a lookup of triggered image stream tag name -> triggered deployment configs. That controller currently does a full DC list/scan for every image stream update.

@smarterclayton
Contributor

Image change trigger controller is going to be made generic, but we
definitely want to do that.

On May 24, 2016, at 10:50 AM, Dan Mace wrote:

Maybe not applicable here yet, but just to get the idea down: we could build an index from deploymentConfigs using their automatic image triggers so that the image change controller could do a lookup of triggered image stream tag name -> triggered deployment configs. That controller currently does a full DC list/scan for every image stream update.



@0xmichalis
Contributor Author

Maybe not applicable here yet, but just to get the idea down: we could build an index from deploymentConfigs using their automatic image triggers so that the image change controller could do a lookup of triggered image stream tag name -> triggered deployment configs. That controller currently does a full DC list/scan for every image stream update.

That controller should also requeue image streams when a deploymentconfig with an ICT is created. Currently, if a deploymentconfig with an ICT is created, its image will only be resolved on the next relist of the image stream (assuming no new image is pushed in the meantime).
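A rough sketch of the index ironcladlou describes, using client-go's cache.Indexer; the deploymentConfig stand-in type, the byTriggeredISTag index name, and the tag format are hypothetical, not the controller's actual code.

package main

import (
	"fmt"

	"k8s.io/client-go/tools/cache"
)

// deploymentConfig is a hypothetical stand-in for the real API type; only the
// fields the index needs are modeled.
type deploymentConfig struct {
	Namespace string
	Name      string
	// Image stream tags wired up as automatic image change triggers,
	// e.g. "demo/ruby:latest".
	TriggeredISTags []string
}

const byTriggeredISTag = "byTriggeredImageStreamTag"

func main() {
	indexer := cache.NewIndexer(
		// Key each DC by namespace/name.
		func(obj interface{}) (string, error) {
			dc := obj.(*deploymentConfig)
			return dc.Namespace + "/" + dc.Name, nil
		},
		cache.Indexers{
			// Index each DC under every image stream tag that triggers it.
			byTriggeredISTag: func(obj interface{}) ([]string, error) {
				return obj.(*deploymentConfig).TriggeredISTags, nil
			},
		},
	)

	if err := indexer.Add(&deploymentConfig{
		Namespace:       "demo",
		Name:            "frontend",
		TriggeredISTags: []string{"demo/ruby:latest"},
	}); err != nil {
		panic(err)
	}

	// On an image stream update, look up affected DCs directly instead of
	// listing and scanning every DC in the cluster.
	dcs, _ := indexer.ByIndex(byTriggeredISTag, "demo/ruby:latest")
	fmt.Printf("%d deployment config(s) triggered by demo/ruby:latest\n", len(dcs))
}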

@0xmichalis
Contributor Author

[test]

// osClient provides access to OpenShift resources.
osClient osclient.Interface
// oc provides access to OpenShift resources.
oc osclient.Interface
Contributor

Can you tighten these down to just the namespacers you need?

Contributor Author

Sure.
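For illustration, "tightening down to the namespacers" amounts to depending on a narrow interface instead of the full osclient.Interface. The interface and field names below are hypothetical stand-ins, not Origin's real client types; the sketch only shows the shape of the change.

package controller

// deploymentConfig stands in for the real API object.
type deploymentConfig struct {
	Namespace string
	Name      string
}

// deploymentConfigsInterface is the per-namespace handle the controller uses.
type deploymentConfigsInterface interface {
	UpdateStatus(dc *deploymentConfig) (*deploymentConfig, error)
}

// deploymentConfigsNamespacer is the narrow "namespacer" dependency: just the
// accessor the controller calls, rather than the whole client interface.
type deploymentConfigsNamespacer interface {
	DeploymentConfigs(namespace string) deploymentConfigsInterface
}

// DeploymentConfigController keeps only the narrow dependency, which shrinks
// its API surface and makes it trivial to substitute a fake in tests.
type DeploymentConfigController struct {
	dcClient deploymentConfigsNamespacer
}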

@0xmichalis
Contributor Author

[test]

@0xmichalis
Contributor Author

FAILURE after 0.348s: test/cmd/volumes.sh:56: executing 'oc set volumes dc/test-deployment-config --list' expecting success: the command returned the wrong error code
There was no output from the command.
Standard error from the command:
The connection to the server 127.0.0.1:28443 was refused - did you specify the right host or port?

This must be real.

@openshift-bot added and then removed the needs-rebase label on May 31, 2016
@openshift-bot added the needs-rebase label on Jun 16, 2016
@openshift-bot removed the needs-rebase label on Jun 16, 2016
@0xmichalis changed the title from "[WIP] Use caches in the deploymentconfig controller" to "Use shared caches in the deploymentconfig controller" on Jun 16, 2016
@0xmichalis
Contributor Author

now integration flake, [test]

@0xmichalis
Contributor Author

Integration and conformance flaked on yum again; I cannot see why the origin check failed.

@0xmichalis
Contributor Author

[test]

@0xmichalis
Contributor Author

@smarterclayton @mfojtik @ironcladlou any other comments from you?

@0xmichalis
Contributor Author

[test]

@0xmichalis
Contributor Author

c.queue.AddRateLimited(key)
return
}
utilruntime.HandleError(err)
Contributor

So by default, errors that are neither fatal nor transient are treated as fatal? You sure you want to default in that direction? I'm OK with it, but sometimes you'll be getting API errors (update errors, for instance) that are likely to be worth a retry.

Contributor Author

They are not treated as fatal because we don't forget them. HandleError currently only logs, but I guess I should explicitly log here. Update conflicts should be transientErrors, but it seems that's not the case currently.

Contributor

They are not treated as fatal because we don't forget them. HandleError currently only logs, but I guess I should explicitly log here. Update conflicts should be transientErrors, but it seems that's not the case currently.

But you don't re-queue either, so you won't retry until the object changes again, which may not happen for a very long time.

Contributor Author

But you don't re-queue either, so you won't retry until the object changes again, which may not happen for a very long time.

How does the periodic relist work here? Won't we requeue everything that's in etcd every two minutes?

Contributor

How does the periodic relist work here? Won't we requeue everything that's in etcd every two minutes?

We want to push that time out longer. A lot longer, because the cost of doing that in a large cluster is extremely high.

Contributor Author

Then what's the difference between current transient errors and other non-fatal errors? Should we redefine error handling of the controller in this pull?

Contributor

Then what's the difference between current transient errors and other non-fatal errors? Should we redefine error handling of the controller in this pull?

If we need to do that to prevent bad requeuing behavior, yes.

@deads2k
Contributor

deads2k commented Jun 21, 2016

One more question/comment, then lgtm.

@stevekuznetsov
Contributor

Failure is #9457

@deads2k
Contributor

deads2k commented Jun 21, 2016

One more question/comment, then lgtm.

Oh, almost. The workqueue keeps track of the number of failures. If you exceed some number of retries, forget the entry and don't requeue.

@0xmichalis
Contributor Author

Oh, almost. The workqueue keeps track of the number of failures. If you exceed some number of retries, forget the entry and don't requeue.

So far, we retry transient errors forever. Feels like a separate issue. WDYT?

@0xmichalis
Contributor Author

So far, we retry transient errors forever. Feels like a separate issue. WDYT?

Opened #9468. If there are no other comments, I would love to have this merged by today because it's blocking a handful of other pulls.

}
glog.V(4).Infof("Updated the status for %q (observed generation: %d)", deployutil.LabelForDeploymentConfig(config), config.Status.ObservedGeneration)
return nil
}

func (c *DeploymentConfigController) handleErr(err error, key interface{}) {
if err == nil {
Contributor

You can have non-nil errors that aren't fatal or transient; you should decide how they're treated. I'd probably default to transient.

Contributor Author

OK, went ahead and removed the transient check; now everything non-fatal will be retried up to 10 times. We can do more in #9468.
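The pattern that resolution describes looks roughly like the sketch below. The DeploymentConfigController type, the queue, AddRateLimited, glog, and utilruntime.HandleError all appear in the diff excerpts above; the maxRetries constant, the pared-down struct, and the modern import paths are assumptions added to keep the sketch self-contained, so this is not the PR's exact code.

package controller

import (
	"github.com/golang/glog"
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	"k8s.io/client-go/util/workqueue"
)

// maxRetries is illustrative; the thread above settled on retrying
// non-fatal errors up to 10 times.
const maxRetries = 10

// DeploymentConfigController is reduced to the one field handleErr needs.
type DeploymentConfigController struct {
	queue workqueue.RateLimitingInterface
}

func (c *DeploymentConfigController) handleErr(err error, key interface{}) {
	if err == nil {
		// Success: clear the key's failure history so any future errors
		// start from a fresh back-off.
		c.queue.Forget(key)
		return
	}

	if c.queue.NumRequeues(key) < maxRetries {
		glog.V(2).Infof("Error syncing deployment config %v: %v", key, err)
		// Requeue with rate-limited back-off; the workqueue tracks how many
		// times this key has already failed.
		c.queue.AddRateLimited(key)
		return
	}

	// Too many failures: surface the error, drop the key, and wait for a new
	// event (or the periodic relist) to bring it back.
	utilruntime.HandleError(err)
	c.queue.Forget(key)
}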

@deads2k
Contributor

deads2k commented Jun 21, 2016

lgtm [merge]

@openshift-bot
Contributor

Evaluated for origin test up to cdbabcb

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/5186/)

@0xmichalis
Contributor Author

yum flake (#8571), re[merge]

@0xmichalis
Contributor Author

Flaked on #9480

[merge]

@openshift-bot
Contributor

openshift-bot commented Jun 22, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/5230/) (Image: devenv-rhel7_4433)

@mfojtik
Contributor

mfojtik commented Jun 22, 2016

[merge]

@openshift-bot
Contributor

Evaluated for origin merge up to cdbabcb

@openshift-bot merged commit e5ff67e into openshift:master on Jun 22, 2016
@0xmichalis deleted the refactor-dc-controller-to-use-caches branch on June 22, 2016 at 12:23
@0xmichalis
Contributor Author

🎉
