suggestion: Refactor scaling strategy #1289

Draft · wants to merge 4 commits into base: feat/auto-scale-clock-time

Conversation

withinboredom (Collaborator)

This PR simplifies the logic around scaling, with the goal of reducing latency. Using the provided benchmarking scenarios:

| mode        | rps (#1266) | rps (this PR) | avg (#1266) | avg (this PR) | 95% (#1266) | 95% (this PR) |
|-------------|-------------|---------------|-------------|---------------|-------------|---------------|
| api         | 3,216       | 3,322         | 112.06 ms   | 108.36 ms     | 187.98 ms   | 181.48 ms     |
| computation | 1,578       | 1,589         | 48.69 ms    | 48.36 ms      | 92.36 ms    | 94 ms         |
| database    | 6,144       | 7,264         | 17.48 ms    | 14.74 ms      | 29.52 ms    | 28.11 ms      |
| hanging req | 451         | 467           | 438.16 ms   | 435.51 ms     | 848.58 ms   | 785.05 ms     |
| hello world | 15,757      | 16,529        | 12.95 ms    | 12.33 ms      | 33.77 ms    | 32.22 ms      |
| timeouts    | 518         | 711           | 299.69 ms   | 244.32 ms     | 36.08 ms    | 40.48 ms      |

This is more of a big suggestion in the form of a PR, so feel free to merge it or close it!

@AlliBalliBaba (Collaborator) left a comment:

Your implementation definitely is more eager to scale, which is probably a good thing. I also like that it's simpler 👍

Comment on lines +88 to +90
```go
func (n nullMetrics) GetWorkerQueueDepth(name string) int {
	return 0
}
```
Collaborator:

Doesn't this mean that we will never scale in case of null metrics?

Collaborator (Author):

Yeah, not sure if we want an actual implementation. `nullMetrics` isn't really supposed to be used; it just saves us from writing `if metrics != nil { metrics.RecordMetric(...) }` at every call site, and in that case the behavior would still be 0-ish.
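
For context, a minimal sketch of the null-object pattern being described. The interface and method names here are illustrative, not the repository's actual metrics API:

```go
// Metrics is a stand-in for the real metrics interface (illustrative names).
type Metrics interface {
	RecordMetric(name string, value float64)
	GetWorkerQueueDepth(name string) int
}

// nullMetrics is a no-op implementation so callers never need nil checks
// like `if metrics != nil { ... }` before recording.
type nullMetrics struct{}

func (n nullMetrics) RecordMetric(name string, value float64) {}

// Always reporting a queue depth of 0 means a scaler reading this value
// never observes pressure, and therefore never scales up.
func (n nullMetrics) GetWorkerQueueDepth(name string) int { return 0 }
```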

Collaborator:

Oh, so metrics are always collected even if disabled? Thinking about it, it would probably still be better if the behavior of the system were decoupled from the metrics implementation.

Comment on lines -172 to -185
```go
// dispatch requests to all worker threads in order
worker.threadMutex.RLock()
for _, thread := range worker.threads {
	select {
	case thread.requestChan <- r:
		worker.threadMutex.RUnlock()
		<-fc.done
		metrics.StopWorkerRequest(worker.fileName, time.Since(fc.startedAt))
		return
	default:
		// thread is busy, continue
	}
}
worker.threadMutex.RUnlock()
```
Collaborator:

I'd like to keep the logic of dispatching requests to workers in order. It was introduced in #1289 and minimizes potential external connections and memory usage when not all threads are fully utilized. In other words, some workers stay 'hot' while higher-index workers remain 'cold'.
It also reduces CPU contention, but that matters less than I initially expected, so it's probably fine to remove it for regular threads.

@withinboredom (Collaborator, Author):

> Your implementation definitely is more eager to scale

I was thinking about cases where I'd want it to scale beyond my default setting:

- large amounts of (un)expected traffic; a spike
- hitting rate limits of external services (causing all threads to be utilized)

and things like that, so in those cases we probably do want to be more eager about scaling. It might be good to refactor this so that it can implement "autoscaling strategies": a "conservative" one that is more like your implementation, an "eager" one that is more like mine, or a mix-and-match as needed. For example, an API probably wants aggressive scaling because it is latency-sensitive, while an HTTP job handler (such as those in Google Cloud) probably wants a conservative one, since it is less latency-sensitive and we don't want it to steal threads from our api/frontend.
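
A rough sketch of what pluggable strategies could look like. Everything here (names, thresholds, the `ScalingStrategy` interface itself) is hypothetical and not part of this PR:

```go
// ScalingDecision is what a strategy returns for the current tick.
type ScalingDecision int

const (
	HoldThreads ScalingDecision = iota
	AddThread
	RemoveThread
)

// ScalingStrategy decides how to react to observed load; implementations
// can range from conservative to eager.
type ScalingStrategy interface {
	Decide(queueDepth, activeThreads, maxThreads int) ScalingDecision
}

// eagerStrategy scales up as soon as any request is queued.
type eagerStrategy struct{}

func (eagerStrategy) Decide(queueDepth, activeThreads, maxThreads int) ScalingDecision {
	if queueDepth > 0 && activeThreads < maxThreads {
		return AddThread
	}
	return HoldThreads
}

// conservativeStrategy waits for sustained pressure before adding a thread.
type conservativeStrategy struct {
	threshold int // consecutive ticks with a non-empty queue before scaling
	streak    int
}

func (s *conservativeStrategy) Decide(queueDepth, activeThreads, maxThreads int) ScalingDecision {
	if queueDepth == 0 {
		s.streak = 0
		return HoldThreads
	}
	s.streak++
	if s.streak >= s.threshold && activeThreads < maxThreads {
		s.streak = 0
		return AddThread
	}
	return HoldThreads
}
```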

@AlliBalliBaba (Collaborator):

I think one big use case for me would also be making it easier to figure out the 'ideal' number of threads in the first place (and having to care less about what that ideal is). Scaling should definitely converge towards that ideal, but I agree that the ideal probably is different for different use cases.

At the same time I'd like to minimize configuration, and I can see something like `scaling_strategy aggressive` being deprecated pretty quickly. Maybe something simple like this?

```
scaling none
scaling slow
scaling normal   (default)
scaling fast
```
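
One way the directive values above could map onto strategies internally, reusing the hypothetical `ScalingStrategy` types sketched earlier. Validation and the concrete thresholds are placeholders:

```go
// newStrategy maps a directive value onto a strategy; unrecognized values
// fall back to the default ("normal") behavior in this sketch.
func newStrategy(mode string) ScalingStrategy {
	switch mode {
	case "none":
		return nil // scaling disabled entirely
	case "slow":
		return &conservativeStrategy{threshold: 10} // scale only under sustained pressure
	case "fast":
		return eagerStrategy{} // scale as soon as anything queues
	default: // "normal" and anything unrecognized
		return &conservativeStrategy{threshold: 3}
	}
}
```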
