feat: thread autoscaling #1266

Alliballibaba2 · 2024-12-19T15:14:53Z

I originally wanted to just create a PR that allows adding threads via the admin API, but after letting threads scale automatically, that PR kind of didn't make sense anymore by itself.

So here is what this PR does:

It adds 4 Caddy admin endpoints

POST     /frankenphp/workers/restart   # restarts workers (this can also be put into a smaller PR if necessary)
GET      /frankenphp/threads           # prints the current state of all threads (for debugging/caddytests)
PUT      /frankenphp/threads           # Adds a thread at runtime - accepts 'worker' and 'count' query parameters
DELETE   /frankenphp/threads           # Removes a thread at runtime - accepts 'worker' and 'count' query parameters

Additionally, the PR also introduces a new directive in the config: max_threads.

frankenphp {
    max_threads 200
    num_threads 40
}

If it's bigger than num_threads, worker and regular threads will attempt to autoscale after a request on a few different conditions:

no thread was available to immediately handle the request
the request was stalled for more than a few ms (15ms currently)
no other scaling is happening at that time
A CPU probe (50ms) successfully determines that PHP threads are consuming less than a predefined amount of CPU (80% currently)
we have not reached max_threads yet

This is all still a WIP. I'm not yet sure if max_threads is the best way to configure autoscaling or if it's even necessary to have the PUT/DELETE endpoints. Maybe it would also make sense to determine max_threads based on available memory.
I'll conduct some benchmarks showing that this approach performs better than default settings in a lot of different scenarios (and makes people worry less about thread configuration).

In regards to recent issues, spawning and destroying threads would also make the server more stable if we're experiencing timeouts (not sure yet how to safely destroy running threads).

# Conflicts: # frankenphp.c # frankenphp.go # php_thread.go # worker.go

# Conflicts: # frankenphp.go

# Conflicts: # frankenphp.c # frankenphp.go # phpmainthread.go # phpmainthread_test.go # phpthread.go # state.go # thread-inactive.go # thread-regular.go # thread-worker.go # worker.go

AlliBalliBaba · 2024-12-19T17:39:13Z

This PR doesn't really have 100+ commints, it was just forked from #1185.
Also, static binary seems to be failing rn, like in #1260 and #1264

dunglas · 2024-12-19T17:53:46Z

Static binary failure seems to be because of crazywhalecc/static-php-cli#577

# Conflicts: # phpmainthread.go

AlliBalliBaba · 2024-12-22T00:10:53Z

I added some load test simulations. Not sure yet if I want to keep them in the repo, it sure would require fixing a lot of linting errors.
They can be run with ./testdata/performance/perf-test.sh

AlliBalliBaba · 2024-12-22T22:39:52Z

Scaling currently works like this:

After every request that is not immediately handled by a thread, start a timer (10ms currently)
If the timer is triggered, start autoscaling and reset the timer with an exponential backoff
When scaling, check that the CPU usage is not already above 80% for 40ms (and allow no other scaling)
Every 5s, look for auto-scaled threads that have been idle for more that 5s and terminate at most 10 of them

Here are my findings from running a few load-test scenarios. I decided to just simulate load and latency via a PHP script. Doing an authentic load-test would have always involved setting up an unbiased cloud environment, which might be something for a different day. Keep in mind that the VUs were adjusted for 20 CPU cores:

Hello world

Type	Threads	handled requests
Default	40	1,600,000
Scaling	8 -14	1,850,000
Ideal	8	1,900,000

The hello world scenario tests raw server performance. It ended up being the only scenario in which a server with lower amount of threads was able to handle more requests. I guess I overestimated the impact of CPU contention in other cases

Database simulation

Type	Threads	handled requests
Default	40	260,000
Scaling	10 -100	420,000
Ideal	100	530,000

This test simulates 1-2 DB queries on 1-10ms latency with load similar to a Laravel request, probably a very common pattern for a lot of apis. What surprised me most is that in this scenario 5xCPU cores ended up being the ideal amount of threads - which is why I would probably recommend a default setting that at least scales to 5xCPU cores.

The reason why 'scaling' was able to handle less requests than 'ideal' is that it takes some time to catch up to the ideal. The overhead of scaling itself is actually relatively negligible and doesn't even appear in the flamegraphs.

External API simulation

Type	Threads	handled requests
Default	40	23,500
Scaling	10 -600	160,000
Ideal	600+?	190,000

This test leans more into big latencies. A lot of applications access external apis or microservices that have much higher response times than databases (test ran with 10ms-150ms). The main learning here is that if you know latencies to be this high, it might not be unreasonable to spawn 30xCPU cores. Threads are in general more lightweight than FPM processes, how many workers could reasonably run on 1GB of RAM is something I haven't tested yet though.

Computation heavy

Type	Threads	handled requests
Default	40	108,000
Scaling	10-27	106,000
Ideal	25	109,000

This test goes into the other extreme and does almost no IO. Main learning here is: If the server is not IO bound, then anything above 20 CPU cores behaves pretty similar. In this case Scaling did not go over 27 threads due to high CPU usage. This is the only test where checking for CPU usage was beneficial since we save memory by not spawning more threads.

Hanging server

Type	Threads	handled requests
Default	40	9,300
Scaling	10-200	34,000
Ideal	200	32,000

This test introduces a 2% chance for a 'hanging' request that takes 15s to complete. I chose this ratio on purpose since it will already make the server hang completely in default settings sometimes. Interestingly, scaling performed better here than spawning a fixed high amount of threads. In some situations being able to spawn 'fresh' threads seems to be beneficial.

Timeouts

Type	Threads	handled requests
Default	40	12,400
Scaling	10-200	90,000
Ideal	200	100,000

This is another resilience simulation. An external resources becomes unavailable every other 10s and causes timeouts for all requests. Again, a server with a higher amount of threads performs much better in this situation and can recover faster. On very severe hanging it might also make sense to terminate and respawn threads (something for a future PR).

withinboredom · 2024-12-23T10:38:20Z

@AlliBalliBaba I spot some improvements that can be made (I think -- needs some testing), but trying to explain it in a review would probably take too long of back-and-forth. Is this branch stable enough to just open a PR to your PR?

dunglas · 2024-12-23T22:57:55Z

caddy/admin.go

+}
+
+func (admin *FrankenPHPAdmin) threads(w http.ResponseWriter, r *http.Request) error {
+	if r.Method == http.MethodPut {


Nit: you could use a switch here.

dunglas · 2024-12-23T22:59:06Z

caddy/admin.go

+}
+
+func (admin *FrankenPHPAdmin) changeThreads(w http.ResponseWriter, r *http.Request, count int) error {
+	if !r.URL.Query().Has("worker") {


Nit: you could store the result of Query() in a variable to prevent parsing the query two times.

You could even directly get the value and check if it is the zero value here.

dunglas · 2024-12-23T23:08:09Z

caddy/admin_test.go

+	adminUrl := "http://localhost:2999/frankenphp/"
+	r, err := http.NewRequest(method, adminUrl+path, nil)
+	if err != nil {
+		panic(err)


assert.NoError()?

dunglas · 2024-12-23T23:09:00Z

caddy/admin_test.go

+	}
+	if expectedBody == "" {
+		_ = tester.AssertResponseCode(r, expectedStatus)
+	} else {


Nit: could be an early return instead.

dunglas · 2024-12-23T23:09:30Z

caddy/admin_test.go

+func getAdminResponseBody(tester *caddytest.Tester, method string, path string) string {
+	adminUrl := "http://localhost:2999/frankenphp/"
+	r, err := http.NewRequest(method, adminUrl+path, nil)
+	if err != nil {


Could be NoError

dunglas · 2024-12-23T23:09:38Z

caddy/admin_test.go

+	resp := tester.AssertResponseCode(r, http.StatusOK)
+	defer resp.Body.Close()
+	bytes, err := io.ReadAll(resp.Body)
+	if err != nil {


dunglas · 2024-12-23T23:12:47Z

caddy/caddy.go

+					return d.ArgErr()
+				}
+
+				v, err := strconv.Atoi(d.Val())


Maybe should we use ParseUint() to prevent negative values?
The "problem" already exists with NumThreads.

dunglas · 2024-12-23T23:14:37Z

docs/worker.md

@@ -128,6 +128,16 @@ A workaround to using this type of code in worker mode is to restart the worker

 The previous worker snippet allows configuring a maximum number of request to handle by setting an environment variable named `MAX_REQUESTS`.

+### Restart Workers manually


Suggested change

### Restart Workers manually

### Restart Workers Manually

Alliballibaba2 added 30 commits November 1, 2024 23:10

Decouple workers.

fe1158f

Moves code to separate file.

ad34140

Cleans up the exponential backoff.

89b211d

Initial working implementation.

7d2ab8c

Refactors php threads to take callbacks.

f7e7d41

Cleanup.

c03c59b

Cleanup.

a9857dc

Cleanup.

bac9555

Cleanup.

a2f8d59

Merge branch 'main' into refactor/start-worker-threads-directly

279924c

Adjusts watcher logic.

0825453

Adjusts the watcher logic.

17d5cbe

Fix opcache_reset race condition.

09e0ca6

Merge branch 'main' into refactor/start-worker-threads-directly

a726a2c

# Conflicts: # frankenphp.c # frankenphp.go # php_thread.go # worker.go

Fixing merge conflicts and formatting.

7f13ada

Prevents overlapping of TSRM reservation and script execution.

13fb4bb

Adjustments as suggested by @dunglas.

a8a00c8

Adds error assertions.

b4dd138

Adds comments.

03f98fa

Removes logs and explicitly compares to C.false.

e52dd0f

Resets check.

cd98e33

Adds cast for safety.

4e2a2c6

Fixes waitgroup overflow.

c51eb93

Resolves waitgroup race condition on startup.

89d8e26

Moves worker request logic to worker.go.

3587243

Removes defer.

ec32f0c

Removes call from go to c.

4e35698

Merge branch 'main' into refactor/start-worker-threads-directly

740fac7

# Conflicts: # frankenphp.go

Fixes merge conflict.

8a272cb

Adds fibers test back in.

ecce5d5

Alliballibaba2 added 6 commits December 19, 2024 14:57

Adds autoscale tests.

0314247

Merge branch 'main' into feat/auto-scale-clock-time

bd4af11

# Conflicts: # frankenphp.c # frankenphp.go # phpmainthread.go # phpmainthread_test.go # phpthread.go # state.go # thread-inactive.go # thread-regular.go # thread-worker.go # worker.go

Merges main.

dc10546

Fixes alpine (probably)

3b9f577

Fixes alpine (definitely)

790ce4e

go fmt

29de62a

Removes unnecessary 'isProtected'

b447412

dunglas marked this pull request as ready for review December 20, 2024 00:51

Alliballibaba2 added 6 commits December 20, 2024 15:09

Adds perf tests.

6fa90d6

Adds request status message to thread debug status.

3bd7c76

Adjusts performance tests.

45cd915

Adds an exponential backoff on request overflow.

af40470

Merge branch 'main' into feat/auto-scale-clock-time

d5e8f86

# Conflicts: # phpmainthread.go

changes dir.

c7acb25

Alliballibaba2 added 9 commits December 22, 2024 17:25

Linting and formatting.

8c22cbf

Linting and formatting.

745b29b

Adds explicit scaling tests.

68ae2e4

Adjusts perf tests.

09a5caf

Uses different worker in removal test.

3cfcb11

More formatting fixes.

cbe45fc

Replaces inline errors and adjusts comments.

1d8e973

Formatting.

bf48b14

Formatting.

4f0cc8a

withinboredom mentioned this pull request Dec 23, 2024

suggestion: Refactor scaling strategy #1289

Draft

dunglas reviewed Dec 23, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: thread autoscaling #1266

feat: thread autoscaling #1266

Alliballibaba2 commented Dec 19, 2024

AlliBalliBaba commented Dec 19, 2024

dunglas commented Dec 19, 2024

AlliBalliBaba commented Dec 22, 2024

AlliBalliBaba commented Dec 22, 2024

withinboredom commented Dec 23, 2024 •

edited

Loading

dunglas Dec 23, 2024

dunglas Dec 23, 2024

dunglas Dec 23, 2024

dunglas Dec 23, 2024

dunglas Dec 23, 2024

dunglas Dec 23, 2024

dunglas Dec 23, 2024

dunglas Dec 23, 2024

		@@ -128,6 +128,16 @@ A workaround to using this type of code in worker mode is to restart the worker

		The previous worker snippet allows configuring a maximum number of request to handle by setting an environment variable named `MAX_REQUESTS`.

		### Restart Workers manually

feat: thread autoscaling #1266

Are you sure you want to change the base?

feat: thread autoscaling #1266

Conversation

Alliballibaba2 commented Dec 19, 2024

AlliBalliBaba commented Dec 19, 2024

dunglas commented Dec 19, 2024

AlliBalliBaba commented Dec 22, 2024

AlliBalliBaba commented Dec 22, 2024

Hello world

Database simulation

External API simulation

Computation heavy

Hanging server

Timeouts

withinboredom commented Dec 23, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

withinboredom commented Dec 23, 2024 •

edited

Loading