AwsEventLoop threads still exist after closing CRT client #856

Closed
earakely-amazon opened this issue Dec 19, 2024 · 6 comments

@earakely-amazon

Describe the bug

After closing my async CRT client, I still have AwsEventLoop threads running in my process. These threads are consuming CPU and leaking memory.

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

I expect all threads created by the CRT client to be destroyed when I call close().

Current Behavior

Below is an example:

final S3AsyncClient s3AsyncClient = S3AsyncClient.crtBuilder()
        .checksumValidationEnabled(true)
        .crossRegionAccessEnabled(true)
        .region(Region.of(region))
        .credentialsProvider(V1V2AwsCredentialProviderAdapter.adapt(credentialsProvider))
        .httpConfiguration(builder -> builder.connectionTimeout(Duration.ofSeconds(60)))
        .maxConcurrency(MAX_CONCURRENCY)
        .targetThroughputInGbps(TARGET_THROUGHPUT_GBPS)
        .retryConfiguration(S3CrtRetryConfiguration.builder().numRetries(MAX_NUM_TRIES).build())
        .build();
final S3TransferManager transferManager = S3TransferManager.builder()
        .s3Client(s3AsyncClient)
        .build();

UploadFileRequest uploadFileRequest = UploadFileRequest.builder()
        .putObjectRequest(b -> b.bucket(myBucket).key(myKey))
        .source(Paths.get(someFilePath))
        .build();
FileUpload fileUpload = transferManager.uploadFile(uploadFileRequest);
CompletedFileUpload uploadResult = fileUpload.completionFuture().join();
System.out.println("upload complete");

transferManager.close();
s3AsyncClient.close();
System.out.println("Closed s3 clients. Sleeping 60s");
Thread.sleep(60000);

After the upload completes & the s3 client is closed, I can still see the AwsEventLoop threads running:

dev-dsk-*****% pidstat -t | grep Aws
10:05:19 PM 12011115         -     96063    0.00    0.00    0.00    0.00    13  |__AwsEventLoop 2
10:05:19 PM 12011115         -     96065    0.00    0.00    0.00    0.00    11  |__AwsEventLoop 4
10:05:19 PM 12011115         -     96066    0.00    0.00    0.00    0.00    18  |__AwsEventLoop 5
10:05:19 PM 12011115         -     96068    0.00    0.00    0.00    0.00    15  |__AwsEventLoop 7
10:05:19 PM 12011115         -     96069    0.00    0.00    0.00    0.00     3  |__AwsEventLoop 8
10:05:19 PM 12011115         -     96070    0.00    0.00    0.00    0.00    22  |__AwsEventLoop 9
10:05:19 PM 12011115         -     96074    0.00    0.00    0.00    0.00    28  |__AwsEventLoop 13
10:05:19 PM 12011115         -     96075    0.00    0.00    0.00    0.00    29  |__AwsEventLoop 14
10:05:19 PM 12011115         -     96078    0.00    0.00    0.00    0.00    22  |__AwsEventLoop 17
10:05:19 PM 12011115         -     96079    0.00    0.00    0.00    0.00     2  |__AwsEventLoop 18
10:05:19 PM 12011115         -     96080    0.00    0.00    0.00    0.00     3  |__AwsEventLoop 19
10:05:19 PM 12011115         -     96082    0.00    0.00    0.00    0.00    21  |__AwsEventLoop 21
10:05:19 PM 12011115         -     96083    0.00    0.00    0.00    0.00    14  |__AwsEventLoop 22
10:05:19 PM 12011115         -     96084    0.00    0.00    0.00    0.00    27  |__AwsEventLoop 23
10:05:19 PM 12011115         -     96085    0.00    0.00    0.00    0.00    20  |__AwsEventLoop 24
10:05:19 PM 12011115         -     96087    0.00    0.00    0.00    0.00    16  |__AwsEventLoop 26
10:05:19 PM 12011115         -     96089    0.00    0.00    0.00    0.00    12  |__AwsEventLoop 28
10:05:19 PM 12011115         -     96090    0.00    0.00    0.00    0.00    21  |__AwsEventLoop 29
10:05:19 PM 12011115         -     96091    0.00    0.00    0.00    0.00     2  |__AwsEventLoop 30
10:05:19 PM 12011115         -     96093    0.00    0.00    0.00    0.00    10  |__AwsEventLoop 32
10:05:19 PM 12011115         -     96126    0.00    0.00    0.00    0.00    12  |__AwsHostResolver

Reproduction Steps

I shared an example snippet in the Current Behavior section above.

Possible Solution

No response

Additional Information/Context

No response

aws-crt-java version used

v0.31.1

Java version used

JDK 17

Operating System and version

Linux version 5.10.230-202.885.amzn2int.x86_64 (mockbuild@ip-10-0-46-226) (gcc10-gcc (GCC) 10.2.1 20210130 (Red Hat 10.2.1-11)

@earakely-amazon earakely-amazon added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Dec 19, 2024
@waahm7
Contributor

waahm7 commented Dec 19, 2024

AwsEventLoop threads are created by EventLoopGroup. Our API design allows you to either pass an EventLoopGroup to the S3 Client (link) or omit it, in which case we create a static EventLoopGroup that is shared among multiple clients. The Java SDK uses this default static EventLoopGroup. However, this means we can't destroy the EventLoopGroup when one client is closed. The static event loop group is only destroyed when the JVM is terminated. See: #655.

With that said, these are green threads and should not take much memory/cpu when idle.
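
For illustration, a minimal sketch of what owning the EventLoopGroup looks like at the CRT layer, using software.amazon.awssdk.crt.io.EventLoopGroup, HostResolver, and ClientBootstrap. This path is not exposed through the SDK's crtBuilder(); the thread count here is illustrative:

// Hypothetical sketch: when the caller constructs the EventLoopGroup itself,
// the caller also controls when its AwsEventLoop threads are torn down,
// instead of relying on the shared static group that lives until JVM exit.
try (EventLoopGroup eventLoopGroup = new EventLoopGroup(4);          // creates 4 AwsEventLoop threads
     HostResolver hostResolver = new HostResolver(eventLoopGroup);   // backs the AwsHostResolver thread
     ClientBootstrap bootstrap = new ClientBootstrap(eventLoopGroup, hostResolver)) {
    // A CRT S3 client built on this bootstrap would run on these event loops.
}
// close() has been requested on all three here; the native threads shut down
// asynchronously shortly afterwards rather than living until JVM termination.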

@earakely-amazon
Author

There is no way to pass an EventLoopGroup into any of the S3 SDK builders as far as I can find. Is there a way you had in mind?

I have profiled these threads in my beta service. They are wasting CPU cycles (they account for 2k out of 300k total samples) and they are causing a memory leak in my service (I can find memstore perf events coming from these threads).

@bretambrose
Contributor

If the event loop has nothing to do then it will sleep for upwards of 100 seconds: https://github.com/awslabs/aws-c-io/blob/main/source/linux/epoll_event_loop.c#L563

If you are seeing non-trivial amounts of CPU time spent in an event loop thread then it still has work to do. Closing a client is not synchronous. Close merely indicates that you're done with the client. There is a lot of work that needs to happen to shut down and clean up the client, and it will take a while (usually some reasonable fraction of a second), especially if there are open network connections. My guess is that your sampling interval is catching part of shutdown.

Beyond that, memstore perf events are not evidence of a memory leak. The loop is still active and tracking internal statistics even if 99.99+% of the time it is asleep: https://github.com/awslabs/aws-c-io/blob/main/source/linux/epoll_event_loop.c#L610-L700
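
For what it's worth, here is a plain-JDK, Linux-procfs sketch (not a CRT API; the class name and the 30-second settle time are arbitrary) that can distinguish "still finishing shutdown work" from "parked and idle" by checking whether the per-thread CPU counters keep growing after close():

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Sample per-thread CPU counters for the native Aws* threads, wait, and sample
// again. If utime/stime stop growing once shutdown has settled, the threads
// are parked rather than busy.
public final class CrtThreadCpuSample {

    public static void main(String[] args) throws Exception {
        // In practice, call sample() right after closing the clients in the
        // reproduction snippet above.
        sample("right after close()");
        Thread.sleep(30_000);
        sample("30s later");
    }

    static void sample(String label) throws IOException {
        System.out.println("== " + label);
        try (Stream<Path> tasks = Files.list(Paths.get("/proc/self/task"))) {
            tasks.forEach(task -> {
                try {
                    String name = Files.readString(task.resolve("comm")).trim();
                    if (!name.startsWith("Aws")) {
                        return;
                    }
                    // /proc/<pid>/task/<tid>/stat: after the ")" the fields are
                    // state ppid pgrp session tty tpgid flags minflt ..., with
                    // minflt at index 7 and utime/stime at indexes 11/12 (clock ticks).
                    String stat = Files.readString(task.resolve("stat"));
                    String[] f = stat.substring(stat.lastIndexOf(')') + 2).split(" ");
                    System.out.printf("%-18s utime=%s stime=%s minflt=%s%n", name, f[11], f[12], f[7]);
                } catch (IOException e) {
                    // The thread exited between listing and reading; skip it.
                }
            });
        }
    }
}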

@bretambrose bretambrose added closing-soon This issue will automatically close in 4 days unless further comments are made. and removed bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Dec 21, 2024
@earakely-amazon
Author

Beyond that, memstore perf events are not evidence of a memory leak. The loop is still active and tracking internal statistics even if 99.99+% of the time it is asleep

Agreed. I did not find any minor page fault events coming from it. I had a bug in my code earlier (now changed) that would instantiate 4 clients on startup, and this did exhibit growing anon memory usage over time (~2 GB per day) despite the clients only being used on startup. I have since changed it to 1 client on startup and do not find this behavior. @bretambrose do you have any idea what that may have been? The ticket can be resolved independently of this question.

@github-actions github-actions bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Dec 23, 2024
@bretambrose
Contributor

No, that seems pretty weird, but I don't know anything about the s3 client's behavior.

@waahm7
Contributor

waahm7 commented Dec 23, 2024

I had a bug in my code earlier (now changed) that would instantiate 4 clients on startup, and this did exhibit growing anon memory usage over time (~2GB p/day) despite only being used on startup.

I don't know either without more information, but AFAIK the code was not closing the CRTS3Client, which could result in resource leaks. The CRTS3Client internally holds resources like a buffer pool, which will take some memory, though we do shrink it when not in use.
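
For reference, here is a sketch of the same upload scoped with try-with-resources (same imports and placeholder names as the snippet in the issue description), so that both the transfer manager and the underlying CRT-based client are always closed:

try (S3AsyncClient s3AsyncClient = S3AsyncClient.crtBuilder()
             .region(Region.of(region))
             .build();
     S3TransferManager transferManager = S3TransferManager.builder()
             .s3Client(s3AsyncClient)
             .build()) {

    UploadFileRequest uploadFileRequest = UploadFileRequest.builder()
            .putObjectRequest(b -> b.bucket(myBucket).key(myKey))
            .source(Paths.get(someFilePath))
            .build();
    transferManager.uploadFile(uploadFileRequest).completionFuture().join();
}
// transferManager.close() and then s3AsyncClient.close() run here, in reverse
// declaration order, even if the upload throws.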

@waahm7 waahm7 closed this as completed Dec 23, 2024