Shutdown ScheduledExecutorService at runtime termination #272

farhin23 · 2024-12-06T10:11:02Z

Bug Report

The ScheduledExecutorService (responsible for recursive crawling) does not shutdown when the EDC BaseRuntime gets shutdown.

Describe the Bug

In the current implementation of FederatedCatalog, the ExecutionPlan creates a ScheduledExecutorService that repetitively executes the crawling task. However, during runtime shutdown, the ExecutorService is not terminated as expected, causing the tasks to keep running indefinitely.

Expected Behavior

The ScheduledExecutorService should terminate gracefully during runtime shutdown. This includes canceling all future tasks, ensuring a proper shutdown, and cleaning up resources.

Observed Behavior

The ExecutorService does not terminate and continues executing tasks after the runtime shutdown is processed.

Steps to Reproduce

FcRecurringExecutionPlanTest.java contains tests designed to reproduce the issue.

Steps to reproduce the bug:

Add the file FcRecurringExecutionPlanTest.java to project directory ./system-tests/end2end-test/e2e-junit-runner/src/test/java/org/eclipse/edc/end2end/ of FederatedCatalog.
From project root execute the command: ./gradlew test -DincludeTags="EndToEndTest" -PverboseTest=true
Review the logs of catalog runtimes, as they continue to execute the crawling task even after being shut down.

Logs Review

Track the logs of, Task :system-tests:end2end-test:e2e-junit-runner:test, starting from the following log,

Task :system-tests:end2end-test:e2e-junit-runner:test
FcRecurringExecutionPlanTest >

In this test class we have two catalogs: catalog-one and catalog-two. Let's say, catalog-one starts execution first. So it goes through the following phases:

catalog-one runtime boot

[catalog-one] INFO 2024-12-06T18:14:05.821516008 52 service extensions started

[catalog-one] INFO 2024-12-06T18:14:05.82173214 Runtime 2d1ab650-9bad-41df-b2e0-b4dcc31d9b36 ready

INFO 2024-12-06T18:14:05.821937528 Runtime catalog-one started
Starts crawling. This can be observed in the logs through entries similar to the following,

[catalog-one] DEBUG 2024-12-06T18:14:05.905056092 [ExecutionManager] Run pre-execution task

[catalog-one] DEBUG 2024-12-06T18:14:05.905575718 [ExecutionManager] Loaded 1 work items from storage

[catalog-one] DEBUG 2024-12-06T18:14:05.905815223 [ExecutionManager] Crawler parallelism is 1, based on config and number of work items

[catalog-one] DEBUG 2024-12-06T18:14:05.909222619 [ExecutionManager] Crawler-8312f7a8-a0f3-4142-8672-da19f6c0fced: WorkItem acquired
Shutdown catalog-one runtime

[catalog-one] DEBUG 2024-12-06T18:14:08.412094779 Shutdown Core Default Services

[catalog-one] DEBUG 2024-12-06T18:14:08.412149472 Shutdown Federated Catalog Default Services

[catalog-one] INFO 2024-12-06T18:14:08.412202229 Shutdown complete

In ideal case, the catalog-one crawler thread should get terminated after the shutdown. However, even after the runtime gets terminated, the scheduled thread keeps running. And so we can observe the following logs through out rest of the execution (through out the execution of catalog-two test).

[catalog-two] DEBUG 2024-12-06T23:09:41.312296155 Registered Web API context alias: default

[catalog-two] DEBUG 2024-12-06T23:09:41.474718829 Registered Web API context alias: protocol

[catalog-one] DEBUG 2024-12-06T23:09:41.528775262 [ExecutionManager] Run pre-execution task

[catalog-one] DEBUG 2024-12-06T23:09:41.529318959 [ExecutionManager] Loaded 1 work items from storage

[catalog-one] DEBUG 2024-12-06T23:09:41.529418432 [ExecutionManager] Crawler parallelism is 1, based on config and number of work items

[catalog-one] DEBUG 2024-12-06T23:09:41.529636797 [ExecutionManager] Crawler-425f25b6-6afc-41c5-936a-d4f6eb0c9aeb: WorkItem acquired

[catalog-two] DEBUG 2024-12-06T23:09:41.611729624 Registered Web API context alias: management

[catalog-two] DEBUG 2024-12-06T23:09:41.662468043 Registered Web API context alias: catalog

Context Information

Used version: EDC 0.10.1

Detailed Description

If applicable, add screenshots and logs to help explain your problem.

Possible Implementation

Proper shutdown method in FederatedCatalogCacheExtension to trigger the shutdown of the ExecutorService should solve this issue. This will ensure that all the scheduled tasks are terminated properly at the time of runtime shutdown.

The text was updated successfully, but these errors were encountered:

github-actions · 2024-12-06T10:11:26Z

Thanks for your contribution 🔥 We will take a look asap 🚀

paullatzelsperger · 2024-12-06T10:22:12Z

please provide a proper, self-contained reproduction guide that doesn't involve other repositories, preferably in the form of a test, e.g. as Gist.

Do you mean that a thread keeps running after the host runtime (= java process) is killed?

farhin23 · 2024-12-06T10:41:16Z

please provide a proper, self-contained reproduction guide that doesn't involve other repositories, preferably in the form of a test, e.g. as Gist.

Sure. I will work on that.

Do you mean that a thread keeps running after the host runtime (= java process) is killed?

By runtime I mean the EDC BaseRuntime. The thread keeps running after the BaseRuntime shutdown has been processed.

farhin23 · 2024-12-06T22:45:11Z

please provide a proper, self-contained reproduction guide that doesn't involve other repositories, preferably in the form of a test, e.g. as Gist.

I have provided a test class FcRecurringExecutionPlanTest.java which contains the tests designed to reproduce the bug. The updated "Steps to Reproduce" section includes detailed guidelines to replicate the issue.

paullatzelsperger · 2024-12-10T13:07:04Z

I copied the code, and ran it, but unfortunately it does not illustrate the claimed defective behaviour at all - both were green. All you did was more or less copy another e2e test and duplicate it in a nested test suite.

Unless you can provide a test setup to clearly show the defective behaviour in a test with assertions, I'm afraid I'll have to close this issue. Here, what you will have to prove is that exiting the process doesn't actually kill all its threads.

In Java, all Threads that belong to a certain process (i.e. the java programm) are killed when the process exists. Shutting down the runtime is exactly that, terminating the process.

I would also recommend reading up on the use of the @Nested annotation.

farhin23 · 2024-12-10T13:30:30Z

I copied the code, and ran it, but unfortunately it does not illustrate the claimed defective behaviour at all - both were green.

Both of the tests will pass. The behavior is that even after the responsible BaseRuntime gets terminated, the task keeps running. And this can be tracked in the logs.

All you did was more or less copy another e2e test and duplicate it in a nested test suite.
I would also recommend reading up on the use of the @Nested annotation.

The issue occurs when multiple runtimes are booted. To demonstrate this behavior, I used that test example as a baseline.

Unless you can provide a test setup to clearly show the defective behaviour, I'm afraid I'll have to close this issue.

The issue is that tasks continue to run even after their runtimes have ended. While this does not cause the tests to fail, it can be observed in the logs. Although it doesn't impact the current tests, it leads to resource shortages in the GitHub pipeline, affecting other tests.

paullatzelsperger · 2024-12-10T14:17:21Z

yeah, but what you created isn't a test. A (JUnit) test consists of three parts: arrange (set up test data), act (make the system under test do something), assert (verify a certain outcome). You created another, very complicated way to start a runtime, that uses the JUnit framework in the wrong way.

In any event. I think what you actually mean is this: The executor service in the RecurringExecutionPlan is never getting shut down.

If that is the case, then this is more of a feature request, rather than a bug, because all threads die when the owning process dies, so it is not faulty behaviour. Terminating the java process (= runtime) will definitely stop all the threads (= executor service).

As such, I would propose to close this issue, and start with a discussion first.
One way to approach this would be introducing an ExecutionPlan#stop() method which would probably solve things, but I'm not 100% convinced that it is the right solution. Mostly, because there could be execution plans that aren't recurring, so "stopping" them wouldn't make any sense semantically.
Another, possibly better solution would be to make ExecutionPlan#run() return a handle of some sort, which can then be used to forward the shutdown signal. For non-recurring plans this would just be a no-op, or maybe return null. Its a subtle but important difference IMO.

farhin23 · 2024-12-10T18:53:09Z

In any event. I think what you actually mean is this: The executor service in the RecurringExecutionPlan is never getting shut down.

Yes, that's precisely what I intended to say in the issue.

If that is the case, then this is more of a feature request, rather than a bug,
As such, I would propose to close this issue, and start with a discussion first.

We can proceed as you’ve suggested. Should I open a discussion for the feature request?

One way to approach this would be introducing an ExecutionPlan#stop() method which would probably solve things, but I'm not 100% convinced that it is the right solution. Mostly, because there could be execution plans that aren't recurring, so "stopping" them wouldn't make any sense semantically.

I was thinking of the solution more in this direction. This way, execution plans using the executor services for recurring tasks can override the stop() method to adjust to their own shutdown process.

Another, possibly better solution would be to make ExecutionPlan#run() return a handle of some sort, which can then be used to forward the shutdown signal.

For ScheduledExecutorService it is possible to return a ScheduledFuture instance which later can be used to cancel the future tasks.

paullatzelsperger · 2024-12-11T06:38:14Z

Yes please open a discussion for a feature request

farhin23 · 2024-12-11T14:16:54Z

Yes please open a discussion for a feature request

I have created a discussion here: #273

farhin23 added bug_report Suspected bugs, awaiting triage triage all new issues awaiting classification labels Dec 6, 2024

farhin23 self-assigned this Dec 6, 2024

farhin23 assigned farhin23 and unassigned farhin23 Dec 6, 2024

ndr-brt removed the triage all new issues awaiting classification label Dec 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Shutdown ScheduledExecutorService at runtime termination #272

Shutdown ScheduledExecutorService at runtime termination #272

farhin23 commented Dec 6, 2024 •

edited

Loading

github-actions bot commented Dec 6, 2024

paullatzelsperger commented Dec 6, 2024

farhin23 commented Dec 6, 2024 •

edited

Loading

farhin23 commented Dec 6, 2024

paullatzelsperger commented Dec 10, 2024 •

edited

Loading

farhin23 commented Dec 10, 2024

paullatzelsperger commented Dec 10, 2024 •

edited

Loading

farhin23 commented Dec 10, 2024

paullatzelsperger commented Dec 11, 2024

farhin23 commented Dec 11, 2024

Shutdown ScheduledExecutorService at runtime termination #272

Shutdown ScheduledExecutorService at runtime termination #272

Comments

farhin23 commented Dec 6, 2024 • edited Loading

Bug Report

Describe the Bug

Expected Behavior

Observed Behavior

Steps to Reproduce

Logs Review

Context Information

Detailed Description

Possible Implementation

github-actions bot commented Dec 6, 2024

paullatzelsperger commented Dec 6, 2024

farhin23 commented Dec 6, 2024 • edited Loading

farhin23 commented Dec 6, 2024

paullatzelsperger commented Dec 10, 2024 • edited Loading

farhin23 commented Dec 10, 2024

paullatzelsperger commented Dec 10, 2024 • edited Loading

farhin23 commented Dec 10, 2024

paullatzelsperger commented Dec 11, 2024

farhin23 commented Dec 11, 2024

farhin23 commented Dec 6, 2024 •

edited

Loading

farhin23 commented Dec 6, 2024 •

edited

Loading

paullatzelsperger commented Dec 10, 2024 •

edited

Loading

paullatzelsperger commented Dec 10, 2024 •

edited

Loading