Multi-threaded Executor starvation fix #2702
base: rolling
Signed-off-by: HarunTeper <[email protected]>
/*
   Test that no tasks are starved
 */
TEST_F(TestMultiThreadedExecutor, starvation) {
Why should this be an assertion in the test? If we create a custom callback group with rclcpp::CallbackGroupType::Reentrant and pass it to create_wall_timer, this test should pass, because the executor has 2 threads and can run the executables concurrently. In other words, this is the expected behavior that the user specifies with MutuallyExclusive (the default), as we discussed in #2645?
SingleThreadedExecutor has exactly the same situation. If you are trying to fix this timer starvation in the Executor, I do not think the problem is specific to MultiThreadedExecutor when the timer callback overruns.
"The executor should be alternating between these two tasks, never executing one task twice before the other."
IMO, I am not sure this assumption holds for the system in general. It sounds like a user requirement: a user might want a high-priority timer that is executed as fast as possible, and in that case this fix becomes a problem for that requirement. (If so, is what's missing here a user interface for priority order?)
In the unit-test you are creating two timers with a period of 0ms.
In this scenario there's no guarantee that both timers will be called, because they are both always ready all the time. Note that the problem is not that they have the same period, but rather that the period is 0ms.
Since they are both always ready, only the "first one" will be invoked.
Can you please check what happens if the period is set to 10ms for both?
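To make this point concrete, here is a minimal sketch in plain C++; it is not rclcpp code. It models a hypothetical scheduler that always picks the first ready timer in a fixed order and runs callbacks one at a time, as in a mutually exclusive callback group. The names simulate_first_ready and CallCounts, the virtual clock, and the re-arming rule (the due time advances by one period per call) are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model, not the rclcpp executor: number of calls per timer.
struct CallCounts {
  std::size_t first = 0;
  std::size_t second = 0;
};

// Simulates two timers with the same period on a virtual clock (milliseconds).
// Each callback takes work_ms; only one callback runs at a time.
CallCounts simulate_first_ready(
  std::int64_t period_ms, std::int64_t work_ms, std::int64_t horizon_ms)
{
  CallCounts counts;
  if (work_ms <= 0) {
    return counts;  // the virtual clock would never advance
  }
  std::array<std::int64_t, 2> next_due = {period_ms, period_ms};
  std::int64_t now = 0;
  while (now < horizon_ms) {
    // Pick the first ready timer in a fixed order (the assumed policy).
    int chosen = -1;
    for (int i = 0; i < 2; ++i) {
      if (next_due[i] <= now) {
        chosen = i;
        break;
      }
    }
    if (chosen < 0) {
      // Nothing is ready: jump to the earliest due time.
      now = std::min(next_due[0], next_due[1]);
      continue;
    }
    now += work_ms;                 // run the callback
    next_due[chosen] += period_ms;  // re-arm; with 0ms it stays ready
    if (chosen == 0) {
      ++counts.first;
    } else {
      ++counts.second;
    }
  }
  return counts;
}
```

Under these assumptions, simulate_first_ready(0, 100, 1000) never calls the second timer, while simulate_first_ready(10, 1, 100) calls both timers equally often.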
auto start_time = std::chrono::steady_clock::now();
while (std::chrono::steady_clock::now() - start_time < 100ms) {
}
- auto start_time = std::chrono::steady_clock::now();
- while (std::chrono::steady_clock::now() - start_time < 100ms) {
- }
+ std::this_thread::sleep_for(100ms);
auto start_time = std::chrono::steady_clock::now();
while (std::chrono::steady_clock::now() - start_time < 100ms) {
}
- auto start_time = std::chrono::steady_clock::now();
- while (std::chrono::steady_clock::now() - start_time < 100ms) {
- }
+ std::this_thread::sleep_for(100ms);
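For comparison, here is a minimal sketch of the two ways of occupying a callback for a fixed time: the busy-wait from the test and a blocking sleep. It assumes the standard-library call std::this_thread::sleep_for (the standard library has no std::this_thread::wait_for). The helper names busy_wait and blocking_sleep are hypothetical and not part of the PR.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Spins on the clock until `duration` has elapsed; keeps the CPU busy.
// Returns the time actually spent.
Clock::duration busy_wait(std::chrono::milliseconds duration)
{
  const auto start = Clock::now();
  while (Clock::now() - start < duration) {
    // intentionally empty: burn CPU time
  }
  return Clock::now() - start;
}

// Blocks the calling thread for at least `duration` without spinning.
// Returns the time actually spent.
Clock::duration blocking_sleep(std::chrono::milliseconds duration)
{
  const auto start = Clock::now();
  std::this_thread::sleep_for(duration);
  return Clock::now() - start;
}
```

Both variants keep the calling thread inside the callback for at least the requested time, so either one makes the timer callback overrun; the sleep simply does so without consuming CPU.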
The proposed changes, as well as the paper, don't make sense to me. But this might be a wording vs. implementation issue. Let's wait for the actual implementation...
Hi @HarunTeper, the next Client Library WG meeting will happen Friday 12/20/2024 at 8AM PST.
timer_two = node->create_wall_timer(0ms, timer_two_callback);

executor.add_node(node);
executor.spin();
@HarunTeper it would be nice to convert this into a "real" unit-test.
Right now this code will run forever until an assertion fails, so it's not acceptable.
Moreover, we should remove the debug prints.
You should use GTest assertions and expectations to provide the necessary information (note that you can attach log messages to those; they will be printed only when the test fails).
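One way to keep such a test bounded is to record the order in which the timer callbacks ran during a fixed spin duration, and then check the recorded order once. The sketch below shows only that check, in plain C++17; the function name first_repeat and the representation of the order as a vector of timer ids are assumptions, not code from the PR.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Returns the index of the first entry that repeats its predecessor,
// i.e. the first point where one timer ran twice in a row.
// Returns std::nullopt if the recorded order never repeats a timer.
std::optional<std::size_t> first_repeat(const std::vector<int> & order)
{
  for (std::size_t i = 1; i < order.size(); ++i) {
    if (order[i] == order[i - 1]) {
      return i;
    }
  }
  return std::nullopt;
}
```

In a GTest, the result could feed a single expectation after spinning has stopped, with the offending index streamed into the failure message so that it only shows up when the test fails.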
Pull request addressing the issues in #2360 and #2645.
So far, I have added a test that detects starvation in the multi-threaded executor.
This test includes a mutually-exclusive callback group with two timers.
The executor should be alternating between these two tasks, never executing one task twice before the other.
To fix starvation, I have identified the following steps (which I was not able to completely implement yet):
These steps are based on the work I published here:
https://ieeexplore.ieee.org/document/9622336
https://daes.cs.tu-dortmund.de/storages/daes-cs/r/publications/teper2024emsoft_preprint.pdf
I have already tried to implement some of these steps, and I will also commit some of the changes to this fork this week. However, for step 4, I may require some help. I also noticed that my changes break some of the tests that are currently part of rclcpp, as I move the functions that set callback group flags and trigger guard conditions.