-
Notifications
You must be signed in to change notification settings - Fork 431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multi-threaded Executor starvation fix #2702
base: rolling
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
@@ -102,3 +102,66 @@ TEST_F(TestMultiThreadedExecutor, timer_over_take) { | |||||||||
executor.add_node(node); | ||||||||||
executor.spin(); | ||||||||||
} | ||||||||||
|
||||||||||
/* | ||||||||||
Test that no tasks are starved | ||||||||||
*/ | ||||||||||
TEST_F(TestMultiThreadedExecutor, starvation) { | ||||||||||
rclcpp::executors::MultiThreadedExecutor executor(rclcpp::ExecutorOptions(), | ||||||||||
2u); | ||||||||||
|
||||||||||
std::shared_ptr<rclcpp::Node> node = | ||||||||||
std::make_shared<rclcpp::Node>("test_multi_threaded_executor_starvation"); | ||||||||||
|
||||||||||
std::atomic_int timer_one_count{0}; | ||||||||||
std::atomic_int timer_two_count{0}; | ||||||||||
|
||||||||||
rclcpp::TimerBase::SharedPtr timer_one; | ||||||||||
rclcpp::TimerBase::SharedPtr timer_two; | ||||||||||
|
||||||||||
auto timer_one_callback = [&timer_one_count, &timer_two_count]() { | ||||||||||
std::cout << "Timer one callback executed. Count: " | ||||||||||
<< timer_one_count.load() << std::endl; | ||||||||||
|
||||||||||
auto start_time = std::chrono::steady_clock::now(); | ||||||||||
while (std::chrono::steady_clock::now() - start_time < 100ms) { | ||||||||||
} | ||||||||||
Comment on lines
+126
to
+128
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||
|
||||||||||
timer_one_count++; | ||||||||||
|
||||||||||
auto diff = std::abs(timer_one_count - timer_two_count); | ||||||||||
|
||||||||||
std::cout << "Difference in counts: " << diff << std::endl; | ||||||||||
|
||||||||||
if (diff > 1) { | ||||||||||
rclcpp::shutdown(); | ||||||||||
ASSERT_LE(diff, 1); | ||||||||||
} | ||||||||||
}; | ||||||||||
|
||||||||||
auto timer_two_callback = [&timer_one_count, &timer_two_count]() { | ||||||||||
std::cout << "Timer two callback executed. Count: " | ||||||||||
<< timer_two_count.load() << std::endl; | ||||||||||
|
||||||||||
auto start_time = std::chrono::steady_clock::now(); | ||||||||||
while (std::chrono::steady_clock::now() - start_time < 100ms) { | ||||||||||
} | ||||||||||
Comment on lines
+146
to
+148
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||
|
||||||||||
timer_two_count++; | ||||||||||
|
||||||||||
auto diff = std::abs(timer_one_count - timer_two_count); | ||||||||||
|
||||||||||
std::cout << "Difference in counts: " << diff << std::endl; | ||||||||||
|
||||||||||
if (diff > 1) { | ||||||||||
rclcpp::shutdown(); | ||||||||||
ASSERT_LE(diff, 1); | ||||||||||
} | ||||||||||
}; | ||||||||||
|
||||||||||
timer_one = node->create_wall_timer(0ms, timer_one_callback); | ||||||||||
timer_two = node->create_wall_timer(0ms, timer_two_callback); | ||||||||||
|
||||||||||
executor.add_node(node); | ||||||||||
executor.spin(); | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @HarunTeper it would be nice to convert this into a "real" unit-test. Right now this code will run forever until an assertion fails, so it's not acceptable. |
||||||||||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this should be assertion for the test? if we create the custom callback group with
rclcpp::CallbackGroupType::Reentrant
and add it tocreate_wall_timer
, this test should pass because there are 2 threads that executor can assign the executables concurrently.in other word, this is expected behavior that user specifies with
MutuallyExclusive
(default) as we discussed on #2645?SingleThreadedExecutor
has the exact same situation like this. if you are trying to fix this timer starvation in theExecutor
, i do not think that is the problem only forMultiThreadedExecutor
, if the timer callback overruns.IMO, i am not sure if this assumption is correct by system. sounds like user requirement, user would want to have a high priority timer which they want it to be executed as fast as possible. in that case, this fix becomes the problem for that requirement. (in that case, what's missing here is priority order user interface?)
CC: @jmachowinski @mjcarroll @alsora
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the unit-test you are creating two timers with a period of
0ms
.In this scenario there's no guarantee that both timers will be called, because they are both always ready all the time. Note that the problem is not that they have the same period, but rather that the period is
0ms
.Since they are both always ready, only the "first one" will be invoked.
Can you please check what happens if the period is set to
10ms
for both?