Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-threaded Executor starvation fix #2702

Open
wants to merge 1 commit into
base: rolling
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
63 changes: 63 additions & 0 deletions rclcpp/test/rclcpp/executors/test_multi_threaded_executor.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -102,3 +102,66 @@ TEST_F(TestMultiThreadedExecutor, timer_over_take) {
executor.add_node(node);
executor.spin();
}

/*
Test that no tasks are starved
*/
TEST_F(TestMultiThreadedExecutor, starvation) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this should be assertion for the test? if we create the custom callback group with rclcpp::CallbackGroupType::Reentrant and add it to create_wall_timer, this test should pass because there are 2 threads that executor can assign the executables concurrently.
in other word, this is expected behavior that user specifies with MutuallyExclusive (default) as we discussed on #2645?

SingleThreadedExecutor has the exact same situation like this. if you are trying to fix this timer starvation in the Executor, i do not think that is the problem only for MultiThreadedExecutor, if the timer callback overruns.

The executor should be alternating between these two tasks, never executing one task twice before the other.

IMO, i am not sure if this assumption is correct by system. sounds like user requirement, user would want to have a high priority timer which they want it to be executed as fast as possible. in that case, this fix becomes the problem for that requirement. (in that case, what's missing here is priority order user interface?)

CC: @jmachowinski @mjcarroll @alsora

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the unit-test you are creating two timers with a period of 0ms.
In this scenario there's no guarantee that both timers will be called, because they are both always ready all the time. Note that the problem is not that they have the same period, but rather that the period is 0ms.
Since they are both always ready, only the "first one" will be invoked.

Can you please check what happens if the period is set to 10ms for both?

rclcpp::executors::MultiThreadedExecutor executor(rclcpp::ExecutorOptions(),
2u);

std::shared_ptr<rclcpp::Node> node =
std::make_shared<rclcpp::Node>("test_multi_threaded_executor_starvation");

std::atomic_int timer_one_count{0};
std::atomic_int timer_two_count{0};

rclcpp::TimerBase::SharedPtr timer_one;
rclcpp::TimerBase::SharedPtr timer_two;

auto timer_one_callback = [&timer_one_count, &timer_two_count]() {
std::cout << "Timer one callback executed. Count: "
<< timer_one_count.load() << std::endl;

auto start_time = std::chrono::steady_clock::now();
while (std::chrono::steady_clock::now() - start_time < 100ms) {
}
Comment on lines +126 to +128
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto start_time = std::chrono::steady_clock::now();
while (std::chrono::steady_clock::now() - start_time < 100ms) {
}
std::this_thread::wait_for(100ms);


timer_one_count++;

auto diff = std::abs(timer_one_count - timer_two_count);

std::cout << "Difference in counts: " << diff << std::endl;

if (diff > 1) {
rclcpp::shutdown();
ASSERT_LE(diff, 1);
}
};

auto timer_two_callback = [&timer_one_count, &timer_two_count]() {
std::cout << "Timer two callback executed. Count: "
<< timer_two_count.load() << std::endl;

auto start_time = std::chrono::steady_clock::now();
while (std::chrono::steady_clock::now() - start_time < 100ms) {
}
Comment on lines +146 to +148
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto start_time = std::chrono::steady_clock::now();
while (std::chrono::steady_clock::now() - start_time < 100ms) {
}
std::this_thread::wait_for(100ms);


timer_two_count++;

auto diff = std::abs(timer_one_count - timer_two_count);

std::cout << "Difference in counts: " << diff << std::endl;

if (diff > 1) {
rclcpp::shutdown();
ASSERT_LE(diff, 1);
}
};

timer_one = node->create_wall_timer(0ms, timer_one_callback);
timer_two = node->create_wall_timer(0ms, timer_two_callback);

executor.add_node(node);
executor.spin();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HarunTeper it would be nice to convert this into a "real" unit-test.

Right now this code will run forever until an assertion fails, so it's not acceptable.
Moreover, we should remove the debug prints.
You should use GTEST assertions and expectations to provide the necessary information (note that you can add log to those; the logs will be printed only when the test fails).

}