
Improve the reliability of rosbag2 tests #1796

Merged: 15 commits merged into rolling on Sep 12, 2024

Conversation

@clalancette (Contributor)

I initially started this series trying to track down a rare flaky test failure in test_record__rmw_cyclonedds_cpp. That particular flake seems to happen because discovery sometimes takes longer than we expect, so it is possible that the tests "miss" the first publication. If that happens, the rest of the test may fail because it is expecting a certain number of messages. Along the way, we clean up the tests a bit:

  1. Remove the unnecessary wait_until_shutdown method, which was almost exactly the same as wait_for_condition.
  2. Slightly revamp how the async spinner works, so it doesn't need to call rclcpp::shutdown.
  3. Do a similar revamp for the custom async spinner in rosbag2_tests.
  4. Wait for topics to be discovered right after calling recorder->record(). This ensures we can't get into the above situation.

With this series in place, the particular flake of test_record on rmw_cyclonedds_cpp is fixed (or, at least, I can no longer reproduce it). There are still other flakes that can happen under load, but I think fixes for those will have to come separately.
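For context, here is a minimal sketch of the spinner pattern that items 2 and 3 converge on. The names (start_async_spin, stop_spinning, exec_, spin_thread_) mirror the test fixtures discussed later in this thread, but the exact member layout here is illustrative, not the actual fixture code:

  #include <memory>
  #include <thread>

  #include <rclcpp/rclcpp.hpp>

  // Illustrative fixture fragment: spin a node on a background thread, and
  // stop it with Executor::cancel() instead of a full rclcpp::shutdown().
  class SpinnerFixture
  {
  public:
    void start_async_spin(rclcpp::Node::SharedPtr node)
    {
      exec_ = std::make_shared<rclcpp::executors::SingleThreadedExecutor>();
      exec_->add_node(node);
      spin_thread_ = std::thread([this]() {exec_->spin();});
    }

    void stop_spinning()
    {
      if (exec_) {
        exec_->cancel();  // wakes spin() without tearing down the rclcpp context
      }
      if (spin_thread_.joinable()) {
        spin_thread_.join();  // a joinable std::thread must be joined before destruction
      }
    }

  private:
    std::shared_ptr<rclcpp::executors::SingleThreadedExecutor> exec_;
    std::thread spin_thread_;
  };

(As a later comment in this thread shows, this naive version still has a cancel-before-spin race; see the is_spinning() sketch further down.)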

@clalancette requested a review from a team as a code owner on August 28, 2024 19:39
@clalancette requested reviews from emersonknapp and james-rms and removed the request for a team on August 28, 2024 19:39
@clalancette (Contributor, Author) commented Aug 28, 2024

Pulls: #1796
Gist: https://gist.githubusercontent.com/clalancette/77bebfb56ccf2c9d926ce6c8343b0773/raw/cb4f2fb2fb0fac274fbef12b77b13699d0b0e232/ros2.repos
BUILD args: --packages-above-and-dependencies rosbag2_test_common rosbag2_tests rosbag2_transport
TEST args: --packages-above rosbag2_test_common rosbag2_tests rosbag2_transport
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/14483

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@fujitatomoya (Contributor) left a comment:

overall lgtm, but one comment.

@fujitatomoya (Contributor) left a comment:

lgtm, @MichaelOrlov can you have a look too? (I do not have maintainer access on rosbag2)

@MichaelOrlov (Contributor)

I will take a look. Anyway, hold off on merging this for a while; we are trying to make a release on rolling (#1791).

@MichaelOrlov (Contributor) left a comment:

@clalancette Thanks for trying to make rosbag2 tests less flaky.
I have a concern about the "Wait for topics to be discovered" commit, and I would like to request that it be reverted along with 02d5bba.

The reason for the removal request is that it is really redundant (IMO) and not robust.

  • Redundant, because in all places we already have either ASSERT_TRUE(pub_manager.wait_for_matched(expecting_topic_name)); or ASSERT_TRUE(client_manager_1->wait_for_service_to_be_ready());, which makes sure that the recorder has already subscribed to the topics on which we are going to publish.
  • Not robust, because the newly added wait condition does not check the names of the topics the recorder subscribed to; it only checks the number of subscriptions. It also assumes that the /rosout and /events/write_split topics will always be present and counted, which is not necessarily true for all CI configurations and underlying middleware layers.

If the aforementioned ASSERT_TRUE(pub_manager.wait_for_matched(expecting_topic_name)); and ASSERT_TRUE(client_manager_1->wait_for_service_to_be_ready()); checks are not enough, we have a better way to wait for recorder discovery: in some unit tests we use a MockRecorder with extended utility methods to facilitate waiting.
For instance:

  ASSERT_TRUE(recorder->wait_for_topic_to_be_discovered(string_topic));
  ASSERT_FALSE(recorder->topic_available_for_recording(string_topic));

Please see the implementation details:

  template<typename DurationRepT = int64_t, typename DurationT = std::milli>
  bool wait_for_topic_to_be_discovered(
    const std::string & topic_name_to_wait_for,
    std::chrono::duration<DurationRepT, DurationT> timeout = std::chrono::seconds(10))
  {
    bool discovered = false;
    using clock = std::chrono::steady_clock;
    auto start = clock::now();
    do {
      auto topic_names_and_types = this->get_topic_names_and_types();
      for (const auto & [topic_name, topic_types] : topic_names_and_types) {
        if (topic_name_to_wait_for == topic_name) {
          discovered = true;
          break;
        }
      }
      std::this_thread::sleep_for(std::chrono::milliseconds(20));
    } while (!discovered && (clock::now() - start) < timeout);
    return discovered;
  }

  bool topic_available_for_recording(const std::string & topic_name)
  {
    bool available_for_recording = false;
    auto topics_to_subscribe = this->get_requested_or_available_topics();
    for (const auto & topic_and_type : topics_to_subscribe) {
      if (topic_and_type.first == topic_name) {
        available_for_recording = true;
        break;
      }
    }
    return available_for_recording;
  }

Remove wait_until_shutdown.

This has almost exactly the same functionality as wait_for_condition,
except for two things:

1.  It is templated on the Timeout type.
2.  It calls rclcpp::shutdown after the loop completes.

However, neither of those is necessary; all callers to it use
a std::chrono::duration, and all of the test fixtures already
call rclcpp::shutdown.  Thus, just remove it and make all
callers use wait_for_condition instead.

Signed-off-by: Chris Lalancette <[email protected]>
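For reference, wait_for_condition is essentially a timed poll loop. Below is a minimal sketch consistent with the commit message above; the real helper lives in rosbag2_test_common and may differ in its exact signature:

  #include <chrono>
  #include <thread>

  // Hypothetical stand-in for rosbag2_test_common's wait_for_condition:
  // poll `condition` until it returns true or `timeout` elapses.
  template<typename ConditionT>
  bool wait_for_condition(
    const ConditionT & condition,
    std::chrono::milliseconds timeout = std::chrono::seconds(10))
  {
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + timeout;
    while (!condition()) {
      if (clock::now() >= deadline) {
        return false;
      }
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    return true;
  }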
Shutdown the async spinner node without rclcpp::shutdown.

That is, we really don't actually want to do a full
rclcpp shutdown here; we only want to stop spinning.
Accomplish that with an executor, and timing out
every 100 milliseconds to check if we are done yet.

Signed-off-by: Chris Lalancette <[email protected]>
Small fixes to start_async_spin in rosbag2_tests.

Make sure it only spins as long as we haven't shutdown,
and that it wakes up every so often to check that fact.

Signed-off-by: Chris Lalancette <[email protected]>
Wait for topics to be discovered during recorder->record().

The main reason for that is that these tests generally want
to test certain expectations around how many messages were
received.  However, if discovery takes longer than we expect,
then it could be the case that we "missed" messages at the
beginning because discovery hadn't yet completed.

Fix this by just waiting around for the recorder to get all
the subscriptions it expects before moving on with the test.

Signed-off-by: Chris Lalancette <[email protected]>
Feedback from review.

Signed-off-by: Chris Lalancette <[email protected]>
@clalancette (Contributor, Author)

  • Redundant, because in all places we already have either ASSERT_TRUE(pub_manager.wait_for_matched(expecting_topic_name)); or ASSERT_TRUE(client_manager_1->wait_for_service_to_be_ready());, which makes sure that the recorder has already subscribed to the topics on which we are going to publish.

The problem is that this is not always true. There are at least two situations this does not cover: the event topic associated with services (/service_name/_events), and the event topic associated with the recorder's splitting functionality (/events/write_split).

pub_manager.wait_for_matched(expecting_topic_name) will only wait for the named topic, which is a topic that the pub_manager is managing. That does not apply to the /service_name/_events topic that services use for events.

Additionally, client_manager->wait_for_service_to_be_ready() only waits for the service topics to be ready, not the event topic. So again, it does not apply to the /service_name/_events topic that services use for events.

That's why adding the additional waits for recorder subscriptions makes the tests more reliable: it handles both of these situations (see the sketch below).
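With the MockRecorder helpers quoted above, such a wait could be written by name rather than by subscription count; the topic names below, including the service event topic name, are purely illustrative:

  // Hypothetical test fragment: wait by name for everything the recorder must
  // discover, including the event topics that wait_for_matched() and
  // wait_for_service_to_be_ready() do not cover.
  ASSERT_TRUE(recorder->wait_for_topic_to_be_discovered("/test_topic"));
  ASSERT_TRUE(recorder->wait_for_topic_to_be_discovered("/test_service/_events"));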

If the aforementioned ASSERT_TRUE(pub_manager.wait_for_matched(expecting_topic_name)); and ASSERT_TRUE(client_manager_1->wait_for_service_to_be_ready()); checks are not enough, we have a better way to wait for recorder discovery: in some unit tests we use a MockRecorder with extended utility methods to facilitate waiting.

But fair enough, I'll take a look at switching to this.

Switch to using MockRecorder.

Signed-off-by: Chris Lalancette <[email protected]>
@clalancette force-pushed the clalancette/rosbag2-fix-tests branch from 02d5bba to 11bb864 on September 4, 2024 16:15
@clalancette (Contributor, Author)

All right, see 11bb864, which uses the MockRecorder methods instead.

Signed-off-by: Chris Lalancette <[email protected]>
@clalancette (Contributor, Author)

Pulls: #1796
Gist: https://gist.githubusercontent.com/clalancette/08cc831129805c683801a5ebf1f8b33a/raw/cb4f2fb2fb0fac274fbef12b77b13699d0b0e232/ros2.repos
BUILD args: --packages-above-and-dependencies rosbag2_test_common rosbag2_tests rosbag2_transport
TEST args: --packages-above rosbag2_test_common rosbag2_tests rosbag2_transport
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/14511

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

Signed-off-by: Chris Lalancette <[email protected]>
@MichaelOrlov (Contributor) left a comment:

IMO, waiting for "/events/write_split" is redundant, since it is internal to the rosbag2 recorder and does not participate in the test scenarios where the wait was added.

Also found a few nitpicks.

Apply suggestions from code review

Co-authored-by: Michael Orlov <[email protected]>
Signed-off-by: Chris Lalancette <[email protected]>
@MichaelOrlov (Contributor) left a comment:

LGTM now.
Thanks!

@clalancette (Contributor, Author) commented Sep 4, 2024

Pulls: #1796
Gist: https://gist.githubusercontent.com/clalancette/475d8bf4bfedcf19c2136589ee4451f9/raw/cb4f2fb2fb0fac274fbef12b77b13699d0b0e232/ros2.repos
BUILD args: --packages-above-and-dependencies rosbag2_test_common rosbag2_tests rosbag2_transport
TEST args: --packages-above rosbag2_test_common rosbag2_tests rosbag2_transport
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/14514

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@MichaelOrlov (Contributor)

I'm curious what a "surprise" failure with rmw_connextdds on CI is.

@clalancette (Contributor, Author)

I'm curious what a "surprise" failure with rmw_connextdds on CI is.

Yeah, I don't know what is going on with that. The only change to that test is to wait for the clock topic to show up. I'll need to look deeper into it, though, since that seems to be a new failure with this PR.

Switch to using spin, rather than spin_some.

That's because there is currently at least one bug
associated with spin_some in rclcpp.  However, it turns
out that we don't even need to use it, as we can just as
easily use spin() along with exec.cancel().

Signed-off-by: Chris Lalancette <[email protected]>
@clalancette (Contributor, Author)

Pulls: #1796
Gist: https://gist.githubusercontent.com/clalancette/0e69722137e215ba0dee2c7dc96e1a6c/raw/cb4f2fb2fb0fac274fbef12b77b13699d0b0e232/ros2.repos
BUILD args: --packages-above-and-dependencies rosbag2_test_common rosbag2_tests rosbag2_transport
TEST args: --packages-above rosbag2_test_common rosbag2_tests rosbag2_transport
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/14518

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@clalancette (Contributor, Author)

The only failures here are on Windows, and that has nothing to do with this PR (those will be fixed by ros2/common_interfaces#256). So I think this is ready for final review now.

@MichaelOrlov (Contributor) left a comment:

@clalancette I'm curious what was behind the decision to move the explicit call to stop_spinning() into the TearDown() method?

I would prefer an explicit call to stop_spinning() in the tests where it is needed, paired with start_spin(node) or start_async_spin(node) and

 auto cleanup_process_handle = rcpputils::make_scope_exit([&]() {stop_spinning();});

to keep the test logic clean and concise.

In some tests we have checks after the explicit stop_spinning() call that rely on the assumption that some handles (node and files) will be free and that nothing else will arrive during those extra checks or cleanups.

IMO, adding stop_spinning() to the TearDown() method is a last resort and a somewhat hacky way to handle it, and it could cause more test flakiness in the future.
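Expanding Michael's suggested shape into a fuller sketch (the test name and body are illustrative; start_async_spin and stop_spinning are the fixture helpers discussed above, assumed safe to call twice):

  #include <gtest/gtest.h>

  #include <memory>

  #include <rclcpp/rclcpp.hpp>
  #include <rcpputils/scope_exit.hpp>

  TEST_F(RecordIntegrationTestFixture, some_recording_scenario)
  {
    auto node = std::make_shared<rclcpp::Node>("test_node");
    start_async_spin(node);
    // Safeguard: join the spin thread on every exit path, including early
    // returns from failed ASSERT_* macros.
    auto cleanup_process_handle = rcpputils::make_scope_exit(
      [&]() {stop_spinning();});

    // ... test body: publish, record, assert on messages ...

    // Explicit stop, so the checks below run against a quiescent node and
    // closed files; the scope_exit above then has nothing left to do.
    stop_spinning();
    // ... post-conditions that rely on nothing else arriving ...
  }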

@clalancette (Contributor, Author)

@clalancette I'm curious what was behind the decision to move the explicit call to stop_spinning() into the TearDown() method?

So the first thing to realize is that because we are putting the spin call in a thread, we have to call stop_spinning() (which calls thread.join()); otherwise the std::thread destructor calls std::terminate.

With that in mind, I could have used rcpputils::make_scope_exit to stop the spinning. However, by default that is actually no different from calling it in TearDown, since it wouldn't run until the end of the function anyway.

I have a prototype locally where I did exactly what you said and added new scopes with rcpputils::make_scope_exit, and that works fine. But it ends up being a far larger change, because of the additional scopes and the reindentation of a lot of code. Let me know if you still want me to do that, and I can make the change.

Use scopes to shutdown spinning.

Signed-off-by: Chris Lalancette <[email protected]>
@clalancette (Contributor, Author)

As something of a compromise, 404c78a adds in rcpputils::make_scope_exit calls without changing the scopes. This isn't too different from putting it in TearDown (which I've left as a backup), but it would allow us to change scopes later. Let me know what you think.

@MichaelOrlov (Contributor) left a comment:

@clalancette Thanks for adding the safeguard catcher auto cleanup_process_handle = rcpputils::make_scope_exit([&]() {stop_spinning();});.
However, my main concern in the previous comment was that the explicit call to stop_spinning() was deleted in some tests. Could you please add them back?
I added comments in the places where it was deleted.

@MichaelOrlov (Contributor) left a comment:

LGTM now.

@clalancette (Contributor, Author) commented Sep 10, 2024

Pulls: #1796
Gist: https://gist.githubusercontent.com/clalancette/1986838f6b0c6a14c9b1af14a8340640/raw/cb4f2fb2fb0fac274fbef12b77b13699d0b0e232/ros2.repos
BUILD args: --packages-above-and-dependencies rosbag2_test_common rosbag2_tests rosbag2_transport
TEST args: --packages-above rosbag2_test_common rosbag2_tests rosbag2_transport
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/14549

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@clalancette (Contributor, Author)

@ros-pull-request-builder retest this please

@MichaelOrlov (Contributor)

@clalancette I was able to reproduce the freeze locally on Ubuntu by increasing the discovery wait interval to 500 ms

rosbag2_transport::RecordOptions record_options =
  {true, false, false, {}, {}, {}, {}, {}, {}, "rmw_format", 500ms};

and repeatedly running only the RecordIntegrationTestFixture test_keyboard_controls test.

I see in the debugger that the freeze occurs in our modified stop_spinning(), right on spin_thread_.join(), exactly as you suspected.

It seems the problem is in the underlying SingleThreadedExecutor spin() and cancel() routines. It looks like cancel() doesn't really work as expected, and we never exit from spin().

@MichaelOrlov (Contributor) left a comment:

@clalancette Uff, it was a nasty bug, but I've nailed it down; I spent a day hunting for it.
I was even starting to think that we had memory corruption or a race condition in the executor. But no, it's simpler than that.
It turns out that we spawn a new spin_thread_ for exec_->spin(). However, in some rare cases a context switch happens: we run through the whole test scenario and even call exec_->cancel() from stop_spinning(), but control has not yet returned to the spin_thread_ and exec_->spin().
The root cause of the freeze was that exec_->spin() started executing after we had already called exec_->cancel().

I added a comment with a proposal to wait until the executor starts spinning before we return from start_async_spin(node).
I was able to robustly reproduce this issue and verify the fix on my local setup.
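A sketch of the proposed fix, revisiting the start_async_spin fragment from earlier in this thread; it assumes rclcpp::Executor::is_spinning() is available in the rclcpp version under test:

  void start_async_spin(rclcpp::Node::SharedPtr node)
  {
    exec_ = std::make_shared<rclcpp::executors::SingleThreadedExecutor>();
    exec_->add_node(node);
    spin_thread_ = std::thread([this]() {exec_->spin();});
    // Don't return until the executor has actually entered spin(). Otherwise a
    // fast test can call exec_->cancel() before spin() starts; that cancel is
    // lost, spin() then blocks forever, and stop_spinning() hangs on join().
    while (!exec_->is_spinning()) {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  }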

clalancette and others added 2 commits September 12, 2024 12:36
Update rosbag2_transport/test/rosbag2_transport/record_integration_fixture.hpp

Co-authored-by: Michael Orlov <[email protected]>
Signed-off-by: Chris Lalancette <[email protected]>
@clalancette (Contributor, Author) commented Sep 12, 2024

Pulls: #1796
Gist: https://gist.githubusercontent.com/clalancette/5271579611710ffed73686f516a6ad9e/raw/cb4f2fb2fb0fac274fbef12b77b13699d0b0e232/ros2.repos
BUILD args: --packages-above-and-dependencies rosbag2_test_common rosbag2_tests rosbag2_transport
TEST args: --packages-above rosbag2_test_common rosbag2_tests rosbag2_transport
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/14555

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@clalancette (Contributor, Author)

Yep, this seems much happier now. Thanks again @MichaelOrlov; we just need approval again and then we can merge this in.

@MichaelOrlov (Contributor) left a comment:

LGTM

@MichaelOrlov merged commit 2d4d02f into rolling on Sep 12, 2024
12 checks passed
@MichaelOrlov (Contributor)

https://github.com/Mergifyio backport jazzy

mergify bot commented Sep 12, 2024

backport jazzy

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Sep 12, 2024
* Remove wait_until_shutdown.

This has almost exactly the same functionality as wait_for_condition,
except for two things:

1.  It is templated on the Timeout type.
2.  It calls rclcpp::shutdown after the loop completes.

However, neither of those is necessary; all callers to it use
a std::chrono::duration, and all of the test fixtures already
call rclcpp::shutdown.  Thus, just remove it and make all
callers use wait_for_condition instead.

Signed-off-by: Chris Lalancette <[email protected]>

* Shutdown the async spinner node without rclcpp::shutdown.

That is, we really don't actually want to do a full
rclcpp shutdown here; we only want to stop spinning.
Accomplish that with an executor, and timing out
every 100 milliseconds to check if we are done yet.

Signed-off-by: Chris Lalancette <[email protected]>

* Small fixes to start_async_spin in rosbag2_tests.

Make sure it only spins as long as we haven't shutdown,
and that it wakes up every so often to check that fact.

Signed-off-by: Chris Lalancette <[email protected]>

* Wait for topics to be discovered during recorder->record().

The main reason for that is that these tests generally want
to test certain expectations around how many messages were
received.  However, if discovery takes longer than we expect,
then it could be the case that we "missed" messages at the
beginning because discovery hadn't yet completed.

Fix this by just waiting around for the recorder to get all
the subscriptions it expects before moving on with the test.

Signed-off-by: Chris Lalancette <[email protected]>

* Feedback from review.

Signed-off-by: Chris Lalancette <[email protected]>

* Switch to using MockRecorder.

Signed-off-by: Chris Lalancette <[email protected]>

* Fixes from review.

Signed-off-by: Chris Lalancette <[email protected]>

* Feedback from review.

Signed-off-by: Chris Lalancette <[email protected]>

* Apply suggestions from code review

Co-authored-by: Michael Orlov <[email protected]>
Signed-off-by: Chris Lalancette <[email protected]>

* Switch to using spin, rather than spin_some.

That's because there is currently at least one bug
associated with spin_some in rclcpp.  However, it turns
out that we don't even need to use it, as we can just as
easily use spin() along with exec.cancel().

Signed-off-by: Chris Lalancette <[email protected]>

* Make sure to stop_spinning when we tear down the test.

Signed-off-by: Chris Lalancette <[email protected]>

* Use scopes to shutdown spinning.

Signed-off-by: Chris Lalancette <[email protected]>

* Nested contexts just to explicitly cleanup the async spinners.

Signed-off-by: Chris Lalancette <[email protected]>

* Update rosbag2_transport/test/rosbag2_transport/record_integration_fixture.hpp

Co-authored-by: Michael Orlov <[email protected]>
Signed-off-by: Chris Lalancette <[email protected]>

* Apply the same fix to rosbag2_tests.

Signed-off-by: Chris Lalancette <[email protected]>

---------

Signed-off-by: Chris Lalancette <[email protected]>
Co-authored-by: Michael Orlov <[email protected]>
(cherry picked from commit 2d4d02f)

# Conflicts:
#	rosbag2_transport/test/rosbag2_transport/test_record.cpp
#	rosbag2_transport/test/rosbag2_transport/test_record_all.cpp
#	rosbag2_transport/test/rosbag2_transport/test_record_all_ignore_leaf_topics.cpp
#	rosbag2_transport/test/rosbag2_transport/test_record_all_no_discovery.cpp
#	rosbag2_transport/test/rosbag2_transport/test_record_all_use_sim_time.cpp
#	rosbag2_transport/test/rosbag2_transport/test_record_regex.cpp
@MichaelOrlov deleted the clalancette/rosbag2-fix-tests branch on September 12, 2024 20:36
MichaelOrlov added a commit that referenced this pull request Sep 18, 2024
MichaelOrlov added a commit that referenced this pull request Sep 18, 2024