Improve the reliability of rosbag2 tests #1796

clalancette · 2024-08-28T19:39:06Z

I initially started this series trying to track down a rare failing flakey test with test_record__rmw_cyclonedds_cpp. That particular flake seems to be able to happen because sometimes discovery takes longer than we expect, and it is possible that the tests "miss" the first publication. If that's the case, then the rest of the test may fail because it is expecting a certai number of messages. Along the way, we cleanup the tests a bit:

Remove the unnecessary wait_until_shutdown method, which was almost exactly the same as wait_for_condition.
Slightly revamp how the async spinner works, so it doesn't need to call rclcpp::shutdown.
Do a similar revamp for the custom async spinner in rosbag2_tests.
Wait for topics to be discovered right after calling recorder->record(). This ensures we can't get into the above situation.

With this series in place, the particular flake of test_record on rmw_cyclonedds_cpp is fixed (or, at least, I can no longer reproduce it). There are still other flakes that can happen under load, but I think fixes for those will have to come separately.

clalancette · 2024-08-28T19:40:13Z

Pulls: #1796
Gist: https://gist.githubusercontent.com/clalancette/77bebfb56ccf2c9d926ce6c8343b0773/raw/cb4f2fb2fb0fac274fbef12b77b13699d0b0e232/ros2.repos
BUILD args: --packages-above-and-dependencies rosbag2_test_common rosbag2_tests rosbag2_transport
TEST args: --packages-above rosbag2_test_common rosbag2_tests rosbag2_transport
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/14483

Linux
Linux-aarch64
Linux-rhel
Windows

fujitatomoya

overall lgtm, but one comment.

rosbag2_transport/test/rosbag2_transport/test_record.cpp

rosbag2_transport/test/rosbag2_transport/test_record_all.cpp

fujitatomoya

lgtm, @MichaelOrlov can you have a look too? (i do not have maintainer access on rosbag2)

MichaelOrlov · 2024-09-03T19:43:14Z

I will take a look. Anyway, hold off from the merging this for a while. We are trying to make a relase on rolling #1791

MichaelOrlov

@clalancette Thanks for trying to make rosbag2 tests less flaky.
I have a concern about adding Wait for topics to be discovered commit and would like to request to revert it along with the 02d5bba

The reason for removal request is because it is really redundant IMO and not robust.

Redundant because in all places we already have either ASSERT_TRUE(pub_manager.wait_for_matched(expecting_topic_name); or ASSERT_TRUE(client_manager_1->wait_for_service_to_be_ready());. Which is making sure that recorder have already been subscribed to the topics on which we are going to publish.
Not robust because in those newly added wait condition - there is no check for the topic names on which recorder subscribed. There is only check for the number of subscriptions. And had assumption that // /rosout and // /events/write_split topics will always be present and counted, which is not necessarily may be true for all CI configurations and underlying middleware layers.

If aforementioned ASSERT_TRUE(pub_manager.wait_for_matched(expecting_topic_name); and ASSERT_TRUE(client_manager_1->wait_for_service_to_be_ready()); checks are not enough we have a better way to do a wait for the recorder discovery. In some unit tests we use MockRecorder with extended utility methods to facilitate waiting functionality.
For instance:

  ASSERT_TRUE(recorder->wait_for_topic_to_be_discovered(string_topic));
  ASSERT_FALSE(recorder->topic_available_for_recording(string_topic));

Please see implementation details in the

rosbag2/rosbag2_transport/test/rosbag2_transport/mock_recorder.hpp

Lines 41 to 73 in 88c51a1

    
           template<typename DurationRepT = int64_t, typename DurationT = std::milli> 
        
           bool wait_for_topic_to_be_discovered( 
        
             const std::string & topic_name_to_wait_for, 
        
             std::chrono::duration<DurationRepT, DurationT> timeout = std::chrono::seconds(10)) 
        
           { 
        
             bool discovered = false; 
        
             using clock = std::chrono::steady_clock; 
        
             auto start = clock::now(); 
        
             do { 
        
               auto topic_names_and_types = this->get_topic_names_and_types(); 
        
               for (const auto &[topic_name, topic_types] : topic_names_and_types) { 
        
                 if (topic_name_to_wait_for == topic_name) { 
        
                   discovered = true; 
        
                   break; 
        
                 } 
        
               } 
        
               std::this_thread::sleep_for(std::chrono::milliseconds(20)); 
        
             } while (!discovered && (clock::now() - start) < timeout); 
        
             return discovered; 
        
           } 
        
           bool topic_available_for_recording(const std::string & topic_name) 
        
           { 
        
             bool available_for_recording = false; 
        
             auto topics_to_subscribe = this->get_requested_or_available_topics(); 
        
             for (const auto & topic_and_type : topics_to_subscribe) { 
        
               if (topic_and_type.first == topic_name) { 
        
                 available_for_recording = true; 
        
                 break; 
        
               } 
        
             } 
        
             return available_for_recording; 
        
           }

rosbag2_transport/test/rosbag2_transport/record_integration_fixture.hpp

rosbag2_tests/test/rosbag2_tests/test_rosbag2_cpp_get_service_info.cpp

This has almost exactly the same functionality as wait_for_condition, except for two things: 1. It is templated on the Timeout type. 2. It calls rclcpp::shutdown after the loop completes. However, neither of those is necessary; all callers to it use a std::chrono::duration, and all of the test fixtures already call rclcpp::shutdown. Thus, just remove it and make all callers use wait_for_condition instead. Signed-off-by: Chris Lalancette <[email protected]>

That is, we really don't actually want to do a full rclcpp shutdown here; we only want to stop spinning. Accomplish that with an executor, and timing out every 100 milliseconds to check if we are done yet. Signed-off-by: Chris Lalancette <[email protected]>

Make sure it only spins as long as we haven't shutdown, and that it wakes up every so often to check that fact. Signed-off-by: Chris Lalancette <[email protected]>

The main reason for that is that these tests generally want to test certain expectations around how many messages were received. However, if discovery takes longer than we expect, then it could be the case that we "missed" messages at the beginning because discovery hadn't yet completed. Fix this by just waiting around for the recorder to get all the subscriptions it expects before moving on with the test. Signed-off-by: Chris Lalancette <[email protected]>

Signed-off-by: Chris Lalancette <[email protected]>

clalancette · 2024-09-04T15:44:08Z

Redundant because in all places we already have either ASSERT_TRUE(pub_manager.wait_for_matched(expecting_topic_name); or ASSERT_TRUE(client_manager_1->wait_for_service_to_be_ready());. Which is making sure that recorder have already been subscribed to the topics on which we are going to publish.

The problem is that this is not always true. There are at least two situations this does not cover: for the event topic associated with services (/service_name/_events), and for the event topic associated with the recorder splitting functionality (/event/split_write).

pub_manager_wait_for_matched(expecting_topic_name) will only wait for the named topic, which is a topic that the pub_manager is managing. That does not apply to the /service_name/_events topic that services use for events.

Additionally, client_manager->wait_for_service_to_be_ready() only waits for the service topics to be ready, not the event topic. So again, it does not apply to the /service_name/_events topic that services use for events.

That's why adding in the additional waits for recorder subscriptions makes the tests more reliable, since it handles both of these situations.

If aforementioned ASSERT_TRUE(pub_manager.wait_for_matched(expecting_topic_name); and ASSERT_TRUE(client_manager_1->wait_for_service_to_be_ready()); checks are not enough we have a better way to do a wait for the recorder discovery. In some unit tests we use MockRecorder with extended utility methods to facilitate waiting functionality.

But fair enough, I'll take a look at switching to this.

Signed-off-by: Chris Lalancette <[email protected]>

clalancette · 2024-09-04T16:15:58Z

All right, see 11bb864, which uses the MockRecorder methods instead.

Signed-off-by: Chris Lalancette <[email protected]>

clalancette · 2024-09-04T16:51:52Z

Pulls: #1796
Gist: https://gist.githubusercontent.com/clalancette/08cc831129805c683801a5ebf1f8b33a/raw/cb4f2fb2fb0fac274fbef12b77b13699d0b0e232/ros2.repos
BUILD args: --packages-above-and-dependencies rosbag2_test_common rosbag2_tests rosbag2_transport
TEST args: --packages-above rosbag2_test_common rosbag2_tests rosbag2_transport
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/14511

Linux
Linux-aarch64
Linux-rhel
Windows

rosbag2_transport/test/rosbag2_transport/test_record_all.cpp

Signed-off-by: Chris Lalancette <[email protected]>

MichaelOrlov

IMO, waiting for "/events/write_split" is redundant since it is an internal service in the rosbag2_recorder and it is not participating in the test scenarios where it was added.

Also found a few nitpicks.

rosbag2_transport/test/rosbag2_transport/test_record_all.cpp

rosbag2_transport/test/rosbag2_transport/test_record_regex.cpp

Co-authored-by: Michael Orlov <[email protected]> Signed-off-by: Chris Lalancette <[email protected]>

MichaelOrlov

LGTM now.
Thanks!

clalancette · 2024-09-04T20:51:24Z

Pulls: #1796
Gist: https://gist.githubusercontent.com/clalancette/475d8bf4bfedcf19c2136589ee4451f9/raw/cb4f2fb2fb0fac274fbef12b77b13699d0b0e232/ros2.repos
BUILD args: --packages-above-and-dependencies rosbag2_test_common rosbag2_tests rosbag2_transport
TEST args: --packages-above rosbag2_test_common rosbag2_tests rosbag2_transport
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/14514

Linux
Linux-aarch64
Linux-rhel
Windows

MichaelOrlov · 2024-09-05T01:15:52Z

I'm curious what a "surprise" failure with rmw_connextdds on CI is.

clalancette · 2024-09-05T12:17:31Z

I'm curious what a "surprise" failure with rmw_connextdds on CI is.

Yeah, I don't know what is going on with that. The only change to that test is to wait for the clock topic to show up. I'll need to look deeper into it, though, since that seems to be a new failure with this PR.

That's because there is currently at least one bug associated with spin_some in rclcpp. However, it turns out that we don't even need to use it, as we can just as easily use spin() along with exec.cancel(). Signed-off-by: Chris Lalancette <[email protected]>

clalancette · 2024-09-05T15:36:44Z

Pulls: #1796
Gist: https://gist.githubusercontent.com/clalancette/0e69722137e215ba0dee2c7dc96e1a6c/raw/cb4f2fb2fb0fac274fbef12b77b13699d0b0e232/ros2.repos
BUILD args: --packages-above-and-dependencies rosbag2_test_common rosbag2_tests rosbag2_transport
TEST args: --packages-above rosbag2_test_common rosbag2_tests rosbag2_transport
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/14518

Linux
Linux-aarch64
Linux-rhel
Windows

clalancette · 2024-09-05T18:01:08Z

The only failures here are on Windows, and that has nothing to do with this PR (those will be fixed by ros2/common_interfaces#256). So I think this is ready for final review now.

MichaelOrlov

@clalancette Curious what was behind decision to move explict call for the stop_spinning() to the TearDown() method?

I would prefer to have explicit call to the stop_spinning() in the tests where it is needed in pair with start_spin(node) or start_async_spin(node) with

 auto cleanup_process_handle = rcpputils::make_scope_exit([&]() {stop_spinning();});

to have a clean and concise logic in tests.

In some tests we have some checks after explictly calling stop_spinning() which could rely on the assumption that some handles (node and files) will be free and nothing else will arrive during those extra checks or cleanups.

IMO adding stop_spinning() to the TearDown() method is a last resort and a sort of hacky way to handle it and could potentially cause another flakiness in tests in the future.

clalancette · 2024-09-06T12:13:30Z

@clalancette Curious what was behind decision to move explict call for the stop_spinning() to the TearDown() method?

So the first thing to realize is that because we are putting the spin call in a thread, we have to call stop_spinning() (which calls thread.join()), otherwise we get an exception from the std::thread destructor.

With that in mind, I could have used rcpputils::make_scope_exit to stop the spinning. However, by default that is actually no different than calling it in TearDown, since we wouldn't call that until the end of the function anyway.

I have a prototype locally where I did exactly what you said and added in new scopes, with the rcpputils::make_scope_exit, and that works fine. But it also ends up in a far larger change, because of those additional scopes and the reindentation of a lot of code. Let me know if you still want me to do that, and I can make the change.

Signed-off-by: Chris Lalancette <[email protected]>

clalancette · 2024-09-06T17:21:35Z

As something of a compromise, 404c78a adds in rcpputils::make_scope_exit calls without changing the scopes. This isn't too different from putting it in TearDown (which I've left as a backup), but it would allow us to change scopes later. Let me know what you think.

MichaelOrlov

@clalancette Thanks for adding a safeguard catcher auto cleanup_process_handle = rcpputils::make_scope_exit([&]() {stop_spinning();});
However, my main concern from the previous comment was that the explicit call to the stop_spinning() was deleted in some tests. Could you please add them back?
I added comments in places where it was deleted.

rosbag2_tests/test/rosbag2_tests/test_rosbag2_cpp_get_service_info.cpp

rosbag2_transport/test/rosbag2_transport/test_record.cpp

rosbag2_transport/test/rosbag2_transport/test_record_all_use_sim_time.cpp

Signed-off-by: Chris Lalancette <[email protected]>

MichaelOrlov

LGTM now.

clalancette · 2024-09-10T16:16:40Z

Pulls: #1796
Gist: https://gist.githubusercontent.com/clalancette/1986838f6b0c6a14c9b1af14a8340640/raw/cb4f2fb2fb0fac274fbef12b77b13699d0b0e232/ros2.repos
BUILD args: --packages-above-and-dependencies rosbag2_test_common rosbag2_tests rosbag2_transport
TEST args: --packages-above rosbag2_test_common rosbag2_tests rosbag2_transport
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/14549

Linux
Linux-aarch64
Linux-rhel
Windows

rosbag2_transport/test/rosbag2_transport/test_keyboard_controls.cpp

clalancette · 2024-09-11T14:26:47Z

@ros-pull-request-builder retest this please

rosbag2_transport/test/rosbag2_transport/record_integration_fixture.hpp

MichaelOrlov · 2024-09-12T01:09:48Z

@clalancette I was able to reproduce freeze locally on Ubuntu by increasing discovery wait interval up to the 500mS

rosbag2_transport::RecordOptions record_options =
  {true, false, false, {}, {}, {}, {}, {}, {}, "rmw_format", 500ms};

and repeatedly run only those RecordIntegrationTestFixture, test_keyboard_controls test.

I see in the debugger that freeze occurs in our modified stop_spinning() right on the spin_thread_.join();, exactly as you were suspecting.

It seems the problem is in the underlying SingleThreadedExecutor spin() and cancel() routines. It looks like cancel doen't really work as expected and we are not exiting from the spin().

MichaelOrlov

@clalancette Uff, it was a nasty bug, but I've nailed it down. I've spent a day hunting for it.
I was even start thinking that we have a memory corruption or a race condition in the executor. But not. Everything is more simple.
It is just turns out that we spawn a new spin_thread_ for exec_->spin();.
However, in some rare cases a context switch happened and we ran throw all test scenario and even call exec_->cancel(); from the stop_spinning(), but context haven't yet got back to the spin_thread_ and exec_->spin().
The root cause of the freeze was that exec_->spin() was executing after we already called exec_->cancel();.

I added a comment with proposal to wait untill executor will start spinning before we will get back from the start_async_spin(node).
I was able to robustly reproduce this issue and verify fix on my local setup.

rosbag2_transport/test/rosbag2_transport/record_integration_fixture.hpp

…xture.hpp Co-authored-by: Michael Orlov <[email protected]> Signed-off-by: Chris Lalancette <[email protected]>

Signed-off-by: Chris Lalancette <[email protected]>

clalancette · 2024-09-12T17:00:58Z

Pulls: #1796
Gist: https://gist.githubusercontent.com/clalancette/5271579611710ffed73686f516a6ad9e/raw/cb4f2fb2fb0fac274fbef12b77b13699d0b0e232/ros2.repos
BUILD args: --packages-above-and-dependencies rosbag2_test_common rosbag2_tests rosbag2_transport
TEST args: --packages-above rosbag2_test_common rosbag2_tests rosbag2_transport
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/14555

Linux
Linux-aarch64
Linux-rhel
Windows

clalancette · 2024-09-12T18:38:16Z

Yep, this seems much happier now. Thanks again @MichaelOrlov , just need approval again then we can merge this in.

MichaelOrlov

LGTM

MichaelOrlov · 2024-09-12T20:34:18Z

https://github.com/Mergifyio backport jazzy

mergify · 2024-09-12T20:34:26Z

backport jazzy

✅ Backports have been created

#1806 Improve the reliability of rosbag2 tests (backport #1796) has been created for branch jazzy but encountered conflicts

* Remove wait_until_shutdown. This has almost exactly the same functionality as wait_for_condition, except for two things: 1. It is templated on the Timeout type. 2. It calls rclcpp::shutdown after the loop completes. However, neither of those is necessary; all callers to it use a std::chrono::duration, and all of the test fixtures already call rclcpp::shutdown. Thus, just remove it and make all callers use wait_for_condition instead. Signed-off-by: Chris Lalancette <[email protected]> * Shutdown the async spinner node without rclcpp::shutdown. That is, we really don't actually want to do a full rclcpp shutdown here; we only want to stop spinning. Accomplish that with an executor, and timing out every 100 milliseconds to check if we are done yet. Signed-off-by: Chris Lalancette <[email protected]> * Small fixes to start_async_spin in rosbag2_tests. Make sure it only spins as long as we haven't shutdown, and that it wakes up every so often to check that fact. Signed-off-by: Chris Lalancette <[email protected]> * Wait for topics to be discovered during recorder->record(). The main reason for that is that these tests generally want to test certain expectations around how many messages were received. However, if discovery takes longer than we expect, then it could be the case that we "missed" messages at the beginning because discovery hadn't yet completed. Fix this by just waiting around for the recorder to get all the subscriptions it expects before moving on with the test. Signed-off-by: Chris Lalancette <[email protected]> * Feedback from review. Signed-off-by: Chris Lalancette <[email protected]> * Switch to using MockRecorder. Signed-off-by: Chris Lalancette <[email protected]> * Fixes from review. Signed-off-by: Chris Lalancette <[email protected]> * Feedback from review. Signed-off-by: Chris Lalancette <[email protected]> * Apply suggestions from code review Co-authored-by: Michael Orlov <[email protected]> Signed-off-by: Chris Lalancette <[email protected]> * Switch to using spin, rather than spin_some. That's because there is currently at least one bug associated with spin_some in rclcpp. However, it turns out that we don't even need to use it, as we can just as easily use spin() along with exec.cancel(). Signed-off-by: Chris Lalancette <[email protected]> * Make sure to stop_spinning when we tear down the test. Signed-off-by: Chris Lalancette <[email protected]> * Use scopes to shutdown spinning. Signed-off-by: Chris Lalancette <[email protected]> * Nested contexts just to explicitly cleanup the async spinners. Signed-off-by: Chris Lalancette <[email protected]> * Update rosbag2_transport/test/rosbag2_transport/record_integration_fixture.hpp Co-authored-by: Michael Orlov <[email protected]> Signed-off-by: Chris Lalancette <[email protected]> * Apply the same fix to rosbag2_tests. Signed-off-by: Chris Lalancette <[email protected]> --------- Signed-off-by: Chris Lalancette <[email protected]> Co-authored-by: Michael Orlov <[email protected]> (cherry picked from commit 2d4d02f) # Conflicts: # rosbag2_transport/test/rosbag2_transport/test_record.cpp # rosbag2_transport/test/rosbag2_transport/test_record_all.cpp # rosbag2_transport/test/rosbag2_transport/test_record_all_ignore_leaf_topics.cpp # rosbag2_transport/test/rosbag2_transport/test_record_all_no_discovery.cpp # rosbag2_transport/test/rosbag2_transport/test_record_all_use_sim_time.cpp # rosbag2_transport/test/rosbag2_transport/test_record_regex.cpp

* Remove wait_until_shutdown. This has almost exactly the same functionality as wait_for_condition, except for two things: 1. It is templated on the Timeout type. 2. It calls rclcpp::shutdown after the loop completes. However, neither of those is necessary; all callers to it use a std::chrono::duration, and all of the test fixtures already call rclcpp::shutdown. Thus, just remove it and make all callers use wait_for_condition instead. Signed-off-by: Chris Lalancette <[email protected]> * Shutdown the async spinner node without rclcpp::shutdown. That is, we really don't actually want to do a full rclcpp shutdown here; we only want to stop spinning. Accomplish that with an executor, and timing out every 100 milliseconds to check if we are done yet. Signed-off-by: Chris Lalancette <[email protected]> * Small fixes to start_async_spin in rosbag2_tests. Make sure it only spins as long as we haven't shutdown, and that it wakes up every so often to check that fact. Signed-off-by: Chris Lalancette <[email protected]> * Wait for topics to be discovered during recorder->record(). The main reason for that is that these tests generally want to test certain expectations around how many messages were received. However, if discovery takes longer than we expect, then it could be the case that we "missed" messages at the beginning because discovery hadn't yet completed. Fix this by just waiting around for the recorder to get all the subscriptions it expects before moving on with the test. Signed-off-by: Chris Lalancette <[email protected]> * Feedback from review. Signed-off-by: Chris Lalancette <[email protected]> * Switch to using MockRecorder. Signed-off-by: Chris Lalancette <[email protected]> * Fixes from review. Signed-off-by: Chris Lalancette <[email protected]> * Feedback from review. Signed-off-by: Chris Lalancette <[email protected]> * Apply suggestions from code review Co-authored-by: Michael Orlov <[email protected]> Signed-off-by: Chris Lalancette <[email protected]> * Switch to using spin, rather than spin_some. That's because there is currently at least one bug associated with spin_some in rclcpp. However, it turns out that we don't even need to use it, as we can just as easily use spin() along with exec.cancel(). Signed-off-by: Chris Lalancette <[email protected]> * Make sure to stop_spinning when we tear down the test. Signed-off-by: Chris Lalancette <[email protected]> * Use scopes to shutdown spinning. Signed-off-by: Chris Lalancette <[email protected]> * Nested contexts just to explicitly cleanup the async spinners. Signed-off-by: Chris Lalancette <[email protected]> * Update rosbag2_transport/test/rosbag2_transport/record_integration_fixture.hpp Co-authored-by: Michael Orlov <[email protected]> Signed-off-by: Chris Lalancette <[email protected]> * Apply the same fix to rosbag2_tests. Signed-off-by: Chris Lalancette <[email protected]> --------- Signed-off-by: Chris Lalancette <[email protected]> Co-authored-by: Michael Orlov <[email protected]>

* Remove wait_until_shutdown. This has almost exactly the same functionality as wait_for_condition, except for two things: 1. It is templated on the Timeout type. 2. It calls rclcpp::shutdown after the loop completes. However, neither of those is necessary; all callers to it use a std::chrono::duration, and all of the test fixtures already call rclcpp::shutdown. Thus, just remove it and make all callers use wait_for_condition instead. * Shutdown the async spinner node without rclcpp::shutdown. That is, we really don't actually want to do a full rclcpp shutdown here; we only want to stop spinning. Accomplish that with an executor, and timing out every 100 milliseconds to check if we are done yet. * Small fixes to start_async_spin in rosbag2_tests. Make sure it only spins as long as we haven't shutdown, and that it wakes up every so often to check that fact. * Wait for topics to be discovered during recorder->record(). The main reason for that is that these tests generally want to test certain expectations around how many messages were received. However, if discovery takes longer than we expect, then it could be the case that we "missed" messages at the beginning because discovery hadn't yet completed. Fix this by just waiting around for the recorder to get all the subscriptions it expects before moving on with the test. * Feedback from review. * Switch to using MockRecorder. * Fixes from review. * Feedback from review. * Apply suggestions from code review * Switch to using spin, rather than spin_some. That's because there is currently at least one bug associated with spin_some in rclcpp. However, it turns out that we don't even need to use it, as we can just as easily use spin() along with exec.cancel(). * Make sure to stop_spinning when we tear down the test. * Use scopes to shutdown spinning. * Nested contexts just to explicitly cleanup the async spinners. * Update rosbag2_transport/test/rosbag2_transport/record_integration_fixture.hpp * Apply the same fix to rosbag2_tests. --------- Signed-off-by: Chris Lalancette <[email protected]> Co-authored-by: Chris Lalancette <[email protected]> Co-authored-by: Michael Orlov <[email protected]>

clalancette requested a review from a team as a code owner August 28, 2024 19:39

clalancette requested review from emersonknapp and james-rms and removed request for a team August 28, 2024 19:39

fujitatomoya reviewed Sep 1, 2024

View reviewed changes

rosbag2_transport/test/rosbag2_transport/test_record.cpp Show resolved Hide resolved

rosbag2_transport/test/rosbag2_transport/test_record_all.cpp Outdated Show resolved Hide resolved

fujitatomoya approved these changes Sep 3, 2024

View reviewed changes

MichaelOrlov requested changes Sep 3, 2024

View reviewed changes

rosbag2_transport/test/rosbag2_transport/record_integration_fixture.hpp Outdated Show resolved Hide resolved

rosbag2_tests/test/rosbag2_tests/test_rosbag2_cpp_get_service_info.cpp Outdated Show resolved Hide resolved

clalancette added 5 commits September 4, 2024 15:40

Small fixes to start_async_spin in rosbag2_tests.

76eee02

Make sure it only spins as long as we haven't shutdown, and that it wakes up every so often to check that fact. Signed-off-by: Chris Lalancette <[email protected]>

Feedback from review.

c73fa80

Signed-off-by: Chris Lalancette <[email protected]>

Switch to using MockRecorder.

11bb864

Signed-off-by: Chris Lalancette <[email protected]>

clalancette force-pushed the clalancette/rosbag2-fix-tests branch from 02d5bba to 11bb864 Compare September 4, 2024 16:15

Fixes from review.

7991126

Signed-off-by: Chris Lalancette <[email protected]>

MichaelOrlov reviewed Sep 4, 2024

View reviewed changes

rosbag2_transport/test/rosbag2_transport/test_record_all.cpp Outdated Show resolved Hide resolved

MichaelOrlov reviewed Sep 4, 2024

View reviewed changes

rosbag2_transport/test/rosbag2_transport/test_record_all.cpp Show resolved Hide resolved

Feedback from review.

83b0b6b

Signed-off-by: Chris Lalancette <[email protected]>

MichaelOrlov reviewed Sep 4, 2024

View reviewed changes

rosbag2_transport/test/rosbag2_transport/test_record_all.cpp Outdated Show resolved Hide resolved

rosbag2_transport/test/rosbag2_transport/test_record_regex.cpp Outdated Show resolved Hide resolved

rosbag2_transport/test/rosbag2_transport/test_record_regex.cpp Outdated Show resolved Hide resolved

Apply suggestions from code review

0724e87

Co-authored-by: Michael Orlov <[email protected]> Signed-off-by: Chris Lalancette <[email protected]>

MichaelOrlov approved these changes Sep 4, 2024

View reviewed changes

MichaelOrlov reviewed Sep 5, 2024

View reviewed changes

Use scopes to shutdown spinning.

404c78a

Signed-off-by: Chris Lalancette <[email protected]>

MichaelOrlov reviewed Sep 6, 2024

View reviewed changes

Nested contexts just to explicitly cleanup the async spinners.

e63f8c7

Signed-off-by: Chris Lalancette <[email protected]>

MichaelOrlov approved these changes Sep 10, 2024

View reviewed changes

MichaelOrlov reviewed Sep 10, 2024

View reviewed changes

rosbag2_transport/test/rosbag2_transport/test_keyboard_controls.cpp Show resolved Hide resolved

mergify bot mentioned this pull request Sep 11, 2024

[DO NOT MERGE]: Debugging via the Rpr job for https://github.com/ros2/rosbag2/pull/1796 #1803

Closed

MichaelOrlov reviewed Sep 12, 2024

View reviewed changes

rosbag2_transport/test/rosbag2_transport/record_integration_fixture.hpp Show resolved Hide resolved

MichaelOrlov requested changes Sep 12, 2024

View reviewed changes

rosbag2_transport/test/rosbag2_transport/record_integration_fixture.hpp Outdated Show resolved Hide resolved

clalancette and others added 2 commits September 12, 2024 12:36

Update rosbag2_transport/test/rosbag2_transport/record_integration_fi…

63391ab

…xture.hpp Co-authored-by: Michael Orlov <[email protected]> Signed-off-by: Chris Lalancette <[email protected]>

Apply the same fix to rosbag2_tests.

275c10d

Signed-off-by: Chris Lalancette <[email protected]>

clalancette mentioned this pull request Sep 12, 2024

🧑‍🌾 test_rosbag2_play_end_to_end flaky on Windows #1437

Open

MichaelOrlov approved these changes Sep 12, 2024

View reviewed changes

MichaelOrlov merged commit 2d4d02f into rolling Sep 12, 2024
12 checks passed

mergify bot mentioned this pull request Sep 12, 2024

[jazzy] Improve the reliability of rosbag2 tests (backport #1796) #1806

Merged

MichaelOrlov deleted the clalancette/rosbag2-fix-tests branch September 12, 2024 20:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve the reliability of rosbag2 tests #1796

Improve the reliability of rosbag2 tests #1796

clalancette commented Aug 28, 2024

clalancette commented Aug 28, 2024 •

edited

Loading

fujitatomoya left a comment

fujitatomoya left a comment

MichaelOrlov commented Sep 3, 2024

MichaelOrlov left a comment

clalancette commented Sep 4, 2024

clalancette commented Sep 4, 2024

clalancette commented Sep 4, 2024

MichaelOrlov left a comment

MichaelOrlov left a comment

clalancette commented Sep 4, 2024 •

edited

Loading

MichaelOrlov commented Sep 5, 2024

clalancette commented Sep 5, 2024

clalancette commented Sep 5, 2024

clalancette commented Sep 5, 2024

MichaelOrlov left a comment

clalancette commented Sep 6, 2024

clalancette commented Sep 6, 2024

MichaelOrlov left a comment

MichaelOrlov left a comment

clalancette commented Sep 10, 2024 •

edited

Loading

clalancette commented Sep 11, 2024

MichaelOrlov commented Sep 12, 2024

MichaelOrlov left a comment •

edited

Loading

clalancette commented Sep 12, 2024 •

edited

Loading

clalancette commented Sep 12, 2024

MichaelOrlov left a comment

MichaelOrlov commented Sep 12, 2024

mergify bot commented Sep 12, 2024 •

edited

Loading

	template<typename DurationRepT = int64_t, typename DurationT = std::milli>
	bool wait_for_topic_to_be_discovered(
	const std::string & topic_name_to_wait_for,
	std::chrono::duration<DurationRepT, DurationT> timeout = std::chrono::seconds(10))
	{
	bool discovered = false;
	using clock = std::chrono::steady_clock;
	auto start = clock::now();
	do {
	auto topic_names_and_types = this->get_topic_names_and_types();
	for (const auto &[topic_name, topic_types] : topic_names_and_types) {
	if (topic_name_to_wait_for == topic_name) {
	discovered = true;
	break;
	}
	}
	std::this_thread::sleep_for(std::chrono::milliseconds(20));
	} while (!discovered && (clock::now() - start) < timeout);
	return discovered;
	}

	bool topic_available_for_recording(const std::string & topic_name)
	{
	bool available_for_recording = false;
	auto topics_to_subscribe = this->get_requested_or_available_topics();
	for (const auto & topic_and_type : topics_to_subscribe) {
	if (topic_and_type.first == topic_name) {
	available_for_recording = true;
	break;
	}
	}
	return available_for_recording;
	}

Improve the reliability of rosbag2 tests #1796

Improve the reliability of rosbag2 tests #1796

Conversation

clalancette commented Aug 28, 2024

clalancette commented Aug 28, 2024 • edited Loading

fujitatomoya left a comment

Choose a reason for hiding this comment

fujitatomoya left a comment

Choose a reason for hiding this comment

MichaelOrlov commented Sep 3, 2024

MichaelOrlov left a comment

Choose a reason for hiding this comment

clalancette commented Sep 4, 2024

clalancette commented Sep 4, 2024

clalancette commented Sep 4, 2024

MichaelOrlov left a comment

Choose a reason for hiding this comment

MichaelOrlov left a comment

Choose a reason for hiding this comment

clalancette commented Sep 4, 2024 • edited Loading

MichaelOrlov commented Sep 5, 2024

clalancette commented Sep 5, 2024

clalancette commented Sep 5, 2024

clalancette commented Sep 5, 2024

MichaelOrlov left a comment

Choose a reason for hiding this comment

clalancette commented Sep 6, 2024

clalancette commented Sep 6, 2024

MichaelOrlov left a comment

Choose a reason for hiding this comment

MichaelOrlov left a comment

Choose a reason for hiding this comment

clalancette commented Sep 10, 2024 • edited Loading

clalancette commented Sep 11, 2024

MichaelOrlov commented Sep 12, 2024

MichaelOrlov left a comment • edited Loading

Choose a reason for hiding this comment

clalancette commented Sep 12, 2024 • edited Loading

clalancette commented Sep 12, 2024

MichaelOrlov left a comment

Choose a reason for hiding this comment

MichaelOrlov commented Sep 12, 2024

mergify bot commented Sep 12, 2024 • edited Loading

✅ Backports have been created

clalancette commented Aug 28, 2024 •

edited

Loading

clalancette commented Sep 4, 2024 •

edited

Loading

clalancette commented Sep 10, 2024 •

edited

Loading

MichaelOrlov left a comment •

edited

Loading

clalancette commented Sep 12, 2024 •

edited

Loading

mergify bot commented Sep 12, 2024 •

edited

Loading