Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[jazzy] Improve the reliability of rosbag2 tests (backport #1796) #1806

Merged
merged 1 commit into from
Sep 18, 2024

Conversation

mergify[bot]
Copy link

@mergify mergify bot commented Sep 12, 2024

I initially started this series trying to track down a rare failing flakey test with test_record__rmw_cyclonedds_cpp. That particular flake seems to be able to happen because sometimes discovery takes longer than we expect, and it is possible that the tests "miss" the first publication. If that's the case, then the rest of the test may fail because it is expecting a certai number of messages. Along the way, we cleanup the tests a bit:

  1. Remove the unnecessary wait_until_shutdown method, which was almost exactly the same as wait_for_condition.
  2. Slightly revamp how the async spinner works, so it doesn't need to call rclcpp::shutdown.
  3. Do a similar revamp for the custom async spinner in rosbag2_tests.
  4. Wait for topics to be discovered right after calling recorder->record(). This ensures we can't get into the above situation.

With this series in place, the particular flake of test_record on rmw_cyclonedds_cpp is fixed (or, at least, I can no longer reproduce it). There are still other flakes that can happen under load, but I think fixes for those will have to come separately.


This is an automatic backport of pull request #1796 done by Mergify.

@mergify mergify bot requested a review from a team as a code owner September 12, 2024 20:34
@mergify mergify bot added the conflicts label Sep 12, 2024
@mergify mergify bot requested review from gbiggs and jhdcs and removed request for a team September 12, 2024 20:34
Copy link
Author

mergify bot commented Sep 12, 2024

Cherry-pick of 2d4d02f has failed:

On branch mergify/bp/jazzy/pr-1796
Your branch is up to date with 'origin/jazzy'.

You are currently cherry-picking commit 2d4d02fd.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   rosbag2_test_common/include/rosbag2_test_common/wait_for.hpp
	modified:   rosbag2_tests/test/rosbag2_tests/test_rosbag2_cpp_get_service_info.cpp
	modified:   rosbag2_transport/test/rosbag2_transport/record_integration_fixture.hpp
	modified:   rosbag2_transport/test/rosbag2_transport/test_keyboard_controls.cpp
	modified:   rosbag2_transport/test/rosbag2_transport/test_record_all_include_unpublished_topics.cpp

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   rosbag2_transport/test/rosbag2_transport/test_record.cpp
	both modified:   rosbag2_transport/test/rosbag2_transport/test_record_all.cpp
	both modified:   rosbag2_transport/test/rosbag2_transport/test_record_all_ignore_leaf_topics.cpp
	both modified:   rosbag2_transport/test/rosbag2_transport/test_record_all_no_discovery.cpp
	both modified:   rosbag2_transport/test/rosbag2_transport/test_record_all_use_sim_time.cpp
	both modified:   rosbag2_transport/test/rosbag2_transport/test_record_regex.cpp

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@MichaelOrlov MichaelOrlov changed the title Improve the reliability of rosbag2 tests (backport #1796) [jazzy] Improve the reliability of rosbag2 tests (backport #1796) Sep 12, 2024
* Remove wait_until_shutdown.

This has almost exactly the same functionality as wait_for_condition,
except for two things:

1.  It is templated on the Timeout type.
2.  It calls rclcpp::shutdown after the loop completes.

However, neither of those is necessary; all callers to it use
a std::chrono::duration, and all of the test fixtures already
call rclcpp::shutdown.  Thus, just remove it and make all
callers use wait_for_condition instead.

Signed-off-by: Chris Lalancette <[email protected]>

* Shutdown the async spinner node without rclcpp::shutdown.

That is, we really don't actually want to do a full
rclcpp shutdown here; we only want to stop spinning.
Accomplish that with an executor, and timing out
every 100 milliseconds to check if we are done yet.

Signed-off-by: Chris Lalancette <[email protected]>

* Small fixes to start_async_spin in rosbag2_tests.

Make sure it only spins as long as we haven't shutdown,
and that it wakes up every so often to check that fact.

Signed-off-by: Chris Lalancette <[email protected]>

* Wait for topics to be discovered during recorder->record().

The main reason for that is that these tests generally want
to test certain expectations around how many messages were
received.  However, if discovery takes longer than we expect,
then it could be the case that we "missed" messages at the
beginning because discovery hadn't yet completed.

Fix this by just waiting around for the recorder to get all
the subscriptions it expects before moving on with the test.

Signed-off-by: Chris Lalancette <[email protected]>

* Feedback from review.

Signed-off-by: Chris Lalancette <[email protected]>

* Switch to using MockRecorder.

Signed-off-by: Chris Lalancette <[email protected]>

* Fixes from review.

Signed-off-by: Chris Lalancette <[email protected]>

* Feedback from review.

Signed-off-by: Chris Lalancette <[email protected]>

* Apply suggestions from code review

Co-authored-by: Michael Orlov <[email protected]>
Signed-off-by: Chris Lalancette <[email protected]>

* Switch to using spin, rather than spin_some.

That's because there is currently at least one bug
associated with spin_some in rclcpp.  However, it turns
out that we don't even need to use it, as we can just as
easily use spin() along with exec.cancel().

Signed-off-by: Chris Lalancette <[email protected]>

* Make sure to stop_spinning when we tear down the test.

Signed-off-by: Chris Lalancette <[email protected]>

* Use scopes to shutdown spinning.

Signed-off-by: Chris Lalancette <[email protected]>

* Nested contexts just to explicitly cleanup the async spinners.

Signed-off-by: Chris Lalancette <[email protected]>

* Update rosbag2_transport/test/rosbag2_transport/record_integration_fixture.hpp

Co-authored-by: Michael Orlov <[email protected]>
Signed-off-by: Chris Lalancette <[email protected]>

* Apply the same fix to rosbag2_tests.

Signed-off-by: Chris Lalancette <[email protected]>

---------

Signed-off-by: Chris Lalancette <[email protected]>
Co-authored-by: Michael Orlov <[email protected]>
@MichaelOrlov MichaelOrlov force-pushed the mergify/bp/jazzy/pr-1796 branch from b90c88d to 60f1042 Compare September 18, 2024 20:21
@MichaelOrlov
Copy link
Contributor

Pulls: #1806
Gist: https://gist.githubusercontent.com/MichaelOrlov/ab10b4f5b37005d0a57f1def84281c46/raw/6808c27bfb04669c663be100567fdb7199d995ca/ros2.repos
BUILD args: --packages-above-and-dependencies rosbag2_test_common rosbag2_tests rosbag2_transport
TEST args: --packages-above rosbag2_test_common rosbag2_tests rosbag2_transport
ROS Distro: jazzy
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/14580

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@MichaelOrlov
Copy link
Contributor

The redhat build finished with warnings no CONNEXTDDS_DIR nor NDDSHOME specified. However, this is unrelated to this PR. Merging then.

@MichaelOrlov MichaelOrlov merged commit 8b8b81d into jazzy Sep 18, 2024
12 checks passed
@mergify mergify bot deleted the mergify/bp/jazzy/pr-1796 branch September 18, 2024 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants