Several system test failures with the latest dev/1.0.0 #19

Closed
evshary opened this issue Sep 12, 2024 · 10 comments
Labels
bug Something isn't working

Comments

@evshary

evshary commented Sep 12, 2024

For example, there are extra failures in rcl:

69 - test_graph__rmw_zenoh_cpp (Failed)
70 - test_info_by_topic__rmw_zenoh_cpp (Failed)
74 - test_node__rmw_zenoh_cpp (Failed)
78 - test_publisher__rmw_zenoh_cpp (Failed)

The same situation occurs in rclcpp, rcl_action, etc.
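
The failure lines above follow the `N - name (Failed)` layout of a ctest summary. As a minimal sketch for collecting such lines from a log so that runs can be compared later, something like the following could work. The function name `parse_failed_tests` and the regular expression are illustrative assumptions, not part of rmw_zenoh or its tooling.

```python
import re

# Matches summary lines such as "69 - test_graph__rmw_zenoh_cpp (Failed)".
FAILED_LINE = re.compile(r"^\s*(\d+)\s+-\s+(\S+)\s+\(Failed\)")


def parse_failed_tests(text):
    """Return (test number, test name) pairs for every failed-test line."""
    failures = []
    for line in text.splitlines():
        match = FAILED_LINE.match(line)
        if match:
            failures.append((int(match.group(1)), match.group(2)))
    return failures


# The four rcl failures quoted in this comment.
summary = """
69 - test_graph__rmw_zenoh_cpp (Failed)
70 - test_info_by_topic__rmw_zenoh_cpp (Failed)
74 - test_node__rmw_zenoh_cpp (Failed)
78 - test_publisher__rmw_zenoh_cpp (Failed)
"""

failed = parse_failed_tests(summary)
for number, name in failed:
    print(number, name)
```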

@imstevenpmwork

imstevenpmwork commented Sep 13, 2024

I can confirm this happens even with zenoh-c 1.0.0.8 in #20 on bare-metal Linux.
The biggest regression seems to be in:

  • rcl
  • rcl_action
  • rclcpp
  • rclcpp_action
  • test_communication

I can provide the logs so we don't have to run them again. I believe we should solve these regressions before the PR in the ros2 repository is reviewed.

One of the issues seems to be related to reliability no longer being available on the subscriber side (and potentially to the tests expecting it).

imstevenpmwork added the bug label Sep 13, 2024
@YuanYuYuan
Collaborator

Observations on my laptop with the latest dev/1.0.0 (fce8a62):

RCL Failures

  9 - test_context__rmw_cyclonedds_cpp (Failed)
 15 - test_init__rmw_cyclonedds_cpp (Failed)
 16 - test_node__rmw_cyclonedds_cpp (Failed)
 19 - test_guard_condition__rmw_cyclonedds_cpp (Failed)

 38 - test_context__rmw_fastrtps_cpp (Failed)
 44 - test_init__rmw_fastrtps_cpp (Failed)
 45 - test_node__rmw_fastrtps_cpp (Failed)
 48 - test_guard_condition__rmw_fastrtps_cpp (Failed)

 67 - test_context__rmw_zenoh_cpp (Failed)
 73 - test_init__rmw_zenoh_cpp (Failed)
 74 - test_node__rmw_zenoh_cpp (Failed)
 77 - test_guard_condition__rmw_zenoh_cpp (Failed)

 69 - test_graph__rmw_zenoh_cpp (Failed)
[  PASSED  ] 18 tests.
[  FAILED  ] 4 tests, listed below:
[  FAILED  ] NodeGraphMultiNodeFixture.test_node_info_subscriptions
[  FAILED  ] NodeGraphMultiNodeFixture.test_node_info_publishers
[  FAILED  ] NodeGraphMultiNodeFixture.test_node_info_services
[  FAILED  ] NodeGraphMultiNodeFixture.test_node_info_clients


 82 - test_events__rmw_zenoh_cpp (Failed)         # segmentation fault
 95 - test_time (Failed)                          # The test did not generate a result file

@imstevenpmwork

imstevenpmwork commented Sep 16, 2024

I just ran it again with fce8a62 (which uses zenoh-c 1.0.0.8) on my bare-metal Linux machine and got the results below.
(Tests marked with *️⃣ are failing now but were not failing with the previous dev/1.0.0 or with 0.11.0 rmw_zenoh.)
RCL:

  • 69 *️⃣ (but flaky)
  • 71
  • 82

RCL_ACTION:

  • 1 *️⃣
  • 8 *️⃣
  • 9 *️⃣
  • 10 *️⃣
  • 16 *️⃣

RCLCPP:

  • 18 *️⃣
  • 31 *️⃣
  • 35
  • 43
  • 46
  • 55
  • 58
  • 61
  • 62
  • 64
  • 69
  • 72 *️⃣
  • 75
  • 88
  • 89
  • 99 *️⃣
  • 108
  • 111
  • 113
  • 114
  • 116
  • 128

RCLCPP_ACTION:

  • 1
  • 2 *️⃣

RCLCPP_COMPONENTS:

  • 1
  • 2

RCLCPP_LIFECYCLE:

  • 6
  • 8 *️⃣
  • 11
  • 12
  • 13

TEST_COMMUNICATION:

  • 75 *️⃣
  • 78 *️⃣
  • 81 *️⃣
  • 82 *️⃣
  • 84 *️⃣

TEST_QUALITY_OF_SERVICE:

  • 9
  • 10 *️⃣
  • 11
  • 12

TEST_RCLCPP:

  • 18
  • 70 *️⃣
  • 71 *️⃣
  • 74 *️⃣

This makes a total of 20 failing tests that were not failing before. (Note that some of them might be, or have always been, flaky.) I believe the biggest regressions are in RCL_ACTION, where we went from 0 to 5, and in TEST_COMMUNICATION, where we also went from 0 to 5 (note that this one has always been very flaky).
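
As a rough cross-check of the per-package numbers, the marked entries from the lists above can be tallied as sketched below. Counting every marked entry gives 21, and leaving out the RCL entry flagged as flaky gives 20. Treating the flaky entry as excluded is an assumption made here to match the quoted total, since the comment does not say how the figure was derived.

```python
# Newly failing test numbers (the marked entries) per package,
# copied from the lists in this comment.
new_failures = {
    "RCL": [69],  # reported as "but flaky"
    "RCL_ACTION": [1, 8, 9, 10, 16],
    "RCLCPP": [18, 31, 72, 99],
    "RCLCPP_ACTION": [2],
    "RCLCPP_LIFECYCLE": [8],
    "TEST_COMMUNICATION": [75, 78, 81, 82, 84],
    "TEST_QUALITY_OF_SERVICE": [10],
    "TEST_RCLCPP": [70, 71, 74],
}

# Assumption: the flaky RCL entry is the one left out of the quoted total.
flaky = {("RCL", 69)}

per_package = {pkg: len(nums) for pkg, nums in new_failures.items()}
total_marked = sum(per_package.values())
total_without_flaky = total_marked - len(flaky)

for pkg, count in per_package.items():
    print(f"{pkg}: {count}")
print("marked:", total_marked, "excluding flaky:", total_without_flaky)
```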

It is also interesting to note that TEST_QUALITY_OF_SERVICE has always been very consistent in failing only 3 tests (9, 11, 12), which is well known because the corresponding features are not implemented. Now test 10 is failing as well, which has never happened before.

The logs file:
test_2024-09-16_13-07-53.zip
Logs file for test_communication panics

Important detail too:
Intrinsic has recently pushed a change into the rclcpp test suite. It remains to be confirmed whether this has any impact at all.

@imstevenpmwork

imstevenpmwork commented Sep 16, 2024

Just tested with 60b72f07a7d7254aeed8c39c11174bb52806ee62 and it shows the expected output, that is, 0 failing tests in rcl_action and 0 in test_communication.
Since then there have been only a handful of commits in rmw_zenoh dev/1.0.0 that are not related to format/style, which leads us to think that the regression might indeed come from either zenoh-c or zenoh.
The next step is to test with 4f369c5b1cf3a9ce69d2bd4e3bd53846dc2ed2fb (one of the commits where the zenoh-c version was changed) to see whether the regression is already present there.
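
The search described here, starting from a known-good commit and testing intermediate ones, is essentially a bisection. A minimal sketch of that idea follows. The commit labels `c0` to `c6` and the `run_system_tests_fail` stand-in are hypothetical placeholders, not real rmw_zenoh commits or scripts, and the sketch assumes the regression stays present once introduced, which flaky tests may violate.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is True.

    Assumes commits are ordered oldest to newest, the first one is good,
    the last one is bad, and the regression never disappears once introduced.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history with placeholder labels.
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6"]
regressed_from = "c4"  # pretend the regression landed here


def run_system_tests_fail(commit):
    # Stand-in for checking out the commit and running the system tests.
    return history.index(commit) >= history.index(regressed_from)


culprit = first_bad(history, run_system_tests_fail)
print("first bad commit:", culprit)
```

With N candidate commits this needs roughly log2(N) test runs, which matters when each run of the system tests is slow.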

@imstevenpmwork

Just tested with 86940a807987ab2c0f34a92f17b774a24892c912 (which uses ecad7f3 zenoh-c) and the regression is already there.

@imstevenpmwork

From yesterday:

rmw_zenoh ed51a81a6cd8f211ce9f843620aa2275f8489ca3 with zenoh-c c3d60b7642d123137345f4aebce738f5210027d1 (August 30th) also has the regressions.
I suspect commit 79b446883f46fa01ecc9cb4ffdb44bb69b1fb4c6 is the one to blame, TBC.

In addition to this, the recent log changes didn't help with getting information about the function that panics.

@evshary
Author

evshary commented Sep 20, 2024

I tried with the branch https://github.com/ZettaScaleLabs/rmw_zenoh/tree/dev/1.0.0-cy-tmp,
which diverged from b5cdd73, before the coding-style adjustments and the merge of the patch from upstream.

RCL:
69 *️⃣ (but flaky)
71
82

RCL_ACTION:
10 *️⃣

RCLCPP: (Note that some item numbers have changed, from 109 to 117)
18 *️⃣
35
43
46
55
58
61
62
64
69
75
88
89
99 *️⃣
109
112
114
115
117

RCLCPP_COMPONENTS:
1
2

RCLCPP_LIFECYCLE:
6
8 *️⃣
11
12
13

RMW_ZENOH: (Coding style issue, expected)
2 *️⃣
4 *️⃣

TEST_COMMUNICATION:
75 *️⃣
78 *️⃣
81 *️⃣
82 *️⃣
84 *️⃣

TEST_QUALITY_OF_SERVICE:
9
11
12

TEST_RCLCPP:
70 *️⃣
71 *️⃣
74 *️⃣

TL;DR: I believe the TEST_COMMUNICATION and TEST_RCLCPP failures were introduced by upgrading zenoh-c. We can start with those. RCL_ACTION, RCLCPP and RCLCPP_LIFECYCLE can be the next goal.
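
To make the comparison with the earlier fce8a62 run explicit, the failing test numbers from both reports can be compared as sets. This is only a sketch using the numbers listed in this thread. RCLCPP is left out because its item numbers shifted between the runs, and the variable and function names are illustrative.

```python
# Failing test numbers per package, copied from the two reports in this thread.
run_fce8a62 = {
    "RCL_ACTION": {1, 8, 9, 10, 16},
    "TEST_COMMUNICATION": {75, 78, 81, 82, 84},
    "TEST_QUALITY_OF_SERVICE": {9, 10, 11, 12},
    "TEST_RCLCPP": {18, 70, 71, 74},
}
run_cy_tmp = {
    "RCL_ACTION": {10},
    "TEST_COMMUNICATION": {75, 78, 81, 82, 84},
    "TEST_QUALITY_OF_SERVICE": {9, 11, 12},
    "TEST_RCLCPP": {70, 71, 74},
}


def compare(old, new):
    """Split each package's failures into old-only, shared and new-only."""
    result = {}
    for pkg in sorted(old.keys() | new.keys()):
        a, b = old.get(pkg, set()), new.get(pkg, set())
        result[pkg] = {
            "only_old": sorted(a - b),
            "both": sorted(a & b),
            "only_new": sorted(b - a),
        }
    return result


diff = compare(run_fce8a62, run_cy_tmp)
for pkg, groups in diff.items():
    print(pkg, groups)
```

Failures that appear in both runs are the ones that persist on the dev/1.0.0-cy-tmp branch, so they are not explained by the coding-style adjustments or the upstream patch that the branch predates.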

More information:

For TEST_COMMUNICATION, the tests started to panic after the 13/9 commit of zenoh-c, but some of them already failed with the 11/9 and 12/9 commits.

test_2024-09-20_10-06-29(9-20 6357f723eb18788978b9f3475ab76b5a7cbdc80b).tar.gz

@imstevenpmwork

imstevenpmwork commented Sep 24, 2024

23-09-2024 rmw_zenoh system tests result

RCL:

  • test_graph__rmw_zenoh_cpp
  • test_service__rmw_zenoh_cpp
  • test_service_event_publisher__rmw_zenoh_cpp
  • test_get_type_description_service__rmw_zenoh_cpp

RCL_ACTION:

  • test_action_communication__rmw_zenoh_cpp
  • test_action_interaction__rmw_zenoh_cpp
  • test_graph__rmw_zenoh_cpp

RCLCPP:

  • test_allocator_memory_strategy
  • test_generic_service
  • test_client_common
  • test_intra_process_manager
  • test_node_interfaces__get_node_interfaces
  • test_node_interfaces__node_graph
  • test_node_global_args
  • test_parameter_client
  • test_parameter_service
  • test_parameter
  • test_parameter_event_handler
  • test_publisher
  • test_qos
  • test_service
  • test_service_introspection
  • test_find_weak_nodes
  • test_externally_defined_services
  • test_time
  • test_time_source
  • test_wait_for_message
  • test_logger_service
  • test_executors_timer_cancel_behavior
  • test_multi_threaded_executor
  • test_events_executor
  • test_guard_condition
  • test_wait_set
  • test_subscription_options
  • test_dynamic_storage
  • test_static_storage
  • test_thread_safe_synchronization
  • test_qos_event__rmw_zenoh_cpp

RCLCPP_ACTION:

  • test_client
  • test_server

RCLCPP_COMPONENTS:

  • test_component_manager
  • test_component_manager_api

RCLCPP_LIFECYCLE:

  • test_lifecycle_service_client
  • test_state_machine_info
  • test_register_custom_callbacks
  • test_callback_exceptions

TEST_CLI_REMAP:

  • test_cli_remapping

TEST_COMMUNICATION:

  • test_publisher_subscriber_cpp__rmw_zenoh_cpp__UnboundedSequences
  • test_publisher_subscriber__rclpy__rmw_zenoh_cpp
  • test_requester_replier__rclpy__rmw_zenoh_cpp
  • test_action_client_server__rclpy__rmw_zenoh_cpp
  • test_requester_replier__rclcpp__rclpy__rmw_zenoh_cpp
  • test_action_client_server__rclcpp__rclpy__rmw_zenoh_cpp
  • test_publisher_subscriber__rclcpp__rmw_zenoh_cpp
  • test_requester_replier__rclcpp__rmw_zenoh_cpp
  • test_action_client_server__rclcpp__rmw_zenoh_cpp

TEST_QOS:

  • test_deadline__rmw_zenoh_cpp
  • test_lifespan__rmw_zenoh_cpp
  • test_liveliness__rmw_zenoh_cpp
  • test_best_available__rmw_zenoh_cpp

TEST_RCLCPP:

  • gtest_executor__rmw_zenoh_cpp
  • gtest_multiple_service_calls__rmw_zenoh_cpp
  • gtest_multithreaded__rmw_zenoh_cpp
  • gtest_local_parameters__rmw_zenoh_cpp
  • test_parameter_server_cpp__rmw_zenoh_cpp
  • test_client_scope_cpp__rmw_zenoh_cpp
  • test_client_scope_consistency_cpp__rmw_zenoh_cpp

A total of 67 failing tests (vs ~50 using dev/1.0.0 in jazzy, vs ~30 using 0.11.0 in jazzy).
Furthermore, another run was done in a containerised CI environment, using the latest commit as of today of all the repositories mentioned above, and the results were very similar.

@Yadunund
Collaborator

Yadunund commented Sep 24, 2024

We recently merged ros2/rclcpp#2633, which should get a lot of tests from the rclcpp repo (including rclcpp_lifecycle and rclcpp_action) to pass. At the very least, none of the tests should segfault due to the shutdown issue. Additionally, if you apply the patch from ros2/rclcpp#2626, more tests should pass.

The rcl/test_graph issue has been flagged and we should have it resolved shortly (ros2/rcl#1189).

@diogomatsubara

Discussion is centralized in ros2#286
