[Bug] Jazzy / Rolling Transient-Local Publishers with IPC do not deliver messages to Inter-process subscribers #2704

Open
SteveMacenski opened this issue Dec 12, 2024 · 2 comments

Comments

@SteveMacenski
Collaborator

SteveMacenski commented Dec 12, 2024

Bug report

Required Info:

  • Operating System: Ubuntu 22.04, running Rolling docker images based on 24.04
  • Installation type: OSRF Docker
  • Version or commit hash:
  • DDS implementation: Fast-DDS
  • Client library (if applicable): rclcpp

Expected behavior

When Intra-Process Communications (IPC) is enabled in the NodeOptions, late-joining subscriptions to a transient-local topic receive the last published message, regardless of whether they live in the publisher's process or in another one.
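
To make the expected semantics concrete, here is a minimal toy model of a transient-local, keep-last topic in plain C++. It is only a sketch of the QoS contract, not rclcpp or DDS code: `ToyLatchedTopic` and `late_joiner_receives` are hypothetical names introduced for illustration, and the model makes no distinction between intra-process and inter-process delivery, which is exactly the property the expected behavior asks for.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model of a transient-local, keep-last(depth) topic.
// NOT rclcpp or DDS code; all names here are hypothetical.
class ToyLatchedTopic
{
public:
  using Callback = std::function<void(const std::string &)>;

  explicit ToyLatchedTopic(std::size_t depth)
  : depth_(depth) {}

  // Deliver to every current subscriber, then retain the message
  // (bounded by depth_) so that late joiners can still receive it.
  void publish(const std::string & msg)
  {
    for (const auto & cb : subscribers_) {
      cb(msg);
    }
    history_.push_back(msg);
    while (history_.size() > depth_) {
      history_.pop_front();
    }
  }

  // A late joiner first receives the retained history, then live messages.
  void subscribe(Callback cb)
  {
    for (const auto & msg : history_) {
      cb(msg);
    }
    subscribers_.push_back(std::move(cb));
  }

private:
  std::size_t depth_;
  std::deque<std::string> history_;
  std::vector<Callback> subscribers_;
};

// Publish everything first, subscribe afterwards, and return what the
// late-joining subscriber was handed at subscription time.
std::vector<std::string> late_joiner_receives(
  std::size_t depth, const std::vector<std::string> & published)
{
  ToyLatchedTopic topic(depth);
  for (const auto & msg : published) {
    topic.publish(msg);
  }
  std::vector<std::string> received;
  topic.subscribe([&received](const std::string & msg) {received.push_back(msg);});
  return received;
}
```

With a depth of 1, a subscriber that joins after two maps were published is handed only the most recent one. The bug reported here is that this holds for subscribers inside the publisher's process but not for subscribers in other processes.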

Actual behavior

When the subscription is outside of the process containing the IPC publisher of a transient-local topic, a subscription that joins after the initial publication never receives the message.

When the subscription is within the process containing the IPC publisher of a transient-local topic, the message is properly delivered. I believe this shows that the IPC transient-local PR (#2303) misses an important case: when the subscription is in another process, the retained message needs to be sent over the network.

For example, I have a map publisher in the Nav2 map_server:

  occ_pub_ = create_publisher<nav_msgs::msg::OccupancyGrid>(
    topic_name,
    rclcpp::QoS(rclcpp::KeepLast(1)).transient_local().reliable());

That publisher is composed into the same process as the rest of Nav2, localization, etc. If I inject a subscription to that topic into a node that runs periodically, I reliably see the log that a new map was received. So late-joining subscriptions within the IPC process are working (when IPC is enabled for that node as well).

  rclcpp::QoS map_qos(10);  // initialize to default
  map_qos.transient_local();
  map_qos.reliable();
  map_qos.keep_last(1);
  auto node = node_.lock();
  auto map_sub = node->create_subscription<nav_msgs::msg::OccupancyGrid>(
    map_topic_, map_qos, [this](nav_msgs::msg::OccupancyGrid::SharedPtr /*msg*/) {
      // Log every map delivered to this late-joining subscription
      RCLCPP_INFO(logger_, "Received new map");
    });
  rclcpp::Rate r(1);
  r.sleep();

However, when I move the map_server into a new component container, this stops working immediately. Further, the ROS 2 CLI and RViz2 are unable to receive the message as well. The only exception is when the CLI, RViz2, or external-process node is already running before the transient-local publisher publishes, so that it receives the message at publication time. After that point, the message is unobtainable.

Steps to reproduce issue

Create a transient-local publisher / subscriber demo in a single container with IPC enabled; it works. Move one of the two into another container in another process, and the late-joining subscription no longer receives the message.

Additional information

See the Nav2 ticket tracking our IPC migration, ros-navigation/navigation2#4691, and the rclcpp PR implementing transient-local IPC, #2303.

@jefferyyjhsu
Contributor

Hi @SteveMacenski,

Thanks for sharing your findings. I think I have found the problem and put up my fix in PR #2708 for review.
Please feel free to give it a try and let me know if there are any problems.

Thanks!

@SteveMacenski
Collaborator Author

Thanks for the ultra-fast fix!
