Rosbag conversion to the parquet format #1401

Closed
xecarlox94 opened this issue Jun 16, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@xecarlox94

xecarlox94 commented Jun 16, 2023

Description

I would like to have a function to convert a rosbag file into a parquet one. I want to load some rosbag datasets into pandas to use the existing data science ecosystem. I have tried to convert these datasets into CSV files, but I faced some issues decoding image data from some of the messages.

I think we could improve that by adding an extra conversion option to the rosbag2 CLI that converts and maps the messages to the parquet format without losing the encoding information, as happened to me when I converted the rosbag messages to CSV.
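
To illustrate the kind of loss described above, here is a minimal sketch using only the Python standard library. The topic name, encoding value, and byte payload are made up for illustration and are not taken from a real bag. It shows that a raw bytes field written to CSV comes back as text rather than bytes, so either a format with a native binary column type (which Parquet has) or an explicit base64 step is needed to recover the payload.

```python
import base64
import csv
import io

# Hypothetical image-like message; all values are illustrative only.
message = {
    "topic": "/camera/image_raw",
    "encoding": "rgb8",
    "data": bytes([0, 255, 16, 32]),
}


def csv_round_trip(row):
    """Write one row to an in-memory CSV file and read it back."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)
    buf.seek(0)
    return next(csv.DictReader(buf))


# Naive export: the csv module calls str() on the bytes object,
# so the payload is stored as the text of its repr, not as bytes.
naive = csv_round_trip(message)
payload_survived = naive["data"] == message["data"]

# CSV workaround: store the payload as base64 text and decode after reading.
encoded = dict(message, data=base64.b64encode(message["data"]).decode("ascii"))
restored = csv_round_trip(encoded)
restored_bytes = base64.b64decode(restored["data"])

print(type(naive["data"]).__name__, payload_survived)
print(restored_bytes == message["data"])
```

Whether this matches the exact failure seen with the original datasets is an assumption; it is only one common way binary fields get damaged in a CSV export.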

I believe that this feature could unlock a lot of value and potential for data engineers/scientists because it would be more convenient to work with ros datasets.

Related Issues

Not directly.

I should note that there is a similar issue but I have seen some interest in improving data processing using the parquet format, on ros2/ros2_tracing:
issue

Completion Criteria

No completion criteria at the moment.

Implementation Notes / Suggestions

I have no suggestions because I am new to ROS, but I would be available to implement this feature.

Testing Notes / Suggestions

No suggestions at the moment.

@xecarlox94 xecarlox94 added the enhancement New feature or request label Jun 16, 2023
@emersonknapp
Collaborator

emersonknapp commented Jun 16, 2023

My initial instinct for this would be to create a new StoragePlugin for the parquet format. You wouldn't need to change any core rosbag2 code; the plugin could be independent. Then, you could use ros2 bag convert to convert any existing bag to the new storage format. I don't think you'd run into any technical blockers, and it should be totally doable as a standalone plugin library. See https://github.com/ros2/rosbag2/blob/rolling/docs/storage_plugin_development.md for a description of plugin development.
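
As a rough sketch of that workflow, the snippet below builds the text of an output-options file for ros2 bag convert and assembles the matching command line. The -i and -o flags and the output_bags, uri, storage_id, and all keys follow the rosbag2 README as I understand it, but key names can differ between rosbag2 releases, so check the README for your distribution. The storage id "parquet" and the bag and file names are assumptions: they depend on what a (not yet existing) plugin registers and on your own data.

```python
import shlex


def build_convert_job(input_bag, output_uri, storage_id, options_path):
    """Return (yaml_text, command) for a `ros2 bag convert` invocation.

    Nothing is executed or written to disk here; save yaml_text to
    options_path yourself before running the command.
    """
    yaml_text = (
        "output_bags:\n"
        f"- uri: {output_uri}\n"
        f"  storage_id: {storage_id}\n"  # assumed id of the new plugin
        "  all: true\n"  # convert every topic; key name may vary by release
    )
    command = ["ros2", "bag", "convert", "-i", input_bag, "-o", options_path]
    return yaml_text, command


yaml_text, command = build_convert_job(
    input_bag="my_bag",              # hypothetical existing bag
    output_uri="my_bag_parquet",     # hypothetical output location
    storage_id="parquet",            # hypothetical storage plugin id
    options_path="convert_options.yaml",
)

print(yaml_text)
print(shlex.join(command))
```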

https://github.com/ros-tooling/rosbag2_sample_plugins provides some example plugins. They might be a little out of date, since they were written shortly after the Galactic release, but they're probably still mostly correct.

Note: ReadWriteInterface is the way to implement a writer, but if you don't want to support reading/playback, you could just throw an exception (or similar) from the reader methods and otherwise leave them empty, with a message such as "This is a write-only plugin".
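
The write-only idea can be sketched in a few lines. Real rosbag2 storage plugins are written in C++ against ReadWriteInterface; the Python classes and method names below are hypothetical stand-ins that only illustrate the pattern of implementing the writer side and rejecting reads with a clear error.

```python
from abc import ABC, abstractmethod


class ReadWriteStorage(ABC):
    """Hypothetical stand-in for a combined read/write storage interface."""

    @abstractmethod
    def write(self, message):
        ...

    @abstractmethod
    def has_next(self):
        ...

    @abstractmethod
    def read_next(self):
        ...


class WriteOnlyStorage(ReadWriteStorage):
    """Implements the writer side; every reader method refuses to run."""

    def __init__(self):
        self.rows = []  # stands in for rows buffered for the output file

    def write(self, message):
        self.rows.append(message)

    def has_next(self):
        raise NotImplementedError("This is a write-only plugin")

    def read_next(self):
        raise NotImplementedError("This is a write-only plugin")


storage = WriteOnlyStorage()
storage.write({"topic": "/chatter", "data": b"hello"})  # illustrative message

try:
    storage.read_next()
    read_error = None
except NotImplementedError as exc:
    read_error = str(exc)

print(len(storage.rows), read_error)
```

Failing loudly on read, rather than silently returning nothing, makes it obvious to anyone who tries playback that the format is intended for export only.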

@xecarlox94
Author

That is great! I had the same approach in mind.

Thank you for the resources and for the advice. I may comment on this post if I have any issues implementing this plugin.

Sure, I will implement the writer the way you advised.

Many thanks!

@emersonknapp
Collaborator

It looks like this work can be tracked in a separate repository with a new parquet storage plugin. In that case, I'll close this issue, since it is not work to be done in this repository. Feel free to comment on it, though.
