Graph API for interacting with ros2_tracing events #1
Conversation
Signed-off-by: Michael Carroll <[email protected]>
I'm guessing you're going to bring
While it is up to you, I think the better long-term plan would be to add this as a "peer" API in tracetools_analysis. It seems like a useful way of looking at the same data in a way that is associated with the relevant ROS computation graph pieces.
Not yet. Is there a specific benchmark you're interested in?
Signed-off-by: Michael Carroll <[email protected]>
I'm mostly referring to the utilities for actually processing the trace, event by event. I guess you don't really need this if you process everything at once, though. I was just wondering.
Nothing specific. Just wondering how performance would be affected for typical use cases if we were to upgrade
We discussed performance a little bit offline and reached an initial conclusion. Here is a notebook that covers some of the potential ways this is used, with performance measurements: https://gist.github.com/mjcarroll/34e7f06d761c8c6ce2cce36027900b34
So, I haven't really looked at the new API in depth. But one of the things I noticed is that you removed the "procname" field from the events. This is an example of something that you can probably ignore for your analysis, but that I can't for mine. That is what I see a lot: most people have some specific analysis in mind, the code they design is tailored to that, and then it isn't reusable. Another example is that a lot of tracetools_analysis code expects certain kernel events, but in our use case we'd like to avoid requiring users to have all the permissions set up, so our traces don't contain those events.

I've been thinking a bit about how to avoid this, or whether it is actually possible to avoid. One thing I noticed is that a particular problem is that CTF is not queryable, so the first part of each processing pipeline converts it into something that is queryable, usually either an internal data structure or a database. The second part of the pipeline then puts some convenient (where "convenient" is very use-case specific) API on whatever the first part produced.

Therefore, I've been wondering whether it would be useful for us to have a base API that makes very few assumptions, if any, beyond knowing certain tracepoints (processing only those that are actually present) with certain fixed elements and certain variable elements (the context), but that is queryable and iterable efficiently. This can be implemented both for in-memory and for on-disk storage. Since our data is a time series, I would suggest making the base API time-series oriented; see the sketch below. We can then put things like graph APIs on top. What do you guys think?
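As a rough illustration of that idea, here is a minimal Python sketch; the `Event` and `InMemoryEventStore` names, fields, and methods are hypothetical, not taken from tracetools_analysis or any existing package:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Event:
    """One tracepoint occurrence: fixed fields plus a variable context."""
    name: str                               # e.g. 'ros2:rcl_publish'
    timestamp: int                          # nanoseconds
    context: Dict[str, Any] = field(default_factory=dict)


class InMemoryEventStore:
    """In-memory backend; an on-disk backend could expose the same interface."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def query(self, name: Optional[str] = None, start: int = 0,
              end: Optional[int] = None) -> Iterator[Event]:
        """Iterate events in timestamp order, optionally filtered by name
        and restricted to the time range [start, end]."""
        for event in sorted(self._events, key=lambda e: e.timestamp):
            if end is not None and event.timestamp > end:
                break
            if event.timestamp >= start and (name is None or event.name == name):
                yield event
```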
Removing procname is a hold-over from me doing initial analysis with entirely composed nodes, so it wasn't particularly interesting at the time. I will add that back in.
I think this actually makes a lot of sense. To begin with here, I started from the API that @christophebedard had set up in tracetools_read, but ended up dropping it in the short term to try a different approach.

Maybe it would make sense to have a separate meeting to discuss what this DataModel should look like? I imagine that it would look a lot like what is already implemented in tracetools_analysis, and I think I could likely rewrite the processing here to take advantage of that underlying structure.
I would think that something like Apache Arrow may be a reasonable approach to storing data both in memory and on disk. It has pretty substantial compatibility across languages if we wanted to go that route.
I agree. This is pretty similar to the way Trace Compass processes traces for performance/scaling purposes. Typically, "analyses" handle trace events one by one and write some kind of higher-level result to what is basically a time-series database. When the user zooms into a particular section of the trace, the corresponding range in the time-series database is loaded from disk into RAM, transformed into UI elements, and displayed. That means that, while it is kind of limiting sometimes, if you follow this principle you never need to load the whole trace (or the complete higher-level representation) into RAM.

Furthermore, if an analysis B depends on the result of an analysis A, it can simply process each trace event after analysis A has processed it. Then analysis B can query analysis A's time-series database as it is being built, just slightly delayed. This way, you only need to process trace events once. I tried to copy some of this approach.
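A minimal sketch of that chained-analysis pattern, reusing the hypothetical `Event` shape from the earlier sketch (the analysis names and the `'ros2:callback_start'`/`'ros2:callback_end'` event names are illustrative, not Trace Compass's actual API):

```python
from bisect import bisect_left, bisect_right


class CallbackDurationAnalysis:
    """'Analysis A': folds start/end events into (end_timestamp, duration) samples.
    Assumes events are processed in timestamp order, so samples stay sorted."""

    def __init__(self) -> None:
        self._starts = {}   # callback handle -> start timestamp
        self.samples = []   # sorted list of (end_timestamp, duration_ns)

    def on_event(self, event) -> None:
        if event.name == 'ros2:callback_start':
            self._starts[event.context['callback']] = event.timestamp
        elif event.name == 'ros2:callback_end':
            start = self._starts.pop(event.context['callback'], None)
            if start is not None:
                self.samples.append((event.timestamp, event.timestamp - start))

    def query_range(self, begin: int, end: int):
        """Load only the requested time range, as a UI zoom would."""
        return self.samples[bisect_left(self.samples, (begin,)):
                            bisect_right(self.samples, (end, float('inf')))]


class LongCallbackAnalysis:
    """'Analysis B': reads A's store as it is built, slightly delayed."""

    def __init__(self, upstream: CallbackDurationAnalysis, threshold_ns: int) -> None:
        self._upstream = upstream
        self._threshold_ns = threshold_ns
        self._seen = 0
        self.long_callbacks = []

    def on_event(self, _event) -> None:
        # A has already handled this event, so any new samples are visible now.
        for sample in self._upstream.samples[self._seen:]:
            if sample[1] > self._threshold_ns:
                self.long_callbacks.append(sample)
        self._seen = len(self._upstream.samples)


# Single pass over the trace: each event goes through A first, then B.
def run(events, analyses):
    for event in events:
        for analysis in analyses:
            analysis.on_event(event)
```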
Count me in :-)
I've been looking a bit at Parquet, which is supported by Apache Arrow, to store data, primarily because it is also supported by Dask (the big-data variant of pandas). When looking at the API, I noticed that Arrow also has some query functionality, but I couldn't find anything similar to pandas' merge, which I, at least, am using heavily to merge the metadata (primarily what the functions are called) with the other trace data.
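For concreteness, a small sketch of that workflow with pyarrow: round-trip the event table through Parquet, then do the metadata join in pandas. The DataFrames and column names here are made up for illustration:

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical trace data: raw events plus the symbol metadata to merge in.
events = pd.DataFrame({
    'timestamp': [100, 250, 400],
    'callback':  [0xA, 0xB, 0xA],
})
symbols = pd.DataFrame({
    'callback': [0xA, 0xB],
    'symbol':   ['timer_callback', 'sub_callback'],
})

# Round-trip the events through Parquet via Arrow.
pq.write_table(pa.Table.from_pandas(events), 'events.parquet')
events = pq.read_table('events.parquet').to_pandas()

# The join itself still happens in pandas, since Arrow has no direct
# equivalent of pandas' merge.
annotated = events.merge(symbols, on='callback', how='left')
print(annotated)
```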
Is there interest in languages other than Python? Maybe for performance? If there are significant performance advantages, it might be worth writing the converter from CTF to Parquet in C++.
I think it could make sense to write the converter in a non-interpreted language. I would want some evidence that it would be worth the effort before starting, though.
I want to continue iterating in this repo, so I'm going to branch this conversation over to here: ros2/ros2_tracing#35
This introduces an alternative mechanism for interacting with the trace data handled by tracetools_analysis (https://github.com/ros-tracing/tracetools_analysis).
The primary data structure is a graph that represents the ROS 2 computation graph. From this entry point, users can introspect the runtime layout of a traced ROS 2 system as recorded via tracetools.
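A hypothetical sketch of such a structure, using networkx as an assumed backing library; the entity names and `relation` attributes are illustrative, not this PR's actual schema:

```python
import networkx as nx

# Vertices are ROS 2 graph entities recovered from trace events;
# edges record how they relate to one another.
graph = nx.DiGraph()
graph.add_node('node:talker', kind='node')
graph.add_node('pub:/chatter', kind='publisher')
graph.add_node('sub:/chatter', kind='subscription')
graph.add_node('node:listener', kind='node')
graph.add_edge('node:talker', 'pub:/chatter', relation='publishes')
graph.add_edge('pub:/chatter', 'sub:/chatter', relation='topic')
graph.add_edge('sub:/chatter', 'node:listener', relation='belongs_to')
```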
In addition to the individual events, the data is associated across elements. This allows users to inspect causal "chains" of events across a ROS 2 computation graph.
As an example, in the Mont Blanc test topology, we can trace the sequence of events that cause the `arequipa` subscription to be fired:
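The actual trace output from the PR is not reproduced here. As a hypothetical sketch of what such a query could look like against the graph sketched above (with a placeholder `/chatter` topic standing in for the Mont Blanc wiring):

```python
from collections import deque


def causal_chain(graph, entity):
    """Walk backwards from `entity`, yielding every upstream entity (BFS)."""
    seen = {entity}
    queue = deque([entity])
    while queue:
        current = queue.popleft()
        yield current
        for upstream in graph.predecessors(current):
            if upstream not in seen:
                seen.add(upstream)
                queue.append(upstream)


# Every entity that could have caused the subscription to fire:
for step in causal_chain(graph, 'sub:/chatter'):
    print(step)
```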