Reading the XTCE file slower than parsing #112

Open
greglucas opened this issue Nov 10, 2024 · 0 comments · May be fixed by #113

Context

The initial XML read that creates the definition objects can be slower than the actual packet parsing when the file contains a lot of parameters and items. This is relevant for the CTIM file, which has lots of parameters. A quick example script for testing:

from pathlib import Path
import time

from space_packet_parser import definitions

xtce_document = Path("tests/test_data/ctim/ctim_xtce_v1.xml")

# perf_counter() is a monotonic clock intended for measuring elapsed time
start = time.perf_counter()
packet_definition = definitions.XtcePacketDefinition(xtce_document)
total_time = time.perf_counter() - start
print(f"It took {total_time:.2f} seconds to parse the XTCE document")

print("Number of containers:", len(packet_definition._sequence_container_cache))
print("Number of parameters:", len(packet_definition._parameter_cache))
print("Number of parameter types:", len(packet_definition._parameter_type_cache))

On main, this currently takes ~12.8 s:

It took 12.79 seconds to parse the XTCE document
Number of containers: 39
Number of parameters: 9491
Number of parameter types: 15

Profiling shows that the slowdown in this case is in the _find_parameter() method, which issues an XML findall() call for every parameter. With 9491 parameters, that means thousands of separate searches through the document.
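For anyone who wants to reproduce the profile, here is a minimal sketch using the standard-library cProfile and pstats modules. The profile_call() helper is a hypothetical name made up for this example; it is not part of space_packet_parser.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Hypothetical helper for illustration only. Returns a tuple of
    (result of func, text report sorted by cumulative time).
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Render the collected statistics into a string instead of stdout
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def example_work():
    # Stand-in workload so the sketch runs without the XTCE test file
    return sum(i * i for i in range(1000))


result, report = profile_call(example_work)
print(report)
```

With the test script above, the same helper could presumably be called as profile_call(definitions.XtcePacketDefinition, xtce_document), and the functions with the largest cumulative time would appear at the top of the report.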

Implementation Plan

findall() can instead be called just once to populate the cache with every item, rather than once per parameter as we iterate through the list. I propose adding a new _populate_parameter_cache() method that gets called during initialization, so that later lookups go through the cache rather than through _find_parameter().
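As a rough illustration of the idea (not the actual implementation in the linked PR), the sketch below contrasts a per-name lookup with a single-pass cache. It assumes the standard-library xml.etree.ElementTree and a simplified, namespace-free XTCE-like document; the functions find_parameter() and populate_parameter_cache() are stand-ins written for this example, not the library's methods.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free stand-in for an XTCE document (assumption)
XML = """
<SpaceSystem>
  <ParameterSet>
    <Parameter name="A" parameterTypeRef="UINT8"/>
    <Parameter name="B" parameterTypeRef="UINT16"/>
    <Parameter name="C" parameterTypeRef="UINT8"/>
  </ParameterSet>
</SpaceSystem>
"""


def find_parameter(root, name):
    """Per-lookup search: one findall() over the ParameterSet per call."""
    matches = root.findall(f"ParameterSet/Parameter[@name='{name}']")
    if len(matches) != 1:
        raise ValueError(f"Expected exactly one parameter named {name!r}")
    return matches[0]


def populate_parameter_cache(root):
    """Single pass: one findall(), after which lookups are dict accesses."""
    return {
        element.get("name"): element
        for element in root.findall("ParameterSet/Parameter")
    }


root = ET.fromstring(XML)
cache = populate_parameter_cache(root)

# Both approaches resolve the same parameter; only the cost differs
print(cache["B"].get("parameterTypeRef"))
print(find_parameter(root, "B").get("parameterTypeRef"))
```

Looking up N parameters with the per-name search scans the parameter list N times, while the cache scans it once and then does constant-time dictionary lookups, which is where the expected speedup comes from.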

greglucas linked a pull request on Nov 10, 2024 that will close this issue