Increase Load, Save and other methods of OdfPackage performance #299

Open
Pierre-Lemaigre opened this issue Jun 5, 2024 · 3 comments

Comments

@Pierre-Lemaigre

Hi,

This issue might be related to issue #10.

I'm currently working on a project that depends heavily on your library. I'm trying to speed it up, and with the help of a profiler I've noticed that some OdfPackage methods are somewhat slow (not horribly, but noticeably). All of these methods are related to ZipArchiveOutputStream. I read the code, and in some places I think I could modify those methods to use the Java NIO API more, in order to gain some performance.

One example is the org.odftoolkit.odfdom.pkg.OdfPackage#getBytes method: instead of calling it, reading the entry through the FileSystems API (part of NIO) from the ODT document saved on disk would be ten times faster (depending on the size of the zip entries).
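
A minimal sketch of that idea, assuming the ODT document is a regular zip file on disk. The class `ZipEntryReader` and the method `readEntry` are hypothetical names for illustration and are not part of ODFDOM; only JDK classes are used. No speedup is implied by the sketch itself, that would have to be measured.

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not an ODFDOM class.
public class ZipEntryReader {

    /** Reads one entry (for example "content.xml") from a zip-based file on disk. */
    public static byte[] readEntry(Path odtFile, String entryName) throws IOException {
        // Mounts the archive through the JDK's zip file system provider.
        // The cast avoids overload ambiguity on newer JDKs.
        try (FileSystem zipFs = FileSystems.newFileSystem(odtFile, (ClassLoader) null)) {
            return Files.readAllBytes(zipFs.getPath(entryName));
        }
    }
}
```

A call would look like `ZipEntryReader.readEntry(Paths.get("document.odt"), "content.xml")`, where the file name is a placeholder.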

Also, the performance gain would only benefit files on disk, not documents loaded from an InputStream, so there would be some branching to do in order to use the NIO API only for ODT documents backed by a file on disk.
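
The branching could be sketched roughly as follows. This is an assumption about how such a split might look, not how OdfPackage is implemented: `EntryLoader`, `fromFile`, `fromStream` and `load` are hypothetical names, and the stream fallback uses the JDK's `ZipInputStream` purely for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not an ODFDOM class.
public class EntryLoader {

    // Disk-backed path: mount the file as a zip file system and read the entry directly.
    public static byte[] fromFile(Path file, String entryName) throws IOException {
        try (FileSystem zipFs = FileSystems.newFileSystem(file, (ClassLoader) null)) {
            return Files.readAllBytes(zipFs.getPath(entryName));
        }
    }

    // Stream-backed path: no random access, so scan the entries sequentially.
    public static byte[] fromStream(InputStream in, String entryName) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = zin.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        throw new FileNotFoundException(entryName);
    }

    // Branch point: file is null when the document was loaded from a stream.
    public static byte[] load(Path file, InputStream in, String entryName) throws IOException {
        return (file != null) ? fromFile(file, entryName) : fromStream(in, entryName);
    }
}
```

Both branches return the same bytes for the same entry, so callers would not need to know which one was taken.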

What do you think about that? I can help if you want, including with the performance tests, as I have written some for my project.

Regards,
Pierre Lemaigre

@mistmist
Contributor

mistmist commented Jun 7, 2024

hi Pierre,
i suspect a lot of this IO code was written around 2009 when it had to run on versions of Java that are long obsolete.
if you can measurably improve the performance there, please send PRs.
we could also upgrade the Apache zip library to a newer version, but we don't have any involvement with that project, so if you need to make changes there it could cause some delays...

@svanteschubert
Contributor

@Pierre-Lemaigre I second what Michael has written, we are grateful for any help!
In addition, performance tests are especially welcome. I always struggled with those: they worked when the before/after comparison ran on the same machine, but not with automation, such as GitHub regression tests on every commit.
Thanks in advance, Pierre!
Svante
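
One way to make a before/after comparison less dependent on the machine is to time both variants in the same run and compare their ratio rather than absolute numbers. The following is only a rough sketch under that assumption: `RelativeTiming`, `medianNanos` and `speedup` are hypothetical names, and a dedicated harness such as JMH would give more reliable numbers.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only.
public class RelativeTiming {

    /** Median wall-clock time of one task run in nanoseconds, after warm-up runs. */
    public static long medianNanos(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run(); // give the JIT a chance to compile the hot path
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    /** A ratio above 1 means the candidate was faster than the baseline in this run. */
    public static double speedup(Runnable baseline, Runnable candidate, int warmups, int runs) {
        long base = medianNanos(baseline, warmups, runs);
        long cand = medianNanos(candidate, warmups, runs);
        return (double) base / Math.max(1L, cand);
    }
}
```

A regression check could then assert that the ratio stays above a chosen threshold; both the threshold and the number of runs would need tuning for a shared CI runner.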

@Pierre-Lemaigre
Author

Hi,
I will work on this as soon as I have some time on my end.
Thanks for the response.
Pierre
