Performance Improvements #26

Open
vezaynk opened this issue Aug 29, 2017 · 4 comments

Comments

@vezaynk
Owner

vezaynk commented Aug 29, 2017

Performance is below what I would like, but it is hard to improve it beyond a certain point.

This issue should never be closed and will track whatever steps are taken towards a more optimised solution along with benchmarks.

@vezaynk
Owner Author

vezaynk commented Aug 29, 2017

I have marked this issue as "help wanted" as I can always use a hand as well as "hacktoberfest" to take advantage of free labour.

@vezaynk
Owner Author

vezaynk commented Jan 22, 2018

The last two patches (#67 and #68) resulted in a speed improvement of over 30% on my laptop. It is less noticeable on more powerful machines.

@eugenzaharia

eugenzaharia commented May 30, 2020

You could try PHP's curl_multi_* functions (curl_multi_init() and friends) instead of a single curl handle, because they run requests in parallel, which would cut the total response time. Also consider building the XML in a SimpleXMLElement or DOMDocument object and writing it to the file once, instead of performing many separate I/O operations.
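For reference, a minimal sketch of what parallel fetching with the curl_multi_* functions could look like. This is not code from this project: the helper name `fetch_all`, its parameters, and the chosen options are assumptions made for illustration only.

```php
<?php
// Hypothetical helper (not part of this project): fetch several URLs
// concurrently and return an array of url => response body.
function fetch_all(array $urls, $timeout = 10)
{
    $multi = curl_multi_init();
    $handles = array();

    // Duplicate URLs are collapsed so each URL maps to exactly one handle.
    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are still running.
    $running = 0;
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            // Wait for socket activity instead of busy-looping.
            curl_multi_select($multi, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = array();
    foreach ($handles as $url => $ch) {
        // May be null or an empty string if the transfer failed.
        $bodies[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }

    // The handles are released once they go out of scope.
    return $bodies;
}
```

Called as `fetch_all(array('https://example.com/a', 'https://example.com/b'))`, this would issue both requests at once rather than one after the other. A real crawler would probably also want to cap how many transfers run at the same time, which this sketch does not do.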

@vezaynk
Owner Author

vezaynk commented May 31, 2020

@eugenzaharia curl_multi is very awkward to work with. I mean, PHP is awkward in general, but that's a separate story.

I'd be cautious about using SimpleXMLElement or anything similar, as it can introduce dependencies and break things for current users. The primary design goal of this script is for it to run reliably in almost any PHP environment, including the weird ones.

Streaming to the file system instead of buffering the file is the result of a constraint: some websites turned out to have a lot of links, and the script would be run under low-memory conditions. Basically, it runs out of memory less often this way. Design-wise, it is not really better or worse.
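To illustrate the trade-off, here is a rough sketch of the streaming approach using only core file functions, so no XML extension is needed. It is not the script's actual code: the helper name `stream_sitemap` is made up, and the sitemap-style `<urlset>` output format is an assumption, since the thread does not spell out the file format.

```php
<?php
// Hypothetical helper (not the script's real code): write each entry to disk
// as soon as it is known, so memory use stays flat however many links there are.
function stream_sitemap($path, $urls)
{
    $fh = fopen($path, 'w');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }

    // Assumed output format: a sitemap-style <urlset> document.
    fwrite($fh, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
    fwrite($fh, "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");

    $count = 0;
    foreach ($urls as $url) {
        // Escape &, <, > and quotes so the output stays well-formed XML.
        $loc = htmlspecialchars($url, ENT_QUOTES | ENT_XML1, 'UTF-8');
        fwrite($fh, "  <url><loc>$loc</loc></url>\n");
        $count++;
    }

    fwrite($fh, "</urlset>\n");
    fclose($fh);

    return $count; // number of entries written
}
```

Only one entry is held in memory at a time, at the cost of one write call per entry. Building the whole document in a DOMDocument first would mean fewer writes, but memory use would grow with the number of links.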

I'd rather rewrite the entire thing into a dynamic PHP extension if we're rewriting things.


2 participants