Harvester is slow #267
Comments
It would be important to know what is slowing down the script. Looking at the code, the loops are the likely cause. Vectorised programming would probably improve this. Avoiding loops and if/else statements where possible might improve speed substantially. You probably do not need to loop over each row of the database, as this is not an inherently sequential operation; it could be done all at once. There are tools such as map(), list comprehensions, etc. See here:
It would be wiser to understand the cause of the slowdown before proposing random solutions. There are fewer than 200 rows in the datasets table; looping through them is certainly not the cause. A programme without loops and ifs will not do much, and the map function also produces a loop, just of a different kind. Also, keep in mind that Python is an interpreted language.
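To put a number on the cost of iterating over a table of this size, a rough sketch along these lines could be timed. The `rows` list and the `process_row` function are hypothetical stand-ins, not code from the harvester script, and the absolute timings will vary by machine.

```python
import timeit

# Hypothetical stand-in for the datasets table: 200 small records.
rows = [{"id": i, "name": f"dataset-{i}"} for i in range(200)]


def process_row(row):
    """Hypothetical per-row work that touches no network or disk."""
    return row["name"].upper()


def with_loop():
    results = []
    for row in rows:
        results.append(process_row(row))
    return results


def with_map():
    # map() still calls process_row once per row; it is a loop in another form.
    return list(map(process_row, rows))


# Both variants produce the same output.
assert with_loop() == with_map()

# Average time of one full pass over all 200 rows, in seconds.
repeats = 1000
loop_seconds = timeit.timeit(with_loop, number=repeats) / repeats
map_seconds = timeit.timeit(with_map, number=repeats) / repeats
print(f"explicit loop: {loop_seconds:.6f} s per pass")
print(f"map():         {map_seconds:.6f} s per pass")
```

If a pass like this takes a tiny fraction of a second while the script takes 11 minutes, the time is being spent inside the per-row work, whatever that turns out to be, rather than in the iteration itself.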
It is a discussion, not a solution. Profiling would certainly help to identify the causes. It is precisely because the language is interpreted that you need vectorised programming, which is why I suggested looking into it. The same holds in other languages, e.g. R. Anyway, profiling will probably reveal the bottleneck.
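As a minimal sketch of what such profiling could look like, the standard-library `cProfile` and `pstats` modules can report where the time goes. The `harvest` and `fetch_metadata` functions below are hypothetical placeholders (the slow step is simulated with a short sleep) and do not reflect the actual structure of the script.

```python
import cProfile
import io
import pstats
import time


def fetch_metadata(dataset_id):
    """Hypothetical slow step, simulated with a short sleep."""
    time.sleep(0.001)
    return {"id": dataset_id}


def harvest(n_datasets=20):
    """Hypothetical stand-in for the script's main routine."""
    return [fetch_metadata(i) for i in range(n_datasets)]


profiler = cProfile.Profile()
profiler.enable()
records = harvest()
profiler.disable()

# Collect the ten most expensive calls by cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

An existing script can also be profiled without modification by running it as `python -m cProfile -s cumulative <script>`; the functions at the top of the cumulative-time listing are the first candidates to inspect.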
I have never seen the term "vectorised programming" before. Perhaps you are referring to array programming, but that is completely unrelated to this issue. It is also unrelated to the fact that Python is an interpreted language. Profiling could help, but there are certainly easier ways to study this issue.
Vectorisation is a well-established term and applies in particular to interpreted languages such as Matlab, R and Python, where operations are applied to whole lists of strings or arrays. As interpreted languages are very slow at looping, it is advisable to avoid loops where possible through vectorised programming, in particular inner loops. The only problem is that you need to rethink the implementation when avoiding loops. All of the above is quite relevant, and it would be more productive to first study the literature before flagging terms as unrelated and inappropriate.
I clearly lack the knowledge for this task.
The harvester.py script is taking 11 minutes to run; this might become an issue in the long term. Can it be optimised somehow?