Making pycodestyle faster #754
`run_check` works by iterating over a list of strings, then dynamically accessing the appropriate checker attributes. Dynamic attribute access is slow, and doing it in a tight loop is really slow. Speed can be improved significantly by passing the checker in as an argument to the check and calling the attributes directly. This requires some argument shuffling.
`init_checker_state` gets called a lot, but why? Performance can be improved by doing it just once.
Checks were being stored along with their "argument names", but only `run_check` and `init_checker_state` were using them, so they can be cut.
While this is a fantastic improvement in performance, many tools use pycodestyle as a library with these checks as public APIs. That means this is a significant breaking change that will need to be very carefully reviewed and considered.
Well, it looks like the times in that post are all bullshit. It turns Here are the real numbers: Before
After
Whoops! But hey, that's still 23% faster, right? I don't know, maybe Oh well, I still had fun spending the weekend poking around in the
Could we get some of the performance benefit by using a dictionary instead of
Sorry to take so long to reply, I have been busy! Regarding the API change, I do think that this new API is simpler than Maybe a change like this would be appropriate for a 3.0 release? Then again, I have no idea how much outside code relies on this API.
@sigmavirus24 I think I tried something like that, but I can't remember the details. What do you have in mind?
I've been on a Python performance optimization kick recently (see pylint-dev/astroid#497), and I'm a `pycodestyle` user, so I figured I would give it a look and see if its performance could be improved at all.
Of course, `pycodestyle` is already pretty fast, so to give it some stress I'm testing it out by pointing it right at a large repo, namely Zulip (https://github.com/zulip/zulip). In particular, my test command is `time ~/pycodestyle/pycodestyle.py -qq ~/zulip`.
Here are the times from three runs of master:
I used the `yappi` profiling library to see if there were any hotspots in the code. There were. Take a look at the graph below. The brighter the box, the hotter the spot. In more detail, each box represents a function and has three numbers: 1) the percentage of total CPU time spent in that function, 2) the percentage of total CPU time spent in that function but not its subcalls, and 3) the number of times the function was called.
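The three numbers on each box correspond to figures most Python profilers report: cumulative time, own time, and call count. As a rough illustration, here is a minimal sketch that reads them out with the standard library's `cProfile` and `pstats`, swapped in for `yappi` only so the example has no third-party dependency. The functions `hot` and `driver` are hypothetical stand-ins, not pycodestyle code.

```python
import cProfile
import pstats


def hot(n):
    # Hypothetical hotspot: a very short body that is called many times.
    return n * 2


def driver(calls):
    # Hypothetical caller: it does little itself and delegates to hot().
    total = 0
    for i in range(calls):
        total += hot(i)
    return total


profiler = cProfile.Profile()
profiler.enable()
result = driver(10000)
profiler.disable()

stats = pstats.Stats(profiler)
# stats.stats maps (filename, lineno, funcname) to
# (primitive calls, total calls, own time, cumulative time, callers).
by_name = {key[2]: value for key, value in stats.stats.items()}
hot_calls = by_name["hot"][1]
driver_own = by_name["driver"][2]
driver_cumulative = by_name["driver"][3]

# Sorting by own time surfaces functions that are expensive in themselves,
# as opposed to functions that merely call something expensive.
stats.sort_stats("tottime").print_stats(5)
```

A function whose own time is close to its cumulative time, combined with a very large call count, is the pattern described here as a bright box.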
The red box that sticks out is `Checker.run_check`. It is called well over two million times, and 27.7% of total CPU time is spent there, almost all of which is in the function itself. This seems like an awful lot considering how short the function is:
So why does it suck up so much time?
I think I've worked out how it goes. When a check is registered (with `register_check`), its arguments are extracted with the `inspect` library and stored as a list of strings. When a check is run, `run_check` iterates over its associated list of arguments, dynamically accesses those attributes of the `Checker`, and then passes those values to the check to actually run.
The problem here is that dynamic attribute access is slow, and doing
it in tight loops is really slow (see
pylint-dev/astroid#497 for a harrowing cautionary
tale on this subject). My idea was to see if there was a way to do
away with the dynamic attribute access, basically by "compiling" the
attribute access into the code.
It turns out that this can be accomplished by passing the checker instance into the check as an argument, and then calling the attributes directly on the checker. Implementing this change involves a large-scale shuffling of arguments and strings, but other than that not much changes. `register_check` has to take the check's argument names as arguments now, since they are no longer the actual arguments. `run_check` itself can also be done away with, since all it would have to do would be to call the check with the checker as an argument, and that can be done inline.
This change resulted in a substantial speedup:
Here is the resulting `yappi` graph:
This graph is a lot more colorful than the last one. This means that the work is spread out more evenly among the various functions and there isn't one overwhelmingly critical hotspot.
One function that stuck out to me was `Checker.init_checker_state`. After some experimentation, it appeared that despite taking up almost 6% of total CPU time, the function didn't do much. Cutting it provided a non-negligible speed improvement:
A little further poking around revealed that `run_check` and `init_checker_state` were the only consumers of the "argument names", so I cut those out too. This led to some nice code simplification and an ever-so-slight speedup:
Here is the `yappi` graph after these changes:
The major hotspot is now `tokenize.tokenize`, which is part of the standard library. This is good, as it suggests that `pycodestyle` is nearing the point of being as fast as it can be. After that, the next most expensive functions are `check_logical`, `generate_tokens`, `build_tokens_line`, `check_all`, `maybe_check_physical`, and `_is_eol_token_`. These functions all feel to me like they are doing something inefficiently, but I don't understand them well enough to say what.
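One way to get a feel for how much of the remaining time is simply the cost of tokenizing is to time the standard-library tokenizer on its own. The sketch below is only an illustration: it uses `tokenize.generate_tokens` from the standard library (the string-based counterpart of `tokenize.tokenize`) on a hypothetical synthetic input, not on the Zulip repository, so its timing says nothing about the measurements in this post.

```python
import io
import time
import tokenize

# Hypothetical input: a tiny function definition repeated many times.
source = "def f(x):\n    return x + 1\n\n" * 2000

start = time.perf_counter()
# generate_tokens takes a readline callable that returns str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
elapsed = time.perf_counter() - start

print(f"{len(tokens)} tokens in {elapsed:.3f}s")
```

Whatever time this takes on a given input is a floor that no change to the checks themselves can remove.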
These measurements were all taken running master with