-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MacRuby performance when parsing file #37
Comments
There have been some issues with MacRuby before, so I find it encouraging that the parsing seems to work, albeit at this slow speed. I've been using two scripts for benchmarking and profiling that I'll add to the repository in a minute. Meanwhile, can you check whether or not the following gems are compatible with you version of MacRuby:
Also, do you have the |
Alright, the benchmark and profiling are already in the repository. Unfortunately, FIrst, see whether or not you can get benchmark.rb to run on your system. You may have to install Anyway, there are two options n and k currently set to 1001 and 20. This means that the script will execute with 20 entries, then 40, then 60 and so on up to 1001. This typically gives you a nice plot. Since your test with 67 entries already took half a minute, you will probably want to use a smaller number for n first (for example, start with 100). It will still take a few minutes, but do try to run it and let me know the results. |
Here is the output when I run the benchmark with MRI 1.9.2 on my system:
And here you can see the plotted output. As you can see both functions are (roughly) linear (that's good); but the function g executes considerably slower as the input size increases. This means that for large numbers of N, it takes longer to parse a bibliography containing N entries than to parse N individual entries. This is because the Bibliography class analyses stuff like resolving cross-references etc. Anyway, looking at the graph shows us that this should be optimized going forward. ;-) |
I just tried doing this. First, I had to modify lib/bibtex.rb to add require 'rubygems', otherwise it chokes on require 'multi_json' (I guess macruby doesn't have automatic rubygems support. Afterwards, it runs fine. However, the "problem" is that the output for MRI (1.9.2p180) and MacRuby (0.12, Ruby 1.9.2) are extremely similar. Here is the output for MRI:
and here is the output for MacRuby
I reran my testscript, thinking that perhaps there had been an update to MacRuby, bibtex-ruby, or one of the other libraries used, which caused a speedup, but the time difference between parsing a bibliography of 65 publications is still more than 150x. |
Stian, thanks for getting back to this! It is generally frowned upon in the Ruby community to force require rubygems in a library. If you don't want to edit the file, you can just run your script with Regarding the results I can see that a) you have a much faster computer than I ;-) and b) you're, right that the results are extremely similar. This means that the parser is not significantly slower in MacRuby and therefore we're losing performance somewhere else. Just to make sure, in your test script above you're printing the bibliography (the Since we cannot use ruby-prof we'll have to adapt the benchmark script to our needs. |
The rubygems thing is no biggie. I saw somewhere that macruby removed it to I removed the print line entirely, and still [folders2web] time ruby speedtest.rb I have a 2.7 GhZ i7 MacBook Pro 13" :) Stian |
More numbers: Only parsing bibtex-file, not printing Only parsing bibtex-file with 3 entries Only requiring rubygems and bibtex-ruby, not doing anything else Only printing "Hi", with no libraries required I did the last to check that there wasn't a longer startup time required for MacRuby, which it wasn't. It does take significantly longer to load bibtex-ruby, but even then, the time it spends on even just 3 entries is significantly longer. |
Please try the following: In benchmark.rb replace the entry with the contents of your bibtex file (the 3 entries). That is:
Next, change n, k to:
And rerun the test, make a note of the Results. Then, change the two lines inside the f/g tests from |
Did as you asked. Output before I changed the parse lines: [test] macruby ./benchmark.rb And here it is with the two lines changed
|
Interesting. Could you post the entries you're using as input? Are there by any chance cross references? |
Plus, as a reference, could you run the tests above with MRI too? |
input = <<-END %% Created for Stian Haklev at 2011-09-20 00:43:44 -0400 %% Saved with string encoding Unicode (UTF-8) @inproceedings{vladoiu2011open, @inproceedings{valjataga2011open, @Unpublished{resendes2011comprehensive, END [test] ruby -KU benchmark.rb 11:40:00 [test] ruby -KU benchmark.rb 11:39:53 |
Hmm, I think we're getting closer. The difference seems to be triggered by something in your entries. Could you try: stripping the comments and the file attachment fields (and if that does not make a difference) other fields until the execution time comes down? I suspect the problem has to do with parsing the file attachment entries which are quite long. If you could confirm this, I'll come up with a benchmark file that will help us pinpoint what's causing the problem in parser or lexer. |
New input: input = <<-END %% Created for Stian Haklev at 2011-09-20 00:43:44 -0400 %% Saved with string encoding Unicode (UTF-8) @inproceedings{vladoiu2011open, @inproceedings{valjataga2011open, @Unpublished{resendes2011comprehensive, END new result [test] macruby benchmark.rb (with Parser.new.parse) |
Another attempt input: %% Created for Stian Haklev at 2011-09-20 00:43:44 -0400 %% Saved with string encoding Unicode (UTF-8) @inproceedings{vladoiu2011open, @inproceedings{valjataga2011open, @Unpublished{resendes2011comprehensive, END result |
Can you keep trying removing fields from your entries (also you may want to try and remove the comments at the beginning of the file) and rerunning the benchmark? The thing is, when you ran my original test with this entry there was hardly any difference in speed between MRI and MacRuby. Using your entries there is; so there has got to be a difference that is triggered by some detail in your entries and we need to try and locate that detail. This is normally quite easy with ruby-prof because it gives you a summary of the time spent in individual methods, but for now our best bet is to consecutively remove data from your entries until there is a big jump in execution time. My test entry was a one book with author, address, date-added, date-modified, keywords, publisher, series, title and year fields. Now the odd thing is that your entries don't seem to contain anything peculiar. Perhaps you start by looking at each entry individually; use my test entry as a reference: each of your entries, when executed individually, should not take considerably longer than mine. If any of them do take longer, we can begin by removing individual fields to see which fields are causing the performance penalty. |
I am sorry, I must have made a mistake with the original run of benchmark, which might have set us back a bit. Rerunning it with the original settings in benchmark (as supplied with the gem) shows a huge difference between MRI and MacRuby. Not sure how I managed to get the numbers previously. First baseline load numbers for a file which only contains "require bibtex" MRI: 0.11 (and for fun, require 'citeproc') MRI: 0.33 Then the benchmarks (using the one Pickaxe publication) MRI MacRuby: |
Ha, so whilst this sidetracked us a little bit, at least it makes a lot more sense (would have been weird because there was nothing peculiar about your entries really). Anyway, so next things to try are:
|
Sorry for the sidetracking! :) With name parser
With lexer only:
|
(I take it you meant to say, without name parsing above, right?) Right, we're getting closer now. Judging on the basis of these results it seems that the lexical analysis is responsible for slow performance on MacRuby. In a way, that's great, because the lexer needs to rewritten anyway (see #26). I'm not sure I'll be able to write the single pass lexer in the next few days, but I'll try to take a quick stab at it and also come up with a better benchmarking script to allow us to see what part of the lexical analysis consumes so much time. |
I've never used Instruments, and there is no documentation available. I posted on StackOverflow asking for help. |
Hi, Come here from the MacRuby mail list. To profile MacRuby code, use DTrace, http://www.macruby.org/blog/2009/11/17/macruby05b2.html shows an example. MacRuby source tree is needed, and DTrace scripts can be found in Open two terminal windows, in the first one type, The output is long, but "Regexp#initialize" still can be singled out very easily. It seems that MRI "caches" compiled regexp, while MacRuby doesn't and re-creates every regexp literal constants again and again. Try timing following script, the difference is also huge. 1_000_000.times do
"123" =~ /^[0-9]+$/
end So caching all regexp literals to ivars might be a viable solution to #37. |
Awesome @dche, thanks a lot! I suggest we try moving the regular expressions to a class instance variable; this should give us a minimal overhead on MRI but simulate caching for MacRuby. |
@houshuang could you run your benchmarks against master and give us the numbers? |
Doesn't seem like it really helped :( ruby benchmark.rb 21:33:33 ☁ master ☀ [test] macruby benchmark.rb 21:34:36 ☁ master ☂ ⚡ |
I did a sample of benchmark running MacRuby and uploaded it here, not sure if you can make anything out of it. http://dl.dropbox.com/u/1341682/out.zip I also did the two other DTrace functions collected_objects and function_count in this file: http://dl.dropbox.com/u/1341682/more_dtrace.zip This is just over an arbitrary period of time running Benchmark. If any of this is useful and you want me to run it again over a specific time, with specific settings etc, let me know. These are the three DTrace scripts that are included with MacRuby. Thanks a lot for your input by the way, @dche! |
According to your results Regexp#initialize is most definitely the culprit. So why didn't this help?
|
I confirmed that I'm on rev 59ccee9, and that the @patterns are there. I changed to benchmark only the lexer. Again widely different results: [bibtex-ruby] ruby test/benchmark.rb 7:26:42 ☁ master ☂ ⚡ [bibtex-ruby] macruby test/benchmark.rb 7:27:01 ☁ master ☂ ⚡ Just to make sure this wasn't an artefact of the benchmarking script, I wrote a simple script that requires (the modified) bibtex, parses a file with about 60 publications, and prints the first one. MRI: 0.21, MacRuby: 24.16 |
If I read these results correctly (compared to the lexer-only ones you posted originally) it is now even slower than before. Can you confirm that?(Just MacRuby lexer – release / master). Weird! |
True, weird. These two are identical, except in the first I required 'bibtex', and in the second 'require File.expand_path('../../lib/bibtex.rb', FILE)' [test] macruby benchmark.rb 8:14:19 ☁ master ☂ ⚡ ✭ |
Can you try this again but having uninstalled the gem first? Alternatively, you can also print out (before the benchmark starts) the value of:
These should exist on master. If they don't we're somehow pulling in an old version. On Mar 5, 2012, at 2:16 PM, Stian Håklev wrote:
|
Man, I'm a horrible software tester :) I went into bibtex.rb and realized that the commands require 'bibtex/lexer' etc probably worked through RubyGems, even though I called the local bibtex.rb. I uninstalled the gem, then I got all kinds of errors from path mistakes, which I tried to fix. There were also several of the requires that didn't work, and I didn't find the matching files in lib/bibtex, parser and name_parser. Slightly rewritten to get around this I get release:
master
|
On Mar 5, 2012, at 3:40 PM, Stian Håklev wrote:
How did you get around this? I'm sorry, I forgot to mention this earlier, but on master, you will have to generate the parsers (that's why parser and name_parser were missing). You can generate the parsers like this:
After this, there should be the parser.rb and name_parser.rb files in the lib directory. |
This is really useful! Well I'm learning more and more about BibTex-ruby, but it doesn't seem to be getting us anywhere... Here's the latest master, with gem uninstalled, and parser generated. This is using parser:
and this using lexer
|
I just installed a macruby-nightly to finally get to the bottom of this, but get an error in the racc gem about a missing symbol (_rb_catch); have you run into this before? |
No, but I guess I ran racc using MRI Ruby. You only run it once to generate Thanks for all the time you spend on this! Stian |
It's true that we don't necessarily have to generate the parser with MacRuby (in fact, the gem ships with the generated parser). Problem is, I get the error when executing the parser. What MacRuby version are you using precisely? |
0.12 - I compiled it from nightly about a week ago. |
I finally had time for more detailed profiling and I believe that the problem lies with MacRuby's StringScanner implementation: they create a new regular expression object every time #scan is called. That's why our explicit pattern cache does not make big difference at all. |
Closing since we don't support macruby anymore |
There seems to be a huge difference in speed between MacRuby (v. 0.12) and Ruby (1.9.2). In this case, it is simply parsing a file with 67 publications (a small subset of my large publication file, used for testing). Ruby parses the file in 0.16 seconds, and MacRuby uses almost 25 seconds (100 times slower).
I'd be happy to help you debug this if you let me know what to do. I am thinking about switching to MacRuby in my project, so I am quite interested in improving this.
script:
The text was updated successfully, but these errors were encountered: