Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MacRuby performance when parsing file #37

Closed
houshuang opened this issue Dec 15, 2011 · 40 comments
Closed

MacRuby performance when parsing file #37

houshuang opened this issue Dec 15, 2011 · 40 comments
Labels

Comments

@houshuang
Copy link
Contributor

There seems to be a huge difference in speed between MacRuby (v. 0.12) and Ruby (1.9.2). In this case, it is simply parsing a file with 67 publications (a small subset of my large publication file, used for testing). Ruby parses the file in 0.16 seconds, and MacRuby uses almost 25 seconds (100 times slower).

I'd be happy to help you debug this if you let me know what to do. I am thinking about switching to MacRuby in my project, so I am quite interested in improving this.

[macruby] time macruby bibtextest.rb                                                                                   
#<BibTeX::Bibliography data=[67]> 
macruby bibtextest.rb  24.45s user 0.66s system 119% cpu 20.964 total

[macruby] time ruby bibtextest.rb                                                                                      
#<BibTeX::Bibliography data=[67]>
ruby bibtextest.rb  0.16s user 0.03s system 98% cpu 0.198 total

[macruby] macruby --version                                                                                           
MacRuby 0.12 (ruby 1.9.2) [universal-darwin10.0, x86_64]

[macruby] wc ../folders2web/bibliography-short.bib                                                                      
 725    3358   89338 ../folders2web/bibliography-short.bib

script:

require 'rubygems'
require 'bibtex'

b=BibTeX.open("../folders2web/bibliography-short.bib") 
p b
@inukshuk
Copy link
Owner

There have been some issues with MacRuby before, so I find it encouraging that the parsing seems to work, albeit at this slow speed.

I've been using two scripts for benchmarking and profiling that I'll add to the repository in a minute.

Meanwhile, can you check whether or not the following gems are compatible with you version of MacRuby:

  • ruby-prof
  • gnuplot

Also, do you have the dot executable installed? (check via which dot) If you're using macports or homebrew you should be able to get it by installing the graphviz package.

@inukshuk
Copy link
Owner

Alright, the benchmark and profiling are already in the repository.

Unfortunately, ruby-prof does not seem to be available for MacRuby yet: this will make it more difficult to find out what the problem is, but we'll try to make do with the benchmark script.

FIrst, see whether or not you can get benchmark.rb to run on your system. You may have to install gnuplot (homebrew or macports and the ruby gem), or, alternatively, comment out everything after the line require 'gnuplot'. Basically, what the script does, is to execute and benchmark two functions f and g. The idea is that f works on an individual BibTeX entry repeatedly, while g works on a larger bibliography. We would expect the performance of f and g to be extremely similar (because it should not make a difference whether you parse a thousand entries consecutively or a bibliography containing a thousand entries). Furthermore, both f and g should be linear: to see that it's convenient to look at the gnuplot graphs.

Anyway, there are two options n and k currently set to 1001 and 20. This means that the script will execute with 20 entries, then 40, then 60 and so on up to 1001. This typically gives you a nice plot. Since your test with 67 entries already took half a minute, you will probably want to use a smaller number for n first (for example, start with 100). It will still take a few minutes, but do try to run it and let me know the results.

@inukshuk
Copy link
Owner

Here is the output when I run the benchmark with MRI 1.9.2 on my system:

$ ruby test/benchmark.rb 
                 user     system      total        real
      f(1):  0.000000   0.000000   0.000000 (  0.001007)
      g(1):  0.000000   0.000000   0.000000 (  0.000678)
     f(21):  0.020000   0.000000   0.020000 (  0.020126)
     g(21):  0.020000   0.010000   0.030000 (  0.017425)
     f(41):  0.040000   0.000000   0.040000 (  0.036070)
     g(41):  0.030000   0.000000   0.030000 (  0.034123)
     f(61):  0.050000   0.000000   0.050000 (  0.055657)
     g(61):  0.050000   0.000000   0.050000 (  0.047118)
     f(81):  0.060000   0.000000   0.060000 (  0.057583)
     g(81):  0.050000   0.000000   0.050000 (  0.055533)
    f(101):  0.070000   0.000000   0.070000 (  0.068648)
    g(101):  0.070000   0.000000   0.070000 (  0.074981)
    f(121):  0.080000   0.000000   0.080000 (  0.081537)
    g(121):  0.090000   0.000000   0.090000 (  0.087914)
    f(141):  0.100000   0.000000   0.100000 (  0.105223)
    g(141):  0.100000   0.000000   0.100000 (  0.100114)
    f(161):  0.110000   0.000000   0.110000 (  0.111240)
    g(161):  0.110000   0.000000   0.110000 (  0.111883)
    f(181):  0.140000   0.010000   0.150000 (  0.134396)
    g(181):  0.130000   0.000000   0.130000 (  0.138483)
    f(201):  0.140000   0.000000   0.140000 (  0.137307)
    g(201):  0.160000   0.000000   0.160000 (  0.158308)
    f(221):  0.150000   0.000000   0.150000 (  0.156125)
    g(221):  0.180000   0.000000   0.180000 (  0.175792)
    f(241):  0.170000   0.000000   0.170000 (  0.172716)
    g(241):  0.190000   0.000000   0.190000 (  0.200111)
    f(261):  0.170000   0.000000   0.170000 (  0.180065)
    g(261):  0.220000   0.010000   0.230000 (  0.222491)
    f(281):  0.190000   0.000000   0.190000 (  0.191242)
    g(281):  0.250000   0.000000   0.250000 (  0.244426)
    f(301):  0.200000   0.000000   0.200000 (  0.207241)
    g(301):  0.270000   0.000000   0.270000 (  0.270698)
    f(321):  0.220000   0.000000   0.220000 (  0.223200)
    g(321):  0.290000   0.010000   0.300000 (  0.290713)
    f(341):  0.230000   0.000000   0.230000 (  0.238174)
    g(341):  0.320000   0.000000   0.320000 (  0.324253)
    f(361):  0.250000   0.000000   0.250000 (  0.249256)
    g(361):  0.320000   0.010000   0.330000 (  0.326222)
    f(381):  0.250000   0.000000   0.250000 (  0.275628)
    g(381):  0.340000   0.000000   0.340000 (  0.366691)
    f(401):  0.270000   0.000000   0.270000 (  0.280956)
    g(401):  0.350000   0.010000   0.360000 (  0.358961)
    f(421):  0.290000   0.000000   0.290000 (  0.294395)
    g(421):  0.370000   0.000000   0.370000 (  0.382586)
    f(441):  0.310000   0.000000   0.310000 (  0.310237)
    g(441):  0.410000   0.010000   0.420000 (  0.468336)
    f(461):  0.320000   0.000000   0.320000 (  0.352464)
    g(461):  0.440000   0.000000   0.440000 (  0.459599)
    f(481):  0.320000   0.000000   0.320000 (  0.324716)
    g(481):  0.470000   0.010000   0.480000 (  0.480357)
    f(501):  0.340000   0.000000   0.340000 (  0.366143)
    g(501):  0.500000   0.010000   0.510000 (  0.521129)
    f(521):  0.360000   0.000000   0.360000 (  0.382934)
    g(521):  0.540000   0.010000   0.550000 (  0.585427)
    f(541):  0.380000   0.000000   0.380000 (  0.409161)
    g(541):  0.560000   0.010000   0.570000 (  0.569871)
    f(561):  0.380000   0.000000   0.380000 (  0.392196)
    g(561):  0.610000   0.000000   0.610000 (  0.647884)
    f(581):  0.390000   0.010000   0.400000 (  0.401241)
    g(581):  0.560000   0.000000   0.560000 (  0.566974)
    f(601):  0.420000   0.010000   0.430000 (  0.455462)
    g(601):  0.560000   0.010000   0.570000 (  0.574165)
    f(621):  0.410000   0.000000   0.410000 (  0.419186)
    g(621):  0.590000   0.000000   0.590000 (  0.608821)
    f(641):  0.420000   0.010000   0.430000 (  0.426153)
    g(641):  0.630000   0.000000   0.630000 (  0.635338)
    f(661):  0.440000   0.010000   0.450000 (  0.449898)
    g(661):  0.650000   0.000000   0.650000 (  0.654837)
    f(681):  0.440000   0.000000   0.440000 (  0.446963)
    g(681):  0.680000   0.010000   0.690000 (  0.688274)
    f(701):  0.470000   0.000000   0.470000 (  0.474683)
    g(701):  0.710000   0.010000   0.720000 (  0.719676)
    f(721):  0.480000   0.000000   0.480000 (  0.484895)
    g(721):  0.740000   0.000000   0.740000 (  0.748554)
    f(741):  0.490000   0.010000   0.500000 (  0.491958)
    g(741):  0.790000   0.000000   0.790000 (  0.790580)
    f(761):  0.480000   0.010000   0.490000 (  0.488490)
    g(761):  0.820000   0.000000   0.820000 (  0.826844)
    f(781):  0.510000   0.010000   0.520000 (  0.513018)
    g(781):  0.850000   0.000000   0.850000 (  0.859994)
    f(801):  0.520000   0.010000   0.530000 (  0.534045)
    g(801):  0.890000   0.000000   0.890000 (  0.899764)
    f(821):  0.550000   0.010000   0.560000 (  0.575607)
    g(821):  0.990000   0.010000   1.000000 (  1.046289)
    f(841):  0.560000   0.010000   0.570000 (  0.588986)
    g(841):  0.970000   0.010000   0.980000 (  0.988280)
    f(861):  0.580000   0.000000   0.580000 (  0.585552)
    g(861):  1.010000   0.010000   1.020000 (  1.051506)
    f(881):  0.580000   0.000000   0.580000 (  0.590940)
    g(881):  1.070000   0.010000   1.080000 (  1.084322)
    f(901):  0.610000   0.010000   0.620000 (  0.644608)
    g(901):  1.100000   0.010000   1.110000 (  1.121152)
    f(921):  0.600000   0.000000   0.600000 (  0.604363)
    g(921):  1.140000   0.010000   1.150000 (  1.153237)
    f(941):  0.610000   0.010000   0.620000 (  0.618322)
    g(941):  1.200000   0.000000   1.200000 (  1.203676)
    f(961):  0.640000   0.010000   0.650000 (  0.656267)
    g(961):  1.250000   0.010000   1.260000 (  1.277296)
    f(981):  0.650000   0.000000   0.650000 (  0.658052)
    g(981):  1.100000   0.020000   1.120000 (  1.127124)
   f(1001):  0.660000   0.000000   0.660000 (  0.675562)
   g(1001):  1.240000   0.010000   1.250000 (  1.250755)
    sum(f): 17.120000   0.140000  17.260000 ( 17.599892)
    sum(g): 26.980000   0.230000  27.210000 ( 27.599565)

And here you can see the plotted output. As you can see both functions are (roughly) linear (that's good); but the function g executes considerably slower as the input size increases. This means that for large numbers of N, it takes longer to parse a bibliography containing N entries than to parse N individual entries. This is because the Bibliography class analyses stuff like resolving cross-references etc. Anyway, looking at the graph shows us that this should be optimized going forward. ;-)

@houshuang
Copy link
Contributor Author

I just tried doing this. First, I had to modify lib/bibtex.rb to add require 'rubygems', otherwise it chokes on require 'multi_json' (I guess macruby doesn't have automatic rubygems support.

Afterwards, it runs fine. However, the "problem" is that the output for MRI (1.9.2p180) and MacRuby (0.12, Ruby 1.9.2) are extremely similar. Here is the output for MRI:

                 user     system      total        real
      f(1):  0.000000   0.000000   0.000000 (  0.000622)
      g(1):  0.000000   0.000000   0.000000 (  0.000403)
     f(21):  0.010000   0.000000   0.010000 (  0.011540)
     g(21):  0.010000   0.000000   0.010000 (  0.007310)
     f(41):  0.020000   0.000000   0.020000 (  0.018733)
     g(41):  0.020000   0.000000   0.020000 (  0.017633)
     f(61):  0.030000   0.000000   0.030000 (  0.029408)
     g(61):  0.020000   0.000000   0.020000 (  0.024942)
     f(81):  0.040000   0.000000   0.040000 (  0.037122)
     g(81):  0.040000   0.000000   0.040000 (  0.039202)
    f(101):  0.040000   0.000000   0.040000 (  0.046273)
    g(101):  0.050000   0.000000   0.050000 (  0.047190)
    f(121):  0.060000   0.000000   0.060000 (  0.056533)
    g(121):  0.060000   0.000000   0.060000 (  0.060450)
    f(141):  0.060000   0.000000   0.060000 (  0.064366)
    g(141):  0.070000   0.000000   0.070000 (  0.073773)
    f(161):  0.080000   0.000000   0.080000 (  0.078557)
    g(161):  0.090000   0.000000   0.090000 (  0.088499)
    f(181):  0.080000   0.000000   0.080000 (  0.081382)
    g(181):  0.090000   0.000000   0.090000 (  0.089241)
    f(201):  0.090000   0.000000   0.090000 (  0.087167)
    g(201):  0.090000   0.000000   0.090000 (  0.097179)
    f(221):  0.100000   0.000000   0.100000 (  0.092566)
    g(221):  0.110000   0.000000   0.110000 (  0.111960)
    f(241):  0.100000   0.010000   0.110000 (  0.105829)
    g(241):  0.120000   0.000000   0.120000 (  0.121754)
    f(261):  0.120000   0.000000   0.120000 (  0.113211)
    g(261):  0.130000   0.000000   0.130000 (  0.135470)
    f(281):  0.120000   0.000000   0.120000 (  0.118759)
    g(281):  0.150000   0.000000   0.150000 (  0.146476)
    f(301):  0.130000   0.000000   0.130000 (  0.134218)
    g(301):  0.170000   0.000000   0.170000 (  0.168745)
    f(321):  0.140000   0.000000   0.140000 (  0.140407)
    g(321):  0.180000   0.000000   0.180000 (  0.183246)
    f(341):  0.150000   0.000000   0.150000 (  0.147753)
    g(341):  0.190000   0.000000   0.190000 (  0.189896)
    f(361):  0.150000   0.000000   0.150000 (  0.153986)
    g(361):  0.210000   0.000000   0.210000 (  0.204915)
    f(381):  0.160000   0.000000   0.160000 (  0.166869)
    g(381):  0.200000   0.010000   0.210000 (  0.202102)
    f(401):  0.180000   0.000000   0.180000 (  0.182246)
    g(401):  0.230000   0.000000   0.230000 (  0.229675)
    f(421):  0.190000   0.000000   0.190000 (  0.186329)
    g(421):  0.230000   0.000000   0.230000 (  0.234171)
    f(441):  0.200000   0.000000   0.200000 (  0.196082)
    g(441):  0.260000   0.000000   0.260000 (  0.266406)
    f(461):  0.200000   0.000000   0.200000 (  0.203138)
    g(461):  0.260000   0.000000   0.260000 (  0.258192)
    f(481):  0.210000   0.010000   0.220000 (  0.208168)
    g(481):  0.270000   0.000000   0.270000 (  0.271574)
    f(501):  0.210000   0.000000   0.210000 (  0.212898)
    g(501):  0.290000   0.000000   0.290000 (  0.294501)
    f(521):  0.230000   0.000000   0.230000 (  0.224249)
    g(521):  0.330000   0.000000   0.330000 (  0.332794)
    f(541):  0.230000   0.010000   0.240000 (  0.236670)
    g(541):  0.310000   0.000000   0.310000 (  0.307345)
    f(561):  0.240000   0.000000   0.240000 (  0.245936)
    g(561):  0.330000   0.000000   0.330000 (  0.329643)
    f(581):  0.260000   0.000000   0.260000 (  0.254318)
    g(581):  0.330000   0.000000   0.330000 (  0.339464)
    f(601):  0.260000   0.000000   0.260000 (  0.261145)
    g(601):  0.380000   0.010000   0.390000 (  0.376544)
    f(621):  0.260000   0.000000   0.260000 (  0.263120)
    g(621):  0.390000   0.000000   0.390000 (  0.390627)
    f(641):  0.270000   0.000000   0.270000 (  0.271359)
    g(641):  0.410000   0.000000   0.410000 (  0.407785)
    f(661):  0.280000   0.000000   0.280000 (  0.283797)
    g(661):  0.410000   0.010000   0.420000 (  0.410276)
    f(681):  0.290000   0.000000   0.290000 (  0.299164)
    g(681):  0.430000   0.000000   0.430000 (  0.422964)
    f(701):  0.290000   0.000000   0.290000 (  0.299465)
    g(701):  0.440000   0.000000   0.440000 (  0.440954)
    f(721):  0.310000   0.010000   0.320000 (  0.312829)
    g(721):  0.480000   0.000000   0.480000 (  0.474714)
    f(741):  0.320000   0.000000   0.320000 (  0.327411)
    g(741):  0.500000   0.000000   0.500000 (  0.494715)
    f(761):  0.320000   0.000000   0.320000 (  0.323732)
    g(761):  0.500000   0.010000   0.510000 (  0.505016)
    f(781):  0.340000   0.000000   0.340000 (  0.343639)
    g(781):  0.520000   0.000000   0.520000 (  0.522267)
    f(801):  0.350000   0.000000   0.350000 (  0.350007)
    g(801):  0.550000   0.010000   0.560000 (  0.552032)
    f(821):  0.350000   0.000000   0.350000 (  0.352463)
    g(821):  0.580000   0.000000   0.580000 (  0.579625)
    f(841):  0.370000   0.000000   0.370000 (  0.374373)
    g(841):  0.600000   0.000000   0.600000 (  0.600615)
    f(861):  0.360000   0.010000   0.370000 (  0.364205)
    g(861):  0.620000   0.000000   0.620000 (  0.617375)
    f(881):  0.380000   0.000000   0.380000 (  0.380935)
    g(881):  0.620000   0.000000   0.620000 (  0.632298)
    f(901):  0.400000   0.010000   0.410000 (  0.394550)
    g(901):  0.660000   0.000000   0.660000 (  0.661010)
    f(921):  0.400000   0.000000   0.400000 (  0.406453)
    g(921):  0.690000   0.000000   0.690000 (  0.688062)
    f(941):  0.400000   0.010000   0.410000 (  0.405330)
    g(941):  0.690000   0.000000   0.690000 (  0.691564)
    f(961):  0.430000   0.000000   0.430000 (  0.429863)
    g(961):  0.740000   0.010000   0.750000 (  0.750301)
    f(981):  0.420000   0.000000   0.420000 (  0.418148)
    g(981):  0.760000   0.000000   0.760000 (  0.770694)
   f(1001):  0.440000   0.000000   0.440000 (  0.436495)
   g(1001):  0.800000   0.010000   0.810000 (  0.805949)
    sum(f): 11.070000   0.070000  11.140000 ( 11.133818)
    sum(g): 16.610000   0.070000  16.680000 ( 16.669537)

and here is the output for MacRuby

                 user     system      total        real
      f(1):  0.000000   0.000000   0.000000 (  0.000597)
      g(1):  0.000000   0.000000   0.000000 (  0.000403)
     f(21):  0.010000   0.000000   0.010000 (  0.008394)
     g(21):  0.010000   0.000000   0.010000 (  0.010537)
     f(41):  0.020000   0.000000   0.020000 (  0.018743)
     g(41):  0.020000   0.010000   0.030000 (  0.017602)
     f(61):  0.030000   0.000000   0.030000 (  0.026597)
     g(61):  0.020000   0.000000   0.020000 (  0.028474)
     f(81):  0.040000   0.000000   0.040000 (  0.037569)
     g(81):  0.040000   0.000000   0.040000 (  0.034700)
    f(101):  0.040000   0.000000   0.040000 (  0.047319)
    g(101):  0.050000   0.000000   0.050000 (  0.046107)
    f(121):  0.060000   0.000000   0.060000 (  0.055845)
    g(121):  0.060000   0.000000   0.060000 (  0.062644)
    f(141):  0.070000   0.000000   0.070000 (  0.070522)
    g(141):  0.070000   0.000000   0.070000 (  0.067645)
    f(161):  0.070000   0.000000   0.070000 (  0.079994)
    g(161):  0.090000   0.000000   0.090000 (  0.081004)
    f(181):  0.080000   0.000000   0.080000 (  0.082702)
    g(181):  0.080000   0.000000   0.080000 (  0.086279)
    f(201):  0.090000   0.000000   0.090000 (  0.087983)
    g(201):  0.100000   0.000000   0.100000 (  0.102436)
    f(221):  0.100000   0.000000   0.100000 (  0.097282)
    g(221):  0.110000   0.000000   0.110000 (  0.108936)
    f(241):  0.110000   0.000000   0.110000 (  0.107562)
    g(241):  0.120000   0.000000   0.120000 (  0.123231)
    f(261):  0.120000   0.000000   0.120000 (  0.117738)
    g(261):  0.130000   0.000000   0.130000 (  0.138497)
    f(281):  0.130000   0.000000   0.130000 (  0.121723)
    g(281):  0.160000   0.000000   0.160000 (  0.163618)
    f(301):  0.140000   0.010000   0.150000 (  0.146252)
    g(301):  0.170000   0.000000   0.170000 (  0.167022)
    f(321):  0.150000   0.000000   0.150000 (  0.149100)
    g(321):  0.160000   0.000000   0.160000 (  0.166145)
    f(341):  0.150000   0.000000   0.150000 (  0.147195)
    g(341):  0.190000   0.000000   0.190000 (  0.187156)
    f(361):  0.150000   0.000000   0.150000 (  0.155126)
    g(361):  0.190000   0.000000   0.190000 (  0.190667)
    f(381):  0.170000   0.000000   0.170000 (  0.166908)
    g(381):  0.210000   0.000000   0.210000 (  0.211386)
    f(401):  0.170000   0.000000   0.170000 (  0.176037)
    g(401):  0.220000   0.010000   0.230000 (  0.218084)
    f(421):  0.180000   0.000000   0.180000 (  0.184280)
    g(421):  0.230000   0.000000   0.230000 (  0.227744)
    f(441):  0.190000   0.000000   0.190000 (  0.193176)
    g(441):  0.260000   0.000000   0.260000 (  0.258051)
    f(461):  0.210000   0.000000   0.210000 (  0.206932)
    g(461):  0.250000   0.000000   0.250000 (  0.254047)
    f(481):  0.210000   0.000000   0.210000 (  0.207435)
    g(481):  0.290000   0.000000   0.290000 (  0.291690)
    f(501):  0.210000   0.010000   0.220000 (  0.218241)
    g(501):  0.290000   0.000000   0.290000 (  0.284853)
    f(521):  0.220000   0.000000   0.220000 (  0.225706)
    g(521):  0.320000   0.000000   0.320000 (  0.321912)
    f(541):  0.230000   0.000000   0.230000 (  0.232489)
    g(541):  0.310000   0.000000   0.310000 (  0.308900)
    f(561):  0.250000   0.000000   0.250000 (  0.247742)
    g(561):  0.350000   0.010000   0.360000 (  0.347809)
    f(581):  0.250000   0.000000   0.250000 (  0.251443)
    g(581):  0.420000   0.000000   0.420000 (  0.430320)
    f(601):  0.290000   0.000000   0.290000 (  0.287576)
    g(601):  0.450000   0.000000   0.450000 (  0.450228)
    f(621):  0.270000   0.000000   0.270000 (  0.274842)
    g(621):  0.390000   0.010000   0.400000 (  0.394948)
    f(641):  0.280000   0.000000   0.280000 (  0.279406)
    g(641):  0.420000   0.000000   0.420000 (  0.417165)
    f(661):  0.290000   0.000000   0.290000 (  0.300725)
    g(661):  0.450000   0.000000   0.450000 (  0.451651)
    f(681):  0.300000   0.010000   0.310000 (  0.297297)
    g(681):  0.470000   0.000000   0.470000 (  0.476887)
    f(701):  0.340000   0.000000   0.340000 (  0.337852)
    g(701):  0.480000   0.000000   0.480000 (  0.482324)
    f(721):  0.310000   0.010000   0.320000 (  0.314285)
    g(721):  0.540000   0.000000   0.540000 (  0.546801)
    f(741):  0.340000   0.000000   0.340000 (  0.337292)
    g(741):  0.490000   0.010000   0.500000 (  0.497936)
    f(761):  0.380000   0.000000   0.380000 (  0.381007)
    g(761):  0.610000   0.000000   0.610000 (  0.611881)
    f(781):  0.350000   0.010000   0.360000 (  0.348679)
    g(781):  0.580000   0.000000   0.580000 (  0.585341)
    f(801):  0.340000   0.000000   0.340000 (  0.345329)
    g(801):  0.610000   0.000000   0.610000 (  0.620062)
    f(821):  0.360000   0.010000   0.370000 (  0.355276)
    g(821):  0.540000   0.000000   0.540000 (  0.545332)
    f(841):  0.360000   0.000000   0.360000 (  0.366477)
    g(841):  0.630000   0.000000   0.630000 (  0.631926)
    f(861):  0.440000   0.010000   0.450000 (  0.448708)
    g(861):  0.650000   0.000000   0.650000 (  0.648266)
    f(881):  0.380000   0.000000   0.380000 (  0.377770)
    g(881):  0.580000   0.010000   0.590000 (  0.591761)
    f(901):  0.390000   0.000000   0.390000 (  0.390795)
    g(901):  0.650000   0.000000   0.650000 (  0.653867)
    f(921):  0.390000   0.010000   0.400000 (  0.392912)
    g(921):  0.710000   0.000000   0.710000 (  0.709803)
    f(941):  0.390000   0.000000   0.390000 (  0.396474)
    g(941):  0.710000   0.000000   0.710000 (  0.711333)
    f(961):  0.410000   0.010000   0.420000 (  0.411303)
    g(961):  0.750000   0.000000   0.750000 (  0.756501)
    f(981):  0.420000   0.000000   0.420000 (  0.424169)
    g(981):  0.800000   0.010000   0.810000 (  0.793700)
   f(1001):  0.410000   0.000000   0.410000 (  0.419748)
   g(1001):  0.760000   0.010000   0.770000 (  0.762972)
    sum(f): 11.290000   0.090000  11.380000 ( 11.356555)
    sum(g): 17.190000   0.080000  17.270000 ( 17.278584)

I reran my testscript, thinking that perhaps there had been an update to MacRuby, bibtex-ruby, or one of the other libraries used, which caused a speedup, but the time difference between parsing a bibliography of 65 publications is still more than 150x.

@inukshuk
Copy link
Owner

Stian, thanks for getting back to this!

It is generally frowned upon in the Ruby community to force require rubygems in a library. If you don't want to edit the file, you can just run your script with ruby -rrubygems or, alternatively, add the require to the benchmark script.

Regarding the results I can see that a) you have a much faster computer than I ;-) and b) you're, right that the results are extremely similar. This means that the parser is not significantly slower in MacRuby and therefore we're losing performance somewhere else.

Just to make sure, in your test script above you're printing the bibliography (the p b); could you change that to something simpler, for instance, just print b.length or something like that. This is just to make sure that it is opening the bibliography which is in fact slower and not printing it.

Since we cannot use ruby-prof we'll have to adapt the benchmark script to our needs.

@houshuang
Copy link
Contributor Author

The rubygems thing is no biggie. I saw somewhere that macruby removed it to
remove overhead. Anyway I can fix that easily.

I removed the print line entirely, and still

[folders2web] time ruby speedtest.rb
4:16:02 ☁ master
☂ ⚡ ✭
ruby speedtest.rb 0.17s user 0.05s system 70% cpu 0.303 total
[folders2web] time macruby speedtest.rb
4:16:38 ☁ master ☂
⚡ ✭
macruby speedtest.rb 23.05s user 0.66s system 118% cpu 20.008 total

I have a 2.7 GhZ i7 MacBook Pro 13" :)

Stian

@houshuang
Copy link
Contributor Author

More numbers:

Only parsing bibtex-file, not printing
MRI: 0.17 MacRuby: 23.05

Only parsing bibtex-file with 3 entries
MRI: 0.15 MacRuby: 3.45

Only requiring rubygems and bibtex-ruby, not doing anything else
MRI: 0.13 MacRuby: 2.74

Only printing "Hi", with no libraries required
MRI: 0.06 MacRuby: 0.01

I did the last to check that there wasn't a longer startup time required for MacRuby, which it wasn't. It does take significantly longer to load bibtex-ruby, but even then, the time it spends on even just 3 entries is significantly longer.

@inukshuk
Copy link
Owner

Please try the following:

In benchmark.rb replace the entry with the contents of your bibtex file (the 3 entries). That is:

input = <<-END
  < ... the 3 entries ... >
END

Next, change n, k to:

n, k = 51, 10

And rerun the test, make a note of the Results.

Then, change the two lines inside the f/g tests from BibTeX::Parser.new.parse(input) to BibTeX.parse(input), and try again.

@houshuang
Copy link
Contributor Author

Did as you asked. Output before I changed the parse lines:

[test] macruby ./benchmark.rb
4:41:17
user system total real
f(1): 0.550000 0.010000 0.560000 ( 0.521162)
g(1): 0.200000 0.000000 0.200000 ( 0.174278)
f(11): 2.130000 0.050000 2.180000 ( 1.855182)
g(11): 6.640000 0.120000 6.760000 ( 5.462189)
f(21): 4.390000 0.090000 4.480000 ( 3.594807)
g(21): 21.710000 0.700000 22.410000 ( 17.545464)
f(31): 6.860000 0.130000 6.990000 ( 5.537885)
g(31): 45.450000 2.160000 47.610000 ( 36.967703)
f(41): 9.000000 0.170000 9.170000 ( 7.142162)
g(41): 77.760000 4.220000 81.980000 ( 62.952831)
f(51): 11.680000 0.230000 11.910000 ( 9.023000)
g(51):121.490000 6.940000 128.430000 ( 99.794021)
sum(f): 34.610000 0.680000 35.290000 ( 27.674197)
sum(g):273.250000 14.140000 287.390000 (222.896485)

And here it is with the two lines changed

                 user     system      total        real
      f(1):  0.590000   0.000000   0.590000 (  0.569724)
      g(1):  0.210000   0.010000   0.220000 (  0.176379)
     f(11):  2.330000   0.050000   2.380000 (  2.051605)
     g(11):  6.690000   0.130000   6.820000 (  5.505299)
     f(21):  4.580000   0.090000   4.670000 (  3.763134)
     g(21): 21.850000   0.700000  22.550000 ( 17.717141)
     f(31):  6.750000   0.130000   6.880000 (  5.419445)
     g(31): 45.630000   2.170000  47.800000 ( 37.105453)
     f(41):  9.050000   0.170000   9.220000 (  7.159262)
     g(41): 77.960000   4.240000  82.200000 ( 63.075123)
     f(51): 11.890000   0.240000  12.130000 (  9.149539)
     g(51):119.660000   6.780000 126.440000 ( 96.518764)
    sum(f): 35.190000   0.680000  35.870000 ( 28.112709)
    sum(g):272.000000  14.030000 286.030000 (220.098160)

@inukshuk
Copy link
Owner

Interesting. Could you post the entries you're using as input? Are there by any chance cross references?

@inukshuk
Copy link
Owner

Plus, as a reference, could you run the tests above with MRI too?

@houshuang
Copy link
Contributor Author

input = <<-END
%% This BibTeX bibliography file was created using BibDesk.
%% http://bibdesk.sourceforge.net/

%% Created for Stian Haklev at 2011-09-20 00:43:44 -0400

%% Saved with string encoding Unicode (UTF-8)

@inproceedings{vladoiu2011open,
Author = {Vladoiu, M.},
Booktitle = {Roedunet International Conference (RoEduNet), 2011 10th},
Date-Added = {2011-09-13 13:34:16 +0000},
Date-Modified = {2011-09-13 13:34:46 +0000},
Keywords = {cites-me; OER/open ed},
Organization = {IEEE},
Pages = {1--6},
Title = {Open courseware initiatives-after 10 years},
Bdsk-File-1 =
{YnBsaXN0MDDUAQIDBAUIJidUJHRvcFgkb2JqZWN0c1gkdmVyc2lvblkkYXJjaGl2ZXLRBgdUcm9vdIABqAkKFRYXGyIjVSRudWxs0wsMDQ4RFFpOUy5vYmplY3RzV05TLmtleXNWJGNsYXNzog8QgASABqISE4ACgAOAB1lhbGlhc0RhdGFccmVsYXRpdmVQYXRo0hgNGRpXTlMuZGF0YU8RAYgAAAAAAYgAAgAABEhvbWUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMmm5vlIKwAAABLivBN2bGFkb2l1MjAxMW9wZW4ucGRmAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALu7lypTT2gAAAAAAAAAAAAIAAwAACQAAAAAAAAAAAAAAAAAAAAAHQmliZGVzawAAEAAIAADJpx85AAAAEQAIAADKlQwaAAAAAQAMABLivAAAAF8AAABXAAIAM0hvbWU6c3RpYW46AERvY3VtZW50czoAQmliZGVzazoAdmxhZG9pdTIwMTFvcGVuLnBkZgAADgAoABMAdgBsAGEAZABvAGkAdQAyADAAMQAxAG8AcABlAG4ALgBwAGQAZgAPAAoABABIAG8AbQBlABIALC9zdGlhbi9Eb2N1bWVudHMvQmliZGVzay92bGFkb2l1MjAxMW9wZW4ucGRmABMADS9Wb2x1bWVzL0hvbWUAABUAAgAT//8AAIAF0hwdHh9YJGNsYXNzZXNaJGNsYXNzbmFtZaMfICFdTlNNdXRhYmxlRGF0YVZOU0RhdGFYTlNPYmplY3RfECsuLi8uLi9Eb2N1bWVudHMvQmliZGVzay92bGFkb2l1MjAxMW9wZW4ucGRm0hwdJCWiJSFcTlNEaWN0aW9uYXJ5EgABhqBfEA9OU0tleWVkQXJjaGl2ZXIACAARABYAHwAoADIANQA6ADwARQBLAFIAXQBlAGwAbwBxAHMAdgB4AHoAfACGAJMAmACgAiwCLgIzAjwCRwJLAlkCYAJpApcCnAKfAqwCsQAAAAAAAAIBAAAAAAAAACgAAAAAAAAAAAAAAAAAAALD}}

@inproceedings{valjataga2011open,
Author = {Väljataga, T. and Põldoja, Hans and Laanpere, M.},
Booktitle = {Proceedings of the 4th International Network-Based Education
2011 Conference The Social Media in the Middle of Nowhere},
Date-Added = {2011-08-09 10:54:02 +0000},
Date-Modified = {2011-08-09 11:20:02 +0000},
Editor = {Ruokamo, H. and Eriksson, M. and Pekkala, L. and Vuojärvi, H.},
Keywords = {open courses},
Organization = {Rovaniemi: University of Lapland},
Pages = {68-7},
Title = {Open Online Courses: Responding to Design Challenges},
Year = {2011},
Bdsk-File-1 =
{YnBsaXN0MDDUAQIDBAUIJidUJHRvcFgkb2JqZWN0c1gkdmVyc2lvblkkYXJjaGl2ZXLRBgdUcm9vdIABqAkKFRYXGyIjVSRudWxs0wsMDQ4RFFpOUy5vYmplY3RzV05TLmtleXNWJGNsYXNzog8QgASABqISE4ACgAOAB1lhbGlhc0RhdGFccmVsYXRpdmVQYXRo0hgNGRpXTlMuZGF0YU8RAZAAAAAAAZAAAgAABEhvbWUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMmm5vlIKwAAABLivBV2YWxqYXRhZ2EyMDExb3Blbi5wZGYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALLykymaMLgAAAAAAAAAAAAIAAwAACQAAAAAAAAAAAAAAAAAAAAAHQmliZGVzawAAEAAIAADJpx85AAAAEQAIAADKZsRuAAAAAQAMABLivAAAAF8AAABXAAIANUhvbWU6c3RpYW46AERvY3VtZW50czoAQmliZGVzazoAdmFsamF0YWdhMjAxMW9wZW4ucGRmAAAOACwAFQB2AGEAbABqAGEAdABhAGcAYQAyADAAMQAxAG8AcABlAG4ALgBwAGQAZgAPAAoABABIAG8AbQBlABIALi9zdGlhbi9Eb2N1bWVudHMvQmliZGVzay92YWxqYXRhZ2EyMDExb3Blbi5wZGYAEwANL1ZvbHVtZXMvSG9tZQAAFQACABP//wAAgAXSHB0eH1gkY2xhc3Nlc1okY2xhc3NuYW1lox8gIV1OU011dGFibGVEYXRhVk5TRGF0YVhOU09iamVjdF8QLS4uLy4uL0RvY3VtZW50cy9CaWJkZXNrL3ZhbGphdGFnYTIwMTFvcGVuLnBkZtIcHSQloiUhXE5TRGljdGlvbmFyeRIAAYagXxAPTlNLZXllZEFyY2hpdmVyAAgAEQAWAB8AKAAyADUAOgA8AEUASwBSAF0AZQBsAG8AcQBzAHYAeAB6AHwAhgCTAJgAoAI0AjYCOwJEAk8CUwJhAmgCcQKhAqYCqQK2ArsAAAAAAAACAQAAAAAAAAAoAAAAAAAAAAAAAAAAAAACzQ==}}

@Unpublished{resendes2011comprehensive,
Author = {Resendes, Monica},
Date-Added = {2011-08-08 01:10:03 +0000},
Date-Modified = {2011-08-08 01:10:38 +0000},
Title = {Comprehensive Exam},
Year = {2011},
Bdsk-File-1 =
{YnBsaXN0MDDUAQIDBAUIJidUJHRvcFgkb2JqZWN0c1gkdmVyc2lvblkkYXJjaGl2ZXLRBgdUcm9vdIABqAkKFRYXGyIjVSRudWxs0wsMDQ4RFFpOUy5vYmplY3RzV05TLmtleXNWJGNsYXNzog8QgASABqISE4ACgAOAB1lhbGlhc0RhdGFccmVsYXRpdmVQYXRo0hgNGRpXTlMuZGF0YU8RAbAAAAAAAbAAAgAABEhvbWUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMmm5vlIKwAAABLivB1yZXNlbmRlczIwMTFjb21wcmVoZW5zaXZlLnBkZgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALKEbymSvkAAAAAAAAAAAAAIAAwAACQAAAAAAAAAAAAAAAAAAAAAHQmliZGVzawAAEAAIAADJpx85AAAAEQAIAADKZOfQAAAAAQAMABLivAAAAF8AAABXAAIAPUhvbWU6c3RpYW46AERvY3VtZW50czoAQmliZGVzazoAcmVzZW5kZXMyMDExY29tcHJlaGVuc2l2ZS5wZGYAAA4APAAdAHIAZQBzAGUAbgBkAGUAcwAyADAAMQAxAGMAbwBtAHAAcgBlAGgAZQBuAHMAaQB2AGUALgBwAGQAZgAPAAoABABIAG8AbQBlABIANi9zdGlhbi9Eb2N1bWVudHMvQmliZGVzay9yZXNlbmRlczIwMTFjb21wcmVoZW5zaXZlLnBkZgATAA0vVm9sdW1lcy9Ib21lAAAVAAIAE///AACABdIcHR4fWCRjbGFzc2VzWiRjbGFzc25hbWWjHyAhXU5TTXV0YWJsZURhdGFWTlNEYXRhWE5TT2JqZWN0XxA1Li4vLi4vRG9jdW1lbnRzL0JpYmRlc2svcmVzZW5kZXMyMDExY29tcHJlaGVuc2l2ZS5wZGbSHB0kJaIlIVxOU0RpY3Rpb25hcnkSAAGGoF8QD05TS2V5ZWRBcmNoaXZlcgAIABEAFgAfACgAMgA1ADoAPABFAEsAUgBdAGUAbABvAHEAcwB2AHgAegB8AIYAkwCYAKACVAJWAlsCZAJvAnMCgQKIApECyQLOAtEC3gLjAAAAAAAAAgEAAAAAAAAAKAAAAAAAAAAAAAAAAAAAAvU=}}

END

[test] ruby -KU benchmark.rb

11:40:00
user system total real
f(1): 0.000000 0.000000 0.000000 ( 0.002150)
g(1): 0.010000 0.000000 0.010000 ( 0.001420)
f(11): 0.010000 0.000000 0.010000 ( 0.016743)
g(11): 0.020000 0.000000 0.020000 ( 0.017404)
f(21): 0.030000 0.000000 0.030000 ( 0.030529)
g(21): 0.040000 0.000000 0.040000 ( 0.036849)
f(31): 0.040000 0.000000 0.040000 ( 0.046699)
g(31): 0.050000 0.000000 0.050000 ( 0.044248)
f(41): 0.060000 0.000000 0.060000 ( 0.063777)
g(41): 0.070000 0.000000 0.070000 ( 0.064466)
f(51): 0.070000 0.000000 0.070000 ( 0.077645)
g(51): 0.080000 0.000000 0.080000 ( 0.078850)
sum(f): 0.210000 0.000000 0.210000 ( 0.237543)
sum(g): 0.270000 0.000000 0.270000 ( 0.243237)

[test] ruby -KU benchmark.rb

11:39:53
user system total real
f(1): 0.000000 0.000000 0.000000 ( 0.002288)
g(1): 0.000000 0.000000 0.000000 ( 0.001501)
f(11): 0.020000 0.000000 0.020000 ( 0.018536)
g(11): 0.020000 0.010000 0.030000 ( 0.021100)
f(21): 0.030000 0.000000 0.030000 ( 0.033027)
g(21): 0.040000 0.000000 0.040000 ( 0.035162)
f(31): 0.050000 0.000000 0.050000 ( 0.050134)
g(31): 0.050000 0.000000 0.050000 ( 0.050362)
f(41): 0.070000 0.000000 0.070000 ( 0.070757)
g(41): 0.070000 0.000000 0.070000 ( 0.074642)
f(51): 0.090000 0.000000 0.090000 ( 0.087376)
g(51): 0.090000 0.000000 0.090000 ( 0.096798)
sum(f): 0.260000 0.000000 0.260000 ( 0.262118)
sum(g): 0.270000 0.010000 0.280000 ( 0.279564)

@inukshuk
Copy link
Owner

Hmm, I think we're getting closer. The difference seems to be triggered by something in your entries. Could you try: stripping the comments and the file attachment fields (and if that does not make a difference) other fields until the execution time comes down?

I suspect the problem has to do with parsing the file attachment entries which are quite long. If you could confirm this, I'll come up with a benchmark file that will help us pinpoint what's causing the problem in parser or lexer.

@houshuang
Copy link
Contributor Author

New input:

input = <<-END
%% This BibTeX bibliography file was created using BibDesk.
%% http://bibdesk.sourceforge.net/

%% Created for Stian Haklev at 2011-09-20 00:43:44 -0400

%% Saved with string encoding Unicode (UTF-8)

@inproceedings{vladoiu2011open,
Author = {Vladoiu, M.},
Booktitle = {Roedunet International Conference (RoEduNet), 2011 10th},
Pages = {1--6},
Title = {Open courseware initiatives-after 10 years}}

@inproceedings{valjataga2011open,
Author = {Väljataga, T. and Põldoja, Hans and Laanpere, M.},
Booktitle = {Proceedings of the 4th International Network-Based Education
2011 Conference The Social Media in the Middle of Nowhere},
Editor = {Ruokamo, H. and Eriksson, M. and Pekkala, L. and Vuojärvi, H.},
Organization = {Rovaniemi: University of Lapland},
Pages = {68-7},
Title = {Open Online Courses: Responding to Design Challenges},
Year = {2011}}

@Unpublished{resendes2011comprehensive,
Author = {Resendes, Monica},
Title = {Comprehensive Exam},
Year = {2011}}

END

new result

[test] macruby benchmark.rb
12:11:26
user system total real
f(1): 0.420000 0.000000 0.420000 ( 0.423830)
g(1): 0.130000 0.000000 0.130000 ( 0.110229)
f(11): 1.260000 0.030000 1.290000 ( 1.171771)
g(11): 1.840000 0.040000 1.880000 ( 1.622123)
f(21): 2.190000 0.040000 2.230000 ( 2.051199)
g(21): 4.630000 0.100000 4.730000 ( 3.795607)
f(31): 3.100000 0.060000 3.160000 ( 2.861643)
g(31): 9.500000 0.190000 9.690000 ( 7.664800)
f(41): 5.180000 0.090000 5.270000 ( 4.807926)
g(41): 15.220000 0.270000 15.490000 ( 12.194606)
f(51): 6.360000 0.130000 6.490000 ( 5.832457)
g(51): 20.350000 0.350000 20.700000 ( 15.984544)
sum(f): 18.510000 0.350000 18.860000 ( 17.148826)
sum(g): 51.670000 0.950000 52.620000 ( 41.371907)

(with Parser.new.parse)

@houshuang
Copy link
Contributor Author

Another attempt

input:
input = <<-END
%% This BibTeX bibliography file was created using BibDesk.
%% http://bibdesk.sourceforge.net/

%% Created for Stian Haklev at 2011-09-20 00:43:44 -0400

%% Saved with string encoding Unicode (UTF-8)

@inproceedings{vladoiu2011open,
Author = {Vladoiu, M.},
Booktitle = {Roedunet International Conference (RoEduNet), 2011 10th},
Pages = {1--6},
Title = {Open courseware initiatives-after 10 years}}

@inproceedings{valjataga2011open,
Author = {Väljataga, T. and Põldoja, Hans and Laanpere, M.},
Title = {Open Online Courses: Responding to Design Challenges},
Year = {2011}}

@Unpublished{resendes2011comprehensive,
Author = {Resendes, Monica},
Title = {Comprehensive Exam},
Year = {2011}}

END

result
[test] macruby benchmark.rb
12:12:52
user system total real
f(1): 0.390000 0.000000 0.390000 ( 0.396010)
g(1): 0.070000 0.010000 0.080000 ( 0.067129)
f(11): 0.800000 0.010000 0.810000 ( 0.732907)
g(11): 1.060000 0.020000 1.080000 ( 0.963519)
f(21): 1.550000 0.030000 1.580000 ( 1.474796)
g(21): 3.010000 0.060000 3.070000 ( 2.498192)
f(31): 2.220000 0.040000 2.260000 ( 2.080213)
g(31): 5.210000 0.100000 5.310000 ( 4.185519)
f(41): 3.110000 0.050000 3.160000 ( 2.877646)
g(41): 8.230000 0.150000 8.380000 ( 6.489485)
f(51): 3.640000 0.060000 3.700000 ( 3.352817)
g(51): 11.750000 0.210000 11.960000 ( 9.281206)
sum(f): 11.710000 0.190000 11.900000 ( 10.914390)
sum(g): 29.330000 0.550000 29.880000 ( 23.485050)

@inukshuk
Copy link
Owner

Can you keep trying removing fields from your entries (also you may want to try and remove the comments at the beginning of the file) and rerunning the benchmark? The thing is, when you ran my original test with this entry there was hardly any difference in speed between MRI and MacRuby. Using your entries there is; so there has got to be a difference that is triggered by some detail in your entries and we need to try and locate that detail.

This is normally quite easy with ruby-prof because it gives you a summary of the time spent in individual methods, but for now our best bet is to consecutively remove data from your entries until there is a big jump in execution time.

My test entry was a one book with author, address, date-added, date-modified, keywords, publisher, series, title and year fields. Now the odd thing is that your entries don't seem to contain anything peculiar. Perhaps you start by looking at each entry individually; use my test entry as a reference: each of your entries, when executed individually, should not take considerably longer than mine. If any of them do take longer, we can begin by removing individual fields to see which fields are causing the performance penalty.

@houshuang
Copy link
Contributor Author

I am sorry, I must have made a mistake with the original run of benchmark, which might have set us back a bit. Rerunning it with the original settings in benchmark (as supplied with the gem) shows a huge difference between MRI and MacRuby. Not sure how I managed to get the numbers previously.

First baseline load numbers for a file which only contains "require bibtex"

MRI: 0.11
MacRuby: 2.76

(and for fun, require 'citeproc')

MRI: 0.33
MacRuby: 6.52

Then the benchmarks (using the one Pickaxe publication)

MRI
user system total real
f(1): 0.000000 0.000000 0.000000 ( 0.000743)
g(1): 0.000000 0.000000 0.000000 ( 0.000504)
f(11): 0.010000 0.000000 0.010000 ( 0.008548)
g(11): 0.010000 0.000000 0.010000 ( 0.008101)
f(21): 0.010000 0.000000 0.010000 ( 0.007892)
g(21): 0.010000 0.000000 0.010000 ( 0.013536)
f(31): 0.020000 0.000000 0.020000 ( 0.017215)
g(31): 0.010000 0.000000 0.010000 ( 0.010945)
f(41): 0.020000 0.000000 0.020000 ( 0.021488)
g(41): 0.020000 0.000000 0.020000 ( 0.017821)
f(51): 0.030000 0.000000 0.030000 ( 0.029464)
g(51): 0.020000 0.000000 0.020000 ( 0.022280)
sum(f): 0.090000 0.000000 0.090000 ( 0.085350)
sum(g): 0.070000 0.000000 0.070000 ( 0.073187)

MacRuby:
user system total real
f(1): 0.390000 0.000000 0.390000 ( 0.369663)
g(1): 0.050000 0.000000 0.050000 ( 0.046697)
f(11): 0.560000 0.010000 0.570000 ( 0.520116)
g(11): 0.590000 0.010000 0.600000 ( 0.532049)
f(21): 1.020000 0.020000 1.040000 ( 0.965179)
g(21): 1.200000 0.030000 1.230000 ( 1.031975)
f(31): 1.510000 0.020000 1.530000 ( 1.427880)
g(31): 1.860000 0.040000 1.900000 ( 1.525376)
f(41): 2.040000 0.030000 2.070000 ( 1.904556)
g(41): 2.590000 0.060000 2.650000 ( 2.085736)
f(51): 2.590000 0.050000 2.640000 ( 2.436998)
g(51): 3.250000 0.080000 3.330000 ( 2.611425)
sum(f): 8.110000 0.130000 8.240000 ( 7.624393)
sum(g): 9.540000 0.220000 9.760000 ( 7.833259)

@inukshuk
Copy link
Owner

Ha, so whilst this sidetracked us a little bit, at least it makes a lot more sense (would have been weird because there was nothing peculiar about your entries really). Anyway, so next things to try are:

  • Try disabling the name parser Parser.new(:parse_names => false)
  • Try use the lexer only. The corresponding commands should be in the file already. Uncomment line 23: lexer = BibTeX::Lexer.new and in the two benchmarking functions uncomment the two lexer lines.

@houshuang
Copy link
Contributor Author

Sorry for the sidetracking! :)

With name parser

                 user     system      total        real
      f(1):  0.320000   0.000000   0.320000 (  0.312330)
      g(1):  0.040000   0.000000   0.040000 (  0.040179)
     f(11):  0.540000   0.010000   0.550000 (  0.506932)
     g(11):  0.580000   0.020000   0.600000 (  0.525592)
     f(21):  0.940000   0.010000   0.950000 (  0.906421)
     g(21):  1.100000   0.030000   1.130000 (  0.952472)
     f(31):  1.480000   0.020000   1.500000 (  1.412313)
     g(31):  1.710000   0.040000   1.750000 (  1.406194)
     f(41):  1.930000   0.030000   1.960000 (  1.850456)
     g(41):  2.330000   0.050000   2.380000 (  1.901826)
     f(51):  2.360000   0.040000   2.400000 (  2.257148)
     g(51):  2.940000   0.070000   3.010000 (  2.367208)
    sum(f):  7.570000   0.110000   7.680000 (  7.245601)
    sum(g):  8.700000   0.210000   8.910000 (  7.193470)

With lexer only:

                 user     system      total        real
      f(1):  0.180000   0.000000   0.180000 (  0.167365)
      g(1):  0.040000   0.000000   0.040000 (  0.044349)
     f(11):  0.530000   0.010000   0.540000 (  0.486000)
     g(11):  0.540000   0.020000   0.560000 (  0.490285)
     f(21):  0.940000   0.010000   0.950000 (  0.908252)
     g(21):  1.060000   0.030000   1.090000 (  0.920382)
     f(31):  1.430000   0.020000   1.450000 (  1.361777)
     g(31):  1.680000   0.040000   1.720000 (  1.379683)
     f(41):  1.900000   0.030000   1.930000 (  1.836740)
     g(41):  2.240000   0.060000   2.300000 (  1.829948)
     f(51):  2.330000   0.030000   2.360000 (  2.254141)
     g(51):  2.900000   0.060000   2.960000 (  2.346938)
    sum(f):  7.310000   0.100000   7.410000 (  7.014275)
    sum(g):  8.460000   0.210000   8.670000 (  7.011585)

@inukshuk
Copy link
Owner

(I take it you meant to say, without name parsing above, right?)

Right, we're getting closer now. Judging on the basis of these results it seems that the lexical analysis is responsible for slow performance on MacRuby. In a way, that's great, because the lexer needs to rewritten anyway (see #26).

I'm not sure I'll be able to write the single pass lexer in the next few days, but I'll try to take a quick stab at it and also come up with a better benchmarking script to allow us to see what part of the lexical analysis consumes so much time.

@houshuang
Copy link
Contributor Author

I've never used Instruments, and there is no documentation available. I posted on StackOverflow asking for help.

@dche
Copy link

dche commented Mar 3, 2012

Hi,

Come here from the MacRuby mail list.

To profile MacRuby code, use DTrace, http://www.macruby.org/blog/2009/11/17/macruby05b2.html shows an example.

MacRuby source tree is needed, and DTrace scripts can be found in sample-macruby/DTrace/.

Open two terminal windows, in the first one type,
$ your-benchmark-script.rb &,
remember the PID, and then in second window type,
$ sudo dtrace -s path-to-macruby-source/sample-macruby/DTrace/methods_duration.d -p pid.
You may need to add several seconds pause at the beginning of the benchmark script, so you have time to type in the second window.

The output is long, but "Regexp#initialize" still can be singled out very easily.

It seems that MRI "caches" compiled regexp, while MacRuby doesn't and re-creates every regexp literal constants again and again. Try timing following script, the difference is also huge.

1_000_000.times do
  "123" =~ /^[0-9]+$/
end

So caching all regexp literals to ivars might be a viable solution to #37.

@inukshuk
Copy link
Owner

inukshuk commented Mar 4, 2012

Awesome @dche, thanks a lot!

I suggest we try moving the regular expressions to a class instance variable; this should give us a minimal overhead on MRI but simulate caching for MacRuby.

@inukshuk
Copy link
Owner

inukshuk commented Mar 4, 2012

@houshuang could you run your benchmarks against master and give us the numbers?

@houshuang
Copy link
Contributor Author

Doesn't seem like it really helped :(

ruby benchmark.rb 21:33:33 ☁ master ☀
user system total real
f(1): 0.000000 0.000000 0.000000 ( 0.000640)
g(1): 0.000000 0.000000 0.000000 ( 0.000421)
f(21): 0.010000 0.000000 0.010000 ( 0.011741)
g(21): 0.010000 0.000000 0.010000 ( 0.007213)
f(41): 0.020000 0.000000 0.020000 ( 0.025018)
g(41): 0.020000 0.000000 0.020000 ( 0.017973)
f(61): 0.030000 0.000000 0.030000 ( 0.029438)
g(61): 0.030000 0.000000 0.030000 ( 0.027517)
f(81): 0.040000 0.000000 0.040000 ( 0.043976)
g(81): 0.050000 0.000000 0.050000 ( 0.047991)
f(101): 0.050000 0.000000 0.050000 ( 0.048558)
g(101): 0.050000 0.000000 0.050000 ( 0.051856)
f(121): 0.060000 0.000000 0.060000 ( 0.063068)
g(121): 0.070000 0.000000 0.070000 ( 0.063596)
f(141): 0.060000 0.000000 0.060000 ( 0.058472)
g(141): 0.060000 0.000000 0.060000 ( 0.062874)
f(161): 0.080000 0.000000 0.080000 ( 0.079034)
g(161): 0.070000 0.000000 0.070000 ( 0.076821)
f(181): 0.090000 0.000000 0.090000 ( 0.084116)
g(181): 0.090000 0.000000 0.090000 ( 0.091101)
f(201): 0.090000 0.000000 0.090000 ( 0.093750)
g(201): 0.100000 0.000000 0.100000 ( 0.100371)
f(221): 0.100000 0.000000 0.100000 ( 0.097935)
g(221): 0.110000 0.000000 0.110000 ( 0.110750)
f(241): 0.100000 0.010000 0.110000 ( 0.105077)
g(241): 0.130000 0.000000 0.130000 ( 0.125688)
f(261): 0.120000 0.000000 0.120000 ( 0.122992)
g(261): 0.140000 0.000000 0.140000 ( 0.138203)
f(281): 0.140000 0.000000 0.140000 ( 0.139820)
g(281): 0.150000 0.000000 0.150000 ( 0.148650)
f(301): 0.150000 0.000000 0.150000 ( 0.153027)
g(301): 0.170000 0.000000 0.170000 ( 0.173788)
f(321): 0.150000 0.000000 0.150000 ( 0.147638)
g(321): 0.180000 0.000000 0.180000 ( 0.183194)
f(341): 0.160000 0.000000 0.160000 ( 0.157463)
g(341): 0.200000 0.000000 0.200000 ( 0.202536)

[test] macruby benchmark.rb 21:34:36 ☁ master ☂ ⚡
user system total real
f(1): 0.380000 0.000000 0.380000 ( 0.374224)
g(1): 0.040000 0.000000 0.040000 ( 0.044956)
f(21): 1.060000 0.020000 1.080000 ( 0.962675)
g(21): 1.100000 0.030000 1.130000 ( 0.990855)
f(41): 2.030000 0.040000 2.070000 ( 1.916846)
g(41): 2.510000 0.060000 2.570000 ( 2.068640)
f(61): 2.960000 0.050000 3.010000 ( 2.783313)
g(61): 4.100000 0.100000 4.200000 ( 3.287772)
f(81): 4.650000 0.090000 4.740000 ( 4.485962)
g(81): 5.630000 0.140000 5.770000 ( 4.453925)
f(101): 5.280000 0.090000 5.370000 ( 4.918456)
g(101): 7.020000 0.180000 7.200000 ( 5.580300)
f(121): 6.240000 0.100000 6.340000 ( 5.785481)
g(121): 8.710000 0.220000 8.930000 ( 6.843820)
f(141): 7.560000 0.140000 7.700000 ( 6.965423)
g(141): 10.810000 0.260000 11.070000 ( 8.324514)
f(161): 8.520000 0.160000 8.680000 ( 7.783367)
g(161): 12.760000 0.320000 13.080000 ( 9.747104)
f(181): 9.820000 0.180000 10.000000 ( 8.946036)
g(181): 14.630000 0.440000 15.070000 ( 11.234789)
f(201): 11.190000 0.210000 11.400000 ( 10.065997)
g(201): 16.940000 0.660000 17.600000 ( 13.150330)
f(221): 12.310000 0.230000 12.540000 ( 11.014733)
g(221): 18.410000 0.830000 19.240000 ( 14.411930)
f(241): 13.650000 0.250000 13.900000 ( 12.084216)
g(241): 20.820000 1.080000 21.900000 ( 16.397737)
f(261): 15.020000 0.270000 15.290000 ( 13.133269)
g(261): 23.820000 1.350000 25.170000 ( 19.101470)
f(281): 16.660000 0.300000 16.960000 ( 14.486580)
g(281): 25.540000 1.710000 27.250000 ( 20.518307)
f(301): 17.600000 0.320000 17.920000 ( 15.103449)
g(301): 27.690000 2.090000 29.780000 ( 22.433962)
f(321): 19.280000 0.360000 19.640000 ( 16.325466)
g(321): 29.380000 2.450000 31.830000 ( 24.070024)
f(341): 20.880000 0.390000 21.270000 ( 17.473757)
g(341): 33.470000 3.160000 36.630000 ( 28.516775)

@houshuang
Copy link
Contributor Author

I did a sample of benchmark running MacRuby and uploaded it here, not sure if you can make anything out of it. http://dl.dropbox.com/u/1341682/out.zip

I also did the two other DTrace functions collected_objects and function_count in this file: http://dl.dropbox.com/u/1341682/more_dtrace.zip

This is just over an arbitrary period of time running Benchmark. If any of this is useful and you want me to run it again over a specific time, with specific settings etc, let me know.

These are the three DTrace scripts that are included with MacRuby.

Thanks a lot for your input by the way, @dche!

@inukshuk
Copy link
Owner

inukshuk commented Mar 5, 2012

According to your results Regexp#initialize is most definitely the culprit. So why didn't this help?

  • Just to check my own sanity: you tested against master including the two commits from last night, right? You could double-check by looking at lib/bibtex/lexer.rb and see if it has the @patterns cache defined at the top.
  • Can you make sure you're really benchmarking the lexer only. I've described this in a previous comment above. Basically, can you check that in you benchmarking script targets lexer.analyse.

@houshuang
Copy link
Contributor Author

I confirmed that I'm on rev 59ccee9, and that the @patterns are there.

I changed to benchmark only the lexer. Again widely different results:

[bibtex-ruby] ruby test/benchmark.rb 7:26:42 ☁ master ☂ ⚡
user system total real
f(1): 0.000000 0.000000 0.000000 ( 0.000313)
g(1): 0.000000 0.000000 0.000000 ( 0.000231)
f(21): 0.000000 0.000000 0.000000 ( 0.003490)
g(21): 0.010000 0.000000 0.010000 ( 0.006004)
f(41): 0.010000 0.000000 0.010000 ( 0.006657)
g(41): 0.000000 0.000000 0.000000 ( 0.008503)
f(61): 0.020000 0.000000 0.020000 ( 0.015250)
g(61): 0.020000 0.000000 0.020000 ( 0.016912)
f(81): 0.010000 0.000000 0.010000 ( 0.015483)
g(81): 0.020000 0.000000 0.020000 ( 0.017838)
f(101): 0.020000 0.000000 0.020000 ( 0.020783)
g(101): 0.020000 0.000000 0.020000 ( 0.020176)

[bibtex-ruby] macruby test/benchmark.rb 7:27:01 ☁ master ☂ ⚡
user system total real
f(1): 0.140000 0.000000 0.140000 ( 0.142504)
g(1): 0.040000 0.000000 0.040000 ( 0.038974)
f(21): 0.910000 0.020000 0.930000 ( 0.840123)
g(21): 0.940000 0.030000 0.970000 ( 0.859722)
f(41): 1.680000 0.020000 1.700000 ( 1.615430)
g(41): 2.160000 0.060000 2.220000 ( 1.764287)
f(61): 2.570000 0.040000 2.610000 ( 2.467326)
g(61): 3.560000 0.080000 3.640000 ( 2.895678)
f(81): 3.630000 0.060000 3.690000 ( 3.485160)
g(81): 4.610000 0.120000 4.730000 ( 3.720300)
f(101): 4.530000 0.070000 4.600000 ( 4.375221)
g(101): 6.060000 0.160000 6.220000 ( 4.877677)

Just to make sure this wasn't an artefact of the benchmarking script, I wrote a simple script that requires (the modified) bibtex, parses a file with about 60 publications, and prints the first one. MRI: 0.21, MacRuby: 24.16

@inukshuk
Copy link
Owner

inukshuk commented Mar 5, 2012

If I read these results correctly (compared to the lexer-only ones you posted originally) it is now even slower than before. Can you confirm that?(Just MacRuby lexer – release / master). Weird!

@houshuang
Copy link
Contributor Author

True, weird. These two are identical, except in the first I required 'bibtex', and in the second 'require File.expand_path('../../lib/bibtex.rb', FILE)'

[test] macruby benchmark.rb 8:14:19 ☁ master ☂ ⚡ ✭
user system total real
f(1): 0.130000 0.000000 0.130000 ( 0.136986)
g(1): 0.040000 0.000000 0.040000 ( 0.038374)
f(21): 0.880000 0.010000 0.890000 ( 0.813287)
g(21): 0.950000 0.030000 0.980000 ( 0.849334)
f(41): 1.640000 0.020000 1.660000 ( 1.570212)
g(41): 2.110000 0.060000 2.170000 ( 1.719145)
f(61): 2.450000 0.030000 2.480000 ( 2.348409)
g(61): 3.300000 0.080000 3.380000 ( 2.665291)
f(81): 3.270000 0.050000 3.320000 ( 3.121622)
g(81): 4.520000 0.120000 4.640000 ( 3.619892)
f(101): 4.100000 0.060000 4.160000 ( 3.900480)
g(101): 5.730000 0.150000 5.880000 ( 4.561672)
f(121):^C
[test] macruby benchmark.rb 8:14:55 ☁ master ☂ ⚡ ✭
user system total real
f(1): 0.140000 0.000000 0.140000 ( 0.140768)
g(1): 0.040000 0.000000 0.040000 ( 0.038383)
f(21): 0.890000 0.010000 0.900000 ( 0.813669)
g(21): 0.940000 0.030000 0.970000 ( 0.847099)
f(41): 1.650000 0.020000 1.670000 ( 1.586052)
g(41): 2.110000 0.050000 2.160000 ( 1.724443)
f(61): 2.460000 0.040000 2.500000 ( 2.347957)
g(61): 3.220000 0.080000 3.300000 ( 2.619925)
f(81): 3.280000 0.050000 3.330000 ( 3.114385)
g(81): 4.480000 0.110000 4.590000 ( 3.596411)
f(101): 4.080000 0.060000 4.140000 ( 3.895985)
g(101): 5.760000 0.150000 5.910000 ( 4.611965)

@inukshuk
Copy link
Owner

inukshuk commented Mar 5, 2012

Can you try this again but having uninstalled the gem first? Alternatively, you can also print out (before the benchmark starts) the value of:

BibTeX::Lexer.patterns

These should exist on master. If they don't we're somehow pulling in an old version.

On Mar 5, 2012, at 2:16 PM, Stian Håklev wrote:

True, weird. These two are identical, except in the first I required 'bibtex', and in the second 'require File.expand_path('../../lib/bibtex.rb', FILE)'

[test] macruby benchmark.rb 8:14:19 ☁ master ☂ ⚡ ✭
user system total real
f(1): 0.130000 0.000000 0.130000 ( 0.136986)
g(1): 0.040000 0.000000 0.040000 ( 0.038374)
f(21): 0.880000 0.010000 0.890000 ( 0.813287)
g(21): 0.950000 0.030000 0.980000 ( 0.849334)
f(41): 1.640000 0.020000 1.660000 ( 1.570212)
g(41): 2.110000 0.060000 2.170000 ( 1.719145)
f(61): 2.450000 0.030000 2.480000 ( 2.348409)
g(61): 3.300000 0.080000 3.380000 ( 2.665291)
f(81): 3.270000 0.050000 3.320000 ( 3.121622)
g(81): 4.520000 0.120000 4.640000 ( 3.619892)
f(101): 4.100000 0.060000 4.160000 ( 3.900480)
g(101): 5.730000 0.150000 5.880000 ( 4.561672)
f(121):^C
[test] macruby benchmark.rb 8:14:55 ☁ master ☂ ⚡ ✭
user system total real
f(1): 0.140000 0.000000 0.140000 ( 0.140768)
g(1): 0.040000 0.000000 0.040000 ( 0.038383)
f(21): 0.890000 0.010000 0.900000 ( 0.813669)
g(21): 0.940000 0.030000 0.970000 ( 0.847099)
f(41): 1.650000 0.020000 1.670000 ( 1.586052)
g(41): 2.110000 0.050000 2.160000 ( 1.724443)
f(61): 2.460000 0.040000 2.500000 ( 2.347957)
g(61): 3.220000 0.080000 3.300000 ( 2.619925)
f(81): 3.280000 0.050000 3.330000 ( 3.114385)
g(81): 4.480000 0.110000 4.590000 ( 3.596411)
f(101): 4.080000 0.060000 4.140000 ( 3.895985)
g(101): 5.760000 0.150000 5.910000 ( 4.611965)


Reply to this email directly or view it on GitHub:
#37 (comment)

@houshuang
Copy link
Contributor Author

Man, I'm a horrible software tester :) I went into bibtex.rb and realized that the commands require 'bibtex/lexer' etc probably worked through RubyGems, even though I called the local bibtex.rb. I uninstalled the gem, then I got all kinds of errors from path mistakes, which I tried to fix. There were also several of the requires that didn't work, and I didn't find the matching files in lib/bibtex, parser and name_parser. Slightly rewritten to get around this I get

release:

         user     system      total        real
      f(1):  0.140000   0.000000   0.140000 (  0.136864)
      g(1):  0.050000   0.000000   0.050000 (  0.040198)
     f(21):  0.880000   0.020000   0.900000 (  0.828929)
     g(21):  0.950000   0.020000   0.970000 (  0.849521)
     f(41):  1.690000   0.030000   1.720000 (  1.619234)
     g(41):  2.110000   0.050000   2.160000 (  1.740376)
     f(61):  2.470000   0.040000   2.510000 (  2.362543)
     g(61):  3.270000   0.080000   3.350000 (  2.652986)

master

                 user     system      total        real
      f(1):  0.150000   0.000000   0.150000 (  0.143398)
      g(1):  0.070000   0.000000   0.070000 (  0.052197)
     f(21):  0.910000   0.020000   0.930000 (  0.853703)
     g(21):  0.940000   0.020000   0.960000 (  0.853563)
     f(41):  1.690000   0.030000   1.720000 (  1.622108)
     g(41):  2.130000   0.050000   2.180000 (  1.741724)
     f(61):  2.440000   0.040000   2.480000 (  2.343833)
     g(61):  3.290000   0.080000   3.370000 (  2.669190)

@inukshuk
Copy link
Owner

inukshuk commented Mar 6, 2012

On Mar 5, 2012, at 3:40 PM, Stian Håklev wrote:

Man, I'm a horrible software tester :) I went into bibtex.rb and realized that the commands require 'bibtex/lexer' etc probably worked through RubyGems, even though I called the local bibtex.rb. I uninstalled the gem, then I got all kinds of errors from path mistakes, which I tried to fix. There were also several of the requires that didn't work, and I didn't find the matching files in lib/bibtex, parser and name_parser. Slightly rewritten to get around this I get

How did you get around this? I'm sorry, I forgot to mention this earlier, but on master, you will have to generate the parsers (that's why parser and name_parser were missing). You can generate the parsers like this:

$ bundle install
$ bundle exec rake racc

After this, there should be the parser.rb and name_parser.rb files in the lib directory.

@houshuang
Copy link
Contributor Author

This is really useful! Well I'm learning more and more about BibTex-ruby, but it doesn't seem to be getting us anywhere... Here's the latest master, with gem uninstalled, and parser generated. This is using parser:

                 user     system      total        real
      f(1):  0.510000   0.010000   0.520000 (  0.492472)
      g(1):  0.060000   0.000000   0.060000 (  0.055886)
     f(21):  1.370000   0.030000   1.400000 (  1.282683)
     g(21):  1.370000   0.030000   1.400000 (  1.243531)
     f(41):  2.530000   0.050000   2.580000 (  2.428568)
     g(41):  3.150000   0.080000   3.230000 (  2.638907)
     f(61):  3.550000   0.060000   3.610000 (  3.359150)
     g(61):  4.740000   0.120000   4.860000 (  3.822768)

and this using lexer

                 user     system      total        real
      f(1):  0.510000   0.000000   0.510000 (  0.521181)
      g(1):  0.050000   0.000000   0.050000 (  0.056028)
     f(21):  2.390000   0.040000   2.430000 (  2.246512)
     g(21):  1.420000   0.040000   1.460000 (  1.261098)
     f(41):  4.510000   0.080000   4.590000 (  4.288429)
     g(41):  3.290000   0.080000   3.370000 (  2.695208)
     f(61):  7.040000   0.130000   7.170000 (  6.704729)
     g(61):  4.650000   0.110000   4.760000 (  3.736637)

@inukshuk
Copy link
Owner

inukshuk commented Mar 6, 2012

I just installed a macruby-nightly to finally get to the bottom of this, but get an error in the racc gem about a missing symbol (_rb_catch); have you run into this before?

@houshuang
Copy link
Contributor Author

No, but I guess I ran racc using MRI Ruby. You only run it once to generate
right, you don't require it when executing bibtex?

Thanks for all the time you spend on this!

Stian

@inukshuk
Copy link
Owner

inukshuk commented Mar 6, 2012

It's true that we don't necessarily have to generate the parser with MacRuby (in fact, the gem ships with the generated parser). Problem is, I get the error when executing the parser.

What MacRuby version are you using precisely?

@houshuang
Copy link
Contributor Author

0.12 - I compiled it from nightly about a week ago.

inukshuk added a commit that referenced this issue Apr 18, 2012
@inukshuk
Copy link
Owner

I finally had time for more detailed profiling and I believe that the problem lies with MacRuby's StringScanner implementation: they create a new regular expression object every time #scan is called. That's why our explicit pattern cache does not make big difference at all.

@mapreal19
Copy link
Collaborator

Closing since we don't support macruby anymore

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants