-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
performance, possible improvements #53
Comments
Hmm, something close to this may almost work already. It's actually: chicago_note_bibliography = CSL::Style.load("chicago-note-bibliography")
locale = CSL::Locale.load("en-US") If I pass those in to the locale is still loaded again though, somewhere it turns the locale I pass in back to a string and then it's loaded again using that string as input. Hmm. |
It helps to distinguish between what I call the citation 'renderer' and 'processor' part. The renderer implements CSL and is fairly complete; the processor has many other concerns that will vary a great deal based on your application: parses input, processes output, keeps track of which items have been cited before, and their order (for stuff like 'ibid.'), decides if disambiguation is necessary etc. You'll notice that the renderer, in theory, can be almost stateless, whereas the processor is inherently stateful. At the moment, the processor part is very limited in citeproc-ruby, because my own use cases have always been limited to generating bibliographies in one go as opposed to managing all the citations in a document and the corresponding bibliography, based on dynamic input. That said, you're absolutely right: parsing styles and locales takes a long time and it obviously makes sense to keep instances around. In the context of the library, this would happen in the processor: you create a processor instance with style and locales and use for all your processing needs. If your use case is inherently a stateless one, like you say, you're in the position I usually find myself in as well: you just need to render single (or sometimes multiple) citations and then forget about them (i.e., you're not concerned about disambiguation, ibidems and all that stuff). In this case, typically you don't need a processor at all: this is basically the conclusion you arrived at, I just wanted to give you my rationale for it. So you'd load the styles and locales you need, create a renderer, and use those instances. If you do not modify the styles or locales they should be thread safe; the renderer needs to track some state while it renders, so I would definitely put use a thread local instance for it (to be on the safe side, it would probably be best to keep style and locale in thread local too, especially if you don't do any dynamic modifications, it's safer and only a little memory overhead). With the caveat that I'll have to familiarize myself with the code again, I would say that there is no reason for the renderer to keep loading the same locale again, but that's something we can figure out. |
As an example, here is how we create the renderer in jekyll-scholar and then use it to render bibliographies. |
I found the code where it insists on reloading the locale, which I can change to be analagous to the code for loading the style, which does not. I'll make a PR. As far as your example from jekyll-scholar (which is helpful, thanks! And I think would be a helpful example in the README for one of the involved gems maybe?), my concern is that a The solution "cache it in Thread.local, not globally" doens't really work, because depending on server environment, you can't neccesarily depend on threads being re-used at all. Caching things in Thread.local can end up being equivalent to re-creating it for every request, if every request ends up getting a brand new thread. It's not a super reliable solution. |
The renderer is definitely not thread-safe, because it keeps some state during the rendition: to render an item, the renderer basically walks down the CSL nodes; some state is stored in the citation item as the renderer visits each node; this is probably thread safe even for the more complicated CSL features like groups, because you can make sure not to share citation items between threads. But there is also some state kept in the renderer instance. To make rendering thread safe we'd have to pass down the state to every node which may not be easy to do conceptually, because order is really important: e.g., if you have a style that replaces an author's name in subsequent references with something like ----. So I think we'd have to use a mutex for the render method to make it thread safe. Looking at some of the render methods, you see the state being stored and then at the end cleared again. It's this state, I think which is a conceptual problem so those methods would have to be protected by a mutex. |
OK, thanks. Presumably that also means a By making sure to re-use Locale and Style (which we think are probably thread-safe, so long as you aren't intentionally mutating them), creating a new I will PR a change to citeproc-ruby to allow you to re-use an already loaded locale with a new renderer/ |
@inukshuk hmm, in your example of using the Renderer directly -- what data structure has to be passed in to the renderer, in your code what sort of object is Ah, okay, I see in your code I need to create a I wonder if it would be convenient to have a simpler more convenient api for "render a single citation from csl-data hash". |
got it working. hmm, making it work for "re-use already loaded locale/style" is actually harder at this lower level using a renderer directly, than it was using the higher level CiteProc::Processor. But I'm gonna figure it out. This code gets a bit... tangled. |
While I blush a bit at the 20ms, I'll say that the library has not been optimized for performance yet at all -- there is much room for improvement (but with CSL/CiteProc there are also many rabbit holes, in my experience!). I do agree that it would be nice to have a simple API to 'just render this item' -- it is the use case that I think a lot of people want. PRs much appreciated, if you want to take a stab at it. |
Actually, using the lower-level Renderer API and using a patch to let it use already-loaded locale, I'm getting down to 7-9ms. So much better than the 100-200ms I started out with! Trying to figure out the best place to actually PR the patch. Definitely either the README needs an example of how to do this in a performant way, and/or it needs a simpler API. Maybe both. I'll see what I can do once I get the first patch in. |
Ref inukshuk#53 CiteProc::Ruby::Engine class already does it here for styles: https://github.com/inukshuk/citeproc-ruby/blob/c6b81ea04868ad2afc704d9a38148df053cf4898/lib/citeproc/ruby/engine.rb#L117-L121 And CiteProc::Ruby::Format does it here for formats: https://github.com/inukshuk/citeproc-ruby/blob/c6b81ea04868ad2afc704d9a38148df053cf4898/lib/citeproc/ruby/format.rb#L31 Loading (or not) of all three types of objects is done in kind of different non-parallel ways, but that was kind of there in the code when I found it. With this change for locale, I can avoid loading both locales AND styles (as well as formats while we’re at it although it probably doesn’t make as much of a difference) with EITHER of these APIs: ```ruby csl_style # assume exists a CSL::Style, prob previously loaded with CSL::Style.load csl_locale # assume exists a CSL::Style, prob previously loaded with CSL::Locale.load cp = CiteProc::Processor.new style: csl_style, locale: csl_locale, format: CiteProc::Ruby::Formats::Html.new cp.import csl cp.render :bibliography, id: whatever csl_hash # assume exists, a single hash (not array) that’s csl-data.json format citation_item = CiteProc::CitationItem.new(id: csl_hash[“id"]) do |c| c.data = CiteProc::Item.new(csl) end renderer = CiteProc::Ruby::Renderer.new :format => CiteProc::Ruby::Formats::Html.new, :style => csl_style, :locale => csl_locale renderer.render citation_item, csl_style.bibliography ``` The latter is giving me pretty good performance, 7-9ms on my macbook, with already loaded style/locale. The former still gives better performance than it did before an already loaded locale could be used. There’s other code here: https://github.com/inukshuk/citeproc-ruby/blob/c6b81ea04868ad2afc704d9a38148df053cf4898/lib/citeproc/ruby/renderer/locale.rb#L11 …that at first I was looking at to avoid the load if it already has a locale…. I’m not totally sure when that code is used vs the code I _did_ patch, but the thing I’m PR’ing here is the thing that seemed to work for both CiteProc::Processor use and CiteProc::Renderer use. I think. None of these approaches were obvious from the docs. Perhaps a README PR giving examples of how to do this now with current code? Better convenience API for this use case might also be nice, but I’m not totally sure how to do it sensibly. Maybe just a new method on Renderer that lets you just pass in a csl_hash, without having to make the CitationItem yourself, and lets you pass in :bibliography if you want. (And why do I have to tell CitationItem the `id`, when that’s already in the hash?) Or maybe the convenience API should actually be in terms of `Engine`, create an engine and just pass that in to render each time. Which maybe is possible now. I gotta figure out how to work with Engine directly. Is `Engine` thread-safe do you think? If not, then nevermind on this last paragraph.
Ref inukshuk#53 CiteProc::Ruby::Engine class already does it here for styles: https://github.com/inukshuk/citeproc-ruby/blob/c6b81ea04868ad2afc704d9a38148df053cf4898/lib/citeproc/ruby/engine.rb#L117-L121 And CiteProc::Ruby::Format does it here for formats: https://github.com/inukshuk/citeproc-ruby/blob/c6b81ea04868ad2afc704d9a38148df053cf4898/lib/citeproc/ruby/format.rb#L31 Loading (or not) of all three types of objects is done in kind of different non-parallel ways, but that was kind of there in the code when I found it. With this change for locale, I can avoid loading both locales AND styles (as well as formats while we’re at it although it probably doesn’t make as much of a difference) with EITHER of these APIs: ```ruby csl_style # assume exists a CSL::Style, prob previously loaded with CSL::Style.load csl_locale # assume exists a CSL::Style, prob previously loaded with CSL::Locale.load cp = CiteProc::Processor.new style: csl_style, locale: csl_locale, format: CiteProc::Ruby::Formats::Html.new cp.import csl cp.render :bibliography, id: whatever csl_hash # assume exists, a single hash (not array) that’s csl-data.json format citation_item = CiteProc::CitationItem.new(id: csl_hash[“id"]) do |c| c.data = CiteProc::Item.new(csl) end renderer = CiteProc::Ruby::Renderer.new :format => CiteProc::Ruby::Formats::Html.new, :style => csl_style, :locale => csl_locale renderer.render citation_item, csl_style.bibliography ``` The latter is giving me pretty good performance, 7-9ms on my macbook, with already loaded style/locale. The former still gives better performance than it did before an already loaded locale could be used. There’s other code here: https://github.com/inukshuk/citeproc-ruby/blob/c6b81ea04868ad2afc704d9a38148df053cf4898/lib/citeproc/ruby/renderer/locale.rb#L11 …that at first I was looking at to avoid the load if it already has a locale…. I’m not totally sure when that code is used vs the code I _did_ patch, but the thing I’m PR’ing here is the thing that seemed to work for both CiteProc::Processor use and CiteProc::Renderer use. I think. None of these approaches were obvious from the docs. Perhaps a README PR giving examples of how to do this now with current code? Better convenience API for this use case might also be nice, but I’m not totally sure how to do it sensibly. Maybe just a new method on Renderer that lets you just pass in a csl_hash, without having to make the CitationItem yourself, and lets you pass in :bibliography if you want. (And why do I have to tell CitationItem the `id`, when that’s already in the hash?) Or maybe the convenience API should actually be in terms of `Engine`, create an engine and just pass that in to render each time. Which maybe is possible now. I gotta figure out how to work with Engine directly. Is `Engine` thread-safe do you think? If not, then nevermind on this last paragraph.
@inukshuk in the example you showed me before, you pass a
But as far as I can tell, that Instead, when you actually call render on the renderer, you need to pass in Does this seem like I have it right? Thank you! |
Oh, yes, that sounds right -- you actually don't pass a default style to the renderer. In CSL there are two root rendering nodes: citation and bibliography (for citations and references which go into a bibliography); you pass one or the other to the render function based on what you want to render. |
So I'm not clear from this thread, is there a way to pass in the locale/style with the existing code? Any performance gains would be welcome. |
@mattluce style, yes. locale, no. Pretty much anywhere in existing code you pass in a style as a local file path or URL, you can also pass in a CSL::Style (that you loaded previously with CSL::Style.load), and it will use it, without having to load it again. By "pretty much anywhere", I mean the places I was using/testing, and the places I found in the code. :) Locale, you can pass in a CSL::Locale -- but the existing code will end up trying to load it again anyway (often succeeding, if you loaded it from local path, so you won't see an error, but it'll be extra time to (re)-load). #54 fixes that. |
Unfortunately, I'm not seeing any performance gains when reusing a loaded style object. Anyway, performance improvements would be greatly welcomed. I'm not a Ruby developer, otherwise I'd be happy to contribute. |
@mattluce interesting, I definitely saw big improvements, and benchmarked it in my local test sandbox app. (Of course, you've still got to load the style once, so if you benchmark loading the style in advance plus doing ONE processor instantiation, it will be exactly the same as one processor instantiation loading the style itself. Perf gains are only there if you load a style and then re-use it with multiple processors and/or renderers). Let me check my test code to see what API's I was using... yep, it was the same one as you |
Hey @inukshuk, any chance of getting a release with the CSL::Locale re-use perf improvement in there from #54? I was thinking it might make sense to get some of the other PR's I suggested in before a release, but I guess those are more controversial/challenging/need-thought. #54 is already merged, and I could really use it in a release, and don't want to have my production app depend on a fork! What do you think? |
Yes, sorry, I'm running late here. I'll merge #54 and release it tomorrow! |
Ha, indeed! I actually pushed a release with this as well -- 1.1.10 should include #54 already. For the other changes, I want to finish date ranges first and then release those together. |
Oh, okay, great, thanks! The other stuff is added bonus, but #54 is all I need to put my feature in production. For now, I'm formatting date ranges myself and adding them to a I'm going to close this ticket, thanks! And thanks for your advice and help @inukshuk! |
I wrote up technical details on our citation implementation in a sufia app, using citeproc-ruby, if anyone is interested. https://bibwild.wordpress.com/2018/04/04/another-round-of-citation-features-in-a-sufia-app/ |
I'm evaluating using this in a server side (web) application. For my use case, I just actually need to take a single citation (which I already have in a csl-data-json hash), and output it in one of a handful of styles. Following the directions in the README, I get:
This works, but it's slow. On my macbook, 100-200ms. Too slow for me.
However, good news, profiling reveals that it's slow mostly only because each time it loads and parses the style and locale files. The slow is in
CSL::Loader.load
, which is called twice, once withchicago-note-bibliography
and once with the (default) localeen-US
.If just to play around, I modify CSL::Loader.load to cache in memory, it saves an order of magnitude of time, down to 10-20ms to process.
One probably wouldn't want to actually do that, because then any program using would cache whether it wanted to or not, eventually holding all styles and locales ever used in memory. Plus there could be concurrency issues.
But it would be good if there were some way for the caller to load the style/locale manually, and pass it in. Perhaps something like:
That's not quite right,
CSL::Loader
is actually a module, I got to figure out where it is used, plus of course this just doesn't work at present. It could probably be made to work, I started looking at the code, and got a bit lost in all the objects across three gems involved, but perhaps it could be done fairly straightforwardly. I'm still investigating, toward a potential PR (in one or more of the involved gems).Or maybe there's other existing API not mentioned in the README that would be better to use for this, either as-is, or with some modification.
Do you have any thoughts? Thanks!
The text was updated successfully, but these errors were encountered: