Solr-specific query optimizations #96
The first step is a Solr-specific implementation of OpBGP and the corresponding execution plan. An idea (that I'm testing) is:
In this way, the total number of operations needed should be smaller than with the current (default) implementation.
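One standard way to reduce the number of operations when executing a BGP is to run the most selective triple pattern first, so every later join starts from the smallest possible intermediate result. This is offered only as an illustration and is not necessarily the idea being tested. The plain-Java sketch assumes a toy in-memory triple list in place of the Solr index. `Triple`, `TriplePattern` and `SelectivityPlanner` are hypothetical names, and terms starting with `?` are variables.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified model (assumption): terms starting with '?' are variables.
record Triple(String s, String p, String o) {}
record TriplePattern(String s, String p, String o) {}

final class SelectivityPlanner {
    private SelectivityPlanner() {}

    // A variable matches any value; a constant must be equal.
    private static boolean matchesTerm(String term, String value) {
        return term.startsWith("?") || term.equals(value);
    }

    // Estimated number of triples a pattern matches. Against Solr this would be
    // the size of the pattern's DocSet (assumption: cheap to obtain).
    static long cardinality(TriplePattern tp, List<Triple> store) {
        return store.stream()
                .filter(t -> matchesTerm(tp.s(), t.s())
                        && matchesTerm(tp.p(), t.p())
                        && matchesTerm(tp.o(), t.o()))
                .count();
    }

    // Returns the patterns ordered from most to least selective.
    static List<TriplePattern> order(List<TriplePattern> bgp, List<Triple> store) {
        List<TriplePattern> sorted = new ArrayList<>(bgp);
        sorted.sort(Comparator.comparingLong((TriplePattern tp) -> cardinality(tp, store)));
        return sorted;
    }
}
```

Ordering by cardinality alone can place two patterns that share no variable next to each other and produce a cross product; a real planner would also prefer patterns connected to the ones already executed.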
A great step ahead: I created the first working version of a Solr-specific Jena StageGenerator, which is in charge of executing and resolving Basic Graph Patterns (BGPs), the building blocks of SPARQL. It leverages low-level Solr / Lucene machinery in order to speed up and optimize pattern execution. At first glance I see good results, so it seems the idea could work. However, I need
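For context, Jena's `StageGenerator` receives a basic pattern together with an iterator of incoming bindings and returns an iterator of extended bindings. The stdlib-only sketch below mimics that shape with hypothetical types (`BgpStage`, `SubstitutionStage`, and plain `List`/`Map` instead of Jena's `QueryIterator` and `Binding`). It implements the binding-at-a-time strategy: each incoming binding is pushed through the triple patterns one by one. Against Solr, every (binding, pattern) step would become a separate lookup, which is the kind of cost a Solr-specific generator tries to reduce.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model (assumption): terms starting with '?' are variables.
record Triple(String s, String p, String o) {}
record TriplePattern(String s, String p, String o) {}

// Hypothetical analogue of the shape of Jena's StageGenerator:
// a BGP plus incoming bindings in, extended bindings out.
interface BgpStage {
    List<Map<String, String>> execute(List<TriplePattern> bgp, List<Map<String, String>> input);
}

// Binding-at-a-time evaluation over an in-memory store (eager for brevity;
// Jena's iterators are lazy).
final class SubstitutionStage implements BgpStage {
    private final List<Triple> store;

    SubstitutionStage(List<Triple> store) {
        this.store = store;
    }

    @Override
    public List<Map<String, String>> execute(List<TriplePattern> bgp, List<Map<String, String>> input) {
        List<Map<String, String>> current = input;
        for (TriplePattern tp : bgp) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> binding : current) {
                // One "lookup" per (binding, pattern) pair: here a full scan.
                for (Triple t : store) {
                    Map<String, String> extended = new HashMap<>(binding);
                    if (unify(tp.s(), t.s(), extended)
                            && unify(tp.p(), t.p(), extended)
                            && unify(tp.o(), t.o(), extended)) {
                        next.add(extended);
                    }
                }
            }
            current = next;
        }
        return current;
    }

    // A constant must equal the value; a variable must be unbound or already bound to it.
    private static boolean unify(String term, String value, Map<String, String> binding) {
        if (!term.startsWith("?")) {
            return term.equals(value);
        }
        String bound = binding.putIfAbsent(term, value);
        return bound == null || bound.equals(value);
    }
}
```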
The work above has been committed to a dedicated branch, issue_89, so it is not in master.
Still a lot of things to do. I'm trying to build a bridge between the Jena Op / OpExecutor framework and the Solr world. The general iterator-based behaviour of Jena classes (e.g. QueryIterator) sometimes doesn't fit very well with the Solr logic, especially when many members participate in the query execution plan. Something, for example, like this:
So what I'm trying to do is write a new set of classes that act as reducers from a given algebra expression to a Solr DocSet. These classes also need to implement the Jena QueryIterator interface lazily; that is, when Jena asks for Bindings or QuerySolutions, they produce them on demand. Until then, they work only with the Solr / Lucene data model, optimizing and compacting the operations according to the capabilities of the corresponding query parser.
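A minimal sketch of the reducer idea, under stated assumptions: `java.util.BitSet` stands in for a Solr `DocSet` (one bit per document id), the constraints of an expression are reduced by intersecting their sets, and bindings are decoded from document ids only when the iterator is advanced. `LazyDocSetIterator` and its decoder function are hypothetical; they implement `java.util.Iterator` rather than Jena's `QueryIterator`.

```java
import java.util.BitSet;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Hypothetical lazy reducer: BitSet stands in for a Solr DocSet.
final class LazyDocSetIterator implements Iterator<Map<String, String>> {
    private final BitSet docs;
    private final IntFunction<Map<String, String>> decoder; // doc id -> binding (assumption)
    private int cursor;  // next document id to decode, or -1 when exhausted
    private int decoded; // how many bindings have been materialized so far

    LazyDocSetIterator(BitSet docs, IntFunction<Map<String, String>> decoder) {
        this.docs = docs;
        this.decoder = decoder;
        this.cursor = docs.nextSetBit(0);
    }

    // Reduces several constraint sets to one by intersection, without decoding any document.
    static BitSet reduce(BitSet first, BitSet... others) {
        BitSet result = (BitSet) first.clone();
        for (BitSet other : others) {
            result.and(other);
        }
        return result;
    }

    @Override
    public boolean hasNext() {
        return cursor >= 0;
    }

    // Decodes exactly one document id into a binding, only when asked.
    @Override
    public Map<String, String> next() {
        if (cursor < 0) {
            throw new NoSuchElementException();
        }
        Map<String, String> binding = decoder.apply(cursor);
        decoded++;
        cursor = docs.nextSetBit(cursor + 1);
        return binding;
    }

    int decodedCount() {
        return decoded;
    }
}
```

Keeping the work in set form until `next()` is called means that operators which only intersect or count never pay the cost of building bindings.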
A first implementation of Basic Graph Pattern execution seems to work. It operates directly at the Lucene level, executing successive joins between the docsets resulting from each triple pattern in the graph. Again, the underlying idea seems to work but needs more time: I ran the integration suite and there are some expected failures (but also a lot of green tests), so the issue_89 branch is definitely unstable.
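The docset-join strategy can be illustrated, again with a toy in-memory model instead of Lucene, as set-at-a-time evaluation: each triple pattern is resolved once, on its own, and the resulting sets are hash-joined on their shared variables. All names are hypothetical and this is not the code in the issue_89 branch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Simplified model (assumption): terms starting with '?' are variables.
record Triple(String s, String p, String o) {}
record TriplePattern(String s, String p, String o) {}

final class DocSetJoin {
    private DocSetJoin() {}

    // Resolves one pattern independently (the analogue of one Solr query -> DocSet).
    static List<Map<String, String>> resolve(TriplePattern tp, List<Triple> store) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Triple t : store) {
            Map<String, String> b = new HashMap<>();
            if (bind(tp.s(), t.s(), b) && bind(tp.p(), t.p(), b) && bind(tp.o(), t.o(), b)) {
                out.add(b);
            }
        }
        return out;
    }

    // Hash join on shared variables; assumes all rows on one side bind the same variables.
    static List<Map<String, String>> join(List<Map<String, String>> left, List<Map<String, String>> right) {
        if (left.isEmpty() || right.isEmpty()) {
            return List.of();
        }
        TreeSet<String> shared = new TreeSet<>(left.get(0).keySet());
        shared.retainAll(right.get(0).keySet());
        Map<List<String>, List<Map<String, String>>> index = new HashMap<>();
        for (Map<String, String> r : right) {
            index.computeIfAbsent(key(r, shared), k -> new ArrayList<>()).add(r);
        }
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : index.getOrDefault(key(l, shared), List.of())) {
                Map<String, String> merged = new HashMap<>(l);
                merged.putAll(r);
                out.add(merged);
            }
        }
        return out;
    }

    // Evaluates a non-empty BGP by joining the per-pattern result sets in order.
    static List<Map<String, String>> evaluate(List<TriplePattern> bgp, List<Triple> store) {
        List<Map<String, String>> result = resolve(bgp.get(0), store);
        for (int i = 1; i < bgp.size() && !result.isEmpty(); i++) {
            result = join(result, resolve(bgp.get(i), store));
        }
        return result;
    }

    // Join key: the values of the shared variables, in sorted variable order.
    private static List<String> key(Map<String, String> row, TreeSet<String> vars) {
        List<String> k = new ArrayList<>();
        for (String v : vars) {
            k.add(row.get(v));
        }
        return k;
    }

    private static boolean bind(String term, String value, Map<String, String> b) {
        if (!term.startsWith("?")) {
            return term.equals(value);
        }
        String prev = b.putIfAbsent(term, value);
        return prev == null || prev.equals(value);
    }
}
```

Here the number of lookups is one per triple pattern rather than one per (binding, pattern) pair; the trade-off is that each pattern's full result set has to be materialized before joining.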
The issue_89 branch contains a rough implementation of …
There are still 14 failures and 8 errors in the SELECT tests, mainly expected ClassCastExceptions.
The first implementation step of the Solr-Jena bridge has actually been completed. As suggested by the Jena developers, it is basically a Solr-specific implementation of the Jena graph and dataset domain model.
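As an illustration of the graph side, a Solr-backed graph essentially has to turn a triple lookup (subject, predicate and object, each possibly a wildcard) into a Solr query. This sketch assumes one document per triple, with hypothetical `string` fields `s`, `p` and `o`; the field names and query layout are assumptions, not the project's actual schema. A variable (a term starting with `?`, or `null`) adds no clause, and `*:*` is Solr's match-all query.

```java
import java.util.ArrayList;
import java.util.List;

final class TriplePatternQuery {
    private TriplePatternQuery() {}

    // Hypothetical field names: one Solr document per triple, indexed as exact-match strings.
    private static final String[] FIELDS = {"s", "p", "o"};

    // Builds a query in Lucene/Solr standard syntax for a triple lookup.
    static String toQuery(String s, String p, String o) {
        String[] terms = {s, p, o};
        List<String> clauses = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            boolean isVariable = terms[i] == null || terms[i].startsWith("?");
            if (!isVariable) {
                clauses.add(FIELDS[i] + ":\"" + escape(terms[i]) + "\"");
            }
        }
        return clauses.isEmpty() ? "*:*" : String.join(" AND ", clauses);
    }

    // Escapes backslashes and double quotes inside a quoted phrase.
    private static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```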
Now it's time to move on to the non-functional requirements, efficiency first of all: the default behaviour of Op and related classes (and, I think, much of the machinery in charge of query algebra and execution in general) needs to be adapted or specialized in order to provide Solr-specific optimizations.
As I am almost ignorant about these topics, I'm studying them, but I believe it will take me some time. If someone is more expert than me (very easy) or simply wants to join this adventure, feel free to give me a shout ;)