Solr-specific query optimizations #96

agazzarini · 2015-07-14T09:02:07Z

The first implementation step of the Solr-Jena bridge is now complete: as suggested by the Jena devs, it is basically a Solr-specific implementation of the Jena graph and dataset domain model.

Now it's time to move on to the non-functional requirements, efficiency first of all: the default behaviour of Op and related classes (in general, I think, many of the classes in charge of managing the query algebra and its execution) needs to be adapted / specialized in order to provide Solr-specific optimizations.

As I'm almost ignorant about those topics, I'm trying to study them, but I believe it will take me a bit of time. If someone is more expert than me (very easy) or simply wants to join this adventure, feel free to give me a shout ;)

agazzarini · 2015-07-24T06:24:12Z

The first step is a Solr-specific implementation of OpBGP and the corresponding execution plan.

An idea (that I'm testing) is:

run a (separate) filter query for each triple pattern
take the DocSet with the lowest cardinality
intersect that one with the second pattern, the result with the third pattern, and so on

In this way the total number of operations needed should be smaller than with the current (default) implementation.
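The three steps above can be sketched in plain Java. This is only an illustration of the set arithmetic, not the code on the branch: `java.util.BitSet` stands in for a Solr DocSet, and the class and method names are hypothetical. Each input is assumed to be the result of one filter query, that is, one triple pattern.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: BitSet stands in for a Solr DocSet
// (one bit per matching document id).
final class BgpIntersectionSketch {

    /**
     * Takes the doc set with the lowest cardinality as the seed and
     * intersects it with every other per-pattern doc set in turn.
     */
    static BitSet intersectSmallestFirst(List<BitSet> perPatternDocSets) {
        if (perPatternDocSets.isEmpty()) {
            return new BitSet();
        }
        BitSet seed = Collections.min(
                perPatternDocSets, Comparator.comparingInt(BitSet::cardinality));

        // Clone the seed: BitSet.and mutates its receiver.
        BitSet result = (BitSet) seed.clone();
        for (BitSet other : perPatternDocSets) {
            result.and(other); // intersecting with the seed itself is a no-op
            if (result.isEmpty()) {
                break; // nothing left to intersect, stop early
            }
        }
        return result;
    }
}
```

Starting from the smallest set keeps every intermediate result at most as large as that set, which is where the expected saving comes from.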

agazzarini · 2015-07-26T18:42:07Z

A great step ahead: I created the first working version of the Jena StageGenerator, which is in charge of executing and resolving Basic Graph Patterns (BGPs), the SPARQL building blocks.

It leverages low-level Solr / Lucene features in order to speed up and optimize pattern execution. At first glance I see good results, so it seems the idea could work. However, I need

to structure / refactor the whole thing in order to end up with a decent design
to get the whole integration suite working (standalone and SolrCloud mode)
to run some benchmarks with a consistent set of triples.

agazzarini · 2015-07-26T20:02:31Z

The stuff above has been committed to a dedicated branch - issue_89 - so it's not in master.

agazzarini · 2015-08-11T15:18:09Z

Still a lot of things to do. I'm trying to build a bridge between the Jena Op / OpExecutor framework and the Solr world. The general iterator-based behaviour of the Jena classes (i.e. QueryIterator) sometimes doesn't fit very well with the Solr logic, especially when many operators take part in the query execution plan. Something, for example, like this:

(project (?first ?last ?workTel)
  (conditional
    (filter (> ?amount 10000)
      (bgp
        (triple ?s <http://learningsparql.com/ns/addressbook#firstName> ?first)
        (triple ?s <http://learningsparql.com/ns/addressbook#lastName> ?last)
        (triple ?s <http://learningsparql.com/ns/addressbook#portfolio> ?amount)
      ))
    (bgp (triple ?s <http://learningsparql.com/ns/addressbook#workTel> ?workTel))))

(project (?first ?last ?workTel)
  (filter (> ?amount 10000)
    (leftjoin
      (bgp
        (triple ?s <http://learningsparql.com/ns/addressbook#firstName> ?first)
        (triple ?s <http://learningsparql.com/ns/addressbook#lastName> ?last)
        (triple ?s <http://learningsparql.com/ns/addressbook#portfolio> ?amount)
      )
      (bgp (triple ?s <http://learningsparql.com/ns/addressbook#workTel> ?workTel)))))

So what I'm trying to build is a new set of classes that act as reducers from a given algebra expression to a Solr DocSet. These classes also need to implement the Jena QueryIterator interface in a lazy way, that is: when Jena asks for Bindings or QuerySolutions, they produce them on demand. Before that, they work only with the Solr / Lucene data model, optimizing and compacting the operations according to the capabilities of the corresponding query parser.
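A minimal sketch of the lazy part, under stated assumptions: `java.util.BitSet` stands in for the Solr DocSet, the type parameter `B` stands in for a Jena Binding, and the class name and the `toBinding` callback are hypothetical. Jena's QueryIterator has more methods than `java.util.Iterator`; only the on-demand materialization is shown.

```java
import java.util.BitSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Hypothetical sketch: all the set work happens on the BitSet beforehand;
// a binding is built only when the consumer actually asks for it.
final class LazyDocSetIterator<B> implements Iterator<B> {
    private final BitSet docIds;
    private final IntFunction<B> toBinding; // invoked once per consumed doc id
    private int next;                       // next doc id to emit, or -1 when done

    LazyDocSetIterator(BitSet docIds, IntFunction<B> toBinding) {
        this.docIds = docIds;
        this.toBinding = toBinding;
        this.next = docIds.nextSetBit(0);
    }

    @Override
    public boolean hasNext() {
        return next >= 0;
    }

    @Override
    public B next() {
        if (next < 0) {
            throw new NoSuchElementException();
        }
        int current = next;
        next = docIds.nextSetBit(current + 1);
        return toBinding.apply(current); // materialize on demand
    }
}
```

If the consumer stops after the first few results, the remaining doc ids are never turned into bindings.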

agazzarini · 2015-08-15T16:51:18Z

A first implementation of Basic Graph Pattern execution seems to be working. It works directly at the low Lucene level, executing successive joins between the DocSets that result from each triple pattern in the graph.

Again, the underlying idea seems to work but needs some more time: I tried running the integration suite and there are some expected failures (but also a lot of green tests), so the issue_89 branch is definitely unstable.

agazzarini · 2015-08-23T16:35:23Z

The issue_89 branch contains a rough implementation of

a BGP executor (QueryIterator) that works with most of the BGP integration tests
a Filter executor that injects the filter directly into the BGP executor instead of decorating it
a Conditional executor, which compares two lazy BGPs (I'm not really satisfied with the implementation, but it's working)
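To illustrate the difference between decorating and injecting a filter, here is a small sketch. It assumes that the filter can itself be expressed as a doc set, which is an interpretation rather than something stated above; `java.util.BitSet` again stands in for a Solr DocSet and all names are hypothetical.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Hypothetical sketch contrasting the two styles of applying a filter.
final class FilterInjectionSketch {

    /** Decorating style: walk every BGP match and test it one by one. */
    static BitSet decorate(BitSet bgpDocs, IntPredicate filter) {
        BitSet kept = new BitSet();
        for (int id = bgpDocs.nextSetBit(0); id >= 0; id = bgpDocs.nextSetBit(id + 1)) {
            if (filter.test(id)) {
                kept.set(id);
            }
        }
        return kept;
    }

    /** Injected style: the filter is already a doc set, so one AND narrows the BGP result. */
    static BitSet inject(BitSet bgpDocs, BitSet filterDocs) {
        BitSet narrowed = (BitSet) bgpDocs.clone(); // BitSet.and mutates its receiver
        narrowed.and(filterDocs);
        return narrowed;
    }
}
```

Both methods return the same documents when the predicate and the filter doc set describe the same condition; the injected form simply never visits the rejected matches individually.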

There are still 14 failures and 8 errors in the SELECT tests. They are mainly

ClassCastExceptions: I haven't implemented all Op* classes, so sometimes I (wrongly) assume the concrete type of a given operation
related to filters and functions: I need a more general bridge between the Jena functions and the Solr filters.

agazzarini self-assigned this Jul 14, 2015

agazzarini added the enhancement label Jul 14, 2015

agazzarini mentioned this issue Jul 14, 2015

The count query will trigger heavy looping #89

Closed

agazzarini pushed a commit that referenced this issue Jul 27, 2015

[ issue #96 ] Minor bug fix on query builder

30cd9bc

agazzarini pushed a commit that referenced this issue Jul 27, 2015

[ issue #96 ] Another minor refactoring on Query Optimization classes (not yet completed)

6651b87

agazzarini pushed a commit that referenced this issue Jul 30, 2015

[ issue #96 ] First working draft of (Solr) BGP handler

d1778de

agazzarini pushed a commit that referenced this issue Jul 30, 2015

[ issue #96 ] Small improvemement of BGP execution with NullObjects

e3dbf78

agazzarini pushed a commit that referenced this issue Aug 5, 2015

[ issue #96 ] BGP + Filter execution sounds working

6fd99e6

agazzarini pushed a commit that referenced this issue Aug 11, 2015

[ issue #96 ] tmp commit (LPT exchange)

1fa1d18

agazzarini mentioned this issue Aug 13, 2015

Migration to Java8 #104

Closed

agazzarini pushed a commit that referenced this issue Aug 14, 2015

[ issue #96 ] Some tmp stuff (including parallel queries execution)

0aff61c

agazzarini pushed a commit that referenced this issue Aug 15, 2015

[ issue #96 ] Working BGP execution with cartesian product

875ef4d

agazzarini pushed a commit that referenced this issue Aug 15, 2015

[ issue #96 ] OpExecutor defaults to classic impl (0 err 30 failures mainly related with functions)

bd37018

agazzarini pushed a commit that referenced this issue Aug 15, 2015

[ issue #96 ] SELECT Integration Tests: 0 err, 23 failures

c7a8938

agazzarini changed the title ~~Query Solr-specific optimizations~~ Solr-specific query optimizations Aug 15, 2015

agazzarini pushed a commit that referenced this issue Aug 20, 2015

[ issue #96 ] tmp commit

2ffaf5a

agazzarini pushed a commit that referenced this issue Aug 20, 2015

[ issue #96 ] New lazy implementation of BGP execution (not yet working)

fbe3bc7

agazzarini pushed a commit that referenced this issue Aug 21, 2015

[ issue #96 ] *Only* 21 (almost expected) failures on SELECT integration test

a2f3fc0

agazzarini pushed a commit that referenced this issue Aug 21, 2015

[ issue #96 ] BGP + Filter + Sequence executor: 8 errors, 17 failures

3b58912

agazzarini pushed a commit that referenced this issue Aug 22, 2015

[ issue #96 ] Minor changes

d5e6588

agazzarini pushed a commit that referenced this issue Aug 23, 2015

[ issue #96 ] First (working) draft of optional executor

898f596

agazzarini pushed a commit that referenced this issue Aug 23, 2015

[ issue #96 ] Rough implementation of langMatches function (still 14 failures / 8 errors, mainly expected ClassCastException)

0d6ab8f

agazzarini mentioned this issue Sep 7, 2015

Hybrd query returning incorrect "numFound" #108

Open

agazzarini pushed a commit that referenced this issue Sep 28, 2015

[ issue #96 ] tmp submit

2330a01

agazzarini mentioned this issue Apr 18, 2016

Possible memory leak in SPARQL endpoint? #123

Open
