Experiments in SPARQL, or how I learned to stop worrying and name the graph. #6
Comments
Here's a sample SPARQL query. It's generated, which gives it these weirdly named labels:
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX gd: <http://data.grano.cc/v1/>
PREFIX gf: <http://ns.grano.cc/v1/fields/>
SELECT ?root ?status_f66de9cdcc ?schemata_048d504a3d ?hidden_427d6a6016 ?name_fd4d44795e
  ?label_4a212b3078 ?_any_d01be58ea7_name ?_any_d01be58ea7_value ?_any_d01be58ea7_graph
  ?_any_d01be58ea7_source_url ?id_b79420d01c
WHERE {
  ?root gf:inProject <http://data.grano.cc/v1/projects/opennews2> .
  ?root a gd:entities .
  OPTIONAL { ?root gf:status ?status_f66de9cdcc }
  GRAPH ?_any_d01be58ea7_graph { ?root ?_any_d01be58ea7_attr ?_any_d01be58ea7_value }
  ?_any_d01be58ea7_graph gf:isActive true .
  OPTIONAL { ?_any_d01be58ea7_graph dc:source ?_any_d01be58ea7_source_url }
  ?_any_d01be58ea7_attr a gd:attributes .
  ?_any_d01be58ea7_attr dc:identifier ?_any_d01be58ea7_name .
  ?root a ?schemata_048d504a3d .
  ?schemata_048d504a3d a gd:schemata .
  OPTIONAL { ?schemata_048d504a3d gf:isHidden ?hidden_427d6a6016 }
  OPTIONAL { ?schemata_048d504a3d dc:identifier ?name_fd4d44795e }
  OPTIONAL { ?schemata_048d504a3d <http://www.w3.org/2000/01/rdf-schema#label> ?label_4a212b3078 }
  OPTIONAL { ?root gf:id ?id_b79420d01c }
  { SELECT DISTINCT ?root
    WHERE {
      ?root gf:inProject <http://data.grano.cc/v1/projects/opennews2> .
      ?root a gd:entities .
      OPTIONAL { ?root gf:status ?status_f66de9cdcc }
      GRAPH ?_any_d01be58ea7_graph { ?root ?_any_d01be58ea7_attr ?_any_d01be58ea7_value }
      ?_any_d01be58ea7_graph gf:isActive true .
      OPTIONAL { ?_any_d01be58ea7_graph dc:source ?_any_d01be58ea7_source_url }
      ?_any_d01be58ea7_attr a gd:attributes .
      ?_any_d01be58ea7_attr dc:identifier ?_any_d01be58ea7_name .
      ?root a ?schemata_048d504a3d .
      ?schemata_048d504a3d a gd:schemata .
      OPTIONAL { ?schemata_048d504a3d gf:isHidden ?hidden_427d6a6016 }
      OPTIONAL { ?schemata_048d504a3d dc:identifier ?name_fd4d44795e }
      OPTIONAL { ?schemata_048d504a3d <http://www.w3.org/2000/01/rdf-schema#label> ?label_4a212b3078 }
      OPTIONAL { ?root gf:id ?id_b79420d01c }
    }
    LIMIT 25 }
}
Thanks for sharing your adventures! I think Jena is not meant for speed. Also, we’re definitely reaching the limits of my practical experience! Maybe an index thing? It might be related to named graphs; not all stores are optimised for that. Funnily enough, when looking into this on StackOverflow I found out that Virtuoso’s quad store is based on SQL?! http://stackoverflow.com/questions/17719341/difference-between-virtuoso-native-rdf-quad-store-and-virtuoso-sql-based-rdf-tri/17720682#17720682. Also some interesting stuff there : Benchmark related stuff:
From when I looked, the only really good RDF tooling was in Ruby (I particularly liked Spira: https://github.com/ruby-rdf/spira). I wouldn’t be surprised if stuff starts coming up in the JavaScript arena too. I have an irrational dislike of Java… :) Maybe @elf-pavlik or @lisp could help with the performance question?
i have looked closer at your query. there are two issues.
to specify that intent. second, we are working on changes to our control structures, with the unfortunate consequence that, at the moment, caches are disabled and the query set-up time is much higher than it should be. in this case a query (with the inclusive dataset specification) which has an actual execution time under 200ms has a set-up time ten times that.
@lisp many thanks for that analysis! For your reference, here's the actual COUNT query I was referring to:
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX gd: <http://data.grano.cc/v1/>
PREFIX gf: <http://ns.grano.cc/v1/fields/>
SELECT COUNT(DISTINCT(?root))
WHERE {
  ?root gf:inProject <http://data.grano.cc/v1/projects/opennews2> .
  ?root a gd:entities .
  GRAPH ?_any_5b726eb44c_graph { ?root ?_any_5b726eb44c_attr ?_any_5b726eb44c_value }
  ?_any_5b726eb44c_graph gf:isActive true .
  OPTIONAL { ?_any_5b726eb44c_graph dc:source ?_any_5b726eb44c_source_url }
  ?_any_5b726eb44c_attr a gd:attributes .
  ?_any_5b726eb44c_attr dc:identifier ?_any_5b726eb44c_name
}
On 2014-08-25, at 20:34, Friedrich Lindenberg wrote:
i expect, this would need to declare the dataset as follows, as it intends to both incorporate the named graphs into the default graph and match each one separately
PREFIX dc: <http://purl.org/dc/terms/>
still, i am not clear, what you intend. it looks like you want to restrict the graphs, but somehow that restriction eliminates everything,
PREFIX dc: <http://purl.org/dc/terms/>
in that, for your current dataset, the count here is zero, despite the respective statement pattern cardinality.
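The dataset declarations in the reply above did not survive beyond their first line, so the following is only a sketch of the general idea, not a reconstruction of the suggested query. It assumes the standard SPARQL 1.1 dataset clauses: `FROM` merges a graph into the default graph, and `FROM NAMED` makes the same graph matchable through `GRAPH ?g { ... }`. The helper names and the example graph IRIs are hypothetical.

```python
def dataset_clause(graph_iris):
    """Return FROM / FROM NAMED lines for every graph IRI.

    FROM merges the graph into the query's default graph;
    FROM NAMED keeps it addressable via GRAPH ?g { ... }.
    """
    lines = []
    for iri in graph_iris:
        lines.append(f"FROM <{iri}>")
        lines.append(f"FROM NAMED <{iri}>")
    return "\n".join(lines)


def count_query(graph_iris):
    """Build a COUNT query whose dataset lists each graph both ways.

    The dataset clause sits between the SELECT clause and WHERE,
    as the SPARQL 1.1 grammar requires.
    """
    return "\n".join([
        "PREFIX gd: <http://data.grano.cc/v1/>",
        "PREFIX gf: <http://ns.grano.cc/v1/fields/>",
        "SELECT (COUNT(DISTINCT ?root) AS ?n)",
        dataset_clause(graph_iris),
        "WHERE {",
        "  ?root a gd:entities .",
        "  GRAPH ?g { ?root ?attr ?value }",
        "  ?g gf:isActive true .",
        "}",
    ])


# Hypothetical graph IRIs, for illustration only.
print(count_query(["http://example/g1", "http://example/g2"]))
```

Listing a quarter million quads' worth of provenance graphs this way would produce a very long query, so in practice a store-specific shorthand for "all graphs" may be preferable where one exists.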
I've blogged about this whole thing here: http://pudo.org/blog/2014/09/01/grano-linked-data.html
@pudo Still interested in resolving this issue?
So I’ve had the worst possible weekend, implementing a version of the grano API that is based on RDF/SPARQL. The RDF tooling for anything other than Java is rotten. If you want to use RDF, I would seriously look at something that runs on the JVM for server-side processing (Clojure, Scala…?).
All of that would be a nice challenge, but the result is incredibly slow: running a simple count query on my network entities on Jena Fuseki now takes 300-400ms, and that’s not even a large dataset (5k entities, something like 3k relationships). This remains pretty much the same if I use an in-memory server. It’s 3 seconds on dydra (the fuck?). I must be doing something seriously wrong, but I can’t figure out what - perhaps it’s related to named graphs.
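To compare timings like these across stores, a small harness along the following lines might help. It is a minimal sketch that assumes the endpoint speaks the standard SPARQL Protocol over HTTP GET; the endpoint URL (Fuseki's default port 3030 with a made-up dataset name `grano`) and the simplified count query are assumptions, not the setup used here.

```python
import json
import time
import urllib.parse
import urllib.request

# Hypothetical endpoint: the dataset name "grano" is an assumption.
ENDPOINT = "http://localhost:3030/grano/query"

# Simplified count over entities; not the generated query from the thread.
COUNT_QUERY = """
PREFIX gd: <http://data.grano.cc/v1/>
PREFIX gf: <http://ns.grano.cc/v1/fields/>
SELECT (COUNT(DISTINCT ?root) AS ?n)
WHERE {
  ?root gf:inProject <http://data.grano.cc/v1/projects/opennews2> .
  ?root a gd:entities .
}
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def time_query(endpoint, query, runs=5):
    """Return the wall-clock seconds taken by each of `runs` executions."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        with urllib.request.urlopen(build_request(endpoint, query)) as resp:
            json.load(resp)  # include result parsing in the measurement
        timings.append(time.perf_counter() - start)
    return timings
```

Calling `time_query(ENDPOINT, COUNT_QUERY)` against a running server returns one duration per run. Comparing the first run with the later ones gives a rough sense of how much of the latency is set-up or caching rather than query execution.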
In any case, I thought you might be interested in playing with the raw data. It’s a quarter million quads, modelled along the lines of what we discussed in #2 and #3. Provenance graphs are UUIDs, everything else is in http://example/update-base/default.