\section{Managing RDF input files}
\label{sec:rdflib}
Complex projects require RDF resources from many locations and typically
wish to load these in different combinations, for example a small subset
of the data for debugging purposes or a different set of files for
experimentation. The library \pllib{semweb/rdf_library.pl}
manages sets of RDF files spread over different locations, including
file and network locations. The original version of this library
supported metadata about collections of RDF sources in an RDF file
called \jargon{Manifest}. The current version supports both the
\href{http://www.w3.org/TR/void/}{VoID} format and the original format.
VoID files (typically named \file{void.ttl}) can use elements from the
RDF Manifest vocabulary to support features that are not supported by
VoID.
\subsection{The Manifest file}
\label{sec:semweb-rdf-manifest}
A manifest file is an RDF file, often in
\href{http://www.w3.org/TeamSubmission/turtle/}{Turtle} format, that
provides meta-data about RDF resources. Often, a manifest will describe
RDF files in the current directory, but it can also describe RDF
resources at arbitrary URL locations. The RDF schema for RDF library
meta-data can be found in \file{rdf_library.ttl}. The namespace for the
RDF library format is defined as
\url{http://www.swi-prolog.org/rdf/library/} and abbreviated as
\const{lib}.
The schema defines three root classes: lib:Namespace, lib:Ontology and
lib:Virtual, which we describe below.
\begin{description}
\resitem{lib:Ontology}
This is a subclass of owl:Ontology. It has two subclasses, lib:Schema
and lib:Instances. These three classes are currently processed equally.
The following properties are recognised on lib:Ontology:
\begin{description}
\resitem {dc:title}
Title of the ontology. Displayed by rdf_list_library/0.
\resitem {owl:versionInfo}
Version of the ontology. Displayed by rdf_list_library/0.
\resitem {owl:imports}
Ontologies imported. If rdf_load_library/2 is used to load this
ontology, the ontologies referenced here are loaded as well. There
are two subProperties: lib:schema and lib:instances with the obvious
meaning.
\resitem {lib:source}
Defines the named graph into which the resource is loaded. If this
ends in a \const{/}, the basename of each loaded file is appended to
the given source. Defaults to the URL the RDF is loaded from.
\resitem {lib:baseURI}
Defines the base for processing the RDF data. If not provided, this
defaults to the named graph, which in turn defaults to the URL the
RDF is loaded from.
\end{description}
\resitem{lib:Virtual}
Virtual ontologies do not refer to an RDF resource themselves. They
only import other resources. For example, the W3C WordNet manifest
defines \const{wn-basic} and \const{wn-full} as virtual resources.
The lib:Virtual resource is used as a second rdf:type:
\begin{code}
<wn-basic>
    a lib:Ontology ;
    a lib:Virtual ;
    ...
\end{code}
\resitem{lib:CloudNode}
Used by ClioPatria to combine this ontology and all data it imports into
a node in the automatically generated datacloud.
\resitem{lib:Namespace}
Defines a URL to be a namespace. The definition provides the preferred
mnemonic and can be referenced in the lib:providesNamespace and
lib:usesNamespace properties. The rdf_load_library/2 predicate
registers encountered namespace mnemonics with rdf-db using
rdf_register_ns/2. Typically, a namespace declaration refers to the
namespace through a prefix defined by a Turtle @{prefix} declaration, as in
\begin{code}
@prefix lib: <http://www.swi-prolog.org/rdf/library/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
[ a lib:Namespace ;
  lib:mnemonic "rdfs" ;
  lib:namespace rdfs:
] .
\end{code}
\end{description}
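Namespace registration can be observed from Prolog. The sketch below is an illustration under stated assumptions, not part of the library: it writes a minimal manifest that declares the made-up mnemonic \const{demo} for a hypothetical namespace under example.org into a temporary directory, attaches it with rdf_attach_library/1 and then looks up the prefix with rdf_current_prefix/2, which we assume is provided by the installed version of \pllib{semweb/rdf_db}. The helper predicates ns_manifest_line/1, write_ns_manifest/1 and demo_prefix/1 are hypothetical.

\begin{code}
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_library)).
:- use_module(library(filesex)).

% Lines of a minimal, hypothetical manifest that only declares the
% mnemonic `demo' for a made-up namespace URI.
ns_manifest_line('@prefix lib: <http://www.swi-prolog.org/rdf/library/> .').
ns_manifest_line('@prefix demo: <http://example.org/demo#> .').
ns_manifest_line('[ a lib:Namespace ;').
ns_manifest_line('  lib:mnemonic "demo" ;').
ns_manifest_line('  lib:namespace demo:').
ns_manifest_line('] .').

% write_ns_manifest(-File): write the lines above to Manifest.ttl in a
% fresh temporary directory.
write_ns_manifest(File) :-
    tmp_file(rdflib, Dir),
    make_directory(Dir),
    directory_file_path(Dir, 'Manifest.ttl', File),
    setup_call_cleanup(
        open(File, write, Out, [encoding(utf8)]),
        forall(ns_manifest_line(Line), format(Out, "~w~n", [Line])),
        close(Out)).

% demo_prefix(-URI): attach the manifest, then ask the RDF store which
% URI is now registered for the prefix `demo'.
demo_prefix(URI) :-
    write_ns_manifest(File),
    rdf_attach_library(File),
    rdf_current_prefix(demo, URI).
\end{code}

If the manifest is processed as described above, the query \verb$?- demo_prefix(URI).$ binds \arg{URI} to the namespace given by lib:namespace.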
\subsubsection{Support for the VoID and VANN vocabularies}
\label{sec:semweb-void}
The \href{http://www.w3.org/TR/void/}{VoID} vocabulary addresses the same
problem as the Manifest files described here. In addition, the
\href{http://vocab.org/vann/}{VANN} vocabulary provides information
about preferred namespace prefixes. The RDF library manager can deal
with VoID files. The following relations apply:
\begin{itemize}
\item VoID \const{Dataset} and \const{Linkset} are similar to
\const{lib:Ontology}, but a VoID resource is always
\jargon{Virtual}. I.e., the VoID URI itself never refers to
an RDF document.
\item The \const{owl:imports} and its lib specializations are
replaced by \const{void:subset} (referring to another VoID
dataset) and \const{void:dataDump} (referring to a concrete
document).
\item A description of the dataset is given using \const{dcterms:description}
rather than \const{rdfs:comment}.
\item The RDF library recognises \const{lib:source}, \const{lib:baseURI}
and \const{lib:CloudNode}, which have no equivalent in VoID.
\item The RDF library recognises \const{vann:preferredNamespacePrefix} and
\const{vann:preferredNamespaceUri} as alternatives to its
proprietary way of defining prefixes. The domain of these
predicates is unclear. The library recognises them regardless of the
domain. Note that the range of \const{vann:preferredNamespaceUri} is
a \emph{literal}. A disadvantage of that is that the Turtle prefix
declaration cannot be reused.
\end{itemize}
Currently, the RDF metadata is \emph{not} stored in the RDF database. It
is processed by low-level primitives that do \emph{not} perform RDFS
reasoning. In particular, this means that rdfs:subPropertyOf and
rdfs:subClassOf cannot be used to specialise the RDF meta vocabulary.
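The VANN support listed above can be checked in the same way as a lib:Namespace declaration. In the sketch below, the prefix \const{wnx} and its namespace URI are hypothetical, as are the helper predicates void_line/1, write_void/1 and wnx_prefix/1. As required by VANN, the namespace URI is written as a literal, so no Turtle prefix declaration is involved. We assume rdf_current_prefix/2 is available to query the registered prefixes.

\begin{code}
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_library)).
:- use_module(library(filesex)).

% Hypothetical void.ttl that declares the preferred prefix `wnx'.
% Note that vann:preferredNamespaceUri takes a literal, not a URI.
void_line('@prefix vann: <http://purl.org/vocab/vann/> .').
void_line('[ vann:preferredNamespacePrefix "wnx" ;').
void_line('  vann:preferredNamespaceUri "http://example.org/wnx/"').
void_line('] .').

% write_void(-File): write the lines above to void.ttl in a fresh
% temporary directory.
write_void(File) :-
    tmp_file(rdflib, Dir),
    make_directory(Dir),
    directory_file_path(Dir, 'void.ttl', File),
    setup_call_cleanup(
        open(File, write, Out, [encoding(utf8)]),
        forall(void_line(Line), format(Out, "~w~n", [Line])),
        close(Out)).

% wnx_prefix(-URI): attach the VoID file and look up the prefix `wnx'.
wnx_prefix(URI) :-
    write_void(File),
    rdf_attach_library(File),
    rdf_current_prefix(wnx, URI).
\end{code}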
\subsubsection{Finding manifest files}
\label{sec:semweb-find-manifest}
The initial metadata file(s) are loaded into the system using
rdf_attach_library/1.
\begin{description}
\predicate{rdf_attach_library}{1}{+FileOrDirectory}
Load meta-data on RDF repositories from \arg{FileOrDirectory}. If the
argument is a directory, this directory is processed recursively and,
for each directory, a file named \file{void.ttl},
\file{Manifest.ttl} or \file{Manifest.rdf} is loaded (in this order of
preference).
Declared namespaces are added to the rdf-db namespace list. Encountered
ontologies are added to a private database of
\file{rdf_library.pl}. Each ontology is given an
\jargon{identifier}, derived from the basename of the URL without the
extension. Thus, using the declaration below, the identifier of the
declared ontology is \const{wn-basic}.
\begin{code}
<wn-basic>
    a void:Dataset ;
    dcterms:title "Basic WordNet" ;
    ...
\end{code}
\predicate{rdf_list_library}{0}{}
List the available resources in the library. Currently only lists
resources that have a dcterms:title property. See \secref{usage} for
an example.
\end{description}
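The identifier scheme can also be inspected programmatically. The sketch below stores the declaration above, completed with the prefix declarations it needs, as \file{void.ttl} in a temporary directory and attaches the \emph{directory}, so the file is found by name. It then collects the identifiers of all titled ontologies using rdf_library_index/2, a query predicate of \pllib{semweb/rdf_library} that is not described in this section; its \term{title}{Title} facet is an assumption here. The helper predicates wn_void_line/1, attach_demo_dir/1 and titled_ids/1 are hypothetical.

\begin{code}
:- use_module(library(semweb/rdf_library)).
:- use_module(library(filesex)).

% The declaration used in the text above, with its prefixes.
wn_void_line('@prefix void: <http://rdfs.org/ns/void#> .').
wn_void_line('@prefix dcterms: <http://purl.org/dc/terms/> .').
wn_void_line('<wn-basic>').
wn_void_line('    a void:Dataset ;').
wn_void_line('    dcterms:title "Basic WordNet" .').

% attach_demo_dir(-Dir): create a temporary directory that holds only
% void.ttl and attach the directory (not the file) to the library.
attach_demo_dir(Dir) :-
    tmp_file(rdflib, Dir),
    make_directory(Dir),
    directory_file_path(Dir, 'void.ttl', File),
    setup_call_cleanup(
        open(File, write, Out, [encoding(utf8)]),
        forall(wn_void_line(Line), format(Out, "~w~n", [Line])),
        close(Out)),
    rdf_attach_library(Dir).

% titled_ids(-Ids): sorted identifiers of all ontologies with a title.
titled_ids(Ids) :-
    findall(Id, rdf_library_index(Id, title(_)), Ids0),
    sort(Ids0, Ids).
\end{code}

After \verb$?- attach_demo_dir(_), titled_ids(Ids).$ the list \arg{Ids} is expected to contain \const{wn-basic}, the basename of the resolved URL of \verb$<wn-basic>$.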
It is possible for the initial set of manifests to refer to RDF files
that are not covered by a manifest. If such a reference is encountered
while loading or listing a library, the library manager will look for a
manifest file in the directory holding the referenced RDF file and load
this manifest. If a manifest is found that covers the referenced file,
the directives found in the manifest will be followed. Otherwise the RDF
resource is simply loaded using the current defaults.
Further exploration of the library is achieved using rdf_list_library/1
or rdf_list_library/2:
\begin{description}
\predicate{rdf_list_library}{1}{+Id}
Same as \term{rdf_list_library}{Id, []}.
\predicate{rdf_list_library}{2}{+Id, +Options}
Lists the resources that will be loaded if \arg{Id} is handed to
rdf_load_library/2. See rdf_attach_library/1 for how ontology
identifiers are generated. In addition it checks the existence of each
resource to help debugging library dependencies. Before doing its work,
rdf_list_library/2 reloads manifests that have changed since they were
loaded the last time. For HTTP resources it uses the HEAD method to
verify existence and last modification time of resources.
\predicate{rdf_load_library}{2}{+Id, +Options}
Load the given library. First, rdf_load_library/2 establishes which
resources need to be loaded and whether all of them exist. Then it
loads the resources.
\end{description}
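The check-then-load cycle can be tried on a small, self-contained library. This sketch assumes a VoID file describing a hypothetical dataset \const{demo-data} whose single data dump, \file{data.ttl}, holds one triple; both files are written to a temporary directory. rdf_list_library/1 prints the resources that would be loaded and flags missing ones, after which rdf_load_library/2 loads them. All file, dataset and helper names (file_lines/2, write_files/1, load_demo/0) are made up for the example.

\begin{code}
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/turtle)).
:- use_module(library(semweb/rdf_library)).
:- use_module(library(filesex)).

% Hypothetical library: one VoID dataset with one local data dump.
file_lines('void.ttl',
           [ '@prefix void: <http://rdfs.org/ns/void#> .',
             '@prefix dcterms: <http://purl.org/dc/terms/> .',
             '<demo-data> a void:Dataset ;',
             '    dcterms:title "Demo data" ;',
             '    void:dataDump <data.ttl> .'
           ]).
file_lines('data.ttl',
           [ '<http://example.org/s> <http://example.org/p> "o" .'
           ]).

% write_files(+Dir): write every file described by file_lines/2.
write_files(Dir) :-
    forall(file_lines(Name, Lines),
           ( directory_file_path(Dir, Name, File),
             setup_call_cleanup(
                 open(File, write, Out, [encoding(utf8)]),
                 forall(member(Line, Lines), format(Out, "~w~n", [Line])),
                 close(Out)))).

% load_demo: attach the library, list what would be loaded (flagging
% missing resources), then load it.
load_demo :-
    tmp_file(rdflib, Dir),
    make_directory(Dir),
    write_files(Dir),
    rdf_attach_library(Dir),
    rdf_list_library('demo-data'),
    rdf_load_library('demo-data', []).
\end{code}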
\subsection{Usage scenarios}
\label{sec:usage}
Typically, a project uses a single file, in the same format as a
manifest file, that defines alternative configurations that can be
loaded. This file is loaded at program startup using
rdf_attach_library/1. Users can then list the available libraries
using rdf_list_library/0 and rdf_list_library/1:
\begin{code}
1 ?- rdf_list_library.
ec-core-vocabularies  E-Culture core vocabularies
ec-all-vocabularies   All E-Culture vocabularies
ec-hacks              Specific hacks
ec-mappings           E-Culture ontology mappings
ec-core-collections   E-Culture core collections
ec-all-collections    E-Culture all collections
ec-medium             E-Culture medium sized data (artchive+aria)
ec-all                E-Culture all data
\end{code}
Now we can list a specific category using rdf_list_library/1. Note that
this loads two additional manifests referenced by resources encountered
in \const{ec-mappings}. If a resource does not exist, it is flagged with
\const{[NOT FOUND]}.
\begin{code}
2 ?- rdf_list_library('ec-mappings').
% Loaded RDF manifest /home/jan/src/eculture/vocabularies/mappings/Manifest.ttl
% Loaded RDF manifest /home/jan/src/eculture/collections/aul/Manifest.ttl
<file:///home/jan/src/eculture/src/server/ec-mappings>
. <file:///home/jan/src/eculture/vocabularies/mappings/mappings>
. . <file:///home/jan/src/eculture/vocabularies/mappings/interface>
. . . file:///home/jan/src/eculture/vocabularies/mappings/interface_class_mapping.ttl
. . . file:///home/jan/src/eculture/vocabularies/mappings/interface_property_mapping.ttl
. . <file:///home/jan/src/eculture/vocabularies/mappings/properties>
. . . file:///home/jan/src/eculture/vocabularies/mappings/ethnographic_property_mapping.ttl
. . . file:///home/jan/src/eculture/vocabularies/mappings/eculture_properties.ttl
. . . file:///home/jan/src/eculture/vocabularies/mappings/eculture_property_semantics.ttl
. . <file:///home/jan/src/eculture/vocabularies/mappings/situations>
. . . file:///home/jan/src/eculture/vocabularies/mappings/eculture_situations.ttl
. <file:///home/jan/src/eculture/collections/aul/aul>
. . file:///home/jan/src/eculture/collections/aul/aul.rdfs
. . file:///home/jan/src/eculture/collections/aul/aul.rdf
. . file:///home/jan/src/eculture/collections/aul/aul9styles.rdf
. . file:///home/jan/src/eculture/collections/aul/extractedperiods.rdf
. . file:///home/jan/src/eculture/collections/aul/manual-periods.rdf
\end{code}
\subsubsection{Referencing resources}
\label{sec:semweb-manifest-resources}
Resources and manifests are located either on the local filesystem or on
a network resource. The initial manifest can also be loaded from a file
or a URL. This defines the initial \jargon{base URL} of the document.
The base URL can be overridden using the Turtle @{base} directive. Other
documents can be referenced relative to this base URL by exploiting
Turtle's URI expansion rules. Turtle resources can be specified in three
ways: as absolute URLs (e.g.\
\verb$<http://www.example.com/rdf/ontology.rdf>$), as URLs relative to
the base (e.g.\ \verb$<../rdf/ontology.rdf>$) or following a
\jargon{prefix} (e.g.\ prefix:ontology).
The prefix notation is powerful because we can define multiple prefixes
and define resources relative to them. Unfortunately, prefixes can only be
defined as absolute URLs or URLs relative to the base URL. Notably, they
cannot be defined relative to other prefixes. In addition, a prefix can
only be followed by a Qname, which excludes \verb$.$ and \verb$/$.
Easily relocatable manifests must define all resources relative to the
base URL. Relocation is automatic if the manifest remains in the same
hierarchy as the resources it references. If the manifest is copied
elsewhere (e.g.\ to create a local version) it can use @{base} to
refer to the resource hierarchy. We can point to directories holding
manifest files using @{prefix} declarations. There, we can reference
\jargon{Virtual} resources using prefix:name. Here is an example, where
we first give some lines from the initial manifest, followed by the
definition of the virtual RDFS resource.
\begin{code}
@base <http://gollem.science.uva.nl/e-culture/rdf/> .
@prefix base: <base_ontologies/> .

<ec-core-vocabularies>
    a lib:Ontology ;
    a lib:Virtual ;
    dc:title "E-Culture core vocabularies" ;
    owl:imports
        base:rdfs ,
        base:owl ,
        base:dc ,
        base:vra ,
        ...
\end{code}
\begin{code}
<rdfs>
    a lib:Schema ;
    a lib:Virtual ;
    rdfs:comment "RDF Schema" ;
    lib:source rdfs: ;
    lib:schema <rdfs.rdfs> .
\end{code}
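The resolution rules can be checked with uri_resolve/3 from \pllib{uri}, which implements RFC~3986 reference resolution. The sketch below does not show how the RDF library is implemented; it only demonstrates that \verb$base:rdfs$ in the initial manifest and \verb$<rdfs>$ in a manifest stored in \file{base_ontologies} denote the same URL. The manifest file name \file{Manifest.ttl} in that directory is an assumption, and manifest_base/1, import_url/2 and local_url/2 are hypothetical helpers.

\begin{code}
:- use_module(library(uri)).

% Base URL of the initial manifest, as set by its @base directive.
manifest_base('http://gollem.science.uva.nl/e-culture/rdf/').

% import_url(+Local, -URL): expand base:Local as a Turtle parser does.
% The prefix IRI <base_ontologies/> is resolved against the base URL
% and the local name is appended to the result.
import_url(Local, URL) :-
    manifest_base(Base),
    uri_resolve('base_ontologies/', Base, Prefix),
    atom_concat(Prefix, Local, URL).

% local_url(+Relative, -URL): resolve a relative reference such as
% <rdfs> or <rdfs.rdfs> appearing in the manifest stored in
% base_ontologies/. The file name Manifest.ttl is an assumption.
local_url(Relative, URL) :-
    manifest_base(Base),
    uri_resolve('base_ontologies/Manifest.ttl', Base, ManifestURL),
    uri_resolve(Relative, ManifestURL, URL).
\end{code}

The query \verb$?- import_url(rdfs, A), local_url(rdfs, B), A == B.$ succeeds: the import in the initial manifest and the virtual resource in the second manifest denote the same URL, which is what allows the library to connect them.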
\subsection{Putting it all together}
\label{sec:semweb-rdflib-example}
In this section we provide skeleton code for filling the RDF database
from a password-protected HTTP repository. The first line loads the
application. Next we include modules that enable us to manage the RDF
library, RDF database caching and HTTP connections. Then we set up the
HTTP authentication, enable caching of processed RDF files and load the
initial manifest. Finally, load_data/0 loads all our RDF data.
\begin{code}
:- use_module(server).                  % the application itself
:- use_module(library(http/http_open)).
:- use_module(library(semweb/rdf_library)).
:- use_module(library(semweb/rdf_cache)).

% Send these credentials with requests for URLs below this prefix.
:- http_set_authorization('http://www.example.org/rdf',
                          basic(john, secret)).
% Cache processed RDF files in RDF-Cache, creating it if needed.
:- rdf_set_cache_options([ global_directory('RDF-Cache'),
                           create_global_directory(true)
                         ]).
:- rdf_attach_library('http://www.example.org/rdf/Manifest.ttl').

%!  load_data
%
%   Load our RDF data.

load_data :-
    rdf_load_library(all, []).
\end{code}
\subsection{Example: A metadata file for W3C WordNet}
\label{sec:w3cmanifest}
The VoID metadata below allows for loading WordNet in the two
predefined versions using one of
\begin{code}
?- rdf_load_library('wn20-basic', []).
?- rdf_load_library('wn20-full', []).
\end{code}
\begin{code}
@prefix void: <http://rdfs.org/ns/void#> .
@prefix vann: <http://purl.org/vocab/vann/> .
@prefix lib: <http://www.swi-prolog.org/rdf/library/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix wn20s: <http://www.w3.org/2006/03/wn/wn20/schema/> .
@prefix wn20i: <http://www.w3.org/2006/03/wn/wn20/instances/> .

[ vann:preferredNamespacePrefix "wn20i" ;
  vann:preferredNamespaceUri "http://www.w3.org/2006/03/wn/wn20/instances/"
] .

[ vann:preferredNamespacePrefix "wn20s" ;
  vann:preferredNamespaceUri "http://www.w3.org/2006/03/wn/wn20/schema/"
] .

<wn20-common>
    a void:Dataset ;
    dc:description "Common files between full and basic version" ;
    lib:source wn20i: ;
    void:dataDump
        <wordnet-attribute.rdf.gz> ,
        <wordnet-causes.rdf.gz> ,
        <wordnet-classifiedby.rdf.gz> ,
        <wordnet-entailment.rdf.gz> ,
        <wordnet-glossary.rdf.gz> ,
        <wordnet-hyponym.rdf.gz> ,
        <wordnet-membermeronym.rdf.gz> ,
        <wordnet-partmeronym.rdf.gz> ,
        <wordnet-sameverbgroupas.rdf.gz> ,
        <wordnet-similarity.rdf.gz> ,
        <wordnet-synset.rdf.gz> ,
        <wordnet-substancemeronym.rdf.gz> ,
        <wordnet-senselabels.rdf.gz> .

<wn20-skos>
    a void:Dataset ;
    void:subset <wnskosmap> ;
    void:dataDump <wnSkosInScheme.ttl.gz> .

<wnskosmap>
    a lib:Schema ;
    lib:source wn20s: ;
    void:dataDump
        <wnskosmap.rdfs> .

<wnbasic-schema>
    a void:Dataset ;
    lib:source wn20s: ;
    void:dataDump
        <wnbasic.rdfs> .

<wn20-basic>
    a void:Dataset ;
    a lib:CloudNode ;
    dc:title "Basic WordNet" ;
    dc:description "Light version of W3C WordNet" ;
    owl:versionInfo "2.0" ;
    lib:source wn20i: ;
    void:subset
        <wnbasic-schema> ,
        <wn20-skos> ,
        <wn20-common> .

<wnfull-schema>
    a void:Dataset ;
    lib:source wn20s: ;
    void:dataDump
        <wnfull.rdfs> .

<wn20-full>
    a void:Dataset ;
    a lib:CloudNode ;
    dc:title "Full WordNet" ;
    dc:description "Full version of W3C WordNet" ;
    owl:versionInfo "2.0" ;
    lib:source wn20i: ;
    void:subset
        <wnfull-schema> ,
        <wn20-skos> ,
        <wn20-common> ;
    void:dataDump
        <wordnet-antonym.rdf.gz> ,
        <wordnet-derivationallyrelated.rdf.gz> ,
        <wordnet-participleof.rdf.gz> ,
        <wordnet-pertainsto.rdf.gz> ,
        <wordnet-seealso.rdf.gz> ,
        <wordnet-wordsensesandwords.rdf.gz> ,
        <wordnet-frame.rdf.gz> .
\end{code}
%%