Handle SDO-style schemas #108

Open
VladimirAlexiev opened this issue Feb 5, 2021 · 7 comments

@VladimirAlexiev

schema.org doesn't use rdfs:domain and rdfs:range.
Some other important ontologies have adopted this style, eg SSN, SOSA (according to @dr-shorthair), WOT TD.
Our euBusinessGraph SWJ paper describes the benefits of this lighter-weight approach.

Can you support this style in PyLODE?
See essepuntato/LODE#12 for details, and essepuntato/LODE#13 for more details and an implementation as a patch to LODE's XSL (doing this patching made me appreciate that you decided to rewrite without using XSL).

You have already included two examples to test it with: SSN and SOSA (I currently don't see any domain/range). Please also test on schema.org, as it's pretty big.

@VladimirAlexiev
Author

https://github.com/RDFLib/pyLODE#annotations says that SDO props are supported:

  • domains - rdfs:domain or schema:domainIncludes
  • ranges - rdfs:range or schema:rangeIncludes

(as well as probably all the additional annotation props from my patch above).
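A minimal sketch of what treating the two styles as equivalent could look like. This is not pyLODE's actual code: the function name and the plain-tuple triple representation are assumptions for illustration, and both the `http://` and `https://` schema.org namespaces are accepted because the vocabulary is published under both.

```python
from collections import defaultdict

RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# schema.org is published under both an http:// and an https:// namespace.
SCHEMA_NAMESPACES = ("http://schema.org/", "https://schema.org/")

DOMAIN_PREDICATES = {RDFS + "domain"} | {ns + "domainIncludes" for ns in SCHEMA_NAMESPACES}
RANGE_PREDICATES = {RDFS + "range"} | {ns + "rangeIncludes" for ns in SCHEMA_NAMESPACES}


def collect_domains_and_ranges(triples):
    """Hypothetical helper: map each property IRI to its domains and ranges.

    `triples` is any iterable of (subject, predicate, object) string tuples.
    """
    result = defaultdict(lambda: {"domains": [], "ranges": []})
    for s, p, o in triples:
        if p in DOMAIN_PREDICATES:
            result[s]["domains"].append(o)
        elif p in RANGE_PREDICATES:
            result[s]["ranges"].append(o)
    return dict(result)


# Illustrative data only: one SDO-style property and one RDFS-style property.
triples = [
    ("http://schema.org/position", "http://schema.org/domainIncludes", "http://schema.org/ListItem"),
    ("http://schema.org/position", "http://schema.org/rangeIncludes", "http://schema.org/Integer"),
    ("http://schema.org/position", "http://schema.org/rangeIncludes", "http://schema.org/Text"),
    ("http://example.org/hasPart", RDFS + "domain", "http://example.org/Whole"),
]
info = collect_domains_and_ranges(triples)
```

With this shape, a documentation template would not need to know which style an ontology uses. It would just render the `domains` and `ranges` lists.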

But there's a bug then, compare:

Note: it turns out that SSN doesn't use SDO constructs, but SOSA does.

@VladimirAlexiev
Author

VladimirAlexiev commented Feb 5, 2021

pylode bails on SDO:

pylode -u https://schema.org/version/latest/schemaorg-current-http.ttl -c true -o schema.html
"Your RDF file does not define an ontology"

@VladimirAlexiev
Author

After removing all unicode chars:

time pylode -i schema-with-added-ontology.ttl -c true -o schema.html
Finished. ontdoc documentation in schema.html

real    1m39.494s <<< but it's a big ontology (970k ttl)
user    0m0.015s
sys     0m0.094s
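The comment above does not say how the Unicode characters were removed. As an alternative that keeps the information, here is a sketch that rewrites non-ASCII characters as Turtle numeric escapes (`\uXXXX` / `\UXXXXXXXX`) instead of deleting them. The function name is hypothetical, and the sketch assumes the non-ASCII characters occur only inside string literals or IRIs, where Turtle permits these escapes.

```python
def escape_non_ascii(text: str) -> str:
    """Hypothetical helper: replace each non-ASCII character with a Turtle numeric escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)               # plain ASCII is kept unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")   # 4-hex-digit escape
        else:
            out.append(f"\\U{cp:08X}")   # 8-hex-digit escape for code points above U+FFFF
    return "".join(out)


escaped = escape_non_ascii('rdfs:comment "caf\u00e9" .')
```

The result is pure ASCII, so it can be written with any encoding and should still parse to the same literals.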

The output looks about OK, but the following fixes are needed:

  • handle schema:domainIncludes, rangeIncludes
  • treat classes that are also schema:DataType as datatypes:
schema:Boolean a schema:DataType, rdfs:Class ;
  • handle dct:source, eg
schema:Brand a rdfs:Class ;
    dct:source <http://www.w3.org/wiki/WebSchemas/SchemaDotOrgSources#source_GoodRelationsClass> ;
  • handle embedded HTML better. eg
A BreadcrumbList is an ItemList consisting of a chain of linked Web pages, 
typically described using at least their URL and their name, and typically ending with the current page.<br/><br/>

The <a class="localLink" href="http://schema.org/position">position</a> property 
is used to reconstruct the order of the items in a BreadcrumbList 
The convention is that a breadcrumb list has an <a class="localLink" href="http://schema.org/itemListOrder">itemListOrder</a> of 
<a class="localLink" href="http://schema.org/ItemListOrderAscending">ItemListOrderAscending</a> 

is rendered as

A BreadcrumbList is an ItemList consisting of a chain of linked Web pages, 
typically described using at least their URL and their name, and typically ending with the current page.\n\n
The [[position]] property is used to reconstruct the order of the items in a BreadcrumbList 
The convention is that a breadcrumb list has an [[itemListOrder]] of [[ItemListOrderAscending]] 

In other words, it parses the HTML and turns it into internal markdown, but the markdown is then not translated back to the corresponding HTML.

  • handle internal links: eg [[itemListOrder]] should become <a href='#itemListOrder'>itemListOrder</a>.
    • Note: this HTML link is handled correctly: See also the <a href="/docs/hotels.html">dedicated document on the use of schema.org for marking up hotels and other forms of accommodations</a>
    • I think it's also better for pylode-generated links to point to internal anchors
      rather than to the semantic URL (eg https://schema.org/itemListOrder), because links to the semantic URL are broken until the file is officially published.
  • handle markdown links: eg This corresponds to the [YearBuilt field in RESO](https://ddwiki.reso.org/display/DDW17/YearBuilt+Field) is rendered as that same plain text instead of an HTML link
    • Note: this markdown link is handled correctly: (Source: Wikipedia see [https://en.wikipedia.org/wiki/Campsite](https://en.wikipedia.org/wiki/Campsite)). Here the name and the link are the same...

PS: Let me know if you want schema-with-removed-UTF8.ttl (the fixed input) and schema.html (the output)
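For the schema:DataType item in the list above, one possible rule is: anything typed as schema:DataType is listed as a datatype, even if it is also an rdfs:Class. A sketch under that assumption, again over plain (s, p, o) string tuples and with a hypothetical function name. It does not follow rdfs:subClassOf, so subclasses of a datatype that are not themselves typed schema:DataType would still be listed as classes.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
SCHEMA_DATATYPE = {"http://schema.org/DataType", "https://schema.org/DataType"}


def split_classes_and_datatypes(triples):
    """Hypothetical helper: return (classes, datatypes) as sorted lists of IRIs."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    # A resource typed schema:DataType counts as a datatype only.
    datatypes = sorted(s for s, t in types.items() if t & SCHEMA_DATATYPE)
    classes = sorted(
        s for s, t in types.items() if RDFS_CLASS in t and not (t & SCHEMA_DATATYPE)
    )
    return classes, datatypes


triples = [
    ("http://schema.org/Boolean", RDF_TYPE, "http://schema.org/DataType"),
    ("http://schema.org/Boolean", RDF_TYPE, RDFS_CLASS),
    ("http://schema.org/Brand", RDF_TYPE, RDFS_CLASS),
]
classes, datatypes = split_classes_and_datatypes(triples)
```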

@nicholascar
Member

I'm looking into this issue now.

@nicholascar
Member

@VladimirAlexiev can you try the latest versions of pyLODE (v2.9.x) for this task? They should correctly handle domainIncludes and rangeIncludes. pyLODE won't handle a missing owl:Ontology declaration, so I think your best bet is to add such a declaration to the data before sending it to pyLODE. pyLODE does do some ontology building to cater for various property options, such as different forms of class labels/descriptions, but I'm not keen to cater for a missing owl:Ontology because lots of things (i.e. all the metadata) depend on that declaration. I'd rather a user explicitly set the ontology in a pre-pyLODE step.
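A rough sketch of such a pre-pyLODE step, appending an owl:Ontology declaration to the Turtle text before it is passed on. Everything here is an assumption for illustration: the function name, the choice of `http://schema.org/` as the ontology IRI, and rdfs:label as the title property (the thread does not say which metadata properties pyLODE expects). Full IRIs are used so the appended statement does not depend on the file's prefix declarations, and the label is assumed to be a single line.

```python
OWL_ONTOLOGY = "http://www.w3.org/2002/07/owl#Ontology"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def with_ontology_declaration(turtle_text: str, ontology_iri: str, label: str) -> str:
    """Hypothetical helper: append an owl:Ontology statement to Turtle text."""
    # Escape backslashes and double quotes for a Turtle short string literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    declaration = (
        f"\n<{ontology_iri}> a <{OWL_ONTOLOGY}> ;\n"
        f'    <{RDFS_LABEL}> "{escaped}" .\n'
    )
    return turtle_text.rstrip("\n") + "\n" + declaration


source = "<http://schema.org/Brand> a <http://www.w3.org/2000/01/rdf-schema#Class> .\n"
patched = with_ontology_declaration(source, "http://schema.org/", 'Schema.org "core"')
```

The patched text could then be written to a file and passed to pylode with `-i`, as in the earlier comment.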

@VladimirAlexiev
Author

eccenca/jod#15 asks for the same props to be handled in 3 namespaces: schema, dcam, dcid
