Research how to integrate DCAT in Gobierto Data #2671

Open
amiedes opened this issue Nov 13, 2019 · 8 comments · May be fixed by #3846

Comments

amiedes (Contributor) commented Nov 13, 2019

amiedes self-assigned this Nov 13, 2019
furilo assigned entantoencuanto and unassigned amiedes Nov 18, 2019
furilo (Member) commented Nov 18, 2019

@amiedes: @entantoencuanto will be looking at some of these things this week.

furilo (Member) commented Nov 21, 2019

Issue updated with a link to DCAT-AP on the EU site: https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

entantoencuanto (Member) commented:

I've inspected the DCAT catalog of datos.madrid.es, and I think we can generate similar data by adding some extra attributes to both custom fields and vocabulary terms. For example, a dataset appears in the catalog like this:

<dct:identifier>···</dct:identifier>
<dct:title xml:lang="es">···</dct:title>
<dct:description xml:lang="es">···</dct:description>
<dcat:theme rdf:resource="http://···"/>
<dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">···</dct:issued>
<dct:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">···</dct:modified>
<dc:language>···</dc:language>
<dct:publisher rdf:resource="http://···"/>
<dct:license rdf:resource="https://···l"/>
<dcat:distribution>
  <dcat:Distribution>
    <dcat:accessURL rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI"></dcat:accessURL>
    <dcat:mediaType>···</dcat:mediaType>
    <dcat:byteSize>···</dcat:byteSize>
  </dcat:Distribution>
</dcat:distribution>

For the custom fields:

  • Each one can have an additional attribute, rdf_decorator, which knows how to represent the resource as a string in DCAT format. For example, if the title is stored internally as:
{
    "es": "Parques Nacionales",
    "en": "National Parks"
}

Once decorated, this information can be included as:

<dct:title xml:lang="es">Parques Nacionales</dct:title>
<dct:title xml:lang="en">National Parks</dct:title>
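A minimal sketch of what such a decorator could look like, assuming the field value is a hash of locale to text. The class name `LocalizedLiteralDecorator` and its `render` method are hypothetical and not part of Gobierto; only the `rdf_decorator` idea comes from the comment above.

```ruby
# Hypothetical decorator: renders a translations hash as one RDF/XML
# element per locale. Not an existing Gobierto class.
class LocalizedLiteralDecorator
  def initialize(property)
    @property = property # e.g. "dct:title"
  end

  # translations: { "es" => "...", "en" => "..." }
  def render(translations)
    translations.map do |locale, text|
      lang = locale.to_s.encode(xml: :attr) # escaped and wrapped in double quotes
      body = text.to_s.encode(xml: :text)   # escapes &, < and >
      "<#{@property} xml:lang=#{lang}>#{body}</#{@property}>"
    end.join("\n")
  end
end

puts LocalizedLiteralDecorator.new("dct:title").render(
  "es" => "Parques Nacionales",
  "en" => "National Parks"
)
```

Each locale key becomes the `xml:lang` value of its own element, so the English text is tagged `en` rather than reusing the Spanish tag.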

For the vocabulary terms:

  • A vocabulary term should include an extra attribute with associated metadata. For example, if there is a custom field of type vocabulary named theme, it is stored internally as:
{
    "theme": [1]
}

Here 1 is the id of a vocabulary term whose meta attribute contains:

{
    "rdf:resource": "http://datos.gob.es/kos/sector-publico/sector/medio-ambiente"
}

With a vocabulary decorator that takes this metadata as its source, the result for the custom field would be:

<dcat:theme rdf:resource="http://datos.gob.es/kos/sector-publico/sector/medio-ambiente"/>
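The lookup described here could be sketched as follows. `Term`, `TERMS` and `render_resource_terms` are hypothetical names used for illustration, and the term name for id 1 is an assumption; only the payload shape and the `rdf:resource` meta key come from the comment.

```ruby
# Hypothetical stand-in for a vocabulary term; `meta` holds the extra RDF metadata.
Term = Struct.new(:id, :name, :meta)

TERMS = {
  1 => Term.new(1, "Medio Ambiente",
                { "rdf:resource" => "http://datos.gob.es/kos/sector-publico/sector/medio-ambiente" })
}.freeze

# Renders each selected term id as an empty element pointing at the term's URI.
def render_resource_terms(property, term_ids, terms: TERMS)
  term_ids.filter_map do |id|
    uri = terms[id]&.meta&.fetch("rdf:resource", nil)
    next if uri.nil? # skip unknown terms and terms without RDF metadata

    "<#{property} rdf:resource=#{uri.encode(xml: :attr)}/>"
  end.join("\n")
end

payload = { "theme" => [1] }
puts render_resource_terms("dcat:theme", payload["theme"])
```

Terms that lack the metadata are skipped instead of producing an element with an empty `rdf:resource`; whether that is the desired behavior is an open design choice.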

Other types of vocabulary fields may use different decorators, with output like this (here, a vocabulary field that allows multiple selection):

<dcat:keyword xml:lang="es">Medio Ambiente</dcat:keyword>
<dcat:keyword xml:lang="es">Impacto ambiental</dcat:keyword>
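A keyword-style decorator for a multiple-selection field might look like the sketch below. `KEYWORD_TERMS` and `render_keywords` are hypothetical, and the ids 1 and 2 are assumed purely for illustration.

```ruby
# Hypothetical term records: id => translated names.
KEYWORD_TERMS = {
  1 => { "es" => "Medio Ambiente" },
  2 => { "es" => "Impacto ambiental" }
}.freeze

# Renders one dcat:keyword literal per selected term for the given locale.
def render_keywords(term_ids, locale:, terms: KEYWORD_TERMS)
  term_ids.filter_map do |id|
    name = terms.dig(id, locale)
    next if name.nil? # no translation for this locale, or unknown term

    "<dcat:keyword xml:lang=#{locale.encode(xml: :attr)}>#{name.encode(xml: :text)}</dcat:keyword>"
  end.join("\n")
end

puts render_keywords([1, 2], locale: "es")
```

Unlike the theme decorator, this one emits the translated term name as a literal rather than a URI reference.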

stbnrivas (Contributor) commented Apr 26, 2021

== WIP ==

Before generating a complete RDF DCAT document, some values need to be mapped from other parts of the application.

I'd also like to confirm my assumption that a Catalog depends on a site (in some way) and that a site has only one catalog.

dcat:Catalog

Values possibly related to a site:

| attribute name | example of value | explanation |
| --- | --- | --- |
| dct:title | open dcat data catalog #{city} | |
| dct:description | open data catalog for #{city} with data from 2019 to 2021 in formats ... | |
| dct:identifier | #465234646344 | |
| dct:issued | site.created_at | |
| dct:modified | GobiertoData::Dataset.maximum(:updated_at) | |
| dct:license | link to license | |
| dcat:keyword | stats | add a new keywords field to the dataset model |
| dcat:keyword | contract | |
| dct:modified | site.datasets.max(:updated_at) | |
| dct:creator | site.organization.name | |
| dct:publisher | site.organization.name | |
| dct:contributor | empty | |
| dct:accrualPeriodicity | (daily, what values fit here?) | https://www.w3.org/TR/vocab-dcat-3/#temporal-properties |
| foaf:homepage | some url | |
| dcat:themeTaxonomy | | |
| dct:hasPart | | unused by us |
| dcat:dataset | | contains the dcat:Dataset entries |
| dcat:service | | |
| dcat:catalog | ? | |
| dcat:record | ? | |
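As a rough illustration of this mapping, the sketch below derives a few catalog-level properties from a site and its datasets. `SiteStub`, `DatasetStub` and `catalog_properties` are hypothetical stand-ins, not Gobierto models, and the sample values are invented; in the application the timestamps would come from the ActiveRecord expressions listed above.

```ruby
require "time"

# Hypothetical stand-ins for the site and dataset records.
SiteStub = Struct.new(:city, :organization_name, :created_at, keyword_init: true)
DatasetStub = Struct.new(:slug, :updated_at, keyword_init: true)

# Maps a site and its datasets to some of the catalog-level properties.
def catalog_properties(site, datasets)
  # Fall back to the site creation time when there are no datasets yet.
  last_change = datasets.map(&:updated_at).max || site.created_at
  {
    "dct:title" => "open dcat data catalog #{site.city}",
    "dct:issued" => site.created_at.utc.iso8601,
    "dct:modified" => last_change.utc.iso8601,
    "dct:creator" => site.organization_name,
    "dct:publisher" => site.organization_name
  }
end

site = SiteStub.new(city: "Example City", organization_name: "Example Organization",
                    created_at: Time.utc(2019, 11, 13))
datasets = [
  DatasetStub.new(slug: "parks", updated_at: Time.utc(2021, 4, 20)),
  DatasetStub.new(slug: "contracts", updated_at: Time.utc(2021, 4, 26))
]

catalog_properties(site, datasets).each { |name, value| puts "#{name}: #{value}" }
```

Computing `dct:modified` once from the most recent dataset update avoids emitting two different modification dates for the same catalog.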

dcat:Dataset

Of course, there are other attributes associated with a dataset that should probably be added as custom fields:

| attribute name | example of value | comments |
| --- | --- | --- |
| dct:identifier | gobierto_data_datasets_url(id: slug) | |
| dct:title | | |
| dct:description | | |
| dcat:keyword | | there can be multiple keywords |
| dct:issued | | |
| dct:modified | | |
| dct:language | | |
| dct:license | | |
| dct:publisher | site.organization.name | |
| dcat:distribution | | contains zero or more dcat:Distribution entries |

dcat:Distribution

A distribution belongs to a dataset and is a specific representation of it, such as CSV or XML.

| attribute name | example of value |
| --- | --- |
| dct:identifier | |
| dct:title | |
| dct:description | |
| dcat:accessURL | |
| dct:format | application/csv |
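A sketch of rendering one distribution, following the element names used in the datos.madrid.es example earlier in the thread (`dcat:accessURL`, `dcat:mediaType`). `DistributionStub` and `render_distribution` are hypothetical names, and the URL and media type are sample values, not taken from Gobierto.

```ruby
# Hypothetical stand-in for one downloadable representation of a dataset.
DistributionStub = Struct.new(:title, :access_url, :media_type, keyword_init: true)

ANY_URI = "http://www.w3.org/2001/XMLSchema#anyURI"

# Renders the distribution as a nested dcat:distribution / dcat:Distribution block.
def render_distribution(dist)
  [
    "<dcat:distribution>",
    "  <dcat:Distribution>",
    "    <dct:title>#{dist.title.encode(xml: :text)}</dct:title>",
    "    <dcat:accessURL rdf:datatype=\"#{ANY_URI}\">#{dist.access_url.encode(xml: :text)}</dcat:accessURL>",
    "    <dcat:mediaType>#{dist.media_type.encode(xml: :text)}</dcat:mediaType>",
    "  </dcat:Distribution>",
    "</dcat:distribution>"
  ].join("\n")
end

csv = DistributionStub.new(
  title: "Parks (CSV)",
  access_url: "https://example.org/datasets/parks.csv?a=1&b=2", # sample URL
  media_type: "text/csv"
)
puts render_distribution(csv)
```

Escaping the URL matters because query strings commonly contain `&`, which would otherwise make the XML ill-formed.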

dcat:DataService (UNUSED FOR NOW)

A data service is a collection of operations exposed through an interface (e.g. an API) that gives access to one or more datasets.

| attribute name | example of value |
| --- | --- |
| identifier | |

WIP

furilo (Member) commented Apr 26, 2021

For creator I'd just use site_name

ferblape (Member) commented Apr 27, 2021 via email

stbnrivas linked a pull request Apr 27, 2021 that will close this issue

furilo (Member) commented May 4, 2021

@stbnrivas please use https://www.itb.ec.europa.eu/shacl/dcat-ap/upload or other validator to validate the XML.

stbnrivas (Contributor) commented May 5, 2021
