This repository has been archived by the owner on Feb 13, 2023. It is now read-only.
Javier Arias edited this page Dec 12, 2019 · 10 revisions

Extract-citations-from-book.xsl automatically extracts bibliographic citations from content encoded in XML TEI. The extracted data is written to a separate XML file, which can be deposited in the CrossRef database as part of the Cited-by service (https://www.crossref.org/services/cited-by/).

The program identifies and extracts every element tagged as a bibliographic entry (<bibl>), except when it is a child of a <figure> element.
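
The selection rule can be illustrated outside XSLT. The following is a minimal Python sketch, not the stylesheet's own code: the function name `select_bibl`, the sample document, and the use of the standard TEI namespace are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; assumed to be the one used by the input files.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def select_bibl(root):
    """Return <bibl> elements in document order, skipping children of <figure>."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    selected = []
    for bibl in root.iter(f"{{{TEI_NS}}}bibl"):
        parent = parent_of.get(bibl)
        if parent is not None and parent.tag == f"{{{TEI_NS}}}figure":
            continue  # e.g. an image credit, not a bibliographic reference
        selected.append(bibl)
    return selected


# Hypothetical sample input, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <figure><bibl>Image credit, not a reference.</bibl></figure>
    <listBibl>
      <bibl>First reference.</bibl>
      <bibl>Second reference.</bibl>
    </listBibl>
  </body></text>
</TEI>"""

root = ET.fromstring(SAMPLE)
texts = [bibl.text for bibl in select_bibl(root)]
print(texts)
```

Only the two entries inside `<listBibl>` are kept; the `<bibl>` nested directly under `<figure>` is skipped.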

When the reference contains a DOI it is encoded as follows:

<citation key="ref1">
    <doi>10.1017/upo9781844652594</doi>
</citation>

On the other hand, when <bibl> does not contain a DOI, it is converted into an <unstructured_citation> element:

<citation key="ref5">
    <unstructured_citation>Bentham, Jeremy, ‘An Introduction to the Principles of Morals and Legislation’, in Utilitarianism and Other Essays, ed. by Alan Ryan (London: Penguin Books, 2004).</unstructured_citation>
</citation>

The element <unstructured_citation> was chosen over <structured_citation> because inferring structured information from text formatting is too error-prone. As recommended by CrossRef, all formatting is removed from the unstructured citation and any URL that is not a DOI is ignored.
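
The two output cases can be sketched in Python as follows. This is an illustrative approximation rather than the stylesheet's logic: the helper `to_citation`, the regular expressions, and the assumption that a DOI appears either in the entry's text or in a `target` attribute are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed patterns: a loose DOI matcher and a generic URL matcher.
DOI_RE = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")
URL_RE = re.compile(r"https?://\S+")


def to_citation(bibl, key):
    """Build a <citation> element from one <bibl> entry."""
    citation = ET.Element("citation", {"key": key})
    # Search the visible text plus any target attributes for a DOI.
    haystack = " ".join(
        ["".join(bibl.itertext())] + [el.get("target", "") for el in bibl.iter()]
    )
    match = DOI_RE.search(haystack)
    if match:
        # Trim trailing punctuation that belongs to the sentence, not the DOI.
        ET.SubElement(citation, "doi").text = match.group(0).rstrip(".,;)")
        return citation
    plain = "".join(bibl.itertext())  # drops inline markup such as <hi>
    plain = URL_RE.sub("", plain)     # ignore URLs that are not DOIs
    plain = " ".join(plain.split())   # collapse whitespace
    ET.SubElement(citation, "unstructured_citation").text = plain
    return citation


# Hypothetical sample entries, for illustration only.
with_doi = ET.fromstring(
    '<bibl>Smith, <title>A Book</title>, '
    '<ref target="https://doi.org/10.1017/upo9781844652594">link</ref>.</bibl>'
)
without_doi = ET.fromstring(
    '<bibl>Bentham, Jeremy, <hi rend="italic">Utilitarianism</hi> '
    "(London: Penguin Books, 2004). http://example.org/page</bibl>"
)

doi_xml = ET.tostring(to_citation(with_doi, "ref1"), encoding="unicode")
text_xml = ET.tostring(to_citation(without_doi, "ref5"), encoding="unicode")
print(doi_xml)
print(text_xml)
```

The first entry yields a `<doi>` child; the second yields an `<unstructured_citation>` with the italic markup and the non-DOI URL stripped.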

This XSL transformation can be run in conjunction with the conversion tools hosted at https://github.com/OpenBookPublishers/XML-last, after the conversion has been completed and the result has been validated.

See https://support.crossref.org/hc/en-us/articles/215578403 for more on adding references to a CrossRef metadata record.
