Wrapper to extract citations from XML editions of OBP books.
docker run --rm \
-v /path/to/local/file.xml.zip:/ebook_automation/file.xml.zip \
-v /path/to/local/doi_deposit.xml:/ebook_automation/file.xml \
-v /path/to/output:/ebook_automation/output \
openbookpublishers/obp-extract-cit
Alternatively, you may clone the repo and build the image with:
docker build . -t some/tag
Then run the command above, replacing openbookpublishers/obp-extract-cit with some/tag.
This wrapper requires saxonb-xslt to be installed on your system. On Debian (or Debian-based distributions), it is provided by the libsaxonb-java package, which can be installed via:
apt-get install libsaxonb-java
To perform the setup, run:
bash setup
The setup script contains the instructions needed to initialise the submodule.
To run the process, place a copy of the XML edition of the book and of the DOI deposit in the obp-extract-cit folder, then run:
bash run prefix
where prefix is the name of the book and DOI deposit files, e.g.:
bash run Siklos-Advanced_Problems2
To remove temporary files (untracked files and folders stored in the obp-extract-cit folder), run:
bash clean [-y]
The script asks for confirmation before removing the files; if you are running it as part of another script, you can pass the -y flag to skip the confirmation.
Extract-citations-from-book.xsl fails if the Crossref schema version declared in the DOI deposit does not match the one hardcoded in the stylesheets. Since the schema version of our DOI deposits has changed over time, we need a resilient system able to process all of them. The small collection of scripts stored in ./src serves this purpose:
- ./src/extract_schema_version.py reads the schema version declared in the DOI deposit;
- ./src/tailor_extract_citations.py produces compatible variations of the stylesheets.
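The two steps above can be pictured with a minimal Python sketch. It is not the code in ./src: it assumes that the deposit declares its version in the Crossref namespace URI of the root element (e.g. http://www.crossref.org/schema/4.3.5), falling back to the root's version attribute, and that the stylesheets hardcode the version only inside that same namespace URI. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Union

# Matches a Crossref schema namespace URI and captures its version,
# e.g. "http://www.crossref.org/schema/4.3.5" -> "4.3.5".
CROSSREF_NS = re.compile(r"http://www\.crossref\.org/schema/(\d+(?:\.\d+)*)")


def extract_schema_version(deposit_xml: Union[str, bytes]) -> str:
    """Return the Crossref schema version declared by a DOI deposit (hypothetical helper)."""
    root = ET.fromstring(deposit_xml)
    # ElementTree stores the namespace in the tag: "{http://...}doi_batch".
    match = CROSSREF_NS.search(root.tag)
    if match:
        return match.group(1)
    # Assumed fallback: the version attribute on the root <doi_batch>.
    version = root.get("version")
    if version is None:
        raise ValueError("no Crossref schema version found in the deposit")
    return version


def tailor_stylesheet(xsl_text: str, version: str) -> str:
    """Rewrite every Crossref schema namespace URI in the stylesheet text.

    Assumption: the stylesheet hardcodes the version only inside the
    Crossref namespace URI, so replacing the URI is enough.
    """
    return CROSSREF_NS.sub(f"http://www.crossref.org/schema/{version}", xsl_text)
```

For example, passing Path("doi_deposit.xml").read_bytes() to extract_schema_version yields a version string that tailor_stylesheet can substitute into the text of Extract-citations-from-book.xsl before the transformation runs.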
This repository contains a simple tool that extracts bibliographic citations from content encoded in XML TEI and creates a file for submission to CrossRef's Cited-by service (see the repo's wiki).
- Extract-citations-from-book.xsl: the script that extracts bibliographic citations
- LICENSE
- README.md: this file
This XSL transformation has been developed in conjunction with the conversion tools hosted at https://github.com/OpenBookPublishers/XML-last, but it can be used on any XML TEI file in which bibliographic citations are encoded as <bibl> elements (see http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-bibl.html).
This program:
- identifies every <bibl> element within the input file;
- extracts and numbers them sequentially;
- converts each of them to a <citation> or <unstructured_citation> element (see the repo's wiki to read more about the structure of the output file).
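As a rough illustration of these steps, the Python sketch below collects every TEI <bibl> in document order, numbers it, and wraps it as a Crossref <citation> holding only an <unstructured_citation>. It is not a port of Extract-citations-from-book.xsl: the ref1, ref2, ... keys and the function names are hypothetical, the output omits the Crossref namespace, and the stylesheet's rules for producing structured citations are not reproduced (see the repo's wiki for those).

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def extract_citations(tei_xml):
    """Return (key, text) pairs for every TEI <bibl>, numbered in document order.

    Note: a <bibl> nested inside another <bibl> is also counted separately.
    """
    root = ET.fromstring(tei_xml)
    citations = []
    for number, bibl in enumerate(root.iter(f"{{{TEI_NS}}}bibl"), start=1):
        # Flatten inner markup (<author>, <title>, ...) and collapse whitespace.
        text = " ".join("".join(bibl.itertext()).split())
        citations.append((f"ref{number}", text))  # hypothetical key scheme
    return citations


def to_citation_list(citations):
    """Serialise the pairs as <citation> elements with unstructured text only."""
    citation_list = ET.Element("citation_list")
    for key, text in citations:
        citation = ET.SubElement(citation_list, "citation", key=key)
        ET.SubElement(citation, "unstructured_citation").text = text
    return ET.tostring(citation_list, encoding="unicode")
```

Keeping extraction and serialisation separate makes it easy to inspect the numbered citations before any output file is written.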
To run it:
- Copy your input files to the project folder:
  - the XML TEI file containing the book or article you wish to extract citations from;
  - 'doi-deposit.xml', a file that records the book or article metadata according to the CrossRef schema, version 4.3.5 or newer (https://support.crossref.org/hc/en-us/articles/214530063). This is the same file that is often used to register content to the CrossRef database (see https://support.crossref.org/hc/en-us/articles/215577783-Creating-content-registration-XML).
- Run 'Extract-citations-from-book.xsl'. This is an XSLT 2.0 transformation, so you will need a processor such as Saxon-HE (https://sourceforge.net/projects/saxon/files/Saxon-HE/9.8/). Saxon can be run (1) from within a product that provides a graphical user interface (such as oXygen, https://www.oxygenxml.com/), (2) from the command line, or (3) from within a Java or .NET application:
  - (1) select your input file and the XSL; the output field can be left blank;
  - (2) type
    java -jar _dir_/saxon9he.jar -s:_your_dir_/Extract-citations/_your_input_file_ -xsl:_your_dir_/Extract-citations/Extract-citations-from-book.xsl -o:_your_dir_/Extract-citations/_your_output_file_
  - (3) see e.g. http://www.oracle.com/technetwork/java/gazfm-138953.html
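If you call Saxon from a script rather than typing the command in option (2), the same command line can be assembled and run from Python. This is a minimal sketch: saxon_command and run_saxon are hypothetical helpers, the jar path and file names are placeholders, and java is assumed to be on the PATH. The -o: argument should name a new output file, not the stylesheet.

```python
import subprocess


def saxon_command(saxon_jar, source, stylesheet, output):
    """Build the Saxon-HE command line from option (2) as an argument list."""
    return [
        "java", "-jar", str(saxon_jar),
        f"-s:{source}",        # XML TEI input file
        f"-xsl:{stylesheet}",  # Extract-citations-from-book.xsl
        f"-o:{output}",        # file that receives the extracted citations
    ]


def run_saxon(saxon_jar, source, stylesheet, output):
    """Run the transformation; raises CalledProcessError if Saxon fails."""
    subprocess.run(saxon_command(saxon_jar, source, stylesheet, output), check=True)
```

For example, run_saxon("saxon9he.jar", "book.xml", "Extract-citations-from-book.xsl", "citations.xml") writes the result to citations.xml and raises subprocess.CalledProcessError if Saxon exits with an error. Passing a list instead of a single string avoids shell quoting problems with paths that contain spaces.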