obp-extract-cit

Wrapper to extract citations from XML editions of OBP books.

How to run this tool

Run with docker

docker run --rm \
  -v /path/to/local/file.xml.zip:/ebook_automation/file.xml.zip \
  -v /path/to/local/doi_deposit.xml:/ebook_automation/file.xml \
  -v /path/to/output:/ebook_automation/output \
  openbookpublishers/obp-extract-cit

Alternatively, you may clone the repo, build the image using docker build . -t some/tag, and run the command above, replacing openbookpublishers/obp-extract-cit with some/tag.

Run locally

Setup

This wrapper requires saxonb-xslt to be installed on your system. On Debian (or Debian-based distributions) it can be installed via

apt-get install libsaxonb-java

To perform the setup, run:

bash setup

The setup script contains the instructions needed to initialise the submodule.

Run

To run the process, place a copy of the XML edition of the book and the DOI deposit in the obp-extract-cit folder. Then run:

bash run prefix

where prefix is the name of the book and the DOI deposit files, e.g. bash run Siklos-Advanced_Problems2.
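Before launching the run, it can help to confirm that both input files are in place. The following is a minimal sketch, not part of this repository. It assumes the files are named prefix.xml.zip (the zipped XML edition) and prefix.xml (the DOI deposit), a convention inferred from the docker mounts above; the run script itself may expect something different. The function name missing_inputs is hypothetical.

```python
import tempfile
from pathlib import Path


def missing_inputs(folder, prefix):
    """Return the names of the expected input files that are absent."""
    # Assumed naming, inferred from the docker mounts: <prefix>.xml.zip for
    # the zipped XML edition and <prefix>.xml for the DOI deposit.
    expected = [f"{prefix}.xml.zip", f"{prefix}.xml"]
    return [name for name in expected if not (Path(folder) / name).is_file()]


# Demo in a throwaway folder that contains only the DOI deposit.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "Siklos-Advanced_Problems2.xml").write_text("<doi_batch/>")
    demo_missing = missing_inputs(folder, "Siklos-Advanced_Problems2")
    print(demo_missing)
```

An empty list means both files were found under the assumed names.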

Clean-up

bash clean [-y]

This removes temporary files (untracked files and folders stored in the obp-extract-cit folder). The script asks for the user's confirmation before removing the files; if you are running it as part of a script, you might want to use the -y flag to bypass the confirmation.

DEV

Crossref schema version

Extract-citations-from-book.xsl fails if the Crossref schema version declared in the DOI deposit does not match the one hardcoded in the stylesheets.

Since the schema version of our DOI deposits has changed over time, we need a resilient system able to process all the deposits. The small collection of scripts stored in ./src serves this purpose:

./src/extract_schema_version.py reads the schema version declared in the DOI deposit;
./src/tailor_extract_citations.py produces compatible variations of the stylesheets.
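The idea behind these two scripts can be illustrated as follows. This is a minimal sketch under stated assumptions, not the code in ./src: it assumes the deposit's root element carries a version attribute or a namespace of the form http://www.crossref.org/schema/X.Y.Z, and that the stylesheet hardcodes the version in that same namespace URI. The function names read_schema_version and tailor_stylesheet are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

CROSSREF_NS = "http://www.crossref.org/schema/"
VERSION = r"\d+(?:\.\d+)*"


def read_schema_version(deposit_xml):
    """Return the schema version declared on the deposit's root element."""
    root = ET.fromstring(deposit_xml)
    version = root.get("version")
    if version:
        return version
    # Fall back to the version embedded in the root namespace, if any.
    match = re.match(r"\{" + re.escape(CROSSREF_NS) + "(" + VERSION + r")\}", root.tag)
    return match.group(1) if match else None


def tailor_stylesheet(stylesheet_text, version):
    """Rewrite every Crossref schema namespace URI to the given version."""
    return re.sub(
        re.escape(CROSSREF_NS) + VERSION,
        lambda _match: CROSSREF_NS + version,
        stylesheet_text,
    )


# Illustrative inputs only; real deposits and stylesheets are much larger.
SAMPLE_DEPOSIT = (
    '<doi_batch xmlns="http://www.crossref.org/schema/4.4.2" version="4.4.2">'
    "<head/></doi_batch>"
)
SAMPLE_XSL = (
    '<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" '
    'xmlns="http://www.crossref.org/schema/4.3.7" version="2.0"/>'
)

declared = read_schema_version(SAMPLE_DEPOSIT)
tailored = tailor_stylesheet(SAMPLE_XSL, declared)
print(declared)
print(tailored)
```

Matching on the namespace URI rather than on bare version numbers leaves unrelated attributes, such as the XSLT version="2.0", untouched.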

Extract-citations

This repository contains a simple tool that extracts bibliographic citations from content encoded in XML TEI and creates a file for submission to CrossRef's cited-by service (see the repo's wiki).

Files and directories in this repository

Extract-citations-from-book.xsl: the script that extracts bibliographic citations
LICENSE
README.md: this file

Extracting citations

This XSL transformation has been developed in conjunction with the conversion tools hosted at https://github.com/OpenBookPublishers/XML-last but can be used on any XML TEI file where bibliographic citations have been encoded as <bibl> elements (see http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-bibl.html). This program:

identifies every <bibl> element within the input file
extracts and numbers them sequentially
converts each of them to a <citation> or <unstructured_citation> element (see the repo's wiki to read more about the structure of the output file).
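The three steps above can be sketched in Python for illustration. This is not the XSL transformation itself, only an approximation under assumptions: the input uses the TEI namespace http://www.tei-c.org/ns/1.0, every <bibl> becomes an <unstructured_citation> wrapped in a <citation>, and the key format ref1, ref2, ... is hypothetical. The actual output structure is described in the repo's wiki.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def extract_citations(tei_xml):
    """Build a <citation_list> element from the <bibl> elements of a TEI document."""
    root = ET.fromstring(tei_xml)
    citation_list = ET.Element("citation_list")
    # Steps 1 and 2: find every <bibl> in document order and number it.
    for number, bibl in enumerate(root.iter(TEI_NS + "bibl"), start=1):
        # Flatten child markup and collapse runs of whitespace.
        text = " ".join("".join(bibl.itertext()).split())
        # Step 3: convert to a citation (the key format is an assumption).
        citation = ET.SubElement(citation_list, "citation", key=f"ref{number}")
        ET.SubElement(citation, "unstructured_citation").text = text
    return citation_list


# Placeholder bibliography, for illustration only.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><listBibl>
<bibl><author>Author, A.</author>, <title>First Title</title> (2001).</bibl>
<bibl>Author, B., <title>Second   Title</title> (2002).</bibl>
</listBibl></body></text></TEI>"""

citations = extract_citations(SAMPLE_TEI)
print(ET.tostring(citations, encoding="unicode"))
```

The sketch always emits unstructured citations; splitting a <bibl> into structured fields is left to the stylesheet.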

To run it:

Copy your input files to the project folder:
- the XML TEI file containing the book or article you wish to extract citations from
- 'doi-deposit.xml', a file that records the book or article metadata according to the CrossRef schema, version 4.3.5 or newer (https://support.crossref.org/hc/en-us/articles/214530063). This is the same file that is often used to register content with the CrossRef database (see https://support.crossref.org/hc/en-us/articles/215577783-Creating-content-registration-XML).
Run 'Extract-citations-from-book.xsl'. To run this transformation (XSLT 2.0) you will need a processor such as Saxon-HE (https://sourceforge.net/projects/saxon/files/Saxon-HE/9.8/). Saxon can be run (1) from within a product that provides a graphical user interface (such as oXygen, https://www.oxygenxml.com/), (2) from the command line or (3) from within a Java or .NET application.
- (1) select your input file and the XSL; the output field can be left blank
- (2) type java -jar _dir_/saxon9he.jar -s:_your_dir_/Extract-citations/_your_input_file_ -xsl:_your_dir_/Extract-citations/Extract-citations-from-book.xsl -o:_your_dir_/Extract-citations/_your_output_file_
- (3) see e.g. http://www.oracle.com/technetwork/java/gazfm-138953.html
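For scripted use, the command-line invocation in option (2) can be assembled programmatically. The following is a minimal sketch, assuming Java and the Saxon jar are available locally. The function names and the file names in the demo are hypothetical, and the output argument is optional because the output field can be left blank, as noted in option (1).

```python
import subprocess


def build_saxon_command(jar, source, stylesheet, output=None):
    """Assemble the Saxon command line as an argument list."""
    command = ["java", "-jar", jar, f"-s:{source}", f"-xsl:{stylesheet}"]
    if output is not None:
        command.append(f"-o:{output}")
    return command


def run_transformation(jar, source, stylesheet, output=None):
    """Run Saxon; raises CalledProcessError if the transformation fails."""
    subprocess.run(build_saxon_command(jar, source, stylesheet, output), check=True)


# Demo: only builds and prints the command, nothing is executed.
command = build_saxon_command(
    "saxon9he.jar", "book.xml", "Extract-citations-from-book.xsl"
)
print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.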
