XML-last

This repository contains a set of tools to convert an epub created with Adobe InDesign into a series of XML files that follow the TEI simplePrint customisation. For the conversion to work, the InDesign documents need to be formatted following a specific set of instructions (see the repo's wiki).

Files and directories in this repository

documents and templates: this folder contains an InDesign template. It also includes sample input files for book-, chapter- and object-level metadata
schemas: this folder contains the tei_simplePrint schema (also available at http://www.tei-c.org/Guidelines/Customization/index.xml) and the OBP customisation
LICENSE
README.md: this file
Transform-to-XML-book.xsl: this script creates a single book-length XML TEI file by combining the documents already converted
Transform-to-XML-section.xsl: this is the main conversion tool; it transforms each XHTML file of the input epub into an XML TEI file
XML-after-transformation.py: this Python script should be run after conversion to fix some small mistakes in the XML
XML-before-transformation.py: this Python script must be run before conversion to set up the input and output folders correctly

Running the conversion

Copy your input files to the project folder:
- the epub of the book you want to convert (see Preparing the epub for conversion)
- the file containing book- and optionally chapter-level metadata (see documents and templates/book-chapter-metadata-TEMPLATE.xml and Book and chapter metadata)
- (optional) the file containing object-level metadata (see documents and templates/Object-metadata-TEMPLATE.csv and Object metadata)
Run 'XML-before-transformation.py' (you will need Python 3.6.2 or newer). This will:
- un-package the epub
- selectively copy the content of the epub to a newly created 'input' folder
- re-name the book metadata file
- create the output folder 'XML-edition'
- transfer images, audio and video files (if any) from the epub to the 'XML-edition' folder
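
The preparation steps above can be sketched as follows. This is only an illustration of the idea, not the code of 'XML-before-transformation.py': the function name `prepare_workspace`, the scratch folder `epub-unpacked` and the two extension sets are assumptions made for the example, and the renaming of the book metadata file is left out.

```python
import shutil
import zipfile
from pathlib import Path

# Assumed extensions; the real script may select files differently.
TEXT_EXTENSIONS = {".xhtml", ".html"}
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".svg", ".mp3", ".mp4"}


def prepare_workspace(epub_path, project_dir):
    """Unpack an epub and sort its content into 'input' and 'XML-edition'."""
    project = Path(project_dir)
    unpacked = project / "epub-unpacked"  # hypothetical scratch folder
    input_dir = project / "input"
    output_dir = project / "XML-edition"
    input_dir.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(parents=True, exist_ok=True)

    # An epub is a zip archive, so the standard library can un-package it.
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(unpacked)

    copied = {"input": [], "XML-edition": []}
    for path in sorted(unpacked.rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix in TEXT_EXTENSIONS:
            # Text content becomes the input of the XSLT transformation.
            shutil.copy2(path, input_dir / path.name)
            copied["input"].append(path.name)
        elif suffix in MEDIA_EXTENSIONS:
            # Media files go straight to the output folder.
            shutil.copy2(path, output_dir / path.name)
            copied["XML-edition"].append(path.name)
    return copied
```

Files are copied by name only, so two files with the same name in different epub sub-folders would overwrite each other in this sketch.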
Run 'Transform-to-XML-section.xsl' to transform each input XHTML file into an XML TEI file. The output files will be saved to the 'XML-edition' folder. This is an XSLT 2.0 transformation, so a processor such as Saxon-HE is needed (https://sourceforge.net/projects/saxon/files/Saxon-HE/9.8/ -- note that the open source edition of Saxon does not allow the validation of the result documents). Saxon can be run (1) from within a product that provides a graphical user interface (such as oXygen, https://www.oxygenxml.com/), (2) from the command line or (3) from within a Java or .NET application.
- (1) select 'Transform-to-XML-section.xsl' as both the input and the XSL source of the transformation; the output field can be left blank
- (2) type java -jar _dir_/saxon9he.jar -s:_dir_/XML-last/Transform-to-XML-section.xsl -xsl:_dir_/XML-last/Transform-to-XML-section.xsl (as in (1), the output option -o can be left out)
- (3) see e.g. http://www.oracle.com/technetwork/java/gazfm-138953.html
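
For scripted runs, the command-line option (2) can be wrapped in a few lines of Python. The sketch below is an assumption-laden convenience, not part of this repository: `saxon_command` and `run_transformation` are hypothetical names, and the jar path must be adapted to your installation. The stylesheet is passed as both source and XSL, and the output argument is left out by default, mirroring the blank output field in option (1).

```python
import subprocess


def saxon_command(saxon_jar, stylesheet, output=None):
    """Build the Saxon command line; the stylesheet doubles as the source."""
    stylesheet = str(stylesheet)
    command = [
        "java", "-jar", str(saxon_jar),
        f"-s:{stylesheet}",    # source document
        f"-xsl:{stylesheet}",  # stylesheet to apply
    ]
    if output is not None:
        command.append(f"-o:{output}")  # optional principal output file
    return command


def run_transformation(saxon_jar, stylesheet):
    """Run Saxon and raise CalledProcessError if it exits with an error."""
    subprocess.run(saxon_command(saxon_jar, stylesheet), check=True)
```

Calling `run_transformation("saxon9he.jar", "Transform-to-XML-section.xsl")` from the project folder would then start the conversion, provided Java is on the PATH.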
Run 'Transform-to-XML-book.xsl'. This second transformation uses XInclude to merge the newly created XML TEI files into a single file. The output is saved to the 'XML-edition' folder as 'entire-book.xml'. (See above for more on how to run the transformation.)
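
To show what an XInclude merge does in general, here is a small Python illustration using the standard library's `xml.etree.ElementInclude`. It is not how 'Transform-to-XML-book.xsl' is implemented; the function name `merge_with_xinclude`, the in-memory `sections` mapping and the file names used with it are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude


def merge_with_xinclude(master_xml, sections):
    """Replace each xi:include in master_xml with the referenced section.

    sections maps an href (e.g. a hypothetical 'ch1.xml') to XML text,
    standing in for the files that would normally be read from disk.
    """
    def loader(href, parse, encoding=None):
        # ElementInclude asks the loader for each referenced resource.
        if parse == "xml":
            return ET.fromstring(sections[href])
        return sections[href]

    root = ET.fromstring(master_xml)
    ElementInclude.include(root, loader=loader)
    return root
```

A master document only needs to declare the XInclude namespace (`http://www.w3.org/2001/XInclude`) and list one `xi:include` element per section, in reading order.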
Run 'XML-after-transformation.py' to:
- change cross-reference destinations throughout 'entire-book.xml'
- modify relative URLs throughout
- delete empty list items
- delete empty <div>s
- delete tabs
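
The last three clean-up steps can be sketched in Python as below. This is a simplified illustration, not the code of 'XML-after-transformation.py': the names `remove_empty` and `clean_xml` are hypothetical, the sample ignores namespaces, and "empty" is assumed to mean "no child elements and no text other than whitespace". The cross-reference and URL fixes are not covered.

```python
import xml.etree.ElementTree as ET


def _is_empty(elem):
    """True if the element has no children and only whitespace text."""
    return len(elem) == 0 and not (elem.text or "").strip()


def remove_empty(root, tags=("item", "div")):
    """Delete empty elements with the given tags; return how many were removed."""
    removed = 0
    # Visit parents in reverse document order so that an element emptied by
    # the removal of its children is itself removed by its own parent.
    for parent in reversed(list(root.iter())):
        for i in reversed(range(len(parent))):
            child = parent[i]
            if child.tag in tags and _is_empty(child):
                # Keep any text that followed the removed element.
                tail = child.tail or ""
                if i == 0:
                    parent.text = (parent.text or "") + tail
                else:
                    parent[i - 1].tail = (parent[i - 1].tail or "") + tail
                del parent[i]
                removed += 1
    return removed


def clean_xml(xml_text):
    """Remove empty list items and divs, then delete tab characters."""
    root = ET.fromstring(xml_text)
    remove_empty(root)
    return ET.tostring(root, encoding="unicode").replace("\t", "")
```

Elements that carry only attributes (for example an empty div used as a link target) also count as empty here; a production script may want to keep those.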
