OpenBookPublishers/XML-last

XML-last

This repository contains a set of tools to convert an epub created with Adobe InDesign into a series of XML files that follow the TEI simplePrint customisation. For the conversion to work, the InDesign documents need to be formatted following a specific set of instructions (see the repo's wiki).

Files and directories in this repository

  • documents and templates: this folder contains an InDesign template. It also includes sample input files for book-, chapter- and object-level metadata
  • schemas: this folder contains the tei_simplePrint schema (also available at http://www.tei-c.org/Guidelines/Customization/index.xml) and the OBP customisation
  • LICENSE
  • README.md: this file
  • Transform-to-XML-book.xsl: this script creates a single, book-length XML TEI file by combining the documents already converted
  • Transform-to-XML-section.xsl: this is the main conversion tool; it transforms each XHTML file in the input epub into an XML TEI file
  • XML-after-transformation.py: this Python script should be run after conversion to fix some small mistakes in the XML
  • XML-before-transformation.py: this Python script must be run before conversion to set up the input and output folders correctly

Running the conversion

  1. Copy your input files to the project folder.
  2. Run 'XML-before-transformation.py' (you will need Python 3.6.2 or newer). This will:
    • unpack the epub
    • selectively copy the content of the epub to a newly created 'input' folder
    • rename the book metadata file
    • create the output folder 'XML-edition'
    • transfer images, audio and video files (if any) from the epub to the 'XML-edition' folder
  3. Run 'Transform-to-XML-section.xsl' to transform each input XHTML file into an XML TEI file. The output files will be saved to the 'XML-edition' folder. This is an XSLT 2.0 transformation, so it needs a processor such as Saxon-HE (https://sourceforge.net/projects/saxon/files/Saxon-HE/9.8/ -- note that the open source edition of Saxon does not support validation of the result documents). Saxon can be run (1) from within a product that provides a graphical user interface (such as oXygen, https://www.oxygenxml.com/), (2) from the command line or (3) from within a Java or .NET application.
    • (1) select 'Transform-to-XML-section.xsl' as both the input and the XSL source of the transformation; the output field can be left blank
    • (2) type the following, replacing each _dir_ with the appropriate path: java -jar _dir_/saxon9he.jar -s:_dir_/XML-last/Transform-to-XML-section.xsl -xsl:_dir_/XML-last/Transform-to-XML-section.xsl -o:_dir_/XML-last/Transform-to-XML-section.xsl
    • (3) see e.g. http://www.oracle.com/technetwork/java/gazfm-138953.html
  4. Run 'Transform-to-XML-book.xsl'. This second transformation uses XInclude to merge the newly created XML TEI files into a single file, which is saved to the 'XML-edition' folder as 'entire-book.xml'. It is run in the same way as the transformation in step 3.
  5. Run 'XML-after-transformation.py' to:
    • update the targets of cross-references throughout 'entire-book.xml'
    • modify relative URLs throughout
    • delete empty list items
    • delete empty <div>s
    • delete tab characters
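The set-up performed in step 2 can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the code of 'XML-before-transformation.py': the function name prepare_folders, the scratch folder 'epub-unpacked', the lists of file extensions and the decision to copy only XHTML/HTML files into 'input' are all assumptions. The renaming of the book metadata file is left out because the README does not give the target name. The sketch relies on the fact that an epub is a ZIP archive.

```python
# Hypothetical sketch of the step-2 set-up; not XML-before-transformation.py.
import shutil
import zipfile
from pathlib import Path
from typing import Tuple

# Assumed extensions: content documents for 'input', media for 'XML-edition'.
CONTENT_SUFFIXES = {".xhtml", ".html", ".htm"}
MEDIA_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".svg",
                  ".mp3", ".mp4", ".m4a", ".webm"}


def prepare_folders(epub_path: Path, project_dir: Path) -> Tuple[Path, Path]:
    """Unpack the epub, fill 'input' and create 'XML-edition'."""
    unpacked = project_dir / "epub-unpacked"  # hypothetical scratch folder
    input_dir = project_dir / "input"
    output_dir = project_dir / "XML-edition"

    # An epub is a ZIP archive, so zipfile can unpack it directly.
    with zipfile.ZipFile(str(epub_path)) as archive:
        archive.extractall(str(unpacked))

    input_dir.mkdir(exist_ok=True)
    output_dir.mkdir(exist_ok=True)

    for path in sorted(unpacked.rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix in CONTENT_SUFFIXES:
            # XHTML content documents are the input of the XSLT step.
            shutil.copy2(str(path), str(input_dir / path.name))
        elif suffix in MEDIA_SUFFIXES:
            # Images, audio and video go straight to the output folder.
            shutil.copy2(str(path), str(output_dir / path.name))
    return input_dir, output_dir
```

Files are copied flat (by file name only), which assumes no two files in the epub share a name.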
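Steps 3 and 4 can also be scripted. The hypothetical helper below only assembles and runs the Saxon-HE command line given in option (2) of step 3; the function names and arguments are illustrative, and it assumes that Java and the saxon9he.jar file are already installed.

```python
# Hypothetical wrapper around the command shown in option (2) of step 3.
import subprocess
from pathlib import Path
from typing import List


def saxon_command(saxon_jar: Path, repo_dir: Path, stylesheet: str) -> List[str]:
    """Build the java/Saxon-HE argument list for one stylesheet."""
    xsl = repo_dir / stylesheet
    # As in the README, the stylesheet is given as source (-s), stylesheet
    # (-xsl) and output (-o). In Saxon, the -o value also acts as the base
    # URI for relative xsl:result-document hrefs.
    return ["java", "-jar", str(saxon_jar),
            "-s:" + str(xsl), "-xsl:" + str(xsl), "-o:" + str(xsl)]


def run_saxon(saxon_jar: Path, repo_dir: Path, stylesheet: str) -> None:
    # check=True raises CalledProcessError if Saxon reports a failure.
    subprocess.run(saxon_command(saxon_jar, repo_dir, stylesheet), check=True)
```

Calling run_saxon first with 'Transform-to-XML-section.xsl' and then with 'Transform-to-XML-book.xsl' mirrors steps 3 and 4. Passing an argument list rather than a shell string avoids quoting problems when a _dir_ path contains spaces.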
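To illustrate what the XInclude merge in step 4 does, the sketch below resolves xi:include elements with Python's standard xml.etree.ElementInclude module rather than with XSLT. This is a swapped-in technique for illustration only; the chapter file names, their content and the in-memory loader are hypothetical.

```python
# Illustration of XInclude resolution with the standard library; not the
# mechanism used by Transform-to-XML-book.xsl.
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI = "http://www.w3.org/2001/XInclude"
TEI = "http://www.tei-c.org/ns/1.0"

# Hypothetical stand-ins for two files produced by Transform-to-XML-section.xsl.
files = {
    "chapter-1.xml": f'<div xmlns="{TEI}" xml:id="ch1"><p>One</p></div>',
    "chapter-2.xml": f'<div xmlns="{TEI}" xml:id="ch2"><p>Two</p></div>',
}

book_xml = f"""<TEI xmlns="{TEI}" xmlns:xi="{XI}">
  <text><body>
    <xi:include href="chapter-1.xml"/>
    <xi:include href="chapter-2.xml"/>
  </body></text>
</TEI>"""


def loader(href, parse, encoding=None):
    # ElementInclude calls this for each xi:include; for parse="xml" it must
    # return an Element. Here files come from the dict instead of the disk.
    return ET.fromstring(files[href])


book = ET.fromstring(book_xml)
# Replace every xi:include in place with the element it points to.
ElementInclude.include(book, loader=loader)
print(ET.tostring(book, encoding="unicode"))
```

Without a custom loader, ElementInclude reads each href from disk, which is closer to how files in the 'XML-edition' folder would be merged.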
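The clean-up in step 5 can be approximated with ElementTree. This sketch is not the logic of 'XML-after-transformation.py': the rule that turns a cross-reference such as 'chapter-2.xml#fig1' into '#fig1', the restriction to TEI <ref>, <item> and <div> elements, and the function names are all assumptions. The relative-URL rewrite is omitted because the README does not say how URLs should change.

```python
# Hypothetical sketch of the step-5 clean-up; not XML-after-transformation.py.
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
PRUNABLE = {TEI + "item", TEI + "div"}  # empty list items and empty divs


def is_empty(el: ET.Element) -> bool:
    # "Empty" here means no child elements and no non-whitespace text.
    return len(el) == 0 and not (el.text or "").strip()


def remove_keeping_tail(parent: ET.Element, child: ET.Element) -> None:
    # ElementTree stores text after an element in its .tail, so move any
    # meaningful tail to the previous sibling (or the parent) first.
    tail = child.tail or ""
    if tail.strip():
        index = list(parent).index(child)
        if index > 0:
            prev = parent[index - 1]
            prev.tail = (prev.tail or "") + tail
        else:
            parent.text = (parent.text or "") + tail
    parent.remove(child)


def clean_book(root: ET.Element) -> None:
    # 1. Cross-references: assume "chapter-2.xml#fig1" must become "#fig1"
    #    once all chapters live in one file; external links are untouched.
    for ref in root.iter(TEI + "ref"):
        target = ref.get("target", "")
        if "#" in target and not target.startswith(("http://", "https://")):
            ref.set("target", "#" + target.split("#", 1)[1])

    # 2. Empty <item>s and <div>s. Visiting elements in reverse document
    #    order handles children before parents, so a <div> that only
    #    contained empty <div>s is removed as well.
    for parent in reversed(list(root.iter())):
        for child in list(parent):
            if child.tag in PRUNABLE and is_empty(child):
                remove_keeping_tail(parent, child)

    # 3. Tab characters in text and tail content.
    for el in root.iter():
        if el.text:
            el.text = el.text.replace("\t", "")
        if el.tail:
            el.tail = el.tail.replace("\t", "")
```

The function edits the tree in place; writing it back with ET.ElementTree(root).write("entire-book.xml", encoding="utf-8") would complete the round trip.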

Further reading

Visit the repo's wiki to read about:

If you wish to extract bibliographic citations from your content after conversion, visit https://github.com/OpenBookPublishers/Extract-citations
