Arrangement (Directory Structure)

Cliff Wulfman edited this page May 1, 2015 · 3 revisions

Blue Mountain Arrangement and Directory Structure

The components of the journal object have different storage and access requirements. Master TIFF files are very large binary files that will seldom be accessed but must be carefully preserved, because they are expensive or impossible to replace. Image derivatives are also large binary files; because they can be regenerated from the master TIFFs they require less care, but they will be accessed from a variety of sources (primarily the web). PDF files are hybrids: large binary composites of image derivatives and OCR output that cannot easily be recreated, so they must be preserved more carefully than image derivatives while remaining accessible. Metadata files are relatively small but very expensive to replace, so they must be curated carefully. They are also subject to updates, so version tracking is important.

The Blue Mountain Project will manage these assets separately:

  • Non-binary data and metadata will be stored and managed in a distributed version control system (DVCS), which will enable change management, collaborative development between PUL and its METS/ALTO vendor, and resource sharing, as stipulated in the grant;
  • Master TIFF files and text-under-image PDFs will be maintained in a preservation store;
  • Image derivatives, and delivery-optimized copies of the PDFs, will be kept in an access store.

1 The Metadata Store

Metadata will be organized as a hierarchy of files and directories, like this:

- metadata/
  - periodicals/
    - bmtnID/
      - bmtnID.mets.xml
      - issues/

The issues directory will be organized by publication date, following the same convention as that used for constructing identifiers. So, for example, for bmtnabi_1859-01-05_01:

 - bmtnabi/
   - issues/
     - 1859/
       - 01/
         - 05_01/
           - bmtnabi_1859-01-05_01.mets.xml
           - alto/
             - bmtnabi_1859-01-05_01-001.alto.xml
             - bmtnabi_1859-01-05_01-002.alto.xml

and for bmtnaam_1922-03_01:

 - bmtnaam/
   - issues/
     - 1922/
       - 03_01/
         - bmtnaam_1922-03_01.mets.xml
         - alto/
           - bmtnaam_1922-03_01-001.alto.xml
           - bmtnaam_1922-03_01-002.alto.xml
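
The mapping from an issue identifier to its place in the metadata tree can be sketched in a few lines of Python. This is only an illustration, not project code: the function names (`issue_dir`, `mets_path`, `alto_path`) are hypothetical, and the identifier pattern is an assumption inferred from the two examples above (a title ID, a date with an optional day, and a two-digit issue number).

```python
import re
from pathlib import PurePosixPath

# Assumed identifier shape, inferred from the two examples above:
#   <bmtnid>_<CCYY>-<MM>[-<DD>]_<II>
ISSUE_ID = re.compile(
    r"^(?P<title>[a-z]+)_(?P<year>\d{4})-(?P<month>\d{2})"
    r"(?:-(?P<day>\d{2}))?_(?P<issue>\d{2})$"
)


def issue_dir(issue_id):
    """Return the metadata directory for an issue (hypothetical helper)."""
    m = ISSUE_ID.match(issue_id)
    if m is None:
        raise ValueError(f"unrecognized issue identifier: {issue_id!r}")
    parts = [m["title"], "issues", m["year"]]
    if m["day"]:
        # Dated to the day: CCYY/MM/DD_II
        parts += [m["month"], f"{m['day']}_{m['issue']}"]
    else:
        # Dated to the month only: CCYY/MM_II
        parts.append(f"{m['month']}_{m['issue']}")
    return PurePosixPath("metadata/periodicals", *parts)


def mets_path(issue_id):
    """Path of the issue-level METS file."""
    return issue_dir(issue_id) / f"{issue_id}.mets.xml"


def alto_path(issue_id, page):
    """Path of the ALTO file for a 1-based page number."""
    return issue_dir(issue_id) / "alto" / f"{issue_id}-{page:03d}.alto.xml"


if __name__ == "__main__":
    print(mets_path("bmtnabi_1859-01-05_01"))
    print(alto_path("bmtnaam_1922-03_01", 2))
```

Identifiers with other date granularities (for example, a year only) are not covered by the examples above, so this sketch rejects them rather than guessing at a layout.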

2 The Preservation Store

The Preservation Store will be arranged as a filesystem mirroring the structure of the metadata tree and rooted at /usr/share/BlueMountain/pstore/periodicals.

 - pstore/
   - periodicals/
     - bmtnid/
       - issues/
         - CCYY/
           - MM/
             - DD_II/
               - bmtnid_issueid.pdf
               - bmtnid_issueid_001.tif
               - bmtnid_issueid_002.tif

3 The Access Store

Like the Preservation Store, the Access Store will be arranged as a filesystem mirroring the structure of the metadata tree; it will be rooted at /usr/share/BlueMountain/astore/periodicals.

 - astore/
   - periodicals/
     - bmtnid/
       - issues/
         - CCYY/
           - MM/
             - DD_II/
               - bmtnid_issueid.pdf
               - generative/
                 - bmtnid_issueid_001.jp2
                 - bmtnid_issueid_002.jp2
                 - bmtnid_issueid_003.jp2
               - delivery/
                 - bmtnid_issueid_001.jp2
                 - bmtnid_issueid_002.jp2
                 - bmtnid_issueid_003.jp2
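
Because both binary stores mirror the metadata tree, the files expected for an issue can be enumerated from its identifier and page count. The sketch below is hypothetical (`store_dir`, `preservation_files`, and `access_files` are names invented for illustration) and rests on two assumptions: that `bmtnid_issueid` in the templates stands for the full issue identifier, such as `bmtnabi_1859-01-05_01`, and that issues dated only to the month use a `CCYY/MM_II` directory as they do in the metadata tree.

```python
from pathlib import PurePosixPath

# Store roots as given in the text.
PSTORE_ROOT = PurePosixPath("/usr/share/BlueMountain/pstore/periodicals")
ASTORE_ROOT = PurePosixPath("/usr/share/BlueMountain/astore/periodicals")


def store_dir(root, issue_id):
    """Directory for an issue under a store root (hypothetical helper)."""
    # "bmtnabi_1859-01-05_01" -> "bmtnabi", "1859-01-05", "01"
    bmtnid, date, issue_no = issue_id.split("_")
    year, *rest = date.split("-")
    # The last date component is joined with the issue number
    # (DD_II, or MM_II when the date has no day).
    *dirs, last = rest
    return root.joinpath(bmtnid, "issues", year, *dirs, f"{last}_{issue_no}")


def preservation_files(issue_id, pages):
    """The PDF plus one master TIFF per page."""
    d = store_dir(PSTORE_ROOT, issue_id)
    tiffs = [d / f"{issue_id}_{n:03d}.tif" for n in range(1, pages + 1)]
    return [d / f"{issue_id}.pdf"] + tiffs


def access_files(issue_id, pages):
    """The PDF plus generative and delivery JP2s for each page."""
    d = store_dir(ASTORE_ROOT, issue_id)
    files = [d / f"{issue_id}.pdf"]
    for sub in ("generative", "delivery"):
        files += [d / sub / f"{issue_id}_{n:03d}.jp2" for n in range(1, pages + 1)]
    return files


if __name__ == "__main__":
    for path in access_files("bmtnaam_1922-03_01", 2):
        print(path)
```

A listing like this could be compared against the actual filesystem to spot missing masters or derivatives, though the page count would have to come from elsewhere, for example from the issue's METS file.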