Skip to content

Latest commit

 

History

History
194 lines (176 loc) · 10.1 KB

TODO.md

File metadata and controls

194 lines (176 loc) · 10.1 KB

Action items

Next

Transitional support for feeds

  • Simplfied record (invenio-RDM record) to EPrint XML/EPrint JSON
    • implement libdataset for v2 dataset collection
    • convert feeds' datasets to v2 ds collections
    • CaltechDATA records are a good test case for mapping Invenio-RDM to JSON that will drive EPrints templates for feeds (v1 of feeds is EPrints centric)
    • I need to generate EPrint XML as Invenio-RDM records (simplified records) first and input into the test instance of CaltechAUTHORS as RDM based repository
    • I can then take the Invenio-RDM records returned and turn those into EPrint XMl/JSON for integration into existing feeds

This is for the simplified eprinttools codebase.

  • Make a list of the missing page types I need to render to match feeds
  • Render json types
    • Misc. JSON files
      • caltechauthors-grid.json
      • caltechdata-grid.json
      • caltechthesis-grid.json
      • directory_info.json (only found for people)
      • index.json
      • pagefind-entry.json
      • group.json
      • group_list.json
      • people.json
      • people_list.json
    • CaltechTHESIS derived
      • advisor-bachelors.json (needed in groups only)
      • advisor-combined.json (needed in groups only)
      • advisor-engd.json (needed in groups only)
      • advisor-masters.json (needed in groups only)
      • advisor-other.json (needed in groups only)
      • advisor-phd.json (needed in groups only)
      • advisor-senior_major.json (needed in groups only)
      • advisor-senior_minor.json (needed in groups only)
      • advisor.json (person/group)
      • bachelors.json (person/group)
      • engd.json (person/group)
      • masters.json (person/group)
      • phd.json (person/group)
      • senior_major.json (person/group)
      • senior_minor.json (person/group)
    • CaltechAUTHORS derived
      • combined.json (from CaltechAUTHORS), should be renamed combined_authors.json
      • pub_types.json
      • article.json
      • audiovisual.json
      • book.json
      • book_section.json
      • collection.json
      • conference_item.json
      • dataset.json
      • image.json
      • interactiveresource.json
      • model.json
      • monograph.json
      • object_types.json
      • patent.json
      • software.json
      • teaching_resource.json
      • text.json
      • thesis.json
      • video.json
      • workflow.json
    • CaltechDATA
      • combined_data.json
      • software.json
      • data.json
      • teaching_resource.json
      • data_object_types.json
      • data_pub_types.json
      • data_types.json
      • image.json
  • Render keys types
  • Render Markdown types
  • Render include types
  • Render BibTeX types
  • Render RSS types
  • Make a list of the new pages types reflecting use of simplified records

Bugs

  • ep3datasets renders EPrints JSON objects without primary_object being set
  • Cleanup eprint content for public views
    • Does sanitization happen at rendering of JSON/Markdown documents or when harvesting the content?
      • Santization happens when we render content, this lets us do one harvest for both dark and public archives
    • Cleanup email content fields
    • Remove "notes"
  • Still debugging mapping the advisor_id, thesis_id and authors_id to person_id for aggregation tables and people feed generation
    • caltechthesis record 15078 is showing up with a local group of "Scott Cushing" who is actaully a committee member not a local gorup.
  • Are messy people identifiers in EPrints are preventing a simple mapping to a single person id, when the EPRint record is read in it needs the ID should be corsswalked to the cl_people_id value.
  • If feeds are "public only" then I need to strip email addresses from the JSON objects.
  • For feeds generated as REPO_NAME-RECORD_TYPE.json to name the feed by record type only, but before I add this I need to see if there is any case where thesis in CaltechAUTHORS need to be itemized along with thesis in CaltechTHESIS
  • updated value retrieved from database isn't converting correctly into a time.Time object in Go. Need to figure the best way to make this correct
  • Aggregation group_list.json has empty "combined" mapped when there are no eprintid for the specific group in the respository
  • each index.html under people and group should have a corresponding index.json that is used by Pandoc to render index.md that then renders index.html, include.include
  • Issue 40, SQL reference document_relation_type table issues
  • Issue 41, Add related URL as DOI value (really make eprints show this as a linked field in the display, don't do that in the data structure)
  • Issue 44, Funders are coming up as "UNSPECIFIED"
  • Issue 45, Related URLs are coming in as "UNSPECIFIED"
  • Issue 47, Need to strip HTML from Abstract field
  • Issue 48, Imported EPrint doesn't show up in review buffer
    • in release 1.1.1-next datestamp isn't set, example eprintid 111912
    • I might be setting the wrong event_status (e.g. buffer or inbox)
    • I need to confirm all timestamp fields and datestamp field is being set correctly
  • Issue 49, Field defaults on import including resolver URL and collection
  • Issue 50, Verify why imported and published EPrints don't show in recent additions (is the an issue with generated views or with a datestamp not getting set correctly?).

Next

  • Add deposit info to EPrintXML output
  • ioutil is depreciated, need to update the code that uses it
  • Need a means of filtering for public EPrint records only
    • is-public end point added to ep3apid
    • ?eprint_status=... added for keys and keys by timestamp ranges
  • Add Extended API support to eputil command
  • Implement Solr index record view for Solr 8.9 ingest
  • Add update end point to support update EPrints Metadata
    • Figure out how historical diffs of EPrints XML are generated in EPrints' History tab
    • Implement updates versioning the EPrint Metadata record
    • Implement file upload and manage document versioning

Completed

  • Implement an example ep3apid Python API
  • Implement a /version end point displaying ep3apid version number
  • Create an example service file for running ep3apid as a service under SystemD (Linux)
  • Create an example service file for running ep3apid as a service under LaunchD (macOS)
  • Need a Users end point to get a list of users in the system and retrieve their numeric user id
  • the various related tables that represent item lists don't have the same row count so I need to explicitly query for eprintid, pos or do JOIN and handle the NULL column cases.
  • Fix lemurprints-import-api-16 through 21 examples, re-import with ./bin/doi2eprintxml tool
  • Add script to generate "lemurprints" database with support for all fields present across our repositories so I can do robust testing and generate appropriate testdata
    • Include all fields and tables in caltechauthors
    • Include all fields and tables in caltechthesis
    • Include all fields and tables in caltechconf
    • Include all fields and tables in caltechcampuspubs
    • Include all fields and tables in calteches
    • Include all fields and tabels in caltechoh
    • Include all fields and tabels in caltechln
    • Exported selected records from production, sanitize them and write import test against lemurprints test database
    • Fetch DOI of records found in EPrints use them to test in lemurprints
  • Add create end points to support importing EPrint XML metadata into eprints
    • Implement SQLReadEPrint
    • Implement SQLCreateEPrint
    • Implement ImportEPrint for importing EPrint XML metadata
    • Implement a method that takes a table/column map and EPrint structure then renders a INSERT or REPLACE sequence to create or update an EPrint record
    • Implement a method that takes a table/column map and EPrint structure and update the EPrint structure from a sequnce of SELECT statements
  • Split clsrules into separate options to allow for more specific control
  • Add end point for /{REPO_ID}/year (list years that have eprint records with a "published" date type)
  • Add end point for /{REPO_ID}/year/{YEAR} lists eprint records published in that year
  • Implement a method to show which tables a repository instance has and the column names in each table
    • Implement a startup data structure that captures the /repository/ end point data so that table/column map can be used to build the SQL queries need to read, create, and update an EPrint record
    • Implement /repository/<REPO_ID> end point with map[string][]string{} output
  • doi2eprintxml list of DOI should allow for pipe separator and URL to object and handle it like Acacia does
  • doi2eprintxml needs to fetch the object URL and save results along side the generated EPrints XML
    • added with a -D,-download option in doi2eprintxml.
  • Added created (datestamp) end point for feeds
  • Implement Simplified JSON record based on
  • Add simplified JSON output option to
    • eputil
    • epfmt
    • doi2eprintxml

Someday, Maybe

  • Add end point to recreate Person A-Z list
  • Add end point for subjects
  • Add end point for events (Conferences)
  • Add end point for collection
  • Add end point for publication
  • Add end point for place_of_pub
  • Add end point for issn
  • Add end point for Person (Person IDs)
  • Add end point for Authors (creators)
  • Add end point for Editors
  • Add end point for contributors
  • Add end point for types
  • Add end point for corp_creators
  • Add end point ofr issuing_body