<!DOCTYPE html>
<html>
<head>
<title>eprinttools - EPrints-extended-API.html</title>
<link href='https://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="/css/site.css">
</head>
<body>
<header>
<a href="http://library.caltech.edu" title="link to Caltech Library Homepage"><img src="/assets/liblogo.gif" alt="Caltech Library logo"></a>
</header>
<nav>
<ul>
<li><a href="/">Home</a></li>
<li><a href="README.html">README</a></li>
<li><a href="LICENSE">LICENSE</a></li>
<li><a href="install.html">INSTALL</a></li>
<li><a href="user-manual.html">User Manual</a></li>
<li><a href="search.html">Search Docs</a></li>
<li><a href="about.html">About</a></li>
<li><a href="https://github.com/caltechlibrary/eprinttools">GitHub</a></li>
</ul>
</nav>
<section>
<h1 id="eprints-extended-api">EPrints extended API</h1>
<p>The EPrints software package from the University of Southampton
provides a rich internal Perl API along with a RESTful web API. Caltech
Library has used the latter extensively to facilitate content reuse
across campus for our various EPrints repositories. The challenge now is
to move beyond its present limitations (see priorities two and three of
the <a
href="https://caltechlibrary.atlassian.net/wiki/spaces/ADMIN/pages/2500493313/AY22+Library-Wide+Strategic+Plan+Objectives">Caltech
Library’s AY22 strategic plan</a>).</p>
<p>Extending EPrints directly is error prone and cumbersome. Even when
features are implemented safely in Perl, modifying EPrints directly is
only the start of our trouble. In contrast, EPrints’ MySQL database
structure has proven to be durable and predictable. By leveraging MySQL
directly, an extended API can move beyond our current constraints.</p>
<p>What should an extended web API look like?</p>
<h2 id="design-considerations">Design considerations</h2>
<ul>
<li>The extended API should be web accessible to support data platforms
such as feeds.library.caltech.edu as well as our growing cast of
applications hosted on apps.library.caltech.edu</li>
<li>It needs to interact with EPrints’ MySQL databases safely, e.g. be
read only</li>
<li>Minimize the load on EPrints’ MySQL databases, e.g. favor simple SQL
queries, perhaps limiting them to single table scans</li>
<li>Require near zero management: it should run as a service that
doesn’t require ongoing intervention and integrates easily into DLD’s
monitoring infrastructure</li>
</ul>
<p>An extended API should provide a limited web service that maps URL
end points to simple MySQL queries run against the various EPrints
databases. The service should be easy to implement and require minimal
resources, e.g. one prepared SQL statement per end point.</p>
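<p>As a minimal sketch of the “one prepared SQL statement per end point”
idea, the Python below keeps a table of end point names and
parameterized <code>SELECT</code> statements, refuses to start if any
registered statement is not a <code>SELECT</code>, and returns only the
first column of each row. It uses the standard library’s
<code>sqlite3</code> module as a stand-in for MySQL. The table and
column names (<code>eprint.doi</code>,
<code>eprint_creators_id.creators_id</code>) and the helpers
<code>run_end_point</code> and <code>make_demo_db</code> are assumptions
for illustration, not a verified EPrints schema or an existing API.</p>
<pre><code>import sqlite3

# Hypothetical end point table: each URL end point maps to exactly one
# parameterized SELECT. Table and column names are assumptions.
# sqlite3 uses "?" placeholders; MySQL drivers typically use "%s".
END_POINTS = {
    "doi": "SELECT eprintid FROM eprint WHERE doi = ? ORDER BY eprintid",
    "creator-id": (
        "SELECT DISTINCT eprintid FROM eprint_creators_id "
        "WHERE creators_id = ? ORDER BY eprintid"
    ),
}

# Cheap start-up check toward the read-only goal. A MySQL account that
# only has SELECT privileges would be the real safeguard.
for name, sql in END_POINTS.items():
    if not sql.lstrip().upper().startswith("SELECT"):
        raise RuntimeError(f"end point {name!r} is not a SELECT")


def run_end_point(conn, end_point, *params):
    """Run the single statement registered for end_point; return EPrint IDs."""
    sql = END_POINTS.get(end_point)
    if sql is None:
        raise ValueError(f"unknown end point: {end_point!r}")
    # Values are bound by the driver, never spliced into the SQL text.
    return [row[0] for row in conn.execute(sql, params)]


def make_demo_db():
    """Build a tiny in-memory stand-in for two EPrints tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE eprint (eprintid INTEGER PRIMARY KEY, doi TEXT)")
    conn.execute(
        "CREATE TABLE eprint_creators_id "
        "(eprintid INTEGER, pos INTEGER, creators_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO eprint VALUES (?, ?)",
        [(1, "10.1000/abc"), (2, "10.1000/xyz"), (3, "10.1000/abc")],
    )
    conn.executemany(
        "INSERT INTO eprint_creators_id VALUES (?, ?, ?)",
        [(1, 0, "Doe-J"), (2, 0, "Roe-R"), (2, 1, "Doe-J")],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(run_end_point(conn, "doi", "10.1000/abc"))   # [1, 3]
    print(run_end_point(conn, "creator-id", "Doe-J"))  # [1, 2]
</code></pre>
<p>Keeping the statements in one table makes the whole query surface of
the service reviewable at a glance, which suits the goal of minimal,
predictable load on the database.</p>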
<p>Security and privacy should be front and center when implementing any
web service. By returning only EPrint IDs we limit the risk of exposing
inappropriate metadata (e.g. author information). An EPrint ID is an
integer without specific meaning; on its own it does not give you access
to sensitive information.</p>
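<p>One way to enforce the “EPrint IDs only” rule is at the point where a
response body is written. In the hypothetical helper below every value
is coerced to an integer, so a stray metadata string such as an author
name raises an error instead of being sent. De-duplicating and sorting
the IDs is an assumed design choice, since a name scan could match the
same record more than once.</p>
<pre><code>import json


def ids_to_json(rows):
    """Serialize query rows to a JSON array that contains only EPrint IDs.

    Each row is assumed to be a tuple whose first column is the eprintid.
    int() raises ValueError for non-numeric strings such as author names,
    so such values cannot end up in the response body.
    """
    ids = sorted({int(row[0]) for row in rows})  # de-duplicate, then order
    return json.dumps(ids)


if __name__ == "__main__":
    print(ids_to_json([(3,), (1,), (3,)]))  # [1, 3]
</code></pre>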
<h2 id="unique-ids-to-eprint-ids">Unique IDs to EPrint IDs</h2>
<p>The following URL end points are intended to take one unique
identifier and map it to one or more EPrint IDs. This is possible
because each targeted unique ID can be found by querying a single table
in EPrints. In addition, each scan can return complete results: EPrint
IDs are integers, so even the full list of EPrint IDs in any of our
repositories is small enough to return in a single HTTP response.</p>
<ul>
<li><code>/&lt;REPO_ID&gt;/doi/&lt;DOI&gt;</code> with the adoption of
the “doi” field in the EPrint table, it makes sense to have a quick
translation of a DOI to an EPrint ID for a given EPrints repository.</li>
<li><code>/&lt;REPO_ID&gt;/creator-id/&lt;CREATOR_ID&gt;</code> scans
the ID field associated with creators and returns a list of EPrint
IDs</li>
<li><code>/&lt;REPO_ID&gt;/creator-orcid/&lt;ORCID&gt;</code> scans the
“orcid” field associated with creators and returns a list of EPrint
IDs</li>
<li><code>/&lt;REPO_ID&gt;/editor-id/&lt;EDITOR_ID&gt;</code> scans the
ID field associated with editors and returns a list of EPrint IDs</li>
<li><code>/&lt;REPO_ID&gt;/contributor-id/&lt;CONTRIBUTOR_ID&gt;</code>
scans the “id” field associated with contributors and returns a list of
EPrint IDs</li>
<li><code>/&lt;REPO_ID&gt;/advisor-id/&lt;ADVISOR_ID&gt;</code> scans
the ID field associated with advisors and returns a list of EPrint
IDs</li>
<li><code>/&lt;REPO_ID&gt;/committee-id/&lt;COMMITTEE_ID&gt;</code>
scans the ID field associated with committee members and returns a list
of EPrint IDs</li>
<li><code>/&lt;REPO_ID&gt;/group-id/&lt;GROUP_ID&gt;</code> scans the
group ID field and returns a list of EPrint IDs associated with the
group</li>
<li><code>/&lt;REPO_ID&gt;/grant-number/&lt;GRANT_NUMBER&gt;</code>
returns a list of EPrint IDs associated with the grant number</li>
<li><code>/&lt;REPO_ID&gt;/creator-name/&lt;FAMILY_NAME&gt;/&lt;GIVEN_NAME&gt;</code>
scans the family and given name fields associated with creators and
returns a list of EPrint IDs</li>
<li><code>/&lt;REPO_ID&gt;/editor-name/&lt;FAMILY_NAME&gt;/&lt;GIVEN_NAME&gt;</code>
scans the family and given name fields associated with editors and
returns a list of EPrint IDs</li>
<li><code>/&lt;REPO_ID&gt;/contributor-name/&lt;FAMILY_NAME&gt;/&lt;GIVEN_NAME&gt;</code>
scans the family and given name fields associated with contributors and
returns a list of EPrint IDs</li>
<li><code>/&lt;REPO_ID&gt;/advisor-name/&lt;FAMILY_NAME&gt;/&lt;GIVEN_NAME&gt;</code>
scans the family and given name fields associated with advisors and
returns a list of EPrint IDs</li>
<li><code>/&lt;REPO_ID&gt;/committee-name/&lt;FAMILY_NAME&gt;/&lt;GIVEN_NAME&gt;</code>
scans the family and given name fields associated with committee members
and returns a list of EPrint IDs</li>
<li><code>/&lt;REPO_ID&gt;/pubmed/&lt;PUBMED_ID&gt;</code> returns a
list of EPrint IDs associated with the PubMed ID</li>
<li><code>/&lt;REPO_ID&gt;/issn/&lt;ISSN&gt;</code> returns a list of
EPrint IDs associated with the ISSN</li>
<li><code>/&lt;REPO_ID&gt;/isbn/&lt;ISBN&gt;</code> returns a list of
EPrint IDs associated with the ISBN</li>
<li><code>/&lt;REPO_ID&gt;/patent-number/&lt;PATENT_NUMBER&gt;</code>
returns a list of EPrint IDs associated with the patent number</li>
</ul>
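<p>All of these end points share one shape,
<code>/REPO_ID/END_POINT/ARG...</code>, so a single parser could route
them. The sketch below is one way to do that under two assumptions not
stated above: name end points take exactly two path segments (family
name, then given name), and the <code>doi</code> end point consumes the
rest of the path, because DOIs usually contain a slash (e.g.
<code>10.1000/xyz123</code>). Segments are percent-decoded after
splitting, so an encoded <code>%2F</code> inside a name does not create
an extra segment. The <code>ARITY</code> table, <code>parse_path</code>,
and the repository IDs <code>authors</code> and <code>thesis</code> are
hypothetical.</p>
<pre><code>from urllib.parse import unquote

# Hypothetical routing table: end point name -> number of path arguments.
# None means "the rest of the path is one argument" (DOIs contain "/").
ARITY = {
    "doi": None,
    "creator-id": 1, "creator-orcid": 1, "editor-id": 1,
    "contributor-id": 1, "advisor-id": 1, "committee-id": 1,
    "group-id": 1, "grant-number": 1, "pubmed": 1,
    "issn": 1, "isbn": 1, "patent-number": 1,
    "creator-name": 2, "editor-name": 2, "contributor-name": 2,
    "advisor-name": 2, "committee-name": 2,
}


def parse_path(path):
    """Split "/REPO_ID/END_POINT/ARG..." into (repo_id, end_point, args)."""
    repo_id, _, rest = path.strip("/").partition("/")
    end_point, _, arg_text = rest.partition("/")
    if end_point not in ARITY:
        raise ValueError(f"unknown end point: {end_point!r}")
    arity = ARITY[end_point]
    if arity is None:
        args = [unquote(arg_text)]  # keep embedded slashes intact
    else:
        args = [unquote(a) for a in arg_text.split("/")]
        if len(args) != arity:
            raise ValueError(f"{end_point} expects {arity} argument(s)")
    if not repo_id or not all(args):
        raise ValueError(f"incomplete path: {path!r}")
    return repo_id, end_point, args


if __name__ == "__main__":
    print(parse_path("/authors/doi/10.1000/xyz123"))
    # ('authors', 'doi', ['10.1000/xyz123'])
    print(parse_path("/thesis/advisor-name/Doe/Jane%20Q."))
    # ('thesis', 'advisor-name', ['Doe', 'Jane Q.'])
</code></pre>
<p>The parsed end point name can then select the one prepared statement
for that end point, with <code>args</code> bound as its parameters.</p>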
<h2 id="change-events">Change Events</h2>
<p>The following API end points would facilitate faster updates to our
feeds platform as well as allow us to create a separate public view of
our EPrints repository content.</p>
<ul>
<li><code>/&lt;REPO_ID&gt;/updated/&lt;TIMESTAMP&gt;/&lt;TIMESTAMP&gt;</code>
returns a list of EPrint IDs updated from the first timestamp through
and including the second timestamp (timestamps should have at least
minute resolution, e.g. “YYYY-MM-DD HH:MM:SS”); if the second timestamp
is omitted it is assumed to be “now”</li>
<li><code>/&lt;REPO_ID&gt;/deleted/&lt;TIMESTAMP&gt;/&lt;TIMESTAMP&gt;</code>
returns a list of EPrint IDs deleted from the first timestamp through
and including the second timestamp; if the second timestamp is omitted
it is assumed to be “now”</li>
<li><code>/&lt;REPO_ID&gt;/pubdate/&lt;APPROX_DATESTAMP&gt;/&lt;APPROX_DATESTAMP&gt;</code>
scans the EPrint table for records with publication dates from the
first approximate date through and including the second approximate
date. If the second date is omitted it is assumed to be “today”.
Approximate dates may be expressed as just the year (starting with
Jan 1, ending with Dec 31), the year and month (starting with the first
day of the month, ending with the last day) or the year, month and day.
The end point returns zero or more EPrint IDs.</li>
</ul>
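<p>The range rules above are precise enough to sketch. The hypothetical
helpers below parse a timestamp (accepting the seconds as optional, an
assumption based on the minute resolution mentioned above), fill in
“now” or “today” when the second argument is missing, and expand an
approximate date to its first and last day, using
<code>calendar.monthrange</code> for month lengths so that leap years
come out right. In a URL the space inside a timestamp would need to be
percent-encoded as <code>%20</code>.</p>
<pre><code>import calendar
from datetime import date, datetime


def parse_timestamp(text):
    """Parse "YYYY-MM-DD HH:MM:SS" or "YYYY-MM-DD HH:MM" into a datetime."""
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"bad timestamp: {text!r}")


def timestamp_range(start, end=None, now=None):
    """Inclusive (first, last) datetimes; a missing end means "now"."""
    first = parse_timestamp(start)
    last = parse_timestamp(end) if end else (now or datetime.now())
    if first > last:
        raise ValueError("first timestamp is after the second")
    return first, last


def approx_date_range(text):
    """Expand "YYYY", "YYYY-MM" or "YYYY-MM-DD" to (first_day, last_day)."""
    parts = [int(p) for p in text.split("-")]
    if len(parts) == 1:
        year = parts[0]
        return date(year, 1, 1), date(year, 12, 31)
    if len(parts) == 2:
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]  # handles leap years
        return date(year, month, 1), date(year, month, last_day)
    if len(parts) == 3:
        day = date(*parts)
        return day, day
    raise ValueError(f"bad approximate date: {text!r}")


def pubdate_range(start, end=None, today=None):
    """Inclusive (first, last) dates; a missing end means "today"."""
    first = approx_date_range(start)[0]
    last = approx_date_range(end)[1] if end else (today or date.today())
    if first > last:
        raise ValueError("first date is after the second")
    return first, last


if __name__ == "__main__":
    print(approx_date_range("2020-02"))  # leap year: ends on Feb 29
    print(pubdate_range("2019", "2020-02"))
</code></pre>
<p>The <code>now</code> and <code>today</code> parameters exist only so
the defaults can be pinned in tests; a service would normally leave
them unset.</p>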
<h2 id="nice-to-have-end-points">Nice to have end points</h2>
<p>The following end points would be nice to have, but they would
require either customization of our existing EPrints deployments or
significant work on the part of our Library staff to populate.</p>
<ul>
<li><code>/&lt;REPO_ID&gt;/editor-orcid/&lt;ORCID&gt;</code> scans the
“orcid” field associated with editors and returns a list of EPrint
IDs</li>
<li><code>/&lt;REPO_ID&gt;/contributor-orcid/&lt;ORCID&gt;</code> scans
the “orcid” field associated with contributors and returns a list of
EPrint IDs</li>
<li><code>/&lt;REPO_ID&gt;/advisor-orcid/&lt;ORCID&gt;</code> scans the
“orcid” field associated with advisors and returns a list of EPrint
IDs</li>
<li><code>/&lt;REPO_ID&gt;/committee-orcid/&lt;ORCID&gt;</code> scans
the “orcid” field associated with committee members and returns a list
of EPrint IDs</li>
<li><code>/&lt;REPO_ID&gt;/group-ror/&lt;ROR&gt;</code> scans the
local group ROR related fields and returns a list of EPrint IDs</li>
<li><code>/&lt;REPO_ID&gt;/funder-ror/&lt;FUNDER_ROR&gt;</code> returns
a list of EPrint IDs associated with the funder’s ROR</li>
</ul>
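<p>Several of these end points, along with <code>creator-orcid</code>,
take an ORCID iD. ORCID documents that the last character of an iD is a
check character computed with ISO 7064 MOD 11-2, so an extended API
could reject a malformed iD before spending a database scan on it. This
validation step is a suggestion rather than part of the design above;
the function names are hypothetical, and the example uses ORCID’s
published sample iD <code>0000-0002-1825-0097</code>.</p>
<pre><code>import re

# Four groups of four characters; only the very last one may be "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_char(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 ORCID digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True if orcid is well formed and its last character checks out."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))  # True
    print(is_valid_orcid("0000-0002-1825-0098"))  # False
</code></pre>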
<p>EPrints XML is complex and hard to work with. A simplified data
structure could make working with our repository data much easier. If
user/role restrictions were enforced in an extended EPrints API, it
could provide a clean JSON expression of a more general bibliographic
record. It could also provide JSON documents suitable for direct ingest
into the Solr or Lunr search engines. At that stage it might also be
desirable to allow updates to existing EPrints records via the extended
API.</p>
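<p>To make the idea of a simplified record concrete, the sketch below
defines a hypothetical record with a handful of illustrative fields
(they are not drawn from the EPrints schema) and derives two views from
it: nested JSON for general reuse, and a flat dictionary whose values
are strings or lists of strings, a shape that maps directly onto
search-index fields. Which fields to include, and how user/role
restrictions would filter them, is left open here.</p>
<pre><code>import json
from dataclasses import asdict, dataclass, field


@dataclass
class Person:
    family: str
    given: str = ""
    orcid: str = ""


@dataclass
class Record:
    """Hypothetical simplified bibliographic record (illustrative fields)."""
    eprint_id: int
    title: str
    creators: list[Person] = field(default_factory=list)
    doi: str = ""
    pub_date: str = ""

    def to_json(self):
        """Nested JSON; asdict() also converts the Person entries."""
        return json.dumps(asdict(self), sort_keys=True)

    def to_search_doc(self):
        """Flat dict: every value is a string or a list of strings."""
        return {
            "id": str(self.eprint_id),
            "title": self.title,
            "creator_names": [
                ", ".join(part for part in (p.family, p.given) if part)
                for p in self.creators
            ],
            "doi": self.doi,
            "pub_date": self.pub_date,
        }


if __name__ == "__main__":
    record = Record(1, "An example title", [Person("Doe", "Jane"), Person("Roe")])
    print(record.to_search_doc()["creator_names"])  # ['Doe, Jane', 'Roe']
</code></pre>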
</section>
<footer>
<span>© 2021 <a href="https://www.library.caltech.edu/copyright">Caltech Library</a></span>
<address>1200 E California Blvd, Mail Code 1-32, Pasadena, CA 91125-3200</address>
<span><a href="mailto:[email protected]">Email Us</a></span>
<span>Phone: <a href="tel:+1-626-395-3405">(626)395-3405</a></span>
</footer>
<!-- START: PrettyFi from https://github.com/google/code-prettify -->
<script>
/* We want to add the class "prettyprint" to all the pre elements */
var pre_list = document.querySelectorAll("pre");
pre_list.forEach(function(elem) {
elem.classList.add("prettyprint");
elem.classList.add("linenums");/**/
elem.classList.add("json"); /**/
});
</script>
<style>
li.L0, li.L1, li.L2, li.L3, li.L4, li.L5, li.L6, li.L7, li.L8, li.L9
{
color: #555;
list-style-type: decimal;
}
</style>
<link rel="stylesheet" type="text/css" href="/css/prettify.css">
<script src="https://cdn.jsdelivr.net/gh/google/code-prettify@master/loader/run_prettify.js"></script>
<!-- END: PrettyFi from https://github.com/google/code-prettify -->
</body>
</html>