Openalex fetch example #61
@bkampe thanks for this. I haven't tested it yet, but I briefly reviewed the code. There is one small comment about the .gitignore file, and one more about the SPARQL API based approach. Ivan Mrsulja got the SPARQL API based approach working for the DSpace ETL (#63): the main script file takes a parameter whose value can be tdb or sparql. I am wondering whether that approach could be copied in this PR as well.
/example-scripts/bash-scripts/full-harvest-examples/1.13-1.15-examples/example-openalex/logs/
/example-scripts/bash-scripts/full-harvest-examples/1.13-1.15-examples/example-openalex/data/
/example-scripts/bash-scripts/full-harvest-examples/1.13-1.15-examples/example-openalex/previous-harvest/
I was thinking that lines 3-7 already cover this case?
I think that adding:
**/data
**/logs
**/previous-harvest
to the root-level .gitignore will solve this issue.
LGTM! Please check out my comments, I believe they can be helpful.
//log.trace("Adding record: " + fixedkey + "_" + recID);
//log.trace("data: "+ sb.toString());
//log.info("rhOutput: "+ this.rhOutput);
//log.info("recID: "+recID);
You can probably remove these commented-out lines (and the other commented-out snippets elsewhere); it will clean up the code slightly.
sb.append(" <");
sb.append(SpecialEntities.xmlEncode(field));
sb.append(">");

// insert field value
sb.append(SpecialEntities.xmlEncode(val.toString().trim()));

// Field END
sb.append("</");
sb.append(SpecialEntities.xmlEncode(field));
sb.append(">\n");
return sb.toString();
Can these appends be chained using StringBuilder's default builder pattern?
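As a rough illustration of the chaining asked about here: every `StringBuilder.append` overload returns the builder itself, so the calls can be written as one expression. This is only a sketch. The `xmlEncode` method below is a simplified stand-in for `SpecialEntities.xmlEncode` (whose exact behavior is not shown in this PR), and the class name is made up for the example.

```java
class TagBuilderSketch {

    // Hypothetical stand-in for SpecialEntities.xmlEncode: escapes only the
    // five predefined XML entities. The real helper may do more.
    static String xmlEncode(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Same output as the sequence of separate append calls in the diff,
    // written as a single chained expression.
    static String getTagName(String field, Object val) {
        String tag = xmlEncode(field); // encode the field name once, reuse it
        return new StringBuilder()
                .append(" <").append(tag).append(">")      // field start
                .append(xmlEncode(val.toString().trim()))  // field value
                .append("</").append(tag).append(">\n")    // field end
                .toString();
    }

    public static void main(String[] args) {
        System.out.print(getTagName("title", " Concrete & steel "));
    }
}
```

Encoding the field name once and reusing it is a small extra change beyond the chaining itself; it can be dropped if the two calls should stay separate.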
.replaceAll(" |/", "_")
.replaceAll("\\(|\\)", "")
.replaceAll("/", "_");
I think you can use replaceAll("[ /]", "_").replaceAll("[()]", "") to make this clearer.
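To check that the suggested character classes behave like the alternations in the diff, a small comparison can help. The sketch below is not code from the PR; the class name and the sample keys are made up. It also shows that the third call, `replaceAll("/", "_")`, has nothing left to replace, because the first call already turned every slash into an underscore.

```java
import java.util.List;

class KeyNormalizerSketch {

    // Form used in the diff: alternations plus a third call for "/".
    static String original(String key) {
        return key.replaceAll(" |/", "_")
                  .replaceAll("\\(|\\)", "")
                  .replaceAll("/", "_");
    }

    // Form suggested in the review: character classes, two calls.
    static String suggested(String key) {
        return key.replaceAll("[ /]", "_")
                  .replaceAll("[()]", "");
    }

    public static void main(String[] args) {
        // Made-up sample keys, for illustration only.
        List<String> samples = List.of(
                "primary location/source (id)",
                "a/b/c",
                "(nested (parens))",
                "plain_key");
        for (String s : samples) {
            System.out.println(s + " -> " + suggested(s)
                    + " (same as original: " + original(s).equals(suggested(s)) + ")");
        }
    }
}
```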
.replaceAll(" |/", "_")
.replaceAll("\\(|\\)", "")
.replaceAll("/", "_");
if (!Character.isDigit(fixedkey.charAt(0)) && !fixedkey.equals("abstract_inverted_index")) {
Can fixedkey ever be null? If yes, then I think there should be a null-check for that edge case.
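One possible shape for such a guard is sketched below. Besides null, an empty key is worth covering, since `charAt(0)` on an empty string throws `StringIndexOutOfBoundsException`. The helper name `isUsableKey` is hypothetical, and treating null or empty keys as "skip" is an assumption; the PR may want to handle them differently.

```java
class KeyGuardSketch {

    // Hypothetical helper: mirrors the condition in the diff, but returns
    // false for null or empty keys instead of throwing.
    static boolean isUsableKey(String fixedkey) {
        if (fixedkey == null || fixedkey.isEmpty()) {
            return false; // assumption: such keys are simply skipped
        }
        return !Character.isDigit(fixedkey.charAt(0))
                // constant-first equals is null-safe on its own
                && !"abstract_inverted_index".equals(fixedkey);
    }

    public static void main(String[] args) {
        System.out.println(isUsableKey(null));
        System.out.println(isUsableKey("display_name"));
    }
}
```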
}

public String getTagName(String field, Object val) {
    StringBuffer sb = new StringBuffer();
Why is StringBuffer used here? If this class will not be used in a multithreaded environment, I think we should switch to StringBuilder everywhere: it offers the same API without synchronization, so it is generally faster.
Added an example of fetching publication metadata from OpenAlex based on the JSON fetch.
Three example queries are available in example-scripts/bash-scripts/full-harvest-examples/1.13-1.15-examples/example-openalex/openAlexfetch.config.xml. Choose one for a first test or modify it to fit your needs.
OpenAlex fetch is already used in the Research Atlas: https://forschungsatlas.fid-bau.de/research
The namespace in example-scripts/bash-scripts/full-harvest-examples/1.13-1.15-examples/example-openalex/openalex-to-vivo.datamap.xsl needs to be adjusted according to your settings in runtime.properties:
<xsl:variable name = "baseURI">https://forschungsatlas.fid-bau.de/individual/</xsl:variable>
JSONFetch.java was extended to handle nested objects. Some filtering of unwanted characters was also added to avoid problems with the XSLTranslator (javax.xml.transform).
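The description does not say which characters are filtered. As one possible illustration, assuming the trouble comes from characters that are not allowed in XML 1.0 documents (most control characters and unpaired surrogates), a filter could look like the sketch below. The class and method names are hypothetical and this is not the code in JSONFetch.java.

```java
import java.util.regex.Pattern;

class XmlCharFilterSketch {

    // Matches anything outside the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    private static final Pattern INVALID_XML_10 = Pattern.compile(
            "[^\\u0009\\u000A\\u000D\\u0020-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Hypothetical helper: drops characters an XML 1.0 parser would reject.
    static String stripInvalidXmlChars(String s) {
        return INVALID_XML_10.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // NUL and backspace are removed; the tab is kept.
        String cleaned = stripInvalidXmlChars("a\0b\bc\td");
        System.out.println(cleaned.length());
    }
}
```

Markup characters such as `<` and `&` are legal in XML as long as they are escaped, so they are left to the encoding step rather than removed here.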
Closes #56