Indexing issue: illegal character in path #3694

Open
henning-gerhardt opened this issue May 26, 2020 · 8 comments
Labels
migration (migration from previous Kitodo versions), question, search (search, filter)

Comments

@henning-gerhardt
Collaborator

After migrating the existing metadata files to the new format with the provided transformation file and starting to index all the data, this error appears in the catalina.out file:

Exception in thread "Indexing 0 of type PROCESS" java.lang.IllegalArgumentException: Illegal character in path at index 9: file://./[alldeba_266928358_0001_tif/00000001.tif
        at org.kitodo.dataformat.access.FLocatXmlElementAccess.getAndRepairUri(FLocatXmlElementAccess.java:82)
        at org.kitodo.dataformat.access.FLocatXmlElementAccess.<init>(FLocatXmlElementAccess.java:65)
        at org.kitodo.dataformat.access.FileXmlElementAccess.<init>(FileXmlElementAccess.java:81)
        at org.kitodo.dataformat.access.MetsXmlElementAccess.readMeadiaUnitsTreeRecursive(MetsXmlElementAccess.java:157)
        at org.kitodo.dataformat.access.MetsXmlElementAccess.<init>(MetsXmlElementAccess.java:135)
        at org.kitodo.dataformat.access.MetsXmlElementAccess.read(MetsXmlElementAccess.java:194)
        at org.kitodo.production.services.dataformat.MetsService.loadWorkpiece(MetsService.java:105)
        at org.kitodo.production.services.dataformat.MetsService.getBaseType(MetsService.java:84)
        at org.kitodo.production.services.data.ProcessService.getBaseType(ProcessService.java:1702)
        at org.kitodo.production.services.data.ProcessService.addAllObjectsToIndex(ProcessService.java:246)
        at org.kitodo.production.helper.IndexWorker.indexObjects(IndexWorker.java:116)
        at org.kitodo.production.helper.IndexWorker.indexChunks(IndexWorker.java:110)
        at org.kitodo.production.helper.IndexWorker.run(IndexWorker.java:78)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.URISyntaxException: Illegal character in path at index 9: file://./[alldeba_266928358_0001_tif/00000001.tif
        at java.net.URI$Parser.fail(URI.java:2848)
        at java.net.URI$Parser.checkChars(URI.java:3021)
        at java.net.URI$Parser.parseHierarchical(URI.java:3105)
        at java.net.URI$Parser.parse(URI.java:3053)
        at java.net.URI.<init>(URI.java:588)
        at org.kitodo.dataformat.access.FLocatXmlElementAccess.getAndRepairUri(FLocatXmlElementAccess.java:71)
        ... 13 more

An excerpt from the metadata file of this process:

...
  <mets:fileSec>
    <mets:fileGrp USE="LOCAL">
      <mets:file ID="FILE_0000" MIMETYPE="image/tiff">
        <mets:FLocat xmlns:xlink="http://www.w3.org/1999/xlink" LOCTYPE="URL" xlink:href="file://./[alldeba_266928358_0001_tif/00000001.tif"/>
      </mets:file>
      <mets:file ID="FILE_0001" MIMETYPE="image/tiff">
        <mets:FLocat xmlns:xlink="http://www.w3.org/1999/xlink" LOCTYPE="URL" xlink:href="file://./[alldeba_266928358_0001_tif/00000002.tif"/>
      </mets:file>
...
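
The failure can be reproduced outside the application with `java.net.URI` alone. The following is only a minimal sketch: the class and method names are hypothetical and not part of Kitodo. It parses the `xlink:href` value from the excerpt above and reports which character the parser rejects.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; not part of the Kitodo code base.
class IllegalPathCharacterDemo {

    /**
     * Returns a short description of the first character that java.net.URI
     * rejects, or null if the string parses as a URI.
     */
    static String describeFirstProblem(String href) {
        try {
            new URI(href);
            return null;
        } catch (URISyntaxException e) {
            int index = e.getIndex(); // -1 if the position is unknown
            String offending = (index >= 0 && index < href.length())
                    ? "'" + href.charAt(index) + "'"
                    : "unknown";
            return e.getReason() + " at index " + index + ": " + offending;
        }
    }

    public static void main(String[] args) {
        String href = "file://./[alldeba_266928358_0001_tif/00000001.tif";
        System.out.println(describeFirstProblem(href));
    }
}
```

For the href above this reports `Illegal character in path at index 9: '['`, matching the stack trace. The parser stops at the first rejected character, so later problems in the same path are not reported.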

I don't know how this error affects the indexing operation. Should this be fixed outside of the application, or should the application handle it?

@matthias-ronge
Collaborator

The METS file cannot be read. This is another job for org.kitodo.dataformat.access.FLocatXmlElementAccess.getAndRepairUri(FileType file).
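
As an illustration of what such a repair could look like, here is a sketch that percent-encodes every character outside a conservative allow-list before the string is handed to `java.net.URI`. This is not the actual `getAndRepairUri` implementation; the class name, the method name, and the allow-list are assumptions. It also assumes that any `%` already in the string starts a valid escape sequence.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; NOT Kitodo's getAndRepairUri implementation.
class HrefPercentEncoder {

    // ASSUMPTION: conservative allow-list of punctuation kept as-is.
    // '%' is kept on the assumption that it already starts a valid escape.
    private static final String SAFE_PUNCTUATION = "-._~/:%";

    static boolean isSafe(int codePoint) {
        return (codePoint >= 'a' && codePoint <= 'z')
                || (codePoint >= 'A' && codePoint <= 'Z')
                || (codePoint >= '0' && codePoint <= '9')
                || SAFE_PUNCTUATION.indexOf(codePoint) >= 0;
    }

    /** Percent-encodes the UTF-8 bytes of every character not on the allow-list. */
    static String percentEncode(String raw) {
        StringBuilder out = new StringBuilder();
        raw.codePoints().forEach(cp -> {
            if (isSafe(cp)) {
                out.appendCodePoint(cp);
            } else {
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String repaired = percentEncode("file://./[alldeba_266928358_0001_tif/00000001.tif");
        URI uri = URI.create(repaired);
        System.out.println(repaired);      // encoded form, '[' becomes %5B
        System.out.println(uri.getPath()); // decoded path, still contains '['
    }
}
```

Encoding keeps the character in the path, so it only helps if the directory on disk really contains `[`. If the character is a migration artifact, removing it from the data is the more appropriate fix.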

@henning-gerhardt
Collaborator Author

I don't know how the [ character was added at this position, but the process title alldeba_266928358_0001 did not contain it. So could it be removed manually or during the metadata transformation?

@henning-gerhardt
Collaborator Author

With your change in #3698 I can see even more illegal characters, such as ordinary white space.

@matthias-ronge
Collaborator

I assume the mistake was already there before; you are only now seeing it for the first time.

@henning-gerhardt
Collaborator Author

Sure. I know neither the reason nor the time when these illegal characters were "added". Maybe they came from a former migration (1.5.x to 1.6.x or so). Maybe I can fix this for our data, but maybe the application should handle this as well.

@matthias-ronge
Collaborator

> With your change in #3698 I can see even more illegal characters, such as ordinary white space.

@henning-gerhardt, could you make me a list of the illegal characters you found in paths, and of how the corrected paths should look?
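
One way such a list could be gathered is to scan the `xlink:href` values and collect every character outside a conservative allow-list. The sketch below is hypothetical: the class name, the allow-list, and the second sample href (a path with a space) are assumptions for illustration, not data from this installation. Reading the hrefs out of the METS files is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical reporting helper; not part of the Kitodo code base.
class SuspiciousCharacterReport {

    // ASSUMPTION: only ASCII letters, digits and this punctuation count as harmless.
    private static final String ALLOWED_PUNCTUATION = "-._~/:%";

    static boolean isAllowed(int codePoint) {
        return (codePoint >= 'a' && codePoint <= 'z')
                || (codePoint >= 'A' && codePoint <= 'Z')
                || (codePoint >= '0' && codePoint <= '9')
                || ALLOWED_PUNCTUATION.indexOf(codePoint) >= 0;
    }

    /** Collects the distinct characters, over all hrefs, that are not on the allow-list. */
    static Set<String> suspiciousCharacters(List<String> hrefs) {
        Set<String> found = new TreeSet<>();
        for (String href : hrefs) {
            href.codePoints()
                    .filter(cp -> !isAllowed(cp))
                    .forEach(cp -> found.add(new String(Character.toChars(cp))));
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList(
                "file://./[alldeba_266928358_0001_tif/00000001.tif",
                "file://./some dir/00000002.tif"); // made-up example with a space
        System.out.println(suspiciousCharacters(hrefs));
    }
}
```

Unlike parsing with `java.net.URI`, which stops at the first rejected character, this reports every suspicious character in every path. The allow-list is stricter than the URI syntax, so it may flag characters that are technically legal.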

@henning-gerhardt
Collaborator Author

There is no list, and which characters are illegal depends on many things, such as the operating system and file system in use and how you interact with these characters. I removed all the illegal characters I found ([, white space, ...) for our instance, but I won't know whether this change is correct until we have successfully migrated and checked the data.

@matthias-ronge
Collaborator

> Should this be fixed outside of the application, or should the application handle it?

Since we don't have a clear error pattern, my answer to your initial question is that such errors have to be corrected locally, outside the application. If a clear error pattern that affects several installations emerges in the future, we can of course add a correction function here.

@solth removed the 3.x label on Jul 7, 2022
@matthias-ronge added the search (search, filter) label on Feb 27, 2023
@matthias-ronge added the migration (migration from previous Kitodo versions) label on May 16, 2023