Use cases drawn from the SUL Text Search Study Report (July 2019). Please note: these use cases are likely to produce an overlapping set of requirements that needs further investigation and specification.
Use case 1:
The Magario diaries include 40 years of handwritten pages in Japanese by donor Steven Yoba, representing a rare instance of trans-Japanese history (Japan + US). The Japanese diary pages have been accessioned as individual images in the SDR. OCR does not work well for handwritten Japanese, so Japanese transcriptions for each page were created by hand; these currently exist as individual MS Word files that have not been accessioned. This is a high-profile collection with broad faculty support. The content should be searchable and, ideally, accessible to text mining. Curator: Regan Murphy Kao
Use case 2:
The NDC collection comprises Japanese books cataloged by Hoover using the Nippon Decimal Classification system and housed at SAL1/2. The collection, which was transferred to EAL in the early 2000s, contains many rare books related to 20th-century history and was digitized by Google Books years ago. Curator: Regan Murphy Kao
@anarchivist comment:
Our implementation of the IIIF Content Search API does not currently support the level of analysis for CJK query terms as provided for SearchWorks. However, the Content Search API supports CJK text, and examples of CJK transcription via annotation do exist.
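To make the gap concrete: one common form of CJK query analysis is overlapping-bigram tokenization, where each run of CJK characters is split into two-character tokens so that unsegmented Japanese text can be matched without a dictionary. The sketch below is only an illustration of that general technique. It is not a description of how SearchWorks or our Content Search implementation is configured, and the function names and the code point ranges chosen are assumptions.

```python
def is_cjk(ch: str) -> bool:
    """Rough check for Han, Hiragana, and Katakana code points (assumed ranges)."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x3040 <= cp <= 0x309F   # Hiragana
        or 0x30A0 <= cp <= 0x30FF   # Katakana
    )


def cjk_bigrams(text: str) -> list[str]:
    """Split each run of CJK characters into overlapping bigrams.

    A run of length one is kept as a single-character token.
    Non-CJK characters only act as separators in this sketch.
    """
    tokens: list[str] = []
    run: list[str] = []

    def flush() -> None:
        # Emit tokens for the current run, then reset it.
        if len(run) == 1:
            tokens.append(run[0])
        else:
            tokens.extend(a + b for a, b in zip(run, run[1:]))
        run.clear()

    for ch in text:
        if is_cjk(ch):
            run.append(ch)
        else:
            flush()
    flush()
    return tokens
```

For example, `cjk_bigrams("日記を書く")` yields the tokens `日記`, `記を`, `を書`, `書く`. Applying the same tokenization to both the indexed text and the query is what lets a two-character query match inside a longer unsegmented string.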
Quinn Dombrowski comment:
Dombrowski has experimented with creating page-level Japanese-language OCR files (TXT) for the Magario Family Diaries (the collection described in use case 1). Note that Japanese-language support in Content Search is only one requirement: remediating the accessioned diary page images with the OCR files and enabling text search for the collection would also require infrastructure development.
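As a rough picture of what page-level text search could return, the sketch below matches a query against one text string per page and builds a response loosely modeled on the annotation list structure of the IIIF Content Search API 1.0. It is a minimal sketch under several assumptions: the canvas and search URIs are hypothetical, the page text is assumed to be already loaded into a dictionary keyed by canvas URI, matching is plain substring search with no CJK analysis, and each hit targets the whole canvas because page-level TXT files carry no word coordinates.

```python
def search_pages(pages: dict[str, str], query: str, search_uri: str) -> dict:
    """Return an annotation-list-style dict with one annotation per matching page.

    pages maps a (hypothetical) canvas URI to that page's transcription or OCR text.
    """
    resources = []
    for canvas_uri, text in pages.items():
        # Plain substring match; an empty query matches nothing.
        if query and query in text:
            resources.append({
                "@id": f"{search_uri}/anno/{len(resources)}",
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "resource": {"@type": "cnt:ContentAsText", "chars": query},
                # No #xywh fragment: page-level text has no region information.
                "on": canvas_uri,
            })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": f"{search_uri}?q={query}",
        "@type": "sc:AnnotationList",
        "resources": resources,
    }
```

Calling `search_pages({"https://example.org/canvas/p1": "今日は晴れ"}, "晴れ", "https://example.org/search")` would return a list with a single annotation whose `on` value is the first canvas. Highlighting individual words on the image would need OCR output that includes coordinates, which plain TXT files do not provide.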