stop indexing collection if they aren't public #1322

Open
wants to merge 1 commit into main

Conversation

dnoneill
Contributor

@dnoneill dnoneill commented Feb 5, 2024

From what I can tell, this is working on Searchworks. It will not fix the issue for items that are already indexed, but it will stop the issue from recurring. I have been using https://searchworks-preview-stage.stanford.edu/view/bh250xv2418 and https://argo-stage.stanford.edu/view/druid:qr592gj5093 to test that this works. It looks like you can release or withdraw a collection without its objects and the items will get reindexed. Be warned, it can take a while to reindex.

Closes sul-dlss/SearchWorks#3840

If an Argo collection has not been released to Searchworks, the Solr index will not provide the id in the collection field. It still provides collection_with_title, which becomes -|-. I thought that would be good to keep, to let us know that there is a non-public collection associated with the record, but Searchworks doesn't display those non-viewable collections.
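
As a rough illustration of the behavior described above (not the actual code in this PR), the two fields could end up looking like this. The `Collection` struct, the `released` flag, the helper names, and the druids are all hypothetical; the `id-|-title` layout for collection_with_title is an assumption inferred from the `-|-` value mentioned above.

```ruby
# Hypothetical stand-in for a collection; not the indexer's real class.
Collection = Struct.new(:druid, :title, :released, keyword_init: true)

# Assumed behavior: the collection field only carries ids of released collections.
def collection_field(collections)
  collections.select(&:released).map(&:druid)
end

# Assumed behavior: unreleased collections still produce an entry,
# but with id and title blanked out, leaving only the separator.
def collection_with_title_field(collections)
  collections.map do |c|
    c.released ? "#{c.druid}-|-#{c.title}" : '-|-'
  end
end

collections = [
  Collection.new(druid: 'aa111bb2222', title: 'Released maps', released: true),
  Collection.new(druid: 'cc333dd4444', title: 'Unreleased drafts', released: false)
]

p collection_field(collections)
p collection_with_title_field(collections)
```

Under these assumptions, a bare `-|-` entry is the only trace of the unreleased collection in the Solr document.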

@jcoyne
Contributor

jcoyne commented Feb 6, 2024

@dbranchini is the designed solution to not link to the collection, or to not show any collection information at all? I think this code change only accomplishes the latter.

@dbranchini

@jcoyne I think that's more of a question for @andrewjbtw.

@andrewjbtw

> to not show any collection information at all

Are you referring to the collection information being in the MODS? I don't know how we can prevent that from appearing. My understanding of the scope of this issue is to prevent displaying links to collection pages that don't exist.

get_value(public_xml_doc.xpath('/publicObject/identityMetadata/objectLabel'))
end

def collection_in_searchworks?
Member


I think I've been tripped up by this method name and/or the context for how this gets used. Maybe it's just the method name, but I wonder if there's a more explicit/clearer way to implement this (kind of confusing in the first place) requirement.

From your description, it sounds like we're trying to suppress the id and title from the collection and collection_with_title fields?

https://github.com/sul-dlss/searchworks_traject_indexer/blob/main/lib/traject/config/sdr_config.rb#L295-L303

Instead of tweaking the underlying data for collection objects (which.. feels like it might be useful in some contexts.. debugging if nothing else?), what if PublicXmlRecord had a released_collections accessor that we used anywhere we needed it?
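
One way to read this suggestion, as a minimal sketch: the classes below are simplified, hypothetical stand-ins (the real PublicXmlRecord parses public XML), and the `released_to_searchworks?` predicate is an assumed name, not a confirmed part of the codebase.

```ruby
# Hypothetical, simplified stand-in for a collection record.
class SketchCollection
  attr_reader :druid, :label

  def initialize(druid:, label:, released:)
    @druid = druid
    @label = label
    @released = released
  end

  # Assumed predicate; the real release check would read the public XML.
  def released_to_searchworks?
    @released
  end
end

# Hypothetical stand-in for an item record with attached collections.
class SketchRecord
  attr_reader :collections

  def initialize(collections)
    @collections = collections
  end

  # Leaves the underlying collection data untouched and filters on access.
  def released_collections
    collections.select(&:released_to_searchworks?)
  end
end

record = SketchRecord.new([
  SketchCollection.new(druid: 'aa111bb2222', label: 'Released maps', released: true),
  SketchCollection.new(druid: 'cc333dd4444', label: 'Unreleased drafts', released: false)
])

p record.released_collections.map(&:druid)
p record.collections.map(&:druid)
```

With this shape, the indexing config would read from `released_collections` wherever only public collections should appear, while `collections` keeps the full data available for debugging.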

Contributor Author


That works. The upside of this is we don't have to make any changes to the code in Searchworks or the Solr index. I have also left collection_with_title to still be indexed; with this change it just returns an empty value of -|-, which I thought would be good for debugging. Basically, it would indicate that the record has a collection attached but the collection isn't public. It's probably not the best indicator, so I can update the code, but I think that would mean we need to update the Solr index along with Searchworks.

Development

Successfully merging this pull request may close these issues.

Don't display "Digital collection" link when collection is not released.
5 participants