Update metadata and unstructured content extraction #53
This PR updates the content and metadata extraction to match the new, semi-final schema, and cleans up the extracted content so that it is more useful for snippets.
- Add an `additional_searchable_text` metadata field (we originally put this text in the primary indexable content for convenience, and because that's what the existing search does, but a separate field lets us keep the content "cleaner" in case we enable snippeting in the future)
- Remove the `public_timestamp_int` field and make the regular `public_timestamp` an integer (there is no reason to have two fields; the API can convert the integer back into an ISO timestamp at the point of retrieval)
- … (`content_purpose_supergroup`, `part_of_taxonomy_tree`, `locale`)
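As a rough illustration of the field split and the single integer timestamp described above, here is a minimal Python sketch. It assumes `public_timestamp` is stored as integer Unix epoch seconds (the unit is not stated in this PR; it could just as well be milliseconds) and that timestamps without an offset are UTC. `build_document`, `timestamp_to_int` and `int_to_timestamp` are hypothetical helper names for illustration, not functions from this codebase.

```python
from datetime import datetime, timezone


def timestamp_to_int(iso_timestamp: str) -> int:
    """Convert an ISO 8601 timestamp to integer epoch seconds (assumed unit).

    Sub-second precision is truncated.
    """
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11,
    # so normalise it to an explicit UTC offset first.
    if iso_timestamp.endswith("Z"):
        iso_timestamp = iso_timestamp[:-1] + "+00:00"
    parsed = datetime.fromisoformat(iso_timestamp)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def int_to_timestamp(epoch_seconds: int) -> str:
    """Convert stored epoch seconds back into an ISO 8601 UTC string at retrieval time."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


def build_document(content: str, extra_text: str, public_timestamp: str) -> dict:
    """Hypothetical indexing step: keep body text and extra searchable text apart."""
    return {
        # Body text only, so any future snippets come from real content.
        "content": content,
        # Text that should match searches but never appear in a snippet.
        "additional_searchable_text": extra_text,
        # One integer field replaces the old string + public_timestamp_int pair.
        "public_timestamp": timestamp_to_int(public_timestamp),
    }


if __name__ == "__main__":
    doc = build_document("Body text", "extra keywords", "2017-09-04T10:30:00+00:00")
    print(int_to_timestamp(doc["public_timestamp"]))  # 2017-09-04T10:30:00+00:00
```

Storing one integer keeps range queries and sorting simple on the index side, while the ISO string is only rebuilt when a result is returned.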