We will occasionally want to ingest stories supplied by third parties, who will probably find it easiest to provide that data as JSON dumps.
We've chosen WARC as a catch-all format and already have ingestion architecture built around it. We should have a standard process, probably just a single well-instrumented script, that takes a JSON dump in some agreed format and produces a WARC archive that we can then ingest via our standard pipeline.
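As a starting point for discussion, here is a minimal sketch of what such a script could look like, using only the Python standard library. Everything specific in it is an assumption rather than a decision: the input field names (`url`, `html`, `fetched_at`), the function names, and the choice to emit WARC/1.0 `resource` records, one gzip member per record. If our pipeline expects `response` records with HTTP headers, the record builder would need to change accordingly.

```python
import gzip
import io
import uuid
from datetime import datetime, timezone


def warc_record(url, html, fetched_at):
    """Serialize one story as a WARC/1.0 'resource' record (bytes)."""
    payload = html.encode("utf-8")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {fetched_at}",
        f"WARC-Target-URI: {url}",
        "Content-Type: text/html; charset=utf-8",
        f"Content-Length: {len(payload)}",
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # A WARC record ends with two CRLFs after the content block.
    return head + payload + b"\r\n\r\n"


def json_dump_to_warc(stories, out):
    """Write each story to `out` as its own gzip member.

    `stories` is an iterable of dicts (hypothetical schema: url, html,
    optional fetched_at). Returns (written, skipped) for instrumentation.
    """
    written = skipped = 0
    for story in stories:
        url, html = story.get("url"), story.get("html")
        if not url or not html:
            skipped += 1  # incomplete entry; count it rather than fail
            continue
        fetched_at = story.get("fetched_at") or datetime.now(
            timezone.utc
        ).strftime("%Y-%m-%dT%H:%M:%SZ")
        out.write(gzip.compress(warc_record(url, html, fetched_at)))
        written += 1
    return written, skipped
```

In practice this would be called with `json.load(...)` on the third party's dump and an output file opened in `"wb"` mode, with the returned counts logged so we can see how many entries were dropped.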
Documentation to give to third parties as they produce data dumps would also be nice: just a description of the schema we expect and some explanation of the rationale behind WARCs.
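To make the schema discussion concrete, one option is to ship a small validator alongside the documentation so third parties can check a dump before sending it. The sketch below is illustrative only: the field names (`url`, `html`, `fetched_at`, `title`) and the timestamp format are placeholders for whatever schema we actually settle on.

```python
from datetime import datetime

# Hypothetical schema: these field names are placeholders, not a decision.
REQUIRED_FIELDS = ("url", "html")
OPTIONAL_FIELDS = ("fetched_at", "title")


def validate_story(story):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(story, dict):
        return ["story is not a JSON object"]
    problems = []
    for name in REQUIRED_FIELDS:
        if name not in story:
            problems.append(f"missing required field '{name}'")
        elif not isinstance(story[name], str) or not story[name]:
            problems.append(f"field '{name}' must be a non-empty string")
    for name in OPTIONAL_FIELDS:
        if name in story and not isinstance(story[name], str):
            problems.append(f"field '{name}' must be a string")
    timestamp = story.get("fetched_at")
    if isinstance(timestamp, str):
        try:
            # Assumed format: UTC, second precision, trailing 'Z'.
            datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
        except ValueError:
            problems.append("field 'fetched_at' must look like 2024-01-31T12:00:00Z")
    return problems
```

Returning a list of problems rather than raising on the first one lets a third party fix a whole dump in one pass.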