Establish a data control plan #8
Though we hope that Wikipedia will take this under their wing sometime, we should not assume that they will. Based on that, we're setting up a community-based model for managing the generation of snapshots from kiwix dumps. This is one of the first tests of the model that evolved out of the Data Rescue hackathons in early 2017, where communities of hackers, content specialists, and do-gooders work together to manage the work of pulling data off of centralized servers and redistributing it. To apply this model we're partnering with @b5 from http://www.qri.io/, who did a lot of the technical work behind the Data Rescue hackathons. Many other people, like @dcwalk, @titaniumbones, @mayaad, @trinberg, and @abergman, contributed to the evolution of this model.

The Process

Key elements of this process:
Balancing Open Community with Careful Chain of Custody

It may seem like the open community model is at odds with maintaining a clear chain of custody when processing the snapshots. Here's how we will balance the two:

- Open community contributions (via github Pull Requests, etc.) wherever possible.
- Meanwhile, a smaller group of committers will handle the custody-sensitive steps of actually running the scripts and publishing the resulting snapshots.
Eventually we might incorporate cryptographic techniques (i.e. SNARKs) to prove that the intended operations (and only the intended operations) were run on the snapshots, which would allow anyone to build the snapshots without corrupting the chain of custody. This will require some research. For now, it's overkill.
Note: one cool thing about using IPFS with this structure is that if you want to validate that someone actually ran the scripts they claim, you can just re-run the scripts from the same sources and compare the hashes of the results...
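A minimal sketch of that check, assuming a local IPFS daemon on the default API port and the go-ipfs-api client; the snapshot output directory and the published hash are hypothetical placeholders, not anything this project has published:

```go
// Sketch: verify a committer's published snapshot hash by re-running the
// build ourselves and comparing IPFS content hashes. Identical inputs run
// through identical scripts should produce an identical hash.
package main

import (
	"fmt"
	"log"

	shell "github.com/ipfs/go-ipfs-api"
)

func main() {
	publishedHash := "Qm..." // hash claimed by the committer (placeholder)

	// Connect to a local IPFS daemon's API.
	sh := shell.NewShell("localhost:5001")

	// After re-running the processing scripts from the same sources,
	// add the output directory to IPFS to get its content hash.
	ourHash, err := sh.AddDir("./snapshot-output")
	if err != nil {
		log.Fatal(err)
	}

	if ourHash == publishedHash {
		fmt.Println("hashes match: the claimed scripts reproduce the snapshot")
	} else {
		fmt.Printf("mismatch: got %s, expected %s\n", ourHash, publishedHash)
	}
}
```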
pinging @patcon and
Ok, we've started to make progress on this. Currently it just defaults to sending emails while we figure out how to connect the requests to a queue, but it's a start. Live url here: https://task-mgmt.archivers.space (note, you'll need write access to this repo to log in).

I've outlined some next steps in the repo readme. @flyingzumwalt, it might make sense to touch base on next steps sometime soon, specifically around the question of where the actual task execution is going to happen. If we need to build that, that's ok. In the meantime I still have lots to chew on.
Many platforms allow for public access first, with a separate upgrade to private repo access when the need arises.
Is archivers requesting access? I thought it was just using the GH oauth response to know if the user has write access to this repo -- so you need write permission in the GH repo in order to manage stuff in task-mgmt.
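For illustration, a check like that can use GitHub's collaborator-permission endpoint after the OAuth handshake. A minimal sketch using plain net/http; the repo name, username, and token are placeholders, not the app's actual code:

```go
// Sketch: ask the GitHub REST API whether a user has write access to a
// repo, using the token obtained from the OAuth login flow.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// permissionResponse mirrors the relevant field of GitHub's
// GET /repos/{owner}/{repo}/collaborators/{user}/permission response.
type permissionResponse struct {
	Permission string `json:"permission"` // "admin", "write", "read", or "none"
}

func hasWriteAccess(token, owner, repo, user string) (bool, error) {
	url := fmt.Sprintf("https://api.github.com/repos/%s/%s/collaborators/%s/permission",
		owner, repo, user)
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return false, err
	}
	req.Header.Set("Authorization", "token "+token)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	var p permissionResponse
	if err := json.NewDecoder(resp.Body).Decode(&p); err != nil {
		return false, err
	}
	return p.Permission == "admin" || p.Permission == "write", nil
}

func main() {
	// Hypothetical values; the token comes from the GH OAuth response.
	ok, err := hasWriteAccess("<oauth-token>", "datatogether", "task_mgmt", "someuser")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("write access:", ok)
}
```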
The management page does: if you try to log in with GH at https://task-mgmt.archivers.space, it asks for private repo access.
aha. yeah we have to change that.
Oh yes, completely agreed. I'll drop the permissions ask and will report back once the change is up.
Ok, change is now live. App shouldn't request access to private repos.
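For reference, narrowing the ask like this usually comes down to requesting a smaller OAuth scope. A minimal sketch, assuming the golang.org/x/oauth2 package; the client credentials and callback URL are placeholders, not the app's real config:

```go
// Sketch: a GitHub OAuth config that requests only public-repo access.
package main

import (
	"fmt"

	"golang.org/x/oauth2"
	githuboauth "golang.org/x/oauth2/github"
)

func main() {
	conf := &oauth2.Config{
		ClientID:     "<client-id>",
		ClientSecret: "<client-secret>",
		RedirectURL:  "https://task-mgmt.archivers.space/oauth/callback",
		// "public_repo" covers public repos only; the broader "repo"
		// scope (which includes private repos) is what gets dropped.
		Scopes:   []string{"public_repo"},
		Endpoint: githuboauth.Endpoint,
	}

	// Send the user here to authorize; GitHub's consent screen will
	// show only the public-repo permission.
	fmt.Println(conf.AuthCodeURL("state-token"))
}
```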
Update: @b5 is making amazing progress building a robust and reusable solution for our data-control needs: datatogether/task_mgmt#4
we should outline the control plan (hand over to Wikipedia itself, etc.)