-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Copy Data Between Users Across Dev & Prod #711
Conversation
[diff-counting] Significant lines: 123. |
Visit the preview URL for this PR (updated for commit fad5d3c): https://cornelldti-courseplan-dev--pr711-copy-user-data-on0f6uti.web.app (expires Wed, 26 Oct 2022 02:49:00 GMT) 🔥 via Firebase Hosting GitHub Action 🌎 |
If we're going to be parsing command line arguments, I think it would be better to use a CLI library. That way, we can have named arguments for the source and destination, as well as switches for boolean arguments. |
That sounds good! I've only ever manually handled command line arguments, let alone done anything with the command line in TypeScript. What do you suggest? |
I think the |
Would that work fine with how we run typescript using |
Yeah I think it should be fine. |
Wow PR #711. Go team! Just wanted to say hi this is Han I was a former PM at CoursePlan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minimist
can be a dev dependency. Also, I think fromUser
and toUser
should be separate arguments, instead of being bundled into the argument for the source/destination environment.
I think it's nice to have them in one argument because it refers to a pseudo path to the user's data. Also, the command gets very long with 5 separate arguments. |
That's true--I think what I mainly take issue with is how easy it is to enter arguments in an incorrect format. How would you feel about having the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate the idea of copying data and I think it's a good idea! Some questions:
- Did you test this script successfully?
- What happens if one of the users doesn't exist?
- Can you put some example CLI commands for running this script in the PR description?
How would you feel about having the from and to environment arguments being switches instead (i.e. --fromDev or --fromDev=true)?
I like this approach better, except it should be for dev by default and there should be a switch to run it in prod. Similarly, for execute=true
, it should be false by default (our previous scripts used a --dry-run
switch but I think it should be a dry run by default).
Some broader thoughts:
- We should keep in mind that dev and prod could have different schemas. Maybe we should keep a log of database copies between environments just in case something breaks
- There is currently no way to roll back changes, eg. if the script fails halfway through (db atomicity). Should there be? Maybe we want a way to back up data 😱
- In theory it's possibly for two people to run the same or unrelated scripts at the same time, which is an issue for scripts that affect multiple collections (db isolation). Maybe we should post in slack whenever we run a script that writes to database
src/admin-copy-user-data.ts
Outdated
} | ||
} | ||
} | ||
return copied; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much data does this look like for an average user? Is it small enough that it's readable in the console?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, way too much to read in the console. But I wanted the user to be able to see what they will be copying over first. Maybe I should write it to a file? Firestore doesn't exactly use JSONs, but I could probably make something work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's way too much to read in the console, then we probably shouldn't print everything. Maybe we can write it to a file or multiple files (one per collection) -- if you go this route, make sure to add the output to the gitignore. If possible, we should still have some console outputs (eg. alerts for starting and ending copying a collection) that'll help ensure correctness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup it writes to a file now based on the -o
argument.
@benjamin-shen I'll read more later. Thanks for the review Ben!!!
Yes, this is dangerous. I could catch all the errors, store the data I'm overwriting, and then write that data back to the doc. It's an interesting problem because there's many things that could fail… even just writing the data back to the doc could cause issues.
I was thinking that maybe we could run through the data we're replacing to "understand" if our new data will match the schema, but honestly that wouldn't be too useful especially if we're trying to copy to dev to understand a broken doc. Writing to prod is a lot more dangerous and it would be good to have an invariant checker to ensure that the data we write is okay, but honestly we should never write to prod from this. So, I think we can maybe get away with blindly copying data to dev because there's no harm in breaking our own dev users. Can you explain what you're thinking with the log? Are you talking about a way to revert back? |
We can probably just accept that we can't guarantee atomicity and rerun the script if it failed midway.
This is a good point. I agree
Could you incorporate this? Let me know if you need some clarification |
@noschiff looking good! Can you
|
* write ts script to copy user data * rewrite to use command line arguments * update credential input to modular SDK * move core logic into a function * clean up code * add option to preview changes * validate arguments outside of function * narrow types with union * rename function arguments * rename variable * put arguments into options object * refactor command line arguments to use minimist * add minimist types * make minimist dev dependency * update package-lock.json * move to scripts directory * rename variables * improve output logging for user * pretty-print output * update example in comments
Summary
This PR creates a TypeScript script for the command line to copy a user's data from one user to another, optionally across dev and prod Firebases. Firebase makes it very difficult to copy data between documents in their web interface since there's no way to easily download documents, and it's nearly impossible to upload data using the web interface. The only practical way to move data on Firebase is to use a program with a service account that can get the data for a document and set the contents of another document.
To further investigate #672, we need to make changes to the data for a user with a broken account on prod. However, we cannot and should not change data on prod, so this script is needed to transfer the data to dev where we can safely experiment with it. Only the TPM should use this with prod. However, this script will also allow us to make copies of documents on dev for developers to experiment with.
Test Plan
Run the script from the command line with various arguments in preview mode and actually copy between documents on dev (e.g. copy from your account on dev to another google account that you have). You don't actually have to copy to another user if you want; the "user" only refers to the name of a document, so you can name your documents what you want. However, you'll need an email that corresponds to the document name to actually use the data. I recommend only having the dev service account in your repo at first until you have thoroughly read through the code and are confident that it will behave as it is supposed to once a prod service account credentials are in the repo.
Again, it is very important to read through the code because a prod service account grants the ability to mess with our data on prod if the script has an error that I didn't notice.
Example script: preview a copy from a dummy account to your data on dev:
Example script: copy from a dummy account to your data on dev: