Copy Data Between Users Across Dev & Prod #711

noschiff · 2022-09-02T04:47:04Z

Summary

This PR adds a TypeScript command-line script that copies one user's data to another user, optionally across the dev and prod Firebase projects. Firebase makes it very difficult to copy data between documents in its web interface: there's no easy way to download documents, and uploading data through the web interface is nearly impossible. The only practical way to move data on Firebase is a program that uses a service account to read the data in one document and set the contents of another.
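For illustration, here is a minimal TypeScript sketch of that read-then-set approach, including a preview (dry-run) mode. It is not the code in this PR: `DocStore`, `MemoryStore`, `CopyOptions`, and `copyUserData` are hypothetical names, and `DocStore` stands in for Firestore, where the calls would roughly be `collection(name).doc(id).get()` (then `.data()`) and `.set(data)` through a service account.

```typescript
// Hypothetical stand-in for Firestore reads/writes. With firebase-admin these
// would roughly be collection(name).doc(id).get() (then .data()) and .set(data).
type DocData = Record<string, unknown>;

interface DocStore {
  get(collection: string, docId: string): Promise<DocData | undefined>;
  set(collection: string, docId: string, data: DocData): Promise<void>;
}

interface CopyOptions {
  from: { store: DocStore; user: string };
  to: { store: DocStore; user: string };
  collections: string[]; // per-user collections whose doc ID is the user's name
  execute: boolean; // false = preview only, nothing is written
}

// Read each of the source user's docs and, if execute is set, write it to the
// destination user's doc. Returns what was (or would be) copied, by collection.
async function copyUserData(opts: CopyOptions): Promise<Record<string, DocData>> {
  const copied: Record<string, DocData> = {};
  for (const collection of opts.collections) {
    const data = await opts.from.store.get(collection, opts.from.user);
    if (data === undefined) continue; // source user has no doc in this collection
    copied[collection] = data;
    if (opts.execute) {
      await opts.to.store.set(collection, opts.to.user, data);
    }
  }
  if (Object.keys(copied).length === 0) {
    throw new Error(`No data found for source user "${opts.from.user}"`);
  }
  return copied;
}

// In-memory DocStore for trying the logic without touching Firebase.
class MemoryStore implements DocStore {
  private docs = new Map<string, DocData>();

  async get(collection: string, docId: string): Promise<DocData | undefined> {
    return this.docs.get(`${collection}/${docId}`);
  }

  async set(collection: string, docId: string, data: DocData): Promise<void> {
    this.docs.set(`${collection}/${docId}`, data);
  }
}
```

Skipping collections where the source user has no document, and failing loudly when nothing at all is found, is one possible answer to "what happens if one of the users doesn't exist?"; the actual script may behave differently.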

To further investigate #672, we need to modify the data of a user whose account is broken on prod. However, we cannot and should not change data on prod, so this script is needed to copy the data to dev, where we can safely experiment with it. Only the TPM should use this with prod. The script will also let us make copies of documents on dev for developers to experiment with.

Test Plan

Run the script from the command line with various arguments, both in preview mode and actually copying between documents on dev (e.g. copy from your account on dev to another Google account that you have). You don't have to copy to a different user if you don't want to; the "user" only refers to the name of a document, so you can name your documents whatever you want. However, you'll need an email that corresponds to the document name to actually use the data. I recommend keeping only the dev service account credentials in your repo at first, until you have thoroughly read through the code and are confident that it will behave as intended once prod service account credentials are in the repo.

Again, it is very important to read through the code because a prod service account grants the ability to mess with our data on prod if the script has an error that I didn't notice.

Example command: preview a copy from a dummy account to your data on dev (nothing is written without --execute):

npm run ts-node -- scripts/copy-user-data.ts -f dev/dummyaccount -t dev/[email protected] -o "log.json"

Example command: copy from a dummy account to your data on dev:

npm run ts-node -- scripts/copy-user-data.ts -f dev/dummyaccount -t dev/[email protected] -o "log.json" --execute
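The `-f`/`-t` values above use a `<env>/<user>` pseudo-path. Below is a sketch of validating that format with a `"dev" | "prod"` union type, assuming the raw values arrive from a CLI parser as `unknown`; `parseUserPath` and `isEnv` are hypothetical helpers, not the script's actual functions.

```typescript
// Sketch of validating the "<env>/<user>" pseudo-path passed to -f / -t
// (e.g. "dev/dummyaccount"). Names and error messages are illustrative.
type Env = "dev" | "prod";

interface UserPath {
  env: Env;
  user: string;
}

// Type guard that narrows a plain string to the Env union.
function isEnv(value: string): value is Env {
  return value === "dev" || value === "prod";
}

function parseUserPath(arg: unknown, flag: string): UserPath {
  // A missing flag, a repeated flag (array), or a non-string value is rejected.
  if (typeof arg !== "string") {
    throw new Error(`${flag} is required, e.g. ${flag} dev/<user>`);
  }
  const slash = arg.indexOf("/");
  if (slash === -1) {
    throw new Error(`${flag} must look like <dev|prod>/<user>, got "${arg}"`);
  }
  const env = arg.slice(0, slash);
  const user = arg.slice(slash + 1);
  if (!isEnv(env)) {
    throw new Error(`${flag}: unknown environment "${env}" (expected dev or prod)`);
  }
  // Firestore document IDs cannot contain "/", so reject empty or nested paths.
  if (user === "" || user.includes("/")) {
    throw new Error(`${flag}: invalid user "${user}"`);
  }
  return { env, user };
}
```

Printing a specific error for each malformed case addresses the concern that the combined argument is easy to enter in the wrong format.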

dti-github-bot · 2022-09-02T04:47:19Z

[diff-counting] Significant lines: 123.

github-actions · 2022-09-02T04:51:02Z

Visit the preview URL for this PR (updated for commit fad5d3c):

https://cornelldti-courseplan-dev--pr711-copy-user-data-on0f6uti.web.app

(expires Wed, 26 Oct 2022 02:49:00 GMT)

🔥 via Firebase Hosting GitHub Action 🌎

zachary-kent · 2022-09-02T04:57:12Z

If we're going to be parsing command line arguments, I think it would be better to use a CLI library. That way, we can have named arguments for the source and destination, as well as switches for boolean arguments.

src/admin-copy-user-data.ts

noschiff · 2022-09-02T05:29:59Z

@zachary-kent

If we're going to be parsing command line arguments, I think it would be better to use a CLI library. That way, we can have named arguments for the source and destination, as well as switches for boolean arguments.

That sounds good! I've only ever manually handled command line arguments, let alone done anything with the command line in TypeScript. What do you suggest?

zachary-kent · 2022-09-02T05:35:29Z

That sounds good! I've only ever manually handled command line arguments, let alone done anything with the command line in TypeScript. What do you suggest?

I think the minimist package would be good here. yargs is fun and pirate-themed, but also likely a little overkill.

src/admin-copy-user-data.ts

noschiff · 2022-09-04T19:06:05Z

I think the minimist package would be good here. yargs is fun and pirate-themed, but also likely a little overkill.

Would that work fine with how we run typescript using npm run ts-node which then calls ts-node -T -P tsconfig.node.json?

zachary-kent · 2022-09-05T03:17:28Z

Would that work fine with how we run typescript using npm run ts-node which then calls ts-node -T -P tsconfig.node.json?

Yeah I think it should be fine.

handotdev · 2022-09-05T05:21:29Z

Wow PR #711. Go team!

Just wanted to say hi! This is Han; I was a PM at CoursePlan.

zachary-kent

minimist can be a dev dependency. Also, I think fromUser and toUser should be separate arguments, instead of being bundled into the argument for the source/destination environment.

noschiff · 2022-09-05T16:27:49Z

minimist can be a dev dependency. Also, I think fromUser and toUser should be separate arguments, instead of being bundled into the argument for the source/destination environment.

I think it's nice to have them in one argument because it refers to a pseudo path to the user's data. Also, the command gets very long with 5 separate arguments.

zachary-kent · 2022-09-05T17:41:32Z

I think it's nice to have them in one argument because it refers to a pseudo path to the user's data. Also, the command gets very long with 5 separate arguments.

That's true--I think what I mainly take issue with is how easy it is to enter arguments in an incorrect format. How would you feel about having the from and to environment arguments being switches instead (i.e. --fromDev or --fromDev=true)? I also think we should print an error if any arguments (like the user) are invalid.

benjamin-shen

I appreciate the idea of copying data, and I think it's a good one! Some questions:

Did you test this script successfully?
What happens if one of the users doesn't exist?
Can you put some example CLI commands for running this script in the PR description?

How would you feel about having the from and to environment arguments being switches instead (i.e. --fromDev or --fromDev=true)?

I like this approach better, except it should be for dev by default and there should be a switch to run it in prod. Similarly, for execute=true, it should be false by default (our previous scripts used a --dry-run switch but I think it should be a dry run by default).
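One way to express these defaults, as a sketch: `fromProd`, `toProd`, and `execute` are hypothetical switch names, and `flags` is assumed to be the object a parser such as minimist returns when configured with `{ boolean: ["fromProd", "toProd", "execute"] }`. The guard against writing to prod is an added assumption in the spirit of keeping prod read-only.

```typescript
// Hypothetical switch names; `flags` is assumed to be parser output such as
// minimist(argv, { boolean: ["fromProd", "toProd", "execute"] }).
type Env = "dev" | "prod";

interface ResolvedOptions {
  fromEnv: Env;
  toEnv: Env;
  execute: boolean;
}

const SWITCHES = ["fromProd", "toProd", "execute"];

function resolveOptions(flags: Record<string, unknown>): ResolvedOptions {
  // Switches must be booleans; anything else is treated as a usage error.
  for (const name of SWITCHES) {
    const value = flags[name];
    if (value !== undefined && typeof value !== "boolean") {
      throw new Error(`--${name} is a switch and takes no value`);
    }
  }
  const opts: ResolvedOptions = {
    fromEnv: flags["fromProd"] === true ? "prod" : "dev", // dev by default
    toEnv: flags["toProd"] === true ? "prod" : "dev", // dev by default
    execute: flags["execute"] === true, // dry run by default
  };
  // Assumed extra guard: this script should never write to prod.
  if (opts.execute && opts.toEnv === "prod") {
    throw new Error("Refusing to write to prod; copy into dev instead.");
  }
  return opts;
}
```

Keeping both risky behaviors (prod, writing) behind explicit switches means a forgotten flag degrades to a harmless dev dry run.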

Some broader thoughts:

We should keep in mind that dev and prod could have different schemas. Maybe we should keep a log of database copies between environments, just in case something breaks.
There is currently no way to roll back changes, e.g. if the script fails halfway through (db atomicity). Should there be? Maybe we want a way to back up data 😱
In theory, it's possible for two people to run the same or unrelated scripts at the same time, which is an issue for scripts that affect multiple collections (db isolation). Maybe we should post in Slack whenever we run a script that writes to the database.

src/admin-copy-user-data.ts

benjamin-shen · 2022-09-08T22:51:52Z

src/admin-copy-user-data.ts

+      }
+    }
+  }
+  return copied;


How much data does this look like for an average user? Is it small enough that it's readable in the console?

Honestly, way too much to read in the console. But I wanted the user to be able to see what they will be copying over first. Maybe I should write it to a file? Firestore doesn't exactly use JSONs, but I could probably make something work?

If it's way too much to read in the console, then we probably shouldn't print everything. Maybe we can write it to a file, or to multiple files (one per collection); if you go this route, make sure to add the output to the .gitignore. If possible, we should still have some console output (e.g. alerts when starting and finishing the copy of each collection) to help ensure correctness.

Yup it writes to a file now based on the -o argument.
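A sketch of writing that log as pretty-printed JSON, assuming the copied data is a plain object keyed by collection. Because Firestore data isn't plain JSON, the replacer duck-types anything with a `toDate()` method (such as a Firestore `Timestamp`) and writes it as an ISO string; that check, `formatCopyLog`, and `writeCopyLog` are assumptions for illustration. If an SDK type already defines `toJSON`, `JSON.stringify` calls that first and the replacer only sees its output.

```typescript
import { writeFileSync } from "fs";

// Duck-typed check for Firestore-style timestamps (anything with a toDate()
// method). This is an assumption for the sketch, not the SDK's own type check.
function isTimestampLike(value: unknown): value is { toDate(): Date } {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { toDate?: unknown }).toDate === "function"
  );
}

// Pretty-print with 2-space indentation so the log is not one long line, and
// turn timestamp-like values into ISO strings so they stay readable.
function formatCopyLog(copied: Record<string, unknown>): string {
  return JSON.stringify(
    copied,
    (_key: string, value: unknown) =>
      isTimestampLike(value) ? value.toDate().toISOString() : value,
    2
  );
}

// Write the log to the path given by the -o argument (keep it gitignored).
function writeCopyLog(outPath: string, copied: Record<string, unknown>): void {
  writeFileSync(outPath, formatCopyLog(copied) + "\n", "utf8");
}
```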

src/admin-copy-user-data.ts

noschiff · 2022-09-09T03:42:54Z

@benjamin-shen I'll read more later. Thanks for the review Ben!!!

There is currently no way to roll back changes, eg. if the script fails halfway through (db atomicity). Should there be? Maybe we want a way to back up data 😱

Yes, this is dangerous. I could catch all the errors, store the data I'm overwriting, and then write that data back to the doc. It's an interesting problem because there are many things that could fail… even just writing the data back to the doc could cause issues.
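A best-effort version of that idea, as a sketch: back up the destination docs, attempt the writes, and on failure write the backups back (or delete docs that didn't exist before). `DocStore` and `writeWithRollback` are hypothetical; with Firestore the calls would roughly be `get()`, `set()`, and `delete()` on a document reference. As noted above, the restore itself can fail, so this only reports that case rather than guaranteeing atomicity.

```typescript
type DocData = Record<string, unknown>;

// Hypothetical store interface; with firebase-admin these would roughly be
// get(), set(data), and delete() on collection(name).doc(user).
interface DocStore {
  get(collection: string, docId: string): Promise<DocData | undefined>;
  set(collection: string, docId: string, data: DocData): Promise<void>;
  delete(collection: string, docId: string): Promise<void>;
}

// Write `updates` (collection -> data) to one user's docs. Current contents are
// saved first; if any write fails, the saved contents are written back, or the
// doc is deleted if it did not exist. Restore failures are reported, not retried.
async function writeWithRollback(
  store: DocStore,
  user: string,
  updates: Record<string, DocData>
): Promise<void> {
  const backup: Array<[string, DocData | undefined]> = [];
  for (const collection of Object.keys(updates)) {
    backup.push([collection, await store.get(collection, user)]);
  }
  try {
    for (const [collection, data] of Object.entries(updates)) {
      await store.set(collection, user, data);
    }
  } catch (writeError) {
    const restoreFailures: string[] = [];
    for (const [collection, previous] of backup) {
      try {
        if (previous === undefined) {
          await store.delete(collection, user);
        } else {
          await store.set(collection, user, previous);
        }
      } catch {
        restoreFailures.push(collection);
      }
    }
    const outcome =
      restoreFailures.length === 0
        ? "previous data restored"
        : `restore also failed for: ${restoreFailures.join(", ")}`;
    throw new Error(`Copy failed (${String(writeError)}); ${outcome}`);
  }
}
```

For a small number of documents in a single destination project, a Firestore batched write (which commits atomically) may be a simpler alternative to hand-rolled rollback.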

We should keep in mind that dev and prod could have different schemas. Maybe we should keep a log of database copies between environments just in case something breaks

I was thinking that maybe we could run through the data we're replacing to "understand" if our new data will match the schema, but honestly that wouldn't be too useful especially if we're trying to copy to dev to understand a broken doc. Writing to prod is a lot more dangerous and it would be good to have an invariant checker to ensure that the data we write is okay, but honestly we should never write to prod from this. So, I think we can maybe get away with blindly copying data to dev because there's no harm in breaking our own dev users. Can you explain what you're thinking with the log? Are you talking about a way to revert back?

benjamin-shen · 2022-09-20T01:05:29Z

Yes, this is dangerous. I could catch all the errors, store the data I'm overwriting, and then write that data back to the doc. It's an interesting problem because there are many things that could fail… even just writing the data back to the doc could cause issues.

We can probably just accept that we can't guarantee atomicity and rerun the script if it failed midway.

Writing to prod is a lot more dangerous and it would be good to have an invariant checker to ensure that the data we write is okay, but honestly we should never write to prod from this. So, I think we can maybe get away with blindly copying data to dev because there's no harm in breaking our own dev users.

This is a good point. I agree.

I like this approach better, except it should be for dev by default and there should be a switch to run it in prod. Similarly, for execute=true, it should be false by default (our previous scripts used a --dry-run switch but I think it should be a dry run by default).

Could you incorporate this? Let me know if you need some clarification

benjamin-shen · 2022-09-26T02:08:31Z

@noschiff looking good! Can you

change the example command in the script documentation to not use your email addresses and to not include --execute
pretty print the output json so it's not on a single line

* write ts script to copy user data
* rewrite to use command line arguments
* update credential input to modular SDK
* move core logic into a function
* clean up code
* add option to preview changes
* validate arguments outside of function
* narrow types with union
* rename function arguments
* rename variable
* put arguments into options object
* refactor command line arguments to use minimist
* add minimist types
* make minimist dev dependency
* update package-lock.json
* move to scripts directory
* rename variables
* improve output logging for user
* pretty-print output
* update example in comments

noschiff added 6 commits August 29, 2022 15:23

write ts script to copy user data

b6d9e4b

rewrite to use command line arguments

32b82a0

update credential input to modular SDK

fa53657

move core logic into a function

3fc7b3e

clean up code

79da096

add option to preview changes

bd561ac

noschiff requested a review from zachary-kent September 2, 2022 04:47

noschiff requested a review from a team as a code owner September 2, 2022 04:47

zachary-kent reviewed Sep 2, 2022

src/admin-copy-user-data.ts (outdated, resolved)

src/admin-copy-user-data.ts (outdated, resolved)

src/admin-copy-user-data.ts (outdated, resolved)

zachary-kent reviewed Sep 2, 2022

src/admin-copy-user-data.ts (outdated, resolved)

noschiff added 4 commits September 2, 2022 01:40

validate arguments outside of function

6b0e530

narrow types with union

ff964b1

rename function arguments

ec80da4

rename variable

c47a093

put arguments into options object

9080a25

noschiff added 2 commits September 5, 2022 00:43

refactor command line arguments to use minimist

e1fd24a

add minimist types

ae66f35

noschiff requested a review from zachary-kent September 5, 2022 04:51

zachary-kent reviewed Sep 5, 2022

make minimist dev dependency

51d9ce2

update package-lock.json

d6455cd

benjamin-shen requested changes Sep 8, 2022

noschiff added 3 commits September 10, 2022 02:15

move to scripts directory

55b631a

rename variables

625f7d3

improve output logging for user

d5fc9c4

noschiff requested a review from benjamin-shen September 25, 2022 03:08

noschiff added 2 commits September 25, 2022 22:43

pretty-print output

d21fce4

update example in comments

fad5d3c

benjamin-shen approved these changes Sep 26, 2022

noschiff merged commit 4d2991b into master Sep 26, 2022

noschiff deleted the copy-user-data branch September 26, 2022 03:49

noschiff mentioned this pull request Sep 26, 2022

Fix Script Copy Data Between Users #715

Merged

noschiff mentioned this pull request Oct 27, 2022

Fall 2022 Pre-Enroll Release #749

Merged

43 tasks

noschiff added the util Utility and tools for development, like scripts or workflow automation label Nov 3, 2022
