-
-
Notifications
You must be signed in to change notification settings - Fork 675
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update to use fstrings #224
base: master
Are you sure you want to change the base?
Conversation
Fstrings are better for RAM usage and processing time, as well as being more easily readable. Literally no reason not to use them over "string + string" stuff. Converted over some of the "string.format()" stuff to fstrings as well for slightly reduced filesize. Also removed os.path.join() statements in the "copy_[XYZ]_[to/from]_gdrive()" functions. They were unnecessary, took up a bit extra processing time/RAM, and didn't make this any more compatible because these were already functions that can be run on Colaboratory anyways. Then, finally, added .txt file extension to sample files so they can be viewed directly in Colaboratory and you don't need to download them, and it's more obviously a file to be opened for those running this tool on their own computers.
os.path.join(checkpoint_path, | ||
'model-{}').format(counter-1)) | ||
os.path.join(checkpoint_path, f"model-{counter-1}")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, changes nothing besides a slight efficiency increase.
text = '======== SAMPLE {} ========\n{}\n'.format( | ||
index + 1, text) | ||
text = f'======== SAMPLE {index + 1} ========\n{text}\n' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More readable in the code, slightly smaller filesize
os.path.join(SAMPLE_DIR, run_name, | ||
'samples-{}').format(counter), 'w') as fp: | ||
os.path.join(SAMPLE_DIR, run_name, f"samples-{counter}.txt", 'w') as fp: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once again, more readable in the code, slightly smaller filesize, and added .txt
file extension for accessibility in Colaboratory.
'[{counter} | {time:2.2f}] loss={loss:2.2f} avg={avg:2.2f}' | ||
'[{} | {2.2f}] loss={2.2f} avg={2.2f}' | ||
.format( | ||
counter=counter, | ||
time=time.time() - start_time, | ||
loss=v_loss, | ||
avg=avg_loss[0] / avg_loss[1])) | ||
counter, | ||
time.time() - start_time, | ||
v_loss, | ||
avg_loss[0] / avg_loss[1])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really no particular reason here, other than the fact that the identifiers were unnecessary (and personal preference).
checkpoint_folder = os.path.join('checkpoint', run_name) | ||
checkpoint_folder = f'checkpoint/{run_name}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More efficient, and os.path.join()
was unneeded, since this is a function that can only be run on Colaboratory with the /content/XYZ
references later in the function.
EDIT: called the function a "command" lmao
shutil.copytree(checkpoint_folder, "/content/drive/My Drive/" + checkpoint_folder) | ||
shutil.copytree(checkpoint_folder, f"/content/drive/My Drive/{checkpoint_folder}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you get the point by now. Fstrings are just overall better in these scenarios.
I'll stop commenting on every single one of these, as you can probably guess by now why I used them in each scenario.
r = requests.get(url_base + "/models/" + model_name + "/" + file_name, stream=True) | ||
r = requests.get(f"{url_base}/models/{model_name}/{file_name}", stream=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes nothing, besides increasing the efficiency of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commenting with reasons for changes
Fstrings are better for RAM usage and processing time, as well as being more easily readable. Literally no reason not to use them over
string + string
stuff. Converted over some of thestring.format()
stuff to fstrings as well for slightly reduced filesize.Also removed
os.path.join()
statements in thecopy_[XYZ]_[to/from]_gdrive()
functions. They were unnecessary, took up a bit extra processing time/RAM, and didn't make this any more compatible because these were already functions that can be run on Colaboratory anyways.Then, finally, added .txt file extension to sample files so they can be viewed directly in Colaboratory and you don't need to download them, and it's more obviously a file to be opened for those running this tool on their own computers.
A Recommendation
I didn't implement it in this pull request, but I recommend using
zipfile
to compress the checkpoints for Google Drive storage (or at leastgzip
to make compressed.tar.gz
archives). While it realistically doesn't save much percentage-wise, it still shaves off a good 100+ MB from each checkpoint, which is insanely beneficial for Google Drive storage.If you're okay with using something not directly from Python for that, you can even use the following code I have made for creating extremely well-compressed checkpoint archives with 7-Zip. With this code, I've achieved a filesize reduction of 200 MB or more. It simply takes ~5 minutes extra to compress the archive, though I consider this a small price to pay for a huge filesize reduction. Since these functions can run only in Colaboratory, there should be no problems using this programming in them.