Adaptive lora #66
base: main
Conversation
Thanks for this feature and your analysis! This is a great initial effort. However, there are quite a few things to be figured out before I can merge this PR. For example,
Yeah, this indeed might lead to a low compression ratio. For the MNIST case the compression ratio seems plausible. But regarding "fix the number of parameters", how do we determine this number from the percentage covariance threshold? Should we put all singular values across layers together and sort them?
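As a purely illustrative sketch of the "pool and sort globally" idea (the function name `allocate_ranks`, the layer keys, and the 0.8 threshold are all hypothetical, and a real implementation would likely want to enforce a minimum rank per layer):

```python
import torch

def allocate_ranks(singular_values_per_layer, threshold=0.8):
    # Pool (value, layer) pairs so singular values compete across layers.
    tagged = [
        (float(v), layer)
        for layer, values in singular_values_per_layer.items()
        for v in values
    ]
    tagged.sort(key=lambda pair: pair[0], reverse=True)

    # Greedily keep the largest values until `threshold` of the total
    # spectral mass is covered; each layer's rank is the number of its
    # values that were kept.
    total = sum(v for v, _ in tagged)
    ranks = {layer: 0 for layer in singular_values_per_layer}
    covered = 0.0
    for v, layer in tagged:
        if covered >= threshold * total:
            break
        covered += v
        ranks[layer] += 1
    return ranks

ranks = allocate_ranks({
    "layer1": torch.tensor([5.0, 1.0, 0.1]),
    "layer2": torch.tensor([3.0, 0.2]),
})
print(ranks)  # {'layer1': 1, 'layer2': 1} with the default 0.8 threshold
```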
Thanks for catching this!
I don't have a concrete answer to the first question at the moment, and believe this is largely a research question (which is exciting). Much of the literature on communication-efficient distributed training, which also does gradient compression, applies different compression ratios across layers. You can probably review it, try/develop new ideas, and find the best-working one. Once we have this, we can merge this PR!
Also, we can think about using different ranks for …
Instead of explicitly saving rank information, I did it by using the shape of the LoRA weight matrix. Let me know if this seems okay!
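A minimal sketch of this idea (the layer sizes here are hypothetical; the PR's actual modules may be structured differently):

```python
import torch.nn as nn

# For a LoRA pair A (in_features -> r) and B (r -> out_features), the
# rank r is recoverable from the stored weight shapes, so it never
# needs to be saved separately.
lora_a = nn.Linear(784, 16, bias=False)  # weight shape: (16, 784)
lora_b = nn.Linear(16, 10, bias=False)   # weight shape: (10, 16)

rank = lora_a.weight.shape[0]            # r = 16, read off the A matrix
assert rank == lora_b.weight.shape[1]    # consistent with the B matrix
```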
Also, would you be able to come up with some basic unit tests for `lora`? This would be massively helpful for future development!
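Not something in the PR yet, but a minimal sketch of what such a test could look like, using stand-in layers rather than the project's actual LoRA modules:

```python
import unittest
import torch.nn as nn


class TestLoraRank(unittest.TestCase):
    def test_rank_recoverable_from_weight_shape(self):
        # Stand-in LoRA pair with rank 4; real tests would exercise the
        # project's own LoRA modules instead of these bare layers.
        lora_a = nn.Linear(784, 4, bias=False)
        lora_b = nn.Linear(4, 10, bias=False)
        self.assertEqual(lora_a.weight.shape[0], 4)
        self.assertEqual(lora_b.weight.shape[1], 4)


if __name__ == "__main__":
    unittest.main()
```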
analog/logging/option.py
```diff
@@ -97,11 +97,16 @@ def _sanity_check(self):
         )
         self._log["grad"] = True

-    def eval(self):
+    def eval(self, log="grad"):
```
Instead of having `"grad"` as a default value, what do you think about having `None` as a default value, and when it's `None` we set it to `"grad"` with a warning message, like:
```python
def eval(self, log=None):
    if log is None:
        get_logger().warning(
            "'log' is automatically set to 'grad'. If this is not the "
            "desired behavior, please explicitly set your 'log' value."
        )
        log = "grad"
    if isinstance(log, str):
        ...
```
```python
if __name__ == "__main__":
    unittest.main()
```
Just a suggestion: I am not sure these two lines are necessary; let's try to follow the format of the other tests!
```python
print(if_scores)
# torch.save(if_scores, f"{os.path.dirname(os.path.abspath(__file__))}/data/if_analog_lora.pt")
```
Let's remove the comments! Also, if we are to print the scores (or anything else) in the test, I would love to see them formatted, for example:
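A stand-in sketch of formatted output (the shape of `if_scores` here is hypothetical):

```python
import torch

if_scores = torch.randn(1, 8)  # stand-in for the computed influence scores
for i, score in enumerate(if_scores.flatten().tolist()):
    print(f"sample {i:4d}: influence = {score:+.6f}")
```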
```python
def construct_mlp(num_inputs=784, num_classes=10):
    return torch.nn.Sequential(
        nn.Flatten(),
        nn.Linear(num_inputs, 4, bias=False),
        nn.ReLU(),
        nn.Linear(4, 2, bias=False),
        nn.ReLU(),
        nn.Linear(2, num_classes, bias=False),
    )
```
I think this can only be used for MNIST data; I am not sure it can serve as a general util function. Let me know your thoughts!
```diff
-if_scores = if_scores.numpy().tolist()
-torch.save(if_scores, "if_analog_pca.pt")
+if_scores = if_scores.numpy().tolist()[0]
+torch.save(if_scores, f"if_analog_pca.pt")
```
nit: f-string unnecessary.
```diff
@@ -91,6 +89,6 @@

 # Save
 if_scores = if_scores.numpy().tolist()[0]
-torch.save(if_scores, "if_analog_scheduler.pt")
+torch.save(if_scores, f"if_analog_scheduler.pt")
```
ditto.
Add an `adaptive_threshold` option to dynamically determine the rank.

Experiment result with MNIST

To use this feature, set `compression_ratio_by_covariance` or `compression_ratio_by_memory` in `config.yaml`. With a value of 0.8, `compression_ratio_by_covariance` determines the rank needed for PCA to explain 80% of the covariance, while `compression_ratio_by_memory` determines the rank that compresses gradient memory to 80%.
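Purely as a sketch of how these two criteria could translate into a rank (the function names and the memory accounting below are assumptions; the PR's actual implementation may differ):

```python
import torch

def rank_by_covariance(singular_values, ratio=0.8):
    # Smallest rank whose leading singular values (descending order)
    # explain `ratio` of the total spectral mass.
    energy = torch.cumsum(singular_values, dim=0) / singular_values.sum()
    return int((energy < ratio).sum().item()) + 1

def rank_by_memory(in_features, out_features, ratio=0.8):
    # Largest rank r such that two rank-r factors (in*r + r*out floats)
    # use at most `ratio` of the dense in*out gradient memory.
    return int(ratio * in_features * out_features / (in_features + out_features))

print(rank_by_covariance(torch.tensor([4.0, 3.0, 2.0, 1.0])))  # 3
print(rank_by_memory(784, 10))                                 # 7
```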