Synchronize updates; fix AdamW lr_t (keras)
BUGFIXES:
- Last weight in the network would be updated with `t_cur` one update ahead, desynchronizing it from all other weights
- AdamW in `keras` (`optimizers.py`, `optimizers_225.py`): weight updates were not mediated by `eta_t`, so cosine annealing had no effect (see the `eta_t` sketch below)
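For context, `eta_t` is the cosine-annealing multiplier from the SGDR/AdamW scheme that is meant to scale every update. A minimal sketch of the intended behavior, with illustrative names (not the package's internals):

```python
import numpy as np

def eta_t(t_cur, total_iterations, eta_min=0.0, eta_max=1.0):
    """Cosine-annealing multiplier (SGDR); scales both lr and weight decay."""
    return eta_min + 0.5 * (eta_max - eta_min) * (
        1 + np.cos(np.pi * t_cur / total_iterations))

# Intended (mediated) update, schematically:
#   w <- w - eta_t * (lr * adam_step + weight_decay * w)
# The bug meant updates skipped this mediation, i.e. effectively eta_t == 1 always.
```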
FEATURES:
- Added `lr_t` to tf.keras optimizers to track the "actual" learning rate externally; use `K.eval(model.optimizer.lr_t)` to get the "actual" learning rate for a given `t_cur` and `iterations` (usage sketch after this list)
- Added an `lr_t` vs. iterations plot to README, with source code in `example.py`
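A minimal usage sketch for reading `lr_t`; the `AdamW` constructor arguments shown are assumptions based on the package's README, so adjust them to your setup:

```python
import numpy as np
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from keras_adamw import AdamW  # tf.keras variant of this package

ipt = Input((16,))
out = Dense(1)(ipt)
model = Model(ipt, out)

# `use_cosine_annealing` / `total_iterations` are assumed constructor args
optimizer = AdamW(lr=1e-3, use_cosine_annealing=True, total_iterations=100)
model.compile(optimizer, loss='mse')

x, y = np.random.randn(32, 16), np.random.randn(32, 1)
model.train_on_batch(x, y)

# "actual" learning rate, i.e. lr scaled by eta_t at the current t_cur
print("lr_t:", K.eval(model.optimizer.lr_t))
```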
MISC:
- Added `test_updates` to ensure all weights update synchronously, and that `eta_t` first applies to weights as-is and then updates according to `t_cur` (a rough sketch of such a check follows this list)
- Fixes #47
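
A rough idea of what a synchronous-update check can look like; this is a hypothetical helper, not the package's `test_updates` itself:

```python
import numpy as np
from tensorflow.keras import backend as K

def assert_all_weights_updated(model, x, y):
    """Train one batch and verify every trainable weight changed,
    i.e. no weight lags the others by an update."""
    before = [K.get_value(w).copy() for w in model.trainable_weights]
    model.train_on_batch(x, y)
    after = [K.get_value(w) for w in model.trainable_weights]

    changed = [not np.allclose(b, a) for b, a in zip(before, after)]
    stale = [i for i, c in enumerate(changed) if not c]
    assert all(changed), f"stale weights at indices: {stale}"
```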