
Synchronize updates; fix AdamW lr_t (keras)

@OverLordGoldDragon OverLordGoldDragon released this 04 Jun 02:20
· 9 commits to master since this release

BUGFIXES:

  • The last weight in the network was updated with t_cur one step ahead, desynchronizing it from all other weights
  • AdamW in keras (optimizers.py, optimizers_225.py) weight updates were not scaled by eta_t, so cosine annealing had no effect; see the sketch after this list
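
A minimal sketch of the cosine-annealing multiplier the eta_t fix restores. The formula follows SGDR (Loshchilov & Hutter); eta_min, eta_max, total_iterations, and adam_step are illustrative assumptions, not the library's exact internals:

```python
import numpy as np

def eta_t(t_cur, total_iterations, eta_min=0.0, eta_max=1.0):
    """Cosine-annealing multiplier (SGDR): decays from eta_max to eta_min
    as t_cur runs from 0 to total_iterations."""
    return eta_min + 0.5 * (eta_max - eta_min) * (
        1 + np.cos(np.pi * t_cur / total_iterations))

# The bug: weight updates skipped this multiplier entirely. With the fix,
# the effective step is eta_t * (raw Adam step); adam_step is a placeholder.
lr, adam_step = 1e-3, 1.0
for t_cur in (0, 12, 24):
    e = eta_t(t_cur, total_iterations=24)
    print(f"t_cur={t_cur:2d}  eta_t={e:.3f}  effective lr={e * lr * adam_step:.2e}")
```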

FEATURES:

  • Added lr_t to tf.keras optimizers to track the "actual" learning rate externally; use K.eval(model.optimizer.lr_t) to get the "actual" learning rate for a given t_cur and iterations (usage sketch below)
  • Added an lr_t vs. iterations plot to the README, with source code in example.py
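
A hedged usage sketch for reading lr_t after training steps. The keras_adamw import path and the constructor arguments (lr, total_iterations) are assumptions drawn from the repo's README; see example.py for the authoritative version:

```python
import numpy as np
import tensorflow.keras.backend as K
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from keras_adamw import AdamW  # assumed import path

# Toy model; total_iterations per the README's assumed usage
model = Sequential([Dense(4, input_shape=(8,))])
model.compile(AdamW(lr=1e-3, total_iterations=24), loss='mse')

x, y = np.random.randn(32, 8), np.random.randn(32, 4)
for epoch in range(3):
    model.fit(x, y, epochs=1, batch_size=8, verbose=0)
    # lr_t reflects the "actual" learning rate at the current t_cur and iterations
    print(epoch, K.eval(model.optimizer.lr_t))
```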

MISC:

  • Added test_updates to ensure all weights update synchronously, and that eta_t first applies to weights as-is and then updates according to t_cur; a rough sketch of this check follows the list
  • Fixes #47
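
For reference, a rough sketch of the kind of synchrony check test_updates performs; assert_synchronous_update is a hypothetical helper written for illustration, not the test's actual code:

```python
import numpy as np

def assert_synchronous_update(model, x, y):
    """Hypothetical check: after one training step, every trainable weight
    should have changed -- none lagging one t_cur step behind the rest."""
    before = [w.copy() for w in model.get_weights()]
    model.train_on_batch(x, y)
    after = model.get_weights()
    for i, (b, a) in enumerate(zip(before, after)):
        assert not np.allclose(b, a), f"weight {i} did not update this step"
```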