Quick Question on adam.py #2

btahir · 2019-04-19T02:53:50Z

Awesome work! The adam.py implementation is very useful! I just had a quick question about:

alpha_t = tf.sqrt(1 - beta2_power) / (1 - beta1_power)

What does this achieve? Wouldn't the original Adam update be:

var - lr * m_t / (tf.sqrt(v_t) + eps)

rather than

var - lr * (m_t*alpha_t) / (tf.sqrt(v_t) + eps)

Are you adding weight decay with alpha_t maybe?

angetato · 2019-07-13T18:56:22Z

Hello @btahir,
I know my answer is late, but for those with the same question, here is the answer:
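In the original paper (see Algorithm 1), the update uses the bias-corrected estimates

m_hat_t = m_t / (1 - beta1_power)
v_hat_t = v_t / (1 - beta2_power)

instead of m_t and v_t. Thus:

var - lr * m_hat_t / (tf.sqrt(v_hat_t) + eps)

which, up to a rescaling of eps, equals

var - lr * alpha_t * m_t / (tf.sqrt(v_t) + eps)

with alpha_t = tf.sqrt(1 - beta2_power) / (1 - beta1_power). So alpha_t is the bias correction, not weight decay.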
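For anyone who wants to check this numerically, here is a minimal sketch in plain Python (no TensorFlow). It compares the step computed from the bias-corrected moments, as in Algorithm 1, with the step computed by folding the correction into alpha_t. The function names, the gradient value, and the default hyperparameters are assumptions chosen for illustration and are not taken from adam.py.

```python
import math


def adam_step_paper(m_t, v_t, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Step size when m_t and v_t are bias-corrected first (Algorithm 1 style)."""
    m_hat = m_t / (1 - beta1 ** t)
    v_hat = v_t / (1 - beta2 ** t)
    return lr * m_hat / (math.sqrt(v_hat) + eps)


def adam_step_alpha(m_t, v_t, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Step size with the bias correction folded into a single factor alpha_t."""
    alpha_t = math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    return lr * alpha_t * m_t / (math.sqrt(v_t) + eps)


# First step (t = 1) with a hypothetical gradient g, moments starting at zero.
g = 0.5
m_1 = (1 - 0.9) * g        # first-moment estimate after one update
v_1 = (1 - 0.999) * g * g  # second-moment estimate after one update

paper = adam_step_paper(m_1, v_1, t=1)
folded = adam_step_alpha(m_1, v_1, t=1)
# Step without any bias correction, as in the formula from the question.
uncorrected = 0.001 * m_1 / (math.sqrt(v_1) + 1e-8)

print(paper, folded, uncorrected)
```

The first two values agree up to the small effect of eps, which is rescaled in the folded form. The uncorrected step is noticeably larger at t = 1 because the moments are initialised at zero and v_t is more strongly biased toward zero than m_t.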

Thanks.

btahir changed the title ~~Quick QUestion on Adam.py~~ Quick Question on adam.py Apr 19, 2019
