Commit

sections
jxbz committed Jul 30, 2024
1 parent 9dae810 commit 8cdd80b
Showing 1 changed file with 13 additions and 1 deletion.
14 changes: 13 additions & 1 deletion docs/source/faq.rst
@@ -3,6 +3,9 @@ Frequently asked questions

Feel free to reach out or start a `GitHub issue <https://github.com/jxbz/modula/issues>`_ if you have any questions about Modula. We'll post answers to any useful or common questions on this page.

Conceptual questions
^^^^^^^^^^^^^^^^^^^^^

.. dropdown:: The gradient is a vector: how can a vector have a spectral norm?
:icon: question

@@ -114,6 +117,9 @@ Feel free to reach out or start a `GitHub issue <https://github.com/jxbz/modula/
2. however, the conditions are not unique, and in specific cases you can modify the rules---so long as you know what you're doing;
3. you may want to take advantage of scale symmetries if you are interested in designing low-precision training algorithms.

Related work
^^^^^^^^^^^^^

.. dropdown:: What is the relationship between Modula and spectral-μP?
:icon: question

@@ -141,6 +147,9 @@ Feel free to reach out or start a `GitHub issue <https://github.com/jxbz/modula/

I (Jeremy) still think an analogue of AGD that is also fast and performant might still be possible. It might involve combining Modula with ideas from people like Konstantin Mishchenko and Aaron Defazio such as `Prodigy <https://arxiv.org/abs/2306.06101>`_ or `schedule-free optimizer <https://arxiv.org/abs/2405.15682>`_. I think this is a great direction for future work.

Modula package
^^^^^^^^^^^^^^^

.. dropdown:: The modular norm involves a max---why do I not see any maxes in the package?
:icon: question

@@ -167,7 +176,10 @@ Feel free to reach out or start a `GitHub issue <https://github.com/jxbz/modula/

Not yet, although we plan to implement this and provide some examples.

Research philosophy
^^^^^^^^^^^^^^^^^^^^

.. dropdown:: Do I need to be a mathematical savant to contribute to research of this kind?
:icon: question

I don't think so. There are a lot of very technical people working in this field, bringing with them some quite advanced tools from math and theoretical physics, and this is great. But in my experience it's usually the simpler and more elementary ideas that actually work in practice. I strongly believe that deep learning theory is still at the stage of model building. And I resonate with both Rahimi and Recht's call for `"simple theorems" and "simple experiments" <https://archives.argmin.net/2017/12/11/alchemy-addendum/>`_ and George Dahl's call for `a healthy dose of skepticism <https://www.youtube.com/watch?v=huTx3rtv8q8>`_ when evaluating claims in the literature.
I don't think so. There are a lot of very technical people working in this field bringing with them some quite advanced tools from math and theoretical physics, and this is great. But in my experience it's usually the simpler and more elementary ideas that actually work in practice. I strongly believe that deep learning theory is still at the stage of model building. And I resonate with both Rahimi and Recht's call for `"simple theorems" and "simple experiments" <https://archives.argmin.net/2017/12/11/alchemy-addendum/>`_ and George Dahl's call for `a healthy dose of skepticism <https://www.youtube.com/watch?v=huTx3rtv8q8>`_ when evaluating claims in the literature.
