Speed up loading of REMARKs using JupyterHub or alternative #5
Comments
@llorracc, @mnwhite - I'm going to check out Google Colab sometime in the next week or so, using one or two of the existing notebooks as tests. Are they all about the same in terms of computational power needed? If not, can you suggest a notebook that's on the computation-heavy end, but not an outlier? |
Shauna,
The BufferStockTheory.ipynb REMARK notebook is a good test case. The
objective here, for workshops etc, is a bit different from the other case
where we wanted computational power. What I'm hoping to find for ordinary
workshops and notebooks used therein is a way to speed up the initial
launching of the notebook. Especially in cases where the notebook has a
lot of setup stuff. The BufferStockTheory.ipynb notebook checks to see
whether "latex" has been installed on the VM and if so changes some
configuration stuff. Unfortunately, it can take 2-4 minutes for the remote
VM to FINALLY "go live" because of all of the prep software installation
that has to be done. BufferStockTheory provides a good example of a case
where latex is preferred (and used if the tools necessary are available).
--
- Chris Carroll
|
I'm liking Google CoLab so far. Loading a notebook stored in github was as simple as using the customized url, which takes the format: colab.research.google.com/github/$our_organization_name/$repository_name/blob/master/$relative_path_to_notebook.ipynb It loaded fairly fast, about 10 seconds by my count, and it looks like the Latex is all there. The hosted runtime options seem fairly limited - only Python 2.7 and 3.6 are options, and they come with certain libraries pre-installed, which means we don't have as much flexibility in choosing the environment we want the notebooks to run in. I think that means we're stuck with a little cell at the top of all our notebooks that looks something like:
But the notebooks currently have set-up cells anyway, so. Anyway, here's a list of issues I encountered and their solutions:
ImportError: Cannot load backend 'TkAgg' which requires the 'tk' interactive framework, as 'headless' is currently running This is due to trying to load
As expected I then got told that HARK didn't exist, and added
In cell 25 the latex on lines 22-26 (I think; annoyingly, it doesn't give line numbers) is causing an error. I deleted the four lines and it ran fine but without the necessary output. I don't understand what's going on well enough to fix it. Maybe the Latex didn't load as well as I thought? Same/similar issue with the last code cell. Anyway, that was surprisingly straightforward, but I'm also not familiar enough with the notebook to know if there are content errors or missing pieces. Chris & Matt, you should definitely take a look! |
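The URL pattern described in the comment above can be sketched as a small helper. This is only an illustration: the function name `colab_url` and the `branch` parameter are assumptions introduced here, and the example arguments are taken from the fork link that appears later in this thread.

```python
def colab_url(org: str, repo: str, notebook_path: str, branch: str = "master") -> str:
    """Build a Colab URL for a notebook stored in a GitHub repository.

    Follows the pattern quoted in the thread:
    colab.research.google.com/github/<org>/<repo>/blob/<branch>/<path>.ipynb
    """
    # Strip a leading slash so the path joins cleanly with the prefix.
    path = notebook_path.lstrip("/")
    return f"https://colab.research.google.com/github/{org}/{repo}/blob/{branch}/{path}"


if __name__ == "__main__":
    print(colab_url("shaunagm", "REMARK",
                    "REMARKs/BufferStockTheory/BufferStockTheory.ipynb"))
```

A helper like this could be used to generate the website links if they are switched from mybinder to Colab.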
Great summary @shaunagm! It's exactly where QuantEcon is also at - including a cell with pip install commands to get the requirements. I think it's a good solution: the requirements are transparent, and it means the notebooks can run standalone. |
If @shauna Gordon-McKeon and @andrij Stachurski
and the QuantEcon team are all converging on Google
colab, I feel that it must be right!
What is the next step we need to take to start using CoLab?
--
- Chris Carroll
|
We chatted about this during our meeting, but I'm going to test out how fast Colab is when loading Latex (Chris, do you know what package that is?) and then, if that works, the next steps are a) making sure existing notebooks all work in colab; b) changing how we refer to the notebooks on the website to point to colab instead of mybinder and c) re-organizing the remarks repo to remove the mybinder stuff since it will no longer be necessary. |
Update: the issue with overline appears to be a minor syntax error. There's a cell in the original notebook that uses the syntax overline c instead of overline{c}. Once I fixed that, and replaced underline with underbar, I ceased to get errors, although I can't verify that the output is what's desired beyond saying "yup sure does include some underlines and overlines". I'm finding the underline issue deeply confusing, because isn't underline in basic latex? Why would we need to import anything? I tried adding !pip install jupyterlab_latex and it doesn't change initial load time at all, since it's not executed until the cell is run. When you do run the cell, it adds another 3-4 seconds, which is not great but not terrible. Importing jupyterlab_latex did not solve the underline issue though. |
Does basic LaTeX have underline in math mode? That might be the issue.
|
I've got no idea - I've barely ever used LaTeX, and I'm finding the documentation hard to parse. Unfortunately Colaboratory's documentation is not great either (documentation & user support has always been a weak point for Google) so I'm not sure what's even running in the notebook. How do you know if you're in math mode? How can I check what LaTeX extensions would have underline implemented? Anyway, here's a version of the notebook with the fixes described above: https://colab.research.google.com/github/shaunagm/REMARK/blob/master/REMARKs/BufferStockTheory/BufferStockTheory.ipynb#scrollTo=cB71h4tn1dC0 |
Math mode is anything between dollar signs like $y = 5x - 3$ or in
environments like {equation} or {eqnarray}.
Underline in math mode (if not standard) should be in the package amsmath.
It's one of the packages I include in my document template (along with
amsfonts), so I've lost track of what's in it.
-- mnw
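As a small illustration of the math-mode description above, here is a minimal LaTeX document sketch. It loads amsmath only because the thread's authors say they always do; whether the package is actually required for these commands is not confirmed here. The symbol c is just a placeholder.

```latex
\documentclass{article}
\usepackage{amsmath} % assumption: mirrors the authors' usual template
\begin{document}
This sentence is in text mode. Inline math mode sits between dollar signs:
$y = 5x - 3$.

% Display math mode inside an environment, using braces around the argument.
\begin{equation}
  \underline{c} \le c \le \overline{c}
\end{equation}
\end{document}
```

Writing the argument in braces, as in `\overline{c}`, matches the fix described earlier in the thread.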
|
Okay, I think I was misunderstanding what math mode even is. Yeah, that may be the issue. If swapping underline for underbar doesn't work for aesthetic reasons, I can explore the issue further, but for now I'll hold off. |
Actually, underbar seems to do something completely different from
underline: It just prints an underbar character as regular text (rather
than underneath the targeted thing). Like Matt, I always import the
amsmath environments and so don't really even have a good idea what is in
them, but I think underline is, so if we want these figures to work I guess
we need latex. It does seem like there should be some way to force install
it at the beginning of execution -- people are used to there being a little
delay while things start up but will be more discombobulated by a 5 second
delay when a simple plot is requested.
Shauna, I don't know how familiar you are with matplotlib. If you compare
the output obtained by mybinder and that by colab, you will see that the
colab figures are missing important stuff -- like the axes! If you are
highly familiar with matplotlib and know exactly how to fix this, then
please do so. Otherwise, I will ask my student to try to construct the
figures in such a way that they look nice in Jupyter or CoLab or ipython or
whatever.
One other thing: The notebooks rely heavily on some Jupyter nbextensions,
in particular the "codefolding" extension -- again, compare the mybinder
version with the CoLab version to see how useful codefolding is (in hiding
the code until the person wants to expose it). I'm guessing that getting
codefolding working on CoLab just requires putting more of the config stuff
in the /binder directory (either requirements.txt or maybe postBuild) into
the beginning of the Jupyter notebook.
PS. It's unfortunate that it seems that a notebook written for CoLab won't
work with mybinder and vice versa, because the former requires the !pip
install stuff to be at the beginning of the notebook (which is a better
place for it) and mybinder requires it to be in a special folder. The
CoLab approach is better, but we've already configured a bunch of things to
work with mybinder, and obviously one would prefer to be able to choose on
the fly whether to view a given notebook in mybinder, CoLab, or some other
tool (google "Six easy ways to run your jupyter notebook in the cloud" from
"Data School" for an overview of the options, which seem to be growing by
the minute).
--
- Chris Carroll
|
I'm not super familiar with matplotlib. I'm sure I could fix the display issues given enough time but it may be more efficient to have a student work through it. I'm going to try to generate a list of libraries and extensions we need. I suppose I could just use everything in the binder requirements.txt file but I think that will include some extra stuff. But your "jupyter_contrib_nbextensions" is in there, so it's a good start. re: configuring for both mybinder and colab - we should be able to do that; the question is whether we're okay with the potential added complexity. For instance, we could have a line in the notebook which checks whether something's already installed and only installs it if it isn't. |
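The "check whether something's already installed and only install it if it isn't" idea could look roughly like the sketch below. The function names and the `dry_run` flag are hypothetical, and the example mapping from the pip name `econ-ark` to the import name `HARK` is an assumption, not something stated in this thread. Only top-level module names should be passed, since `importlib.util.find_spec` can raise for dotted names whose parent package is missing.

```python
import importlib.util
import subprocess
import sys


def missing_packages(requirements):
    """Return the pip names whose import name cannot be found.

    `requirements` maps pip package name -> top-level import name.
    """
    return [pip_name for pip_name, module in requirements.items()
            if importlib.util.find_spec(module) is None]


def install_missing(requirements, dry_run=False):
    """Install only the packages that are not importable yet.

    Returns the list of pip names that were (or would be) installed.
    """
    todo = missing_packages(requirements)
    if todo and not dry_run:
        # Use the running interpreter's pip so the install lands in this environment.
        subprocess.check_call([sys.executable, "-m", "pip", "install", *todo])
    return todo


if __name__ == "__main__":
    # Assumed mapping for illustration only; dry_run avoids touching the environment.
    print(install_missing({"econ-ark": "HARK"}, dry_run=True))
```

On mybinder, where the environment is already built from the binder configuration, such a cell would find nothing missing and return quickly; on Colab it would install whatever the hosted runtime lacks.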
From Chris's email, it seems one of the biggest barriers with regard to CoLab is installing latex. I'm going to take a look and see if there's a way to install a much smaller subset of the library. |
Update from the weekly meeting: Chris has tried the approach from this StackOverflow answer in this notebook (on colab) but it's not currently working - I'm going to try to debug. There also appears to be an issue where notebooks aren't running if you aren't logged in to a google account, which I need to look into. |
There is now a revision of BufferStockTheory.ipynb that works on MyBinder, CoLab, and on my Mac using the local jupyter notebook server that comes with Anaconda. The notebook requires LaTeX tools from the American Mathematical Society's amsmath package in order for matplotlib to be able to render all the figures, and the solution to this problem is painful: The first (code) cell installs all needed dependencies from scratch, which can be very slow if LaTeX is not installed (e.g., for either MyBinder or CoLab's default environments). The notebook starts by testing whether LaTeX is installed on the machine. If not, it tests whether the machine is Ubuntu or not. MyBinder and CoLab both have a default of Ubuntu; so if you're in Ubuntu, it installs the full version of LaTeX:
in CoLab this seems to take about 2-3 minutes, which is painful but not intolerable. MyBinder can take 10 minutes or more, if it works at all (it seems to fail altogether about half the time). This kind of defeats the purpose of "live" notebooks. (Even when myBinder says it "found built image" it might say "Launching server ... Launch attempt 1 failed, retrying ...") It appears that MyBinder allows you to use prespecified Docker images instead of their default setup, but it is not clear to me whether that would be any faster. And at this point it looks like CoLab doesn't let you use your own docker images? If there is not now a way to "pre-cook" or "pre-cache" a VM image to speed up loading, I'm guessing that's not an accident. At some point MyBinder needs a revenue model, and I'd totally be willing to pay something to reduce the loading time from 5 minutes to 30 seconds. I just wish they'd roll out their pricing scheme and let me pay them for this! PS. The installations are not necessary (and therefore a waste of time) if the libraries are already available. But the overriding goal was to have a single notebook that works everywhere, and so the installation stuff all has to be in that first cell since CoLab does not have a mechanism like MyBinder for prespecifying requirements. |
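The detection logic described in the comment above (is LaTeX present; if not, is this Ubuntu; if so, install) might be sketched as follows. This is not the notebook's actual first cell: the function names are hypothetical, the `/etc/os-release` check is one best-effort way to detect Ubuntu, and the default package `texlive-full` is an assumed placeholder for "the full version of LaTeX". The sketch only builds the command and does not run it.

```python
import shutil


def latex_available() -> bool:
    """True if a `latex` executable is on the PATH."""
    return shutil.which("latex") is not None


def is_ubuntu() -> bool:
    """Best-effort Ubuntu check based on the ID= line of /etc/os-release."""
    try:
        with open("/etc/os-release") as f:
            return any(line.strip().lower().startswith("id=") and "ubuntu" in line.lower()
                       for line in f)
    except OSError:
        # File missing or unreadable (e.g. macOS, Windows): not Ubuntu as far as we can tell.
        return False


def latex_install_command(have_latex, on_ubuntu, packages=("texlive-full",)):
    """Return an apt-get command list, or None if nothing should be installed.

    `packages` is an assumption for illustration; a smaller subset could be passed instead.
    """
    if have_latex or not on_ubuntu:
        return None
    return ["apt-get", "install", "-y", *packages]


if __name__ == "__main__":
    print(latex_install_command(latex_available(), is_ubuntu()))
```

Keeping the decision in a pure function makes it easy to try a smaller package list and compare install times without editing the detection code.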
@llorracc can you add a link to the "revision of BufferStockTheory.ipynb that works on MyBinder, CoLab, and on my Mac using the local jupyter notebook server that comes with Anaconda"? |
There WAS a link -- it just didn't work! That's what I get for trying to construct the link myself. I've now fixed it -- just click on BufferStockTheory.ipynb above |
I've heard from a couple folks here at PyCon, including @MridulS (one of our sprinters), that many projects use MathJax to load subsets of Latex fast. Here's some info on configuration options. Still need to research this |
Hmmm, I had thought of MathJax as more of a rendering engine (it draws the characters on your bitmap) than a tool that can read packages and interpret things like
@llorracc, what do you need done by the 24th? Specifically, which notebooks do you need to be ready, and what are the ideal, and maximum acceptable, times for them to load? I know you said you wanted Bufferstock Theory ready - any others? |
The best test-case is the BufferStockTheory remark, partly because it has a
boolean that determines whether to use the "full version" of LaTeX (which
is the huge 2.3 GB thing) or the built-in slimmed down version. It would
be easy to compare the two versions and use that to test the degree of
speedup.
The degree of speedup needed is actually a more complicated question than
it might seem, because of the different ways that the different tools
work. For MyBinder, the whole virtual machine is built according to specs
in the /binder directory before anything is displayed. For Google CoLab, you
have to include the packages you need in a "pip install" command in the
first cell, so the notebook displays immediately but can't be used until it
finishes building.
Let's wait until I get back to the US early next week to focus on this; it
shouldn't take too long with both of us focused on it. Also, Andrij has
been tasked with essentially the same mission by QuantEcon and so I want to
piggyback on what they do. That changes my view that we should make a "big
push" to come up with a long-term solution; instead, our objective should
be to have a quick-and-dirty solution before Jun 24, and then work with
Andrij on the longer term solution.
--
- Chris Carroll
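A boolean switch of the kind described above could be reduced to a function that returns matplotlib settings. This is a sketch under assumptions: the function name and parameters are hypothetical, and it is not the notebook's actual code. `text.usetex` and `text.latex.preamble` are real matplotlib rcParams keys; recent matplotlib versions expect the preamble as a single string, while older ones accepted a list of strings.

```python
import shutil


def latex_rc_settings(use_full_latex, have_latex=None):
    """Return rcParams-style settings for matplotlib.

    use_full_latex: the user's preference (the boolean discussed in the thread).
    have_latex: override for testing; by default, look for `latex` on the PATH.
    """
    if have_latex is None:
        have_latex = shutil.which("latex") is not None
    if use_full_latex and have_latex:
        # Render text through an external LaTeX installation with amsmath loaded.
        return {"text.usetex": True,
                "text.latex.preamble": r"\usepackage{amsmath}"}
    # Fall back to matplotlib's built-in math rendering.
    return {"text.usetex": False}


if __name__ == "__main__":
    print(latex_rc_settings(use_full_latex=True))
```

The returned dictionary could be passed to `matplotlib.rcParams.update(...)` before plotting; timing the notebook's setup with the boolean on and off would then give the comparison suggested above.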
|
I've just cc'd you on a message to a startup that seems to have been
created to solve several of the exact problems we have been struggling with
(including this one). Will keep you in the loop if I get a response.
Maybe our "trial" will be sufficient for my needs at present.
--
- Chris Carroll
|
Update: Google Colab is looking very promising. Next steps:
If we decide on Colab, we will then:
Currently, mybinder takes several minutes to load our remarks and other notebooks. We'd like anyone loading a notebook from the econARK site to be able to access it quickly, definitely in under 30 seconds and ideally faster. Not sure whether we need special hosting, caching, etc to make this happen. Relatedly, our current setup causes issues with dependency management (see issue #12).
Use cases:
The above use cases have different limitations. A lot of platforms I've investigated have access control such that it would be hard to provide this service to anonymous internet people but easy enough for anyone we approve of and who can spare a few minutes of initial setup. It's possible we could set up a two tiered system where community members (and the students/workshop attendees of community members) can launch notebooks quickly and internet strangers continue to use the mybinder system. (Although this doesn't address the dependency management aspect, just the performance aspect.)
Potential approaches:
People to consult:
Hosting Platforms & Notes