(Historical) Discussion of theme refactoring #4

Closed
mattstratton opened this issue May 30, 2020 · 36 comments

@mattstratton
Member

mattstratton commented May 30, 2020

NOTE: This issue was originally opened on devopsdays/devopsdays-web and is shared here for historical/reference purposes. None of these are decisions or even what we should be doing. Just reference!


I might change the title of this issue at some point. What I am looking to do here is get some work in place for a major refactor of the devopsdays-theme theme. I think this is a reasonable time to start considering it because 2020 will have drastically fewer events than usual, so it's relatively "safe" to make breaking changes at this time.

A few things that come to mind:

Treat everything like a Page

Much of the logic/structure in this theme is heavily dependent upon data files and a lot of inefficient queries. Modern Hugo treats every element as a Page object, which drives more of the information into frontmatter, whereas we currently get it with a lot of looping queries based upon file locations. I suspect we could get more efficiency by pushing things down to the Page level.

Page Bundles

If we refactor how content is organized, we can leverage Page Bundles, which would allow us to group resources associated with an event in one place (and which gives us the ability to do some clever things with processing the files in the Hugo part itself, rather than trying to do things in build pipelines, etc). Here is a helpful post with some pros/cons to bundles vs static folder.

File structure

I have made the argument before that the filesystem for a lot of this is unwieldy. Of course, changes to the paths for events would need to be handled with redirects or aliases so that old URLs don't break. But for example, I see something like this:

content/
└── events
    └── 2020
        ├── ponyville
        │   ├── index.md
        │   ├── logo.png
        │   ├── speakers.md
        │   └── speaker-images
        │       └── twilight-sparkle.jpg
        └── hoofington
            ├── index.md
            └── speakers.md

So what you have there is a generated path of devopsdays.org/events/2020/ponyville, and we would want to make sure redirects are created for backward compatibility so that devopsdays.org/events/2020-ponyville still works.
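
As a rough illustration of that backward-compatibility mapping, the Python sketch below derives the old-style path from the new directory layout. The function name `legacy_alias` is hypothetical, and the sketch assumes every event lives at /events/YYYY/CITY. The result could then be listed under the `aliases` frontmatter key that Hugo supports, or fed to whatever redirect mechanism we end up using.

```python
from pathlib import PurePosixPath


def legacy_alias(new_path: str) -> str:
    """Map /events/2020/ponyville to the old /events/2020-ponyville form."""
    parts = PurePosixPath(new_path).parts  # ('/', 'events', '2020', 'ponyville')
    if len(parts) != 4 or parts[1] != "events" or not parts[2].isdigit():
        raise ValueError(f"not an event path: {new_path}")
    _, section, year, city = parts
    return f"/{section}/{year}-{city}"


print(legacy_alias("/events/2020/ponyville"))
```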

I would suggest a similar refactor of the data directory as well, which makes it easier to navigate. I am not sure what we would have to do to refactor this.

Migrate from data files to Page frontmatter

This is a big one, and it might not work out super well. Much of what is in the data/events/YYYY-CITY.yml files could be moved to frontmatter for the Pages; the challenge I see is there is data that is used on multiple pages (although TBH there isn't that much). Data fields that are used by more than one page are:

city - this is the "friendly" name of the city. It's used in a lot of places, I think.
year and name - these are used in a lot of places, but to construct queries. They probably aren't needed anymore, and if they are needed, can be obtained via the file path.
ga_tracking_id - the Google Analytics UA ID. This is used (when set) on every page for the event.
cancel - while not really used by pages themselves, used for queries. TBH this probably is handled in frontmatter of the index.md page for the event and queries will hit that, the same way they use the dates, etc.
startdate, etc - there are multiple date fields which are used on event pages, but also in "top level" page queries (open CFP, etc). The start/end date fields could be in index.md and the cfp related ones in propose.md, etc. I think we could probably figure out how to query against that (pseudocode - "select all pages of type propose where cfp end date is greater than now", etc)
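
The "open CFP" pseudocode query in the last item can be sketched in Python as below. This is only an illustration of the filtering logic, not Hugo code: the dicts stand in for Pages, and the `cfp_end` field name is a hypothetical placeholder for whichever CFP date field ends up in propose.md.

```python
from datetime import datetime, timezone

# Hypothetical stand-ins for Hugo Pages; each dict mimics a page's frontmatter.
pages = [
    {"path": "events/2020/ponyville/propose.md", "type": "propose",
     "cfp_end": datetime(2020, 7, 1, tzinfo=timezone.utc)},
    {"path": "events/2020/hoofington/propose.md", "type": "propose",
     "cfp_end": datetime(2020, 5, 1, tzinfo=timezone.utc)},
    {"path": "events/2020/ponyville/index.md", "type": "event"},
]


def open_cfps(pages, now):
    """Select pages of type 'propose' whose CFP end date is later than now."""
    matches = (p for p in pages if p["type"] == "propose" and p["cfp_end"] > now)
    return sorted(matches, key=lambda p: p["cfp_end"])


for page in open_cfps(pages, datetime(2020, 6, 1, tzinfo=timezone.utc)):
    print(page["path"])
```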

Navigation is currently handled in the data file; instead, it could simply be built by iterating through the pages in the event's directory. If someone wanted a link that wasn't to a Page, we could provide docs on how to do that: you would create a page for it, with no content, perhaps of the type "redirect", with the actual URL in its frontmatter. That would do two things. First, if someone hit that path somehow, it would redirect there. Second, the menu-building logic could be "range through all the pages; if a page is of type 'redirect', link to its URL, and otherwise pull the name or something to make the menu".

Sponsor stuff is currently all in the data file; this one would be trickier. I don't have any ideas for this yet.

The other data elements are all very page-specific and would be able to be moved to frontmatter.

Change Program template to use Pages

I believe that the data file concept for the Program makes it a lot more unwieldy. The program could be dynamically generated based on frontmatter elements (which the current script actually creates, but we don't use), i.e., the program element pages all have a date, start, and end time, and they get collected into one place, sorted, displayed, etc. This makes it a lot easier to create some custom program elements (you just create a Page for them).
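
To make the "collected into one place, sorted, displayed" idea concrete, here is a small Python sketch. It is not how the theme works today; the dicts stand in for program element Pages, and the `start` and `end` keys are assumed frontmatter field names.

```python
from datetime import datetime

# Hypothetical program element pages, deliberately out of order.
program_pages = [
    {"title": "Morning Break", "type": "custom",
     "start": "2020-07-05T08:30", "end": "2020-07-05T09:00"},
    {"title": "twilight-sparkle", "type": "talk",
     "start": "2020-07-05T08:00", "end": "2020-07-05T08:30"},
    {"title": "Closing", "type": "custom",
     "start": "2020-07-06T16:30", "end": "2020-07-06T17:00"},
]


def build_program(pages):
    """Group program elements by day, each day ordered by start time."""
    days = {}
    for page in sorted(pages, key=lambda p: datetime.fromisoformat(p["start"])):
        day = datetime.fromisoformat(page["start"]).date().isoformat()
        days.setdefault(day, []).append(page["title"])
    return days


print(build_program(program_pages))
```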

Downside of that is things that have to go on the program that you repeat a lot, but don't want to create a whole Page element just for them (like breaks).

Maybe the Program gets actually done via shortcodes in the markdown; you have a list of the things for the day, and the shortcode takes arguments to turn it into what needs to be done. So, the program.md would look like this:

{{< program_element type="talk" name="twilight-sparkle" start_time="08:00" end_time="08:30">}}
{{< program_element type="custom" name="Morning Break" start_time="08:30" end_time="09:00">}}

Upon reflection, I can't see how that actually works, because of HTML. But I'm keeping the reference so that we don't think it's a good idea in the future.

Another possibility with the "get all the program elements from frontmatter of the pages in the program directory" is that the pages could support more than one start/end time (for the custom stuff like breaks).

It's not a trivial problem, that's for sure.

A simpler mechanism might be to create a script that will generate the program for you, based upon prompts, and it outputs it as HTML to the program.md page. The good part of this idea is that it's VERY easy to customize it later by hacking the HTML (if you wanted to do fancy things/stuff not supported by the default template, like having multiple blocks of ignites, etc).
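
A minimal sketch of what such a generator might emit is shown below, in Python. The interactive prompts are left out; the rows are passed in directly, and the table markup and the `program` class name are assumptions rather than the theme's actual HTML.

```python
from html import escape


def render_program(rows):
    """Render (start, end, title) tuples as a plain HTML table for program.md."""
    lines = ['<table class="program">']
    for start, end, title in rows:
        lines.append(
            f"  <tr><td>{escape(start)} - {escape(end)}</td>"
            f"<td>{escape(title)}</td></tr>"
        )
    lines.append("</table>")
    return "\n".join(lines)


print(render_program([("08:00", "08:30", "twilight-sparkle"),
                      ("08:30", "09:00", "Morning Break")]))
```

Because the output is ordinary HTML, an organizer could hand-edit it afterwards, which is the customization benefit described above.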

Downside is that the script has to be where we make changes if we want to improve the styling, etc, of the program.

The other thing we could do is keep all the program layout in the markdown file, and just have the generated program be a "sample" and people have to do it manually. That's not great either. I think that the happy medium is to have a templated markdown file that people can use as inspiration, but also provide a script to make one based upon prompts.

@mattstratton added the "enhancement" (New feature or request) label May 30, 2020
@mattstratton
Member Author

I have created the project Theme Refactor to track issues related to this.

@mattstratton changed the title from "Refactor theme" to "Discussion of theme refactoring" May 30, 2020
@bridgetkromhout

> a lot of looping queries based upon file locations

I do not have the bandwidth to dive deep into the logistics of fixing this, but I agree that you're pointing out one of the reasons the site build is both slow and brittle. Fixing this is worth doing; I support this effort.

@mattstratton
Member Author

I've done a little bit of work that I want to capture; I suspect that using Bundles might not work out the way we are hoping (at least from the perspective of Page Resources, just because of how they work).

What does this mean? In my head, I was thinking that what would happen is that, for example, images related to speakers, organizers, etc, would then live in the content directory for the event, rather than static (they can be accessed programmatically which is powerful). But due to how Bundles work, this will end up making the file structure of the content files really complicated, and to be honest, I suspect that at our scale, having to parse those images as Resources during the build would suck anyway.

People are used to the static folder; I think we can refactor the file structure of it to match the proposed content folder changes anyway, so it would look like this:

static/
└── events
    └── 2020
        ├── ponyville
        │   ├── logo.png
        │   └── speakers
        │       └── twilight-sparkle.jpg
        └── hoofington
            ├── logo.png
            └── other-image.jpg

I'm working on this in devopsdays/devopsdays-web#9794; I'll focus more on the page content/frontmatter approach for that spike, rather than trying to get the images as part of it.

@mattstratton
Member Author

Putting this here so I don't forget - I think it will actually work to put almost everything that is currently in YYYY-CITY.yml into the frontmatter of events/YYYY/CITY/welcome.md and read it from other templates using GetPage.

Something like this, in pseudocode:

{{/* Custom frontmatter fields are read through .Params */}}
{{ with .Site.GetPage "/events/2020/ponyville/welcome.md" }}
  {{ .Params.startdate }}
{{ end }}

This presumes we are not using Page Bundles, or at least not Branch Bundles...if we do Leaf Bundles I think this will work.

@mattstratton
Member Author

mattstratton commented Jun 13, 2020

Thinking about navigation...I think that the model is going to be that the event-level nav is built by querying all the Pages in the content/events/YYYY/CITY/ directory. There are two things to consider:

  1. Pages that you don't want added to the nav
  2. The sort order of the pages in the nav

I think that we can add an optional frontmatter field to Pages (nav = false) which would prevent them from showing in the navigation. That solves the first issue pretty easily.

For sort order, it's harder. We either have to add something like weight = 100 etc to the frontmatter of each Page, or we just move Navigation to frontmatter in welcome.md.
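
The first option (per-page nav and weight fields) could behave roughly like the Python sketch below. The dicts stand in for Pages, and placing unweighted pages last is an assumption of this sketch, not a decision.

```python
def event_nav(pages):
    """Drop pages with nav = false, then order by weight (unweighted pages last)."""
    visible = [p for p in pages if p.get("nav", True)]
    ordered = sorted(
        visible,
        key=lambda p: (p.get("weight") or float("inf"), p["name"]),
    )
    return [p["name"] for p in ordered]


# Hypothetical pages found in content/events/YYYY/CITY/
pages = [
    {"name": "speakers", "weight": 10},
    {"name": "thanks", "nav": False},
    {"name": "conduct"},
    {"name": "program", "weight": 20},
    {"name": "location", "weight": 15},
]
print(event_nav(pages))
```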

For the latter, it would look something like this:

navigation = [
	"speakers",
	"program",
	"location",
	"sponsors",
	"contact",
	"conduct"
    ]

While I like it happening automatically, this actually solves both issues in one place.

If there's a way to make the array more nested, so to speak, that would handle the "navigation needs to be off-site links" use case.

For this to work, the frontmatter would be a little more complex, but not terribly hard:

+++
date = "2016-12-14T21:27:05.454Z"
publishdate = "2016-12-14T21:27:05.454Z"
title = "devopsdays ponyville"
type = "welcome"

[navigation]
  elements = [
	"speakers",
	"propose",
	"location",
	"sponsors",
	"contact",
	"conduct"
    ]
  [navigation.propose]
    url = "https://papercall.com/myevent"
+++

In pseudocode, the navigation would be built like this:

{{/* Assumes this runs in the context of the event's welcome page */}}
{{ $nav := .Params.navigation }}
{{ range $nav.elements }}
  {{/* Default to the on-site page; an override table supplies an off-site URL */}}
  {{ $link := printf "/events/YYYY/CITY/%s" . | absURL }}
  {{ with index $nav . }}
    {{ $link = .url }}
  {{ end }}
  <a href="{{ $link }}">{{ . }}</a>
{{ end }}

@mattstratton
Member Author

Having thought about this some more, I am really wanting to focus on the idea of getting EVERYTHING (event-wise; not sure about sponsor files yet but maybe?) out of data files and into frontmatter. This will make automation a LOT easier (devopsdays-cli can probably be finished!) and while it will be a pretty big change for people used to working the "old" way, it will be a lot easier to work with in this new way.

I think that sponsors, even, should be able to be moved to Pages (possibly a headless bundle?)

@mattstratton
Member Author

To update the nav conversation (in case anyone is reading this)...

Navbar code:

{{- $event_year := (index (split (.Permalink | relURL) "/") 2) -}}
{{- $event_city := (index (split (.Permalink | relURL) "/") 3) -}}
{{- $event_homepage := (printf "/new-events/%s/%s" $event_year $event_city) -}}
{{- $event_data := .Site.GetPage $event_homepage -}}


<nav class="navbar event-navigation navbar-expand-md navbar-light">
  <a href="{{ $event_homepage }}" class="nav-link">{{ $event_city }}</a>
  <button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbar2">
    <span class="navbar-toggler-icon"></span>
</button>
  <div class="navbar-collapse collapse" id="navbar2">
      <ul class="navbar-nav">
        {{ with .Site.GetPage $event_homepage }}
          {{ range .Params.navigation.elements }}
            <li class="nav-item active">
              {{ if isset $event_data.Params.navigation . }}
                <a class="nav-link" href="{{ (index $event_data.Params.navigation .).url }}">{{ . }}</a>
              {{ else }}
                <a class="nav-link" href="{{(printf "%s/%s" $event_homepage . )}}">{{ . }}</a>
              {{ end }}
            </li>
          {{ end }}
        {{ end }}
      </ul>
  </div>
</nav>

How it looks in frontmatter:

+++
date = "2020-06-11T02:11:48-05:00"
description = "devopsdays ponyville is awesome"
title = "devopsdays ponyville 2020"
type = "new-event"
city = "Ponyville"
event_twitter = "devopsdayschi"
startdate = "2020-07-05T08:00:00-06:00"
enddate = "2020-07-06T17:00:00-06:00"
[navigation]
  elements = [
	"speakers",
	"propose",
	"location",
	"sponsor",
	"contact",
	"conduct"
    ]
  [navigation.propose]
    url = "https://papercall.com/myevent"
+++

@mattstratton
Member Author

mattstratton commented Jun 18, 2020

Some interesting stuff in here that might be relevant:

https://forestry.io/blog/data-relationships-in-hugo/

Don’t Repeat Yourself is the perennial mantra of the software developer. It doesn’t mean you should never do the same thing twice, but instead refers to having a single, authoritative source of truth for every piece of information used in your software. Don’t Repeat Yourself is frequently applied to code, where knowledge is susceptible to duplication through the copying and pasting of code blocks where an abstraction should be used instead.

@mattstratton
Member Author

Most of the event-level templates are working. There's one kind of weird thing to figure out...

We have a lot of pages that are spun off to static. In this new world, the data files completely go away. The problem is, for the pages that are spun off to static, we don't have Pages to hit in content where the datafile stuff would have been moved to.

One thing we can do, possibly, is keep data files around for the "archived" pages. There are two places where we need to query information for the older stuff:

  1. For the "other XXX events" on the welcome page for an event
  2. For the "Past" events list on the Events page.

For number 1, I managed to handle this in the new event level page; it basically first loads any past events that exist in the data directory, and then it lists pages it finds in the Content. This is working reasonably well.

For number 2, we could do something similar. For each year, it could first load any cities it finds from the data directory in that year, and then follow it with the events in Content.
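
The "data directory first, then Content" ordering could look something like this Python sketch. The function name and the input shapes (data file names like 2018-chicago.yml, content directories like events/2019/chicago) are assumptions for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def past_events(data_files, content_dirs):
    """Return {year: [cities]}: archived data-file events first, then Content."""
    by_year = defaultdict(list)
    for name in sorted(data_files):
        # "2018-new-york.yml" -> ("2018", "new-york")
        year, city = PurePosixPath(name).stem.split("-", 1)
        by_year[year].append(city)
    for path in sorted(content_dirs):
        *_, year, city = PurePosixPath(path).parts
        if city not in by_year[year]:  # skip events already listed from data
            by_year[year].append(city)
    return dict(by_year)


print(past_events(["2018-chicago.yml", "2018-austin.yml"],
                  ["events/2018/denver", "events/2019/chicago"]))
```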

The trick would be to make sure that we remove data files when we migrate events. This won't be too bad, as the migration tool can take care of it (it has to import stuff from the data file, so the last part of the import could be to delete the associated data file).

The only events that would migrate would be 2019 and after (previous events have been archived). The thing is, this change will fundamentally remove our ability to archive to static; I don't know if I mind this terribly, as I kind of think that the efficiencies of the new code will make having lots of stuff in Content not as impactful as in the past.

We could also handle this even a little more elegantly; we could add a new parameter to the frontmatter that is archive = true. So when we archive an event to static, we keep the _index.md file, but set that parameter.

If we did this, it could be part of the migration as well - the only thing we need from the Page is the year/city.

I just did a test - whatever is in static takes precedence over any generated pages!

SOOOO

what we do is move all the stuff in data/events to an associated _index.md populated with the frontmatter, etc, from the data file. So it ends up kind of like this:

  1. data/events/2018-chicago.yml gets migrated to content/events/2018/chicago/_index.md
  2. data/events/2018-chicago.yml gets deleted
  3. Hugo has content/events/2018/chicago available to it for queries, etc, but the HTML it generates isn't what gets used.
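
Steps 1 and 2 could be sketched in Python as follows. This is not the actual migrator: `migrate_event` is a hypothetical name, and instead of translating fields it simply wraps the data file's YAML verbatim as YAML frontmatter (which Hugo accepts between `---` lines).

```python
from pathlib import Path


def migrate_event(data_file: Path, content_root: Path) -> Path:
    """Move data/events/YYYY-CITY.yml into content/events/YYYY/CITY/_index.md."""
    year, city = data_file.stem.split("-", 1)
    target = content_root / "events" / year / city / "_index.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Step 1: write the data file's fields as the page's frontmatter.
    target.write_text("---\n" + data_file.read_text().strip() + "\n---\n")
    # Step 2: delete the data file so the event is not listed twice.
    data_file.unlink()
    return target
```

Deleting the data file as the last step of the import is what keeps the "data directory first, then Content" listings from showing an event twice.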

@mattstratton
Member Author

For experimenting purposes, I went back through old commits and got copies of all the content/events directories for archived events, and saved them to my hard drive. When I get the migrator/theme completed, I will do a test migration that includes all of these, in order to see what the impact of "un-archiving" events will be (there are a lot of negative things that happen with archiving, and it would be much better to not have them moved to static, but I will want to see what the impact is!)

@mattstratton
Member Author

Note to self - before migrating the "static" events, the speaker pages need to be converted (2016 and some 2017 events used the speaker data files instead of Pages for speakers, so speakers/program need some changes).

I will probably have to do this manually, but it might not be too onerous (a migrator might be tough, but maybe not terrible? I'll see if I can write something quickly...it really is just going to be reading some YAML and converting it to markdown via a template)
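
A quick sketch of the "read some YAML, write markdown via a template" step, in Python. The YAML parsing is assumed to have already happened (the record is a plain dict), and the `name`, `twitter`, and `bio` keys are hypothetical stand-ins for whatever the old speaker data files contain.

```python
import json
from string import Template

# TOML frontmatter plus body; values are substituted already quoted.
SPEAKER_PAGE = Template(
    "+++\n"
    "title = $name\n"
    'type = "speaker"\n'
    "twitter = $twitter\n"
    "+++\n"
    "$bio\n"
)


def toml_string(value: str) -> str:
    """Quote a string; JSON escaping is close enough to TOML basic strings here."""
    return json.dumps(value, ensure_ascii=False)


def speaker_page(record: dict) -> str:
    """Render one speaker record (already parsed from YAML) as a content page."""
    return SPEAKER_PAGE.substitute(
        name=toml_string(record["name"]),
        twitter=toml_string(record.get("twitter", "")),
        bio=record.get("bio", "").strip(),
    )


print(speaker_page({"name": "Twilight Sparkle", "bio": "Princess of Friendship.\n"}))
```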

@mattstratton
Member Author

I just thought about the migration for data file speakers to Pages...it will be pretty simple, but there will still need to be a manual step to go through each of the content/events/YYYY/city/program directory files for the 2016-2017 events and just add the proper frontmatter (the older version of talk pages doesn't have a frontmatter element for "speakers")

I could write something that is a later migration that goes through those files and adds the frontmatter based on the program file name, although if I do this, it has to be done on an event-by-event basis (although thinking some more; we only will run the "convert speaker data files if there is a data file", and that same function could then also call the "update the talk page associated" thing; it would be slightly tricky but doable? I think I'll try it)

@mattstratton
Member Author

In case anyone is following this (I would be surprised if they were), I did add the function to migrator to handle the "old" style speakers, etc.

I'm very curious to see what builds look like with the new theme when we have things moved "back" out of static.

There is still one small manual thing that has to happen...in the "old" style, the title field for a talk page in program is the filename of the speaker...and people just put the title of the talk in free text in the content. So for all those talks, there will be a manual effort to edit the files and copy/paste to the title field. It's only for 20-30 (guesstimate) events, so it won't be awful (if it looks like the performance is worthwhile to keep it this way).

@mattstratton
Member Author

mattstratton commented Jun 30, 2020

Here are the things I still need to update in the theme:

  • main homepage - i.e., do a Pages query instead of looking for data files to draw all the events with their logos. I'm not stressing about this one.
  • speaking page - this is the list of open CFPs; this really won't be too bad. Just like the main page, it's just changing the query to be Pages instead of data files.
  • past and future partials - used on the /events page, and in the main sidebar. Might take a little bit of effort to modify the queries, but should be okay.
  • event program page - this one will need a lot more thinking about (I mentioned in the first post of this thread)
  • sponsor top level page - I just noticed that /sponsor page has some more logic (I think it's the thing that shows events that are open for sponsorship). I don't think it will be too bad.
  • shortcodes - this is a bunch of work; there are a LOT of shortcodes and they will all need to be refactored.

@mattstratton
Member Author

For the Program page template, I think that the easiest thing to do (and the least impactful) is to move the YAML for program out of the data file and into the frontmatter on the program.md file for the event. That frontmatter will look HUGE, but it should work.

If we want to make enhancements to how the program template functions (later!!), the move will be to create a new template for type "old-program" and then we will need to update all the existing program.md files to use that type, and have new stuff use the new version.

I did go back to the survey I ran years ago to see what people prefer, and data fields in a YAML file (for this purpose, TOML in frontmatter is just as fine) was the overwhelming preference.

If we migrate the YAML "as-is" to TOML, I think the frontmatter for a program page would look something like this:

Title = "Program"
Type = "program"
Description = "Program for devopsdays Chicago 2019"
icons = "TRUE"
program_elements = [
    { title = "Registration, Breakfast, Sponsors", type = "custom", start = 2019-02-15T08:00:00-06:00, end = 2019-02-15T09:00:00-06:00 },
    { title = "Opening Welcome", type = "custom", start = 2019-02-15T09:00:00-06:00, end = 2019-02-15T09:15:00-06:00 },
    { title = "jeff-smith", type = "talk", start = 2019-02-15T09:15:00-06:00, end = 2019-02-15T09:45:00-06:00 },
    { title = "Ignites", type = "ignite", start = 2019-02-15T09:45:00-06:00, end = 2019-02-15T10:15:00-06:00 },
    { title = "Registration, Breakfast, Sponsors", type = "custom", start = 2019-02-16T08:00:00-06:00, end = 2019-02-16T09:00:00-06:00 },
]

and so on...the upshot of this is that we get timezone info. But we don't necessarily have that. We might have to do something where "if we don't know the timezone from another field in the main data file, we just set it to US central" because this is only for historical programs and if the time isn't right, it's probably Not the End of the World.

We could also have the following be supported:

program_elements = [
    { title = "Registration, Breakfast, Sponsors", type = "custom", date = "2019-02-15", start_time = "08:00", end_time = "09:00" },
]

So the template code for the program page would have to be pretty weird; it would have to check for an element that has date set, and if so, it works differently. This might be okay (we did this for the speaker page to handle the backward-compatibility with the data files; it means that in the template you have one big giant "if" statement to say "do I draw the program the new fancy way or use the old way?" and go from there).
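
The "one big if" could be reduced to a normalization step, sketched here in Python rather than template code. The fixed UTC-6 offset is an assumption standing in for the "just set it to US central" fallback and ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-6 offset approximates "US central" for old programs.
DEFAULT_TZ = timezone(timedelta(hours=-6))


def normalize(element, tz=DEFAULT_TZ):
    """Return (start, end) as timezone-aware datetimes for either element style."""
    if "date" in element:
        # Old style: a date plus start_time/end_time strings, no timezone.
        def at(clock):
            stamp = datetime.fromisoformat(f"{element['date']}T{clock}")
            return stamp.replace(tzinfo=tz)
        return at(element["start_time"]), at(element["end_time"])
    # New style: full datetimes are already present.
    return element["start"], element["end"]


old = {"title": "Registration, Breakfast, Sponsors", "type": "custom",
       "date": "2019-02-15", "start_time": "08:00", "end_time": "09:00"}
print(normalize(old)[0].isoformat())
```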

We also need to handle devopsdays/devopsdays-web#6543 as long as we are here; I think it works the same way. If it detects any elements with date set, then it displays ignites a certain way; if not, it will do whatever magic I come up with to handle this idea of multiple ignite blocks per day (fyi, this is something that I need to solve for 2020 Chicago, so it's Kind of Important to me, lol)

@mattstratton
Member Author

Program page template discussion moved to devopsdays/devopsdays-web#9839

@mattstratton
Member Author

I've migrated almost everything properly (there are some really old events that I didn't properly get the old markdown content files for, but it's only a handful).

Here is the build time analysis!

New site:

     cumulative       average       maximum
       duration      duration      duration  count  template
     ----------      --------      --------  -----  --------
  4m7.198241424s   71.362078ms  203.938951ms   3464  new-speaker/single.html
  1m10.853977895s    9.442161ms  137.802741ms   7504  partials/new-sponsors.html
  38.388595125s   12.292217ms  124.652365ms   3123  new-talk/single.html
   20.23957483s   49.364816ms  166.671067ms    410  new-event/list.html
  13.641480828s   73.737734ms  152.571845ms    185  new-speakers/single.html
  13.536706684s    1.440534ms  600.530843ms   9397  partials/head.html
   8.911487036s     948.333µs  600.155799ms   9397  partials/head/seo.html
   4.700883476s  111.925797ms  183.689751ms     42  partials/future.html
   4.338586854s  149.606443ms  225.515045ms     29  blog/single.html
   3.801047853s    2.258495ms    37.41748ms   1683  new-event/single.html
   3.260748601s   10.126548ms   65.946655ms    322  new-contact/single.html
   2.795071051s     297.442µs   61.662898ms   9397  partials/head_includes.html
   2.328530263s    8.656246ms   34.635231ms    269  shortcodes/event_logo.html
   1.775490276s     189.831µs   21.419536ms   9353  partials/events/new_event_navbar.html
   1.641669484s     174.701µs   44.212858ms   9397  partials/footer_scripts.html
   1.438979812s   79.943322ms  732.685425ms     18  _default/single.html
   1.260961215s    8.135233ms   36.460195ms    155  new-program/single.html
   1.162803153s     123.741µs   21.063993ms   9397  partials/global_navbar.html
   963.119448ms    2.326375ms  271.512444ms    414  _internal/_default/rss.xml
    664.80865ms   44.320576ms   72.769378ms     15  partials/footer.html
    659.25097ms      70.155µs   33.427298ms   9397  partials/google_analytics.html
   598.851374ms  598.851374ms  598.851374ms      1  shortcodes/list_core_active.html
   449.257966ms  449.257966ms  449.257966ms      1  index.html
   416.054834ms  138.684944ms  185.146191ms      3  section/blog.html
   398.072419ms      42.361µs   22.615669ms   9397  partials/meta.html
   245.366061ms  245.366061ms  245.366061ms      1  events/single.html
   133.395671ms  133.395671ms  133.395671ms      1  _internal/_default/sitemap.xml
   126.357255ms  126.357255ms  126.357255ms      1  speaking/single.html
   118.534747ms  118.534747ms  118.534747ms      1  partials/past.html
   112.667043ms      135.58µs   15.626225ms    831  shortcodes/email_organizers.html
   101.773333ms  101.773333ms  101.773333ms      1  sponsor/single.html
    69.763855ms   69.763855ms   69.763855ms      1  404.html
    61.642929ms     150.348µs    8.506989ms    410  partials/events/new-cta.html
    58.892062ms     218.929µs    15.92786ms    269  shortcodes/cfp_dates.html
    55.139061ms      31.634µs    8.437299ms   1743  shortcodes/event_link.html
    39.966181ms     168.633µs    5.814388ms    237  shortcodes/event_map.html
    32.909353ms      86.832µs    5.151657ms    379  shortcodes/event_start.html
    29.966051ms     5.99321ms    17.83608ms      5  shortcodes/emoji.html
    26.473372ms      90.352µs   10.320912ms    293  shortcodes/event_location.html
    20.407951ms       65.62µs    1.202001ms    311  shortcodes/event_twitter.html
    19.553302ms      57.341µs    1.140029ms    341  shortcodes/event_end.html
    11.840894ms     105.722µs     2.09472ms    112  shortcodes/email_proposals.html
     2.849758ms        16.1µs     201.663µs    177  _internal/alias.html
     2.680501ms    2.680501ms    2.680501ms      1  partials/toc.html
     2.037494ms      70.258µs     344.762µs     29  blog/summary.html
     1.886572ms    1.886572ms    1.886572ms      1  section/blog.rss.xml
     1.435582ms     287.116µs    1.031258ms      5  shortcodes/registration_end.html
     1.158586ms     231.717µs     531.296µs      5  shortcodes/registration_start.html
     1.094763ms      182.46µs     416.704µs      6  _internal/shortcodes/figure.html
       652.55µs     217.516µs     514.414µs      3  partials/blog_pagination.html
      583.744µs     583.744µs     583.744µs      1  _internal/shortcodes/tweet.html
      493.843µs     493.843µs     493.843µs      1  _internal/shortcodes/youtube.html
      436.569µs     436.569µs     436.569µs      1  shortcodes/privacy_policy.html
      409.024µs       1.213µs     104.006µs    337  shortcodes/list_organizers.html
      390.485µs       65.08µs     305.007µs      6  shortcodes/google_form.html
      317.548µs     317.548µs     317.548µs      1  partials/heading_link.html
      312.477µs     312.477µs     312.477µs      1  shortcodes/list_core_emeritus.html
      309.157µs     309.157µs     309.157µs      1  shortcodes/list_core_advisory.html
      302.402µs         900ns      67.984µs    336  shortcodes/list_core.html
       264.84µs       88.28µs     262.619µs      3  _default/list.html
      215.636µs     215.636µs     215.636µs      1  partials/sponsors-accepted.html
      194.365µs     194.365µs     194.365µs      1  partials/speaking.html


                   |  EN
-------------------+--------
  Pages            | 15028
  Paginator pages  |     2
  Non-page files   |    31
  Static files     | 26538
  Processed images |     0
  Aliases          |   177
  Sitemaps         |     1
  Cleaned          |     0

Total in 36564 ms
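
The durations in these tables mix units (minutes, seconds, ms, µs, ns), which makes them awkward to compare by eye. A small Python helper like the one below (a sketch, with a hypothetical function name) converts them to seconds so rows from the two builds can be compared directly.

```python
import re

UNITS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3,
         "µs": 1e-6, "μs": 1e-6, "us": 1e-6, "ns": 1e-9}
# Longer unit names come first so "ms" is not read as "m" followed by "s".
TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|µs|μs|us|ns|h|m|s)")


def go_duration_seconds(text: str) -> float:
    """Convert a duration such as '4m7.198241424s' or '71.362078ms' to seconds."""
    matches = TOKEN.findall(text)
    if not matches or "".join(n + u for n, u in matches) != text:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(number) * UNITS[unit] for number, unit in matches)


# Cumulative time for new-speaker/single.html from the table above.
print(round(go_duration_seconds("4m7.198241424s"), 3))
```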

Old site:

     cumulative       average       maximum
       duration      duration      duration  count  template
     ----------      --------      --------  -----  --------
  1m7.141879644s   818.80341ms   2.75559427s     82  program/single.html
  49.022228463s    31.08575ms   86.377586ms   1577  speaker/single.html
  13.083167933s  451.143721ms  820.236448ms     29  blog/single.html
  12.431525919s  295.988712ms  751.434448ms     42  partials/future.html
   8.739697362s    2.045809ms   48.081051ms   4272  partials/sponsors.html
   7.422612487s    5.063173ms   77.923768ms   1466  talk/single.html
   7.388977107s    1.678168ms   53.827168ms   4403  partials/head.html
   4.855053821s   53.352239ms  102.325922ms     91  speakers/single.html
   4.545636171s    1.032395ms    53.26201ms   4403  partials/head/seo.html
   4.044093229s    4.160589ms   36.961761ms    972  event/single.html
   3.176256676s   19.134076ms    70.35561ms    166  welcome/single.html
    1.66153021s    9.660059ms   37.441762ms    172  partials/welcome.html
    1.41393655s     324.595µs   19.577609ms   4356  partials/events/event_navbar.html
   1.376075366s     312.531µs   30.829811ms   4403  partials/head_includes.html
   1.207311289s  301.827822ms  335.276925ms      4  events/single.html
   1.148701917s  287.175479ms  324.197614ms      4  partials/past.html
   885.245797ms    6.461648ms   30.455048ms    137  shortcodes/event_logo.html
   563.094225ms   35.193389ms   51.969411ms     16  partials/footer.html
   553.726423ms     125.761µs   19.899702ms   4403  partials/footer_scripts.html
   514.987023ms     116.962µs   16.468136ms   4403  partials/global_navbar.html
   435.388986ms      98.884µs    20.65234ms   4403  partials/google_analytics.html
   173.040048ms  173.040048ms  173.040048ms      1  index.html
   128.749556ms   14.305506ms   20.145763ms      9  _default/single.html
   127.923024ms      29.053µs    3.712582ms   4403  partials/meta.html
    119.36236ms    59.68118ms  118.569823ms      2  _internal/_default/rss.xml
    86.445084ms     502.587µs    8.680545ms    172  shortcodes/list_organizers.html
    80.812261ms   80.812261ms   80.812261ms      1  _internal/_default/sitemap.xml
    62.399057ms   20.799685ms    24.75598ms      3  section/blog.html
    60.893624ms     468.412µs   46.010828ms    130  shortcodes/cfp_dates.html
      55.5727ms      69.902µs    3.270125ms    795  shortcodes/event_link.html
     47.25586ms    47.25586ms    47.25586ms      1  sponsor/single.html
    42.568848ms   42.568848ms   42.568848ms      1  speaking/single.html
    31.792145ms      72.917µs    1.923405ms    436  shortcodes/email_organizers.html
    16.171083ms      94.017µs     934.304µs    172  partials/events/cta.html
    15.094301ms   15.094301ms   15.094301ms      1  partials/speaking.html
       14.611ms     103.624µs     2.99729ms    141  shortcodes/event_map.html
    11.639992ms   11.639992ms   11.639992ms      1  partials/sponsors-accepted.html
     8.165111ms    8.165111ms    8.165111ms      1  404.html
     7.878009ms      48.036µs     271.305µs    164  shortcodes/event_start.html
     6.151983ms       39.69µs     372.112µs    155  shortcodes/event_twitter.html
     5.743033ms      16.646µs     263.058µs    345  _internal/alias.html
      5.67479ms       42.99µs     470.912µs    132  shortcodes/event_location.html
     5.464073ms      38.479µs     920.857µs    142  shortcodes/event_end.html
     5.078207ms    5.078207ms    5.078207ms      1  section/events.rss.xml
      4.33283ms     4.33283ms     4.33283ms      1  section/speaking.rss.xml
     3.188881ms     109.961µs      539.91µs     29  blog/summary.html
     3.068013ms    3.068013ms    3.068013ms      1  partials/toc.html
     1.856055ms      48.843µs     336.352µs     38  shortcodes/email_proposals.html
     1.652016ms     330.403µs    1.621271ms      5  shortcodes/emoji.html
     1.429043ms       8.356µs    1.187748ms    171  shortcodes/list_core.html
     1.295719ms    1.295719ms    1.295719ms      1  section/blog.rss.xml
     1.289742ms     214.957µs     557.668µs      6  _internal/shortcodes/figure.html
      629.189µs     209.729µs     425.809µs      3  partials/blog_pagination.html
      406.801µs     406.801µs     406.801µs      1  shortcodes/list_core_active.html
      362.902µs     120.967µs     269.559µs      3  _default/list.html
       351.47µs      351.47µs      351.47µs      1  partials/heading_link.html
      314.877µs     314.877µs     314.877µs      1  shortcodes/list_core_emeritus.html
      303.987µs     303.987µs     303.987µs      1  shortcodes/registration_end.html
      286.188µs     286.188µs     286.188µs      1  shortcodes/list_core_advisory.html
      241.866µs     241.866µs     241.866µs      1  shortcodes/privacy_policy.html
      227.993µs      75.997µs     185.851µs      3  shortcodes/google_form.html
      217.208µs     217.208µs     217.208µs      1  shortcodes/registration_start.html
      195.629µs      97.814µs     185.859µs      2  _internal/shortcodes/param.html
      172.347µs     172.347µs     172.347µs      1  _internal/shortcodes/youtube.html


                   |  EN
-------------------+--------
  Pages            |  4406
  Paginator pages  |     2
  Non-page files   |    42
  Static files     | 14630
  Processed images |     0
  Aliases          |   345
  Sitemaps         |     1
  Cleaned          |     0

Total in 25561 ms

@mattstratton
Member Author

Some thoughts...

The new site with (most of) the old events takes 36564 ms (about 37 seconds).
The old site takes 25561 ms (about 26 seconds).

So it takes about 11 seconds longer to build now, but that is with EVERYTHING vs just 2019-2021 events.

There is a HUGE improvement in the program page. Old code takes avg of 818 ms vs 8 ms for the new one.

The speaker pages take about twice as long in the new code (71 ms vs 31 ms) and I'll have to look at why. That's the new heaviest page, I think, but that's in terms of there being so many of them.

One thing that happened when I ran this in watch mode is that even setting max files to 65K was not enough, but this also still has all the archived static HTML files that will get deleted later, so I'm not terribly worried.

I am slightly concerned about total build time. Remember that the durations listed (except for the total build time) are CPU time, and I am running on 8 cores. I'm tempted to push this to netlify to see how long it takes to build as a PR. I'm fairly sure it will time out on upload (there are way too many files), but it should get far enough to run the hugo build so I can see what it does.

@mattstratton
Member Author

mattstratton commented Jul 3, 2020

Testing with a push to netlify...

for comparison, the current site in Netlify comes up as taking this amount of time for the hugo build (from hugo):
Total in 91840 ms

the newly pushed code takes this long:
Total in 262951 ms

That's not insubstantial; for comparison:

old site: 1m 32s
new site: 4m 23s

I am considering that Netlify is probably a somewhat reasonable estimate for the lowest-end computer that an organizer would be using; 4 minutes seems waaaaaay too long.

I need to consider this some more.

@mattstratton
Member Author

I suspect that if the speaker page could be improved that might make it feasible.

If we abandon the overall refactor, I do want to see if I can replicate what I did with the program page in the “old” code :)

@mattstratton
Member Author

This is the part of the speaker page that might be the heavy part:


{{- range where (where $.Site.Pages "Type" "new-talk") ".File.Dir" "=" (printf "new-events/%s/%s/program/" $event_year $event_city) -}}
                {{- $talk_title := .Title -}}
                {{- $talk_link := .Permalink -}}
                {{- range .Params.speakers -}}
                    {{- if eq . $speaker_slug -}}

So it’s spinning through all the talks and then ranging over the speakers element of each one; I could take another swing at writing the query so that the check for speaker slug is in the first range statement.
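To make the cost of that pattern concrete, here is a small language-neutral sketch in Python (not Hugo template code). It assumes talk pages carry a `speakers` list of slugs, as in the snippet above; the sample talks, event keys, and function names are made up for illustration. The scan version redoes the full filter for every speaker page, while the index version walks the talks once and answers each speaker lookup directly.

```python
from collections import defaultdict

# Hypothetical stand-ins for talk pages; only the "speakers" field mirrors
# the front matter used in this thread, all values are invented.
talks = [
    {"title": "Talk A", "event": "2019/chicago", "speakers": ["alice", "bob"]},
    {"title": "Talk B", "event": "2019/chicago", "speakers": ["bob"]},
    {"title": "Talk C", "event": "2020/austin", "speakers": ["alice"]},
]


def talks_for_speaker_scan(all_talks, event, speaker):
    """Shape of the current template: walk every talk, then check its speakers."""
    return [
        t["title"]
        for t in all_talks
        if t["event"] == event and speaker in t["speakers"]
    ]


def build_index(all_talks):
    """One pass over all talks, keyed by (event, speaker slug)."""
    index = defaultdict(list)
    for t in all_talks:
        for slug in t["speakers"]:
            index[(t["event"], slug)].append(t["title"])
    return index


index = build_index(talks)
scan_result = talks_for_speaker_scan(talks, "2019/chicago", "bob")
index_result = index.get(("2019/chicago", "bob"), [])
print(scan_result)
print(index_result)
```

Both lookups return the same talks; the difference is that the scan cost is paid once per speaker page, whereas the index is built once per build. Whether Hugo can express the index version cheaply is a separate question this sketch does not answer.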

@mattstratton
Member Author

Another thing that might help is working with caching of partials

We do this a little bit in the old site (and I am pretty sure I didn't carry this over to the new code). It also looks like you can somehow be specific about where the caching happens, i.e., should it cache across the whole site, or just in sections?

For example, it might be possible to cache the sponsors partial per-event; this would help a bunch I think.

@mattstratton
Member Author

mattstratton commented Jul 6, 2020

A bit more detail here - https://regisphilibert.com/blog/2019/12/hugo-partial-series-part-1-caching-with-partialcached/

What might be helpful is if it's possible to cache based on the path/file dir; that would let us do a lot of caching per-event.

Partials are very handy to maintain reusable code but can take up on build time if processed by Hugo more than needed. In this article we'll cover how their own caching solution can help reduce the build time!

@mattstratton
Member Author

I don't know if this will help. I just set the sponsors partial to be cached (globally, which we wouldn't do, but it's the most aggressive) on the speaker page and this is the difference:

before:

  4m1.194052124s   69.628767ms  190.073127ms   3464  new-speaker/single.html

after:

  3m18.077833386s   57.181822ms  174.646104ms   3464  new-speaker/single.html

That did cut off about 12 ms per page (69.6 ms to 57.2 ms), which across 3,464 executions isn't trivial. But again, it wouldn't be that good in reality (as the sponsor partial wouldn't cache across all of them).
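As a quick check of that arithmetic, the sketch below parses the Go-style duration strings that the template metrics print and compares the two runs. The `parse_duration` helper is a hypothetical convenience written for this comment, not part of Hugo; the input values are the ones quoted above.

```python
import re

# Seconds per unit for the Go-style durations printed by the metrics table.
UNIT_SECONDS = {
    "h": 3600.0, "m": 60.0, "s": 1.0,
    "ms": 1e-3, "µs": 1e-6, "us": 1e-6, "ns": 1e-9,
}
# "ms" must be tried before "m" so that "69.6ms" is not read as minutes.
TOKEN = re.compile(r"(\d+(?:\.\d+)?)(h|ms|µs|us|ns|m|s)")


def parse_duration(text):
    """Convert a string such as '4m1.194052124s' or '324.595µs' to seconds."""
    parts = TOKEN.findall(text)
    if not parts:
        raise ValueError(f"not a duration: {text!r}")
    return sum(float(value) * UNIT_SECONDS[unit] for value, unit in parts)


before = parse_duration("4m1.194052124s")   # cumulative, sponsors partial uncached
after = parse_duration("3m18.077833386s")   # cumulative, sponsors partial cached
count = 3464                                # executions of new-speaker/single.html

saved_total = before - after
saved_per_page_ms = saved_total / count * 1000
print(f"{saved_total:.1f} s saved in total, {saved_per_page_ms:.1f} ms per page")
```

This prints `43.1 s saved in total, 12.4 ms per page`, which agrees with the difference between the two average-duration columns.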

I do wonder if optimizing across ALL pages, even small amounts, would end up with a cumulative improvement.

@mattstratton
Member Author

Moving the sponsors partial to partialCached on all the event pages (based on their city/year directory) ends up looking like this:

     cumulative       average       maximum
       duration      duration      duration  count  template
     ----------      --------      --------  -----  --------
  3m18.28275776s    57.24098ms   196.61659ms   3464  new-speaker/single.html
  18.055811558s   44.038564ms   98.389806ms    410  new-event/list.html
  14.138267525s    4.527142ms   95.070696ms   3123  new-talk/single.html
  13.936818714s    1.483113ms  1.487809441s   9397  partials/head.html
  11.061936049s   59.794248ms  157.220906ms    185  new-speakers/single.html
   9.886230627s    1.052062ms  1.487615894s   9397  partials/head/seo.html
   9.239666614s    7.876953ms   40.784638ms   1173  partials/new-sponsors.html

The key difference is that new-sponsors.html now runs 1,173 times instead of 7,504 times, which cuts that partial's cumulative time from 1m10s to 9s. That makes a pretty big difference. The total build time on my system is essentially unchanged (37,223 ms vs 36,564 ms before), but I think this will help a lot on a lower-end system. I'm going to push this change to netlify to see what it does to the build there.

@mattstratton
Member Author

If the netlify build looks better (not perfect) I think going through all the partials (head/footer ones and their component partials) and adding caching in the same way will make a non-trivial improvement as well.

@mattstratton
Member Author

(I also discovered a fun bug in the current site - the program.html template doesn't include sponsors, so the program page for an event doesn't have sponsors show up. I'll add that into the new site once we figure out if the sponsors partial is fixed)

@mattstratton
Member Author

Caching the sponsors partial per-event resulted in a netlify hugo build time of:
3m 4s

compared to 4m 23s before.

So that just cut over a minute off of the netlify build. This is promising.

@mattstratton
Member Author

If we want to start doing more aggressive caching of the head includes, I think we need to do a little more refactoring of them. For example, head.html includes a bunch of other partials. The logic we do on head.html is totally based on Type. So at first, you think you can cache based on that like this:

{{- partialCached "head_includes.html" . .Type -}}

However, every page of type "Talk" would then get the same header. That would be fine for what head.html itself does, but the sub-partials produce different output per page. So we would either leave the head.html partial uncached (but cache the sub-partials) or move the stuff in the sub-partials into head.html.

The downside of that is that head.html code would get pretty big and long, but also it might be easier to understand anyway?
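The problem with a cache key that is too coarse can be shown with a toy model in Python (again a sketch, not Hugo code). It assumes a head that mixes a Type-dependent part with a page-specific part; the page data and function names are invented for illustration.

```python
def render_head(page):
    """Whole head in one piece: a Type-dependent part plus a page-specific part."""
    return f"<meta name='type' content='{page['type']}'><title>{page['title']}</title>"


def render_type_part(page):
    """Only the part that depends on Type; safe to cache per Type."""
    return f"<meta name='type' content='{page['type']}'>"


def render_page_part(page):
    """The part that differs per page; not cached here."""
    return f"<title>{page['title']}</title>"


def cached(render, cache, page, key):
    """Render once per key and reuse the stored output afterwards."""
    if key not in cache:
        cache[key] = render(page)
    return cache[key]


pages = [
    {"type": "talk", "title": "Talk A"},
    {"type": "talk", "title": "Talk B"},
]

# Caching the whole head by Type: the second talk gets the first talk's title.
whole_cache = {}
whole = [cached(render_head, whole_cache, p, p["type"]) for p in pages]

# Caching only the Type-dependent part keeps the page-specific part correct.
type_cache = {}
split = [
    cached(render_type_part, type_cache, p, p["type"]) + render_page_part(p)
    for p in pages
]

print(whole[1])
print(split[1])
```

The first line printed still says "Talk A" for the second page, while the split version says "Talk B" and renders the Type-dependent part only once.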

@mattstratton
Member Author

So it’s spinning through all the talks and then ranging over the speakers element of each one; I could take another swing at writing the query so that the check for speaker slug is in the first range statement.

This definitely is making this heavier; when I remove this part, it goes from 57 ms to 1 ms, which across all the speakers, is a big deal. Hmm.

@mattstratton
Member Author

OK, it generally just sucks to have to go pull all the talks (even just the ones for the speaker). I modified the range statement so it only pulls talks from that event if the filename is aaron-blythe (so there is just one single talk in the result), and it still runs this way:


     cumulative       average       maximum
       duration      duration      duration  count  template
     ----------      --------      --------  -----  --------
  3m32.127522523s   61.237737ms  227.768554ms   3464  new-speaker/single.html

removing the "list the talks for this speaker" section results in this:

6.175388232s    1.782733ms   29.374921ms   3464  new-speaker/single.html

Hmm.

@mattstratton
Member Author

The previous speaker page took 31.08575ms. The new one takes twice as long, but the query is basically the same (clearly the talk lookup is what takes up all the time).

old code:

{{ range where (where $.Site.Pages "Type" "talk") ".File.Dir" "=" (print "events/" $e.name "/program/") }}
                <!-- Now we can display stuff! -->
                {{- range .Params.speakers -}}
                  {{- if eq . ($.Scratch.Get "speaker") -}}
                    {{- $.Scratch.Set "display" "true" -}}
                  {{- end -}}
                {{- end -}}
                {{- if eq ($.Scratch.Get "display") "true" -}}
                  <a href = "{{ .Permalink }}" class= "list-group-item list-group-item-action">{{ .Title }}</a>
                  {{ $.Scratch.Set "display" "false" }}
                {{- end -}}
            {{- end -}} <!-- end range where -->

new code:

            {{- range where (where $.Site.Pages "Type" "new-talk") ".File.Dir" "=" (printf "new-events/%s/%s/program/" $event_year $event_city) -}}
                {{- $talk_title := .Title -}}
                {{- $talk_link := .Permalink -}}
                {{- range .Params.speakers -}}
                    {{- if eq . $speaker_slug -}}
                        <a href = "{{ $talk_link }}" class= "list-group-item list-group-item-action">{{ $talk_title }}</a>
                    {{- end -}}
                {{- end -}}
            {{- end -}}

I don't see how the new code is really any different? What am I missing?

@mattstratton
Member Author

the only thing I can imagine is that the first range query is just really really long? Because there are just so many more Pages in general? So if the way the nested where works is first it pulls the first one and then re-queries, that could be really long.

I could test it by taking an event and making the "type" something other than new-talk just to make that first query smaller and see if that affects things?

NOTE: it does not help. I narrowed it down and it comes back this way:

42.217035ms - that is saying "find all pages of type talk and then also in a certain directory" so it should actually return nothing for the range. But it still takes a long time. Hmm.

@mattstratton
Member Author

I did some caching on the head partial (it also used to be a lot of different partials, but now it's just one).

here's what it looks like now:

   7.451082345s    5.041327ms  1.745011401s   1478  partials/head.html

vs previous:

13.936818714s    1.483113ms  1.487809441s   9397  partials/head.html

It definitely cut the cumulative time for that partial, roughly in half (13.9s to 7.5s), even though the average per execution went up (1.5 ms to 5.0 ms). That partial doesn't look like it was terribly heavy before, although having it execute much less frequently (1,478 vs 9,397 times) probably helps. I'll try a netlify deploy.

@mattstratton
Member Author

with the new caching, the netlify build is now:

3m 23.3s
vs
3m 4s

so...uh...it got ... worse?

@mattstratton mattstratton transferred this issue from devopsdays/devopsdays-web Jul 10, 2024
@mattstratton mattstratton removed the enhancement New feature or request label Jul 10, 2024
@mattstratton mattstratton changed the title Discussion of theme refactoring (Historical) Discussion of theme refactoring Jul 10, 2024
@mattstratton
Member Author

Closing the issue, but keeping it around for historical reasons.

discussion moved to #6
