Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

frontend stack overflow at stream fragmenter #19910

Closed
xxchan opened this issue Dec 24, 2024 · 0 comments · Fixed by #19911
Closed

frontend stack overflow at stream fragmenter #19910

xxchan opened this issue Dec 24, 2024 · 0 comments · Fixed by #19911

Comments

@xxchan
Copy link
Member

xxchan commented Dec 24, 2024

To reproduce: TPCH q5

It's not stack depth too deep, but memory usage too large. So #16279 doesn't help.
(On mac, default stack size is 8MB.)

Analysis:

  • stream NodeBody: 4120 bytes. (BTW, batch node body is 680 bytes)
    • largest is HashJoin 4120 bytes, 2nd largest is DyanamicFilter` 2144 bytes.
    • small ones: Project 112 bytes, Filter 336 bytes
    • It's an unbalanced enum, so we will use 4120 bytes for all nodes.
  • The main reason of the large plan node is Table: 904 bytes (HashJoin has 4 tables)
    • Largest problem: recently introduced webhook_info is 344 bytes. (That's why we hit the stackoverflow problem recently but not before)
      • This is because ExprNode is 328 bytes, where DataType is 72 bytes and RexNode is 248 bytes.
      • For RexNode, Udf is 248 bytes (!), and other branch is 32 bytes.
    • Some other minor issues: Option<u32> wasted 4 bytes, since the zero value is not used, or we don't need this large range. We may use i32 or u32 instead.
    • 41 fields in total! @st1page said that this was a mistake: we should pub sth like TableDesc in plan, instead of TableCatalog.

Mitigate:

  • make expr.ExprNode.rex_node.udf boxed: ExprNode becomes 112 bytes, Table 688 bytes, NodeBody 3040 bytes. So this seems not enough.
  • Currently the easiest way seems to be just boxing all variants of NodeBody. I'm not sure whether this is optimal.
  • Alternatively we can box all occurences of Table
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant