Extracted out `RawEGraph` type #296

dewert99 · 2024-02-06T19:16:12Z

This is another attempt with a similar motivation to #293 (It still assumes #291).

Instead of trying to work with compositional traits, I decided to extract out a RawEGraph type with raw versions of add, union, and rebuild that each have various hooks, and then reimplement EGraph to use RawEGraph in a more backwards-compatible way. I also created the EGraphResidual type to represent what's left of a RawEGraph without its classes, so it can still be used while data from an eclass is mutably borrowed. The EGraphResidual type has methods like find, lookup, and id_to_node, and both RawEGraph and EGraph dereference to EGraphResidual, so these implementations are shared.

Nothing in the RawEGraph implementation is pub(crate), so it could be extracted into its own crate, although it depends on Id, Language, UnionFind, and RecExpr, so those would need to be moved as well.
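The layering described above can be illustrated with a minimal sketch. It is not the PR's code: `Residual`, `RawGraph`, the string-based nodes, and the `merge` closure are hypothetical stand-ins, and hash-consing and rebuilding are left out. It only shows why splitting the class data from the rest lets a hook read the residual while class data is mutably borrowed, and how `Deref` shares `find` and `id_to_node`.

```rust
use std::collections::HashMap;
use std::ops::Deref;

type Id = usize;

/// Everything except per-class data: union-find parents and the node stored for each id.
/// (Hypothetical stand-in for the EGraphResidual idea.)
struct Residual {
    parents: Vec<Id>,
    nodes: Vec<String>,
}

impl Residual {
    fn find(&self, mut id: Id) -> Id {
        while self.parents[id] != id {
            id = self.parents[id];
        }
        id
    }

    fn id_to_node(&self, id: Id) -> &str {
        &self.nodes[id]
    }
}

/// Hypothetical stand-in for RawEGraph: the residual plus user-chosen class data `D`.
struct RawGraph<D> {
    residual: Residual,
    classes: HashMap<Id, D>,
}

/// Dereferencing to the residual shares `find` and `id_to_node` with the wrapper.
impl<D> Deref for RawGraph<D> {
    type Target = Residual;
    fn deref(&self) -> &Residual {
        &self.residual
    }
}

impl<D> RawGraph<D> {
    fn new() -> Self {
        RawGraph {
            residual: Residual { parents: Vec::new(), nodes: Vec::new() },
            classes: HashMap::new(),
        }
    }

    /// Adds a fresh node in its own class (no hash-consing in this sketch).
    fn raw_add(&mut self, node: &str, data: D) -> Id {
        let id = self.residual.parents.len();
        self.residual.parents.push(id);
        self.residual.nodes.push(node.to_string());
        self.classes.insert(id, data);
        id
    }

    /// Unions two classes. The `merge` hook receives the residual immutably
    /// while the surviving class data is mutably borrowed.
    fn raw_union(&mut self, a: Id, b: Id, merge: impl FnOnce(&Residual, &mut D, D)) -> bool {
        let (a, b) = (self.residual.find(a), self.residual.find(b));
        if a == b {
            return false;
        }
        self.residual.parents[b] = a;
        let removed = self.classes.remove(&b).expect("root has class data");
        let kept = self.classes.get_mut(&a).expect("root has class data");
        merge(&self.residual, kept, removed);
        true
    }
}
```

Because `residual` and `classes` are separate fields, the borrow checker accepts the simultaneous `&Residual` and `&mut D` borrows inside `raw_union`.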

mwillsey · 2024-02-06T21:23:19Z

Cool! Just so I understand, what is the overall motivation here? Is this to reduce complexity in the current implementation? Or is there a particular use-case/feature that you are looking to support?

dewert99 · 2024-02-06T22:38:46Z

The main motivation is flexibility. Currently, anyone using EGraph is locked into the current explanation implementation and into tracking the nodes in each EClass. For example, I think it would be interesting to revisit the idea of database-based ematching within egg. That implementation probably wouldn't want to store nodes in each EClass, but it may want to use the various RawEGraph hooks to keep the database in sync with the egraph.

A potential side benefit is that, with the core egraph abstracted away, it may be less costly to revisit something like #284.

oflatt · 2024-03-22T16:03:21Z

This looks like an interesting refactor. Is one of the motivations of this PR to support multiple or composable Analysis?
I wonder if one strategy here is to work on a new Rust-based API for egglog. It may be comparable work to doing this huge refactor of egg.
I started a thread on that here: egraphs-good/egglog#232

dewert99 · 2024-03-22T18:35:44Z

> This looks like an interesting refactor. Is one of the motivations of this PR to support multiple or composable Analysis?

Partially. My idea was to have a flexible, lower-level plain egraph that doesn't do extra work (e.g. managing explanations, keeping track of nodes in each class, or managing a lattice-based analysis, although this could already be turned off). My specific motivation was https://github.com/dewert99/bat_egg_smt, a toy QF_UF solver. It uses a different strategy for explanations and doesn't need to store a list of nodes in each eclass, so I didn't want the overhead of a full egg::EGraph.

> I wonder if one strategy here is to work on a new Rust-based API for egglog. It may be comparable work to doing this huge refactor of egg.

I haven't looked as closely at egglog, but it seems to work quite differently, so I'm guessing it would have quite different performance characteristics for an SMT solver.

oflatt · 2024-03-22T19:08:49Z

Very cool! I'll be interested to see how you implement explanations and how the solver performs.

Egglog might have different performance characteristics, though it's usually faster than egg for a few reasons. Currently it doesn't have parent pointers, explanations, or uncanonical ids.

dewert99 · 2024-03-22T19:15:24Z

> Egglog might have different performance characteristics, though it's usually faster than egg for a few reasons. Currently it doesn't have parent pointers, explanations, or uncanonical ids.

When you say that it doesn't have uncanonical ids, do you mean that it doesn't store the original node for each uncanonical id? And how does rebuilding work without parent pointers?

dewert99 · 2024-03-22T19:23:47Z

> I'll be interested to see how you implement explanations and how the solver performs.

My original explanation implementation was similar to egg's, but it didn't try to find the shortest explanation and instead always gave the oldest one (so each explain node only needs a next connection instead of a list of all connections). It also used boolean literals (from the SAT solver) as justifications instead of symbols.

Recently I tried switching to a different implementation based on section 2.1, "Union-find with an O(k log n) Explain operation", of https://www.cs.upc.edu/~oliveras/rta05.pdf, which seemed to give a minor performance benefit.
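As a rough illustration of the "single next connection" idea, here is a sketch of a proof forest in which every id has at most one outgoing edge labelled with a justification. It is an assumption-laden simplification, not the code from bat_egg_smt or egg: `ProofForest`, `reroot`, and the generic justification type `J` are hypothetical names, and the union-by-size and separate union-find that the paper relies on for its complexity bound are omitted.

```rust
/// Proof forest: each id has at most one outgoing edge, labelled with a justification.
struct ProofForest<J> {
    next: Vec<Option<(usize, J)>>,
}

impl<J: Clone> ProofForest<J> {
    fn new(n: usize) -> Self {
        ProofForest { next: (0..n).map(|_| None).collect() }
    }

    /// Follows edges to the root of the proof tree containing `x`.
    fn root(&self, mut x: usize) -> usize {
        while let Some((parent, _)) = &self.next[x] {
            x = *parent;
        }
        x
    }

    /// Makes `x` the root of its tree by reversing every edge on the path to the old root.
    fn reroot(&mut self, x: usize) {
        let mut cur = x;
        let mut incoming = self.next[cur].take();
        while let Some((parent, just)) = incoming {
            incoming = self.next[parent].replace((cur, just));
            cur = parent;
        }
    }

    /// Records `a = b` with justification `just` unless the two are already connected.
    fn union(&mut self, a: usize, b: usize, just: J) -> bool {
        if self.root(a) == self.root(b) {
            return false;
        }
        self.reroot(a);
        self.next[a] = Some((b, just));
        true
    }

    /// Returns the justifications on the unique path from `a` to `b`, if one exists.
    fn explain(&self, a: usize, b: usize) -> Option<Vec<J>> {
        // Collect `a` and all of its ancestors.
        let mut path_a = vec![a];
        let mut x = a;
        while let Some((parent, _)) = &self.next[x] {
            x = *parent;
            path_a.push(x);
        }
        // Walk up from `b` until reaching a node on that path.
        let mut from_b = Vec::new();
        let mut y = b;
        let meet = loop {
            if let Some(i) = path_a.iter().position(|&n| n == y) {
                break i;
            }
            let (parent, just) = self.next[y].as_ref()?;
            from_b.push(just.clone());
            y = *parent;
        };
        let mut proof: Vec<J> = path_a[..meet]
            .iter()
            .map(|&n| self.next[n].as_ref().unwrap().1.clone())
            .collect();
        from_b.reverse();
        proof.extend(from_b);
        Some(proof)
    }
}
```

Since a union is skipped when the two ids are already connected, each edge is the first justification that connected its endpoints, so the path returned by `explain` is made only of those earliest edges.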

oflatt · 2024-03-25T16:23:35Z

Yeah, it doesn't store the original node for uncanonical ids.

Rebuilding works basically by running a query. For example, rebuilding all Add nodes can be thought of as a binary join:

(Add child1 child2 eclass1)
(Add child1 child2 eclass2)

The query finds Add nodes that differ in the eclass they belong to. It then makes them equal. I think egglog's rebuilding is faster than egg's as a result, and timestamps help it only find new matches to this query.
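The join described above can be sketched naively in Rust. This is not egglog's implementation: the `rows` table, the plain `parents` union-find, and the `rebuild` function are hypothetical, and the sketch rescans the whole table on every pass instead of using timestamps or indices. It only shows the fixpoint: rows whose canonical children agree but whose eclasses differ get their eclasses unioned.

```rust
use std::collections::HashMap;

/// Follows union-find parents to the canonical id.
fn find(parents: &[usize], mut x: usize) -> usize {
    while parents[x] != x {
        x = parents[x];
    }
    x
}

/// Rows are (child1, child2, eclass) for one function symbol such as Add.
/// Repeats the self-join until no two rows with equal canonical children
/// belong to different eclasses.
fn rebuild(rows: &[(usize, usize, usize)], parents: &mut [usize]) {
    loop {
        let mut changed = false;
        // Maps canonical children to the eclass of the first row seen with them.
        let mut seen: HashMap<(usize, usize), usize> = HashMap::new();
        for &(c1, c2, class) in rows {
            let key = (find(parents, c1), find(parents, c2));
            let class = find(parents, class);
            if let Some(&other) = seen.get(&key) {
                let other = find(parents, other);
                if other != class {
                    // Same children, different eclass: make the eclasses equal.
                    parents[class] = other;
                    changed = true;
                }
            } else {
                seen.insert(key, class);
            }
        }
        if !changed {
            break;
        }
    }
}
```

A union made in one pass can make further rows agree on their children, which is why the scan repeats until nothing changes.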

oflatt · 2024-03-25T16:26:19Z

That's the paper I used to implement egg's explain function. When proof size optimization is off, it gives the oldest proof I believe.
Have you tried without_explanation_length_optimization()?

dewert99 · 2024-03-25T18:32:25Z

> Rebuilding works basically by running a query. For example, rebuilding all Add nodes can be thought of as a binary join:
>
> (Add child1 child2 eclass1)
> (Add child1 child2 eclass2)
>
> The query finds Add nodes that differ in the eclass they belong to. It then makes them equal. I think egglog's rebuilding is faster than egg's as a result, and timestamps help it only find new matches to this query.

Sorry if I'm missing something, but if the database had (F child1 eclass1) and (F child2 eclass2), and I unioned child1 with child2 such that child2 became canonical, it seems the timestamp for (F child1 eclass1) would need to be updated so that the rebuilding join would find it and union eclass1 with eclass2. The association from child1 to the row (F child1 eclass1) would then seem like a sort of parent pointer.

dewert99 · 2024-03-25T18:33:57Z

> That's the paper I used to implement egg's explain function. When proof size optimization is off, it gives the oldest proof I believe.

Since I was only interested in the oldest proofs, I didn't want the overhead of keeping Vecs of connections.

oflatt · 2024-03-26T22:28:26Z

> Sorry if I'm missing something, but if the database had (F child1 eclass1) and (F child2 eclass2), and I unioned child1 with child2 such that child2 became canonical, it seems the timestamp for (F child1 eclass1) would need to be updated so that the rebuilding join would find it and union eclass1 with eclass2. The association from child1 to the row (F child1 eclass1) would then seem like a sort of parent pointer.

Good point! The indices that we build for queries are like parent pointers.

oflatt · 2024-03-26T22:30:55Z

> > That's the paper I used to implement egg's explain function. When proof size optimization is off, it gives the oldest proof I believe.
>
> Since I was only interested in the oldest proofs, I didn't want the overhead of keeping Vecs of connections.

Is there a way to make the current implementation in egg as performant as yours when explanation optimization is disabled?
Sorry, this is off topic for this PR.

dewert99 · 2024-03-26T23:09:28Z

> Is there a way to make the current implementation in egg as performant as yours when explanation optimization is disabled?

Probably not without making the explanation a generic/type parameter of EGraph, instead of having it be controlled at runtime.
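To illustrate the difference between a type parameter and a runtime switch, here is a small sketch under stated assumptions. It does not reflect egg's API: the `Explain` trait, `NoExplain`, `LogExplain`, and `Graph` are hypothetical names. The point is that a no-op strategy can be a zero-sized type, so a graph that opts out of explanations carries no extra fields, and the compiler is free to inline the empty hook away.

```rust
/// Hypothetical hook invoked on every successful union.
trait Explain {
    fn on_union(&mut self, a: usize, b: usize);
}

/// Zero-sized strategy that records nothing.
struct NoExplain;

impl Explain for NoExplain {
    #[inline]
    fn on_union(&mut self, _a: usize, _b: usize) {}
}

/// Strategy that logs every union so it could be explained later.
struct LogExplain {
    edges: Vec<(usize, usize)>,
}

impl Explain for LogExplain {
    fn on_union(&mut self, a: usize, b: usize) {
        self.edges.push((a, b));
    }
}

/// A bare union-find whose explanation strategy is chosen at compile time.
struct Graph<X: Explain> {
    parents: Vec<usize>,
    explain: X,
}

impl<X: Explain> Graph<X> {
    fn new(n: usize, explain: X) -> Self {
        Graph { parents: (0..n).collect(), explain }
    }

    fn find(&self, mut x: usize) -> usize {
        while self.parents[x] != x {
            x = self.parents[x];
        }
        x
    }

    fn union(&mut self, a: usize, b: usize) -> bool {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return false;
        }
        self.parents[rb] = ra;
        // Statically dispatched: with NoExplain this call has an empty body.
        self.explain.on_union(a, b);
        true
    }
}
```

With a runtime flag, every graph would instead need to store an optional explanation structure and branch on it at each union.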

dewert99 · 2024-04-06T17:56:26Z

Now that #291 has been merged, I was wondering if you would consider merging this, in which case I would try to resolve conflicts. Otherwise, I will probably keep my fork mostly separate, maybe trying to upstream any optimizations.

mwillsey · 2024-04-08T17:34:42Z

I am currently thinking that it is not worth the increased complexity. Given that there isn't a ton of active development on egg, I'm really thinking we should prioritize simplicity. I do however like the idea of one day supporting DB-like matching in egg, but I'm not sure about this architecture. Happy to discuss further though!

dewert99 · 2024-04-08T22:27:42Z

> I do however like the idea of one day supporting DB-like matching in egg, but I'm not sure about this architecture.

Do you have any suggestions about how to improve the architecture?

mwillsey · 2024-04-08T22:49:27Z

No. I’m not sure about any new architecture for egg just yet. I’m open to suggestions, but currently I don’t quite see the benefits these changes would bring to make it worth the additional code.

dewert99 added 6 commits January 3, 2024 10:21

Added nodes field to EGraph to avoid storing nodes in analysis and `analysis_pending` (1f838c6)

eliminated node field of ExplainNode (used EGraph.nodes instead) (3145a30)

serde (c075cbf)

serde (3187e36)

Clarify id_to_expr and prevent copy_with_unions when explanations are disabled (4d4c52d)

Extracted out low level egraph API (8bcfe66)

dewert99 added 2 commits February 7, 2024 17:13

doc-link fixes (8370122)

Improved raw_union interface, fixed EGraph::dump and updated edition (c18f6d4)

dewert99 mentioned this pull request Feb 13, 2024: Push Pop API #300 (Draft)

Make raw_union more flexible and add a fallible try_raw_rebuild (fb07f3b)

dewert99 force-pushed the raw-egraph branch from 3981be9 to fb07f3b on March 21, 2024 16:42

This was referenced Mar 21, 2024:

Raw egraph dewert99/plat-egg#3 (Closed)

Add nodes field to EGraph #291 (Merged)

This was referenced Apr 9, 2024:

Become a (more) independent fork #309 (Closed)

Become a (more) independent fork dewert99/plat-egg#4 (Closed)
