Extracted out RawEGraph type #296

Draft
wants to merge 9 commits into
base: main
from


dewert99 (Contributor) commented Feb 6, 2024

This is another attempt with a similar motivation to #293 (it still assumes #291).

Instead of trying to work with compositional traits, I decided to extract a RawEGraph type with raw versions of add, union, and rebuild, each of which has various hooks, and then reimplement EGraph on top of RawEGraph in a more backwards-compatible way. I also created the EGraphResidual type to represent what is left of a RawEGraph without its classes, so that it can still be used while data from an eclass is mutably borrowed. EGraphResidual has methods like find, lookup, and id_to_node, and both RawEGraph and EGraph dereference to EGraphResidual, so these implementations are shared.
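To make the residual idea concrete, here is a minimal std-only Rust sketch. It is not the code from this PR: Node, canonicalize, class_mut, and the simplified find and lookup are hypothetical stand-ins for egg's types. It shows the two points described above: per-class data is stored beside the union-find and hashcons rather than inside them, so a method can hand out mutable class data and a shared EGraphResidual at the same time, and read-only methods are shared through Deref.

```rust
use std::collections::HashMap;
use std::ops::Deref;

/// Hypothetical stand-in for egg's Id.
type Id = usize;

/// Hypothetical e-node: an operator plus child ids.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct Node {
    op: &'static str,
    children: Vec<Id>,
}

/// Everything except the per-class data: a union-find and a hashcons.
#[derive(Default)]
struct EGraphResidual {
    parents: Vec<Id>,        // union-find parent pointers
    memo: HashMap<Node, Id>, // canonical node -> id
}

impl EGraphResidual {
    fn find(&self, mut id: Id) -> Id {
        while self.parents[id] != id {
            id = self.parents[id];
        }
        id
    }

    fn canonicalize(&self, node: &Node) -> Node {
        Node { op: node.op, children: node.children.iter().map(|&c| self.find(c)).collect() }
    }

    fn lookup(&self, node: &Node) -> Option<Id> {
        self.memo.get(&self.canonicalize(node)).map(|&id| self.find(id))
    }
}

/// Per-class data lives next to the residual, not inside it.
struct RawEGraph<D> {
    residual: EGraphResidual,
    classes: Vec<D>, // indexed by Id
}

// Read-only methods such as find and lookup are shared through Deref.
impl<D> Deref for RawEGraph<D> {
    type Target = EGraphResidual;
    fn deref(&self) -> &EGraphResidual {
        &self.residual
    }
}

impl<D: Default> RawEGraph<D> {
    fn new() -> Self {
        RawEGraph { residual: EGraphResidual::default(), classes: Vec::new() }
    }

    fn add(&mut self, node: Node) -> Id {
        if let Some(id) = self.lookup(&node) {
            return id; // hashcons hit: node already present
        }
        let canon = self.canonicalize(&node);
        let id = self.residual.parents.len();
        self.residual.parents.push(id);
        self.residual.memo.insert(canon, id);
        self.classes.push(D::default());
        id
    }

    /// Borrow one class's data mutably while keeping shared access to the residual.
    fn class_mut(&mut self, id: Id) -> (&mut D, &EGraphResidual) {
        let id = self.residual.find(id);
        (&mut self.classes[id], &self.residual)
    }
}
```

Because class_mut returns borrows of two different fields, the borrow checker accepts calling residual.find while the class data is still mutably borrowed, which is the situation EGraphResidual is meant to cover.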

Nothing in the RawEGraph implementation is pub(crate), so it could be extracted into its own crate, although it depends on Id, Language, UnionFind, and RecExpr, which would need to be moved as well.

mwillsey (Member) commented Feb 6, 2024

Cool! Just so I understand, what is the overall motivation here? Is this to reduce complexity in the current implementation? Or is there a particular use-case/feature that you are looking to support?

dewert99 (Contributor, Author) commented Feb 6, 2024

The main motivation is flexibility. Currently, anyone using EGraph is locked into the current explanation implementation and into tracking the nodes in each EClass. For example, I think it would be interesting to revisit the idea of database-based ematching within egg; that implementation probably wouldn't want to store nodes in each EClass, but it may want to use the various RawEGraph hooks to keep the database in sync with the egraph.
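As a rough illustration of how a union hook could keep an external database in sync, the sketch below passes a closure to a hypothetical raw_union that reports which id survived and which was merged away, and a hypothetical Database uses it to rename ids in its rows. The names and the closure-based design are assumptions for illustration, not the hooks actually defined in this PR.

```rust
use std::collections::HashMap;

type Id = usize;

/// Hypothetical minimal union-find standing in for the core of a raw e-graph.
struct RawUnionFind {
    parents: Vec<Id>,
}

impl RawUnionFind {
    fn new(n: usize) -> Self {
        RawUnionFind { parents: (0..n).collect() }
    }

    fn find(&self, mut id: Id) -> Id {
        while self.parents[id] != id {
            id = self.parents[id];
        }
        id
    }

    /// Union with a hook: on a real merge, the caller is told which id
    /// survived (first argument) and which was merged away (second).
    fn raw_union(&mut self, a: Id, b: Id, mut on_merge: impl FnMut(Id, Id)) -> bool {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return false; // already equal: hook is not called
        }
        self.parents[rb] = ra;
        on_merge(ra, rb);
        true
    }
}

/// Hypothetical external "database": relation name -> rows of ids.
struct Database {
    tables: HashMap<&'static str, Vec<Vec<Id>>>,
}

impl Database {
    /// Rewrite every occurrence of `old` to `new` so rows stay canonical.
    fn rename(&mut self, new: Id, old: Id) {
        for rows in self.tables.values_mut() {
            for row in rows.iter_mut() {
                for id in row.iter_mut() {
                    if *id == old {
                        *id = new;
                    }
                }
            }
        }
    }
}
```

Rescanning every row on each merge is deliberately naive; a real database would keep an index from ids to the rows that mention them.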

A potential side benefit is that, with the core egraph abstracted away, it may be less costly to revisit something like #284.

dewert99 mentioned this pull request Feb 13, 2024
oflatt (Member) commented Mar 22, 2024

This looks like an interesting refactor. Is one of the motivations of this PR to support multiple or composable Analysis?
I wonder if one strategy here is to work on a new rust-based API for egglog. It may be comparable work to doing this huge refactor of egg.
I started a thread on that here: egraphs-good/egglog#232

dewert99 (Contributor, Author)

> This looks like an interesting refactor. Is one of the motivations of this PR to support multiple or composable Analysis?

Partially. My idea was to have a flexible, lower-level plain egraph that doesn't do extra work (e.g. managing explanations, keeping track of the nodes in each class, or managing a lattice-based analysis, although this could already be turned off). My specific motivation was https://github.com/dewert99/bat_egg_smt, a toy QF_UF SMT solver. It uses a different strategy for explanations and doesn't need to store a list of nodes in each eclass, so I didn't want the overhead of a full egg::EGraph.

> I wonder if one strategy here is to work on a new rust-based API for egglog. It may be comparable work to doing this huge refactor of egg.

I haven't looked as closely at egglog, but it seems to work quite differently, so I'm guessing it would have quite different performance characteristics for an SMT solver.

oflatt (Member) commented Mar 22, 2024

Very cool! I'll be interested to see how you implement explanations and how the solver performs.

Egglog might have different performance characteristics, though it's usually faster than egg for a few reasons. Currently it doesn't have parent pointers, explanations, or uncanonical ids.

dewert99 (Contributor, Author) commented Mar 22, 2024

> Egglog might have different performance characteristics, though it's usually faster than egg for a few reasons. Currently it doesn't have parent pointers, explanations, or uncanonical ids.

When you say that it doesn't have uncanonical ids, do you mean that it doesn't store the original node for each uncanonical id? And how does rebuilding work without parent pointers?

dewert99 (Contributor, Author)

> I'll be interested to see how you implement explanations and how the solver performs.

My original explanation implementation was similar to egg's, but instead of trying to find the shortest explanation it always gave the oldest one (so each explain node only needed a single next connection instead of a list of all connections), and it used boolean literals (from the SAT solver) as justifications instead of symbols.

Recently I tried switching to a different implementation based on Section 2.1, "Union-find with an O(k log n) Explain operation", of https://www.cs.upc.edu/~oliveras/rta05.pdf, which seemed to give a minor performance benefit.
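The sketch below shows a proof forest of the kind described: each node stores a single next edge labelled with its justification (here a hypothetical i32 standing in for a SAT literal). A union reroots the smaller tree at one endpoint and links it to the other, and an explanation is the set of labels on the path between the two nodes through their nearest common ancestor. It is a simplified sketch, not a transcription of the paper's Section 2.1 and not the code in bat_egg_smt; in particular, it finds roots by walking the forest and omits whatever extra bookkeeping the paper uses to reach its stated bound.

```rust
type Id = usize;
/// Hypothetical justification type standing in for a SAT literal.
type Lit = i32;

/// Proof forest: each node has at most one outgoing, labelled edge.
struct ProofForest {
    next: Vec<Option<(Id, Lit)>>,
    size: Vec<usize>, // tree size, only meaningful at roots
}

impl ProofForest {
    fn new(n: usize) -> Self {
        ProofForest { next: vec![None; n], size: vec![1; n] }
    }

    fn root(&self, mut a: Id) -> Id {
        while let Some((b, _)) = self.next[a] {
            a = b;
        }
        a
    }

    /// Reverse the edges on the path from `a` to its root, making `a` the root.
    fn reroot(&mut self, a: Id) {
        let mut prev: Option<(Id, Lit)> = None;
        let mut cur = a;
        loop {
            let old = self.next[cur];
            self.next[cur] = prev;
            match old {
                Some((nxt, lit)) => {
                    prev = Some((cur, lit)); // reversed edge keeps its label
                    cur = nxt;
                }
                None => break,
            }
        }
    }

    /// Record that `a` and `b` are equal because of `lit`.
    fn union(&mut self, a: Id, b: Id, lit: Lit) -> bool {
        let (ra, rb) = (self.root(a), self.root(b));
        if ra == rb {
            return false;
        }
        // Reroot whichever endpoint lies in the smaller tree.
        let (a, b, ra, rb) =
            if self.size[ra] <= self.size[rb] { (a, b, ra, rb) } else { (b, a, rb, ra) };
        self.reroot(a);
        self.next[a] = Some((b, lit));
        self.size[rb] += self.size[ra];
        true
    }

    /// Justifications on the path a ~ b, or None if they are not equal.
    fn explain(&self, a: Id, b: Id) -> Option<Vec<Lit>> {
        let mut path_a = vec![a]; // a and all of its ancestors
        let mut cur = a;
        while let Some((n, _)) = self.next[cur] {
            path_a.push(n);
            cur = n;
        }
        let mut lits = Vec::new();
        let mut cur = b;
        while !path_a.contains(&cur) {
            let (n, lit) = self.next[cur]?; // hit b's root: different classes
            lits.push(lit);
            cur = n;
        }
        // `cur` is the nearest common ancestor; add a's side of the path.
        for &x in path_a.iter().take_while(|&&x| x != cur) {
            lits.push(self.next[x].unwrap().1);
        }
        Some(lits)
    }
}
```

Storing one Option per node instead of a Vec of connections is what keeps the per-node overhead small when only some valid (for example, oldest) explanation is needed rather than a shortest one.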

oflatt (Member) commented Mar 25, 2024

Yeah, it doesn't store the original node for uncanonical ids.

Rebuilding works basically by running a query. For example, rebuilding all Add nodes can be thought of as a binary join:

(Add child1 child2 eclass1)
(Add child1 child2 eclass2)

The query finds Add nodes that differ in the eclass they belong to. It then makes them equal. I think egglog's rebuilding is eggs as a result, and timestamps help it only find new matches to this query.
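A naive std-only sketch of the join-based view: the Add table is stored as rows (child1, child2, eclass); each pass canonicalizes every row, groups rows by their canonical children, and unions the eclasses of rows that collide, repeating until a pass performs no union. This is an illustration of the idea under simplifying assumptions, not egglog's implementation: it rescans the whole table on every pass instead of using timestamps to restrict attention to new rows, and UnionFind and rebuild_add_table are hypothetical names.

```rust
use std::collections::HashMap;

type Id = usize;

/// Hypothetical minimal union-find with path halving.
struct UnionFind {
    parents: Vec<Id>,
}

impl UnionFind {
    fn find(&mut self, mut x: Id) -> Id {
        while self.parents[x] != x {
            let grandparent = self.parents[self.parents[x]];
            self.parents[x] = grandparent; // path halving
            x = grandparent;
        }
        x
    }

    fn union(&mut self, a: Id, b: Id) -> bool {
        let (a, b) = (self.find(a), self.find(b));
        if a == b {
            return false;
        }
        self.parents[b] = a;
        true
    }
}

/// Rebuild one Add table by "self-joining" it on the canonical children:
/// rows that agree on (child1, child2) but not on eclass get unioned.
fn rebuild_add_table(table: &mut Vec<(Id, Id, Id)>, uf: &mut UnionFind) {
    loop {
        let mut changed = false;
        let mut seen: HashMap<(Id, Id), Id> = HashMap::new();
        for row in table.iter_mut() {
            // Canonicalize every column of the row.
            *row = (uf.find(row.0), uf.find(row.1), uf.find(row.2));
            let other = *seen.entry((row.0, row.1)).or_insert(row.2);
            if uf.union(other, row.2) {
                changed = true;
            }
        }
        // Rows that became identical after canonicalization collapse into one.
        table.sort();
        table.dedup();
        if !changed {
            break; // fixpoint: every row is canonical and congruence-closed
        }
    }
}
```

In the usage below, unioning b and c makes (Add a b) and (Add a c) congruent, and that union in turn makes two rows further up congruent, which the repeated passes pick up.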

oflatt (Member) commented Mar 25, 2024

That's the paper I used to implement egg's explain function. When proof size optimization is off, I believe it gives the oldest proof.
Have you tried without_explanation_length_optimization()?

dewert99 (Contributor, Author)

> Rebuilding works basically by running a query. For example, rebuilding all Add nodes can be thought of as a binary join:
>
> (Add child1 child2 eclass1)
> (Add child1 child2 eclass2)
>
> The query finds Add nodes that differ in the eclass they belong to. It then makes them equal. I think egglog's rebuilding is eggs as a result, and timestamps help it only find new matches to this query.

Sorry if I'm missing something, but suppose the database had (F child1 eclass1) and (F child2 eclass2), and I unioned child1 with child2, with child2 becoming canonical. The timestamp for (F child1 eclass1) would seem to need updating so that the rebuilding join would find it and union eclass1 with eclass2, and the association from child1 to the row (F child1 eclass1) would seem like a sort of parent pointer.
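The point can be sketched in code: to avoid rescanning the whole table after a union, the database needs an index from each child id to the rows that mention it, and that index does the job of parent pointers. FTable, repair, and the single-column (F child eclass) layout below are hypothetical, and for brevity the sketch only re-canonicalizes the child column, not the eclass column.

```rust
use std::collections::HashMap;

type Id = usize;

/// Hypothetical table of rows (F child eclass) plus an index from each
/// child id to the rows that mention it; the index acts like parent pointers.
struct FTable {
    rows: Vec<(Id, Id)>,               // (child, eclass)
    by_child: HashMap<Id, Vec<usize>>, // child id -> row indices
}

impl FTable {
    fn insert(&mut self, child: Id, eclass: Id) {
        self.by_child.entry(child).or_default().push(self.rows.len());
        self.rows.push((child, eclass));
    }

    /// After `old` has been merged into `new`, only the rows indexed under
    /// `old` need touching. Returns eclass pairs that are now congruent.
    fn repair(&mut self, old: Id, new: Id) -> Vec<(Id, Id)> {
        let mut to_union = Vec::new();
        let touched = self.by_child.remove(&old).unwrap_or_default();
        for &r in &touched {
            self.rows[r].0 = new;
            // An existing row with the same canonical child is congruent to this one.
            if let Some(&other) = self.by_child.get(&new).and_then(|rs| rs.first()) {
                to_union.push((self.rows[other].1, self.rows[r].1));
            }
            self.by_child.entry(new).or_default().push(r);
        }
        to_union
    }
}
```

Without by_child, finding the rows that mention child1 would require a scan of the table, which is why the index ends up playing the same role as egg's parent pointers.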

dewert99 (Contributor, Author)

> That's the paper I used to implement egg's explain function. When proof size optimization is off, it gives the oldest proof I believe.

Since I was only interested in the oldest proofs, I didn't want the overhead of keeping Vecs of connections.

oflatt (Member) commented Mar 26, 2024

> Sorry if I'm missing something, but suppose the database had (F child1 eclass1) and (F child2 eclass2), and I unioned child1 with child2, with child2 becoming canonical. The timestamp for (F child1 eclass1) would seem to need updating so that the rebuilding join would find it and union eclass1 with eclass2, and the association from child1 to the row (F child1 eclass1) would seem like a sort of parent pointer.

Good point! The indices that we build for queries are like parent pointers.

oflatt (Member) commented Mar 26, 2024

> > That's the paper I used to implement egg's explain function. When proof size optimization is off, it gives the oldest proof I believe.
>
> Since I was only interested in the oldest proofs, I didn't want the overhead of keeping Vecs of connections.

Is there a way to make the current implementation in egg as performant as yours when explanation optimization is disabled?
Sorry, this is off-topic for this PR.

dewert99 (Contributor, Author) commented Mar 26, 2024

> Is there a way to make the current implementation in egg as performant as yours when explanation optimization is disabled?

Probably not without making the explanation implementation a generic type parameter of EGraph instead of having it controlled at runtime.
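A minimal sketch of that suggestion, with hypothetical names (Explainer, NoExplain, LogExplain, TinyEGraph) that are not egg's API: the explanation strategy becomes a type parameter, so a graph built with a zero-sized no-op implementation carries no explanation state, and the empty hook call is typically inlined away by the compiler. LogExplain just records every union and is only there to show a second implementation plugging in.

```rust
/// Hypothetical explanation hook, selected at compile time.
trait Explainer: Default {
    fn on_union(&mut self, a: usize, b: usize, reason: u32);
}

/// Zero-sized "no explanations" implementation.
#[derive(Default)]
struct NoExplain;

impl Explainer for NoExplain {
    #[inline]
    fn on_union(&mut self, _a: usize, _b: usize, _reason: u32) {}
}

/// Hypothetical implementation that simply logs each union.
#[derive(Default)]
struct LogExplain {
    unions: Vec<(usize, usize, u32)>,
}

impl Explainer for LogExplain {
    fn on_union(&mut self, a: usize, b: usize, reason: u32) {
        self.unions.push((a, b, reason));
    }
}

/// Toy e-graph core, generic over the explanation strategy.
struct TinyEGraph<E: Explainer> {
    parents: Vec<usize>,
    explain: E,
}

impl<E: Explainer> TinyEGraph<E> {
    fn new(n: usize) -> Self {
        TinyEGraph { parents: (0..n).collect(), explain: E::default() }
    }

    fn find(&self, mut x: usize) -> usize {
        while self.parents[x] != x {
            x = self.parents[x];
        }
        x
    }

    fn union(&mut self, a: usize, b: usize, reason: u32) -> bool {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return false;
        }
        self.parents[rb] = ra;
        // Statically dispatched: no runtime flag is checked here.
        self.explain.on_union(a, b, reason);
        true
    }
}
```

The trade-off is that the explanation strategy can no longer be switched on an existing EGraph value at runtime, and every generic user of EGraph has to carry the extra type parameter.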

dewert99 (Contributor, Author) commented Apr 6, 2024

Now that #291 has been merged, I was wondering whether you would consider merging this, in which case I would try to resolve the conflicts. Otherwise, I will probably keep my fork mostly separate and maybe try to upstream any optimizations.

mwillsey (Member) commented Apr 8, 2024

I am currently thinking that it is not worth the increased complexity. Given that there isn't a ton of active development on egg, I'm really thinking we should prioritize simplicity. I do however like the idea of one day supporting DB-like matching in egg, but I'm not sure about this architecture. Happy to discuss further though!

dewert99 (Contributor, Author) commented Apr 8, 2024

> I do however like the idea of one day supporting DB-like matching in egg, but I'm not sure about this architecture.

Do you have any suggestions about how to improve the architecture?

mwillsey (Member) commented Apr 8, 2024 via email

Labels: None yet
Projects: None yet

3 participants