Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heading in the direction of a query engine #29

Open
christhekeele opened this issue Mar 15, 2017 · 3 comments
Open

Heading in the direction of a query engine #29

christhekeele opened this issue Mar 15, 2017 · 3 comments

Comments

@christhekeele
Copy link

I've been tinkering with a similar concept recently (coyly called MatchQL). I set out to build an abstract MatchQL.Adapter with concrete rudimentary ETS, DETS, and Mnesia implementations. After a lot of research, I realized there just isn't enough manual locking capability to build a meaningful transaction in ETS or DETS, which invalidated a lot of the project, so I gave those up and was left with Mnesia.

At that point there isn't enough to differentiate that package from this one (that I referenced heavily during my own attempts), so I thought I'd shelve some thoughts I had around the package here before retiring the effort.

You've already polyfilled some SQL features for Mnesia:

adapter emulates query.select and query.order_bys behaviours

You've also expressed interest in supporting joins. From my own tinkering, that's much less trivial and would require some substantial rearchitecting of how queries are run, since you need to start dissecting an Ecto.Query into several :mensia.select calls, and properly recombining their results.

Once you have that figured out, though, you can polyfill several other SQL capabilities, since you've effectively built your own query planner and are halfway to a full-fledged SQL query engine that just happens to use Mnesia:

  • GROUP BY
  • SUM and ilk
  • HAVING
  • DISTINCT
  • UNION and ilk
  • OFFSET
  • LIMIT

I was wondering if you'd be interested in taking this project in that direction.


What you call a Ecto.Mnesia.Record.Context I'd termed a MatchQL.Query, representing a query against a single Mnesia table. Of course, only simple Ecto.Querys can translate into this limited form:

  • The match head interprets the table schema and converts the key-value table.property stuff into positional references
  • The match condition represents a WHERE clause
  • The match body filters out what's desired for the SELECT statement

(I fit these into a separate MatchQL.Spec struct independent of an explicit table reference.)

After execution, you apply the ORDER BY in post production. (MatchQL.Query was pretty much just the table reference, a MatchQL.Spec, and some order clauses.

The low hanging fruits are the other post-production clauses. You could throw DISTINCT, OFFSET, and LIMIT into there pretty easily.

Slightly harder are the GROUP BY, aggregates, and HAVING clauses. They have to come before the SELECT but after the actual table query.

They can be handled by:

  • Applying the non-virtual fields from the SELECT to a MatchQL.Query and getting the results
  • Inject aggregates into the result tuples where referenced by reducing and mapping over the results
  • Reducing over the new result tuples to GROUP BY criteria
  • Create a MatchQL.Spec out of HAVING just as you did WHERE but apply it to the raw result list instead of a Mnesia table

Joins are much more tricky, since you have to introspect the Ecto.Query to generate each individual MatchQL.Query, produce a topological sort to determine dependency order, and thread resolved variables in-between them. Only after the FROM+JOIN pipeline can the GROUP BY pipeline start.

Once you've done that, subqueries are mostly a recursive case of the topological-dependency logic, and UNIONs are just special-case post-processors that fully perform two queries and bang the result sets together, ordering them in the process.

Instead of trying to convert an Ecto.Query into a single-table MatchQL.Query, I was trying to introduce a higher-level MatchQL.Query.Plan data structure as the conversion target, that when executed would ideally work through all these phases using Flow. About then I noticed ecto_mnesia had done all the work I had done to date, only better.


I don't know if you have any interest in trying to build a full-fledged SQL query engine on top of Mnesia but I suspect implementing JOINs will force you to get most of the way there.

The main goal of my project was to become intimate with the Ecto, ETS, and Mnesia low-level APIs, which I feel I have succeeded at. The insight into how one would have to implement a query engine has already proved useful at work, especially in finally understanding (vs just remembering) why and when one has to use HAVING, which is a plus.

If you are interested in taking ecto_mnesia this direction, it would be cool to know in case I decide to pursue the concept further and contribute. If not, it's nice to know that my research notes for MatchQL got read by someone before getting scrapped. :)

@AndrewDryga
Copy link
Member

Wow, thats a huge pile of knowledge in a single issue 👍.

I had many weeks of thinking should it be included or not. Eventually I think that we could do many SQL emulating, but we need to be very explicit when we emulate this features, because people that will use adapter could shoot themselves in the foot in case they will not understand that there is a loot of data copying happens in place.

What is business case, why do you decided to create SQL over Mnesia? :)

@christhekeele
Copy link
Author

Most of my work was just to learn, I don't have a particular business case. :)

The main goal of my project was to become intimate with the Ecto, ETS, and Mnesia low-level APIs, which I feel I have succeeded at. The insight into how one would have to implement a query engine has already proved useful.

Additionally, I have been tinkering with writing game servers in Elixir. Half my entities need to be persisted and come from Postgres, but many are ephemeral with frequent access and searching but rare joins, and having them in memory is handy. But today I can't use ecto on both, and I need limited, if not fast, support for most sql operators for my tools to be reusable between the two repos.

Other usecases I can think of:

  • defer commiting to a SQL backend until more of a project has been written
  • allowing libraries that use ecto to give their users an in-memory option
  • run integration tests against an actual SQL implementation without needing to set up an external SQL runtime

@AndrewDryga
Copy link
Member

@christhekeele Sorry for delayed response. Too many projects in Elixir this days :).

I guess we can agree on following properties:

  1. Postpone full SQL compatibility untill adapter itself is stable and passes Ecto integration tests.
  2. Later we can build another adapter (eg. EctoMnesia.SQL) just on top of EctoMnesia, that will add support for all emulated features.

If this path is ok for you, I will be totally glad to see any new contributors. Even if it doesn't, I know that there are plenty of stuff I need help with and you are still welcome :).

We should also take look at some SQL-to-NoSQL libraries, for eg:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants