Introduce LogicalPlan invariants, begin automatically checking them #13651

wiedld · 2024-12-05T00:10:55Z

Which issue does this PR close?

Rationale for this change

The original discussion included implicit changes which can cause problems when trying to upgrade DF. One class of bugs are related to user-constructed LPs, and the mutations of these LPs. This PR is a first step to programmatically enforce the rules of what should, and should not, be done.

What changes are included in this PR?

defined what must always be true for an LP.
- This applies to all LPs, including those constructed by user (e.g. own language frontends).
define the contract for semantic mutation.
- The LP may not be executable before the Analyzer passes.
- The LP must be executable after the Analyzer passes.
define what further can be mutated during the LP optimizer passes.
- The LP must be executable before, and after, the optimizer passes.
- The LP schema cannot be mutated by the optimizer.

Are these changes tested?

By existing tests, and are benchmarked for impact to planning time.

Are there any user-facing changes?

There is a new LogicalPlan::check_invariants public api.

…tation

…as, after each optimizer pass

datafusion/optimizer/src/analyzer/mod.rs

datafusion/optimizer/src/optimizer.rs

alamb

Thank you @wiedld -- I really like this idea

I filed an issue for this idea here: #13652

@findepi @jonahgao or @Omega359 I wonder if you have any thoughts on this basic idea?

datafusion/optimizer/src/optimizer.rs

Omega359 · 2024-12-05T03:24:50Z

I love the idea of having additional testing like this! It's definitely out of my area of expertise but a few questions:

Should invariants be pluggable? For example, Influxdb might have additional invariants that another derivative of DF does not (or vice versa)?
Is the invariant testing overhead acceptable for production or should it there be the capability to disable it (enabled by default of course!) ?
Is there an equivalent set of invariants elsewhere in the core besides the LP (maybe physical plan? Just guessing here as I'm definitely no expert here) where this approach could be duplicated ?

wiedld · 2024-12-05T04:37:57Z

datafusion/optimizer/src/optimizer.rs

+    // verify invariant: equivalent schema across union inputs
+    // assert_unions_are_valid(check_name, plan)?;
+
+    // TODO: trait API and provide extension on the Optimizer to define own validations?


Here is a mention of the extensibility of invariants. Options include:

for general invariants:

defined as being checked before/after each OptimizerRule, and applied here in check_plan() (or equivalent code)

we could provide Optimizer.invariants = Vec<Arc<dyn InvariantCheck>> for user-defined invariants

for invariants specific for a given OptimizerRule:

we could provide OptimizerRule::check_invariants() such that certain invariants are only checked for a given rule (instead of all rules)

for a user-defined OptimizerRule, users can also check their own invariants

Ditto for the AnalyzerRule passes. Altho I wasn't sure about how much is added complexity and planning time overhead - as @Omega359 mentions we could make it configurable (e.g. run for CI and debugging in downstream projects).

This WIP is about proposing different ideas of what we could do. 🤔

Maybe it can be controlled through environment variables, similar to RUST_LOG or RUST_BACKTRACE. Enable it for debugging when problems are encountered or during an upgrade.

do you have example use-case for user-defined plan invariants?

We have some special invariants for our SortPreservingMerge replacement, ProgressiveEval (related to time ranges of parquet files) that would be great to be able to encode

Maybe it can be controlled through environment variables, similar to RUST_LOG or RUST_BACKTRACE. Enable it for debugging when problems are encountered or during an upgrade.

We could also add it as a debug_assert! after each optimizer pass and call the real validation

After analyze

After all the optimizer passes

We have some special invariants for our SortPreservingMerge replacement, ProgressiveEval (related to time ranges of parquet files) that would be great to be able to encode

is this about LogicalPlan::Extension? I agree it makes sense to support validation of these if we validate the overall plan.

I've added this to the followup items list: #13652 (comment)

datafusion/optimizer/src/optimizer.rs

findepi · 2024-12-05T13:28:31Z

datafusion/optimizer/src/optimizer.rs

+/// This invariant is subject to change.
+/// refer: <https://github.com/apache/datafusion/issues/13525#issuecomment-2494046463>
+fn assert_unique_field_names(plan: &LogicalPlan) -> Result<()> {
+    plan.schema().check_names()?;


Is check_names also called whenever creating new DFSchema?

Yes, on every creation. But not after every merge. Which should be ok if no bug is introduced in the merge -- altho I would prefer to add the check there.

findepi · 2024-12-05T13:31:26Z

I like the verification. The only concern is the overhead of doing it, especially for large plans. Every optimizer pass gets a new verification of all plan nodes. i hope one day we go towards more iterative optimization and then we should be able to verify plan invariants locally. Which would guarantee that the verification overhead is linearly proportional to number of modifications applied to the plan. This would require eg that get_type is a constant operation (#12604), since plan verification should also include that the types do match.

alamb · 2024-12-06T20:53:29Z

I think it would also be great if we could consider moving the invariant check directly into LogicalPlan -- for exmaple
LogicalPlan::check_invariants

This would make this function easier to discover and for others to understand what the invariants are.

…sus assertions made after a given analyzer or optimizer pass

datafusion/optimizer/src/analyzer/mod.rs

datafusion/expr/src/logical_plan/plan.rs

wiedld · 2024-12-16T05:52:44Z

The updates focus on dividing out LP invariants, vs analyzer pass checks, vs optimizer pass checks (including invariants listed as the analyzer's responsibility). I added in reference links to invariants in the docs, in an attempt to delineate an invariant vs a valid plan. But I'm muddy on this boundary.

I'll make it configurable (or debug only) after I get feedback on this^^. 🙏🏼

alamb

This is looking great @wiedld -- thank you

I think the basic structure of assert_invariants is looking good.

I had a suggestion for how to combine the two different invariant checks into a single API.

In terms of planning:

Perhaps we can begin creating an epic for "Enforcing Invariants" and list items there to follow up on (like "Define the invariants for Union plans" for example)

In terms of this PR, once we solidify the API the only remaining concern I have is with potential performance slowdown of re-checking the invariants. When we have the code ready we can run the sql_planning benchmark to make sure this PR doesn't slow down planning. If it does, we can perhaps only run the invariant checks in debug builds

datafusion/optimizer/src/analyzer/mod.rs

datafusion/expr/src/logical_plan/plan.rs

datafusion/expr/src/logical_plan/mod.rs

…nges

…e (valid semantic plan) vs basic LP invariants

…o a TableScan's filter clause

datafusion/expr/src/logical_plan/invariants.rs

wiedld · 2024-12-18T00:17:41Z

Ran all sql_planner benchmarks once, and then re-ran anything showing 1.09 to 1.24 ratio difference (max 3.3ms change in 17ms total).

all benchmarks

c3d-standard-8 debian-12

group                                         main                                   with_invariants_check
-----                                         ----                                   ---------------------
logical_aggregate_with_join                   1.00   1015.8±9.56µs        ? ?/sec    1.00  1015.9±10.30µs        ? ?/sec
logical_select_all_from_1000                  1.00      4.7±0.03ms        ? ?/sec    1.01      4.8±0.01ms        ? ?/sec
logical_select_one_from_700                   1.00    762.6±7.60µs        ? ?/sec    1.00    763.6±5.10µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00    729.3±5.35µs        ? ?/sec    1.00    732.4±4.83µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00    712.5±4.97µs        ? ?/sec    1.00    714.2±5.51µs        ? ?/sec
physical_intersection                         1.00   1729.6±9.00µs        ? ?/sec    1.08   1872.4±9.49µs        ? ?/sec
physical_join_consider_sort                   1.00      2.4±0.02ms        ? ?/sec    1.19      2.9±0.02ms        ? ?/sec
physical_join_distinct                        1.00    702.1±8.19µs        ? ?/sec    1.01    706.8±6.23µs        ? ?/sec
physical_many_self_joins                      1.00     13.8±0.09ms        ? ?/sec    1.24     17.1±0.16ms        ? ?/sec
physical_plan_clickbench_all                  1.00    187.3±0.61ms        ? ?/sec    1.00    187.4±0.46ms        ? ?/sec
physical_plan_clickbench_q1                   1.01      2.7±0.02ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec
physical_plan_clickbench_q10                  1.01      3.6±0.04ms        ? ?/sec    1.00      3.6±0.03ms        ? ?/sec
physical_plan_clickbench_q11                  1.00      3.6±0.04ms        ? ?/sec    1.00      3.6±0.02ms        ? ?/sec
physical_plan_clickbench_q12                  1.00      3.8±0.02ms        ? ?/sec    1.00      3.8±0.02ms        ? ?/sec
physical_plan_clickbench_q13                  1.00      3.4±0.02ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec
physical_plan_clickbench_q14                  1.00      3.6±0.02ms        ? ?/sec    1.00      3.6±0.02ms        ? ?/sec
physical_plan_clickbench_q15                  1.00      3.5±0.02ms        ? ?/sec    1.00      3.5±0.02ms        ? ?/sec
physical_plan_clickbench_q16                  1.00      3.0±0.02ms        ? ?/sec    1.00      3.0±0.02ms        ? ?/sec
physical_plan_clickbench_q17                  1.00      3.1±0.03ms        ? ?/sec    1.00      3.1±0.02ms        ? ?/sec
physical_plan_clickbench_q18                  1.00      2.9±0.02ms        ? ?/sec    1.00      2.9±0.01ms        ? ?/sec
physical_plan_clickbench_q19                  1.00      3.6±0.02ms        ? ?/sec    1.00      3.6±0.02ms        ? ?/sec
physical_plan_clickbench_q2                   1.01      2.9±0.03ms        ? ?/sec    1.00      2.9±0.02ms        ? ?/sec
physical_plan_clickbench_q20                  1.00      2.7±0.02ms        ? ?/sec    1.01      2.7±0.02ms        ? ?/sec
physical_plan_clickbench_q21                  1.00      2.9±0.02ms        ? ?/sec    1.01      2.9±0.02ms        ? ?/sec
physical_plan_clickbench_q22                  1.00      3.7±0.02ms        ? ?/sec    1.01      3.7±0.02ms        ? ?/sec
physical_plan_clickbench_q23                  1.00      4.1±0.02ms        ? ?/sec    1.01      4.2±0.02ms        ? ?/sec
physical_plan_clickbench_q24                  1.00      4.9±0.02ms        ? ?/sec    1.04      5.0±0.02ms        ? ?/sec
physical_plan_clickbench_q25                  1.00      3.2±0.02ms        ? ?/sec    1.02      3.2±0.02ms        ? ?/sec
physical_plan_clickbench_q26                  1.00      2.9±0.01ms        ? ?/sec    1.02      3.0±0.02ms        ? ?/sec
physical_plan_clickbench_q27                  1.00      3.2±0.02ms        ? ?/sec    1.02      3.3±0.02ms        ? ?/sec
physical_plan_clickbench_q28                  1.00      3.8±0.02ms        ? ?/sec    1.02      3.9±0.02ms        ? ?/sec
physical_plan_clickbench_q29                  1.00      4.8±0.03ms        ? ?/sec    1.01      4.9±0.02ms        ? ?/sec
physical_plan_clickbench_q3                   1.02      2.9±0.04ms        ? ?/sec    1.00      2.9±0.02ms        ? ?/sec
physical_plan_clickbench_q30                  1.00     16.0±0.06ms        ? ?/sec    1.00     16.0±0.07ms        ? ?/sec
physical_plan_clickbench_q31                  1.00      3.9±0.02ms        ? ?/sec    1.01      3.9±0.02ms        ? ?/sec
physical_plan_clickbench_q32                  1.00      3.9±0.02ms        ? ?/sec    1.01      3.9±0.02ms        ? ?/sec
physical_plan_clickbench_q33                  1.00      3.5±0.02ms        ? ?/sec    1.00      3.5±0.02ms        ? ?/sec
physical_plan_clickbench_q34                  1.00      3.1±0.01ms        ? ?/sec    1.01      3.1±0.02ms        ? ?/sec
physical_plan_clickbench_q35                  1.00      3.3±0.02ms        ? ?/sec    1.00      3.3±0.02ms        ? ?/sec
physical_plan_clickbench_q36                  1.01      4.4±0.03ms        ? ?/sec    1.00      4.4±0.02ms        ? ?/sec
physical_plan_clickbench_q37                  1.00      4.4±0.02ms        ? ?/sec    1.01      4.4±0.02ms        ? ?/sec
physical_plan_clickbench_q38                  1.00      4.4±0.02ms        ? ?/sec    1.00      4.4±0.02ms        ? ?/sec
physical_plan_clickbench_q39                  1.00      4.0±0.02ms        ? ?/sec    1.01      4.0±0.02ms        ? ?/sec
physical_plan_clickbench_q4                   1.02      2.7±0.03ms        ? ?/sec    1.00      2.7±0.02ms        ? ?/sec
physical_plan_clickbench_q40                  1.00      4.5±0.02ms        ? ?/sec    1.00      4.5±0.02ms        ? ?/sec
physical_plan_clickbench_q41                  1.00      4.3±0.02ms        ? ?/sec    1.00      4.3±0.02ms        ? ?/sec
physical_plan_clickbench_q42                  1.00      4.1±0.02ms        ? ?/sec    1.01      4.1±0.02ms        ? ?/sec
physical_plan_clickbench_q43                  1.00      4.1±0.02ms        ? ?/sec    1.00      4.2±0.02ms        ? ?/sec
physical_plan_clickbench_q44                  1.00      2.8±0.02ms        ? ?/sec    1.00      2.8±0.01ms        ? ?/sec
physical_plan_clickbench_q45                  1.00      2.8±0.01ms        ? ?/sec    1.00      2.8±0.01ms        ? ?/sec
physical_plan_clickbench_q46                  1.01      3.3±0.02ms        ? ?/sec    1.00      3.3±0.02ms        ? ?/sec
physical_plan_clickbench_q47                  1.00      3.9±0.02ms        ? ?/sec    1.00      3.9±0.02ms        ? ?/sec
physical_plan_clickbench_q48                  1.00      4.4±0.02ms        ? ?/sec    1.00      4.4±0.02ms        ? ?/sec
physical_plan_clickbench_q49                  1.00      4.6±0.03ms        ? ?/sec    1.00      4.6±0.02ms        ? ?/sec
physical_plan_clickbench_q5                   1.01      2.9±0.02ms        ? ?/sec    1.00      2.8±0.02ms        ? ?/sec
physical_plan_clickbench_q6                   1.01      2.9±0.02ms        ? ?/sec    1.00      2.9±0.01ms        ? ?/sec
physical_plan_clickbench_q7                   1.02      3.4±0.03ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec
physical_plan_clickbench_q8                   1.01      3.1±0.03ms        ? ?/sec    1.00      3.1±0.02ms        ? ?/sec
physical_plan_clickbench_q9                   1.02      3.4±0.03ms        ? ?/sec    1.00      3.3±0.02ms        ? ?/sec
physical_plan_tpcds_all                       1.00   1178.8±3.25ms        ? ?/sec    1.00   1183.7±0.95ms        ? ?/sec
physical_plan_tpch_all                        1.00     73.1±0.31ms        ? ?/sec    1.00     73.1±0.21ms        ? ?/sec
physical_plan_tpch_q1                         1.00      2.6±0.01ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec
physical_plan_tpch_q10                        1.00      3.6±0.01ms        ? ?/sec    1.01      3.6±0.01ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.1±0.01ms        ? ?/sec    1.01      3.1±0.01ms        ? ?/sec
physical_plan_tpch_q12                        1.00      2.5±0.01ms        ? ?/sec    1.01      2.5±0.01ms        ? ?/sec
physical_plan_tpch_q13                        1.00   1833.9±8.83µs        ? ?/sec    1.00  1830.2±10.09µs        ? ?/sec
physical_plan_tpch_q14                        1.00      2.2±0.01ms        ? ?/sec    1.01      2.2±0.01ms        ? ?/sec
physical_plan_tpch_q16                        1.00      3.2±0.01ms        ? ?/sec    1.01      3.2±0.01ms        ? ?/sec
physical_plan_tpch_q17                        1.00      2.9±0.01ms        ? ?/sec    1.01      2.9±0.01ms        ? ?/sec
physical_plan_tpch_q18                        1.00      3.2±0.01ms        ? ?/sec    1.00      3.2±0.01ms        ? ?/sec
physical_plan_tpch_q19                        1.00      5.4±0.02ms        ? ?/sec    1.01      5.4±0.02ms        ? ?/sec
physical_plan_tpch_q2                         1.00      6.3±0.03ms        ? ?/sec    1.01      6.3±0.02ms        ? ?/sec
physical_plan_tpch_q20                        1.00      3.8±0.01ms        ? ?/sec    1.00      3.8±0.01ms        ? ?/sec
physical_plan_tpch_q21                        1.00      5.1±0.02ms        ? ?/sec    1.00      5.1±0.01ms        ? ?/sec
physical_plan_tpch_q22                        1.00      2.9±0.01ms        ? ?/sec    1.00      2.9±0.01ms        ? ?/sec
physical_plan_tpch_q3                         1.00      2.5±0.01ms        ? ?/sec    1.01      2.6±0.01ms        ? ?/sec
physical_plan_tpch_q4                         1.00  1995.3±10.65µs        ? ?/sec    1.01      2.0±0.01ms        ? ?/sec
physical_plan_tpch_q5                         1.00      3.5±0.01ms        ? ?/sec    1.02      3.6±0.01ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1364.7±12.81µs        ? ?/sec    1.01   1375.4±8.17µs        ? ?/sec
physical_plan_tpch_q7                         1.00      4.7±0.02ms        ? ?/sec    1.02      4.8±0.02ms        ? ?/sec
physical_plan_tpch_q8                         1.00      5.6±0.02ms        ? ?/sec    1.01      5.7±0.03ms        ? ?/sec
physical_plan_tpch_q9                         1.00      4.3±0.02ms        ? ?/sec    1.01      4.4±0.01ms        ? ?/sec
physical_select_aggregates_from_200           1.00     24.6±0.09ms        ? ?/sec    1.01     24.7±0.09ms        ? ?/sec
physical_select_all_from_1000                 1.00     44.8±0.44ms        ? ?/sec    1.03     46.3±0.27ms        ? ?/sec
physical_select_one_from_700                  1.00      2.5±0.01ms        ? ?/sec    1.15      2.9±0.02ms        ? ?/sec
physical_theta_join_consider_sort             1.00      2.7±0.01ms        ? ?/sec    1.17      3.2±0.02ms        ? ?/sec
physical_unnest_to_join                       1.00      2.5±0.01ms        ? ?/sec    1.09      2.7±0.02ms        ? ?/sec
with_param_values_many_columns                1.02    206.4±1.18µs        ? ?/sec    1.00    202.4±0.89µs        ? ?/sec

confirmation run

group                                         main                                   with_invariants_check
-----                                         ----                                   ---------------------
physical_join_consider_sort                   1.00      2.4±0.01ms        ? ?/sec    1.19      2.9±0.01ms        ? ?/sec
physical_many_self_joins                      1.00     13.9±0.09ms        ? ?/sec    1.23     17.0±0.11ms        ? ?/sec
physical_select_one_from_700                  1.00      2.5±0.01ms        ? ?/sec    1.15      2.9±0.02ms        ? ?/sec
physical_theta_join_consider_sort             1.00      2.7±0.02ms        ? ?/sec    1.15      3.2±0.01ms        ? ?/sec
physical_unnest_to_join                       1.00      2.4±0.01ms        ? ?/sec    1.10      2.7±0.02ms        ? ?/sec

The performance change was minimized because we didn't add that many extra timepoints of checking. Specifically:

(new check): the subset of InvariantLevel::Always checks occurs once before the analyzer starts
(existing/main check): the full InvariantLevel::Executable check occurs once after analyzer is done.
(existing/main check): each optimizer pass only does a check that the schema is not changed
(new check): the full InvariantLevel::Executable check occurs once before all optimizer passes start
(existing/main check): the full InvariantLevel::Executable check occurs once after all optimizer passes start

I propose that we add another full InvariantLevel::Executable between each optimization pass, that only runs in debug mode (since that will be a performance impact as @findepi mentioned).

I can also make some of these check listed above into debug only, if the current performance change is unacceptable.

wiedld · 2024-12-18T06:28:27Z

(existing/main check): each optimizer pass only does a check that the schema is not changed

We could also reduce the time on this one by making it compile time with Arc::ptr_eq, but that means (a) a stricter contract (metadata & nullability cannot change) and (b) it requires code changes to chase down the current pointer_eq failures. 🤔

alamb · 2024-12-20T21:45:49Z

I propose that we add another full InvariantLevel::Executable between each optimization pass, that only runs in debug mode (since that will be a performance impact as @findepi mentioned).

Perhaps we can do this as a follow on PR

I can also make some of these check listed above into debug only, if the current performance change is unacceptable.

Yes, I think this is important for two reasons:

I think we can't introduce a 25% planning performance regression
Adidng a check to help development that will generate errors in plans that used to run to completion will likely be seen as a regression for users

I am checking out the rest of this PR now as well

alamb

Thank you @wiedld - I think this is a nice step towards keeping Datafusion stable for building other systems. Other than the performance regression I think this PR is ready to go.

datafusion/expr/src/logical_plan/invariants.rs

datafusion/optimizer/src/analyzer/mod.rs

datafusion/optimizer/src/optimizer.rs

…nges

…ter all optimizer passes, except in debug mode it runs after each pass.

wiedld · 2024-12-24T04:08:03Z

Fixed the performance regression. It wasn't where we thought it was.

The problem was a recursive check (down the LP) of the check_fields within the assert_unique_field_names(). I've removed the recursive nature of this check.

Final numbers, no regression

$ critcmp main invariants__no_recursive_namecheck__schema_check_per_optimizer_pass
group                                         invariants__no_recursive_namecheck__schema_check_per_optimizer_pass    main
-----                                         -------------------------------------------------------------------    ----
logical_aggregate_with_join                   1.00    987.0±5.66µs        ? ?/sec                                    1.02  1007.8±10.29µs        ? ?/sec
logical_select_all_from_1000                  1.04      5.0±0.04ms        ? ?/sec                                    1.00      4.8±0.04ms        ? ?/sec
logical_select_one_from_700                   1.00    739.4±5.38µs        ? ?/sec                                    1.01    743.5±5.70µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00    703.2±5.00µs        ? ?/sec                                    1.02    718.2±4.64µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00    685.1±5.28µs        ? ?/sec                                    1.02    699.6±3.40µs        ? ?/sec
physical_intersection                         1.00  1726.0±10.72µs        ? ?/sec                                    1.00  1726.6±15.43µs        ? ?/sec
physical_join_consider_sort                   1.00      2.4±0.02ms        ? ?/sec                                    1.02      2.5±0.01ms        ? ?/sec
physical_join_distinct                        1.00    677.2±5.94µs        ? ?/sec                                    1.02    689.2±7.02µs        ? ?/sec
physical_many_self_joins                      1.00     13.6±0.07ms        ? ?/sec                                    1.02     13.9±0.07ms        ? ?/sec
physical_plan_clickbench_all                  1.00    181.7±0.39ms        ? ?/sec                                    1.00    181.5±0.36ms        ? ?/sec
physical_plan_clickbench_q1                   1.00      2.6±0.01ms        ? ?/sec                                    1.00      2.6±0.01ms        ? ?/sec
physical_plan_clickbench_q10                  1.00      3.5±0.01ms        ? ?/sec                                    1.00      3.5±0.01ms        ? ?/sec
physical_plan_clickbench_q11                  1.00      3.5±0.01ms        ? ?/sec                                    1.00      3.5±0.01ms        ? ?/sec
physical_plan_clickbench_q12                  1.00      3.6±0.01ms        ? ?/sec                                    1.01      3.7±0.22ms        ? ?/sec
physical_plan_clickbench_q13                  1.00      3.3±0.01ms        ? ?/sec                                    1.00      3.3±0.01ms        ? ?/sec
physical_plan_clickbench_q14                  1.00      3.5±0.01ms        ? ?/sec                                    1.00      3.5±0.01ms        ? ?/sec
physical_plan_clickbench_q15                  1.00      3.4±0.01ms        ? ?/sec                                    1.00      3.4±0.01ms        ? ?/sec
physical_plan_clickbench_q16                  1.00      3.0±0.01ms        ? ?/sec                                    1.00      2.9±0.01ms        ? ?/sec
physical_plan_clickbench_q17                  1.00      3.0±0.01ms        ? ?/sec                                    1.00      3.0±0.01ms        ? ?/sec
physical_plan_clickbench_q18                  1.01      2.8±0.01ms        ? ?/sec                                    1.00      2.8±0.01ms        ? ?/sec
physical_plan_clickbench_q19                  1.00      3.5±0.01ms        ? ?/sec                                    1.00      3.4±0.01ms        ? ?/sec
physical_plan_clickbench_q2                   1.00      2.8±0.01ms        ? ?/sec                                    1.00      2.8±0.01ms        ? ?/sec
physical_plan_clickbench_q20                  1.00      2.6±0.01ms        ? ?/sec                                    1.00      2.6±0.01ms        ? ?/sec
physical_plan_clickbench_q21                  1.01      2.8±0.01ms        ? ?/sec                                    1.00      2.8±0.01ms        ? ?/sec
physical_plan_clickbench_q22                  1.00      3.6±0.01ms        ? ?/sec                                    1.01      3.6±0.01ms        ? ?/sec
physical_plan_clickbench_q23                  1.00      4.0±0.01ms        ? ?/sec                                    1.00      4.0±0.02ms        ? ?/sec
physical_plan_clickbench_q24                  1.01      4.8±0.02ms        ? ?/sec                                    1.00      4.7±0.01ms        ? ?/sec
physical_plan_clickbench_q25                  1.01      3.2±0.02ms        ? ?/sec                                    1.00      3.1±0.01ms        ? ?/sec
physical_plan_clickbench_q26                  1.01      2.9±0.01ms        ? ?/sec                                    1.00      2.8±0.01ms        ? ?/sec
physical_plan_clickbench_q27                  1.00      3.2±0.01ms        ? ?/sec                                    1.00      3.2±0.01ms        ? ?/sec
physical_plan_clickbench_q28                  1.00      3.7±0.01ms        ? ?/sec                                    1.00      3.7±0.02ms        ? ?/sec
physical_plan_clickbench_q29                  1.00      4.7±0.02ms        ? ?/sec                                    1.00      4.7±0.01ms        ? ?/sec
physical_plan_clickbench_q3                   1.01      2.8±0.01ms        ? ?/sec                                    1.00      2.8±0.01ms        ? ?/sec
physical_plan_clickbench_q30                  1.01     15.5±0.41ms        ? ?/sec                                    1.00     15.3±0.04ms        ? ?/sec
physical_plan_clickbench_q31                  1.00      3.8±0.02ms        ? ?/sec                                    1.00      3.8±0.01ms        ? ?/sec
physical_plan_clickbench_q32                  1.00      3.8±0.01ms        ? ?/sec                                    1.00      3.8±0.01ms        ? ?/sec
physical_plan_clickbench_q33                  1.00      3.4±0.01ms        ? ?/sec                                    1.00      3.4±0.01ms        ? ?/sec
physical_plan_clickbench_q34                  1.00      3.1±0.01ms        ? ?/sec                                    1.00      3.1±0.01ms        ? ?/sec
physical_plan_clickbench_q35                  1.00      3.2±0.01ms        ? ?/sec                                    1.00      3.2±0.01ms        ? ?/sec
physical_plan_clickbench_q36                  1.00      4.2±0.01ms        ? ?/sec                                    1.00      4.2±0.02ms        ? ?/sec
physical_plan_clickbench_q37                  1.00      4.3±0.01ms        ? ?/sec                                    1.00      4.3±0.02ms        ? ?/sec
physical_plan_clickbench_q38                  1.00      4.3±0.02ms        ? ?/sec                                    1.00      4.3±0.01ms        ? ?/sec
physical_plan_clickbench_q39                  1.00      3.8±0.01ms        ? ?/sec                                    1.00      3.8±0.01ms        ? ?/sec
physical_plan_clickbench_q4                   1.00      2.6±0.01ms        ? ?/sec                                    1.00      2.6±0.01ms        ? ?/sec
physical_plan_clickbench_q40                  1.00      4.3±0.01ms        ? ?/sec                                    1.00      4.3±0.02ms        ? ?/sec
physical_plan_clickbench_q41                  1.00      4.1±0.01ms        ? ?/sec                                    1.00      4.1±0.02ms        ? ?/sec
physical_plan_clickbench_q42                  1.00      4.0±0.02ms        ? ?/sec                                    1.00      4.0±0.01ms        ? ?/sec
physical_plan_clickbench_q43                  1.00      4.0±0.01ms        ? ?/sec                                    1.00      4.0±0.01ms        ? ?/sec
physical_plan_clickbench_q44                  1.00      2.7±0.01ms        ? ?/sec                                    1.00      2.7±0.01ms        ? ?/sec
physical_plan_clickbench_q45                  1.00      2.8±0.01ms        ? ?/sec                                    1.00      2.7±0.01ms        ? ?/sec
physical_plan_clickbench_q46                  1.00      3.2±0.01ms        ? ?/sec                                    1.00      3.3±0.01ms        ? ?/sec
physical_plan_clickbench_q47                  1.00      3.8±0.01ms        ? ?/sec                                    1.00      3.8±0.01ms        ? ?/sec
physical_plan_clickbench_q48                  1.00      4.3±0.01ms        ? ?/sec                                    1.00      4.3±0.01ms        ? ?/sec
physical_plan_clickbench_q49                  1.00      4.5±0.02ms        ? ?/sec                                    1.00      4.5±0.01ms        ? ?/sec
physical_plan_clickbench_q5                   1.00      2.8±0.01ms        ? ?/sec                                    1.00      2.7±0.01ms        ? ?/sec
physical_plan_clickbench_q6                   1.00      2.8±0.01ms        ? ?/sec                                    1.00      2.8±0.01ms        ? ?/sec
physical_plan_clickbench_q7                   1.01      3.3±0.01ms        ? ?/sec                                    1.00      3.3±0.01ms        ? ?/sec
physical_plan_clickbench_q8                   1.00      3.0±0.01ms        ? ?/sec                                    1.00      3.0±0.01ms        ? ?/sec
physical_plan_clickbench_q9                   1.00      3.2±0.02ms        ? ?/sec                                    1.00      3.3±0.02ms        ? ?/sec
physical_plan_tpcds_all                       1.00   1160.9±2.22ms        ? ?/sec                                    1.00   1161.8±5.74ms        ? ?/sec
physical_plan_tpch_all                        1.00     72.5±0.23ms        ? ?/sec                                    1.00     72.7±0.20ms        ? ?/sec
physical_plan_tpch_q1                         1.00      2.6±0.01ms        ? ?/sec                                    1.00      2.6±0.01ms        ? ?/sec
physical_plan_tpch_q10                        1.00      3.5±0.01ms        ? ?/sec                                    1.00      3.5±0.01ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.1±0.01ms        ? ?/sec                                    1.01      3.2±0.17ms        ? ?/sec
physical_plan_tpch_q12                        1.00      2.5±0.01ms        ? ?/sec                                    1.00      2.5±0.01ms        ? ?/sec
physical_plan_tpch_q13                        1.00   1832.8±8.27µs        ? ?/sec                                    1.00   1832.2±7.35µs        ? ?/sec
physical_plan_tpch_q14                        1.00      2.2±0.01ms        ? ?/sec                                    1.00      2.2±0.01ms        ? ?/sec
physical_plan_tpch_q16                        1.00      3.2±0.01ms        ? ?/sec                                    1.00      3.2±0.01ms        ? ?/sec
physical_plan_tpch_q17                        1.00      2.9±0.01ms        ? ?/sec                                    1.00      2.9±0.01ms        ? ?/sec
physical_plan_tpch_q18                        1.00      3.2±0.01ms        ? ?/sec                                    1.00      3.2±0.01ms        ? ?/sec
physical_plan_tpch_q19                        1.00      5.3±0.01ms        ? ?/sec                                    1.00      5.3±0.02ms        ? ?/sec
physical_plan_tpch_q2                         1.01      6.4±0.03ms        ? ?/sec                                    1.00      6.3±0.02ms        ? ?/sec
physical_plan_tpch_q20                        1.00      3.9±0.01ms        ? ?/sec                                    1.00      3.8±0.02ms        ? ?/sec
physical_plan_tpch_q21                        1.00      5.1±0.01ms        ? ?/sec                                    1.00      5.1±0.02ms        ? ?/sec
physical_plan_tpch_q22                        1.00      2.9±0.01ms        ? ?/sec                                    1.00      2.9±0.01ms        ? ?/sec
physical_plan_tpch_q3                         1.00      2.5±0.01ms        ? ?/sec                                    1.00      2.5±0.01ms        ? ?/sec
physical_plan_tpch_q4                         1.00   1994.6±9.01µs        ? ?/sec                                    1.00   1993.7±9.04µs        ? ?/sec
physical_plan_tpch_q5                         1.01      3.6±0.02ms        ? ?/sec                                    1.00      3.5±0.01ms        ? ?/sec
physical_plan_tpch_q6                         1.00   1350.1±7.99µs        ? ?/sec                                    1.00   1349.8±7.07µs        ? ?/sec
physical_plan_tpch_q7                         1.01      4.7±0.02ms        ? ?/sec                                    1.00      4.7±0.01ms        ? ?/sec
physical_plan_tpch_q8                         1.01      5.6±0.03ms        ? ?/sec                                    1.00      5.6±0.02ms        ? ?/sec
physical_plan_tpch_q9                         1.00      4.3±0.02ms        ? ?/sec                                    1.00      4.3±0.01ms        ? ?/sec
physical_select_aggregates_from_200           1.00     24.3±0.09ms        ? ?/sec                                    1.00     24.3±0.05ms        ? ?/sec
physical_select_all_from_1000                 1.01     43.2±0.35ms        ? ?/sec                                    1.00     42.6±0.41ms        ? ?/sec
physical_select_one_from_700                  1.00      2.5±0.01ms        ? ?/sec                                    1.02      2.5±0.01ms        ? ?/sec
physical_theta_join_consider_sort             1.00      2.7±0.02ms        ? ?/sec                                    1.02      2.8±0.01ms        ? ?/sec
physical_unnest_to_join                       1.00      2.4±0.01ms        ? ?/sec                                    1.02      2.5±0.01ms        ? ?/sec
with_param_values_many_columns                1.00    183.9±0.62µs        ? ?/sec                                    1.00    183.2±0.56µs        ? ?/sec

The release vs debug mode only has a single difference in the checks. The debug mode will run a full InvariantLevel::Executable check per each optimizer pass (instead of after all pass).

…cks without impa ct: * assert_valid_optimization can run each optimizer pass * remove the recursive cehck_fields, which caused the performance regression * the full LP Invariants::Executable can only run in debug

alamb

Thank you @wiedld -- this looks like a great improvement

I left some comments but I don't they are required

In my opinion this is needed before merging:

I will wait 24 to allow time for anyone else to comment
I am also double-checking the benchmarks

If you have time I do think some of my comments would improve the usability of this PR as well (e.g. the message changes)

Also, shall we file some follow on tickets for follow on work?

datafusion/expr/src/logical_plan/invariants.rs

datafusion/expr/src/utils.rs

datafusion/optimizer/src/analyzer/mod.rs

datafusion/optimizer/src/optimizer.rs

alamb · 2024-12-24T12:53:13Z

datafusion/optimizer/src/optimizer.rs

+                        })?;
+
+                    // run LP invariant checks only in debug
+                    #[cfg(debug_assertions)]


I also double checked this is the right name: https://doc.rust-lang.org/reference/conditional-compilation.html#debug_assertions

datafusion/optimizer/src/optimizer.rs

alamb · 2024-12-24T12:56:24Z

datafusion/optimizer/src/optimizer.rs

@@ -529,7 +567,9 @@ mod tests {
        let err = opt.optimize(plan, &config, &observe).unwrap_err();
        assert_eq!(
            "Optimizer rule 'get table_scan rule' failed\n\
-            caused by\nget table_scan rule\ncaused by\n\
+            caused by\n\
+            check_optimizer_specific_invariants after optimizer pass: get table_scan rule\n\


that is nicer

datafusion/expr/src/logical_plan/invariants.rs

berkaysynnada · 2024-12-24T13:33:32Z

datafusion/optimizer/src/optimizer.rs

                .and_then(|tnr| {
-                    assert_schema_is_the_same(rule.name(), &starting_schema, &tnr.data)?;
+                    // run checks optimizer invariant checks, per pass


what did you mean exactly here?

The correct statement is // run checks optimizer invariant checks, per optimizer rule applied. And then multiple rules are applied per pass. Statement is fixed. Thanks!.

alamb · 2024-12-24T13:58:56Z

My performance benchmarks show also show now difference ✅

++ critcmp main 13525_invariant-checking-for-implicit-LP-changes
group                                         13525_invariant-checking-for-implicit-LP-changes    main
-----                                         ------------------------------------------------    ----
logical_aggregate_with_join                   1.00  1498.7±38.07µs        ? ?/sec                 1.00  1494.0±18.89µs        ? ?/sec
logical_select_all_from_1000                  1.00      5.3±0.05ms        ? ?/sec                 1.00      5.3±0.05ms        ? ?/sec
logical_select_one_from_700                   1.00  1169.9±26.47µs        ? ?/sec                 1.01  1179.5±14.41µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00  1154.9±13.48µs        ? ?/sec                 1.00  1158.7±16.27µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00  1140.9±18.36µs        ? ?/sec                 1.00  1140.3±14.97µs        ? ?/sec
physical_intersection                         1.00      2.5±0.02ms        ? ?/sec                 1.00      2.5±0.02ms        ? ?/sec
physical_join_consider_sort                   1.00      3.3±0.02ms        ? ?/sec                 1.00      3.3±0.04ms        ? ?/sec
physical_join_distinct                        1.00  1130.5±14.83µs        ? ?/sec                 1.00  1133.2±15.97µs        ? ?/sec
physical_many_self_joins                      1.01     17.5±0.10ms        ? ?/sec                 1.00     17.4±0.08ms        ? ?/sec
physical_plan_clickbench_all                  1.00    227.8±1.70ms        ? ?/sec                 1.01    230.0±3.38ms        ? ?/sec

jonahgao

LGTM👍, thanks @wiedld

datafusion/expr/src/logical_plan/invariants.rs

datafusion/optimizer/src/optimizer.rs

wiedld · 2024-12-24T21:25:14Z

Thank you for the reviews.
I made the requested changes to the terminology and error messages.

.

Also, shall we file some follow on tickets for follow on work?

I have a task lists here for follow up items. I was planning to dig more into code to assess viability (& I can make tickets then too) as I start implementing.

datafusion/sqllogictest/test_files/subquery.slt

…nges

alamb · 2024-12-25T12:10:20Z

Looks like Ci is failing on some tests

alamb

Thank you @wiedld -- I think this is a nice step forward in helping avoid bugs in DataFusion ❤️

Thanks @jonahgao @findepi @Sl1mb0 and @berkaysynnada

I'll merge this PR now and we can address any other suggestions as a follow on PR

wiedld added 3 commits December 4, 2024 15:24

minor(13525): perform LP validation before and after each possible mu…

6d43dc2

…tation

minor(13525): validate unique field names on query and subquery schem…

a855811

…as, after each optimizer pass

minor(13525): validate union after each optimizer passes

0163a40

github-actions bot added the optimizer Optimizer rules label Dec 5, 2024

wiedld commented Dec 5, 2024

View reviewed changes

datafusion/optimizer/src/analyzer/mod.rs Outdated Show resolved Hide resolved

wiedld commented Dec 5, 2024

View reviewed changes

datafusion/optimizer/src/optimizer.rs Outdated Show resolved Hide resolved

alamb mentioned this pull request Dec 5, 2024

Automatically check "invariants" #13652

Open

alamb reviewed Dec 5, 2024

View reviewed changes

datafusion/optimizer/src/optimizer.rs Outdated Show resolved Hide resolved

wiedld commented Dec 5, 2024

View reviewed changes

findepi reviewed Dec 5, 2024

View reviewed changes

refactor: make explicit what is an invariant of the logical plan, ver…

bee7e92

…sus assertions made after a given analyzer or optimizer pass

github-actions bot added the logical-expr Logical plan and expressions label Dec 16, 2024

wiedld commented Dec 16, 2024

View reviewed changes

datafusion/optimizer/src/analyzer/mod.rs Outdated Show resolved Hide resolved

chore: add link to invariant docs

4eee9c4

wiedld commented Dec 16, 2024

View reviewed changes

datafusion/expr/src/logical_plan/plan.rs Outdated Show resolved Hide resolved

alamb reviewed Dec 16, 2024

View reviewed changes

fix: add new invariants module

a7d9770

alamb mentioned this pull request Dec 16, 2024

Minor: improve Analyzer docs #13798

Merged

wiedld added 4 commits December 17, 2024 12:39

Merge branch 'main' into 13525/invariant-checking-for-implicit-LP-cha…

72718ad

…nges

refactor: move all LP invariant checking into LP, delineate executabl…

2002b1a

…e (valid semantic plan) vs basic LP invariants

test: update test for slight error message change

fbc9c46

fix: push_down_filter optimization pass can push a IN(<subquery>) int…

e52187e

…o a TableScan's filter clause

wiedld force-pushed the 13525/invariant-checking-for-implicit-LP-changes branch from 651f5d3 to e52187e Compare December 17, 2024 23:35

github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Dec 17, 2024

wiedld commented Dec 17, 2024

View reviewed changes

datafusion/expr/src/logical_plan/invariants.rs Show resolved Hide resolved

wiedld marked this pull request as ready for review December 18, 2024 05:58

alamb changed the title ~~Define LogicalPlan invariants, delineated by invariant type.~~ Introduce LogicalPlan invariants, begin automatically checking them Dec 20, 2024

alamb reviewed Dec 20, 2024

View reviewed changes

wiedld added 2 commits December 23, 2024 11:04

Merge branch 'main' into 13525/invariant-checking-for-implicit-LP-cha…

ba26f13

…nges

refactor: move collect_subquery_cols() to common utils crate

ad1a1f8

wiedld marked this pull request as draft December 23, 2024 20:15

wiedld force-pushed the 13525/invariant-checking-for-implicit-LP-changes branch from 10cd5d5 to 1164a7b Compare December 23, 2024 20:26

refactor: clarify the purpose of assert_valid_optimization(), runs af…

1164a7b

…ter all optimizer passes, except in debug mode it runs after each pass.

wiedld marked this pull request as ready for review December 24, 2024 04:10

alamb approved these changes Dec 24, 2024

View reviewed changes

berkaysynnada reviewed Dec 24, 2024

View reviewed changes

datafusion/expr/src/logical_plan/invariants.rs Outdated Show resolved Hide resolved

berkaysynnada reviewed Dec 24, 2024

View reviewed changes

jonahgao approved these changes Dec 24, 2024

View reviewed changes

datafusion/expr/src/logical_plan/invariants.rs Outdated Show resolved Hide resolved

datafusion/optimizer/src/optimizer.rs Outdated Show resolved Hide resolved

wiedld force-pushed the 13525/invariant-checking-for-implicit-LP-changes branch from 1884564 to 9bca470 Compare December 24, 2024 21:24

wiedld commented Dec 24, 2024

View reviewed changes

datafusion/sqllogictest/test_files/subquery.slt Outdated Show resolved Hide resolved

wiedld added 5 commits December 24, 2024 14:50

chore: update error naming and terminology used in code comments

911d4b8

refactor: use proper error methods

810246d

chore: more cleanup of error messages

9842d19

Merge branch 'main' into 13525/invariant-checking-for-implicit-LP-cha…

00700ae

…nges

chore: handle option trailer to error message

9bca470

test: update sqllogictests tests to not use multiline

529ac3e

alamb approved these changes Dec 26, 2024

View reviewed changes

alamb merged commit cf8f2f8 into apache:main Dec 26, 2024
24 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce LogicalPlan invariants, begin automatically checking them #13651

Introduce LogicalPlan invariants, begin automatically checking them #13651

wiedld commented Dec 5, 2024 •

edited

Loading

alamb left a comment

Omega359 commented Dec 5, 2024

wiedld Dec 5, 2024

jonahgao Dec 5, 2024 •

edited

Loading

findepi Dec 5, 2024

alamb Dec 6, 2024

alamb Dec 6, 2024

findepi Dec 7, 2024

wiedld Dec 18, 2024

findepi Dec 5, 2024

wiedld Dec 16, 2024

findepi commented Dec 5, 2024

alamb commented Dec 6, 2024

wiedld commented Dec 16, 2024 •

edited

Loading

alamb left a comment

wiedld commented Dec 18, 2024 •

edited

Loading

wiedld commented Dec 18, 2024

alamb commented Dec 20, 2024

alamb left a comment

wiedld commented Dec 24, 2024 •

edited

Loading

alamb left a comment •

edited

Loading

alamb Dec 24, 2024

alamb Dec 24, 2024

berkaysynnada Dec 24, 2024

wiedld Dec 24, 2024

alamb commented Dec 24, 2024

jonahgao left a comment

wiedld commented Dec 24, 2024

alamb commented Dec 25, 2024 •

edited

Loading

alamb left a comment

Introduce LogicalPlan invariants, begin automatically checking them #13651

Introduce LogicalPlan invariants, begin automatically checking them #13651

Conversation

wiedld commented Dec 5, 2024 • edited Loading

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

alamb left a comment

Choose a reason for hiding this comment

Omega359 commented Dec 5, 2024

Choose a reason for hiding this comment

jonahgao Dec 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

findepi commented Dec 5, 2024

alamb commented Dec 6, 2024

wiedld commented Dec 16, 2024 • edited Loading

alamb left a comment

Choose a reason for hiding this comment

wiedld commented Dec 18, 2024 • edited Loading

wiedld commented Dec 18, 2024

alamb commented Dec 20, 2024

alamb left a comment

Choose a reason for hiding this comment

wiedld commented Dec 24, 2024 • edited Loading

alamb left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alamb commented Dec 24, 2024

jonahgao left a comment

Choose a reason for hiding this comment

wiedld commented Dec 24, 2024

alamb commented Dec 25, 2024 • edited Loading

alamb left a comment

Choose a reason for hiding this comment

wiedld commented Dec 5, 2024 •

edited

Loading

jonahgao Dec 5, 2024 •

edited

Loading

wiedld commented Dec 16, 2024 •

edited

Loading

wiedld commented Dec 18, 2024 •

edited

Loading

wiedld commented Dec 24, 2024 •

edited

Loading

alamb left a comment •

edited

Loading

alamb commented Dec 25, 2024 •

edited

Loading