
Don't enforce null checks on fields inside nullable structs #1296

Open
wants to merge 4 commits into master

Conversation

Kimahriman
Contributor

Description

Resolves #860 (and maybe #1239 too).

Currently a non-nullable field inside a nullable struct is enforced to be not null. However, because of the way Spark handles nullability, a non-nullable field can still be "null" if its parent struct is null. The non-null constraint on the inner field simply means "if my parent is non-null, then I will definitely be non-null". This PR turns a non-nullable field inside a nullable struct into an unenforced not null constraint.
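The semantics described above can be illustrated with a minimal Python sketch (this is not Delta's or Spark's actual code; the row layout and field names are hypothetical): an inner NOT NULL field should only be checked when its parent struct is itself present.

```python
# Sketch of Spark's nullability semantics for nested fields (illustration
# only, not Delta's implementation). Rows are modeled as plain dicts.

def violates_not_null(row, parent_field, child_field):
    """Return True only if the parent struct is present but the
    non-nullable child field inside it is null."""
    parent = row.get(parent_field)
    if parent is None:
        # The parent struct is null, so the child is vacuously "null" too;
        # under Spark's semantics this is NOT a constraint violation.
        return False
    return parent.get(child_field) is None

# 'nested' is a nullable struct whose 'id' field is declared non-nullable.
assert violates_not_null({"nested": None}, "nested", "id") is False
assert violates_not_null({"nested": {"id": None}}, "nested", "id") is True
assert violates_not_null({"nested": {"id": 1}}, "nested", "id") is False
```

The middle case (a present parent with a null child) is the only genuine violation; the first case is exactly the one the current runtime check flags incorrectly.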

How was this patch tested?

Updated existing unit tests to make sure the parent structs are non-nullable for the same current behavior, and new UTs to test the new behavior.

Does this PR introduce any user-facing changes?

Yes. Non-nullable fields inside nullable structs become unenforced not null constraints. By default, creating such a schema fails unless the allowUnenforcedNotNull setting is enabled, instead of adding runtime null checks for the field.

@Kimahriman
Contributor Author

@zsxwing let me know what you think of this. I was hoping to go further and update the null checks to properly match the way Spark handles things, but that was a bit too much. I'm also open to putting this behind a feature flag if you think the change is acceptable but don't want it to be the default behavior.

@Kimahriman force-pushed the fix-null-constraint branch from 472dea6 to 84a0b17 on July 27, 2022 22:49
@tdas tdas requested a review from zsxwing July 28, 2022 18:14
@zsxwing
Member

zsxwing commented Sep 2, 2022

Sorry for the late reply. I'm inclined not to make this change because:

  • This would be a behavior change. The current behavior has existed in Delta for years, and people may already rely on it.
  • Although we could add a flag to make the new behavior non-default, it would be a lot of work. We would need to make all engines (Presto, Flink, etc.) support two different behaviors for the not null constraint.
  • There is already a workaround: use a check constraint like "nested is null or nested.id is not null" rather than a not null constraint.
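For clarity, the logic of the suggested check constraint, "nested is null or nested.id is not null", can be sketched in Python (an illustration of the predicate only; the column names are from the example above, not a real schema):

```python
# Sketch of the CHECK constraint "nested IS NULL OR nested.id IS NOT NULL",
# evaluated over dict-modeled rows (illustration only).

def check_constraint_ok(row):
    nested = row.get("nested")
    return nested is None or nested.get("id") is not None

assert check_constraint_ok({"nested": None})              # null parent passes
assert check_constraint_ok({"nested": {"id": 7}})         # non-null id passes
assert not check_constraint_ok({"nested": {"id": None}})  # present parent, null id fails
```

Unlike the automatic not null invariant, this predicate accepts a null parent struct, which is exactly the Spark semantics the PR description argues for.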

@Kimahriman
Contributor Author

Does it make sense then to, only on the Spark side, make any children of nullable fields nullable when creating schemas for a table (or when considering changed schemas on new writes)?

It just seems weird to leave a case that can knowingly trigger an error which isn't really a bug (for Spark at least).
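The relaxation suggested above could look something like this sketch (hypothetical; schemas are modeled as plain dicts rather than Spark StructTypes, and the function name is invented for illustration): once a struct is nullable, all of its descendant fields are treated as nullable too when deriving the table schema.

```python
# Hypothetical sketch of relaxing nullability: children of a nullable
# struct become nullable themselves. A struct's "type" is a list of
# child field dicts; leaf types are plain strings.

def relax_nullability(field, parent_nullable=False):
    nullable = field["nullable"] or parent_nullable
    relaxed = {"name": field["name"], "type": field["type"], "nullable": nullable}
    if isinstance(field["type"], list):  # struct: recurse into children
        relaxed["type"] = [relax_nullability(child, nullable)
                           for child in field["type"]]
    return relaxed

# A nullable struct 'nested' with a declared non-nullable child 'id'.
schema = {"name": "nested", "nullable": True,
          "type": [{"name": "id", "type": "long", "nullable": False}]}
relaxed = relax_nullability(schema)
assert relaxed["type"][0]["nullable"] is True  # child relaxed to nullable
```

A non-nullable parent is left alone, so the existing strict behavior is preserved wherever it is actually enforceable.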

@Kimahriman
Contributor Author

Coming back to this, since we continue to run into this issue and several others have reported it as erroneous behavior as well.

  • There is already a workaround which is using check constraint like nested is null or nested.id is not null rather than not null constraint.

I'm not sure why I didn't respond to this part before, but this is the whole issue: null invariants are added automatically for non-null fields, with no way to disable them. So there is no workaround.

So it'd be great if you had a change of heart about this. If not, I'll make a separate PR that lets users disable null invariants completely, so there actually is a workaround.

Successfully merging this pull request may close these issues.

Not null invariant incorrectly fails on non-nullable field inside a nullable struct