-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Spark] Drop feature support in DeltaTable Scala/Python APIs #3952
[Spark] Drop feature support in DeltaTable Scala/Python APIs #3952
Conversation
python/delta/tables.py
Outdated
higher than necessary. | ||
|
||
Normalization can also decrease the reader version of a table features protocol when it is | ||
higher than necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Should we use "Example::", ":param", ":return:" and "rtype", with a Example io.delta.tables.DeltaTable.dropFeatureSupport("RowTracking")
?
python/delta/tables.py
Outdated
Normalization can also decrease the reader version of a table features protocol when it is | ||
higher than necessary. | ||
|
||
Normalization can also decrease the reader version of a table features protocol when it is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the repeat of " Normalization can also decrease the reader version of a table features protocol when it is
higher than necessary." intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Fixed it.
@@ -1188,16 +1188,19 @@ def test_protocolUpgrade(self) -> None: | |||
with self.assertRaisesRegex(ValueError, "writerVersion"): | |||
dt.upgradeTableProtocol(1, {}) # type: ignore[arg-type] | |||
|
|||
def test_addFeatureSupport(self) -> None: | |||
def __create_df_for_feature_tests(self) -> DeltaTable: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also use this refactor __create_df_for_feature_tests
in other place like test_protocolUpgrade
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh I did not see the code is (almost) the same there. Yes.
TestRemovableWriterFeature))) | ||
|
||
// Drop feature. | ||
table.dropFeatureSupport(TestRemovableWriterFeature.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a test where after dropping the feature, the protocol doesn't change? Like dropping a readerWriterFeature from a set of 2 readerWriterFeatures should remain (3, 7)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All there cases (and more) are covered in the DeltaProtocolVersionSuite
. Here I tried to focus on whether the DeltaTable
API works by testing some basic functionality.
@since(4.0) # type: ignore[arg-type] | ||
def dropFeatureSupport(self, featureName: str) -> None: | ||
""" | ||
Modify the protocol to drop a supported feature. The operation always normalizes the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modify the protocol to drop a supported feature. The operation always normalizes the | |
Modify the protocol to drop an existing supported feature. The operation always normalizes the |
Nit: Similar to the alterDeltaTableCommand
's description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not find where this is mentioned in alterDeltaTableCommand
but "existing" is redundant in that case since we are already using "supported."
* | ||
* For example, consider protocol (1, 7, None, {Invariants, AppendOnly, TestWriter}. | ||
* Dropping the testWriter feature results to protocol (1, 2). This is because the implicit | ||
* features of the legacy protocol (1, 2) exactly match the explicit features of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By "exactly match", we are referring to the protocol versions right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. We are referring to the feature set. For example:
- Legacy Protocol(1, 2) implicitly supports invariants and appendOnly features.
- Table Features protocol (1, 7, invariants, appendOnly) explicitly supports invariants and appendOnly features.
These two protocols are considered equivalent but in different form.
protocol to the weakest possible form. This primarily refers to converting a table features | ||
protocol to a legacy protocol. A Table Features protocol can be represented with the legacy | ||
representation only when the features set of the former exactly matches a legacy protocol. | ||
Normalization can also decrease the reader version of a table features protocol when it is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we make the Scala text description and Python text description identical?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my initial intention but I diverted later on :D.
Could we add "[Spark]" to the description and also specify that this is for Spark Classic in the title and description as well? |
@@ -322,7 +322,7 @@ case class AlterTableDropFeatureDeltaCommand( | |||
// Check whether the protocol contains the feature in either the writer features list or | |||
// the reader+writer features list. Note, protocol needs to denormalized to allow dropping | |||
// features from legacy protocols. | |||
val protocol = table.initialSnapshot.protocol | |||
val protocol = table.deltaLog.update().protocol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with changing this to deltaLog.update()
, but just wondering if there was any risk before with using initialSnapshot
. Don't know why we haven't ran into a problem with this before since we should always get the latest protocol, not the "The snapshot initially associated with this table. "
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah good question. That command was until now primarily only accessible with spark SQL. It seems that when using SQL the command would get table with a fresh snapshot. However, now with the DeltaTable API the user directly controls which table
instance is using. So unless the user cares to properly refresh the table instance, we can get a stale snapshot.
68567c1
to
d5991d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
* | ||
* See online documentation for more details. | ||
* | ||
* @param featureName: The name of the feature to drop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreaschat-db It should be "featureName" instead of "featureName:"
python/delta/tables.py
Outdated
@@ -594,6 +594,30 @@ def addFeatureSupport(self, featureName: str) -> None: | |||
DeltaTable._verify_type_str(featureName, "featureName") | |||
self._jdt.addFeatureSupport(featureName) | |||
|
|||
@since(4.0) # type: ignore[arg-type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreaschat-db I believe there will be a Delta 3.4 version before 4.0
* @param featureName: The name of the feature to drop. | ||
* @return None. | ||
* | ||
* @since 4.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto, 3.4.0
416e097
to
075d86d
Compare
Which Delta project/connector is this regarding?
Description
This PR adds drop feature support in the DeltaTable API for both scala and python APIs.
How was this patch tested?
Added UTs.
Does this PR introduce any user-facing changes?
Yes. See description.