Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Spark] Drop feature support in DeltaTable Scala/Python APIs #3952

Merged

Conversation

andreaschat-db
Copy link
Contributor

@andreaschat-db andreaschat-db commented Dec 11, 2024

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

This PR adds drop feature support in the DeltaTable API for both scala and python APIs.

How was this patch tested?

Added UTs.

Does this PR introduce any user-facing changes?

Yes. See description.

python/delta/tables.py Outdated Show resolved Hide resolved
higher than necessary.

Normalization can also decrease the reader version of a table features protocol when it is
higher than necessary.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Should we use "Example::", ":param", ":return:" and "rtype", with a Example io.delta.tables.DeltaTable.dropFeatureSupport("RowTracking")?

Normalization can also decrease the reader version of a table features protocol when it is
higher than necessary.

Normalization can also decrease the reader version of a table features protocol when it is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the repeat of " Normalization can also decrease the reader version of a table features protocol when it is
higher than necessary." intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Fixed it.

@@ -1188,16 +1188,19 @@ def test_protocolUpgrade(self) -> None:
with self.assertRaisesRegex(ValueError, "writerVersion"):
dt.upgradeTableProtocol(1, {}) # type: ignore[arg-type]

def test_addFeatureSupport(self) -> None:
def __create_df_for_feature_tests(self) -> DeltaTable:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also use this refactor __create_df_for_feature_tests in other place like test_protocolUpgrade?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh I did not see the code is (almost) the same there. Yes.

python/delta/tables.py Show resolved Hide resolved
python/delta/tables.py Show resolved Hide resolved
TestRemovableWriterFeature)))

// Drop feature.
table.dropFeatureSupport(TestRemovableWriterFeature.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a test where after dropping the feature, the protocol doesn't change? Like dropping a readerWriterFeature from a set of 2 readerWriterFeatures should remain (3, 7)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All there cases (and more) are covered in the DeltaProtocolVersionSuite. Here I tried to focus on whether the DeltaTable API works by testing some basic functionality.

python/delta/tables.py Outdated Show resolved Hide resolved
@since(4.0) # type: ignore[arg-type]
def dropFeatureSupport(self, featureName: str) -> None:
"""
Modify the protocol to drop a supported feature. The operation always normalizes the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Modify the protocol to drop a supported feature. The operation always normalizes the
Modify the protocol to drop an existing supported feature. The operation always normalizes the

Nit: Similar to the alterDeltaTableCommand's description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find where this is mentioned in alterDeltaTableCommand but "existing" is redundant in that case since we are already using "supported."

*
* For example, consider protocol (1, 7, None, {Invariants, AppendOnly, TestWriter}.
* Dropping the testWriter feature results to protocol (1, 2). This is because the implicit
* features of the legacy protocol (1, 2) exactly match the explicit features of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "exactly match", we are referring to the protocol versions right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. We are referring to the feature set. For example:

  • Legacy Protocol(1, 2) implicitly supports invariants and appendOnly features.
  • Table Features protocol (1, 7, invariants, appendOnly) explicitly supports invariants and appendOnly features.

These two protocols are considered equivalent but in different form.

protocol to the weakest possible form. This primarily refers to converting a table features
protocol to a legacy protocol. A Table Features protocol can be represented with the legacy
representation only when the features set of the former exactly matches a legacy protocol.
Normalization can also decrease the reader version of a table features protocol when it is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we make the Scala text description and Python text description identical?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my initial intention but I diverted later on :D.

@longvu-db
Copy link
Contributor

Could we add "[Spark]" to the description and also specify that this is for Spark Classic in the title and description as well?

@@ -322,7 +322,7 @@ case class AlterTableDropFeatureDeltaCommand(
// Check whether the protocol contains the feature in either the writer features list or
// the reader+writer features list. Note, protocol needs to denormalized to allow dropping
// features from legacy protocols.
val protocol = table.initialSnapshot.protocol
val protocol = table.deltaLog.update().protocol
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with changing this to deltaLog.update(), but just wondering if there was any risk before with using initialSnapshot. Don't know why we haven't ran into a problem with this before since we should always get the latest protocol, not the "The snapshot initially associated with this table. "

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good question. That command was until now primarily only accessible with spark SQL. It seems that when using SQL the command would get table with a fresh snapshot. However, now with the DeltaTable API the user directly controls which table instance is using. So unless the user cares to properly refresh the table instance, we can get a stale snapshot.

@andreaschat-db andreaschat-db changed the title Drop feature support in DeltaTable API [Spark] Drop feature support in DeltaTable Scala/Python APIs Dec 12, 2024
@andreaschat-db andreaschat-db force-pushed the addDropFeatureToDeltaTableAPI branch from 68567c1 to d5991d7 Compare December 12, 2024 15:15
Copy link
Contributor

@longvu-db longvu-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

*
* See online documentation for more details.
*
* @param featureName: The name of the feature to drop.
Copy link
Contributor

@longvu-db longvu-db Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreaschat-db It should be "featureName" instead of "featureName:"

@@ -594,6 +594,30 @@ def addFeatureSupport(self, featureName: str) -> None:
DeltaTable._verify_type_str(featureName, "featureName")
self._jdt.addFeatureSupport(featureName)

@since(4.0) # type: ignore[arg-type]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreaschat-db I believe there will be a Delta 3.4 version before 4.0

* @param featureName: The name of the feature to drop.
* @return None.
*
* @since 4.0.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, 3.4.0

@andreaschat-db andreaschat-db force-pushed the addDropFeatureToDeltaTableAPI branch from 416e097 to 075d86d Compare December 18, 2024 18:08
@scottsand-db scottsand-db merged commit 1cd6fed into delta-io:master Dec 18, 2024
16 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants