[Kernel] Fix issue querying tables with spaces in the name #3291

Merged
vkorukanti merged 4 commits into delta-io:master on Jun 24, 2024

@vkorukanti vkorukanti commented Jun 21, 2024

Description

(Stacked on top of #3289 and #3290)

Currently, Kernel uses a mix of paths (file system paths) and URIs (in string format) in its API interfaces, which causes confusion and bugs.

Context:
A path is a file system path, which may contain characters that must be escaped when the path is converted to a URI.
E.g. path: `s3:/bucket/path to file/`, URI for the same path: `s3:/bucket/path%20to%20file/`

Make the API uniform by using paths (file system paths) everywhere.
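
To make the path/URI distinction above concrete, here is a minimal sketch using only `java.net.URI` from the JDK. The class and method names (`PathToUriSketch`, `toUriString`) are hypothetical and are not part of the Kernel API; the sketch assumes the path is absolute (starts with `/`) and that the URI has no authority component.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not part of Delta Kernel.
final class PathToUriSketch {

    /** Builds a URI string from a scheme and an unescaped, absolute file system path. */
    static String toUriString(String scheme, String path) {
        try {
            // The multi-argument URI constructors percent-encode characters
            // that are illegal in a URI path, such as spaces.
            return new URI(scheme, null, path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot convert to URI: " + path, e);
        }
    }

    public static void main(String[] args) {
        // prints "s3:/bucket/path%20to%20file/"
        System.out.println(toUriString("s3", "/bucket/path to file/"));
    }
}
```

Passing the escaped form where an unescaped path is expected (or the reverse) is the kind of mismatch that breaks tables whose names contain spaces.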

How was this patch tested?

Added tests with table paths that contain spaces.

@vkorukanti vkorukanti requested review from scottsand-db and allisonport-db and removed request for scottsand-db June 21, 2024 18:45

/**
* Escapes the given string to be used as a partition value in the path. Basically this escapes
* - characters that can't be in a file path. E.g. `a\nb` will be escaped to `a%0Ab`. -
Collaborator

what's up with the - usage in this comment? is it a dot job? the "-" literal character?

Collaborator Author

Auto-format. Changed it to use proper `<ul>` lists.
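
For readers unfamiliar with the escaping that the Javadoc under review describes, here is a rough sketch of percent-escaping a partition value. It is not Kernel's actual implementation: the class name, method name, and the set of escaped printable characters are assumptions chosen for illustration. Only the `a\nb` to `a%0Ab` example comes from the comment above.

```java
// Hypothetical sketch; the real escaping rules in Kernel may differ.
final class PartitionValueEscapeSketch {

    // Assumed set of printable characters to escape (illustrative only).
    private static final String SPECIAL = "\"#%'*/:=?\\{[]^";

    /** Percent-escapes control characters and the assumed special characters. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean isControl = c < 0x20 || c == 0x7F;
            if (isControl || SPECIAL.indexOf(c) >= 0) {
                // Two-digit uppercase hex, e.g. '\n' (0x0A) becomes "%0A".
                sb.append(String.format("%%%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "a%0Ab"
        System.out.println(escape("a\nb"));
    }
}
```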

@@ -56,7 +56,9 @@ public interface Scan {
* <li>name: {@code add}, type: {@code struct}</li>
* <li>Description: Represents `AddFile` DeltaLog action</li>
* <li><ul>
- * <li>name: {@code path}, type: {@code string}, description: location of the file.</li>
+ * <li>name: {@code path}, type: {@code string}, description: location of the file.
+ * The path is a URI as specified by RFC 2396 URI Generic Syntax, which needs to be decoded
Collaborator

Should the input to Table.forPath also be a String that represents a URI? Have we updated that documentation?

Collaborator Author

It is not at the moment, and we can't change that. If we want to take a URI as input, we should be explicit about it, and it should be a separate API, something like Table.forURI(URI tableURI).

The path here comes from the Delta log, where it is stored as a URI according to the protocol. The field is named path, but it is actually a URI. I'm just updating the documentation to reflect that.

Before the next release, I will make a design decision about changing the path string to a URI everywhere else (basically in the Engine interfaces). Until then, this is the fix.
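
As an illustration of the decoding step discussed in this thread, here is a minimal sketch that turns a URI-encoded `path` value from the log into an unescaped file system path string. It assumes the value is a well-formed hierarchical URI. `LogPathDecodeSketch` and `toFileSystemPath` are hypothetical names and are not part of the Kernel API.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not part of Delta Kernel.
final class LogPathDecodeSketch {

    /** Decodes a URI-encoded path string (relative or absolute) into an unescaped path. */
    static String toFileSystemPath(String logPath) {
        URI uri;
        try {
            uri = new URI(logPath);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + logPath, e);
        }
        if (uri.isOpaque()) {
            // Opaque URIs (e.g. "mailto:x") have no path component to decode.
            throw new IllegalArgumentException("Not a hierarchical URI: " + logPath);
        }
        StringBuilder sb = new StringBuilder();
        if (uri.getScheme() != null) {
            sb.append(uri.getScheme()).append(':');
        }
        if (uri.getAuthority() != null) {
            sb.append("//").append(uri.getAuthority());
        }
        // getPath() returns the decoded path; getRawPath() would keep the %20 escapes.
        sb.append(uri.getPath());
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "s3:/bucket/path to file/part-0.parquet"
        System.out.println(toFileSystemPath("s3:/bucket/path%20to%20file/part-0.parquet"));
    }
}
```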

Collaborator

Thanks for explaining!

@vkorukanti vkorukanti merged commit 05e647a into delta-io:master Jun 24, 2024
10 checks passed
@vkorukanti vkorukanti deleted the fixPathIssue branch July 12, 2024 19:50
vkorukanti added a commit to vkorukanti/delta that referenced this pull request Aug 30, 2024
…3291)