Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(connector): file scan use correct path #19793

Merged
merged 4 commits into from
Dec 17, 2024
Merged

fix(connector): file scan use correct path #19793

merged 4 commits into from
Dec 17, 2024

Conversation

wcy-fdu
Copy link
Contributor

@wcy-fdu wcy-fdu commented Dec 13, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

#19398 bring a bug that file_scan use incorrect path(With s3://bucket_name/ as prefix), so this pr fix this bug.

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • My PR contains critical fixes that are necessary to be merged into the latest release.

Documentation

  • My PR needs documentation updates.
Release note

@wcy-fdu wcy-fdu requested a review from a team as a code owner December 13, 2024 06:24
@wcy-fdu wcy-fdu requested a review from fuyufjh December 13, 2024 06:24
Cargo.lock Outdated Show resolved Hide resolved
@fuyufjh
Copy link
Member

fuyufjh commented Dec 13, 2024

What's the problem to be addressed? Please write some description.

The reviewer is automatically invited, will completed pr description after modifying Cargo.lock.

@wcy-fdu wcy-fdu removed the request for review from fuyufjh December 13, 2024 06:29
@wcy-fdu wcy-fdu force-pushed the wcy/fix_file_scan.pr branch from d7bef25 to a65ff03 Compare December 13, 2024 07:35
@wcy-fdu wcy-fdu removed the request for review from a team December 13, 2024 07:36
@wcy-fdu wcy-fdu marked this pull request as draft December 13, 2024 09:01
@wcy-fdu wcy-fdu requested review from xxhZs and chenzl25 December 17, 2024 07:48
@wcy-fdu wcy-fdu marked this pull request as ready for review December 17, 2024 07:49
@wcy-fdu wcy-fdu requested a review from hzxa21 December 17, 2024 07:49
@wcy-fdu
Copy link
Contributor Author

wcy-fdu commented Dec 17, 2024

Tess passed.

@github-actions github-actions bot added the type/fix Bug fix label Dec 17, 2024
@wcy-fdu wcy-fdu enabled auto-merge December 17, 2024 08:31
Copy link
Collaborator

@hzxa21 hzxa21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM. Should we add test cases for TVF file scan?

Comment on lines +216 to +232
.iter()
.filter_map(|column| {
parquet_column_names
.iter()
.position(|&name| name == column.name)
.and_then(|pos| {
let arrow_data_type: &risingwave_common::array::arrow::arrow_schema_udf::DataType = converted_arrow_schema.field(pos).data_type();
let rw_data_type: &risingwave_common::types::DataType = &column.data_type;

if is_parquet_schema_match_source_schema(arrow_data_type, rw_data_type) {
Some(pos)
} else {
None
}
})
})
.collect();
Ok(valid_column_indices)
}
None => Ok(vec![]),
}
if is_parquet_schema_match_source_schema(arrow_data_type, rw_data_type) {
Some(pos)
} else {
None
}
})
})
.collect();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fmt?

Copy link
Contributor Author

@wcy-fdu wcy-fdu Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange, how did these lines escape cargo fmt escape. I think it's already formatted.

@wcy-fdu
Copy link
Contributor Author

wcy-fdu commented Dec 17, 2024

Rest LGTM. Should we add test cases for TVF file scan?

I tried to add some test in CI, but found that currently TVF file scan only support reading file in s3, but not support reading file in minio, see definition of endpoint, it is hard-coded into s3 instead of letting users pass it.
Due to environmental issues, similar tests (parquet file source and file sink) are tested on minio.

I think we can add more backend such as

file_scan(connector = minio)
file_scan(connector = gcs)

and then enhance ci test. This will take some time but this fix is a bit urgent, so I just test it locally and leave these in subsequent PRs.

@wcy-fdu wcy-fdu added this pull request to the merge queue Dec 17, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Dec 17, 2024
@wcy-fdu wcy-fdu added this pull request to the merge queue Dec 17, 2024
Merged via the queue into main with commit 4561813 Dec 17, 2024
28 of 30 checks passed
@wcy-fdu wcy-fdu deleted the wcy/fix_file_scan.pr branch December 17, 2024 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants