Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: redact SqlOption in SHOW CREATE #14310

Closed
wants to merge 13 commits into from
Closed

Conversation

zwang28
Copy link
Contributor

@zwang28 zwang28 commented Jan 2, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Redact all SqlOption when show create and query system catalog. Implement this idea.

This PR adds a helper function to redact identifiers, e.g. table names and column names, as well as with_option values, from CREATE TABLE/VIEW/INDEX/SOURCE/SINK/MATERIALIZED VIEW statement. ~~~

Implementation details:

  • For all types used in Statement type that requires redaction, implement Masking trait for them, which redacts the type in place. Then Masking::mask is called recursively from CREATE statement.
  • Masking::mask takes a MaskingContext as input, which maintains a String to int mapping. It's used to keep the relation between identifiers globally. e.g. Using the same MaskingContext, CREATE TABLE table_foo (bar int); CREATE MATERIALIZED VIEW mv_foo as SELECT bar from table_foo will be redacted to CREATE TABLE _1_ (_2_ int); CREATE MATERIALIZED VIEW _3_ as SELECT _2_ from _1_.
  • See test_masking for usage example.

#12673

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@zwang28 zwang28 changed the title feat: add support for redacting CREATE statement feat: redact CREATE statement Jan 2, 2024
@zwang28
Copy link
Contributor Author

zwang28 commented Jan 5, 2024

As an alternative solution, instead of implementing Masking for types case by case, I considered to make parse function able to redact all encountered Ident, ObjectName and SqlOption. Because they are the only primitive types being redacted. But it seems won't work, because not all these primitive type should be redacted, e.g. function name.

Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach can't handle all cases (e.g., we can still use Display/Debug without redaction)i.e., we need to call it explicitly, but it's acceptable to me. @BugenZhao How do you think?

Regarding this implementation, what about adding a visitor, and impl redaction as a visitor?

@xxchan
Copy link
Member

xxchan commented Jan 8, 2024

Oh, upstream sqlparser-rs has visitor now... 🤣

apache/datafusion-sqlparser-rs#765

(But it might have limitations according to related issues. Haven't checked the details

@xxchan xxchan mentioned this pull request Mar 6, 2024
9 tasks
@zwang28
Copy link
Contributor Author

zwang28 commented Mar 7, 2024

Oh, upstream sqlparser-rs has visitor now... 🤣

sqlparser-rs/sqlparser-rs#765

(But it might have limitations according to related issues. Haven't checked the details

Yes, upstream's visit macro can significantly simplify the implementation. I'll try it.

@zwang28 zwang28 marked this pull request as draft March 7, 2024 09:55
@BugenZhao
Copy link
Member

Note that Display impl of AST is actually a visitor. Can we directly hack into that? I suppose there could be only few changes (like on Value) if we read a task-local flag to decide whether we should format with a redacted value.

@zwang28
Copy link
Contributor Author

zwang28 commented Mar 7, 2024

Note that Display impl of AST is actually a visitor. Can we directly hack into that? I suppose there could be only few changes (like on Value) if we read a task-local flag to decide whether we should format with a redacted value.

I'll look into it.

BTW, when taking redacting table names and column names into consideration, it becomes more complicated. That is, whether a type should be redacted also depends on its context. e.g. type ObjectName in a built-in Function shouldn't be redacted, while ObjectName in a CreateTable should be redacted because it's the table name.

Copy link

gitguardian bot commented Mar 7, 2024

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
9425213 Triggered Generic Password f7fc8a5 ci/scripts/regress-test.sh View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Our GitHub checks need improvements? Share your feedbacks!

@zwang28
Copy link
Contributor Author

zwang28 commented Mar 7, 2024

This commit implements this idea, which customizes SqlOption's Display. It either redacts all options or none of them.

As SqlOption's Display will redact by default, we need to explicitly call Statement::to_unredacted_string instead of Statement::to_string here and there. Forget to call it will be a critical bug.

Also added an e2e test.

PTAL @xxchan @BugenZhao

Table names and column names are not redacted in this PR, as explained below.

BTW, when taking redacting table names and column names into consideration, it becomes more complicated. That is, whether a type should be redacted also depends on its context. e.g. type ObjectName in a built-in Function shouldn't be redacted, while ObjectName in a CreateTable should be redacted because it's the table name.

@zwang28 zwang28 marked this pull request as ready for review March 7, 2024 14:32
@zwang28 zwang28 requested a review from a team as a code owner March 7, 2024 14:32
@zwang28 zwang28 changed the title feat: redact CREATE statement feat: redact SqlOption in SHOW CREATE Mar 7, 2024
@zwang28 zwang28 requested review from BugenZhao and xxchan March 7, 2024 14:34
@zwang28 zwang28 force-pushed the wangzheng/redact_table_name branch from 2e4d10e to 8091858 Compare March 7, 2024 15:41
@@ -61,7 +63,12 @@ fn read_rw_sinks_info(reader: &SysCatalogReaderImpl) -> Result<Vec<RwSink>> {
.to_uppercase(),
sink_type: sink.sink_type.to_proto().as_str_name().into(),
connection_id: sink.connection_id.map(|id| id.connection_id() as i32),
definition: sink.create_sql(),
definition: if sink.owner.user_id == user_id {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this (only showing raw SQL if the user is the owner) is a common behavior in other DBMS? Even so, I guess it would be based on the permission system but not simply determined by the ownership. cc @yezizp2012

In my personal opinion I would suggest adding a new column of redacted_definition. Users are allowed to decide which form to reveal or share with us.

src/sqlparser/src/ast/mod.rs Show resolved Hide resolved
Ok(stmt.to_string())
Ok(stmt.to_unredacted_string())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask how you find all references of to_string call on Statement? That is to say, how could we avoid showing an unredacted string where we should redact it in the future or vice versa?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how you find all references of to_string call on Statement?

Most of them are found by test failure 🥵

how could we avoid showing an unredacted string where we should redact it

to_string or display are redacted by default.

or vice versa

Caller is responsible for explicitly using to_unredacted_string. Similar to crdb's behavior: "everything gets redacted except for those bits which we know are safe"

@zwang28
Copy link
Contributor Author

zwang28 commented Apr 8, 2024

go for risingwavelabs/rfcs#86

@zwang28 zwang28 closed this Apr 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants