Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(catalog): add support for redacting sensitive connector properties #14193

Closed
wants to merge 3 commits into from

Conversation

zwang28
Copy link
Contributor

@zwang28 zwang28 commented Dec 25, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Review tips:

  • src/utils/redaction/redaction/src/lib.rs defines a trait MarkRedaction. It simply returns property names that need redaction.
  • src/utils/redaction/redaction_derive/src/lib.rs defines a derive macro to implement MarkRedaction.
  • src/frontend/src/utils/redaction.rs provides function to redact SQL.
  • Other LOC is mostly contributed by tests.
  • This PR doesn't actually replace any original SQL with redacted SQL when displayed, e.g. show create table, which should be another PR.

Basic idea:

To make it easier to redact sensitive connector properties from table definition when displaying,

  • Firstly implement a MarkRedaction trait for all connector properties types, using the new derive macro. Developers are responsible for marking properties that should be redacted. In this PR I marked AK/SK of S3Properties for example.
#[derive(MarkRedaction)]
pub struct S3Properties {
    ...
    #[serde(rename = "s3.credentials.access", default)]
    #[mark_redaction(rename = "s3.credentials.access")]
    pub access: Option<String>,
    #[serde(rename = "s3.credentials.secret", default)]
    #[mark_redaction(rename = "s3.credentials.secret")]
    pub secret: Option<String>,
   ...
  • Then to redact a SQL, try_redact_sql invokes MarkRedaction::marks to get all property names that should be redacted. See src/frontend/src/utils/redaction.rs.
  • More detail on the macro's usage can be found in the doc in src/utils/redaction/redaction_derive/src/lib.rs.

An example from test_redact_s3_source.

Original SQL:

CREATE SOURCE ad_click (
    ...
) WITH (
    connector = 's3',
    s3.credentials.access = 'test',
    s3.credentials.secret = 'test',
    s3.endpoint_url = 'http://localstack:4566'
    ...
) FORMAT PLAIN ENCODE JSON;

After redaction:

CREATE SOURCE ad_click (
    ...
) WITH (
    connector = 's3',
    s3.credentials.access = '[REDACTED]',
    s3.credentials.secret = '[REDACTED]',
    s3.endpoint_url = 'http://localstack:4566'
    ...
) FORMAT PLAIN ENCODE JSON;

#14115

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@zwang28 zwang28 force-pushed the wangzheng/redact_options branch 4 times, most recently from 7d01bee to 609503c Compare December 26, 2023 11:16
@zwang28 zwang28 force-pushed the wangzheng/redact_options branch from 609503c to 8c46491 Compare December 26, 2023 12:04
@zwang28 zwang28 changed the title feat(catalog): add support for redacted_definition feat(catalog): add support for redacting connector properties from SQL Dec 26, 2023
@zwang28 zwang28 marked this pull request as ready for review December 26, 2023 12:22
@zwang28 zwang28 requested a review from a team as a code owner December 26, 2023 12:22
@zwang28 zwang28 requested review from tabVersion and xxchan December 26, 2023 12:43
@zwang28 zwang28 changed the title feat(catalog): add support for redacting connector properties from SQL feat(catalog): add support for redacting sensitive connector properties Dec 26, 2023
Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't took a look yet. QQ: Why did you use a trait, instead of the safer (IMO) approach like #13034?

@xxchan xxchan requested a review from BugenZhao December 26, 2023 14:30
@@ -33,8 +34,10 @@ pub struct S3Properties {
#[serde(rename = "match_pattern", default)]
pub match_pattern: Option<String>,
#[serde(rename = "s3.credentials.access", default)]
#[mark_redaction(rename = "s3.credentials.access")]
Copy link
Contributor

@tabVersion tabVersion Dec 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

access key seems not that sensitive. cc @hzxa21

@zwang28
Copy link
Contributor Author

zwang28 commented Dec 27, 2023

Why did you use a trait, instead of the safer (IMO) approach like #13034?

Because to redact connector properties from a Statement, we need to rewrite some SqlOption::Value of the Statement's Vec<SqlOption>. So we need a way to know which SqlOption::name indicate properties that need redaction. These info spread across many *Properties structs.

This PR implements MarkRedaction for all those *Properties structs, so that altogether they can tell whether a property name need redaction, given a connector type and a property name of String type. This method doesn't care about types but names in *Properties, so I don't know how safe type like redact::secret in #13034 can help here.

See try_redact_statement for detail.

@xxchan @BugenZhao

@xxchan xxchan self-requested a review January 1, 2024 14:05
@BugenZhao
Copy link
Member

BugenZhao commented Jan 3, 2024

If we have...

impl serde::Serializer for Serializer {
    type Ok = Vec<SqlOption>;
    ..
}

then wrap secrets with Secret newtype and serialize back will work. We normalize the aliases by the way.

The most tricky part I can think of is to escape back the unescaped CstyleEscapedString when serializing it. When introducing it in #11323, we store the raw representation for the sake of simpliness.

#11195 (comment)

@@ -33,8 +34,10 @@ pub struct S3Properties {
#[serde(rename = "match_pattern", default)]
pub match_pattern: Option<String>,
#[serde(rename = "s3.credentials.access", default)]
#[mark_redaction(rename = "s3.credentials.access")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse the serde attributes for mark_redaction to avoid duplication?

Comment on lines +31 to +48
let mut stmt = stmt.clone();
let sql_options = match &mut stmt {
Statement::CreateSource { stmt } => &mut stmt.with_properties.0,
Statement::CreateTable { with_options, .. } => with_options,
Statement::CreateSink { stmt } => &mut stmt.with_properties.0,
_ => {
return Some(stmt);
}
};
let Ok(mut with_properties) =
WithOptions::try_from(sql_options.as_slice()).map(WithOptions::into_inner)
else {
return None;
};
let Some(connector) = with_properties.remove(UPSTREAM_SOURCE_KEY) else {
tracing::error!("Must specify 'connector' in WITH clause");
return None;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xxchan
Copy link
Member

xxchan commented Jan 8, 2024

(Sorry for the late review.)

Oh I get it. The type-safe approach is good for redacting the Property structs themselves (e.g., Debug/Display them in log). However, we can't redact the type-unsafe raw string/hashmap input when the structs are created (the output of sqlparser AST). The problem is mainly because sqlparser is unaware of the properties. 😟

Comment on lines +49 to +55
fn redact_marked_field(options: &mut Vec<SqlOption>, marks: std::collections::HashSet<String>) {
for option in options {
if marks.contains(&option.name.real_value()) {
option.value = Value::SingleQuotedString("[REDACTED]".into());
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking do we really need to get the keys to redact precisely? What about simply pass ["secret", "credential", "key"] etc. ? 🤣

Bugen's idea on relying on serde is also acceptable.

@xxchan
Copy link
Member

xxchan commented Jan 8, 2024

The most tricky part I can think of is to escape back the unescaped CstyleEscapedString when serializing it.

Can you give an example of how this can go wrong? 👀

@xxchan
Copy link
Member

xxchan commented Jan 8, 2024

I see the alternative approach in #14310, this is quite a simple solution. 🤣

impl Masking for Vec<SqlOption> {
    fn mask<'a>(&mut self, _context: &mut MaskingContext) {
        for sql_option in self {
            sql_option.value = Value::SingleQuotedString("[REDACTED]".to_string());
        }
    }
}

@BugenZhao
Copy link
Member

I see the alternative approach in #14310, this is quite a simple solution. 🤣

impl Masking for Vec<SqlOption> {
    fn mask<'a>(&mut self, _context: &mut MaskingContext) {
        for sql_option in self {
            sql_option.value = Value::SingleQuotedString("[REDACTED]".to_string());
        }
    }
}

Will this be too aggressive? For example, connector = [REDACTED]. 😄

@BugenZhao
Copy link
Member

BugenZhao commented Jan 8, 2024

However, we can't redact the type-unsafe raw string/hashmap input when the structs are created (the output of sqlparser AST). The problem is mainly because sqlparser is unaware of the properties. 😟

It's us that convert it to type-unsafe maps. 😕

impl TryFrom<&[SqlOption]> for WithOptions {
type Error = RwError;
fn try_from(options: &[SqlOption]) -> Result<Self, Self::Error> {
let mut inner: BTreeMap<String, String> = BTreeMap::new();
for option in options {
let key = option.name.real_value();
let value: String = match option.value.clone() {
Value::CstyleEscapedString(s) => s.value,
Value::SingleQuotedString(s) => s,
Value::Number(n) => n,
Value::Boolean(b) => b.to_string(),
_ => {
return Err(RwError::from(ErrorCode::InvalidParameterValue(
"`with options` or `with properties` only support single quoted string value and C style escaped string"
.to_owned(),
)))
}
};
if inner.insert(key.clone(), value).is_some() {
return Err(RwError::from(ErrorCode::InvalidParameterValue(format!(
"Duplicated option: {}",
key
))));
}
}
Ok(Self { inner })
}
}

Copy link
Contributor

@tabVersion tabVersion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zwang28
Copy link
Contributor Author

zwang28 commented Jan 12, 2024

can we also cover some struct from sdk, such as

https://github.com/risingwavelabs/risingwave/blob/be30f20e2c14b5d12efd5b33a4ac9e537602fc37/src/connector/src/source/kafka/mod.rs#L156https://github.com/risingwavelabs/risingwave/blob/be30f20e2c14b5d12efd5b33a4ac9e537602fc37/src/connector/src/source/kafka/mod.rs#L156

This approach is more universal, which can solve the redaction of structs in connector crate as well. WIP
#14193 (comment)

@zwang28
Copy link
Contributor Author

zwang28 commented Jan 12, 2024

#14193 (comment)

@zwang28 zwang28 closed this Jan 12, 2024
@zwang28 zwang28 mentioned this pull request Jan 24, 2024
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants