Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: remove meta's dependency on sink implementation to reduce compile time #12981

Closed
wants to merge 22 commits into from

Conversation

wenym1
Copy link
Contributor

@wenym1 wenym1 commented Oct 20, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Move the implementation of all sinks to a separate crate. The original risingwave_connector is splited into 2 crates: risingwave_connector, risingwave_sink_impl.

Originally, risingwave_meta depends on the whole risingwave_connector, in which case it should wait for the long compilation of the whole risingwave_connector. After this PR, the risingwave_meta only depends on a much smaller risingwave_connector with sink implementation removed from it, and risingwave_meta can start its compilation much earlier.

The risingwave_meta depends on the coordinator and validation logic of each concrete specific sink implementation, which was a reason for risingwave_meta to inevitably depend on the logic in current risingwave_sink_impl. This PR decouples risingwave_meta with risingwave_sink_impl by exposing the functions for coordinator and validation logic as function pointers in risingwave_connector, and via macro #[ctor], at initialization risingwave_sink_impl will inject the function implementation to the function pointers declared in risingwave_connector.

CI build component: 3m44s -> 2m37s

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@wenym1 wenym1 requested a review from a team as a code owner October 20, 2023 13:41
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 4288 files.

Valid Invalid Ignored Fixed
1907 1 2380 0
Click to see the invalid file list
  • src/connector/connector_common/src/kafka/mod.rs

src/connector/connector_common/src/kafka/mod.rs Outdated Show resolved Hide resolved
wenym1 and others added 3 commits October 20, 2023 21:43
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 4290 files.

Valid Invalid Ignored Fixed
1908 1 2381 0
Click to see the invalid file list
  • src/connector/sink_impl/src/lib.rs

src/connector/sink_impl/src/lib.rs Show resolved Hide resolved
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@BugenZhao
Copy link
Member

To be honest, the extern way looks too hacky to me, which reminds me of the good old days when writing C... 🤣

I would suggest the approach below:

  • In common:
#[async_trait]
trait Validate {
  fn validate(_: &PbSink) -> Result<()>;
}

static VALIDATOR: OnceLock<Box<dyn Validate>> = OnceLock::new();

pub fn validator() -> &dyn Validate {
  VALIDATOR.get().unwrap()
}
  • In sink_impl:
struct Validator { .. }

impl Validate for Validator { .. }

#[ctor]
fn init_validator() {
  VALIDATOR.set(Validator::new());
}
  • In meta:
validator().validate(..);

The idea will be very similar to @wangrunji0408's function registry.

/// The global registry of all function signatures.
pub static FUNCTION_REGISTRY: LazyLock<FunctionRegistry> = LazyLock::new(|| unsafe {
// SAFETY: this function is called after all `#[ctor]` functions are called.
let mut map = FunctionRegistry::default();
tracing::info!("found {} functions", FUNCTION_REGISTRY_INIT.len());
for sig in FUNCTION_REGISTRY_INIT.drain(..) {
map.insert(sig);
}
map
});

@wenym1
Copy link
Contributor Author

wenym1 commented Oct 23, 2023

I would suggest the approach below:

Actually for the suggested approach, we don't need to define an extra trait. A Box<dyn Fn(&PbSink) -> BoxFuture<'_, Result<()>>> is enough, which is very similar to current implementation. The only difference is the stage to inject the function pointer to the function entry point defined in the common crate. The current implementation injects the function pointer at link time, while the suggested approach injects the function pointer at the initialization of runtime. If we forget to declare the function pointer injection, in the current implementation we will get an link time error, while the suggested approach will get the error at runtime when we attempt to call the function. Both approaches LGTM.

@wenym1
Copy link
Contributor Author

wenym1 commented Oct 23, 2023

The risingwave_connector and risingwave_connector_common are merged back together. There seems no need to split them. The newly added risingwave_sink_impl is now depending on risingwave_connector.

@wenym1
Copy link
Contributor Author

wenym1 commented Oct 23, 2023

It seems that under the current implementation, some test binaries that implicitly introduces the dependency on risingwave_connector will fail to be built if the risingwave_sink_impl is not included in the dependency, even though the methods are not used, which is not a good develop experience. I've changed to use the suggested approach that injects the function pointers at init time.

Copy link
Member

@BugenZhao BugenZhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@xxchan
Copy link
Member

xxchan commented Oct 24, 2023

Can you update the PR description to reflect latest changes?

Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM. This rocks! 🤘

To be honest, the extern way looks too hacky to me, which reminds me of the good old days when writing C... 🤣

My thoughts: I don't think the extern way is that hacky, since at the end of the day we are always doing linking.. Actually #[ctor] looks even more magical to me 🤡. But the linking error is indeed a pain point.

[package.metadata.cargo-udeps.ignore]
normal = ["workspace-hack"]

[dependencies]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you help prune the dependencies here (and the connector crate) (via cargo machete or cargo udeps)?

Not a requirement since it won't have negative affects immediately. I can also do it in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed some unused deps with the help of cargo-machete. Thanks for suggesting the tools!

@@ -27,6 +27,7 @@ use risingwave_common::{GIT_SHA, RW_VERSION};
use risingwave_common_heap_profiling::HeapProfiler;
use risingwave_meta::*;
use risingwave_meta_service::*;
use risingwave_sink_impl as _;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth documenting the reason of the import here. Otherwise it can be confusing for others.

Suggested change
use risingwave_sink_impl as _;
// Impls are registered at load time (via `#[ctor]`). This only happens
// if the impl crate is linked, which only happens if this crate is `use`d.
use risingwave_sink_impl as _;

Might also wrapped like risingwave_expr_impl::enable!() and document there (I'm ok with both approaches).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, maybe can also documented at the crate level in sink_impl (via //!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I guess we can delay this to cmd. No need to put it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have introduced a risingwave_sink_impl::enable macro and add doc there. And move the dependency from meta_node to cmd.

src/connector/sink_impl/src/lib.rs Outdated Show resolved Hide resolved
src/connector/src/sink/mod.rs Show resolved Hide resolved
Comment on lines -30 to +31
use risingwave_connector::sink::{
build_sink, LogSinker, Sink, SinkImpl, SinkParam, SinkWriterParam,
};
use risingwave_connector::sink::{LogSinker, Sink, SinkParam, SinkWriterParam};
use risingwave_sink_impl::dispatch_sink;
use risingwave_sink_impl::sink::{build_sink, SinkImpl};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering can we also use the similar tricks for stream and frontend? 🤔️ If not, interested to know why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's doable for frontend. I have removed frontend's dependency for frontend.

For stream it's currently not doable because in the trait of LogSinker it currently takes a generic parameter LogReader, which is not fixed type to form a function pointer.

To enable this trick for stream, we can have two approaches. First, we can introduce boxed log reader and do dynamic dispatch on the log reader parameter, but the box log reader will create a box future when its method gets called and involves an extra memory allocation and may have performance issue. Second, we can first separate the implementation of log store from the stream crate to a new crate, and wrap all implementation of log reader with an enum, and the connector crate declares a function pointer with such log reader enum as input parameter.

The first method may have performance issue. The second method involves much more work. We can probably investigate the potential in future PR.

@BugenZhao
Copy link
Member

My thoughts: I don't think the extern way is that hacky, since at the end of the day we are always doing linking.. Actually #[ctor] looks even more magical to me 🤡

Actually, I'm unsure if it's totally safe to call an extern "Rust" fn as it's marked as unsafe. For example, does lifetime checker still work in this case?

@wenym1
Copy link
Contributor Author

wenym1 commented Oct 24, 2023

Actually, I'm unsure if it's totally safe to call an extern "Rust" fn as it's marked as unsafe. For example, does lifetime checker still work in this case?

The lifetime checker only checks the function interface and does not couple the function declaration and implementation, so if the extern fn is declared in the same signature as its implementation, I think it's safe.

The reason for marking extern fn as unsafe might be that, the linker links the object file by symbol and has no way to do any type check. So it will be unsafe if the extern fn declaration is of different signature as the its implementation.

@BugenZhao
Copy link
Member

Actually, I'm unsure if it's totally safe to call an extern "Rust" fn as it's marked as unsafe. For example, does lifetime checker still work in this case?

The lifetime checker only checks the function interface and does not couple the function declaration and implementation, so if the extern fn is declared in the same signature as its implementation, I think it's safe.

The reason for marking extern fn as unsafe might be that, the linker links the object file by symbol and has no way to do any type check. So it will be unsafe if the extern fn declaration is of different signature as the its implementation.

Get it. This makes much sense!

@gitguardian
Copy link

gitguardian bot commented Oct 25, 2023

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id Secret Commit Filename
7648795 Generic CLI Secret 3471bbe integration_tests/iceberg-cdc/run_test.sh View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Our GitHub checks need improvements? Share your feedbacks!

@codecov
Copy link

codecov bot commented Oct 25, 2023

Codecov Report

Attention: 52 lines in your changes are missing coverage. Please review.

Comparison is base (2fba274) 68.31% compared to head (e9c1154) 68.59%.

❗ Current head e9c1154 differs from pull request most recent head a9535c2. Consider uploading reports for the commit a9535c2 to get more accurate results

Files Patch % Lines
src/connector/sink_impl/src/lib.rs 79.48% 16 Missing ⚠️
src/connector/src/common.rs 0.00% 9 Missing ⚠️
src/connector/sink_impl/src/sink/mod.rs 75.00% 7 Missing ⚠️
src/connector/src/sink/mod.rs 70.83% 7 Missing ⚠️
src/frontend/src/handler/create_sink.rs 0.00% 6 Missing ⚠️
...rc/frontend/src/optimizer/plan_node/stream_sink.rs 84.21% 3 Missing ⚠️
...rc/manager/sink_coordination/coordinator_worker.rs 40.00% 3 Missing ⚠️
src/connector/src/lib.rs 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12981      +/-   ##
==========================================
+ Coverage   68.31%   68.59%   +0.28%     
==========================================
  Files        1527     1497      -30     
  Lines      262564   251496   -11068     
==========================================
- Hits       179366   172514    -6852     
+ Misses      83198    78982    -4216     
Flag Coverage Δ
rust 68.59% <69.94%> (+0.28%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@tabVersion tabVersion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm from my perspective, as no crud code changes😇

@wenym1 wenym1 force-pushed the yiming/separate-connector-crate branch from dfaa912 to e9c1154 Compare October 26, 2023 06:36
@fuyufjh
Copy link
Member

fuyufjh commented Nov 10, 2023

Any updates? Saw lots of conflicts 🥹

@xxchan
Copy link
Member

xxchan commented Dec 6, 2023

Just merged main and resolved conflicts? @wenym1 can you take a final quick look and let's merge it?

Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just ran a benchmark on my mac and surprisingly there are no improvements... 😇 (also tried on EC2, same)

> hyperfine -w1 -r3 -p 'cargo clean' 'git checkout yiming/separate-connector-crate && cargo build -p risingwave_cmd_all' 'git checkout main && cargo build -p risingwave_cmd_all'

Benchmark 1: git checkout yiming/separate-connector-crate && cargo build -p risingwave_cmd_all
  Time (mean ± σ):     226.308 s ±  0.162 s    [User: 1608.343 s, System: 176.206 s]
  Range (min … max):   226.123 s … 226.423 s    3 runs
 
Benchmark 2: git checkout main && cargo build -p risingwave_cmd_all
  Time (mean ± σ):     226.048 s ±  1.033 s    [User: 1616.179 s, System: 176.638 s]
  Range (min … max):   225.405 s … 227.240 s    3 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (227.240 s). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
 
Summary
  'git checkout main && cargo build -p risingwave_cmd_all' ran
    1.00 ± 0.00 times faster than 'git checkout yiming/separate-connector-crate && cargo build -p risingwave_cmd_all'

@xxchan
Copy link
Member

xxchan commented Dec 6, 2023

main
image

this branch
image

😇😇😇

(btw, why is meta so slow again?....)

@xxchan
Copy link
Member

xxchan commented Dec 6, 2023

To clarify, I think it's still worth merging if it can help better organize code (I'm not sure whether it's the case for sink). And it may also help compile time in the future when the code bloats...

@wenym1
Copy link
Contributor Author

wenym1 commented Dec 7, 2023

I had the similar observation in my previous test, and then I chose the hold the PR to see whether we can get any improvement with this PR in the future, but it seems that there is still no improvement.

Besides, with this PR the code is less organized because we introduce a tricky hack, and the original crate is actually split at a unclear boundary.

So if we are not able to gain improvement at compile time, I think we'd better close this PR or hold it for later reference, and further investigate the reason for slow compile time on meta.

@xxchan
Copy link
Member

xxchan commented Dec 7, 2023

Besides, with this PR the code is less organized because we introduce a tricky hack, and the original crate is actually split at a unclear boundary.

Maybe we'd better seek for other ways to split the code.

https://materialize.com/blog/engineering/compile-times-and-code-graphs/

For each unit of business logic foo, separate crates for:

  • Types: for Plain Old Data, protobuf, traits that users of foo implement, etc.
  • Interface: for the public API without an implementation. Foursquare called this FooService. Materialize calls it foo-client.
  • Implementation: for the implementation of the public API. Foursquare called this FooConcrete. Materialize calls it foo.
  • Note that not every foo will have all three of these, and some will be more complicated, but I’ve found these three to be a reasonable default.

@xxchan xxchan closed this Dec 7, 2023
@xxchan xxchan deleted the yiming/separate-connector-crate branch April 18, 2024 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants