-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: remove meta's dependency on sink implementation to reduce compile time #12981
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
license-eye has totally checked 4288 files.
Valid | Invalid | Ignored | Fixed |
---|---|---|---|
1907 | 1 | 2380 | 0 |
Click to see the invalid file list
- src/connector/connector_common/src/kafka/mod.rs
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
license-eye has totally checked 4290 files.
Valid | Invalid | Ignored | Fixed |
---|---|---|---|
1908 | 1 | 2381 | 0 |
Click to see the invalid file list
- src/connector/sink_impl/src/lib.rs
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
To be honest, the I would suggest the approach below:
#[async_trait]
trait Validate {
fn validate(_: &PbSink) -> Result<()>;
}
static VALIDATOR: OnceLock<Box<dyn Validate>> = OnceLock::new();
pub fn validator() -> &dyn Validate {
VALIDATOR.get().unwrap()
}
struct Validator { .. }
impl Validate for Validator { .. }
#[ctor]
fn init_validator() {
VALIDATOR.set(Validator::new());
}
validator().validate(..); The idea will be very similar to @wangrunji0408's function registry. risingwave/src/expr/core/src/sig/mod.rs Lines 34 to 43 in f6e3444
|
Actually for the suggested approach, we don't need to define an extra trait. A |
The |
It seems that under the current implementation, some test binaries that implicitly introduces the dependency on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Can you update the PR description to reflect latest changes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM. This rocks! 🤘
To be honest, the extern way looks too hacky to me, which reminds me of the good old days when writing C... 🤣
My thoughts: I don't think the extern
way is that hacky, since at the end of the day we are always doing linking.. Actually #[ctor]
looks even more magical to me 🤡. But the linking error is indeed a pain point.
[package.metadata.cargo-udeps.ignore] | ||
normal = ["workspace-hack"] | ||
|
||
[dependencies] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can you help prune the dependencies here (and the connector crate) (via cargo machete
or cargo udeps
)?
Not a requirement since it won't have negative affects immediately. I can also do it in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed some unused deps with the help of cargo-machete. Thanks for suggesting the tools!
src/meta/node/src/lib.rs
Outdated
@@ -27,6 +27,7 @@ use risingwave_common::{GIT_SHA, RW_VERSION}; | |||
use risingwave_common_heap_profiling::HeapProfiler; | |||
use risingwave_meta::*; | |||
use risingwave_meta_service::*; | |||
use risingwave_sink_impl as _; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worth documenting the reason of the import here. Otherwise it can be confusing for others.
use risingwave_sink_impl as _; | |
// Impls are registered at load time (via `#[ctor]`). This only happens | |
// if the impl crate is linked, which only happens if this crate is `use`d. | |
use risingwave_sink_impl as _; |
Might also wrapped like risingwave_expr_impl::enable!()
and document there (I'm ok with both approaches).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, maybe can also documented at the crate level in sink_impl
(via //!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, I guess we can delay this to cmd
. No need to put it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have introduced a risingwave_sink_impl::enable
macro and add doc there. And move the dependency from meta_node to cmd.
use risingwave_connector::sink::{ | ||
build_sink, LogSinker, Sink, SinkImpl, SinkParam, SinkWriterParam, | ||
}; | ||
use risingwave_connector::sink::{LogSinker, Sink, SinkParam, SinkWriterParam}; | ||
use risingwave_sink_impl::dispatch_sink; | ||
use risingwave_sink_impl::sink::{build_sink, SinkImpl}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering can we also use the similar tricks for stream
and frontend
? 🤔️ If not, interested to know why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's doable for frontend. I have removed frontend's dependency for frontend.
For stream it's currently not doable because in the trait of LogSinker
it currently takes a generic parameter LogReader
, which is not fixed type to form a function pointer.
To enable this trick for stream, we can have two approaches. First, we can introduce boxed log reader and do dynamic dispatch on the log reader parameter, but the box log reader will create a box future when its method gets called and involves an extra memory allocation and may have performance issue. Second, we can first separate the implementation of log store from the stream crate to a new crate, and wrap all implementation of log reader with an enum, and the connector crate declares a function pointer with such log reader enum as input parameter.
The first method may have performance issue. The second method involves much more work. We can probably investigate the potential in future PR.
Actually, I'm unsure if it's totally safe to call an |
The lifetime checker only checks the function interface and does not couple the function declaration and implementation, so if the extern fn is declared in the same signature as its implementation, I think it's safe. The reason for marking extern fn as unsafe might be that, the linker links the object file by symbol and has no way to do any type check. So it will be unsafe if the extern fn declaration is of different signature as the its implementation. |
Get it. This makes much sense! |
|
GitGuardian id | Secret | Commit | Filename | |
---|---|---|---|---|
7648795 | Generic CLI Secret | 3471bbe | integration_tests/iceberg-cdc/run_test.sh | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn here the best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Our GitHub checks need improvements? Share your feedbacks!
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #12981 +/- ##
==========================================
+ Coverage 68.31% 68.59% +0.28%
==========================================
Files 1527 1497 -30
Lines 262564 251496 -11068
==========================================
- Hits 179366 172514 -6852
+ Misses 83198 78982 -4216
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm from my perspective, as no crud code changes😇
dfaa912
to
e9c1154
Compare
Any updates? Saw lots of conflicts 🥹 |
Just merged main and resolved conflicts? @wenym1 can you take a final quick look and let's merge it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just ran a benchmark on my mac and surprisingly there are no improvements... 😇 (also tried on EC2, same)
> hyperfine -w1 -r3 -p 'cargo clean' 'git checkout yiming/separate-connector-crate && cargo build -p risingwave_cmd_all' 'git checkout main && cargo build -p risingwave_cmd_all'
Benchmark 1: git checkout yiming/separate-connector-crate && cargo build -p risingwave_cmd_all
Time (mean ± σ): 226.308 s ± 0.162 s [User: 1608.343 s, System: 176.206 s]
Range (min … max): 226.123 s … 226.423 s 3 runs
Benchmark 2: git checkout main && cargo build -p risingwave_cmd_all
Time (mean ± σ): 226.048 s ± 1.033 s [User: 1616.179 s, System: 176.638 s]
Range (min … max): 225.405 s … 227.240 s 3 runs
Warning: The first benchmarking run for this command was significantly slower than the rest (227.240 s). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
Summary
'git checkout main && cargo build -p risingwave_cmd_all' ran
1.00 ± 0.00 times faster than 'git checkout yiming/separate-connector-crate && cargo build -p risingwave_cmd_all'
To clarify, I think it's still worth merging if it can help better organize code (I'm not sure whether it's the case for sink). And it may also help compile time in the future when the code bloats... |
I had the similar observation in my previous test, and then I chose the hold the PR to see whether we can get any improvement with this PR in the future, but it seems that there is still no improvement. Besides, with this PR the code is less organized because we introduce a tricky hack, and the original crate is actually split at a unclear boundary. So if we are not able to gain improvement at compile time, I think we'd better close this PR or hold it for later reference, and further investigate the reason for slow compile time on meta. |
Maybe we'd better seek for other ways to split the code. https://materialize.com/blog/engineering/compile-times-and-code-graphs/
|
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Move the implementation of all sinks to a separate crate. The original
risingwave_connector
is splited into 2 crates:risingwave_connector
,risingwave_sink_impl
.Originally,
risingwave_meta
depends on the wholerisingwave_connector
, in which case it should wait for the long compilation of the wholerisingwave_connector
. After this PR, therisingwave_meta
only depends on a much smallerrisingwave_connector
with sink implementation removed from it, andrisingwave_meta
can start its compilation much earlier.The
risingwave_meta
depends on the coordinator and validation logic of each concrete specific sink implementation, which was a reason forrisingwave_meta
to inevitably depend on the logic in currentrisingwave_sink_impl
. This PR decouplesrisingwave_meta
withrisingwave_sink_impl
by exposing the functions for coordinator and validation logic as function pointers inrisingwave_connector
, and via macro#[ctor]
, at initializationrisingwave_sink_impl
will inject the function implementation to the function pointers declared inrisingwave_connector
.CI build component: 3m44s -> 2m37s
Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.