Tracking issue for compaction offloading #1545

Open
3 tasks
LeslieKid opened this issue Jul 17, 2024 · 5 comments
@LeslieKid
Contributor

LeslieKid commented Jul 17, 2024

Describe This Problem

We found in production that SST compaction cannot keep up with SST generation, which leads to poor query performance. However, we cannot give compaction more resources, because queries and writes take priority over compaction on the same node.

It is really hard to trade off resource allocation among query, write, and compaction in the LSM model. We want to compact the generated small SSTs as fast as possible, but we can't tolerate the impact on queries and writes. I think offloading compaction to separate nodes may be the key.

Proposal

The following is the architecture for compaction offloading.

Architecture.png

To support compaction offloading, we need:

Additional Context

This issue replaces issue #1480. Please close issue #1480 as it is outdated.

incubator-horaedb-proto#133 is highly related to this issue.

@Rachelint
Contributor

looks great!

@Rachelint
Contributor

Rachelint commented Jul 29, 2024

I think in the first stage, we can treat the compaction cluster the same as the horaedb cluster,
and simply reuse the monitoring and scheduling abilities that horaemeta already provides for the horaedb cluster.

We can start to design a specific scheduling strategy for the compaction cluster once it works.

@LeslieKid
Contributor Author

> I think in the first stage, we can treat the compaction cluster the same as the horaedb cluster, and simply reuse the monitoring and scheduling abilities that horaemeta already provides for the horaedb cluster.
>
> We can start to design a specific scheduling strategy for the compaction cluster once it works.

Thanks for your suggestion! I think it's a good idea to first implement a simple working version.

@Rachelint
Contributor

I think the task "Horaedb node supports submitting the compaction task to remote" may be similar to "Compaction node supports remote compaction service".

I guess the third step can be integration testing. Tests are actually very important.

@Rachelint
Contributor

Maybe you will be interested in this when writing tests:
https://github.com/apache/horaedb/tree/main/integration_tests/dist_query

Rachelint added a commit that referenced this issue Oct 30, 2024
## Rationale
The subtask to support compaction offloading. See #1545 

## Detailed Changes
**Compaction node support remote compaction service**

- Define `CompactionServiceImpl` to support compaction rpc service.
- Introduce `NodeType` to distinguish compaction node and horaedb node.
Enable the deployment of compaction node.

- Impl `compaction_client` for horaedb node to access remote compaction
node.
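To illustrate the `NodeType` idea above, here is a minimal sketch of how a node kind might be parsed from a deployment setting and used to decide which services to start. The variant names, the accepted strings (`"horaedb"`, `"compaction_server"`), and the helper methods are assumptions made for illustration and are not taken from the HoraeDB source.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the `NodeType` mentioned above.
/// Variant names are assumptions, not copied from the HoraeDB code base.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum NodeType {
    /// A regular node serving queries and writes (assumed default).
    #[default]
    HoraeDb,
    /// A node that only runs compaction tasks received over RPC.
    CompactionServer,
}

impl FromStr for NodeType {
    type Err = String;

    /// Parse a node type from a config string, case-insensitively.
    /// The accepted strings are illustrative only.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "horaedb" => Ok(NodeType::HoraeDb),
            "compaction_server" => Ok(NodeType::CompactionServer),
            other => Err(format!("unknown node type: {other}")),
        }
    }
}

impl NodeType {
    /// Whether this node should expose the query/write path.
    pub fn serves_queries(self) -> bool {
        matches!(self, NodeType::HoraeDb)
    }

    /// Whether this node should expose the remote compaction service.
    pub fn serves_remote_compaction(self) -> bool {
        matches!(self, NodeType::CompactionServer)
    }
}
```

A startup routine could branch on `serves_remote_compaction()` to register only the compaction RPC service on dedicated nodes, which keeps compaction load off the query/write nodes.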

**Horaedb node support compaction offload**

- Introduce `compaction_mode` in analytic engine's `Config` to determine
whether exec compaction offload or not.
- Define `CompactionNodePicker` trait, supporting get remote compaction
node info.
- Impl `RemoteCompactionRunner`, supporting pick remote node and pass
compaction task to the node.
- Add docs (e.g. `example-cluster-n.toml`) to explain how to deploy a
cluster supporting compaction offload.
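The pieces listed above (a `compaction_mode` switch, a `CompactionNodePicker` trait, and a `RemoteCompactionRunner`) could fit together roughly as follows. This is a simplified, synchronous sketch: the method signatures, the `CompactionTask` and `Placement` types, the round-robin picker, and the fallback to local compaction when no remote node is available are all assumptions for illustration, not the actual HoraeDB API.

```rust
/// Hypothetical mirror of the `compaction_mode` config option.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CompactionMode {
    /// Run compaction on the HoraeDB node itself.
    Local,
    /// Hand compaction tasks to a remote compaction node.
    Offload,
}

/// Illustrative task description; the real task type is not shown in the issue.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CompactionTask {
    pub table: String,
    pub input_ssts: Vec<String>,
}

/// Where a task was placed (an assumed type, used to keep the sketch testable).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Placement {
    Local,
    Remote(String),
}

/// Sketch of a picker trait: returns the address of a remote compaction node.
pub trait CompactionNodePicker {
    fn pick(&mut self) -> Option<String>;
}

/// A trivial round-robin picker over a fixed node list (an assumption; a real
/// picker would more likely ask the meta service for a node).
pub struct RoundRobinPicker {
    nodes: Vec<String>,
    next: usize,
}

impl RoundRobinPicker {
    pub fn new(nodes: Vec<String>) -> Self {
        Self { nodes, next: 0 }
    }
}

impl CompactionNodePicker for RoundRobinPicker {
    fn pick(&mut self) -> Option<String> {
        if self.nodes.is_empty() {
            return None;
        }
        let node = self.nodes[self.next].clone();
        self.next = (self.next + 1) % self.nodes.len();
        Some(node)
    }
}

/// Sketch of a runner that decides where a compaction task should execute.
pub struct RemoteCompactionRunner<P: CompactionNodePicker> {
    mode: CompactionMode,
    picker: P,
}

impl<P: CompactionNodePicker> RemoteCompactionRunner<P> {
    pub fn new(mode: CompactionMode, picker: P) -> Self {
        Self { mode, picker }
    }

    /// Choose a placement for the task. Falling back to local compaction when
    /// no remote node is available is an assumption of this sketch.
    pub fn place(&mut self, _task: &CompactionTask) -> Placement {
        match self.mode {
            CompactionMode::Local => Placement::Local,
            CompactionMode::Offload => match self.picker.pick() {
                Some(addr) => Placement::Remote(addr),
                None => Placement::Local,
            },
        }
    }
}
```

Keeping node selection behind a trait means the scheduling strategy can be swapped later without touching the runner, which matches the suggestion in the thread to start simple and design a dedicated strategy afterwards.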

## Test Plan

---------

Co-authored-by: kamille <[email protected]>