
Modify compaction job queue size limit to be memory based #5186

Open
keith-turner opened this issue Dec 14, 2024 · 4 comments

@keith-turner (Contributor)

Is your feature request related to a problem? Please describe.

Currently the compaction job queue limit is configured as a range of entry counts. An individual entry in the queue can vary in size based on the number of files in the tablet and the number of files in the compaction job, so it is hard to reason about entries. The goal of limiting the queue size is to limit memory usage.

Describe the solution you'd like

Have a single configuration that is a memory upper limit for compaction job queues. For example, the configuration would allow the queue to use up to 50M of memory. This would be much easier to understand and would work much better at limiting memory used by the queue. The current configuration based on a range of entry counts (e.g. the queue can range from 10 to 10000 entries) does not control memory usage in a predictable way.
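A minimal sketch of what a memory-bounded queue could look like, assuming a per-job size estimator is available; the class name, wrapper shape, and the 50M figure are illustrative only and not the actual Accumulo implementation or property:

```java
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.ToLongFunction;

// Hypothetical wrapper that rejects new jobs once an estimated memory budget is exceeded.
class MemoryBoundedJobQueue<T> {
  private final long maxBytes;                   // e.g. 50 * 1024 * 1024
  private final ToLongFunction<T> sizeEstimator; // per-entry size estimate in bytes
  private final AtomicLong usedBytes = new AtomicLong();
  private final Deque<T> queue = new ConcurrentLinkedDeque<>();

  MemoryBoundedJobQueue(long maxBytes, ToLongFunction<T> sizeEstimator) {
    this.maxBytes = maxBytes;
    this.sizeEstimator = sizeEstimator;
  }

  /** Adds the job only if it fits in the remaining memory budget. */
  boolean offer(T job) {
    long size = sizeEstimator.applyAsLong(job);
    if (usedBytes.addAndGet(size) > maxBytes) {
      usedBytes.addAndGet(-size); // roll back, queue is "full" by memory
      return false;
    }
    queue.addLast(job);
    return true;
  }

  T poll() {
    T job = queue.pollFirst();
    if (job != null) {
      usedBytes.addAndGet(-sizeEstimator.applyAsLong(job));
    }
    return job;
  }
}
```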

@keith-turner keith-turner added the enhancement This issue describes a new feature, improvement, or optimization. label Dec 14, 2024
@keith-turner keith-turner added this to the 4.0.0 milestone Dec 14, 2024
@cshannon (Contributor)

How do you envision computing the memory for the compaction job? Caffeine has a nice Weigher API for computing the weight of an entry, which controls max size based on memory instead of total count when the memory footprint of each entry can vary, so maybe we can do something similar here.
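For reference, this is roughly how Caffeine's Weigher is used to bound a cache by weight instead of entry count; the key/value types and the weight calculation here are just placeholders:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

// Cache bounded by total weight (bytes) rather than number of entries.
Cache<String, byte[]> cache = Caffeine.newBuilder()
    .maximumWeight(50L * 1024 * 1024)  // overall budget, e.g. 50M
    .weigher((String key, byte[] value) -> key.length() + value.length)
    .build();
```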

@keith-turner (Contributor, Author)

> How do you envision computing the memory for the compaction job? Caffeine has a nice Weigher API for computing the weight of an entry, which controls max size based on memory instead of total count when the memory footprint of each entry can vary, so maybe we can do something similar here.

I would write some custom code to compute a data size estimate of the object, similar to what would be done in the implementation of a Weigher.
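A sketch of the kind of estimate that custom code might produce; CompactionJob#getFiles() is assumed from the Accumulo SPI, and the per-file and fixed overhead constants are made-up placeholders, not measured values:

```java
import org.apache.accumulo.core.spi.compaction.CompactionJob;

// Rough, Weigher-style size estimate for a queued compaction job.
static long estimateJobBytes(CompactionJob job) {
  long fixedOverhead = 128;    // assumed per-job object overhead
  long perFileOverhead = 256;  // assumed bytes per queued file reference
  return fixedOverhead + (long) job.getFiles().size() * perFileOverhead;
}
```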

@cshannon (Contributor)

Should #5188 be implemented first before we do this? The MetaJob object stores both the CompactionJob and TabletMetadata now, so if we drop the TabletMetadata then there will be some refactoring that will impact the code here. This issue also becomes a lot easier if we don't have to try to estimate the TabletMetadata size.

@keith-turner (Contributor, Author)

> Should #5188 be implemented first before we do this? The MetaJob object stores both the CompactionJob and TabletMetadata now, so if we drop the TabletMetadata then there will be some refactoring that will impact the code here. This issue also becomes a lot easier if we don't have to try to estimate the TabletMetadata size.

If this is done first, we should not expend much effort on computing the TabletMetadata size. Could call TabletMetadata.toString().length() to estimate its size. This is not efficient, but it is quick to write for something that will go away.
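Something as simple as the following throwaway estimate, where the character count of the string form stands in for bytes:

```java
// Crude placeholder: use the string form's length as the size estimate.
long tabletMetadataBytes = tabletMetadata.toString().length();
```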
