
Modify compaction job queue size limit to be memory based #5186

Open
keith-turner opened this issue Dec 14, 2024 · 4 comments

@keith-turner (Contributor)

Is your feature request related to a problem? Please describe.

Currently the compaction job queue limit is configured as a range of entry counts. An individual entry in the queue can vary in size based on the number of files in the tablet and the number of files in the compaction job, so it is hard to reason about entries. The goal of limiting the queue size is to limit memory usage.

Describe the solution you'd like

Have a single configuration that is a memory upper limit for compaction job queues. For example, the configuration would allow the queue to use up to 50M of memory. This would be much easier to understand and would work much better at limiting memory used by the queue. The current configuration based on a range of entry counts (e.g. the queue can range from 10 to 10000 entries) does not control memory usage in a predictable way.
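A minimal sketch of what a memory-bounded queue could look like, assuming a per-job size estimator is available; the class name, wrapper shape, and the 50M figure are illustrative only and not the actual Accumulo implementation or property:

```java
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.ToLongFunction;

// Hypothetical wrapper that rejects new jobs once an estimated memory budget is exceeded.
class MemoryBoundedJobQueue<T> {
  private final long maxBytes;                   // e.g. 50 * 1024 * 1024
  private final ToLongFunction<T> sizeEstimator; // per-entry size estimate in bytes
  private final AtomicLong usedBytes = new AtomicLong();
  private final Deque<T> queue = new ConcurrentLinkedDeque<>();

  MemoryBoundedJobQueue(long maxBytes, ToLongFunction<T> sizeEstimator) {
    this.maxBytes = maxBytes;
    this.sizeEstimator = sizeEstimator;
  }

  /** Adds the job only if it fits in the remaining memory budget. */
  boolean offer(T job) {
    long size = sizeEstimator.applyAsLong(job);
    if (usedBytes.addAndGet(size) > maxBytes) {
      usedBytes.addAndGet(-size); // roll back, queue is "full" by memory
      return false;
    }
    queue.addLast(job);
    return true;
  }

  T poll() {
    T job = queue.pollFirst();
    if (job != null) {
      usedBytes.addAndGet(-sizeEstimator.applyAsLong(job));
    }
    return job;
  }
}
```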

@keith-turner keith-turner added the enhancement This issue describes a new feature, improvement, or optimization. label Dec 14, 2024
@keith-turner keith-turner added this to the 4.0.0 milestone Dec 14, 2024
@cshannon (Contributor)

How do you envision computing the memory for the compaction job? Caffeine has a nice Weigher API for computing the weight of an entry, which controls max size based on memory instead of total count when the memory footprint of each entry can vary, so maybe we can do something similar here.
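For reference, this is roughly how Caffeine's Weigher is used to bound a cache by weight instead of entry count; the key/value types and the weight calculation here are just placeholders:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

// Cache bounded by total weight (bytes) rather than number of entries.
Cache<String, byte[]> cache = Caffeine.newBuilder()
    .maximumWeight(50L * 1024 * 1024)  // overall budget, e.g. 50M
    .weigher((String key, byte[] value) -> key.length() + value.length)
    .build();
```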

@keith-turner (Contributor, Author)

> How do you envision computing the memory for the compaction job? Caffeine has a nice Weigher API for computing the weight of an entry, which controls max size based on memory instead of total count when the memory footprint of each entry can vary, so maybe we can do something similar here.

I would write some custom code to compute a data size estimate of the object, similar to what would be done in the implementation of a Weigher.
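A sketch of the kind of estimate that custom code might produce; CompactionJob#getFiles() is assumed from the Accumulo SPI, and the per-file and fixed overhead constants are made-up placeholders, not measured values:

```java
import org.apache.accumulo.core.spi.compaction.CompactionJob;

// Rough, Weigher-style size estimate for a queued compaction job.
static long estimateJobBytes(CompactionJob job) {
  long fixedOverhead = 128;    // assumed per-job object overhead
  long perFileOverhead = 256;  // assumed bytes per queued file reference
  return fixedOverhead + (long) job.getFiles().size() * perFileOverhead;
}
```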

@cshannon (Contributor)

Should #5188 be implemented first before we do this? The MetaJob object stores both the CompactionJob and TabletMetadata now, so if we drop the TabletMetadata then there will be some refactoring that will impact the code here. This issue also becomes a lot easier if we don't have to try to estimate the TabletMetadata size.

@keith-turner (Contributor, Author)

> Should #5188 be implemented first before we do this? The MetaJob object stores both the CompactionJob and TabletMetadata now, so if we drop the TabletMetadata then there will be some refactoring that will impact the code here. This issue also becomes a lot easier if we don't have to try to estimate the TabletMetadata size.

If this is done first, we should not expend much effort on computing the TabletMetadata size. Could call TabletMetadata.toString().length() to estimate its size. This is not efficient, but it is quick to write for something that will go away.
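Something as simple as the following throwaway estimate, where the character count of the string form stands in for bytes:

```java
// Crude placeholder: use the string form's length as the size estimate.
long tabletMetadataBytes = tabletMetadata.toString().length();
```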
