
compat: why do I see these alternating chunk levels #7968

Open
calestyo opened this issue Dec 9, 2024 · 1 comment

Comments


calestyo commented Dec 9, 2024

Thanos, Prometheus and Golang version used:

thanos, version 0.37.1 (branch: HEAD, revision: e0812e2f46f81af3324686d910d885d8f2751d46)
  build user:       root@1294db3510d8
  build date:       20241204-08:25:27
  go version:       go1.23.3
  platform:         linux/amd64
  tags:             netgo

(this happened at least as far back as 0.36.1; I upgraded to 0.37.1 today)

Object Storage Provider:
FILESYSTEM

What happened:

I basically see the following:
[Screenshot from 2024-07-20 18-19-38]
where, I can only guess, the "doubled chunks" (at the same level) may be the same issue as #7488, which was fixed by #7492.

But I haven't been able to confirm that yet.

I do however also get:
[Screenshot from 2024-07-20 18-19-19]
where you can see that every few time spans, the:

  • level 4, with resolutions 0, 300000 and 3600000
    is "shifted" to:
  • level 5, with resolutions 0, 300000 and 3600000

One of the sources is from sidecar (the non-alternating one, 1 in the lower image); the other is via receive (the alternating one, 2 in the lower image).
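For anyone reproducing this: the level and resolution shown above are recorded in each block directory's meta.json (`compaction.level` and `thanos.downsample.resolution`). A minimal, self-contained Go sketch for reading those two fields — this struct is an illustration, not the actual Thanos metadata package:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal subset of the fields stored in a Thanos block's meta.json.
// The JSON field names follow the on-disk format; the struct itself
// is an illustrative sketch.
type blockMeta struct {
	Compaction struct {
		Level int `json:"level"`
	} `json:"compaction"`
	Thanos struct {
		Downsample struct {
			Resolution int64 `json:"resolution"`
		} `json:"downsample"`
	} `json:"thanos"`
}

// parseMeta decodes the compaction level and downsampling resolution
// from a meta.json document.
func parseMeta(raw []byte) (level int, resolution int64, err error) {
	var m blockMeta
	if err := json.Unmarshal(raw, &m); err != nil {
		return 0, 0, err
	}
	return m.Compaction.Level, m.Thanos.Downsample.Resolution, nil
}

func main() {
	// Example fragment resembling the meta.json of one of the blocks above.
	raw := []byte(`{"compaction":{"level":4},"thanos":{"downsample":{"resolution":300000}}}`)
	level, res, err := parseMeta(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("level=%d resolution=%dms\n", level, res)
}
```

Inspecting the meta.json of two adjacent blocks from the alternating range would show whether the level difference is real on disk or only a display artifact.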

What you expected to happen:

Well, not alternating?

Anything else we need to know:
Well, I (still) regularly suffer from #7197, i.e. for some reason, compact seems to need more and more space to do its job (even though no more metrics/instances are being scraped).

Things are described there in more detail, but in short, compact uses more and more space for no good reason, then fills the filesystem, doesn't even properly exit with an error, and I have to manually clean everything up (and add extra space).

Maybe something with the levels gets messed up when that happens?

Environment:

  • OS (e.g. from /etc/os-release):

PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

- Kernel (e.g. `uname -a`): `Linux lcg-lrz-monitor-ext 6.1.0-27-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.115-1 (2024-11-01) x86_64 GNU/Linux`

dosubot bot commented Dec 9, 2024

Hey there, @calestyo! I'm here to assist you with any bugs, questions, or contributions you might have. Let's tackle this issue together!

I found a similar unsolved discussion that might be relevant to your issue: Compact: critical error detected; halting (pre compaction overlap check: overlaps found while gathering blocks), which was last updated on April 26, 2024 [1].

Regarding the alternating chunk levels you are experiencing, the resolutions '0', '300000', and '3600000' relate to different levels of data downsampling in Thanos. Resolution '0' is raw data, '300000' is downsampled to 5-minute intervals, and '3600000' is typically downsampled to 1-hour intervals [2].
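The three resolutions correspond to fixed downsampling windows; a small standalone Go sketch (mirroring the ResLevel0/ResLevel1/ResLevel2 constants from Thanos' pkg/compact/downsample, but not importing Thanos itself) that renders them as durations:

```go
package main

import (
	"fmt"
	"time"
)

// Thanos downsampling resolutions, in milliseconds. The values match
// the ResLevel0/ResLevel1/ResLevel2 constants in Thanos'
// pkg/compact/downsample package.
const (
	resRaw int64 = 0              // raw samples, no downsampling
	res5m  int64 = 5 * 60 * 1000  // 300000 ms: 5-minute aggregates
	res1h  int64 = 60 * 60 * 1000 // 3600000 ms: 1-hour aggregates
)

// human renders a resolution value as a duration string ("0s" for raw).
func human(res int64) string {
	return time.Duration(res * int64(time.Millisecond)).String()
}

func main() {
	for _, r := range []int64{resRaw, res5m, res1h} {
		fmt.Printf("%7d ms = %s\n", r, human(r))
	}
}
```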

The issue with alternating levels between '4' and '5' might be related to how Thanos handles compaction and transitions between levels. Thanos uses criteria such as block size, count, and parent range filling to determine when to transition between compaction levels. It also handles gaps and overlaps carefully to avoid improper compaction [3].
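As a rough illustration of the level bookkeeping: when the compactor merges blocks, the resulting block's compaction level is one more than the highest level among its parents (this mirrors Prometheus TSDB's behavior during compaction; the function below is a sketch, not Thanos code):

```go
package main

import "fmt"

// mergedLevel sketches how a merged block's compaction level is
// derived: one more than the highest level among its parent blocks
// (mirroring Prometheus TSDB's compaction metadata handling).
func mergedLevel(parentLevels []int) int {
	maxLevel := 0
	for _, l := range parentLevels {
		if l > maxLevel {
			maxLevel = l
		}
	}
	return maxLevel + 1
}

func main() {
	// Merging several level-4 blocks yields a level-5 block, which is
	// one way adjacent time ranges can end up showing level 4 next to
	// level 5, depending on how far compaction has progressed.
	fmt.Println(mergedLevel([]int{4, 4, 4}))
	fmt.Println(mergedLevel([]int{1, 1, 1}))
}
```

So a level 4/5 alternation can simply mean some time ranges have gone through one more round of compaction than their neighbors.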

The problem with increasing space usage during compaction could be linked to known issues such as overlapping blocks or large total index sizes, which can prevent proper compaction and lead to increased space usage [4].

The 'sidecar' and 'receive' components have distinct roles in data ingestion and storage, which might affect how data is compacted. The 'sidecar' uploads data from Prometheus to cloud storage, while the 'receive' ingests samples from remote Prometheus instances. Both components upload data to cloud storage for the compactor to process, but they do not perform compaction themselves [5][6].

To address these issues, you might want to investigate the specific configurations and interactions between these components and the compactor, as well as any potential overlaps or large index sizes in your data blocks.

