Test Tablets split and merge #8948

Open
wants to merge 1 commit into base: master

yarongilor
Contributor

@yarongilor yarongilor commented Oct 8, 2024

A test that applies high load, causing rapid tablet splits.
It then runs a deletion burst and a major compaction to trigger rapid merges as well.

Test description doc.

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add New configuration option and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

@soyacz
Contributor

soyacz commented Oct 9, 2024

@yarongilor what's the point of creating PRs without any description?

@yarongilor yarongilor force-pushed the tablets_split_merge branch 2 times, most recently from d09408f to 6a436a0 Compare October 9, 2024 10:40
@yarongilor yarongilor changed the title WIP Test Tablets split and merge Oct 9, 2024
@yarongilor yarongilor force-pushed the tablets_split_merge branch 3 times, most recently from e9c26e8 to 82ca4f6 Compare October 9, 2024 14:52
@yarongilor yarongilor requested review from raphaelsc and pehala October 9, 2024 16:56
@yarongilor
Contributor Author

yarongilor commented Oct 9, 2024

Started testing for the new PR test in https://argus.scylladb.com/test/e6972842-591d-472f-9e39-f196d3670053/runs?additionalRuns[]=93284dff-cfe7-4b45-883f-53509355301d

Since the split-merge code is not merged to master yet, the test failed, probably as expected, with:

Traceback (most recent call last):
File "/home/ubuntu/scylla-cluster-tests/tablets_split_merge_test.py", line 121, in test_tablets_split_merge
self._wait_for_more_tablet_than(tablets_number_split_threshold)
File "/home/ubuntu/scylla-cluster-tests/tablets_split_merge_test.py", line 143, in _wait_for_more_tablet_than
wait.wait_for(func=lambda: self._get_tablets_number() > tablets_num, step=60,
File "/home/ubuntu/scylla-cluster-tests/sdcm/wait.py", line 86, in wait_for
raise raising_exc from ex
sdcm.exceptions.WaitForTimeoutError: Wait for: Waiting for a tablets number bigger than 2: timeout - 1800 seconds - expired
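
The traceback above comes from a polling wait: the test checks the tablet count every 60 seconds and gives up after 1800 seconds. A minimal sketch of that poll-until-true pattern follows. It is not SCT's `sdcm.wait.wait_for`; `wait_until`, `WaitTimeoutError`, and the injectable `clock` and `sleep` parameters are names assumed here for illustration only.

```python
import time


class WaitTimeoutError(Exception):
    """Raised when the condition is still false once the timeout is reached."""


def wait_until(func, timeout, step, text, clock=time.monotonic, sleep=time.sleep):
    """Call func every `step` seconds until it is truthy or `timeout` elapses.

    `clock` and `sleep` are injectable (an assumption of this sketch) so the
    loop can be exercised without really waiting.
    """
    deadline = clock() + timeout
    while True:
        result = func()
        if result:
            return result
        # Stop if another full step would take us past the deadline.
        if clock() + step > deadline:
            raise WaitTimeoutError(
                f"Wait for: {text}: timeout - {timeout} seconds - expired")
        sleep(step)
```

With a condition such as `lambda: get_tablets_number() > 2` (where `get_tablets_number` is a hypothetical callable), the helper either returns as soon as a split is observed or raises after the timeout, which is the failure mode shown in the traceback.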

Waiting for a private build from Raphael in https://jenkins.scylladb.com/job/releng/job/create-private-build/306/

Not sure the PR can be fully reviewed before it is tested.

@soyacz
Contributor

soyacz commented Oct 10, 2024

@yarongilor you can test using SCT custom branches: https://docs.google.com/presentation/d/1P5xofncoTkUI-uQ5ilG9eRhRoEfYVdJRhdps4qQ6Mx0/edit?pli=1#slide=id.g2e83887fc9a_0_0
There is no need to create private builds anymore (once it has been built, reuse the produced AMI).

@yarongilor
Contributor Author

Private build https://jenkins.scylladb.com/job/releng/job/create-private-build/307/
was tested (using update_db_package) in https://argus.scylladb.com/test/e6972842-591d-472f-9e39-f196d3670053/runs?additionalRuns[]=169fabaf-bcda-4e16-88c0-ebec5fa30595
and its setup failed with:

2024-10-10 08:47:48.848: (TestFrameworkEvent Severity.ERROR) period_type=one-time event_id=8c8f6ee8-af11-470c-b8cb-173b559053dc, source=TabletsSplitMergeTest.SetUp()
exception=[Node tablets-split-merge-master-db-node-169fabaf-3 [18.203.126.69 | 10.4.0.78]] NodeSetupFailed: Encountered a bad command exit code!
Command: 'sudo DEBIAN_FRONTEND=noninteractive apt-get -o DPkg::Lock::Timeout=120 -o Dpkg::Options::="--force-confold" -o Dpkg::Options::="--force-confdef" install -y lsof net-tools'
Exit code: 100
Stdout:
Reading state information...
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
scylla-tools-core : Depends: openjdk-8-jre-headless but it is not going to be installed or
openjdk-8-jre but it is not going to be installed or
oracle-java8-set-default but it is not installable or
adoptopenjdk-8-hotspot-jre but it is not installable or
openjdk-11-jre-headless but it is not going to be installed or
openjdk-11-jre but it is not going to be installed or
oracle-java11-set-default but it is not installable
Stderr:
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).
Traceback (most recent call last):
File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 3880, in node_setup
cl_inst.node_setup(_node, **setup_kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 4651, in node_setup
node.install_package('lsof net-tools')
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 72, in inner
return func(*args, **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 1847, in install_package
self.remoter.sudo(f'{pkg_cmd} install -y {package_name}', ignore_status=ignore_status)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/base.py", line 123, in sudo
return self.run(cmd=cmd,
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 653, in run
result = _run()
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 72, in inner
return func(*args, **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 644, in _run
return self._run_execute(cmd, timeout, ignore_status, verbose, new_session, watchers)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 577, in _run_execute
result = connection.run(**command_kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 620, in run
return self._complete_run(channel, exception, timeout_reached, timeout, result, warn, stdout, stderr)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 655, in _complete_run
raise UnexpectedExit(result)
sdcm.remote.libssh2_client.exceptions.UnexpectedExit: Encountered a bad command exit code!
Command: 'sudo DEBIAN_FRONTEND=noninteractive apt-get -o DPkg::Lock::Timeout=120 -o Dpkg::Options::="--force-confold" -o Dpkg::Options::="--force-confdef" install -y lsof net-tools'
Exit code: 100
Stdout:
Reading state information...
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
scylla-tools-core : Depends: openjdk-8-jre-headless but it is not going to be installed or
openjdk-8-jre but it is not going to be installed or
oracle-java8-set-default but it is not installable or
adoptopenjdk-8-hotspot-jre but it is not installable or
openjdk-11-jre-headless but it is not going to be installed or
openjdk-11-jre but it is not going to be installed or
oracle-java11-set-default but it is not installable
Stderr:
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

@raphaelsc, can you please see why it's broken?
You can also try building a private AMI using the above suggestion from @soyacz.

@soyacz
Contributor

soyacz commented Oct 10, 2024

You need to remove all packages that require Java. This was raised before, but the issue was closed since it should be fixed soon: #8474

@raphaelsc
Member

@yarongilor @soyacz I don't understand what I have to do to fix this. I need instructions. We need a custom branch of mine (i.e. work that is not yet on master), which is why I had to produce a private build.

@soyacz
Contributor

soyacz commented Oct 10, 2024

There are two possibilities:

  1. Remove all packages that require Java from the package you sent to S3.
  2. Run SCT with your private branch specified in the BYO section of the Jenkins pipeline parameters, as in the presentation I linked (work with @yarongilor if you need more instructions). This way, SCT will build an AMI from your branch and test against it. The next time you need to rerun the test with the same binaries, grab the AMI and reuse it to save the build time.

@raphaelsc
Member

@yarongilor The 2nd option sounds easier. Link to the branch: https://github.com/scylladb/scylla-dev/tree/tablet-merge

@yarongilor
Contributor Author

The deletions in c-s are not good enough and might become a bottleneck or fail.
It is better to switch to a configuration of scylla-bench + large-partitions, so that deletion can be much faster.

@yarongilor yarongilor force-pushed the tablets_split_merge branch 2 times, most recently from 8c6a9ac to d94a726 Compare October 10, 2024 19:35
@yarongilor
Contributor Author

yarongilor commented Oct 13, 2024

Building an AMI in:
https://jenkins.scylladb.com/job/scylla-staging/job/yarongilor/job/longevity-10gb-3h-test-yg/2/
and internally in https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/2538/:
us-east-1-x86_64 = ami-07af88cdc0b3d8af9

retested

@yarongilor yarongilor force-pushed the tablets_split_merge branch 3 times, most recently from 7ce8506 to 063f0c7 Compare October 13, 2024 10:18
@yarongilor yarongilor requested a review from bhalevy October 13, 2024 12:48
@yarongilor
Contributor Author

@raphaelsc, @bhalevy, the test seems to fail, since it doesn't get more than 1 tablet:

2024-10-13 14:21:00.942: (InfoEvent Severity.ERROR) period_type=not-set event_id=2fdc3fcb-2b96-455a-8e04-648efe224792: message=Waiting for a tablets number bigger than 4 FAILED.

But there might be a different issue here, related to the replication strategy.
I opened scylladb/scylladb#21084 for it.

@yarongilor yarongilor marked this pull request as ready for review October 14, 2024 09:21
@yarongilor
Contributor Author

Opened an issue for the missing split as well: scylladb/scylladb#21092

@yarongilor
Contributor Author

yarongilor commented Nov 28, 2024

There's no need for the test to be focused on implementation details of split and merge. It's actually bad for the long term.

OK, thanks. What I realized from the above comments is that the trigger for a split could be the data size on disk and not the "pure" dataset size.

So I changed the test to be simpler: it just runs a background thread that samples the number of tablets every few seconds and saves the maximum over time.
I'll retest to see whether it now correctly finds the maximum number of tablets and confirms split and merge.

The test passes OK now.
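
The background-sampling approach described in this comment can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `MaxTabletsSampler`, `split_and_merge_observed`, and the `get_tablets_number` callable are hypothetical names, and the rule used to decide that a split and a merge happened is an assumption.

```python
import threading


class MaxTabletsSampler:
    """Samples a tablet-count callable in a background thread, keeping the maximum."""

    def __init__(self, get_tablets_number, interval=5.0):
        self._get = get_tablets_number  # hypothetical callable returning an int
        self._interval = interval
        self._stop = threading.Event()
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.max_tablets = 0
        self.last_tablets = 0

    def _sample(self):
        count = self._get()
        with self._lock:
            self.last_tablets = count
            self.max_tablets = max(self.max_tablets, count)

    def _run(self):
        # Sample first, then wait, so at least one sample is taken after start().
        while True:
            self._sample()
            if self._stop.wait(self._interval):
                break

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


def split_and_merge_observed(initial, maximum, final):
    """Assumed criterion: the count grew above the start, then dropped below the peak."""
    return maximum > initial and final < maximum
```

A test would start the sampler before the write load, stop it after the major compaction, and then compare the initial count, `max_tablets`, and the final count.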

@yarongilor yarongilor force-pushed the tablets_split_merge branch 8 times, most recently from 206bca7 to a12ff32 Compare December 1, 2024 11:13
@yarongilor
Contributor Author

The test now fails because a tablet merge never happens.
@raphaelsc, I put all the scenario data in an issue for better tracking:
scylladb/scylladb#21736

@raphaelsc
Member

I don't think we should open issues against Scylla, since merge is not merged to master yet, so there's nothing to be done there.

@yarongilor yarongilor force-pushed the tablets_split_merge branch 3 times, most recently from 2a4b60e to 13a0c0d Compare December 2, 2024 10:23
@yarongilor
Contributor Author

yarongilor commented Dec 2, 2024

The latest run encountered a token_group_based_splitting_mutation_writer error:

2024-12-02 09:05:37.687
Received: 2024-12-02 09:05:36.743
one-time
tablets-split-merge-tablets--db-node-9aefb63e-1
2024-12-02 09:05:37.687 <2024-12-02 09:05:36.743>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=ccc086b4-06a0-459c-96ca-fec9f783c231: type=DATABASE_ERROR regex=(^ERROR|!\s*?ERR).*\[shard.*\] line_number=81020 node=tablets-split-merge-tablets--db-node-9aefb63e-1
2024-12-02T09:05:36.743+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1      !ERR | scylla[4975]:  [shard  4:strm] token_group_based_splitting_mutation_writer - Token group id cannot go backwards, current=0, previous=1, at: 0x621849e 0x6218ac0 0x6218dc8 0x5cd9fa7 0x4638fe3 0x4638a4c 0x463a7a5 0x143226a 0x5d18adf 0x5d1a05a 0x5d1b237 0x5d3fd70 0x5cdacba /opt/scylladb/libreloc/libc.so.6+0x976d6 /opt/scylladb/libreloc/libc.so.6+0x11b60b

Then a major compaction failed due to a core dump:

2024-12-02 09:05:37.743
Received: 2024-12-02 09:05:36.744
one-time
tablets-split-merge-tablets--db-node-9aefb63e-1
2024-12-02 09:05:37.743 <2024-12-02 09:05:36.744>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=e9b0cd64-ed59-4e42-88fa-7467b06797df: type=ABORTING_ON_SHARD regex=Aborting on shard line_number=81041 node=tablets-split-merge-tablets--db-node-9aefb63e-1
2024-12-02T09:05:36.744+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]: Aborting on shard 4, in scheduling group streaming.
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at reactor.cc:?
seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at reactor.cc:?
seastar::install_oneshot_signal_handler<6, (void (*)())(&seastar::sigabrt_action)>()::{lambda(int, siginfo_t*, void*)#1}::__invoke(int, siginfo_t*, void*) at reactor.cc:?
?? ??:0
?? ??:0
?? ??:0
?? ??:0
seastar::on_internal_error(seastar::logger&, std::basic_string_view<char, std::char_traits<char> >) at on_internal_error.cc:?
mutation_writer::token_group_based_splitting_mutation_writer::consume(partition_start&&) at token_group_based_splitting_writer.cc:?
_ZNO20mutation_fragment_v27consumeIN15mutation_writer43token_group_based_splitting_mutation_writerEQ26MutationFragmentConsumerV2IT_DTcldtclsr3stdE7declvalIS3_EE7consumeclL_ZSt7declvalI22range_tombstone_changeEDTcl9__declvalIS3_ELi0EEEvEEEEEEEDcRS3_ at token_group_based_splitting_writer.cc:?
_ZN15mutation_writer11feed_writerINS_43token_group_based_splitting_mutation_writerEQ26MutationFragmentConsumerV2IT_N7seastar6futureIvEEEEES5_O15mutation_readerS2_.resume at token_group_based_splitting_writer.cc:?
seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose() at main.cc:?
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at reactor.cc:?
seastar::reactor::run_some_tasks() at reactor.cc:?
seastar::reactor::do_run() at reactor.cc:?
std::_Function_handler<void (), seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_0>::_M_invoke(std::_Any_data const&) at reactor.cc:?
seastar::posix_thread::start_routine(void*) at posix.cc:?
?? ??:0
?? ??:0
2024-12-02 09:10:01.684
end
tablets-split-merge-tablets--db-node-9aefb63e-1
2024-12-02 09:10:01.684: (NodetoolEvent Severity.ERROR) period_type=end event_id=0627dfcf-897e-430b-804d-3c21332639e2 duration=11m7s: nodetool_command=compact node=tablets-split-merge-tablets--db-node-9aefb63e-1 errors=["Encountered a bad command exit code!\n\nCommand: '/usr/bin/nodetool  compact scylla_bench test'\n\nExit code: 2\n\nStdout:\n\n\n\nStderr:\n\nerror running operation: std::system_error (error system:103, Software caused connection abort)\n\n", 'Traceback (most recent call last):\n  File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 2676, in run_nodetool\n    runner(cmd, timeout=timeout, ignore_status=ignore_status, verbose=verbose, retry=retry)\n  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 653, in run\n    result = _run()\n  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 72, in inner\n    return func(*args, **kwargs)\n  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 644, in _run\n    return self._run_execute(cmd, timeout, ignore_status, verbose, new_session, watchers)\n  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 577, in _run_execute\n    result = connection.run(**command_kwargs)\n  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 625, in run\n    return self._complete_run(channel, exception, timeout_reached, timeout, result, warn, stdout, stderr)\n  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 660, in _complete_run\n    raise UnexpectedExit(result)\nsdcm.remote.libssh2_client.exceptions.UnexpectedExit: Encountered a bad command exit code!\n\nCommand: \'/usr/bin/nodetool  compact scylla_bench test\'\n\nExit code: 2\n\nStdout:\n\n\n\nStderr:\n\nerror running operation: std::system_error (error system:103, Software caused connection abort)\n\n\n']
Traceback (most recent call last):
File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 2676, in run_nodetool
runner(cmd, timeout=timeout, ignore_status=ignore_status, verbose=verbose, retry=retry)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 653, in run
result = _run()
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 72, in inner
return func(*args, **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 644, in _run
return self._run_execute(cmd, timeout, ignore_status, verbose, new_session, watchers)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 577, in _run_execute
result = connection.run(**command_kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 625, in run
return self._complete_run(channel, exception, timeout_reached, timeout, result, warn, stdout, stderr)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 660, in _complete_run
raise UnexpectedExit(result)
sdcm.remote.libssh2_client.exceptions.UnexpectedExit: Encountered a bad command exit code!
Command: '/usr/bin/nodetool  compact scylla_bench test'
Exit code: 2
Stdout:
Stderr:
error running operation: std::system_error (error system:103, Software caused connection abort)
2024-12-02 09:10:55.703 <2024-12-02 09:05:36.000>: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=aa7bdb71-f5c1-446f-96bf-e1a7210b86d2 node=Node tablets-split-merge-tablets--db-node-9aefb63e-1 [54.227.206.63 | 10.12.7.229]
corefile_url=https://storage.cloud.google.com/upload.scylladb.com/core.scylla.106.190b79ecb166414ebb252274f566e9b5.4975.1733130336000000./core.scylla.106.190b79ecb166414ebb252274f566e9b5.4975.1733130336000000.zst
backtrace=           PID: 4975 (scylla)
UID: 106 (scylla)
GID: 108 (scylla)
Signal: 6 (ABRT)
Timestamp: Mon 2024-12-02 09:05:36 UTC (5min ago)
Command Line: /usr/bin/scylla --blocked-reactor-notify-ms 25 --abort-on-lsa-bad-alloc 1 --abort-on-seastar-bad-alloc --abort-on-internal-error 1 --abort-on-ebadf 1 --enable-sstable-key-validation 1 --log-to-syslog 1 --log-to-stdout 0 --default-log-level info --network-stack posix --io-properties-file=/etc/scylla.d/io_properties.yaml --cpuset 1-7,9-15 --lock-memory=1
Executable: /opt/scylladb/libexec/scylla
Control Group: /scylla.slice/scylla-server.slice/scylla-server.service
Unit: scylla-server.service
Slice: scylla-server.slice
Boot ID: 190b79ecb166414ebb252274f566e9b5
Machine ID: ec2980c2333f1a59cff630bd96c8bdb8
Hostname: tablets-split-merge-tablets--db-node-9aefb63e-1
Storage: /var/lib/systemd/coredump/core.scylla.106.190b79ecb166414ebb252274f566e9b5.4975.1733130336000000.zst (present)
Size on Disk: 2.5G
Message: Process 4975 (scylla) of user 106 dumped core.
Module liblzma.so.5 from rpm xz-5.4.6-3.fc40.x86_64
Module libicudata.so.74 from rpm icu-74.2-1.fc40.x86_64
Module libcap.so.2 from rpm libcap-2.69-8.fc40.x86_64
Module libffi.so.8 from rpm libffi-3.4.4-7.fc40.x86_64
Module libboost_unit_test_framework.so.1.83.0 from rpm boost-1.83.0-5.fc40.x86_64
Module liblua-5.4.so from rpm lua-5.4.6-5.fc40.x86_64
Module libjsoncpp.so.25 from rpm jsoncpp-1.9.5-7.fc40.x86_64
Module libsystemd.so.0 from rpm systemd-255.12-1.fc40.x86_64
Module libdeflate.so.0 from rpm libdeflate-1.21-2.fc40.x86_64
Module libxxhash.so.0 from rpm xxhash-0.8.2-2.fc40.x86_64
Module libicui18n.so.74 from rpm icu-74.2-1.fc40.x86_64
Module libicuuc.so.74 from rpm icu-74.2-1.fc40.x86_64
Module libboost_regex.so.1.83.0 from rpm boost-1.83.0-5.fc40.x86_64
Module libboost_date_time.so.1.83.0 from rpm boost-1.83.0-5.fc40.x86_64
Module libcryptopp.so.8 from rpm cryptopp-8.8.0-7.fc40.x86_64
Module libcrypt.so.2 from rpm libxcrypt-4.4.36-5.fc40.x86_64
Module libsnappy.so.1 from rpm snappy-1.1.10-4.fc40.x86_64
Module libyaml-cpp.so.0.7 from rpm yaml-cpp-0.7.0-5.fc40.x86_64
Module libudev.so.1 from rpm systemd-255.12-1.fc40.x86_64
Module libhwloc.so.15 from rpm hwloc-2.11.1-1.fc40.x86_64
Module libprotobuf.so.30 from rpm protobuf-3.19.6-8.fc40.x86_64
Module libzstd.so.1 from rpm zstd-1.5.6-1.fc40.x86_64
Module libbrotlicommon.so.1 from rpm brotli-1.1.0-3.fc40.x86_64
Module libbrotlidec.so.1 from rpm brotli-1.1.0-3.fc40.x86_64
Module libbrotlienc.so.1 from rpm brotli-1.1.0-3.fc40.x86_64
Module libz.so.1 from rpm zlib-ng-2.1.7-2.fc40.x86_64
Module libp11-kit.so.0 from rpm p11-kit-0.25.5-1.fc40.x86_64
Module libidn2.so.0 from rpm libidn2-2.3.7-1.fc40.x86_64
Module libtasn1.so.6 from rpm libtasn1-4.19.0-6.fc40.x86_64
Module libnettle.so.8 from rpm nettle-3.9.1-6.fc40.x86_64
Module libhogweed.so.6 from rpm nettle-3.9.1-6.fc40.x86_64
Module libunistring.so.5 from rpm libunistring-1.1-7.fc40.x86_64
Module libgmp.so.10 from rpm gmp-6.2.1-8.fc40.x86_64
Module libgnutls.so.30 from rpm gnutls-3.8.6-1.fc40.x86_64
Module liblz4.so.1 from rpm lz4-1.9.4-6.fc40.x86_64
Module libnuma.so.1 from rpm numactl-2.0.16-5.fc40.x86_64
Module libsctp.so.1 from rpm lksctp-tools-1.0.19-6.fc40.x86_64
Module libfmt.so.10 from rpm fmt-10.2.1-4.fc40.x86_64
Module libcares.so.2 from rpm c-ares-1.28.1-1.fc40.x86_64
Module libboost_system.so.1.83.0 from rpm boost-1.83.0-5.fc40.x86_64
Module libboost_thread.so.1.83.0 from rpm boost-1.83.0-5.fc40.x86_64
Module libboost_program_options.so.1.83.0 from rpm boost-1.83.0-5.fc40.x86_64
Stack trace of thread 4979:
#0  0x00007e7123aa8664 __pthread_kill_implementation (libc.so.6 + 0x99664)
#1  0x00007e7123a4fc4e raise (libc.so.6 + 0x40c4e)
#2  0x00007e7123a37902 abort (libc.so.6 + 0x28902)
#3  0x0000000005cda027 _ZN7seastar17on_internal_errorERNS_6loggerESt17basic_string_viewIcSt11char_traitsIcEE (scylla + 0x5ada027)
#4  0x0000000004638fe4 _ZN15mutation_writer43token_group_based_splitting_mutation_writer7consumeEO15partition_start (scylla + 0x4438fe4)
#5  0x0000000004638a4d _ZNO20mutation_fragment_v27consumeIN15mutation_writer43token_group_based_splitting_mutation_writerEQ26MutationFragmentConsumerV2IT_DTcldtclsr3stdE7declvalIS3_EE7consumeclL_ZSt7declvalI22range_tombstone_changeEDTcl9__declvalIS3_ELi0EEEvEEEEEEEDcRS3_ (scylla + 0x4438a4d)
#6  0x000000000463a7a6 _ZN15mutation_writer11feed_writerINS_43token_group_based_splitting_mutation_writerEQ26MutationFragmentConsumerV2IT_N7seastar6futureIvEEEEES5_O15mutation_readerS2_.resume (scylla + 0x443a7a6)
#7  0x000000000143226b _ZN7seastar8internal21coroutine_traits_baseIvE12promise_type15run_and_disposeEv (scylla + 0x123226b)
#8  0x0000000005d18ae0 _ZN7seastar7reactor9run_tasksERNS0_10task_queueE (scylla + 0x5b18ae0)
#9  0x0000000005d1a05b _ZN7seastar7reactor14run_some_tasksEv (scylla + 0x5b1a05b)
#10 0x0000000005d1b238 _ZN7seastar7reactor6do_runEv (scylla + 0x5b1b238)
#11 0x0000000005d3fd71 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b3fd71)
#12 0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#13 0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#14 0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4989:
#0  0x00007e7123b1ce4a read (libc.so.6 + 0x10de4a)
#1  0x0000000005d63cd5 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x5b63cd5)
#2  0x0000000005d63fe3 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1ERNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b63fe3)
#3  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#4  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#5  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4975:
#0  0x00000000060f62f2 _ZN7seastar8io_queue13queue_requestENS_8internal14priority_classENS1_23io_direction_and_lengthENS1_10io_requestEPNS_9io_intentESt6vectorI5iovecSaIS8_EE (scylla + 0x5ef62f2)
#1  0x00000000060f6945 _ZN7seastar8io_queue14submit_io_readENS_8internal14priority_classEmNS1_10io_requestEPNS_9io_intentESt6vectorI5iovecSaIS7_EE (scylla + 0x5ef6945)
#2  0x0000000005ca7dea _ZN7seastar20posix_file_real_impl8read_dmaEmPvmPNS_9io_intentE (scylla + 0x5aa7dea)
#3  0x0000000005c94e0f _ZN7seastar15posix_file_impl16do_dma_read_bulkEmmNS_8internal24maybe_priority_class_refEPNS_9io_intentE (scylla + 0x5a94e0f)
#4  0x0000000005ca7e69 _ZN7seastar20posix_file_real_impl13dma_read_bulkEmmPNS_9io_intentE (scylla + 0x5aa7e69)
#5  0x0000000001c1eaae _ZN17checked_file_impl13dma_read_bulkEmmPN7seastar9io_intentE (scylla + 0x1a1eaae)
#6  0x00000000045b1f33 _ZN18tracking_file_impl13dma_read_bulkEmmPN7seastar9io_intentE (scylla + 0x43b1f33)
#7  0x0000000005c9a9de _ZN7seastar4file18dma_read_bulk_implEmmNS_8internal24maybe_priority_class_refEPNS_9io_intentE (scylla + 0x5a9a9de)
#8  0x0000000005cb48e4 _ZN7seastar21file_data_source_impl17issue_read_aheadsEj (scylla + 0x5ab48e4)
#9  0x0000000005cb3f5f _ZN7seastar21file_data_source_impl3getEv (scylla + 0x5ab3f5f)
#10 0x00000000022b6f4a _ZN7seastar12input_streamIcE12read_exactlyEm (scylla + 0x20b6f4a)
#11 0x000000000253d682 _ZN32compressed_file_data_source_implI11crc32_utilsLb0EL24compressed_checksum_mode1EE3getEv (scylla + 0x233d682)
#12 0x000000000249cca0 _ZZN7seastar12input_streamIcE7consumeISt17reference_wrapperIN8sstables2mx27data_consume_rows_context_mINS5_17mp_row_consumer_mEEEEQoo19InputStreamConsumerITL0__T_E27ObsoleteInputStreamConsumerISA_SB_EEENS_6futureIvEEOSB_ENUlvE_clEv (scylla + 0x229cca0)
#13 0x000000000249f3fb _ZN7seastar8internal8repeaterIZNS_12input_streamIcE7consumeISt17reference_wrapperIN8sstables2mx27data_consume_rows_context_mINS7_17mp_row_consumer_mEEEEQoo19InputStreamConsumerITL0__T_E27ObsoleteInputStreamConsumerISC_SD_EEENS_6futureIvEEOSD_EUlvE_E15run_and_disposeEv (scylla + 0x229f3fb)
#14 0x0000000005d18ae0 _ZN7seastar7reactor9run_tasksERNS0_10task_queueE (scylla + 0x5b18ae0)
#15 0x0000000005d1a05b _ZN7seastar7reactor14run_some_tasksEv (scylla + 0x5b1a05b)
#16 0x0000000005d1b238 _ZN7seastar7reactor6do_runEv (scylla + 0x5b1b238)
#17 0x0000000005d1a5e9 _ZN7seastar7reactor3runEv (scylla + 0x5b1a5e9)
#18 0x0000000005ca9cc4 _ZN7seastar12app_template14run_deprecatedEiPPcOSt8functionIFvvEE (scylla + 0x5aa9cc4)
#19 0x0000000005ca9024 _ZN7seastar12app_template3runEiPPcOSt8functionIFNS_6futureIiEEvEE (scylla + 0x5aa9024)
#20 0x00000000013deb26 _ZL11scylla_mainiPPc (scylla + 0x11deb26)
#21 0x00000000013e04e1 _ZNKSt8functionIFiiPPcEEclEiS1_ (scylla + 0x11e04e1)
#22 0x00000000013dcf44 main (scylla + 0x11dcf44)
#23 0x00007e7123a39088 __libc_start_call_main (libc.so.6 + 0x2a088)
#24 0x00007e7123a3914b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a14b)
#25 0x00000000013da5c5 _start (scylla + 0x11da5c5)
Stack trace of thread 4999:
#0  0x00007e7123b1ce4a read (libc.so.6 + 0x10de4a)
#1  0x0000000005d63cd5 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x5b63cd5)
#2  0x0000000005d63fe3 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1ERNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b63fe3)
#3  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#4  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#5  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4976:
#0  0x0000000005d3ee90 _ZNSt17_Function_handlerIFbvEZN7seastar7reactor6do_runEvE3$_6E10_M_managerERSt9_Any_dataRKS5_St18_Manager_operation (scylla + 0x5b3ee90)
#1  0x0000000005d5fba4 _ZN7seastar20noncopyable_functionIFbvEE17direct_vtable_forISt8functionIS1_EE7destroyEPNS_8internal25noncopyable_function_baseE (scylla + 0x5b5fba4)
#2  0x0000000005d1b36c _ZN7seastar7reactor6do_runEv (scylla + 0x5b1b36c)
#3  0x0000000005d3fd71 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b3fd71)
#4  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#5  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#6  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4977:
#0  0x0000000005d1b24c _ZN7seastar7reactor6do_runEv (scylla + 0x5b1b24c)
#1  0x0000000005d3fd71 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b3fd71)
#2  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#3  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#4  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4978:
#0  0x00000000023f098d _ZNSt8_Rb_treeIN7seastar13lw_shared_ptrIN8sstables7sstableEEES4_St9_IdentityIS4_ENS2_33sstable_first_key_less_comparatorESaIS4_EE8_M_eraseEPSt13_Rb_tree_nodeIS4_E (scylla + 0x21f098d)
#1  0x00000000023eff1e _ZN8sstables23partitioned_sstable_setD0Ev (scylla + 0x21eff1e)
#2  0x00000000023f036d _ZN8sstables20compound_sstable_setD0Ev (scylla + 0x21f036d)
#3  0x00000000023d49e3 _ZN8sstables11sstable_setD1Ev (scylla + 0x21d49e3)
#4  0x0000000001cdd6b9 _ZN7replica18tablet_sstable_setD0Ev (scylla + 0x1add6b9)
#5  0x00000000023d49e3 _ZN8sstables11sstable_setD1Ev (scylla + 0x21d49e3)
#6  0x0000000001c56418 _ZNSt17_Function_handlerIFSt8functionIF33partition_presence_checker_resultRKN3dht13decorated_keyEEEvEZZN7replica5table27sstables_as_snapshot_sourceEvENK3$_0clEvEUlvE_E10_M_managerERSt9_Any_dataRKSE_St18_Manager_operation (scylla + 0x1a56418)
#7  0x00000000020149c0 _ZN9row_cache9do_updateENS_16external_updaterESt8functionIFN7seastar6futureIvEEvEE (scylla + 0x1e149c0)
#8  0x0000000002013a7e _ZN9row_cache10invalidateENS_16external_updaterEOSt6vectorI8intervalIN3dht13ring_positionEESaIS5_EE (scylla + 0x1e13a7e)
#9  0x0000000001bb1941 _ZN7replica16compaction_group49update_main_sstable_list_on_compaction_completionEN8sstables26compaction_completion_descE (scylla + 0x19b1941)
#10 0x0000000001c1acbe _ZN7replica16compaction_group11table_state24on_compaction_completionEN8sstables26compaction_completion_descEN7seastar10bool_classINS2_15offstrategy_tagEEE (scylla + 0x1a1acbe)
#11 0x00000000026e2f6f _ZN10compaction30split_compaction_task_executor18do_rewrite_sstableEN7seastar13lw_shared_ptrIN8sstables7sstableEEE (scylla + 0x24e2f6f)
#12 0x00000000026e2672 _ZN10compaction30split_compaction_task_executor15rewrite_sstableEN7seastar13lw_shared_ptrIN8sstables7sstableEEE (scylla + 0x24e2672)
#13 0x0000000002711716 _ZN10compaction41rewrite_sstables_compaction_task_executor6do_runEv.resume (scylla + 0x2511716)
#14 0x0000000002690a3b _ZN7seastar8internal21coroutine_traits_baseISt8optionalIN8sstables16compaction_statsEEE12promise_type15run_and_disposeEv (scylla + 0x2490a3b)
#15 0x0000000005d18ae0 _ZN7seastar7reactor9run_tasksERNS0_10task_queueE (scylla + 0x5b18ae0)
#16 0x0000000005d1a05b _ZN7seastar7reactor14run_some_tasksEv (scylla + 0x5b1a05b)
#17 0x0000000005d1b238 _ZN7seastar7reactor6do_runEv (scylla + 0x5b1b238)
#18 0x0000000005d3fd71 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b3fd71)
#19 0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#20 0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#21 0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4981:
#0  0x0000000005d64b06 _ZN7seastar19aio_storage_context11submit_workEv (scylla + 0x5b64b06)
#1  0x0000000005d65f10 _ZN7seastar19reactor_backend_aio18kernel_submit_workEv (scylla + 0x5b65f10)
#2  0x0000000005d3eda9 _ZNSt17_Function_handlerIFbvEZN7seastar7reactor6do_runEvE3$_5E9_M_invokeERKSt9_Any_data (scylla + 0x5b3eda9)
#3  0x0000000005d1b275 _ZN7seastar7reactor6do_runEv (scylla + 0x5b1b275)
#4  0x0000000005d3fd71 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b3fd71)
#5  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#6  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#7  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4982:
#0  0x0000000005d1d410 _ZN7seastar17smp_message_queue19process_completionsEj (scylla + 0x5b1d410)
#1  0x0000000005d28055 _ZN7seastar3smp11poll_queuesEv (scylla + 0x5b28055)
#2  0x0000000005d4efcb _ZN7seastar7reactor10smp_pollfn4pollEv (scylla + 0x5b4efcb)
#3  0x0000000005d3eda9 _ZNSt17_Function_handlerIFbvEZN7seastar7reactor6do_runEvE3$_5E9_M_invokeERKSt9_Any_data (scylla + 0x5b3eda9)
#4  0x0000000005d1b275 _ZN7seastar7reactor6do_runEv (scylla + 0x5b1b275)
#5  0x0000000005d3fd71 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b3fd71)
#6  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#7  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#8  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4983:
#0  0x00000000060fc235 _ZN7seastar10fair_queue17dispatch_requestsESt8functionIFvRNS_16fair_queue_entryEEE (scylla + 0x5efc235)
#1  0x00000000060f6a54 _ZN7seastar8io_queue13poll_io_queueEv (scylla + 0x5ef6a54)
#2  0x0000000005d4f119 _ZN7seastar7reactor26io_queue_submission_pollfn4pollEv (scylla + 0x5b4f119)
#3  0x0000000005d3eda9 _ZNSt17_Function_handlerIFbvEZN7seastar7reactor6do_runEvE3$_5E9_M_invokeERKSt9_Any_data (scylla + 0x5b3eda9)
#4  0x0000000005d1b275 _ZN7seastar7reactor6do_runEv (scylla + 0x5b1b275)
#5  0x0000000005d3fd71 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b3fd71)
#6  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#7  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#8  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4986:
#0  0x0000000005d1b251 _ZN7seastar7reactor6do_runEv (scylla + 0x5b1b251)
#1  0x0000000005d3fd71 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b3fd71)
#2  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#3  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#4  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4987:
#0  0x0000000005d1d22a _ZN7seastar17smp_message_queue20flush_response_batchEv (scylla + 0x5b1d22a)
#1  0x0000000005d2809e _ZN7seastar3smp11poll_queuesEv (scylla + 0x5b2809e)
#2  0x0000000005d4efcb _ZN7seastar7reactor10smp_pollfn4pollEv (scylla + 0x5b4efcb)
#3  0x0000000005d3eda9 _ZNSt17_Function_handlerIFbvEZN7seastar7reactor6do_runEvE3$_5E9_M_invokeERKSt9_Any_data (scylla + 0x5b3eda9)
#4  0x0000000005d1b275 _ZN7seastar7reactor6do_runEv (scylla + 0x5b1b275)
#5  0x0000000005d3fd71 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b3fd71)
#6  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#7  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#8  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4985:
#0  0x0000000005d65f18 _ZN7seastar19reactor_backend_aio18kernel_submit_workEv (scylla + 0x5b65f18)
#1  0x0000000005d3eda9 _ZNSt17_Function_handlerIFbvEZN7seastar7reactor6do_runEvE3$_5E9_M_invokeERKSt9_Any_data (scylla + 0x5b3eda9)
#2  0x0000000005d1b275 _ZN7seastar7reactor6do_runEv (scylla + 0x5b1b275)
#3  0x0000000005d3fd71 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b3fd71)
#4  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#5  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#6  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4988:
#0  0x00007ffdc1af5b03 n/a (linux-vdso.so.1 + 0xb03)
#1  0x00007e7123af2afd clock_gettime@@GLIBC_2.17 (libc.so.6 + 0xe3afd)
#2  0x00007e7123cdac02 _ZNSt6chrono3_V212system_clock3nowEv (libstdc++.so.6 + 0xdac02)
#3  0x0000000005d1b259 _ZN7seastar7reactor6do_runEv (scylla + 0x5b1b259)
#4  0x0000000005d3fd71 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b3fd71)
#5  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#6  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#7  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4990:
#0  0x00007e7123b1ce4a read (libc.so.6 + 0x10de4a)
#1  0x0000000005d63cd5 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x5b63cd5)
#2  0x0000000005d63fe3 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1ERNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b63fe3)
#3  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#4  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#5  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4994:
#0  0x00007e7123b1ce4a read (libc.so.6 + 0x10de4a)
#1  0x0000000005d63cd5 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x5b63cd5)
#2  0x0000000005d63fe3 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1ERNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b63fe3)
#3  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#4  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#5  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 5002:
#0  0x00007e7123b1ce4a read (libc.so.6 + 0x10de4a)
#1  0x0000000005d63cd5 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x5b63cd5)
#2  0x0000000005d63fe3 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1ERNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b63fe3)
#3  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#4  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#5  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4993:
#0  0x00007e7123b1ce4a read (libc.so.6 + 0x10de4a)
#1  0x0000000005d63cd5 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x5b63cd5)
#2  0x0000000005d63fe3 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1ERNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b63fe3)
#3  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#4  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#5  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 5000:
#0  0x00007e7123b1ce4a read (libc.so.6 + 0x10de4a)
#1  0x0000000005d63cd5 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x5b63cd5)
#2  0x0000000005d63fe3 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1ERNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b63fe3)
#3  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#4  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#5  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4995:
#0  0x00007e7123b1ce4a read (libc.so.6 + 0x10de4a)
#1  0x0000000005d63cd5 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x5b63cd5)
#2  0x0000000005d63fe3 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1ERNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b63fe3)
#3  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#4  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#5  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4996:
#0  0x00007e7123b1ce4a read (libc.so.6 + 0x10de4a)
#1  0x0000000005d63cd5 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x5b63cd5)
#2  0x0000000005d63fe3 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1ERNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b63fe3)
#3  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#4  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#5  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4991:
#0  0x00007e7123b1ce4a read (libc.so.6 + 0x10de4a)
#1  0x0000000005d63cd5 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x5b63cd5)
#2  0x0000000005d63fe3 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1ERNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b63fe3)
#3  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#4  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#5  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 5001:
#0  0x00007e7123b1ce4a read (libc.so.6 + 0x10de4a)
#1  0x0000000005d63cd5 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x5b63cd5)
#2  0x0000000005d63fe3 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1ERNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b63fe3)
#3  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#4  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#5  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4984:
#0  0x0000000005d65e10 _ZN7seastar19reactor_backend_aio23reap_kernel_completionsEv (scylla + 0x5b65e10)
#1  0x0000000005d3eda9 _ZNSt17_Function_handlerIFbvEZN7seastar7reactor6do_runEvE3$_5E9_M_invokeERKSt9_Any_data (scylla + 0x5b3eda9)
#2  0x0000000005d1b275 _ZN7seastar7reactor6do_runEv (scylla + 0x5b1b275)
#3  0x0000000005d3fd71 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b3fd71)
#4  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#5  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#6  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4997:
#0  0x00007e7123b1ce4a read (libc.so.6 + 0x10de4a)
#1  0x0000000005d63cd5 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x5b63cd5)
#2  0x0000000005d63fe3 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1ERNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b63fe3)
#3  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#4  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#5  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4980:
#0  0x0000000002485dda _ZN8sstables2mx17mp_row_consumer_m17consume_row_startERKSt6vectorI23basic_fragmented_bufferIN7seastar16temporary_bufferIcEEESaIS7_EE (scylla + 0x2285dda)
#1  0x00000000024b858f _ZN8sstables2mx27data_consume_rows_context_mINS0_17mp_row_consumer_mEE16do_process_stateEv.resume (scylla + 0x22b858f)
#2  0x000000000249d5db _ZN13data_consumer24continuous_data_consumerIN8sstables2mx27data_consume_rows_context_mINS2_17mp_row_consumer_mEEEEclEN7seastar16temporary_bufferIcEE (scylla + 0x229d5db)
#3  0x000000000249cadf _ZZN7seastar12input_streamIcE7consumeISt17reference_wrapperIN8sstables2mx27data_consume_rows_context_mINS5_17mp_row_consumer_mEEEEQoo19InputStreamConsumerITL0__T_E27ObsoleteInputStreamConsumerISA_SB_EEENS_6futureIvEEOSB_ENUlvE_clEv (scylla + 0x229cadf)
#4  0x000000000249b128 _ZN7seastar8do_untilIZZN8sstables2mx26mx_sstable_mutation_reader11fill_bufferEvENKUlvE0_clEvEUlvE0_ZZNS3_11fill_bufferEvENKS4_clEvEUlvE_Qaasr3stdE16is_invocable_r_vIbT0_Esr3stdE16is_invocable_r_vINS_6futureIvEET_EEES9_S7_SA_ (scylla + 0x229b128)
#5  0x00000000024778be _ZN8sstables2mx26mx_sstable_mutation_reader11fill_bufferEv (scylla + 0x22778be)
#6  0x000000000247971d _ZThn80_N8sstables2mx26mx_sstable_mutation_reader11fill_bufferEv (scylla + 0x227971d)
#7  0x00000000021aadca _ZN12_GLOBAL__N_117compacting_reader11fill_bufferEv (scylla + 0x1faadca)
#8  0x000000000463a567 _ZN15mutation_writer11feed_writerINS_43token_group_based_splitting_mutation_writerEQ26MutationFragmentConsumerV2IT_N7seastar6futureIvEEEEES5_O15mutation_readerS2_.resume (scylla + 0x443a567)
#9  0x000000000143226b _ZN7seastar8internal21coroutine_traits_baseIvE12promise_type15run_and_disposeEv (scylla + 0x123226b)
#10 0x0000000005d18ae0 _ZN7seastar7reactor9run_tasksERNS0_10task_queueE (scylla + 0x5b18ae0)
#11 0x0000000005d1a05b _ZN7seastar7reactor14run_some_tasksEv (scylla + 0x5b1a05b)
#12 0x0000000005d1b238 _ZN7seastar7reactor6do_runEv (scylla + 0x5b1b238)
#13 0x0000000005d3fd71 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b3fd71)
#14 0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#15 0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#16 0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4998:
#0  0x00007e7123b1ce4a read (libc.so.6 + 0x10de4a)
#1  0x0000000005d63cd5 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x5b63cd5)
#2  0x0000000005d63fe3 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1ERNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b63fe3)
#3  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#4  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#5  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
Stack trace of thread 4992:
#0  0x00007e7123b2006b ftruncate (libc.so.6 + 0x11106b)
#1  0x0000000005c9c050 _ZN7seastar20noncopyable_functionIFNS_14syscall_resultIiEEvEE17direct_vtable_forIZNS_15posix_file_impl8truncateEmE3$_1E4callEPKS4_ (scylla + 0x5a9c050)
#2  0x0000000005ca565e _ZN7seastar18syscall_work_queue19work_item_returningINS_14syscall_resultIiEEE7processEv (scylla + 0x5aa565e)
#3  0x0000000005d63e1a _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x5b63e1a)
#4  0x0000000005d63fe3 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1ERNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x5b63fe3)
#5  0x0000000005cdacbb _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x5adacbb)
#6  0x00007e7123aa66d7 start_thread (libc.so.6 + 0x976d7)
#7  0x00007e7123b2a60c __clone3 (libc.so.6 + 0x11b60c)
ELF object binary architecture: AMD x86-64
Info about modules can be found in SCT logs by search for 'Coredump Modules info'
download_instructions:
gsutil cp gs://upload.scylladb.com/core.scylla.106.190b79ecb166414ebb252274f566e9b5.4975.1733130336000000./core.scylla.106.190b79ecb166414ebb252274f566e9b5.4975.1733130336000000.zst .
unzstd core.scylla.106.190b79ecb166414ebb252274f566e9b5.4975.1733130336000000.zst

and a commit log error:

2024-12-02 09:13:59.233: (CommitLogCheckErrorEvent Severity.ERROR) period_type=one-time event_id=d2d1ae0a-1c91-4963-ab13-af751ffbb73e: message=commit log directory exceed the limit longer that expected. Prometheus response: [{'metric': {'__name__': 'scylla_commitlog_disk_total_bytes', 'cluster': 'my-cluster', 'dc': 'us-east-1', 'instance': '10.12.7.229', 'job': 'scylla', 'shard': '0'}, 'values': [[1733130626.532, '8522825728'], [1733130627.532, '8522825728'], [1733130628.532, '8522825728'], [1733130629.532, '8522825728'], [1733130630.532, '8522825728'], [1733130631.532, '8522825728'], [1733130632.532, '8522825728'], [1733130633.532, '8522825728'], [1733130634.532, '8522825728'], [1733130635.532, '8522825728'], [1733130636.532, '8522825728'], [1733130637.532, '8522825728'], [1733130638.532, '8522825728'], [1733130639.532, '8522825728'], [1733130640.532, '8522825728'], [1733130641.532, '8522825728'], [1733130642.532, '8522825728'], [1733130643.532, '8522825728'], [1733130644.532, '8522825728'], [1733130645.532, '8522825728'], [1733130646.532, '8522825728'], [1733130647.532, '8522825728'], [1733130648.532, '8522825728'], [1733130649.532, '8522825728'], [1733130650.532, '8522825728'], [1733130651.532, '8522825728'], [1733130652.532, '8522825728'], [1733130653.532, '8522825728'], [1733130654.532, '8522825728'], [1733130655.532, '8522825728'], [1733130656.532, '8522825728'], [1733130657.532, '8522825728'], [1733130658.532, '8522825728'], [1733130659.532, '8522825728'], [1733130660.532, '8522825728'], [1733130661.532, '8522825728'], [1733130662.532, '8522825728'], [1733130663.532, '8522825728'], [1733130664.532, '8522825728'], [1733130665.532, '8522825728'], [1733130666.532, '8522825728'], [1733130667.532, '8522825728'], [1733130668.532, '8522825728'], [1733130669.532, '8522825728'], [1733130670.532, '8522825728'], [1733130671.532, '8522825728'], [1733130672.532, '8522825728'], [1733130673.532, '8522825728'], [1733130674.532, '8522825728'], [1733130675.532, '8522825728'], 
[1733130676.532, '8522825728'], [1733130677.532, '8522825728'], [1733130678.532, '8522825728'], [1733130679.532, '8522825728'], [1733130680.532, '8522825728'], [1733130681.532, '8522825728'], [1733130682.532, '8522825728'], [1733130683.532, '8522825728'], [1733130684.532, '8522825728'], [1733130685.532, '8522825728'], [1733130686.532, '8522825728'], [1733130687.532, '8522825728'], [1733130688.532, '8522825728'], [1733130689.532, '8522825728'], [1733130690.532, '8522825728'], [1733130691.532, '8522825728'], [1733130692.532, '8522825728'], [1733130693.532, '8522825728'], [1733130694.532, '8522825728'], [1733130695.532, '8522825728'], [1733130696.532, '8522825728'], [1733130697.532, '8522825728'], [1733130698.532, '8522825728'], [1733130699.532, '8522825728'], [1733130700.532, '8522825728'], [1733130701.532, '8522825728'], [1733130702.532, '8522825728'], [1733130703.532, '8522825728'], [1733130704.532, '8522825728'], [1733130705.532, '8522825728'], [1733130706.532, '8522825728'], [1733130707.532, '8522825728'], [1733130708.532, '8522825728'], [1733130709.532, '8522825728'], [1733130710.532, '8522825728'], [1733130711.532, '8522825728'], [1733130712.532, '8522825728'], [1733130713.532, '8522825728'], [1733130714.532, '8522825728'], [1733130715.532, '8522825728'], [1733130716.532, '8522825728'], [1733130717.532, '8522825728'], [1733130718.532, '8522825728'], [1733130719.532, '8522825728'], [1733130720.532, '8522825728'], [1733130721.532, '8522825728'], [1733130722.532, '8522825728'], [1733130723.532, '8522825728'], [1733130724.532, '8522825728'], [1733130725.532, '8522825728'], [1733130726.532, '8522825728'], [1733130727.532, '8522825728'], [1733130728.532, '8522825728'], [1733130729.532, '8522825728'], [1733130730.532, '8522825728'], [1733130731.532, '8522825728'], [1733130732.532, '8522825728'], [1733130733.532, '8522825728'], [1733130734.532, '8522825728'], [1733130735.532, '8522825728'], [1733130736.532, '8522825728'], [1733130737.532, '8522825728'], 
[1733130738.532, '8522825728'], [1733130739.532, '8522825728'], [1733130740.532, '8522825728'], [1733130741.532, '8522825728'], [1733130742.532, '8522825728'], [1733130743.532, '8522825728'], [1733130744.532, '8522825728'], [1733130745.532, '8522825728'], [1733130746.532, '8522825728'], [1733130747.532, '8522825728'], [1733130748.532, '8522825728'], [1733130749.532, '8522825728'], [1733130750.532, '8522825728'], [1733130751.532, '8522825728'], [1733130752.532, '8522825728'], [1733130753.532, '8522825728'], [1733130754.532, '8522825728'], [1733130755.532, '8522825728'], [1733130756.532, '8522825728'], [1733130757.532, '8522825728'], [1733130758.532, '8522825728'], [1733130759.532, '8522825728'], [1733130760.532, '8522825728'], [1733130761.532, '8522825728'], [1733130762.532, '8522825728'], [1733130763.532, '8522825728'], [1733130764.532, '8522825728'], [1733130765.532, '8522825728'], [1733130766.532, '8522825728'], [1733130767.532, '8522825728'], [1733130768.532, '8522825728'], [1733130769.532, '8522825728'], [1733130770.532, '8522825728'], [1733130771.532, '8522825728'], [1733130772.532, '8522825728'], [1733130773.532, '8522825728'], [1733130774.532, '8522825728'], [1733130775.532, '8522825728'], [1733130776.532, '8522825728'], [1733130777.532, '8522825728'], [1733130778.532, '8522825728'], [1733130779.532, '8522825728'], [1733130780.532, '8522825728'], [1733130781.532, '8522825728'], [1733130782.532, '8522825728'], [1733130783.532, '8522825728'], [1733130784.532, '8522825728'], [1733130785.532, '8522825728'], [1733130786.532, '8522825728'], [1733130787.532, '8522825728'], [1733130788.532, '8522825728'], [1733130789.532, '8522825728'], [1733130790.532, '8522825728'], [1733130791.532, '8522825728'], [1733130792.532, '8522825728'], [1733130793.532, '8522825728'], [1733130794.532, '8522825728'], [1733130795.532, '8522825728'], [1733130796.532, '8522825728'], [1733130797.532, '8522825728'], [1733130798.532, '8522825728'], [1733130799.532, '8522825728'], 
[1733130800.532, '8522825728'], [1733130801.532, '8522825728'], [1733130802.532, '8522825728'], [1733130803.532, '8522825728'], [1733130804.532, '8522825728'], [1733130805.532, '8522825728'], [1733130806.532, '8522825728'], [1733130807.532, '8522825728'], [1733130808.532, '8522825728'], [1733130809.532, '8522825728'], [1733130810.532, '8522825728'], [1733130811.532, '8522825728'], [1733130812.532, '8522825728'], [1733130813.532, '8522825728'], [1733130814.532, '8522825728'], [1733130815.532, '8522825728'], [1733130816.532, '8522825728'], [1733130817.532, '8522825728'], [1733130818.532, '8522825728'], [1733130819.532, '8522825728'], [1733130820.532, '8522825728'], [1733130821.532, '8522825728'], [1733130822.532, '8522825728'], [1733130823.532, '8522825728'], [1733130824.532, '8522825728'], [1733130825.532, '8522825728'], [1733130826.532, '8522825728'], [1733130827.532, '8522825728'], [1733130828.532, '8522825728'], [1733130829.532, '8522825728'], [1733130830.532, '8522825728'], [1733130831.532, '8522825728'], [1733130832.532, '8522825728'], [1733130833.532, '8522825728'], [1733130834.532, '8522825728'], [1733130835.532, '8522825728'], [1733130836.532, '8522825728'], [1733130837.532, '8522825728'], [1733130838.532, '8522825728'], [1733130839.532, '8522825728'], [1733130840.532, '8522825728'], [1733130841.532, '8522825728'], [1733130842.532, '8522825728'], [1733130843.532, '8522825728'], [1733130844.532, '8522825728'], [1733130845.532, '8522825728'], [1733130846.532, '8522825728'], [1733130847.532, '8522825728'], [1733130848.532, '8522825728'], [1733130849.532, '8522825728'], [1733130850.532, '8522825728'], [1733130851.532, '8522825728'], [1733130852.532, '8522825728'], [1733130853.532, '8522825728'], [1733130854.532, '8522825728'], [1733130855.532, '8522825728'], [1733130856.532, '8522825728'], [1733130857.532, '8522825728'], [1733130858.532, '8522825728'], [1733130859.532, '8522825728'], [1733130860.532, '8522825728'], [1733130861.532, '8522825728'], 
[1733130862.532, '8522825728'], [1733130863.532, '8522825728'], [1733130864.532, '8522825728'], [1733130865.532, '8522825728'], [1733130866.532, '8522825728'], [1733130867.532, '8522825728'], [1733130868.532, '8522825728'], [1733130869.532, '8522825728'], [1733130870.532, '8522825728'], [1733130871.532, '8522825728'], [1733130872.532, '8522825728'], [1733130873.532, '8522825728'], [1733130874.532, '8522825728'], [1733130875.532, '8522825728'], [1733130876.532, '8522825728'], [1733130877.532, '8522825728'], [1733130878.532, '8522825728'], [1733130879.532, '8522825728'], [1733130880.532, '8522825728'], [1733130881.532, '8522825728'], [1733130882.532, '8522825728'], [1733130883.532, '8522825728'], [1733130884.532, '8522825728'], [1733130885.532, '8522825728'], [1733130886.532, '8522825728'], [1733130887.532, '8522825728'], [1733130888.532, '8522825728'], [1733130889.532, '8522825728'], [1733130890.532, '8522825728'], [1733130891.532, '8522825728'], [1733130892.532, '8522825728'], [1733130893.532, '8522825728']]}]
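
For reading a response like the one above at a glance, here is a minimal sketch that condenses each series into its sample count, time span, and min/max value. The helper name `summarize_range_result` is hypothetical and not part of SCT. It assumes the usual Prometheus range-query layout seen in the message (a list of series, each with a `metric` dict and a non-empty `values` list of `[timestamp, "value"]` pairs).

```python
def summarize_range_result(result):
    """Summarize each series of a Prometheus range-query result.

    Hypothetical helper, not part of SCT. Assumes every series has at
    least one sample and that values are integer byte counts.
    """
    summaries = []
    for series in result:
        # Each sample is [unix_timestamp, "value-as-string"].
        samples = [(float(ts), int(value)) for ts, value in series["values"]]
        values = [value for _, value in samples]
        summaries.append({
            "instance": series["metric"].get("instance"),
            "shard": series["metric"].get("shard"),
            "samples": len(samples),
            "span_seconds": samples[-1][0] - samples[0][0],
            "min_bytes": min(values),
            "max_bytes": max(values),
            "max_gib": max(values) / 2**30,
        })
    return summaries


# First three samples of the response quoted above (the rest are omitted here).
response = [{
    "metric": {
        "__name__": "scylla_commitlog_disk_total_bytes",
        "instance": "10.12.7.229",
        "shard": "0",
    },
    "values": [
        [1733130626.532, "8522825728"],
        [1733130627.532, "8522825728"],
        [1733130628.532, "8522825728"],
    ],
}]

for summary in summarize_range_result(response):
    print(summary)
```

Applied to the full response, this should show a single series for shard 0 of 10.12.7.229 that stays flat at 8522825728 bytes (7.9375 GiB) between the first timestamp (1733130626.532) and the last (1733130893.532), a span of 267 seconds.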

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Log links for testrun with test id 9aefb63e-6435-4add-8768-86f467343257 |
+-----------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Date | Log type | Link |
+-----------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 20241202_091415 | grafana | https://cloudius-jenkins-test.s3.amazonaws.com/9aefb63e-6435-4add-8768-86f467343257/20241202_091415/grafana-screenshot-longevity-10gb-3h-test-yg-scylla-per-server-metrics-nemesis-20241202_091445-tablets-split-merge-tablets--monitor-node-9aefb63e-1.png |
| 20241202_123441 | db-cluster | https://cloudius-jenkins-test.s3.amazonaws.com/9aefb63e-6435-4add-8768-86f467343257/20241202_123441/db-cluster-9aefb63e.tar.gz |
| 20241202_123441 | loader-set | https://cloudius-jenkins-test.s3.amazonaws.com/9aefb63e-6435-4add-8768-86f467343257/20241202_123441/loader-set-9aefb63e.tar.gz |
| 20241202_123441 | monitor-set | https://cloudius-jenkins-test.s3.amazonaws.com/9aefb63e-6435-4add-8768-86f467343257/20241202_123441/monitor-set-9aefb63e.tar.gz |
| 20241202_123441 | sct | https://cloudius-jenkins-test.s3.amazonaws.com/9aefb63e-6435-4add-8768-86f467343257/20241202_123441/sct-9aefb63e.log.tar.gz |
| 20241202_123441 | event | https://cloudius-jenkins-test.s3.amazonaws.com/9aefb63e-6435-4add-8768-86f467343257/20241202_123441/sct-runner-events-9aefb63e.tar.gz |
+-----------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

@raphaelsc
Member

raphaelsc commented Dec 2, 2024

token_group_based_splitting_mutation_writer

2024-12-02T08:36:34.257+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 512 to 1024 tablets
2024-12-02T08:42:33.503+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] load_balancer - Emitting resize decision of type split for table 5ee787b0-b00d-11ef-badc-4abe9bfd9e7c due to avg tablet size of 559228801
2024-12-02T09:04:33.750+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] load_balancer - Revoking resize decision for table 5ee787b0-b00d-11ef-badc-4abe9bfd9e7c due to avg tablet size of 251797252
2024-12-02T09:05:33.438+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] load_balancer - Emitting resize decision of type merge for table 5ee787b0-b00d-11ef-badc-4abe9bfd9e7c due to avg tablet size of 120877815
2024-12-02T09:05:36.723+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] table - Detected tablet merge for table scylla_bench.test, decreasing from 1024 to 512 tablets

It's an issue with merge: the split is revoked, a merge is emitted, and the split compaction fails. It's not the end of the world (it doesn't affect the correctness of the system, and in practice we won't abort), since the split can be retried later, but we should fix it.
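
To line up the resize decisions with the applied tablet-count changes, here is a minimal sketch that pulls both kinds of message out of a node log and prints them as one timeline. The function name `resize_timeline` and the regular expressions are assumptions modelled only on the five log lines quoted in this comment; other Scylla versions may word these messages differently.

```python
import re
from datetime import datetime

# Patterns modelled on the log lines quoted above (an assumption, not a
# stable log format guaranteed by Scylla).
TS = r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}\+\d{2}:\d{2})"
DETECTED = re.compile(
    TS + r"\s.*?table - Detected tablet (?P<kind>split|merge) for table "
    r"(?P<table>\S+), (?:increasing|decreasing) from (?P<old>\d+) to (?P<new>\d+) tablets"
)
DECISION = re.compile(
    TS + r"\s.*?load_balancer - (?:Emitting resize decision of type "
    r"(?P<kind>split|merge)|(?P<revoke>Revoking resize decision)) for table "
    r"(?P<table>\S+) due to avg tablet size of (?P<size>\d+)"
)


def resize_timeline(lines):
    """Return sorted (timestamp, action, detail) tuples for resize events."""
    events = []
    for line in lines:
        m = DETECTED.search(line)
        if m:
            detail = f"{m['old']} -> {m['new']} tablets"
            events.append((datetime.fromisoformat(m["ts"]), f"{m['kind']} applied", detail))
            continue
        m = DECISION.search(line)
        if m:
            action = "decision revoked" if m["revoke"] else f"{m['kind']} decision emitted"
            detail = f"avg tablet size {m['size']}"
            events.append((datetime.fromisoformat(m["ts"]), action, detail))
    return sorted(events)


# The five lines quoted above, with the node/shard prefix shortened.
LOG = [
    "2024-12-02T08:36:34.257+00:00 node-1 !INFO | scylla[4975]: [shard 0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 512 to 1024 tablets",
    "2024-12-02T08:42:33.503+00:00 node-1 !INFO | scylla[4975]: [shard 0: gms] load_balancer - Emitting resize decision of type split for table 5ee787b0-b00d-11ef-badc-4abe9bfd9e7c due to avg tablet size of 559228801",
    "2024-12-02T09:04:33.750+00:00 node-1 !INFO | scylla[4975]: [shard 0: gms] load_balancer - Revoking resize decision for table 5ee787b0-b00d-11ef-badc-4abe9bfd9e7c due to avg tablet size of 251797252",
    "2024-12-02T09:05:33.438+00:00 node-1 !INFO | scylla[4975]: [shard 0: gms] load_balancer - Emitting resize decision of type merge for table 5ee787b0-b00d-11ef-badc-4abe9bfd9e7c due to avg tablet size of 120877815",
    "2024-12-02T09:05:36.723+00:00 node-1 !INFO | scylla[4975]: [shard 0: gms] table - Detected tablet merge for table scylla_bench.test, decreasing from 1024 to 512 tablets",
]

for ts, action, detail in resize_timeline(LOG):
    print(ts.isoformat(), action, detail)
```

On these lines the timeline reads: split applied (512 to 1024), split decision emitted, decision revoked, merge decision emitted, merge applied (1024 to 512), which is the sequence described above.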

@yarongilor
Contributor Author

yarongilor commented Dec 3, 2024

table - Detected tablet merge for table

@raphaelsc, can you please explain why the first (and only) merge happened only after 4 full cycles, and only after 15 hours? Isn't it expected to happen in some correlation with splits?

< t:2024-12-01 17:55:42,773 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:INFO  > 2024-12-01 17:55:42.772: (InfoEvent Severity.NORMAL) period_type=not-set event_id=97662a34-0084-44c3-9f2b-bca4f2d690f1: message=Cycle 0, Initial tablets number before stress is: 64
< t:2024-12-01 18:39:15,843 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.7.17>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-01 18:39:39,370 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.7.229>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-01 18:40:01,400 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.4.78>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-01 18:40:32,243 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-01T18:40:32.187+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 64 to 128 tablets
< t:2024-12-01 18:40:32,333 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-01T18:40:32.286+00:00 tablets-split-merge-tablets--db-node-9aefb63e-2     !INFO | scylla[4976]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 64 to 128 tablets
< t:2024-12-01 18:40:32,433 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-01T18:40:32.363+00:00 tablets-split-merge-tablets--db-node-9aefb63e-3     !INFO | scylla[4987]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 64 to 128 tablets
< t:2024-12-01 18:43:32,242 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-01T18:43:32.194+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 128 to 256 tablets
< t:2024-12-01 18:43:32,363 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-01T18:43:32.286+00:00 tablets-split-merge-tablets--db-node-9aefb63e-2     !INFO | scylla[4976]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 128 to 256 tablets
< t:2024-12-01 18:43:32,515 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-01T18:43:32.437+00:00 tablets-split-merge-tablets--db-node-9aefb63e-3     !INFO | scylla[4987]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 128 to 256 tablets
< t:2024-12-01 22:41:26,476 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:INFO  > 2024-12-01 22:41:26.475: (InfoEvent Severity.NORMAL) period_type=not-set event_id=4094bf86-d94a-4b2e-908d-d88a91da47b3: message=Cycle 1, Initial tablets number before stress is: 256
< t:2024-12-01 23:21:10,334 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.7.229>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-01 23:21:28,413 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.4.78>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-01 23:21:35,865 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.7.17>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-01 23:22:32,672 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-01T23:22:32.635+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 256 to 512 tablets
< t:2024-12-01 23:22:32,840 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-01T23:22:32.785+00:00 tablets-split-merge-tablets--db-node-9aefb63e-2     !INFO | scylla[4976]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 256 to 512 tablets
< t:2024-12-01 23:22:32,963 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-01T23:22:32.937+00:00 tablets-split-merge-tablets--db-node-9aefb63e-3     !INFO | scylla[4987]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 256 to 512 tablets
< t:2024-12-02 03:36:57,345 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:INFO  > 2024-12-02 03:36:57.344: (InfoEvent Severity.NORMAL) period_type=not-set event_id=a0792736-1627-40ad-bf0b-518fb2b670a0: message=Cycle 2, Initial tablets number before stress is: 512
< t:2024-12-02 04:14:43,622 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.4.78>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-02 04:15:00,670 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.7.229>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-02 04:15:22,166 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.7.17>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-02 08:30:43,884 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:INFO  > 2024-12-02 08:30:43.882: (InfoEvent Severity.NORMAL) period_type=not-set event_id=beaff9a8-c663-44ad-8968-ccb4a0dd24d6: message=Cycle 3, Initial tablets number before stress is: 512
< t:2024-12-02 08:36:34,312 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-02T08:36:34.257+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 512 to 1024 tablets
< t:2024-12-02 08:36:34,520 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-02T08:36:34.438+00:00 tablets-split-merge-tablets--db-node-9aefb63e-3     !INFO | scylla[4987]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 512 to 1024 tablets
< t:2024-12-02 08:36:34,559 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-02T08:36:34.536+00:00 tablets-split-merge-tablets--db-node-9aefb63e-2     !INFO | scylla[4976]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 512 to 1024 tablets
< t:2024-12-02 09:05:36,778 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-02T09:05:36.723+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] table - Detected tablet merge for table scylla_bench.test, decreasing from 1024 to 512 tablets
< t:2024-12-02 09:05:36,838 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.4.78>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-02 09:05:36,839 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.7.17>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-02 09:05:36,868 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-02T09:05:36.786+00:00 tablets-split-merge-tablets--db-node-9aefb63e-2     !INFO | scylla[4976]:  [shard  0: gms] table - Detected tablet merge for table scylla_bench.test, decreasing from 1024 to 512 tablets
< t:2024-12-02 09:05:37,031 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-02T09:05:36.938+00:00 tablets-split-merge-tablets--db-node-9aefb63e-3     !INFO | scylla[4987]:  [shard  0: gms] table - Detected tablet merge for table scylla_bench.test, decreasing from 1024 to 512 tablets
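
Since every node reports the same resize, the log above is easier to read once collapsed into one entry per tablet-count change. A minimal sketch, assuming the `table - Detected tablet split/merge` wording shown above and that the reports for one resize arrive back to back; the name `tablet_count_timeline` is hypothetical:

```python
import re

# Matches "table - Detected tablet <split|merge> for table <ks.table>,
# <increasing|decreasing> from <old> to <new> tablets" (wording assumed from the log above).
RESIZE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\S+) .*?table - Detected tablet (?P<kind>split|merge)"
    r" for table (?P<table>[^,\s]+), (?:increasing|decreasing)"
    r" from (?P<old>\d+) to (?P<new>\d+) tablets"
)


def tablet_count_timeline(lines):
    """Return one (timestamp, kind, table, old_count, new_count) tuple per resize."""
    timeline = []
    for line in lines:
        m = RESIZE_RE.search(line)
        if not m:
            continue
        event = (m["kind"], m["table"], int(m["old"]), int(m["new"]))
        # Each node logs the same transition; keep only the first report of it.
        if not timeline or timeline[-1][1:] != event:
            timeline.append((m["ts"], *event))
    return timeline
```

It accepts both raw node log lines and the SCT-wrapped `db_log_reader` lines, because it keys on the ISO timestamp that precedes the node name.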

@yarongilor yarongilor force-pushed the tablets_split_merge branch 2 times, most recently from d9fb905 to 714b835 Compare December 3, 2024 13:19
@raphaelsc
Member

table - Detected tablet merge for table

@raphaelsc, can you please explain why the first (and only) merge happened only after 4 full cycles, and only after 15 hours? Isn't it expected to happen in some correlation with splits?

< t:2024-12-01 17:55:42,773 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:INFO  > 2024-12-01 17:55:42.772: (InfoEvent Severity.NORMAL) period_type=not-set event_id=97662a34-0084-44c3-9f2b-bca4f2d690f1: message=Cycle 0, Initial tablets number before stress is: 64
< t:2024-12-01 18:39:15,843 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.7.17>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-01 18:39:39,370 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.7.229>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-01 18:40:01,400 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.4.78>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-01 18:40:32,243 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-01T18:40:32.187+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 64 to 128 tablets
< t:2024-12-01 18:40:32,333 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-01T18:40:32.286+00:00 tablets-split-merge-tablets--db-node-9aefb63e-2     !INFO | scylla[4976]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 64 to 128 tablets
< t:2024-12-01 18:40:32,433 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-01T18:40:32.363+00:00 tablets-split-merge-tablets--db-node-9aefb63e-3     !INFO | scylla[4987]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 64 to 128 tablets
< t:2024-12-01 18:43:32,242 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-01T18:43:32.194+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 128 to 256 tablets
< t:2024-12-01 18:43:32,363 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-01T18:43:32.286+00:00 tablets-split-merge-tablets--db-node-9aefb63e-2     !INFO | scylla[4976]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 128 to 256 tablets
< t:2024-12-01 18:43:32,515 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-01T18:43:32.437+00:00 tablets-split-merge-tablets--db-node-9aefb63e-3     !INFO | scylla[4987]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 128 to 256 tablets
< t:2024-12-01 22:41:26,476 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:INFO  > 2024-12-01 22:41:26.475: (InfoEvent Severity.NORMAL) period_type=not-set event_id=4094bf86-d94a-4b2e-908d-d88a91da47b3: message=Cycle 1, Initial tablets number before stress is: 256
< t:2024-12-01 23:21:10,334 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.7.229>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-01 23:21:28,413 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.4.78>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-01 23:21:35,865 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.7.17>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-01 23:22:32,672 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-01T23:22:32.635+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 256 to 512 tablets
< t:2024-12-01 23:22:32,840 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-01T23:22:32.785+00:00 tablets-split-merge-tablets--db-node-9aefb63e-2     !INFO | scylla[4976]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 256 to 512 tablets
< t:2024-12-01 23:22:32,963 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-01T23:22:32.937+00:00 tablets-split-merge-tablets--db-node-9aefb63e-3     !INFO | scylla[4987]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 256 to 512 tablets
< t:2024-12-02 03:36:57,345 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:INFO  > 2024-12-02 03:36:57.344: (InfoEvent Severity.NORMAL) period_type=not-set event_id=a0792736-1627-40ad-bf0b-518fb2b670a0: message=Cycle 2, Initial tablets number before stress is: 512
< t:2024-12-02 04:14:43,622 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.4.78>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-02 04:15:00,670 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.7.229>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-02 04:15:22,166 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.7.17>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-02 08:30:43,884 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:INFO  > 2024-12-02 08:30:43.882: (InfoEvent Severity.NORMAL) period_type=not-set event_id=beaff9a8-c663-44ad-8968-ccb4a0dd24d6: message=Cycle 3, Initial tablets number before stress is: 512
< t:2024-12-02 08:36:34,312 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-02T08:36:34.257+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 512 to 1024 tablets
< t:2024-12-02 08:36:34,520 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-02T08:36:34.438+00:00 tablets-split-merge-tablets--db-node-9aefb63e-3     !INFO | scylla[4987]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 512 to 1024 tablets
< t:2024-12-02 08:36:34,559 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-02T08:36:34.536+00:00 tablets-split-merge-tablets--db-node-9aefb63e-2     !INFO | scylla[4976]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 512 to 1024 tablets
< t:2024-12-02 09:05:36,778 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-02T09:05:36.723+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] table - Detected tablet merge for table scylla_bench.test, decreasing from 1024 to 512 tablets
< t:2024-12-02 09:05:36,838 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.4.78>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-02 09:05:36,839 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.7.17>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-02 09:05:36,868 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-02T09:05:36.786+00:00 tablets-split-merge-tablets--db-node-9aefb63e-2     !INFO | scylla[4976]:  [shard  0: gms] table - Detected tablet merge for table scylla_bench.test, decreasing from 1024 to 512 tablets
< t:2024-12-02 09:05:37,031 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-02T09:05:36.938+00:00 tablets-split-merge-tablets--db-node-9aefb63e-3     !INFO | scylla[4987]:  [shard  0: gms] table - Detected tablet merge for table scylla_bench.test, decreasing from 1024 to 512 tablets

@yarongilor

2024-12-01T22:41:32.753+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1 !INFO | scylla[4975]: [shard 0: gms] load_balancer - Table 5ee787b0-b00d-11ef-badc-4abe9bfd9e7c with tablet_count=256 has an average tablet size of 487239504

The average size hadn't reached the threshold right before we started cycle 1. That's why I asked you to dump the sstable files (and their sizes) before and after major compaction for me, so we can try to deduce why the workload is not triggering merge.
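
One way to capture such a dump is to snapshot the data-file sizes of the table directory before and after the major compaction. A minimal sketch, assuming the SSTable data components are the files whose names end in `-Data.db` and that the caller passes the table's data directory on the node; `sstable_data_sizes` and `size_diff` are hypothetical helper names, not SCT functions:

```python
from pathlib import Path


def sstable_data_sizes(table_dir):
    """Return {file name: size in bytes} for each *-Data.db file directly in table_dir."""
    return {
        path.name: path.stat().st_size
        for path in sorted(Path(table_dir).glob("*-Data.db"))
    }


def size_diff(before, after):
    """Summarise two snapshots: file counts and total bytes before and after."""
    return {
        "files_before": len(before),
        "files_after": len(after),
        "bytes_before": sum(before.values()),
        "bytes_after": sum(after.values()),
    }
```

Calling `sstable_data_sizes` once before `nodetool compact` and once after, then passing both results to `size_diff`, shows how much data the major compaction actually dropped.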

@yarongilor
Contributor Author

token_group_based_splitting_mutation_writer

2024-12-02T08:36:34.257+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] table - Detected tablet split for table scylla_bench.test, increasing from 512 to 1024 tablets
2024-12-02T08:42:33.503+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] load_balancer - Emitting resize decision of type split for table 5ee787b0-b00d-11ef-badc-4abe9bfd9e7c due to avg tablet size of 559228801
2024-12-02T09:04:33.750+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] load_balancer - Revoking resize decision for table 5ee787b0-b00d-11ef-badc-4abe9bfd9e7c due to avg tablet size of 251797252
2024-12-02T09:05:33.438+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] load_balancer - Emitting resize decision of type merge for table 5ee787b0-b00d-11ef-badc-4abe9bfd9e7c due to avg tablet size of 120877815
2024-12-02T09:05:36.723+00:00 tablets-split-merge-tablets--db-node-9aefb63e-1     !INFO | scylla[4975]:  [shard  0: gms] table - Detected tablet merge for table scylla_bench.test, decreasing from 1024 to 512 tablets

It's an issue with merge: the split is revoked, a merge is emitted, and the split compaction fails. It's not the end of the world (it doesn't affect the correctness of the system, and in practice we won't abort), since the split can be retried later, but we should fix it.

@raphaelsc, it looks like the core dump during major compaction happens only following a tablet merge:

< t:2024-12-04 01:25:09,078 f:remote_base.py  l:560  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.4.76>: Running command "/usr/bin/nodetool  compact scylla_bench test"...
< t:2024-12-04 01:25:09,083 f:remote_base.py  l:560  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.4.71>: Running command "/usr/bin/nodetool  compact scylla_bench test"...
< t:2024-12-04 01:25:09,084 f:remote_base.py  l:560  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.5.66>: Running command "/usr/bin/nodetool  compact scylla_bench test"...
< t:2024-12-04 01:34:20,238 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.4.71>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-04 01:34:20,238 f:cluster.py      l:2678 c:sdcm.cluster_aws     p:DEBUG > Node tablets-split-merge-tablets--db-node-87728088-2 [54.173.183.122 | 10.12.4.71]: Command '/usr/bin/nodetool  compact scylla_bench test' duration -> 551.1543888679989 s
< t:2024-12-04 01:35:16,801 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.12.4.76>: Command "/usr/bin/nodetool  compact scylla_bench test" finished with status 0
< t:2024-12-04 01:35:16,801 f:cluster.py      l:2678 c:sdcm.cluster_aws     p:DEBUG > Node tablets-split-merge-tablets--db-node-87728088-1 [18.233.0.248 | 10.12.4.76]: Command '/usr/bin/nodetool  compact scylla_bench test' duration -> 607.7231507879987 s
< t:2024-12-04 01:36:21,142 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-04T01:36:21.127+00:00 tablets-split-merge-tablets--db-node-87728088-1     !INFO | scylla[5132]:  [shard  0: gms] table - Detected tablet merge for table scylla_bench.test, decreasing from 1024 to 512 tablets
< t:2024-12-04 01:36:21,351 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-04T01:36:21.269+00:00 tablets-split-merge-tablets--db-node-87728088-3     !INFO | scylla[4995]:  [shard  0: gms] table - Detected tablet merge for table scylla_bench.test, decreasing from 1024 to 512 tablets
< t:2024-12-04 01:36:21,464 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-04T01:36:21.320+00:00 tablets-split-merge-tablets--db-node-87728088-2     !INFO | scylla[4986]:  [shard  0: gms] table - Detected tablet merge for table scylla_bench.test, decreasing from 1024 to 512 tablets
< t:2024-12-04 01:43:17,884 f:base.py         l:147  c:RemoteLibSSH2CmdRunner p:ERROR > <10.12.5.66>: Error executing command: "/usr/bin/nodetool  compact scylla_bench test"; Exit status: 2
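
To check this across runs, the per-node result of `nodetool compact` can be extracted from the SCT log. A minimal sketch, assuming the `RemoteLibSSH2CmdRunner` message formats are exactly those in the excerpt above; the name `compact_outcomes` is hypothetical:

```python
import re

# Success line: <ip>: Command "...nodetool  compact ..." finished with status N
FINISHED_RE = re.compile(
    r'<(?P<ip>[\d.]+)>: Command "[^"]*nodetool\s+compact[^"]*"'
    r" finished with status (?P<status>\d+)"
)
# Failure line: <ip>: Error executing command: "...nodetool  compact ..."; Exit status: N
FAILED_RE = re.compile(
    r'<(?P<ip>[\d.]+)>: Error executing command: "[^"]*nodetool\s+compact[^"]*"'
    r"; Exit status: (?P<status>\d+)"
)


def compact_outcomes(lines):
    """Map node IP to the exit status of the last `nodetool compact` seen for it."""
    outcomes = {}
    for line in lines:
        m = FINISHED_RE.search(line) or FAILED_RE.search(line)
        if m:
            outcomes[m["ip"]] = int(m["status"])
    return outcomes
```

Combined with the timestamps of the `Detected tablet merge` messages, this shows whether the failing node was still compacting when the merge was applied.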

@pehala
Contributor

pehala commented Dec 4, 2024

@yarongilor I am a bit confused about the problems you found. I understand you found two problems:

  1. Merge does not happen -> @raphaelsc needs sstable dump to continue investigating that
  2. coredump -> You are trying to find a reproducer?

@yarongilor yarongilor force-pushed the tablets_split_merge branch 2 times, most recently from 1579da8 to 9c053c3 Compare December 4, 2024 11:11
@yarongilor
Contributor Author

@yarongilor I am a bit confused about the problems you found. I understand you found two problems:

  1. Merge does not happen -> @raphaelsc needs sstable dump to continue investigating that
  2. coredump -> You are trying to find a reproducer?

As for 1, yes, it is updated in scylladb/scylladb#21736.
As for 2, there are several errors:

  • token_group_based_splitting_mutation_writer error
  • nodetool compact (major compaction) failure
  • A coredump

All these errors might be related; they are reported in Test Tablets split and merge #8948 (comment). This is waiting for a fix as well, since the test doesn't run properly when major compaction fails. (I could see that it happens following a tablet merge.)

A test of high load, causing rapid tablet splits.
Then have a deletion burst and a major compaction to trigger rapid tablets merges.
@yarongilor
Contributor Author

opened scylladb/scylladb#21867 for coredump.

Labels
area/tablets backport/none Backport is not required