Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature(syslogng): move all default filters to client side #7041

Closed
wants to merge 4 commits into from

Conversation

fruch
Copy link
Contributor

@fruch fruch commented Dec 31, 2023

since we have more and more filters on the log we apply by default trying to move them into client side, and not send them out at all, and not waste time apply those filter while reading the logs

Testing

PR pre-checks (self review)

Reminders

  • Add New configuration option and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevent to this change (if needed)

@fruch fruch added the test-provision-aws Run provision test on AWS label Dec 31, 2023
@fruch fruch force-pushed the syslogng_client_filter branch 4 times, most recently from b0b71cd to df978f4 Compare January 2, 2024 21:47
@@ -59,38 +56,6 @@ def start_events_device(log_dir: Optional[Union[str, Path]] = None,
time.sleep(EVENTS_SUBSCRIBERS_START_DELAY)

# Default filters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add some reference here that we also do filtering in syslog-ng clients

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some of the filter in place, since they are not db log or k8s specific (which works without syslog-ng, at least now)

@fruch fruch force-pushed the syslogng_client_filter branch from df978f4 to e76da56 Compare January 16, 2024 19:18
@fruch fruch added test-provision-gce Run provision test on GCE test-provision-azure labels Jan 16, 2024
@fruch fruch force-pushed the syslogng_client_filter branch 3 times, most recently from 00a329c to 65a2dbf Compare January 17, 2024 14:29
@fruch fruch force-pushed the syslogng_client_filter branch 6 times, most recently from 5ecab51 to 7d482b8 Compare January 23, 2024 07:21
@fruch fruch marked this pull request as ready for review January 23, 2024 13:22
@fruch fruch added backport/none Backport is not required and removed test-provision-aws Run provision test on AWS test-provision-gce Run provision test on GCE test-provision-azure labels Jan 23, 2024
Copy link
Contributor

@soyacz soyacz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but how about K8s where we don't use syslog-ng?

sdcm/db_log_reader.py Show resolved Hide resolved
sdcm/sct_events/database.py Show resolved Hide resolved
@soyacz
Copy link
Contributor

soyacz commented Jan 24, 2024

how about K8s where we don't use syslog-ng?

@fruch
Copy link
Contributor Author

fruch commented Jan 24, 2024

how about K8s where we don't use syslog-ng?

there wouldn't be any filters. and we might get some of those issues surfacing

regardless we have issue with logging on k8s, which we might end using syslog-ng or any maybe vector for it.

let run some of the k8s provision test, and we'll see how bad it would be with those filters, and maybe we'll introduce them back only for k8s

@fruch fruch force-pushed the syslogng_client_filter branch from 7d482b8 to 2fc0117 Compare January 24, 2024 14:55
@fruch
Copy link
Contributor Author

fruch commented Jan 24, 2024

the first run with it:
image

cpu usage is good (2 cpus), and still have a few peaks that I'll look at (probably long log lines), and cpu usage is ~1% compared to 25% we were seeing without the this PR

@fruch
Copy link
Contributor Author

fruch commented Jan 24, 2024

doing one more run as we are using in #7128 so it would be comparable

@fruch
Copy link
Contributor Author

fruch commented Jan 25, 2024

@vponomaryov

do you know why the k8s-local-kind-aws doesn't work ?


[2024-01-24T15:17:51.651Z] Unable to find image 'scylladb/scylla-operator:latest' locally

[2024-01-24T15:17:51.651Z] Error response from daemon: unauthorized: authentication required

[2024-01-24T15:17:51.651Z] must specify at least one container source

@vponomaryov
Copy link
Contributor

@vponomaryov

do you know why the k8s-local-kind-aws doesn't work ?


[2024-01-24T15:17:51.651Z] Unable to find image 'scylladb/scylla-operator:latest' locally

[2024-01-24T15:17:51.651Z] Error response from daemon: unauthorized: authentication required

[2024-01-24T15:17:51.651Z] must specify at least one container source

The problem appeared using different image just trying to download it:

17:15:47  < t:2024-01-24 15:15:47,011 f:mini_k8s.py     l:500  c:sdcm.cluster_k8s     p:INFO  > local-dc-1: Start Kind cluster
17:17:28  < t:2024-01-24 15:17:26,591 f:mini_k8s.py     l:223  c:sdcm.cluster_k8s     p:INFO  > local-dc-1: Creating kubectl config
17:17:28  < t:2024-01-24 15:17:27,916 f:mini_k8s.py     l:263  c:sdcm.cluster_k8s     p:INFO  > local-dc-1: Pull `calico/kube-controllers:v3.24.5' to docker environment
17:17:28  < t:2024-01-24 15:17:28,805 f:base.py         l:146  c:LocalCmdRunner       p:ERROR > Error executing command: "docker pull -q calico/kube-controllers:v3.24.5"; Exit status: 1
17:17:28  < t:2024-01-24 15:17:28,807 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > 2024-01-24 15:17:28.806: (TestFrameworkEvent Severity.ERROR) period_type=one-time event_id=a4dfbe6b-f241-4ba2-a85b-b6c45db85844, source=ScyllaOperatorFunctionalClusterTester.SetUp()
17:17:28  < t:2024-01-24 15:17:28,807 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > exception=Encountered a bad command exit code!
17:17:28  < t:2024-01-24 15:17:28,807 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > 
17:17:28  < t:2024-01-24 15:17:28,807 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > Command: 'docker pull -q calico/kube-controllers:v3.24.5'
17:17:28  < t:2024-01-24 15:17:28,807 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > 
17:17:28  < t:2024-01-24 15:17:28,807 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > Exit code: 1
17:17:28  < t:2024-01-24 15:17:28,807 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > 
17:17:28  < t:2024-01-24 15:17:28,807 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > Stdout:
17:17:28  < t:2024-01-24 15:17:28,807 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > 
17:17:28  < t:2024-01-24 15:17:28,807 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > 
17:17:28  < t:2024-01-24 15:17:28,807 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > 
17:17:28  < t:2024-01-24 15:17:28,807 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > Stderr:
17:17:28  < t:2024-01-24 15:17:28,807 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > 
17:17:28  < t:2024-01-24 15:17:28,807 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > Error response from daemon: unauthorized: authentication required

It is docker problem.

Then, several days ago was released new docker version on ubuntu - 25.0.0-1.
Which was broken in some place leading to problems with various docker images.

It was around for a day then appeared new, fixed, version - 25.0.1-1.
So, I expect it to be the reason.

So, I re-triggered the CI jobs for this PR assuming that fixed docker version will be used.
And there is no such an error anymore, in the ongoing CI jobs for this PR docker images get downloaded successfully.

sdcm/provision/common/utils.py Show resolved Hide resolved
sdcm/db_log_reader.py Outdated Show resolved Hide resolved
sdcm/provision/common/utils.py Outdated Show resolved Hide resolved
@fruch fruch force-pushed the syslogng_client_filter branch from 2fc0117 to a2f2fe8 Compare January 28, 2024 12:42
fruch added 4 commits January 28, 2024 21:25
since we have more and more filters on the log we apply by default
trying to move them into client side, and not send them out
at all, and not waste time apply those filter while reading the logs
so far those weren't print out in SCT log, now they wouldn't be printed
and would be handled by the event system, to stop wasting our time
on those line we are not importent enough for us.
taking all old events that are generating warning/normal events
and filter on the db nodes, while removing those events.

this would help improve the performence at the test end, and lower
the amount of regexes used on test runner, and lowe amount of warning event
generated
A new severity level, that would help us stop generating some events
we need those events for skipping some log line, so we never generate
error or critical events from those.

until now we generated an event for each one, and now we'll be able to skip them
since for some cases there are too many of those warnings.
@fruch fruch force-pushed the syslogng_client_filter branch from a2f2fe8 to 11139d2 Compare January 28, 2024 19:25
@fruch
Copy link
Contributor Author

fruch commented Jan 28, 2024

seems like this doesn't change much the CPU use, we still need %25 for reading the logs
image

@soyacz
Copy link
Contributor

soyacz commented Jan 29, 2024

seems like this doesn't change much the CPU use, we still need %25 for reading the logs

So, is it worth merging?

@fruch
Copy link
Contributor Author

fruch commented Jun 2, 2024

seems like this doesn't change much the CPU use, we still need %25 for reading the logs

So, is it worth merging?

looks like it's not really helpful, closing this for now

@fruch fruch closed this Jun 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants