Document Rspamd Bayesian filter bulk training #152

DavidePrincipi · 2024-12-10T09:33:24Z

This PR documents the manual procedure for training the Bayesian filters from mail messages already stored on disk.

It also adds a symbolic link to the `rspamc-wrapper` command, which the global Sieve filter currently uses to trigger the IMAP-driven learn procedure. The symbolic link is only a convenience for the manual procedure.

Refs

Import Bayesian training section from NS7 ns8-docs#135

stephdl

We face two limitations:

- The full path to the command must be given. That is inconvenient, but we could live with it.
- It is unclear how to train with external spam mails that are not inside the container.

stephdl · 2024-12-10T11:02:04Z

README.md

+    runagent -m mail1 podman exec -i dovecot rspamc-wrapper --help
+


runagent -m mail1 podman exec -i dovecot /usr/local/lib/dovecot/sieve-pipe/rspamc-wrapper --help

With the short command name from the README, the executable is not found in `$PATH`:

    [mail1@R1 toto]$ runagent -m mail1 podman exec -i dovecot rspamc-wrapper --help
    Error: crun: executable file `rspamc-wrapper` not found in $PATH: No such file or directory: OCI runtime attempted to invoke a command that was not found
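When the short name does not resolve, a small host-side helper can hide the absolute path used in the working command above. This is only a sketch: the function name `rspamc_wrapper` and the `MODULE` and `RUN_PREFIX` variables are hypothetical and not part of the PR; the wrapper path and the `runagent` invocation are the ones quoted in this review.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): call rspamc-wrapper by its
# absolute path inside the dovecot container.
WRAPPER=/usr/local/lib/dovecot/sieve-pipe/rspamc-wrapper

rspamc_wrapper() {
    # MODULE defaults to mail1, the module ID used in this review.
    # RUN_PREFIX is an assumption for dry runs: set it to "echo" to print
    # the command instead of executing it.
    ${RUN_PREFIX:-} runagent -m "${MODULE:-mail1}" podman exec -i dovecot "$WRAPPER" "$@"
}
```

For example, `rspamc_wrapper --help` runs the same command that succeeded above, and `RUN_PREFIX=echo` shows what would be executed without touching the container.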

stephdl · 2024-12-10T11:14:57Z

README.md

+    runagent -m mail1 podman exec -i dovecot rspamc-wrapper learn_spam < spamarchive.mbox
+


I cannot train Rspamd the way we did with NS7. The hardest part is that the spam messages are not in the container.

    mkdir spam
    cd spam
    wget http://untroubled.org/spam/2024-01.7z
    7z e 2024-01.7z
    # we need to push the spam mail to the container
    tar -cvf files.tar *.txt
    podman cp files.tar dovecot:/tmp
    podman exec dovecot tar -xvf /tmp/files.tar -C /tmp/spam
    # now we can run the learning
    podman exec -i dovecot /usr/local/lib/dovecot/sieve-pipe/rspamc-wrapper learn_spam /tmp/spam
    # now we can verify
    podman exec -i dovecot /usr/local/lib/dovecot/sieve-pipe/rspamc-wrapper stat

    Results for command: stat (0.06 seconds)
    Messages scanned: 0
    Messages learned: 475
    Connections count: 0
    Control connections count: 518
    Pools allocated: 571
    Pools freed: 536
    Bytes allocated: 26470776
    Memory chunks allocated: 293
    Shared chunks allocated: 3
    Chunks freed: 0
    Oversized chunks: 333
    Fuzzy hashes in storage "rspamd.com": 10610683149
    Fuzzy hashes stored: 10610683149
    Statfile: BAYES_SPAM type: redis; length: 0; free blocks: 0; total blocks: 0; free: 0.00%; learned: 0; users: 0; languages: 0
    Statfile: BAYES_HAM type: redis; length: 0; free blocks: 0; total blocks: 0; free: 0.00%; learned: 0; users: 0; languages: 0
    Total learns: 0
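In the `stat` output above, `Messages learned` reads 475 while both statfiles and `Total learns` read 0, so it helps to compare those counters side by side after each training run. The following is a minimal sketch that pulls them out of `stat`-style text on standard input. The function name `learn_counters` is hypothetical, and it assumes the output keeps one counter per line, with each `Statfile:` entry on a single line.

```shell
#!/bin/sh
# Hypothetical helper: print the learn-related counters found in
# "rspamc stat"-style output read from standard input.
learn_counters() {
    awk '
        /^Messages learned:/ { print "messages_learned=" $3 }
        /^Statfile:/ {
            name = $2
            # Find the "learned: N;" field that follows the statfile name.
            for (i = 3; i < NF; i++) {
                if ($i == "learned:") {
                    value = $(i + 1)
                    sub(/;$/, "", value)   # drop the trailing semicolon
                    print name "_learned=" value
                }
            }
        }
        /^Total learns:/ { print "total_learns=" $3 }
    '
}
```

Piping the output of the `stat` command into `learn_counters` prints one `name=value` line per counter, which makes a mismatch between `messages_learned` and the per-statfile values easy to spot.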

stephdl · 2024-12-10T11:19:03Z

README.md

+To read HAM message from `first.user`'s mailbox, specify its path,
+relative to Dovecot's working directory:
+
+    runagent -m mail1 podman exec -i dovecot rspamc-wrapper learn_ham first.user/Maildir/cur


runagent -m mail1 podman exec -i dovecot /usr/local/lib/dovecot/sieve-pipe/rspamc-wrapper learn_ham first.user/Maildir/cur

Same issue, I suppose: the path, and how to train with external mail.
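One possible way around the external-mail limitation is to stream each message from the host on standard input, as the `learn_spam < spamarchive.mbox` line in the README diff does, instead of copying files into the container. The sketch below rests on assumptions: `learn_from_host`, `MODULE` and `RUN_PREFIX` are hypothetical names, each file is expected to hold exactly one message, and `rspamc-wrapper` is assumed to pass standard input through when no path argument is given.

```shell
#!/bin/sh
# Hypothetical sketch (not part of the PR): learn messages stored on the
# host by feeding each file to rspamc-wrapper on standard input.
WRAPPER=/usr/local/lib/dovecot/sieve-pipe/rspamc-wrapper

# Usage: learn_from_host <learn_spam|learn_ham> <host-directory>
learn_from_host() {
    action=$1
    dir=$2
    for f in "$dir"/*; do
        # Skip sub-directories and the literal pattern of an empty directory.
        [ -f "$f" ] || continue
        # RUN_PREFIX=echo prints the command instead of running it (dry run).
        # The redirection is opened by the host shell, so host paths work.
        ${RUN_PREFIX:-} runagent -m "${MODULE:-mail1}" podman exec -i dovecot \
            "$WRAPPER" "$action" < "$f"
    done
}
```

For example, `learn_from_host learn_spam ./spam` would start one `podman exec` per file. That is slower than a single bulk run on a directory inside the container, but it needs no `podman cp` step.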

Docs. Rspamd Bayesian filter bulk training

bd0fb67

DavidePrincipi requested a review from stephdl December 10, 2024 09:33

DavidePrincipi self-assigned this Dec 10, 2024

stephdl requested changes Dec 10, 2024
