Document Rspamd Bayesian filter bulk training #152

Open
wants to merge 1 commit into
base: main
from


DavidePrincipi
Member

This PR documents the manual procedure to train the Bayesian filters from mail messages already stored on disk.

It also adds a symbolic link to the rspamc-wrapper command, which the global Sieve filter currently uses to trigger the IMAP-driven learn procedure. The symbolic link is just a convenience for the manual procedure.
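The page does not show where the link is created inside the image. The following is a hypothetical sketch that assumes `/usr/local/bin`, a directory that is usually on `$PATH`; the target is the full wrapper path quoted in the review comments, and `install_wrapper_link` and `LINK_DIR` are illustrative names. It is meant to run inside the container image (for example in a build step), not on the host.

```shell
# Hypothetical sketch: expose rspamc-wrapper through $PATH with a symbolic link.
# The default link directory /usr/local/bin is an assumption, not taken from the PR.
install_wrapper_link() {
    link_dir="${LINK_DIR:-/usr/local/bin}"
    # -s creates a symbolic link, -f replaces an existing link with the same name
    ln -sf /usr/local/lib/dovecot/sieve-pipe/rspamc-wrapper "$link_dir/rspamc-wrapper"
}
```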

Refs

@DavidePrincipi DavidePrincipi self-assigned this Dec 10, 2024
Contributor

@stephdl stephdl left a comment

We face two limitations:

1. The full path of the wrapper must be given. This is not convenient, but we could live with it.
2. It is not clear how to train with external spam messages that are not inside the container.
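For the second limitation, one option that avoids copying files into the container is to stream each message from the host through `podman exec -i`. This is a sketch that assumes rspamc-wrapper reads a single message from standard input when no path is given, as the `learn_spam < spamarchive.mbox` example in the PR implies; `learn_spam_from_host` is a hypothetical name.

```shell
# Sketch: feed spam files stored on the host to rspamc-wrapper one at a time
# through stdin, so they never need to be copied into the container.
# Assumption: the wrapper reads one message from stdin when no path is given.
learn_spam_from_host() {
    for msg in "$@"; do
        # Skip anything that is not a regular file (e.g. a sub-directory)
        [ -f "$msg" ] || continue
        runagent -m mail1 podman exec -i dovecot \
            /usr/local/lib/dovecot/sieve-pipe/rspamc-wrapper learn_spam < "$msg" \
            || echo "learn_spam failed for $msg" >&2
    done
}
```

Usage, from the directory where the archive was extracted: `learn_spam_from_host *.txt`. It starts one process per message, so it is slower than passing a directory path, but it needs no `tar`/`podman cp` round trip.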

Comment on lines +200 to +201
runagent -m mail1 podman exec -i dovecot rspamc-wrapper --help

Contributor

runagent -m mail1 podman exec -i dovecot /usr/local/lib/dovecot/sieve-pipe/rspamc-wrapper --help

Without the full path, the command is not found:

[mail1@R1 toto]$ runagent -m mail1 podman exec -i dovecot rspamc-wrapper --help
Error: crun: executable file `rspamc-wrapper` not found in $PATH: No such file or directory: OCI runtime attempted to invoke a command that was not found
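Until the symbolic link is available, a small shell function can hide the full path. This is a hypothetical helper, not part of the PR; `rspamc_wrapper` and the `MAIL_MODULE` variable (defaulting to `mail1`) are illustrative names.

```shell
# Hypothetical helper: always call the wrapper by its full path, so it works
# whether or not the symbolic link is on $PATH inside the container.
rspamc_wrapper() {
    runagent -m "${MAIL_MODULE:-mail1}" podman exec -i dovecot \
        /usr/local/lib/dovecot/sieve-pipe/rspamc-wrapper "$@"
}
```

Usage: `rspamc_wrapper --help` or `rspamc_wrapper learn_spam < spamarchive.mbox`.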

Comment on lines +204 to +205
runagent -m mail1 podman exec -i dovecot rspamc-wrapper learn_spam < spamarchive.mbox

Contributor

I cannot train Rspamd the way we did with NS7. The hardest part is that the spam messages are not inside the container, so they must be copied there first:

mkdir spam
cd spam
wget http://untroubled.org/spam/2024-01.7z
7z e 2024-01.7z
# copy the spam messages into the container
tar -cvf files.tar *.txt
podman cp files.tar dovecot:/tmp
# the extraction directory must exist before tar -C can use it
podman exec dovecot mkdir -p /tmp/spam
podman exec dovecot tar -xvf /tmp/files.tar -C /tmp/spam
# train the Bayesian classifier with the extracted messages
podman exec -i dovecot /usr/local/lib/dovecot/sieve-pipe/rspamc-wrapper learn_spam /tmp/spam
# check the training statistics
podman exec -i dovecot /usr/local/lib/dovecot/sieve-pipe/rspamc-wrapper stat
Results for command: stat (0.06 seconds)
Messages scanned: 0
Messages learned: 475
Connections count: 0
Control connections count: 518
Pools allocated: 571
Pools freed: 536
Bytes allocated: 26470776
Memory chunks allocated: 293
Shared chunks allocated: 3
Chunks freed: 0
Oversized chunks: 333
Fuzzy hashes in storage "rspamd.com": 10610683149
Fuzzy hashes stored: 10610683149
Statfile: BAYES_SPAM type: redis; length: 0; free blocks: 0; total blocks: 0; free: 0.00%; learned: 0; users: 0; languages: 0
Statfile: BAYES_HAM type: redis; length: 0; free blocks: 0; total blocks: 0; free: 0.00%; learned: 0; users: 0; languages: 0
Total learns: 0
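In the output above, `Messages learned` reports 475 while both `Statfile` lines report `learned: 0`, so the per-class counters are the ones worth checking after a training run. This sketch extracts them with `sed`, assuming the `Statfile:` line format shown above; `bayes_learned` is a hypothetical name.

```shell
# Sketch: print CLASS=COUNT for each Bayesian statfile found in the
# `stat` output read from stdin, e.g. BAYES_SPAM=0
bayes_learned() {
    sed -n 's/^Statfile: \(BAYES_[A-Z]*\) .*learned: \([0-9]*\);.*/\1=\2/p'
}
```

Usage: `podman exec -i dovecot /usr/local/lib/dovecot/sieve-pipe/rspamc-wrapper stat | bayes_learned`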

To read HAM messages from `first.user`'s mailbox, specify its path
relative to Dovecot's working directory:

runagent -m mail1 podman exec -i dovecot rspamc-wrapper learn_ham first.user/Maildir/cur
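The same command can be repeated for several mailboxes. In this sketch the user names are hypothetical, each path is relative to Dovecot's working directory as described above, and only the `cur` folder is read, matching the single-user example; `learn_ham_for_users` is an illustrative name.

```shell
# Sketch: train HAM from the Maildir/cur folder of each user given as argument.
# Paths are relative to Dovecot's working directory inside the container.
learn_ham_for_users() {
    for user in "$@"; do
        runagent -m mail1 podman exec -i dovecot \
            /usr/local/lib/dovecot/sieve-pipe/rspamc-wrapper learn_ham "$user/Maildir/cur"
    done
}
```

Usage (hypothetical users): `learn_ham_for_users first.user second.user`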
Contributor

runagent -m mail1 podman exec -i dovecot /usr/local/lib/dovecot/sieve-pipe/rspamc-wrapper learn_ham first.user/Maildir/cur

Same issue, I suppose: the full path is required, and it is unclear how to train with external mail.
