This post is very old and likely contains information that is no longer accurate and links which no longer work. Proceed with caution.
On a mail filter I maintain, there is a site-wide Bayes database that is
periodically trained by hand. It sits quietly and doesn’t change much over time.
That is, until the Bayes database was moved to a new server. The SpamAssassin
configuration was identical between the old system and the new system, but there
was one problem: on the new server, the bayes_toks file was rapidly growing
until it was quite large. Huge, in fact. Its size was expanding by several
gigabytes per hour.
I checked all the usual things: auto learning was off, auto expiration was off, the permissions and user were set correctly, and so on. And yet it grew, constantly and swiftly.
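For reference, the auto-learn and auto-expire checks correspond to the SpamAssassin options below. This is a minimal sketch, not the actual configuration from either server; the file it lives in (often local.cf) varies by install, and the bayes_path value is a hypothetical example.

```
# Sketch of the relevant Bayes settings (assumed, not copied from the real config)
use_bayes 1
bayes_auto_learn 0     # do not train automatically from scanned mail
bayes_auto_expire 0    # do not expire old tokens automatically

# Hypothetical site-wide location; the files end up as bayes_toks, bayes_seen, etc.
# bayes_path /var/spamassassin/bayes
```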
After hours of searching without finding anything, and trying various kinds of tinkering, I found the answer. I backed up and restored the Bayes database like so:
sa-learn --backup > bayes_backup.txt
sa-learn --restore bayes_backup.txt
After that, the bayes_toks file was once again left alone and didn’t grow. I
suspect the problem was due to moving from a 32-bit platform to a 64-bit
platform, but that’s just speculation really. It could just as well be some
other difference in the Perl versions and libraries on the two servers.
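For anyone who wants to script the backup and restore steps, here is a minimal sketch. It only wraps the two sa-learn commands shown earlier. The function name rebuild_bayes and the SA_LEARN variable are made up for illustration, and the empty-file check is an extra precaution I am assuming is worthwhile, because --restore replaces the existing Bayes data with whatever is in the file.

```shell
#!/bin/sh
# Sketch: dump the Bayes database to a text file, then restore from that dump.
# SA_LEARN is a hypothetical override so a different sa-learn path can be used.
SA_LEARN="${SA_LEARN:-sa-learn}"

rebuild_bayes() {
    backup_file="$1"

    # Step 1: write the current database out as text.
    "$SA_LEARN" --backup > "$backup_file" || return 1

    # Step 2: refuse to continue if the dump is empty, since the restore
    # step discards the existing database before loading the file.
    if [ ! -s "$backup_file" ]; then
        echo "backup file is empty, not restoring" >&2
        return 1
    fi

    # Step 3: rebuild the database from the dump.
    "$SA_LEARN" --restore "$backup_file"
}
```

Call it as `rebuild_bayes bayes_backup.txt`, running as the same user that owns the Bayes files so the rebuilt database keeps the right ownership.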
In case you couldn’t tell, I was trying to use a bunch of different ways to word this problem, going off of the various Google searches I did trying to track it down. Hopefully others will hit this post in the future and it will save them some time. :-)