Question about spamassassin using MySQL

Jason Stephenson jason at sigio.com
Mon Apr 25 23:57:01 EDT 2005


Benjamin Scott wrote:
> On Apr 25 at 3:13pm, Bruce Dawson wrote:
> 
>> Steven: Thanks for the clarification. I was under the impression that 
>> the milter is called only after the message had been received.
> 
> 
>     Obviously, in order to do content analysis or other magic on a 
> message, you have to receive the content.  As I understand it, what 
> these tools do is allow the SMTP "DATA" verb to be sent, and to receive 
> some or all of the data from the sender.  Then, before the SMTP result 
> code 250 ("Message accepted for delivery") code is sent, the filter runs 
> and makes a decision.  If the message fails, an SMTP error status code 
> is sent instead.

Yes, that is pretty much how spamass-milter and exim with exiscan-acls 
work.
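
To make the flow concrete, here is a minimal Python sketch of the 
decision made between the end of DATA and the final reply. The 
thresholds and the function name are hypothetical and not taken from 
any real milter or ACL; the real tools do this inside the MTA.

```python
from typing import Tuple

# Hypothetical thresholds, chosen only for illustration.
TAG_SCORE = 5.0
REJECT_SCORE = 10.0


def smtp_reply_after_data(spam_score: float) -> Tuple[int, str]:
    """Pick the reply sent once the message body has been received."""
    if spam_score >= REJECT_SCORE:
        # A 5xx reply refuses the message while the sender is still
        # connected, so our server never takes responsibility for it.
        return 550, "Message rejected as spam"
    if spam_score >= TAG_SCORE:
        # Accept, but let the delivery side tag or file the message.
        return 250, "Message accepted for delivery (tagged as spam)"
    return 250, "Message accepted for delivery"
```

The point is only that the filter's verdict arrives before the 250 is 
sent, so a failing message gets an error code instead.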

> 
>     This is fine as long as your mail volume is reasonably low.  As mail 
> volumes increase, however, it becomes impractical to do this all in 
> "real time" on your MX.

We had serious problems at my day job using spamass-milter. Dunno if the 
problem was with our version of sendmail being buggy or what. (There are 
known stability problems with spamass-milter and certain versions of 
sendmail.) Sendmail would lock up, spamassassin would die, and 
occasionally the swap manager would start thrashing. Sometimes a 
shutdown -r was the quickest way to fix the mess. This tended to happen 
when either we were at our busiest part of the day (between 9:00 and 
10:00 a.m.) or when processing a message with a large attachment, large 
being a variable value depending on circumstances.

Switching to exim and adding RAM to the system really helped. The 
computer now has 1 GB of RAM (instead of 512 MB), and processes 500MB+ 
of mail per week, over 30,000 messages. I'd say that's a medium-sized 
installation.

(I should add that I switched to exim only after adding RAM and still 
having problems with spamass-milter. I'm not saying that it will do this 
in all installations, but it certainly did so in ours.)

So, if anyone is interested in exim ACLs to use spamassassin, as well as 
clamav, drop me a line. I've got a setup that works well for 600+ users.

Before I send this message, I'm going to pop back in here and add 
something that I think is appropriate.

I've used spamassassin and exim in 3 environments now, with several 
different setups. What I've found is that spamassassin really shines 
when each user on the system has their own bayes db, preferences, and 
auto-whitelist. This is almost always the case in the procmail 
environment. Spamassassin was, I believe, designed to be used this way, 
and it is very, very accurate when processing a single user's mail.

When spamassassin is used on a system level, as is more often the case 
when it is run from an ACL or milter, it maintains 1 spam db, 1 set of 
preferences, and 1 AWL for all the users on the system. This is because 
spamd normally runs as some user (nobody in my case) on the mail server, 
and the MTA communicates with spamassassin via spamd.

In the case of my current day job, I have to run it this way, because 
the customers of my MTA (600+ librarians, a few of whom hate computers 
as a distraction from their "real job") don't have the ability to 
maintain their own bayes db, preferences, and whitelist. If it's not as 
simple as point and click, they won't do it, and why should they? It 
takes time to process all that spam every day.

Don't forget, too, that you should run the false positives back through 
the system so that spamassassin can adjust itself to be more accurate.
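
One way to do that is sa-learn, which ships with spamassassin. Below is 
a minimal Python sketch that only builds the command line, assuming the 
misclassified mail has been collected into an mbox file. The function 
name and the path in the usage note are hypothetical.

```python
from typing import List


def sa_learn_command(mbox_path: str, is_spam: bool) -> List[str]:
    """Build an sa-learn command for an mbox of misclassified mail.

    False positives (good mail scored as spam) are learned as ham;
    spam that slipped through is learned as spam.
    """
    kind = "--spam" if is_spam else "--ham"
    return ["sa-learn", kind, "--mbox", mbox_path]
```

For example, sa_learn_command("/tmp/false-positives.mbox", False) could 
be handed to subprocess.run(). It would need to run as the same user 
spamd uses, so that it updates the bayes db spamd actually reads.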

> 
>     That's why I have mixed feelings about sites that do unilateral 
> blocking based on blacklists.  Many of these systems find 75% of their 
> mail volume is bogus (spam, worms, phishing, and backscatter).  They get 
> faced with the proposition of lowering the load on their systems by 90% 
> at the cost of 5% of their legitimate mail.  If you're an ISP trying to 
> get by on paper thin margins, that might be considered "acceptable 
> losses".  Of course, that's cold comfort to those who (like me) *are* 
> the acceptable losses.  :(
> 
>     Spam sucks.
> 

Yep, it does.

With the default spamassassin setup, the RBLs only add to the score of 
a message, so it's not as bad as rejecting outright. I know many sites 
also configure their MTAs to reject any connections from IPs on certain 
RBLs.
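
For anyone who hasn't looked at how an RBL is consulted: the client's 
IPv4 octets are reversed and prepended to the list's DNS zone, and a 
successful lookup of that name means "listed". A rough Python sketch 
of the two policies follows. The zone name and the point value are 
made up for illustration, and no DNS query is performed here.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Return the DNS name to look up for an IPv4 address in a DNSBL."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def score_with_rbl(score: float, listed: bool, points: float = 2.0) -> float:
    """Scoring policy: a listing only nudges the message's score."""
    return score + points if listed else score


def reject_on_rbl(listed: bool) -> bool:
    """Blocking policy: a listing alone refuses the connection."""
    return listed
```

With the scoring policy a listed sender can still get mail through if 
the rest of the message looks clean; with the blocking policy it 
cannot.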

I don't use any third-party RBLs for outright blocking, but allow 
spamassassin to adjust a message's score. I do, however, maintain my own 
list of "known bad actor" IPs that I refuse connections from. Keeping 
track of this list is a bit tedious, but a couple of shell scripts help. 
Generally, you have to send us spam that makes it through the filter to 
end up on this list.
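
The check itself is simple. Here is a sketch in Python rather than the 
shell scripts mentioned above, assuming a hypothetical file format of 
one address or CIDR block per line with "#" comments.

```python
import ipaddress
from typing import Iterable, List


def load_blocklist(lines: Iterable[str]) -> List:
    """Parse addresses or CIDR blocks, skipping blanks and comments."""
    nets = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            # A bare address becomes a single-host network (/32).
            nets.append(ipaddress.ip_network(entry, strict=False))
    return nets


def is_blocked(ip: str, nets: List) -> bool:
    """True if the connecting address falls inside any listed network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in nets)
```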

I know it isn't perfect because of collateral damage, and blocking by IP 
is practically pointless since IPs in dynamic blocks do actually change. 
I've considered removing the list for a week to see how it affects the 
system, and possibly removing it completely if the amount of spam 
getting through to my end users unfiltered doesn't appreciably change.

At home (sigio.com), I don't bother keeping a list.

Additionally, I highly recommend dropping the SMTP connection of any 
computer that isn't on your allowed relay list but uses your mail 
server's IP or host name in its HELO/EHLO. That alone cuts off a 
number of the spambots before they even get to say MAIL FROM.
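
In rough Python terms the test looks like this. It is only a sketch: 
the function and parameter names are hypothetical, and in practice the 
check lives in the MTA's own configuration.

```python
from typing import Set


def helo_is_forged(helo_arg: str, client_ip: str,
                   our_names: Set[str], relay_ips: Set[str]) -> bool:
    """True if an outside host introduces itself as our own server."""
    if client_ip in relay_ips:
        # Hosts on the relay list may legitimately use our name.
        return False
    # Normalise: drop the brackets of an address literal such as
    # [192.0.2.1], any trailing dot, and differences in case.
    claimed = helo_arg.strip().strip("[]").rstrip(".").lower()
    return claimed in {name.lower() for name in our_names}
```

Here our_names would hold the server's host names and IP addresses, 
for example {"mail.example.org", "192.0.2.1"}.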

It's also a good idea to enable whatever option makes your MTA somewhat 
pedantic about the SMTP protocol. This cuts off anyone who starts 
spewing data before your server can respond to their HELO. So far, the 
only "valid" MTA I've seen that consistently does this is moveon.org's 
web mail or mailing list sender. Other than that, it's been all spammers 
dropped by being pedantic. I've tried mailing moveon.org to tell them 
that they need to fix their MTA, but they haven't responded. Just like a 
couple of sites that use Exchange and announce their host name with an _ 
in it.
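
The underscore case is the easier of the two to show. Host name labels 
are limited to letters, digits, and hyphens, so a strict syntax check 
on the HELO name catches it. A minimal Python sketch, with a 
hypothetical function name, that deliberately ignores address literals 
like [192.0.2.1]:

```python
import re

# One DNS label: 1 to 63 characters, letters/digits/hyphens only,
# and no hyphen at either end.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def helo_name_is_valid(name: str) -> bool:
    """Strict host name syntax check for a HELO/EHLO argument."""
    if len(name) > 255:
        return False
    labels = name.rstrip(".").split(".")
    return all(_LABEL.fullmatch(label) is not None for label in labels)
```

A name such as mail_server.example.com fails this check, while 
mail-server.example.com passes.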

Battling spam can be time consuming, but I feel like I've made some 
progress. So far, I've not liked many of the proposals for alternatives 
to SMTP, or for the extensions like SPF or Domain Keys that I've read 
about. They all have drawbacks, and SPF and Domain Keys could very 
easily be used by spammers to "legitimize" their spam, and then you're 
right back to having to block by IP address.

I've also been following the IM2000 mailing list discussions and I'm not 
sure that anything that I've heard about on there is the FUSSP. Not that 
I have one, myself. I thought I did, but I must have lost it. ;)

So anyway, just some pseudo-random thoughts on fighting spam from 
someone who has a bit of experience with spamassassin.

Cheers,
Jason

P.S.

That's it! I'm sending this message. The more I edit it, the more I 
think of to say. Well, I need to get to bed. I have work in the morning.

J.


