Question about spamassassin using MySQL
Jason Stephenson
jason at sigio.com
Mon Apr 25 23:57:01 EDT 2005
Benjamin Scott wrote:
> On Apr 25 at 3:13pm, Bruce Dawson wrote:
>
>> Steven: Thanks for the clarification. I was under the impression that
>> the milter is called only after the message had been received.
>
>
> Obviously, in order to do content analysis or other magic on a
> message, you have to receive the content. As I understand it, what
> these tools do is allow the SMTP "DATA" verb to be sent, and to receive
> some or all of the data from the sender. Then, before the SMTP result
> code 250 ("Message accepted for delivery") code is sent, the filter runs
> and makes a decision. If the message fails, an SMTP error status code
> is sent instead.
Yes, that is pretty much how spamass-milter and exim with exiscan-acls
work.
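As a rough illustration of that flow, here is a minimal Python sketch of the decision made after the DATA body has been received but before any reply goes back. It is not spamass-milter or exiscan code; the function name, the threshold of 10.0, and the reply texts are assumptions made up for the example.

```python
# Hypothetical sketch: choose the SMTP reply once the DATA body has been read.
# The threshold and the reply texts are illustrative assumptions, not values
# taken from spamass-milter or exiscan.

def data_reply(spam_score: float, reject_threshold: float = 10.0) -> str:
    """Return the reply line to send after the end of DATA."""
    if spam_score >= reject_threshold:
        # A 5xx reply is a permanent failure, so the sending MTA stays
        # responsible for the message and for any bounce.
        return "550 Message rejected as spam"
    # Only now does the server take responsibility for the message.
    return "250 Message accepted for delivery"


if __name__ == "__main__":
    print(data_reply(12.3))
    print(data_reply(1.7))
```

The point of deciding here rather than after delivery is that the receiving server never has to generate a bounce of its own.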
>
> This is fine as long as your mail volume is reasonably low. As mail
> volumes increase, however, it becomes impractical to do this all in
> "real time" on your MX.
We had serious problems at my day job using spamass-milter. Dunno if the
problem was with our version of sendmail being buggy or what. (There are
known stability problems with spamass-milter and certain versions of
sendmail.) Sendmail would lock up, spamassassin would die, and
occasionally the swap manager would start thrashing. Sometimes a
shutdown -r was the quickest way to fix the mess. This tended to happen
when either we were at our busiest part of the day (between 9:00 and
10:00 a.m.) or when processing a message with a large attachment, large
being a variable value depending on circumstances.
Switching to exim and adding RAM to the system really helped. The
computer now has 1 GB of RAM (instead of 512 MB), and processes 500MB+
of mail per week, over 30,000 messages. I'd say that's a medium-sized
installation.
(I should add that I switched to exim only after adding RAM and still
having problems with spamass-milter. I'm not saying that it will do this
in all installations, but it certainly did so in ours.)
So, if anyone is interested in exim ACLs to use spamassassin, as well as
clamav, drop me a line. I've got a setup that works well for 600+ users.
Before I send this message, I'm going to pop back in here and add
something that I think is appropriate.
I've used spamassassin and exim in 3 environments now, with several
different setups. What I've found is that spamassassin really shines
when each user on the system has their own bayes db, preferences, and
auto-whitelist. This is almost always the case in the procmail
environment. Spamassassin was, I believe, designed to be used this way,
and it is very, very accurate when processing a single user's mail.
When spamassassin is used on a system level, as is more often the case
when it is run from an ACL or milter, it maintains 1 spam db, 1 set of
preferences, and 1 AWL for all the users on the system. This is because
spamd normally runs as some user (nobody in my case) on the mail server,
and the MTA communicates with spamassassin via spamd.
In the case of my current day job, I have to run it this way, because
the customers of my MTA (600+ librarians, a few of whom hate computers
as a distraction from their "real job") don't have the ability to
maintain their own bayes db, preferences, and whitelist. If it's not as
simple as point and click, they won't do it, and why should they? It
takes time to process all that spam every day.
Don't forget, too, that you should run the false positives back through
the system so that spamassassin can adjust itself to be more accurate.
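To picture the difference between per-user and site-wide operation, here is a toy Python sketch. It is not how spamassassin stores its data; the class and method names are invented, and the "bayes db" is reduced to a set of tokens per owner. The only thing it shows is who owns the learned data in each mode.

```python
from collections import defaultdict

# Toy model, not spamassassin's real storage. All names here are hypothetical.
class BayesStores:
    def __init__(self, per_user: bool, shared_owner: str = "nobody"):
        self.per_user = per_user
        self.shared_owner = shared_owner      # the account spamd runs as
        self.spam_tokens = defaultdict(set)   # owner -> learned spam tokens

    def owner_for(self, recipient: str) -> str:
        # Per-user mode keys the data by recipient; site-wide mode always
        # uses the single account that spamd runs as.
        return recipient if self.per_user else self.shared_owner

    def learn_spam(self, recipient: str, token: str) -> None:
        self.spam_tokens[self.owner_for(recipient)].add(token)

    def knows_spam_token(self, recipient: str, token: str) -> bool:
        return token in self.spam_tokens[self.owner_for(recipient)]


if __name__ == "__main__":
    site_wide = BayesStores(per_user=False)
    site_wide.learn_spam("alice", "refinance")
    print(site_wide.knows_spam_token("bob", "refinance"))   # shared data

    per_user = BayesStores(per_user=True)
    per_user.learn_spam("alice", "refinance")
    print(per_user.knows_spam_token("bob", "refinance"))    # separate data
```

In the site-wide case, whatever one user's mail teaches the filter applies to everyone, which is why accuracy tends to drop when users receive very different kinds of mail.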
>
> That's why I have mixed feelings about sites that do unilateral
> blocking based on blacklists. Many of these systems find 75% of their
> mail volume is bogus (spam, worms, phishing, and backscatter). They get
> faced with the proposition of lowering the load on their systems by 90%
> at the cost of 5% of their legitimate mail. If you're an ISP trying to
> get by on paper thin margins, that might be considered "acceptable
> losses". Of course, that's cold comfort to those who (like me) *are*
> the acceptable losses. :(
>
> Spam sucks.
>
Yep, it does.
With the default spamassassin setup, the RBLs only add to the score of
a message, so it's not as bad as rejecting outright. I know many sites
also configure their MTAs to reject any connections from IPs on certain RBLs.
I don't use any third-party RBLs for outright blocking, but allow
spamassassin to adjust a message's score. I do, however, maintain my own
list of "known bad actor" IPs that I refuse connections from. Keeping
track of this list is a bit tedious, but a couple of shell scripts help.
Generally, you have to send us spam that makes it through the filter to
end up on this list.
I know it isn't perfect because of collateral damage, and blocking by IP
is practically pointless since IPs in dynamic blocks do actually change.
I've considered removing the list for a week to see how it affects the
system, and possibly removing it completely if the amount of spam
getting through to my end users unfiltered doesn't appreciably change.
At home (sigio.com), I don't bother keeping a list.
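The difference between letting an RBL hit adjust the score and blocking on it outright can be sketched in a few lines of Python. The list names and point values below are invented for illustration and are not spamassassin's real rule scores; the threshold of 5.0 is likewise just a parameter of the example.

```python
# Illustrative point values only; these are not spamassassin's real rule
# scores, and the list names are made up.
RBL_POINTS = {"rbl.example.org": 1.5, "dul.example.org": 0.5}


def total_score(content_score, rbl_hits, rbl_points=RBL_POINTS):
    """Scoring approach: each RBL hit just adds points to the message score."""
    return content_score + sum(rbl_points.get(name, 0.0) for name in rbl_hits)


def is_spam(content_score, rbl_hits, required=5.0):
    """A message is tagged only when the combined score crosses the threshold."""
    return total_score(content_score, rbl_hits) >= required


def blocked_outright(rbl_hits, blocking_lists):
    """Blocking approach: a single hit on a blocking list refuses the mail."""
    return any(name in blocking_lists for name in rbl_hits)


if __name__ == "__main__":
    hits = ["rbl.example.org"]
    # An otherwise clean message from a listed IP survives scoring...
    print(is_spam(1.0, hits))
    # ...but is refused if the same list is used for outright blocking.
    print(blocked_outright(hits, {"rbl.example.org"}))
```

With scoring, a listed IP only tips the balance for mail that already looks suspicious, which is where the reduction in collateral damage comes from.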
Additionally, I highly recommend disconnecting the SMTP connection of
any computer that uses your mail server's IP or host name in its
HELO/EHLO that isn't on your allowed relay list. That alone cuts off a
number of the spambots before they even get to say MAIL FROM.
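The HELO check described above can be sketched in Python as follows. This is an illustration of the logic, not exim configuration; the function name, host names, and addresses are assumptions (the addresses come from the documentation ranges).

```python
import ipaddress

# Hypothetical sketch of the HELO/EHLO check; names and addresses are examples.
def helo_claims_to_be_us(helo, client_ip, my_names, my_ips, relay_networks):
    """Return True if the client should be dropped for using our identity."""
    client = ipaddress.ip_address(client_ip)
    # Hosts on the allowed relay list may legitimately use our name.
    if any(client in ipaddress.ip_network(net) for net in relay_networks):
        return False
    # Normalise: drop the brackets of an address literal, case, trailing dot.
    name = helo.strip().strip("[]").lower().rstrip(".")
    return name in {n.lower() for n in my_names} or name in set(my_ips)


if __name__ == "__main__":
    names = ["mail.example.org"]
    ips = ["198.51.100.25"]
    relays = ["192.0.2.0/24"]
    print(helo_claims_to_be_us("mail.example.org", "203.0.113.9", names, ips, relays))
    print(helo_claims_to_be_us("mail.example.org", "192.0.2.10", names, ips, relays))
```

No legitimate outside host has a reason to introduce itself with the receiving server's own name or address, so the check is cheap and has very little risk of false positives.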
It's also a good idea to enable whatever option makes your MTA somewhat
pedantic about the SMTP protocol. This cuts off anyone who starts
spewing data before your server can respond to their HELO. So far, the
only "valid" MTA I've seen that consistently does this is moveon.org's
web mail or mailing list sender. Other than that, it's been all spammers
dropped by being pedantic. I've tried mailing moveon.org to tell them
that they need to fix their MTA, but they haven't responded. The same goes
for a couple of sites that use Exchange and announce their host name with an _ in it.
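The "talking out of turn" check amounts to looking for input from the client at a moment when the server still owes it a reply. A minimal Python sketch of the idea, assuming a plain connected socket; the function name and the wait time are invented for the example, and a real MTA does this internally rather than through code like this:

```python
import select
import socket

# Hypothetical sketch: detect a client that sends before the server has replied.
def talked_out_of_turn(conn: socket.socket, wait_seconds: float = 2.0) -> bool:
    """Return True if input is already waiting while the server owes a reply.

    A well-behaved client waits for the server's response before sending
    its next command, so nothing should be readable yet. Note that a client
    that has simply hung up also shows up as readable here.
    """
    readable, _, _ = select.select([conn], [], [], wait_seconds)
    return bool(readable)


if __name__ == "__main__":
    server_side, client_side = socket.socketpair()
    print(talked_out_of_turn(server_side, 0.1))   # quiet client
    client_side.sendall(b"MAIL FROM:<spammer@example.net>\r\n")
    print(talked_out_of_turn(server_side, 0.5))   # client did not wait
    server_side.close()
    client_side.close()
```

A server that finds input waiting at that point can drop the connection with an error before the client gets any further.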
Battling spam can be time consuming, but I feel like I've made some
progress. So far, I've not liked many of the proposals for alternatives to
SMTP, or for the extensions like SPF or Domain Keys that I've read
about. They all have drawbacks, and SPF and Domain Keys could very
easily be used by spammers to "legitimize" their spam, and then you're
right back to having to block by IP address.
I've also been following the IM2000 mailing list discussions and I'm not
sure that anything that I've heard about on there is the FUSSP. Not that
I have one, myself. I thought I did, but I must have lost it. ;)
So anyway, just some pseudo-random thoughts on fighting spam from
someone who has a bit of experience with spamassassin.
Cheers,
Jason
P.S.
That's it! I'm sending this message. The more I edit it, the more I
think of to say. Well, I need to get to bed. I have work in the morning.
J.
More information about the gnhlug-discuss mailing list