Spam control (was: BitTorrent and Comcast?)

Bill Mullen moon at lunarhub.com
Wed Sep 29 14:28:01 EDT 2004


On Wed, 2004-09-29 at 12:57, travis at scootz.net wrote:

> There also seems to be the new trend of sending crap emails that have no
> content and random words.. I think those are just sent to verify email
> addresses, but then there's no product or service being sold.

I believe that the aim of such messages is to poison the word databases of
filters such as SpamAssassin and POPFile, which use Bayesian techniques
based on the relative likelihood of certain words and phrases appearing
in spam messages vs. the chance of them showing up in "ham" messages.
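To make the mechanism concrete, here is a minimal Python sketch of a naive Bayes word classifier. It is a simplification, not the actual code or formula used by SpamAssassin or POPFile. It assumes that each word's spam probability is simply the fraction of spam vs. ham training messages containing it, and that per-word probabilities are combined with the naive Bayes product rule. The function names and the toy training messages are hypothetical.

```python
import math
from collections import Counter


def token_counts(messages):
    """For each lower-cased word, count how many messages contain it."""
    counts = Counter()
    for msg in messages:
        counts.update(set(msg.lower().split()))
    return counts


def word_spamminess(word, spam_counts, ham_counts, n_spam, n_ham):
    """Probability that a message containing `word` is spam, from the fraction
    of spam vs. ham training messages that contain it (no smoothing).
    Returns None for a word never seen in training."""
    p_spam = spam_counts[word] / n_spam
    p_ham = ham_counts[word] / n_ham
    if p_spam + p_ham == 0:
        return None
    return p_spam / (p_spam + p_ham)


def message_spam_score(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-word probabilities with the naive Bayes product rule:
    P(spam | words) = prod(p) / (prod(p) + prod(1 - p)), computed in log space."""
    log_spam = log_ham = 0.0
    for word in set(text.lower().split()):
        p = word_spamminess(word, spam_counts, ham_counts, n_spam, n_ham)
        if p is None:
            continue                      # unseen word: no evidence either way
        p = min(max(p, 0.01), 0.99)       # clamp so log() never sees 0
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    d = log_spam - log_ham                # logistic of d, written to avoid overflow
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


spam = ["cheap pills buy now", "buy cheap watches now"]
ham = ["meeting notes for the lug", "lug meeting moved to thursday"]
sc, hc = token_counts(spam), token_counts(ham)
print(message_spam_score("buy cheap pills", sc, hc, len(spam), len(ham)))  # close to 1.0
print(message_spam_score("lug meeting", sc, hc, len(spam), len(ham)))      # close to 0.0
```

Words never seen in training are skipped entirely, so a message made only of unknown words scores a neutral 0.5.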

By including a raft of words chosen at random, these mails skew the
classifier's statistics once they are trained as spam, polluting the
"likely to be spam" word list with words that are not at all likely to be
found in "real" spam, thus diminishing the effectiveness of the filter
(in theory, anyway).
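The poisoning effect can be shown with a toy, unsmoothed model (an illustration under simplifying assumptions, not how any particular filter stores its data). A single hypothetical random-word "pseudo-spam" that happens to contain an ordinary ham word drags that word toward the spam side once it is trained as spam. With realistic amounts of training data and the smoothing real filters apply, the shift per message would typically be much smaller.

```python
from collections import Counter


def doc_freq(messages):
    """Number of messages each (lower-cased) word appears in."""
    counts = Counter()
    for msg in messages:
        counts.update(set(msg.lower().split()))
    return counts


def spamminess(word, spam, ham):
    """Unsmoothed spam probability of `word`, from the fraction of spam vs. ham
    training messages that contain it; 0.5 if the word was never seen."""
    s = doc_freq(spam)[word] / len(spam)
    h = doc_freq(ham)[word] / len(ham)
    return 0.5 if s + h == 0 else s / (s + h)


ham = ["lug meeting moved to thursday",
       "notes from the lug meeting",
       "kernel build notes"]
spam = ["cheap pills buy now", "buy cheap watches now"]

# Hypothetical "pseudo-spam": random words, some of which are ordinary ham words.
pseudo = "zygote meeting quarantine notes lantern"

before = spamminess("meeting", spam, ham)
after = spamminess("meeting", spam + [pseudo], ham)   # pseudo-spam trained as spam
print(f"'meeting': {before:.2f} before, {after:.2f} after training on the pseudo-spam")
```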

I'm not at all sure that this sort of thing accomplishes its goal,
though, as many of the words that appear in these messages are in fact
rather obscure (that's what you get when you pick words at random from a
list), and are therefore rather less likely to appear in a legitimate
"ham" mail than they are to show up in one of these "pseudo-spam" ones.

So, it may just be a wash. On the off chance that the technique works, I
delete these "pseudo-spams" outright, rather than using them as fodder
for sa-learn; hopefully, this tactic circumvents their authors' intent.
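The rare-word argument above can be illustrated with the kind of smoothing many Bayesian filters apply: a word seen only a few times is pulled back toward a neutral probability. The sketch below uses a formula proposed by Gary Robinson, f(w) = (s*x + n*p(w)) / (s + n), with s = 1 and x = 0.5 taken as assumed defaults. It is an illustration of the idea, not SpamAssassin's or POPFile's actual code, and the function name is hypothetical.

```python
def robinson_fw(spam_hits, ham_hits, n_spam, n_ham, s=1.0, x=0.5):
    """Smoothed spam probability for one word, after Gary Robinson:
        f(w) = (s*x + n*p(w)) / (s + n)
    spam_hits / ham_hits: training messages of each class containing the word
    n_spam / n_ham:       total training messages of each class (both > 0)
    s: weight given to the prior x; x: assumed probability of an unseen word."""
    n = spam_hits + ham_hits
    if n == 0:
        return x
    ps = spam_hits / n_spam
    ph = ham_hits / n_ham
    p = ps / (ps + ph)                   # unsmoothed estimate, as in a plain ratio
    return (s * x + n * p) / (s + n)


# An obscure word seen once, in a single pseudo-spam, out of 1000 of each class:
print(robinson_fw(1, 0, 1000, 1000))     # 0.75 rather than the raw 1.0
# A genuine spam word seen in 400 spams and 2 hams:
print(robinson_fw(400, 2, 1000, 1000))   # about 0.994
```

A word seen in just one pseudo-spam is pulled well back from the 1.0 the raw ratio would give, while a word seen in hundreds of spams keeps a probability near 1.0.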

-- 
Bill Mullen
RLU# 270075




More information about the gnhlug-discuss mailing list