Spam blocking (was Re: MySQL query [sic])

Jason Stephenson jason at sigio.com
Thu Nov 14 10:11:01 EST 2002


According to an article that I just read at Yahoo!:

http://story.news.yahoo.com/news?tmpl=story&ncid=77&e=1&cid=77&u=/mc/20021114/tc_mc/spam_wars__spammer_says_it_is_almost_impossible_to_stop

Some of the spam-filtering tools claim a high success rate (>90%) and a 
really low false positive rate (<1%).

To answer your question, I think you have to decide what your 
priorities are. If you are personally getting so much spam that it is 
overwhelming your ability to deal with it, then getting rid of all the 
spam may be more of a priority than keeping false positives to a minimum.

When I worked on the KMail project, and used KMail as my MUA, I had a 
set of filters set up that would catch about 80% of the spam that came 
in on my system and filter it into a spam folder. I never had a false 
positive with those filters. I might still have that stuff on an old 
backup somewhere if someone is interested.

I toyed with the idea of using Markov chains (see JWZ's dadadodo) in an 
implementation of a spam filter. I was thinking of adapting the Markov 
chain algorithm as a machine learning technique. AFAIK, nobody has 
suggested using Markov chains for much other than fun.
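
One way the Markov-chain-as-learner idea could look, purely as a hedged 
sketch: train one word-pair (bigram) model on known spam and another on 
known good mail, then flag a message only when the spam model explains it 
better by some margin. Nothing here comes from an existing filter; the 
names BigramModel, looks_like_spam, and margin are made up for 
illustration, and add-one smoothing is an assumption chosen for simplicity.

```python
import math
from collections import Counter


class BigramModel:
    """First-order Markov model over words, built from example texts."""

    def __init__(self, texts):
        self.pairs = Counter()   # counts of (word, next_word)
        self.firsts = Counter()  # counts of word appearing as the first of a pair
        self.vocab = set()
        for text in texts:
            words = text.lower().split()
            self.vocab.update(words)
            for a, b in zip(words, words[1:]):
                self.pairs[(a, b)] += 1
                self.firsts[a] += 1

    def log_prob(self, text, vocab_size):
        """Log-probability of the word transitions in text (add-one smoothed)."""
        words = text.lower().split()
        total = 0.0
        for a, b in zip(words, words[1:]):
            total += math.log(
                (self.pairs[(a, b)] + 1) / (self.firsts[a] + vocab_size)
            )
        return total


def looks_like_spam(text, spam_model, ham_model, margin=0.0):
    """Flag text only if the spam model beats the ham model by more than margin."""
    # +1 leaves room for words never seen in either training set.
    vocab_size = len(spam_model.vocab | ham_model.vocab) + 1
    score = spam_model.log_prob(text, vocab_size) - ham_model.log_prob(text, vocab_size)
    return score > margin
```

Raising margin trades caught spam for fewer false positives, which is the 
knob the question in this thread is really about. A real filter would need 
far more training mail and better tokenizing than split().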

I never got past the implementation of a simple Markov chain and a 
little proggie that would take "random" text from a mail file full of 
spam and rearrange it into the world's ultra-spam. Hmm, that might make 
a fun addition to my web site. I'd better dig the code out of the 
archives, along with the file of spam.
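
For anyone who has not seen one, a simple word-level Markov chain text 
scrambler of the sort described above can be sketched roughly like this. 
This is not the original code (that is still in the archives); the names 
build_chain and generate, the order and seed parameters, and the sample 
corpus are all assumptions for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain


def generate(chain, length=30, seed=None):
    """Walk the chain from a random state, emitting at most `length` words."""
    if not chain:
        return ""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state was only seen at the end of the text
        nxt = rng.choice(followers)
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out)


if __name__ == "__main__":
    # Stand-in corpus; in practice this would be the text of a spam mailbox.
    corpus = "buy cheap pills now buy cheap watches now act now buy now"
    print(generate(build_chain(corpus), length=20))
```

Because followers are stored with repeats, common word pairs are picked 
more often, so the output keeps the flavor of the input while making no 
sense at all.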

bscott at ntisys.com wrote:
> On Wed, 13 Nov 2002, at 9:27am, roger at bcah.com wrote:
> 
>>The biggest problem with a rule-based filter like SpamBouncer, of course,
>>is the large number of false positives.
> 
> 
>   That is my big concern.  The biggest thing keeping me from implementing 
> serious anti-spam on my inbox at work is the risk of false positives.  I get 
> email from customers, often from weird addresses, and I cannot risk ignoring 
> a potentially important email just because a customer can't spell.  :)
> 
>   Does anyone here have opinions on anti-spam solutions, when it is more 
> important to minimize false positives than to catch all spam?
> 



