Microsoft flooding sites with fake traffic

Bill McGonigle bill at bfccomputing.com
Fri Feb 22 15:11:00 EST 2008


On Feb 21, 2008, at 10:00, Arc Riley wrote:

> msnbot accesses robots.txt more than any other
> search engine (seconded by Yahoo! Slurp).

I had an e-commerce client DoS'ed by MSNBot during the holiday  
season.  It was downloading 40GB of dynamic pages per day from a site  
with only 4GB of possible data (I crawled it myself to measure).  When  
otherwise idle, the site could handle that kind of traffic, but during  
peak shopping it was the proverbial last straw.

I wound up counting the total number of possible URIs on the site,  
dividing the number of seconds in a month by that count, and gave MSNBot:

   Crawl-delay: 320

in robots.txt so it would fetch one copy of the site per month.  It  
seems to have worked.
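The arithmetic above can be sketched as follows.  This is a minimal  
illustration assuming a 30-day month; the post does not state the URI  
count, so the 8,100 used here is only back-derived from the 320-second  
figure (30 x 86,400 s / 320 = 8,100) and is hypothetical.  The  
"msnbot" user-agent token is likewise an assumption.  Note that  
Crawl-delay is a non-standard extension that some crawlers honor; it  
is not part of RFC 9309.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_delay(num_uris: int, days: int = 30) -> int:
    """Seconds between fetches so one crawler covers every URI once per period.

    Assumes a period of `days` days (30 by default).  Rounds down so that
    num_uris * delay never exceeds the period.
    """
    if num_uris <= 0:
        raise ValueError("num_uris must be positive")
    period = days * SECONDS_PER_DAY
    return period // num_uris


def robots_txt_group(user_agent: str, delay: int) -> str:
    """Build a robots.txt group applying Crawl-delay to one user agent."""
    return f"User-agent: {user_agent}\nCrawl-delay: {delay}\n"


if __name__ == "__main__":
    # Hypothetical URI count, inferred from the 320-second delay in the post.
    delay = crawl_delay(8100)
    print(robots_txt_group("msnbot", delay))  # assumed user-agent token
```

Rounding down means a full pass finishes at or slightly before the end  
of the month, which for this site means roughly 4GB per month instead  
of 40GB per day.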

I found a web page describing this problem that dated from the summer  
of '06.  Raise your hand if you're shocked...

-Bill

-----
Bill McGonigle, Owner           Work: 603.448.4440
BFC Computing, LLC              Home: 603.448.1668
bill at bfccomputing.com           Cell: 603.252.2606
http://www.bfccomputing.com/    Page: 603.442.1833
Blog: http://blog.bfccomputing.com/
VCard: http://bfccomputing.com/vcard/bill.vcf



More information about the gnhlug-discuss mailing list