Microsoft flooding sites with fake traffic

Ben Scott dragonhawk at gmail.com
Thu Feb 21 09:41:58 EST 2008


On Thu, Feb 21, 2008 at 9:18 AM, Arc Riley <arcriley at gmail.com> wrote:
>>   What's your robots.txt look like?  Does it forbid this kind of behavior?
>
>  The IPs in question never accessed robots.txt.
[...]
>  Many of the hits were also to pages specifically forbidden to * User-agent,
> such as Disallow: /*?
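
  (For anyone following along, that kind of rule looks something like
this in robots.txt; note the /*? wildcard is an extension honored by
the major engines, not part of the original robots.txt spec:)

User-agent: *
Disallow: /*?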

  Interesting.

liberty$ find -name access_log\* | xargs egrep -h '^(131\.107|65\.5[2-5])' \
    | fgrep robots.txt | wc -l
1453

  It appears your server is seeing different behavior than GNHLUG's
server.  I suppose that could be a malfunction on Microsoft's end, but
I can't think of why a cluster of crawlers would malfunction for just
some sites.  I suppose it could be intentional differentiation, but
what would be the point of that?
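
  If you want to compare notes, something like this should break the
hits down by claimed user agent (a sketch assuming Apache combined log
format, where the agent is the sixth quote-delimited field; adjust
otherwise):

$ find -name access_log\* | xargs egrep -h '^(131\.107|65\.5[2-5])' \
    | awk -F'"' '{print $6}' | sort | uniq -c | sort -rn

A real msnbot ought to identify itself as such; anything browser-like
coming from those subnets would be telling.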

  Have you tried contacting the help desk for Microsoft's crawler?

> It changes per day, but the day before yesterday 28.3% of the page hits
> were from those subnets.

  Yikes!
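
  For anyone who wants to check their own ratio, a rough sketch
(access_log here stands in for whatever one day's log is called on
your system):

$ ms=$(egrep -c '^(131\.107|65\.5[2-5])' access_log)
$ total=$(wc -l < access_log)
$ echo "scale=1; 100 * $ms / $total" | bc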

>  If this were an honesty check, which Google does, verifying that the pages
> being sent to the crawlers are the same as those being sent to normal web
> browsers, they wouldn't claim to be arriving via search.live.com.

  Why not?  I can think of a few scenarios where that might be legit:
following links from saved searches, or some kind of follow-up fetch
driven by users' search results.  The high request rate and the
apparent disregard for robots.txt, on the other hand...
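
  Either way, it's easy enough to count how many of those requests
actually claim a search.live.com referer (combined log format again,
with the referer in the fourth quote-delimited field):

$ egrep -h '^(131\.107|65\.5[2-5])' access_log | awk -F'"' '{print $4}' \
    | grep -c 'search\.live\.com'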

> That could be true, if the reported search terms had anything to do with the
> content on the sites.  I could not find a single instance of any of the
> search terms on our site in the earlier searches, many of which were
> pornographic or sexual in nature.
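
  Pulling the reported terms out of those referers would make that
easy to eyeball; q= is my guess at Live's query parameter, so adjust
to taste:

$ egrep -h '^(131\.107|65\.5[2-5])' access_log | awk -F'"' '{print $4}' \
    | sed -n 's/.*[?&]q=\([^&]*\).*/\1/p' | sort | uniq -c | sort -rn | head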

  Are you sure you don't have a wiki or tag cloud or comment board or
file share or similar application that's been hijacked?  Scam artists
like to use such applications to host their content, crank up their
page rank, or spam others.  I know they keep trying to hit GNHLUG (we
watch it fairly closely and remove any such attempts).  I know PySIG
had to shut down their wiki because it got nailed so often.
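
  A cheap way to watch for that kind of abuse is to tally POSTs by
source and path; which paths actually matter depends entirely on what
applications you run:

$ fgrep '"POST ' access_log | awk '{print $1, $7}' | sort | uniq -c \
    | sort -rn | head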

-- Ben

