Is this normal?
Michael Kazin
mkazin at gmail.com
Thu Jun 13 10:47:15 EDT 2013
>
> What is your take on the second and third questions? Again, this is
> just for amusing speculation on how the google bot programmers set up
> their system.
>
Oops. Sorry for ignoring those in my haste to rant about my lawn.
> Question: Having gotten the link, what motivated them to follow it,
> especially to go beyond the index.html page referenced?
As you stated, "them" here is GoogleBot, which is an algorithm designed to
locate and index generic information from all over the web. So there's no
actual motivation per se, other than Google's bombastic mission statement
("to organize the world's information and make it universally accessible
and useful"). Going beyond index.html is simply the necessary default,
unless robots.txt instructs otherwise. I don't try very hard, but I
haven't yet caught GoogleBot violating my robots file, though I have seen
other crawlers do so. Detailed information on the algorithm's crawling
mechanism, and on how to restrict its access, is here:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=182072
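As a rough illustration of how a well-behaved crawler consults robots.txt before going past index.html, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt contents, the example.com URLs, and the allowed() helper are all made up for the example; this is not how GoogleBot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this example.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /
"""


def allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the rules in robots_txt let `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "http://example.com/index.html"),
        ("Googlebot", "http://example.com/drafts/new.html"),
        ("OtherBot", "http://example.com/index.html"),
    ]:
        print(agent, url, allowed(agent, url))
```

With these made-up rules, Googlebot may fetch anything outside /drafts/, while every other crawler is asked to stay away entirely. Compliance is voluntary, which is why some crawlers ignore the file.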
> Question: Since the site was searched 2 days ago, why don't I get my
> site as a hit when its unique terms are entered into Google search?
Good question. I don't have a good answer. My guess would be that it's
simply not ranking high enough to warrant a search result (or, in human
terms, it's "respecting" your implied desire for the page to not yet be
very public, since you haven't linked to it from anywhere on the site).
Two days is quite a long time, so I would tend to doubt it's still stuck in the
process of being indexed (the second stage following crawling, where the
page was downloaded), but the web is wide and constantly changing so it's
possible it may have been deemed low priority due to the URL's pedigree.
GoogleBot has a good understanding of page importance: it doesn't touch
each page on my site daily, only the main (root) page, but it will check a
specific page more frequently if that page is updated often.
As for the final point about Google being the only one to access this page,
I'd attribute that to Google's scope, scalability, and willingness to spend
time and money (or waste, if you prefer) to really find and index every bit
of information possible.
Michael