HTML scraping in python
Lloyd Kvam
python at venix.com
Thu Jun 11 12:04:18 EDT 2009
On Thu, 2009-06-11 at 08:59 -0400, Paul Lussier wrote:
> However, mechanize seems dependent upon ClientForm, and I can't figure
> out how to get the ClientForm*.egg installed. I placed it in
> sys.path, but it's not getting picked up, I tried to manually test
> that it would work using pkg_resources and require(), but got this:
>
> $ python
> Python 2.6.2 (r262:71600, Apr 16 2009, 09:17:39)
> [GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pkg_resources
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> ImportError: No module named pkg_resources
>
> When I look at sys.path, it seems as if it knows about mechanize, but
> not ClientForm, despite having copied ClientForm there:
Eggs are unlikely to work when they are copied into place by hand.
My configuration is:
> cat /usr/lib/python2.5/distutils/distutils.cfg
> [easy_install]
>
> zip_ok = False
The apache user has trouble dealing with zipped egg installations, so I
make sure that eggs are always unzipped.

    easy_install mechanize

should simply do the right thing. If it does not, you're
probably better off doing a distutils install:

    python setup.py install

from your privileged account after downloading the package.
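
After either install, it can help to confirm that the interpreter actually finds the modules, and from where. The following is only a minimal sketch, not part of either package: the helper name locate_module is made up for illustration, and it is written for Python 3 (importlib.util.find_spec), whereas the Python 2.6 in the quoted session would need something like imp.find_module instead.

```python
import importlib.util


def locate_module(name):
    """Report where a top-level module would be imported from.

    Returns the spec's origin (normally a file path, or 'built-in' for
    built-in modules) when the module can be found on sys.path, and
    None when it cannot. Hypothetical helper, for illustration only.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # The two module names from this thread; substitute whatever you installed.
    for name in ("mechanize", "ClientForm"):
        print(name, "->", locate_module(name))
```

A result of None for ClientForm would match the symptom described above: the egg is sitting in a directory, but nothing on sys.path points at an importable module.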
I still have a download of a package called mechanoid from 3 years ago,
though I deleted it from the Python library. (I do not remember why I
was unhappy with it.) Looking at that package:

    from mechanoid.clientform import ClientForm

was the proper import statement. Presumably you'd change mechanoid to
mechanize.
The online book Dive Into Python provides good examples using urllib2.
I've been using urllib2, with minor changes, for all of my HTTP
automation.
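
For anyone who wants to try the standard-library route, here is a rough sketch of fetching a page and pulling out its links. It is not taken from Dive Into Python or from my own scripts. It assumes Python 3, where urllib2 became urllib.request; the names LinkCollector, extract_links and fetch, and the User-Agent string, are all made up for the example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return the href targets found in a string of HTML, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


def fetch(url, user_agent="example-scraper/0.1", timeout=10):
    """Download a page and decode it to text (hypothetical helper)."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Parse a literal string so the demo runs without network access.
    sample = ('<p><a href="http://dlslug.org/library.html">library</a> '
              '<a name="x">no href</a></p>')
    print(extract_links(sample))  # ['http://dlslug.org/library.html']
```

To scrape a live page, the two pieces combine as extract_links(fetch(url)). Filling in and submitting forms, which is what ClientForm is for, is not covered here.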
--
Lloyd Kvam
Venix Corp
DLSLUG/GNHLUG library
http://dlslug.org/library.html
http://www.librarything.com/catalog/dlslug
http://www.librarything.com/rsshtml/recent/dlslug
http://www.librarything.com/rss/recent/dlslug