HTML scraping in python

Lloyd Kvam python at venix.com
Thu Jun 11 12:04:18 EDT 2009


On Thu, 2009-06-11 at 08:59 -0400, Paul Lussier wrote:
> However, mechanize seems dependent upon ClientForm, and I can't figure
> out how to get the ClientForm*.egg installed.  I placed it in
> sys.path, but it's not getting picked up, I tried to manually test
> that it would work using pkg_resources and require(), but got this:
> 
>   $ python
>   Python 2.6.2 (r262:71600, Apr 16 2009, 09:17:39) 
>   [GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
>   Type "help", "copyright", "credits" or "license" for more information.
>   >>> import pkg_resources
>   Traceback (most recent call last):
>     File "<stdin>", line 1, in <module>
>   ImportError: No module named pkg_resources
> 
> When I look at sys.path, it seems as if it knows about mechanize, but
> not ClientForm, despite having copied ClientForm there:

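Eggs are unlikely to work when copied into place manually.  The
ImportError above also suggests that setuptools, which provides both
pkg_resources and easy_install, is not installed for that Python.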
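
A quick way to see what the interpreter can actually import is a
small check like the one below.  This is only a sketch: check_modules
is a name made up for illustration, and the module list is just the
three names from this thread.

```python
def check_modules(names):
    """Map each module name to True if it can be imported, else False."""
    results = {}
    for name in names:
        try:
            __import__(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    # Module names taken from this thread; adjust as needed.
    wanted = ["pkg_resources", "mechanize", "ClientForm"]
    for name, ok in sorted(check_modules(wanted).items()):
        print("%-15s %s" % (name, "ok" if ok else "MISSING"))
```

Run it with the same python binary that will run the scraper, since
each installation has its own site-packages.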

My configuration is:

        $ cat /usr/lib/python2.5/distutils/distutils.cfg
        [easy_install]

        zip_ok = False

The apache user has trouble dealing with zipped egg installations, so I
make sure that eggs are always unzipped.

Running

        easy_install mechanize

should simply do the right thing.  If it does not, you're probably
better off downloading the package and doing a distutils install from
your privileged account:

        python setup.py install

I have a package called mechanoid that I downloaded three years ago,
but later deleted from the Python library.  (I do not remember why I
was unhappy with it.)  Looking at that package:
                
                from mechanoid.clientform import ClientForm

was the proper import statement.  Presumably you'd change mechanoid to
mechanize.
        
The Dive Into Python on-line book provides good examples using urllib2.
I've been using urllib2, with minor changes to those examples, for all
of my HTTP automation.
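
As a rough sketch of that approach (not the code from the book), a
fetch with a custom User-Agent looks something like this.  The names
build_request and fetch and the "example-scraper/0.1" agent string
are made up for illustration, and the import fallback is only there
so the same lines also run on Python 3.

```python
# urllib2 on Python 2; urllib.request offers the same names on Python 3.
try:
    import urllib2
except ImportError:
    import urllib.request as urllib2


def build_request(url, user_agent="example-scraper/0.1"):
    """Return a Request for url carrying a custom User-Agent header."""
    return urllib2.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Fetch url and return the raw response body."""
    response = urllib2.urlopen(build_request(url), timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()
```

Calling fetch() with a page URL returns the raw HTML, which can then
be handed to whatever parser you prefer.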

-- 
Lloyd Kvam
Venix Corp
DLSLUG/GNHLUG library
http://dlslug.org/library.html
http://www.librarything.com/catalog/dlslug
http://www.librarything.com/rsshtml/recent/dlslug
http://www.librarything.com/rss/recent/dlslug



More information about the gnhlug-discuss mailing list