HTML scraping in python
Lloyd Kvam
python at venix.com
Thu Jun 11 08:16:51 EDT 2009
On Thu, 2009-06-11 at 07:21 -0400, Paul Lussier wrote:
> Hi Folks,
>
> I would like to extract a table from an HTML document and break it
> down to a dict for further processing.
I assume you want a dict for each row.
> I've googled around a bit and found about 4 different modules that do
> html processing, but nothing on dealing explicitly with tables
> (something like Perl's HTML::TableExtract module).
>
I have not seen a table-extract module. BeautifulSoup is a third-party
module that is usually effective in dealing with any HTML. Hopefully
the table is reasonably simple, with no colspan/rowspan attributes or
funny data mixed in.
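For a simple table like that, even the standard library can do the job.
Here is a minimal sketch that uses Python 3's html.parser.HTMLParser in place of
BeautifulSoup, so it runs without a third-party install. It assumes only the
first <table> matters, that there are no nested tables and no colspan/rowspan,
and it returns each row as a list of cell strings. The names SimpleTableParser
and extract_table are hypothetical, made up for this example.

```python
from html.parser import HTMLParser


class SimpleTableParser(HTMLParser):
    """Collect cell text from the first <table> in a document.

    Assumes a simple table: no nested tables and no colspan/rowspan.
    Omitted </td>, </th> and </tr> end tags are tolerated.
    """

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._row = None      # cells of the row currently being read
        self._cell = None     # text fragments of the cell currently being read
        self._in_table = False
        self._done = False    # set once the first table has been closed

    def _finish_cell(self):
        if self._cell is not None:
            # Collapse runs of whitespace (including newlines) to one space.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _finish_row(self):
        self._finish_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._in_table = True
        elif self._in_table and tag == "tr":
            self._finish_row()       # an unclosed previous row ends here
            self._row = []
        elif self._row is not None and tag in ("td", "th"):
            self._finish_cell()      # an unclosed previous cell ends here
            self._cell = []

    def handle_endtag(self, tag):
        if self._done or not self._in_table:
            return
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag == "tr":
            self._finish_row()
        elif tag == "table":
            self._finish_row()
            self._in_table = False
            self._done = True

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_table(html):
    """Return the first table in `html` as a list of rows of cell strings."""
    parser = SimpleTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

If BeautifulSoup is installed, the same walk can be written with its
find("table"), find_all("tr") and get_text() methods; the stdlib version is
shown only to keep the sketch dependency-free.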
Are the column headers in th tags? Can you use the headers to create
field names? (e.g. fieldname = '_'.join(head.lower().split()))
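A minimal sketch of that header-to-field-name idea, producing one dict per
row. It assumes the rows are already lists of cell strings with the headers in
the first row; the helper names field_name and rows_to_dicts and the sample
host data are hypothetical.

```python
def field_name(head):
    """Normalize a header such as 'Host Name' into 'host_name'."""
    return "_".join(head.lower().split())


def rows_to_dicts(rows):
    """Treat rows[0] as the header row and build one dict per data row.

    Assumes every data row lines up with the header row (no colspan/rowspan).
    Note that zip() silently drops cells beyond the shorter of the two lists.
    """
    if not rows:
        return []
    fields = [field_name(h) for h in rows[0]]
    return [dict(zip(fields, row)) for row in rows[1:]]


table = [
    ["Host Name", "IP Addr"],
    ["alpha", "10.0.0.1"],
    ["beta", "10.0.0.2"],
]
for record in rows_to_dicts(table):
    print(record["host_name"], record["ip_addr"])
# prints:
# alpha 10.0.0.1
# beta 10.0.0.2
```

Using zip() keeps it short; if ragged rows are possible, checking
len(row) == len(fields) first would catch them instead of silently
truncating.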
I've got to run (a funeral), but am happy to help. I'll check my email
when I get back.
> Can someone more knowledgeable please point me in the right direction?
>
> --
> Thanks,
> Paul
>
> P.S. I'm also looking for a job if anyone knows of anything, or needs
> a sysadmin with great perl skills and growing python experience :)
Good Luck!
> _______________________________________________
> gnhlug-discuss mailing list
> gnhlug-discuss at mail.gnhlug.org
> http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/
--
Lloyd Kvam
Venix Corp
DLSLUG/GNHLUG library
http://dlslug.org/library.html
http://www.librarything.com/catalog/dlslug
http://www.librarything.com/rsshtml/recent/dlslug
http://www.librarything.com/rss/recent/dlslug