Searching a site - including in PDFs and (ugh) DOCs?
Larry Cook
lcook at sybase.com
Fri Jun 4 09:07:01 EDT 2004
Bill Sconce wrote:
> Does anyone know of a package which can provide a search capability
> for a Web site - including searching in PDF and .DOC files?
Back in April I did a quick internet search for search engines. Lucene
(http://jakarta.apache.org/lucene/docs/index.html) seems to be the most
advanced and the most active. A good set of contributed converters,
including ones for PDF and DOC, is available; see the jGuru FAQ. ht://Dig
(http://www.htdig.org/) also seems pretty popular. It too relies on
converters, and it sounds like there are ones available for PDF and DOC;
see FAQ questions 4.8 and 4.9.
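To make the "converter" idea concrete, here is a minimal sketch of the general approach: shell out to an external tool that turns each PDF or DOC into plain text, then feed that text into an ordinary word index. This is an illustration under assumptions, not how Lucene or ht://Dig actually do it. It assumes the command-line tools pdftotext (from xpdf/poppler) and antiword are installed, and the function names (extract_text, build_index, search, index_site) are hypothetical.

```python
import re
import subprocess
from collections import defaultdict
from pathlib import Path

# Assumed external converters; each writes the extracted plain text to stdout.
# pdftotext treats "-" as the output file to mean stdout; antiword prints to stdout.
CONVERTERS = {
    ".pdf": lambda p: ["pdftotext", str(p), "-"],
    ".doc": lambda p: ["antiword", str(p)],
}
# Files read directly as text (a real indexer would also strip HTML tags).
TEXT_SUFFIXES = {".html", ".htm", ".txt"}


def extract_text(path: Path) -> str:
    """Return a file's plain text, running an external converter if one is registered."""
    make_cmd = CONVERTERS.get(path.suffix.lower())
    if make_cmd is None:
        return path.read_text(errors="ignore")
    result = subprocess.run(make_cmd(path), capture_output=True, text=True, check=True)
    return result.stdout


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: each word maps to the set of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every word of the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


def index_site(root: str) -> dict[str, set[str]]:
    """Walk a site's files, convert the ones we understand, and index them."""
    docs = {}
    for path in Path(root).rglob("*"):
        suffix = path.suffix.lower()
        if path.is_file() and (suffix in CONVERTERS or suffix in TEXT_SUFFIXES):
            docs[str(path)] = extract_text(path)
    return build_index(docs)
```

Usage would be something like `search(index_site("/var/www/html"), "annual report")`. Keeping the converters in a table keyed by file extension is what lets a package support new formats just by plugging in another external tool.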
Here is the list of what I found:
http://jakarta.apache.org/lucene/docs/index.html
http://www.htdig.org/
http://openfts.sourceforge.net/
http://www.egothor.org/
http://www.twmacinta.com/bddbot/
http://mg4j.dsi.unimi.it/
http://exist-db.org/
http://search.jxta.org/
http://xqengine.sourceforge.net/
http://search.mnogo.ru/
http://www.javaforu.com/start.htm (SearchAssist - no direct link)
http://www.me.lv/jse/
http://dev.mysql.com/doc/mysql/en/Fulltext_Search.html
http://www.aspseek.org/
http://findmaan.sourceforge.net/
http://harvest.sourceforge.net/
http://linksearch.sourceforge.net/
http://www.perlfect.com/freescripts/search/
http://swish-e.org/
http://www.wrensoft.com/zoom/index.html
http://www.nutch.org/docs/en/
http://www.etymon.com/tr.html
And here are some additional info sites:
http://www.searchtools.com/tools/tools-opensource.html
http://www.searchtools.com/
http://www.hesketh.com/publications/finding_the_right_search_engine.html
http://zez.org/article/articleview/83/
http://www.weberdev.com/ViewArticle.php3?ArticleID=245
http://www.zend.com/zend/tut/tutorial-ferrara1.php
Larry
More information about the gnhlug-discuss mailing list