September 1, 2004

Swish-e open xml indexing

http://swish-e.org/

Swish-e is an open source indexer/search engine. It excels at indexing
(X)HTML files, but indexes plain text and XML files almost as easily.
It comes with C, PHP, and Perl APIs, and it runs under Unix as
well as Windows operating systems.

I am/will be using swish-e as the underlying indexer for searches
against TEI documents. Specifically, I have been marking up sets of
literature in TEI. I then convert the sets into a number of formats
such as plain text, XHTML, PDF, various Palm flavors, etc. I then use
swish-e to index the XHTML because swish-e makes it easy to pull
out the meta tags of HTML head elements and make them field searchable,
while the body of the text remains free-text searchable. I could
have almost as easily indexed the raw TEI files, but then I would have to
deal with transforming the XML before it gets to the browser. ("I know.
There are many ways to do that.") See:

http://infomotions.com/alex2/
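
To make the field-searchable versus free-text distinction concrete, here is a minimal sketch in Python. It is not swish-e and does not use the swish-e API; it only imitates the idea with the standard library. The names MetaAndBodyExtractor, index_document, and matches are hypothetical, and the sample XHTML is made up for illustration.

```python
import re
from html.parser import HTMLParser


class MetaAndBodyExtractor(HTMLParser):
    """Collect <meta name="..." content="..."> pairs and the words of <body>."""

    def __init__(self):
        super().__init__()
        self.fields = {}       # meta name -> content (the "fields")
        self.body_words = []   # lowercased words found inside <body>
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name"):
            self.fields[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "body":
            self._in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._in_body:
            self.body_words.extend(re.findall(r"\w+", data.lower()))


def index_document(xhtml):
    """Return (fields, body_words) for one XHTML document."""
    parser = MetaAndBodyExtractor()
    parser.feed(xhtml)
    parser.close()
    return parser.fields, parser.body_words


def matches(doc, term, field=None):
    """Field search if `field` is given, otherwise free-text search of the body."""
    fields, words = doc
    term = term.lower()
    if field is not None:
        return term in fields.get(field, "").lower()
    return term in words


# Made-up sample document, standing in for XHTML generated from TEI.
SAMPLE = """<html><head><title>Walden</title>
<meta name="author" content="Henry David Thoreau" />
<meta name="title" content="Walden" />
</head><body><p>I went to the woods because I wished to live
deliberately.</p></body></html>"""

doc = index_document(SAMPLE)
```

With this sketch, matches(doc, "thoreau", field="author") is a field search against the meta tags, while matches(doc, "woods") is a free-text search against the body. A real indexer such as swish-e builds an on-disk index rather than scanning a word list, so treat this only as an illustration of the two kinds of query.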

I have also been fiddling with Plucene, a Perl port of Lucene, a
Java-based indexer/search engine library:

http://search.cpan.org/dist/Plucene/

Lucene and Plucene are libraries only, whereas swish-e is an
indexer/search engine binary as well as a library.

Posted by hag at September 1, 2004 9:25 AM
