How does the search engine fit into all of this?
Right now, our search engine is a full-text engine. It goes out to every page in the site (except for ones with URLs that have funky characters like ? in them -- including, for instance, the entire store and most of the /education site) and indexes the words on the page from top to bottom.
We know our search engine stinks. It stinks because:
- The top of most of our pages is the pulldown menu, so that's what the search engine ends up indexing. (Try this AltaVista search for "lectures at nationalgeographic.com")
- A full-text search engine is extremely poor at categorizing content, especially by things like subject area.
- We're sloppy about keeping our tags current.
- We don't handle our most common searches well. Why doesn't a search for "photography" just go straight to our photography channel?
- Website searchers think they're searching the magazine archive, and we're inexcusably awful at explaining to users what the difference is.
- We're actually running at least five different search engines, each of which works differently and none of which talks to the others:
  - the www site
  - store products
  - the news site
  - the education site (lesson plan index, products)
  - the publications index
The ideal search engine for us, in my humble opinion, would:
- be primarily indexed by subject areas and human-generated descriptions, and secondarily on full text
- serve disparate content types in one list, including publications, articles, products, features, sections, photo galleries, outside links, etc. -- and would appropriately label and sort each
- jump straight to the appropriate destination if there were only one logical choice (e.g., a search for "NGM")
- require as little editorial maintenance as possible
This would suggest that tight integration with our asset database would be appropriate for a search engine. Maybe our "search engine" is actually a database-backed application that our developers create, that uses a full-text search engine as a backup (a la Yahoo!, which searches Yahoo's index first, then defaults to Inktomi's full-text engine).
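To make that concrete, here is a rough sketch in Python of the lookup order I have in mind: check the subject-indexed asset records first, jump straight to a destination when there is one obvious choice, and fall back to full text only when the asset index comes up empty. Everything in it is made up for illustration (the Asset record, the sample data, the URLs, the function names), and the in-memory lists stand in for a real asset database and a real full-text engine.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """Hypothetical asset-database record; field names are assumptions."""
    title: str
    kind: str            # content type label, e.g. "channel", "product"
    url: str
    subjects: tuple      # human-assigned subject terms
    description: str = ""


# Hypothetical sample records standing in for the asset database.
ASSETS = [
    Asset("Photography", "channel", "/photography/", ("photography",)),
    Asset("Photo Field Guide", "product", "/store/field-guide",
          ("photography", "books")),
    Asset("National Geographic Magazine", "publication", "/ngm/", ("ngm",)),
    Asset("Map Machine", "feature", "/maps/machine", ("maps",)),
    Asset("World Atlas", "product", "/store/atlas", ("maps", "books")),
]

# Hypothetical stand-in for the full-text index: URL -> page text.
PAGES = {"/lectures/": "Upcoming lectures and live events"}


def full_text_search(query):
    """Backup search: naive substring match over page text."""
    q = query.lower()
    return [Asset(url, "page", url, ())
            for url, text in PAGES.items() if q in text.lower()]


def search(query):
    """Return ("redirect", url) or ("results", [Asset, ...])."""
    q = query.strip().lower()
    if not q:
        return ("results", [])

    # 1. Subject terms and human-written descriptions come first.
    hits = [a for a in ASSETS
            if q in a.subjects or q in a.description.lower()]

    # 2. One hit, or one hit whose title is exactly the query:
    #    treat it as the only logical choice and jump straight there.
    exact = [a for a in hits if a.title.lower() == q]
    if len(hits) == 1 or len(exact) == 1:
        return ("redirect", (exact or hits)[0].url)

    # 3. Several hits: one mixed list, grouped by content type.
    if hits:
        return ("results", sorted(hits, key=lambda a: (a.kind, a.title)))

    # 4. Nothing in the asset index: fall back to full text.
    return ("results", full_text_search(q))
```

With this sample data, search("photography") and search("NGM") both come back as redirects, search("maps") returns a labeled list of a feature and a product, and search("lectures") falls through to the full-text backup. The only editorial work is assigning subject terms to each asset, which we would want in the asset database anyway.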
-- Anonymous, January 12, 2000