Re: hapax legomena-driven retrieval

Andrea Gallagher (drea@alumni.stanford.org)
Thu, 24 Jul 1997 11:29:46 -0700


At 11:02 AM 7/24/97 -0400, Carl Feynman wrote:
>I found this document by recalling that I had once heard of such a case,
>years ago, and the organism's name was raphanobrassica. That made it easy
>to find this document through Alta Vista. Perhaps in the age of search
>engines, the best way to learn information is to associate a few bizarre
>buzzwords with any topic, just as I associated 'speciation through
>hybridization' with 'raphanobrassica'. Then it will be easy to pull up the
>actual information when you need it.
>
>This might be called the 'hapax legomena strategy'. (A hapax legomenon is a
>word or phrase that occurs only once in a given corpus.)

Aarrrgh! It's true. I have also been doing this, which explains why I can
only find out information about topics I already know. For those areas I'm
not familiar with, bizarre buzzwords never set in my memory. I don't even
think I read them: "...crossed the radish, Rmpphmmm smmtmvm, with the
cabbage, Brmmphmm olmmlmmm, trying to produce...".

I suppose it's an improvement that you need to be a domain expert to search
effectively, instead of having to be a search system expert. It's
interesting that we're seeing a proliferation of guide services (The Mining
Company, Netguide, Excite & Yahoo, the Subject Clearinghouse), where you
only need to find one site on a topic to learn the basics & the buzzwords.
A nice split in information access methods between experts and novices.

-Drea

("hapax legomena", "hapax legomena", got to practice!!)