Filtech - new filtering software for mailing lists

From: Peter C. McCluskey (pcm@rahul.net)
Date: Tue Jun 29 1999 - 19:05:40 MDT


 It's available at http://www.rahul.net/pcm/filter/index.html.

 A few months ago Robin Hanson mentioned to me that lots of people
were probably saving messages that they thought valuable from mailing
lists, and that this could potentially provide valuable information
about the quality of those posts.

 This project is an attempt to turn that idea into software which
will provide some sort of collaborative filtering for mailing lists
and similar discussion media. (It should be of some use even if
used in a noncollaborative way; I certainly intend to filter my
Extropians list mail through it if I continue to find time to read
the list.)

 A big obstacle to collaborative filtering attempts that I've heard of
is that they require an investment of time by the people who could be
the source of the information needed for the filtering (either the time
it takes to enter the information into the system, or the time to switch
to a new mail reader that automatically deduces it from the user's
behavior), but don't provide any reward for that effort.

 I've therefore chosen an approach which should minimize the time required:
all the person who is already saving a "best of X" mail folder should need
to do is to ensure that that folder is located on or periodically gets
copied to a web-visible location. All the remaining work can be done by
those who want the benefits of filtering.
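
 As a rough illustration of the work left to the filtering side (this is
not the actual Filtech code), the sketch below reads one saved folder and
pulls out the Message-ID and From address of each message. It assumes the
folder is in mbox format and has already been fetched to a local file; the
function name saved_senders is hypothetical.

```python
import mailbox
from email.utils import parseaddr


def saved_senders(mbox_path):
    """Return a list of (message_id, from_address) pairs for a saved folder.

    Assumes the folder is an mbox file on local disk. Addresses are
    lowercased so the same sender is counted consistently.
    """
    pairs = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # parseaddr splits "Name <addr>" into (name, addr); keep addr only.
            addr = parseaddr(str(msg.get("From", "")))[1].lower()
            msg_id = str(msg.get("Message-ID", "")).strip()
            pairs.append((msg_id, addr))
    finally:
        box.close()
    return pairs
```

 The Message-IDs make it possible to match saved messages against the
public archive; the addresses feed whatever author rating is used.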

 If a discussion group can get several people making such folders available,
and also has a public archive with the complete set of messages, it is
easy to automate a rating system that provides some information about how
popular authors are, and not too hard to enable readers to filter out some
of the least valuable messages. I've come up with a crude algorithm that
evaluates messages by a combination of email address and subject line
evaluations.
 It evaluates email addresses approximately by what percent of messages
I've saved for which that address appeared in the From line. For addresses
that have posted few messages, it is kludged up to be closer to 50 than
that percent would indicate (I don't want to ignore posts from new people
unless I'm so overwhelmed that I'm only reading messages from people who
are reliably interesting).
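
 One way to sketch that kind of address score (an assumption, not the
actual Filtech formula) is to take the percent of an address's posts that
were saved and blend it with a neutral 50 using a few imaginary "prior"
posts, so that addresses with little history stay near 50. The names
address_scores, prior_weight and prior_score are hypothetical, and the
reading of "percent" as a per-address save rate is my interpretation.

```python
from collections import Counter


def address_scores(all_senders, saved_senders, prior_weight=5.0, prior_score=50.0):
    """Return {address: score between 0 and 100}.

    all_senders:   From addresses of every message in the full archive.
    saved_senders: From addresses of the messages that were saved.
    prior_weight:  number of imaginary posts scored at prior_score
                   (assumed value; larger pulls sparse posters harder to 50).
    """
    posted = Counter(a.lower() for a in all_senders)
    saved = Counter(a.lower() for a in saved_senders)
    scores = {}
    for addr, n_posted in posted.items():
        # Never count more saved messages than were actually posted.
        n_saved = min(saved[addr], n_posted)
        scores[addr] = (100.0 * n_saved + prior_score * prior_weight) / (
            n_posted + prior_weight
        )
    return scores
```

 With these defaults, an address with 10 posts, all saved, scores about 83,
while an address whose single post was saved scores about 58 rather than 100.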
 I haven't made much use of the Subject line yet, as I'm often slow enough
at saving messages that a thread is often half done before I get around to
providing the info needed to indicate whether it interests me.

 I mentioned to Robin what I was working on, and he said he wouldn't
find it of much use because he always saved messages that were replies
to a message of his, regardless of the message quality. I've looked at
my habits, and I've been doing something moderately close to that, but
the topics I deal with differ enough from those of the people whom I'm
least interested in hearing from that it probably doesn't have a big
effect.

-- 
------------------------------------------------------------------------
Peter McCluskey          | Critmail (http://crit.org/critmail.html):
http://www.rahul.net/pcm | Accept nothing less to archive your mailing list

