From: Eugen Leitl (eugen@leitl.org)
Date: Thu Apr 11 2002 - 00:28:39 MDT
On Wed, 10 Apr 2002, Robert J. Bradbury wrote:
> I suspect however that one could develop an adaptive algorithm.
> I've got 1.5 million lines of SPAM and 0.7 million lines of personal
> email stored here (from the last couple of years). I find it
> difficult to believe that there is not an algorithm that could
> come up with 2-3-4 word phrases indicative of "SPAM" vs. real email.

Robert, it's been done: SpamAssassin (http://spamassassin.taint.org/)
works very well.
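
For comparison, the phrase-based approach Robert describes could be sketched roughly as below. This is a toy illustration only, not how SpamAssassin works: it simply lists 2- to 4-word phrases that recur in a spam corpus and never appear in personal mail. The function names `phrases` and `indicative_phrases`, the `min_count` cutoff, and the three sample messages are all made up for the example.

```python
from collections import Counter

def phrases(text, sizes=(2, 3, 4)):
    """Yield every run of 2, 3 or 4 consecutive words, lowercased."""
    words = text.lower().split()
    for n in sizes:
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])

def indicative_phrases(spam_docs, ham_docs, min_count=2):
    """Phrases seen at least min_count times in spam and never in ham."""
    spam = Counter(p for doc in spam_docs for p in phrases(doc))
    ham = Counter(p for doc in ham_docs for p in phrases(doc))
    return sorted(p for p, c in spam.items() if c >= min_count and ham[p] == 0)

# Tiny made-up corpora; real ones would be the stored mail archives.
spam_docs = ["click here now to win", "please click here now"]
ham_docs = ["meet me here now"]
print(indicative_phrases(spam_docs, ham_docs))
```

A real adaptive filter would weight phrases by how strongly they favour one corpus over the other rather than requiring zero occurrences in personal mail.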
The spam-identification tactics used include:

header analysis: spammers use a number of tricks to mask their
identities, fool you into thinking they've sent a valid mail, or fool
you into thinking you must have subscribed at some stage. SpamAssassin
tries to spot these.

text analysis: again, spam mails often have a characteristic style (to
put it politely), and some characteristic disclaimers and CYA text.
SpamAssassin can spot these, too.

blacklists: SpamAssassin supports many useful existing blacklists, such
as mail-abuse.org, ordb.org or others.

Razor: Vipul's Razor is a collaborative spam-tracking database, which
works by taking a signature of spam messages. Since spam typically
operates by sending an identical message to hundreds of people, Razor
short-circuits this by allowing the first person to receive a spam to
add it to the database -- at which point everyone else will
automatically block it.
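
The Razor idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Vipul's Razor's actual protocol or signature scheme: it assumes a SHA-1 digest over the body with case and whitespace normalized, and the in-memory set `reported` stands in for the shared network database. The names `signature`, `report` and `is_known_spam` are hypothetical.

```python
import hashlib

# Stand-in for the collaborative database (assumption: in practice this
# would be a network service shared by all users, not a local set).
reported = set()

def signature(body: str) -> str:
    """Digest of the message body with case and whitespace normalized."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def report(body: str) -> None:
    """The first recipient of a spam adds its signature to the database."""
    reported.add(signature(body))

def is_known_spam(body: str) -> bool:
    """Later recipients check their own copy against the database."""
    return signature(body) in reported

report("Make MONEY fast!!!\n  Click here.")
# Same text with different case and layout still matches.
print(is_known_spam("make money fast!!! click here."))
print(is_known_spam("Lunch on Friday?"))
```

An exact digest like this only catches copies that are identical after normalization, which is the "identical message to hundreds of people" case the description relies on.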

Once identified, the mail can optionally be tagged as spam for later
filtering by the user's own mail user agent.
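
Tagging of this kind might look roughly like the sketch below, which adds headers that a mail user agent rule can then match on. The header names and value format, the function name `tag_as_spam`, and the default threshold of 5.0 are assumptions made for illustration; check the SpamAssassin documentation for what it actually emits.

```python
from email import message_from_string

def tag_as_spam(raw: str, score: float, threshold: float = 5.0) -> str:
    """Return the message text, with spam headers added if score >= threshold."""
    msg = message_from_string(raw)
    if score >= threshold:
        # Assumed header names; the MUA filters on these later.
        msg["X-Spam-Flag"] = "YES"
        msg["X-Spam-Status"] = f"Yes, hits={score:.1f} required={threshold:.1f}"
    return msg.as_string()

raw = "From: a@example.com\nSubject: Hello\n\nBody text\n"
print(tag_as_spam(raw, score=7.3))
```

The message itself is delivered unchanged apart from the extra headers, so the decision to file or delete it stays with the user's own mail client.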

SpamAssassin requires very little configuration; you do not need to
continually update it with details of your mail accounts, mailing list
memberships, etc. It accomplishes filtering without this knowledge, as
much as possible.

The distribution provides a command line tool to perform filtering,
along with Mail::SpamAssassin, a set of Perl modules which implement a
Mail::Audit plugin, allowing SpamAssassin to be used in a Mail::Audit
filter, or (possibly at some point) in a spam-protection proxy POP/IMAP
server.

SpamAssassin lives at http://spamassassin.org/ or in CPAN, and is
distributed under Perl's Artistic license.

Features

Wide-spectrum: SpamAssassin uses a wide variety of local and network
tests to identify spam signatures. This makes it harder for spammers to
identify one aspect which they can craft their messages to work around.

Free software: it is distributed under the same terms and conditions as
Perl itself.

Easy to extend: Rules, weights and user-visible text are stored in text
configuration files as much as possible, which the user (or sysadmin)
can edit to modify or add new rules.

Flexible: SpamAssassin encapsulates its logic in a well-designed,
abstract API. As a result, it's not limited to the traditional
local-delivery-to-spool case; using the Mail::SpamAssassin classes, it
can be used in a wide variety of setups. This means that SpamAssassin
support is available for a variety of mail systems -- traditional
procmail, a Mail::Audit plugin, qmail, MIMEDefang, Postfix, and many
others.
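
The combination of many weighted tests with rules kept in editable text can be illustrated with a small sketch. The rule file format here (`name weight area regex`) is invented for the example and is NOT SpamAssassin's real configuration syntax; the rule names, weights, and the functions `load_rules` and `score` are likewise hypothetical.

```python
import re

# Hypothetical rule format (not SpamAssassin's real syntax):
#   <name> <weight> <header|body> <regular expression>
RULES_TEXT = r"""
# lines starting with '#' are comments
SUBJ_ALL_CAPS    1.5  header  ^Subject: [^a-z]*$
REMOVE_ME        2.0  body    to be removed from (this|our) list
MILLION_DOLLARS  2.5  body    \$\d+,\d{3},\d{3}
"""

def load_rules(text):
    """Parse the rule text into (name, weight, area, compiled regex) tuples."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, weight, area, pattern = line.split(None, 3)
        # Header rules match line by line; body rules ignore case.
        flags = re.MULTILINE if area == "header" else re.IGNORECASE
        rules.append((name, float(weight), area, re.compile(pattern, flags)))
    return rules

def score(message, rules):
    """Return (total weight, names of rules that matched) for a raw message."""
    headers, _, body = message.partition("\n\n")
    parts = {"header": headers, "body": body}
    hits = [(name, weight) for name, weight, area, rx in rules
            if rx.search(parts[area])]
    return sum(w for _, w in hits), [n for n, _ in hits]

rules = load_rules(RULES_TEXT)
msg = ("From: x@example.com\nSubject: FREE CASH NOW\n\n"
       "Win $1,000,000 today. Reply to be removed from our list.\n")
total, names = score(msg, rules)
print(total, names)
```

Because no single rule decides the outcome, a spammer who rewords one phrase only removes that rule's weight from the total, and adding a rule is a one-line edit to the text file.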