Re: Hey, a sunshine-y morning with no spam

From: Mike Lorrey (mlorrey@datamann.com)
Date: Wed Apr 10 2002 - 19:06:23 MDT


"Robert J. Bradbury" wrote:
>
> On Wed, 10 Apr 2002, Dave Sill wrote:
>
> > Recognizing spam is practically impossible. Any heuristic that's
> > easily implemented is also easily worked-around by spammers once
> > people start using it.
>
> I suspect however that one could develop an adaptive algorithm.
> I've got 1.5 million lines of SPAM and 0.7 million lines of personal
> email stored here (from the last couple of years). I find it
> difficult to believe that there is not an algorithm that could
> come up with 2-3-4 word phrases indicative of "SPAM" vs. real
> email. Ok, so "young girls" mutates into "immature women" mutates
> into "nubile youngthangs".

Yes, the same algorithms that are used to spot non-randomness in
encrypted messages can be adapted to this purpose: sort spam into
categories (primarily by product sold) and analyse each category for
word-frequency patterns, which serve as fingerprints of how spam
messages differ semantically from normal email text.
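
A rough sketch of what such a word-frequency fingerprint might look
like, using the 2-4 word phrases suggested above. This is only an
illustration under simple assumptions, not a tested filter: the function
names (phrases, train, spam_score), the add-one smoothing, and the
log-ratio weighting are all choices made for this example, and a real
filter would need far larger corpora and a tuned threshold.

```python
import math
import re
from collections import Counter


def phrases(text, sizes=(2, 3, 4)):
    """Yield every 2-, 3-, and 4-word phrase in a message (lowercased)."""
    words = re.findall(r"[a-z']+", text.lower())
    for n in sizes:
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def phrase_counts(messages):
    """Count how many messages contain each phrase (once per message)."""
    counts = Counter()
    for msg in messages:
        counts.update(set(phrases(msg)))
    return counts


def train(spam, ham):
    """Return a log-ratio weight per phrase: > 0 leans spam, < 0 leans ham.

    Assumption for this sketch: add-one smoothing, so a phrase seen in
    only one corpus still gets a finite weight.
    """
    spam_counts = phrase_counts(spam)
    ham_counts = phrase_counts(ham)
    weights = {}
    for p in set(spam_counts) | set(ham_counts):
        p_spam = (spam_counts[p] + 1) / (len(spam) + 2)
        p_ham = (ham_counts[p] + 1) / (len(ham) + 2)
        weights[p] = math.log(p_spam / p_ham)
    return weights


def spam_score(weights, message):
    """Sum the weights of known phrases; unseen phrases contribute 0."""
    return sum(weights.get(p, 0.0) for p in set(phrases(message)))
```

Retraining on fresh mail is what would make this adaptive: when a
phrase mutates, the new wording picks up its own weight once enough
examples of it land in the spam pile.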



This archive was generated by hypermail 2.1.5 : Sat Nov 02 2002 - 09:13:24 MST