From: Harvey Newstrom (mail@HarveyNewstrom.com)
Date: Wed Sep 19 2001 - 18:06:26 MDT
Robert Coyote wrote,
> But is it still possible to detect steganographic messages if the amount
> of information encoded is very small, compared to the amount of
> background?
Remember that standard graphics formats such as JPEG and GIF use
industry-standard mathematical algorithms to compress data and smooth away
insignificant variations (JPEG by lossy transform coding, GIF by quantizing
colors down to a limited palette). It is easy to run these same algorithms
forward to predict how color should have been smoothed and to detect bits
that deviate from the expected values. The whole theory behind color
smoothing is that any given bit might be random noise that should be
overridden by calculations based on its neighbors. The same calculations
can be turned around to flag the bits that have unexpected values.
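Here is a minimal sketch of the idea in Python. The mean-of-four-neighbors
predictor and the half-shade threshold are illustrative stand-ins for a real
codec's smoothing model, not any actual GIF or JPEG internals:

    import numpy as np

    def neighbor_prediction_residuals(img):
        # Predict each interior pixel as the mean of its four neighbors
        # and return the absolute prediction error per pixel.
        f = img.astype(float)
        predicted = (f[:-2, 1:-1] + f[2:, 1:-1] +
                     f[1:-1, :-2] + f[1:-1, 2:]) / 4.0
        return np.abs(f[1:-1, 1:-1] - predicted)

    # Toy example: a smooth gradient with one hidden bit flipped.
    img = np.tile(np.arange(64, dtype=np.uint8), (64, 1))
    img[30, 30] ^= 1  # perturb the least significant bit

    residuals = neighbor_prediction_residuals(img)
    # Flag anything that misses the prediction by more than half a shade.
    print(np.argwhere(residuals > 0.5) + 1)  # -> [[30 30]]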
For nonstandard algorithms, statistical analysis can reverse-engineer the
smoothing scheme. A wrong candidate algorithm fails to predict the encoding
of most pixels. When the correct algorithm is tried, it instantly predicts
99.999% of the pixels, leaving the other 0.001% exposed as hidden data.
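A sketch of that search in Python. The candidate predictors here are
borrowed from PNG-style filters purely for illustration; a real analysis
would try a much larger family:

    import numpy as np

    def candidates(img):
        # Predict each interior pixel three different ways.
        f = img.astype(int)
        actual = f[1:, 1:]
        return actual, {
            "left": f[1:, :-1],                      # copy of the left pixel
            "up":   f[:-1, 1:],                      # copy of the pixel above
            "avg":  (f[1:, :-1] + f[:-1, 1:]) // 2,  # average of left and up
        }

    # Toy example: every row repeats the one above, plus one hidden bit.
    rng = np.random.default_rng(0)
    img = np.tile(rng.integers(0, 256, size=64), (64, 1)).astype(np.uint8)
    img[10, 10] ^= 1

    actual, preds = candidates(img)
    for name, pred in preds.items():
        print(name, np.mean(actual == pred))  # 'up' explains ~99.95% of pixels
    print(np.argwhere(actual != preds["up"]) + 1)  # the misses expose the data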
Even if no predictable graphical coding algorithm can be deduced,
statistical analysis of an unknown algorithm would still probably identify
the alien bits. A digital picture capable of defining millions of color
shades might only use a few hundred thousand of them. When counted up,
99.999% of the pixels might use the top few hundred thousand popular shades,
while 0.001% of the pixels define subtle off-shades that did not occur
naturally in the original picture. As the ratio of real picture to hidden
data increases, it only becomes easier for statistical analysis to detect
alien shades introduced into the picture.
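A sketch of that count in Python, assuming a grayscale image for simplicity.
The minimum-count threshold is illustrative; a real analysis would calibrate
it against clean images from the same source:

    import numpy as np

    def rare_shade_pixels(img, min_count=3):
        # Count how often each exact shade occurs, then flag every pixel
        # whose shade is too rare to have come from the scene itself.
        shades, counts = np.unique(img, return_counts=True)
        rare = shades[counts < min_count]
        return np.argwhere(np.isin(img, rare))

    # Toy example: a picture drawn from a few popular shades, plus one
    # subtle off-shade that never occurs naturally.
    rng = np.random.default_rng(1)
    palette = np.array([40, 80, 120, 160], dtype=np.uint8)
    img = rng.choice(palette, size=(64, 64))
    img[20, 20] = 81
    print(rare_shade_pixels(img))  # -> [[20 20]]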
For a camera lens, there is a certain amount of blurring between pixels and
between shades. Any pixel that deviates from its neighbors faster than the
blurring seen across 99.999% of the picture can be identified as being too
sharp for the level of blurring.
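In Python, with a smooth sinusoidal "scene" standing in for a real blurred
photograph, and the 99.99th percentile standing in for the 99.999% figure
above:

    import numpy as np

    # A smooth, blur-like scene: adjacent pixels change by at most ~5 shades.
    y, x = np.mgrid[0:128, 0:128]
    img = 128 + 50 * np.sin(x / 10.0) * np.sin(y / 10.0)
    img[60, 60] += 40  # an embedded value that changes far too sharply

    # Horizontal pixel-to-pixel change; a lens cannot produce a sudden jump.
    steps = np.abs(np.diff(img, axis=1))
    cutoff = np.percentile(steps, 99.99)
    print(np.argwhere(steps > cutoff))  # -> the two steps at the planted pixel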
The picture itself can provide clues as well. A flat area of one color can
easily be scanned for a few rare bits of another shade. Any pixel that
deviates from the narrow range of color values in its local area would be
identified as containing alien data. Likewise, the rate of change from one
shade to another caused by lighting can be mathematically calculated. Any
pixel whose rate of change differs from that of the pixels surrounding it
would be a data pixel.
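A sketch of the flat-area check in Python, again assuming a grayscale
array; the two-shade tolerance is an illustrative allowance for sensor
noise:

    import numpy as np

    def local_range_outliers(img, tol=2.0):
        # Stack the eight neighbors of every interior pixel, then flag
        # any pixel outside its neighbors' range plus a small tolerance.
        f = img.astype(float)
        neigh = np.stack([f[:-2, :-2], f[:-2, 1:-1], f[:-2, 2:],
                          f[1:-1, :-2],              f[1:-1, 2:],
                          f[2:, :-2],  f[2:, 1:-1],  f[2:, 2:]])
        center = f[1:-1, 1:-1]
        outside = ((center < neigh.min(axis=0) - tol) |
                   (center > neigh.max(axis=0) + tol))
        return np.argwhere(outside) + 1  # back to full-image coordinates

    # Toy example: a flat grey wall with one off-shade pixel hidden in it.
    img = np.full((64, 64), 200, dtype=np.uint8)
    img[33, 17] = 205
    print(local_range_outliers(img))  # -> [[33 17]]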
Pictures are simply too predictable: in their contents, in the technology
that records them, and in the encoding algorithm by which they are expressed.
In a way, all this interrelated information almost provides a
self-correcting checksum where each pixel's value can be double-checked by
its neighbors. Any pixel that is modified can be detected as falling
outside a range, having too much precision, or being statistically out of
place when compared to other pixels.
-- Harvey Newstrom <http://HarveyNewstrom.com> <http://Newstaff.com>