Re: Transparency and IP

From: Michael S. Lorrey (retroman@turbont.net)
Date: Wed Sep 13 2000 - 11:51:51 MDT


Samantha Atkins wrote:
>
> David Lubkin wrote:
> >
> > On 9/12/00, at 3:45 PM, hal@finney.org wrote:
> >
> > >Whether or not this was truly a concern of David Brin several years ago,
> > >it has apparently become an issue for the industry as a whole. Still it
> > >seems that novelists have several years breathing room before they have
> > >to worry. It is hardly practical today to unbind, scan and text-convert
> > >books into electronic form.
> >
> > Au contraire. First, from a technical standpoint, it's trivial. Chop
> > the spine off a book. Put the pages on a scanner with a document feeder.
> > Use a batch-mode OCR program. Anyone can do it for a few hundred dollars.
> >
>
> Somehow I doubt you actually tried this. The results are not very good
> at all. You would have to position the pages well (no mean trick as
> they aren't exactly 8-1/2 x 11 usually), flip the stack of pages over to
> scan both sides, have the OCR or post processing paste the results
> together in a continuous narrative, post process a lot more with a
> really good dictionary/grammar program to try to fix the 10% or so
> minimum OCR errors likely from the process thus far and still have a
> pretty major editing job to make the results really good.
>
> If you know better I would very much like to know how to improve on
> this. I've been wanting my library online for many years now.

Actually, I've done this numerous times on a Xerox DocuTech system. It rips
through 'de-spined' books fast, and if you are smart you can 'de-rip' the
resulting file on the DocuTech back to the network, then do a PS to HTML
conversion, and voila.

You can also buy scanners that have feeders that are adjustable. I have one here
at Datamann that can scan 21 pages per minute at 600 dpi and produces scans that
can be very easily OCR'd.
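
For a feeder that scans only one side per pass, the flip-the-stack step
raised above comes down to reordering two batches of page images. Here is a
minimal sketch of that merge. It assumes the feeder pulls from the top of
the stack in both passes, so flipping the whole stack delivers the back
sides last sheet first. The function name and the file names in the usage
note are hypothetical, not part of any scanner's software.

```python
def interleave_simplex_scans(fronts, backs_reversed):
    """Merge two single-sided scan passes into reading order.

    fronts:         front sides in the order scanned (pages 1, 3, 5, ...)
    backs_reversed: back sides in the order scanned after flipping the
                    whole stack (assumed: last sheet first, i.e. ..., 4, 2)
    """
    if len(fronts) != len(backs_reversed):
        raise ValueError("both passes must contain the same number of sheets")

    # Undo the reversal caused by flipping the stack.
    backs = list(reversed(backs_reversed))

    merged = []
    for front, back in zip(fronts, backs):
        merged.append(front)
        merged.append(back)
    return merged
```

Called with fronts ["p1.tif", "p3.tif", "p5.tif"] and backs
["p6.tif", "p4.tif", "p2.tif"], it returns the six files in page order,
ready to hand to a batch OCR run.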

The only errors you really get with these processes come from older books set
in heavy, closely kerned serif type, or from faded type.
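
For the errors that do slip through, the dictionary pass mentioned earlier
in the thread can be roughed out with nothing but the Python standard
library. This is only a sketch under stated assumptions: the word list is
supplied by the caller, the function name and the 0.6 similarity cutoff are
illustrative choices, and it only flags suspect words for a human to review
rather than correcting anything on its own.

```python
import difflib
import re


def flag_ocr_suspects(text, known_words, cutoff=0.6):
    """Return (offset, word, suggestions) for each word not in known_words.

    known_words is any iterable of correctly spelled words (an assumption:
    the caller provides a suitable word list). Suggestions come from
    difflib.get_close_matches and may be empty.
    """
    vocabulary = sorted({w.lower() for w in known_words})
    vocab_set = set(vocabulary)

    suspects = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group(0)
        if word.lower() in vocab_set:
            continue  # word is known, nothing to flag
        suggestions = difflib.get_close_matches(
            word.lower(), vocabulary, n=3, cutoff=cutoff
        )
        suspects.append((match.start(), word, suggestions))
    return suspects
```

A typical OCR confusion such as "rn" read in place of "m" shows up this
way: with a word list containing "modern", the string "rnodern" is flagged
and "modern" is offered as a suggestion.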


