From: Ross A. Finlayson (raf@tiki-lounge.com)
Date: Mon Dec 25 2000 - 11:51:39 MST
Eugene.Leitl@lrz.uni-muenchen.de wrote:
> "Ross A. Finlayson" wrote:
>
> > Because most of the books aren't in readily-available machine-readable form, it
> > would take actual people to collate this information or arrange for it to be put in
> > machine-readable form.
>
> If you own an SF library plus a scanner with OCR, you're practically required
> to scan it in, the rare and out-of-print books first.
I was just thinking of this, too. Part of it would be a network robot program that
collects all the relevant newsgroups and searches for relevant web pages. These,
luckily (?), are machine-readable.
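
A rough sketch of the "is this page relevant" half of such a robot, in Python. The
fetching part is left out, and the keyword list and the function names (page_text,
relevance) are made up for illustration; a real robot would need something smarter
than counting words.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def page_text(html):
    """Return the visible-ish text of an HTML page (script/style text included)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def relevance(html, keywords):
    """Count how many times any of the (lowercase) keywords appears as a word."""
    words = re.findall(r"[a-z0-9']+", page_text(html).lower())
    return sum(words.count(k) for k in keywords)


# Hypothetical keyword list and sample page, for illustration only.
KEYWORDS = ["fiction", "asimov"]
SAMPLE = "<html><body><h1>Science Fiction</h1><p>A fiction list.</p></body></html>"
score = relevance(SAMPLE, KEYWORDS)
```

Pages scoring above some threshold would be kept for a person to look at.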
In its most exact form, scanning books is destructive: each page or pair of facing
pages is unbound and scanned on a flatbed or drum scanner, or with one of the
high-resolution camera scanners. If you keep the spine on the book and scan facing
pages, there are more artifacts in the image. If the material is on regular letter
or legal sized paper, you can put the documents in a sheet-fed scanner and get decent
results most of the time, at 100 kilobytes or much less of file size for one letter
page of black-and-white document image scanned at 300 dots per inch. If you have the
original files, keep a copy.
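
To put the 100 kilobyte figure in context, here is the arithmetic as a small Python
sketch. It assumes a US letter page (8.5 x 11 inches) and one bit per pixel; the
function name is made up. The uncompressed image comes to about a megabyte, so 100
kilobytes implies roughly ten-to-one compression, which is the sort of ratio bilevel
schemes such as CCITT Group 4 fax compression are meant to reach on text pages.

```python
def raw_bilevel_bytes(width_in, height_in, dpi):
    """Uncompressed size in bytes of a 1-bit-per-pixel scan (no row padding)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px / 8


# US letter at 300 dots per inch: 2550 x 3300 pixels.
raw = raw_bilevel_bytes(8.5, 11.0, 300)

# Compression ratio implied by a 100 kilobyte file (taking 1 KB = 1000 bytes).
ratio = raw / 100_000

print(f"raw: {raw:.0f} bytes, implied compression: {ratio:.1f}:1")
```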
As for OCR (Optical Character Recognition) software, some packages let you add your
own word dictionaries. Another good feature is trainable OCR that you can retrain on
your own fonts. In that case the OCR program is driven partly by a neural or Bayesian
network, so you can run it on new fonts and correct its idea of what each character
is.
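
As an illustration of what a user word dictionary buys you, here is a minimal Python
sketch that snaps a misread word to its closest dictionary entry. It is not how any
particular OCR package works; the word lists, the cutoff, and the function name are
assumptions, and it uses the standard library's difflib for the fuzzy matching.

```python
import difflib

# Hypothetical base dictionary plus words the user added (names, jargon).
BASE_WORDS = ["fiction", "library", "scanner", "science"]
USER_WORDS = ["finlayson"]
DICTIONARY = sorted(BASE_WORDS + USER_WORDS)


def correct_word(word, dictionary=DICTIONARY, cutoff=0.75):
    """Return the closest dictionary word, or the input unchanged if none is close."""
    lowered = word.lower()
    if lowered in dictionary:
        return lowered
    matches = difflib.get_close_matches(lowered, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


fixed = [correct_word(w) for w in ["sc1ence", "fictlon", "Finlaysom", "xyzzy"]]
```

Words with no close match (like "xyzzy" here) pass through untouched, which is
where a person would step in and either fix the word or add it to the dictionary.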
As for having a science fiction library, there are many, many people with fifty or
more SF books whose collections might share few or no SF authors in common. That means
there is a high ratio of readers to authors, to the point where they are more even.
Merry Christmas, Happy Holidays,
Ross
--
Ross Andrew Finlayson
Finlayson Consulting
Ross at Tiki-Lounge: http://www.tiki-lounge.com/~raf/