ISB imaging module
Dave Orendorff
8/31/04
API DOCUMENTATION
POGO VISION
This document assumes you have read over the user manual and are familiar with the basic purpose
and operation of VISION.
VISION was designed to be intuitive in its logic and
scheme. The goal of this document is not to give an in-depth review of all
of VISION’s functions but rather a synopsis good enough to enable
the reader to understand or modify VISION.
Outline
1. Roadmap
2. Structs
   a. Image/FltImage
   b. Grid
3. Modules
   a. Image acquisition
   b. Accessing an image
   c. Image processing
   d. Spotfinding
   e. Finding the exact size of a spot
   f. Display
   g. MetaImage
   h. File Saving
4. Compiling
5. License
6. Credits
7. Contact
1. Roadmap: your guide to fixing and modifying VISION
VISION follows a sequential algorithm. First we capture the image, then
we process it, cut it up, analyze it and maybe save it. Because each step
stands on its own, VISION is modular in nature and should be easy to modify.
We defined 2 new struct types, which I recommend you look at. We wanted to
keep VISION C-style as much as possible, so there is no class hierarchy to
consider.
VISION uses a series of function
calls. These function calls are usually not linked to each other except through
the main caller (vision.h/vision.cpp). They exchange primitives or one of the
defined struct types.
I imagine that the quickest way
to get a handle on VISION is to read Modules and Structs. Then, if you require
further information the source files are moderately documented and should be
helpful.
2. Structs
The two defined structs represent an Image and the
Grid (how the spots are arranged). All struct data members are named in the
form _<variable name>.
Struct: Image
  Var: int _width
  Comments: Pixel width of the Image.
  Var: int _height
  Comments: Pixel height of the Image.
  Var: unsigned char * _data
  Comments: A pointer to a long array of characters (length = width
  * height). It is laid out like a C string except that it is not NULL terminated.
  Comments: Image holds a grayscale, rectangular image represented
  by one byte per pixel. 0x00 is black and 0xFF is white. It has the three data
  members listed above. FltImage is also a struct. It is the same as Image
  except that pixels are floats instead of unsigned chars.
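As a rough illustration of the Image and FltImage layout described above, the structs could be declared as follows. The field names come from the table; the helpers newImageSketch and freeImageSketch are hypothetical stand-ins and are not the babyfunctions that ship with VISION.

```cpp
#include <cstdlib>

// Field names follow the struct description; everything else is illustrative.
struct Image {
    int _width;             // pixel width
    int _height;            // pixel height
    unsigned char* _data;   // _width * _height bytes, not NULL terminated
};

// Same layout as Image, but one float per pixel.
struct FltImage {
    int _width;
    int _height;
    float* _data;
};

// Hypothetical helper: allocate a black (all 0x00) image.
Image* newImageSketch(int width, int height) {
    Image* img = static_cast<Image*>(std::malloc(sizeof(Image)));
    if (!img) return nullptr;
    img->_width = width;
    img->_height = height;
    img->_data = static_cast<unsigned char*>(
        std::calloc(static_cast<std::size_t>(width) * height, 1));
    if (!img->_data) {
        std::free(img);
        return nullptr;
    }
    return img;
}

// Hypothetical helper: release an image made by newImageSketch.
void freeImageSketch(Image* img) {
    if (img) {
        std::free(img->_data);
        std::free(img);
    }
}
```

Plain structs and free functions keep the sketch in the C style the library aims for.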
Struct: Grid
  Var: int _top, _left, _bottom, _right;
  Comments: pixel boundary coordinates
  Var: int _rows, _cols;
  Comments: how many rows, columns of spots
  Var: int _dx, _dy;
  Comments: distance between adjacent vignettes
  Comments: You may notice that there is embedded redundant
  information. This is because these variables are referenced many times and
  calculating them every time can be costly. So, we just made a pact not to
  modify a Grid struct outside of grid.cpp. That is the reason for the babyfunctions.
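The redundancy mentioned above can be pictured with a small sketch. The field names come from the Grid description; makeGridSketch is a hypothetical helper, and the assumption that _dx and _dy equal the grid extent divided by the number of columns and rows is ours, not something grid.cpp is known to do.

```cpp
struct Grid {
    int _top, _left, _bottom, _right;  // pixel boundary coordinates
    int _rows, _cols;                  // how many rows, columns of spots
    int _dx, _dy;                      // distance between adjacent vignettes
};

// Hypothetical helper: fill in every field at once so the cached spacing
// never disagrees with the boundaries. Assumes spacing = extent / count.
Grid makeGridSketch(int top, int left, int bottom, int right, int rows, int cols) {
    Grid g;
    g._top = top;
    g._left = left;
    g._bottom = bottom;
    g._right = right;
    g._rows = rows;
    g._cols = cols;
    g._dx = (cols > 0) ? (right - left) / cols : 0;
    g._dy = (rows > 0) ? (bottom - top) / rows : 0;
    return g;
}
```

Routing every change through one function is what keeps the cached _dx and _dy trustworthy, which is the point of the pact described above.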
3. Modules
There are independent tasks to be performed systematically in the library. Each one of the tasks represents a module.
a. Image acquisition
There are three different
methods to acquire an image. The first is to directly take the image from the
camera. The second and third are to acquire the image from a PNG or PPM file.
CAMERA-
cameraimage.h
The function getGreyscale8BitCameraImage(…) uses 1394Camera.dll, along with
a few headers that have the string “1394Camera” embedded in their file names.
Right now the camera settings are hardcoded in cameraimage.cpp and in the
global variables g_camWidth and g_camHeight.
If the camera settings ever need to be changed, the camera DLL we are using
is nicely documented. We snipped a page from its manual that contains various
settings for the camera. Changes should be made to the single function in
cameraimage.cpp.
FROM FILE-
Image.h
In normal mode VISION never gets an image from a file; the file-reading
code exists for debugging purposes. Two formats are supported: PPM and PNG.
PPM
PPM images are really just a long array of pixels, exactly like the way we
store an image. So, reading a PPM requires reading the header of the file
(the pixel type and the size) and then putting the rest of the file into a
string. The function in VISION is buggy in that it does not read long files
consistently. It does, however, support both RGB (P6) and grayscale (P5)
types.
PNG
PNG is a lossless compressed image format. It uses the zlib library for
compression. The read function is limited to grayscale images: although it
can open an RGB file, it takes only the red channel out of the file.
Unfortunately, libPNG seems touchy in that it can crash easily. We had to
be extra careful not to do anything too fancy.
b. Accessing an image
Images are stored in the Image struct. There are some babyfunctions in
Image.h. The data of an image is (usually) accessed by getPixel(x, y) or
else by referencing the index of the pixel directly with something like:
image->_data[42].
Sometimes it is handy to get a little image cutout from a large one. This
is done with subImage. Note that subImage makes only a shallow copy of the
image.
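The relationship between getPixel(x, y) and a direct index can be sketched as below. This assumes row-major storage (one row after another), which matches the "long array" description but is not confirmed by the source; pixelIndexSketch and getPixelSketch are hypothetical names, not the functions in Image.h.

```cpp
struct Image {
    int _width;
    int _height;
    unsigned char* _data;  // _width * _height bytes
};

// Assumption: pixels are stored row by row, so (x, y) maps to y * width + x.
inline int pixelIndexSketch(const Image* img, int x, int y) {
    return y * img->_width + x;
}

// Hypothetical accessor built on the index above; no bounds checking.
inline unsigned char getPixelSketch(const Image* img, int x, int y) {
    return img->_data[pixelIndexSketch(img, x, y)];
}
```

Under that assumption, image->_data[42] in a 10-pixel-wide image would be the pixel at x = 2, y = 4.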
c. Image Processing
processimage.h/processimage.cpp, affine.cpp
This module performs a myriad of transforms on an Image. These functions do
not have to do with spotfinding directly. The transforms are: gamma
correction, flattening, inverting, copying, flipping, negative Laplacian,
affine transforms and bounds checking.
All of these functions operate in place.
Flipping does exactly n computations and uses 0 temporary storage
variables. It does this by using an array-switching algorithm found in
auxiliary.h that swaps two values with XOR operations rather than through a
temporary variable.
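For readers unfamiliar with the trick, here is a sketch of an in-place horizontal flip built on the textbook XOR swap. The textbook form takes three XOR assignments; the routine in auxiliary.h may be written differently. xorSwapSketch and flipHorizontalSketch are hypothetical names, and row-major storage is assumed.

```cpp
// Textbook XOR swap: exchanges two bytes without a temporary variable.
// Swapping a byte with itself would zero it, so that case is skipped.
inline void xorSwapSketch(unsigned char& a, unsigned char& b) {
    if (&a == &b) return;
    a ^= b;
    b ^= a;
    a ^= b;
}

// Mirror every row of a width x height image in place (row-major assumed).
void flipHorizontalSketch(unsigned char* data, int width, int height) {
    for (int y = 0; y < height; ++y) {
        unsigned char* row = data + y * width;
        for (int i = 0, j = width - 1; i < j; ++i, --j) {
            xorSwapSketch(row[i], row[j]);
        }
    }
}
```

The two indices walk towards each other and stop before they meet, so the middle pixel of an odd-width row is never swapped with itself.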
NegativeLaplacian
is from DAPPLE. The algorithm is nicely documented in the source code.
Gamma correction (see source) uses floats to normalize the pixels such that
the corrected image spans the full available range (0 to 255).
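One common way to combine normalization with gamma correction is sketched below. This is an assumption about the general technique, not a copy of VISION's routine: the min/max stretch, the power-law curve and the name gammaCorrectSketch are all ours, and the source file remains the reference.

```cpp
#include <cmath>
#include <vector>

// Stretch pixels so the darkest maps to 0 and the brightest to 255,
// applying a power-law (gamma) curve on the normalized 0..1 value.
void gammaCorrectSketch(std::vector<unsigned char>& px, float gamma) {
    if (px.empty()) return;
    unsigned char lo = px[0], hi = px[0];
    for (unsigned char p : px) {
        if (p < lo) lo = p;
        if (p > hi) hi = p;
    }
    if (hi == lo) return;  // flat image: nothing to stretch
    const float span = static_cast<float>(hi - lo);
    for (unsigned char& p : px) {
        float t = (p - lo) / span;               // normalized to 0..1
        float v = std::pow(t, gamma) * 255.0f;   // back to 0..255
        p = static_cast<unsigned char>(v + 0.5f);
    }
}
```

Working in floats avoids the rounding loss that would come from doing the stretch and the power curve in 8-bit integer arithmetic.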
The affine transform is in a separate file for later portability. See the
source code (affine.cpp) for the algorithm.
The rest of the processing functions are straightforward and use intuitive
algorithms.
d. Spotfinding
findspots.h/findspots.cpp, fft.h/fft.cpp
This is the core spotfinder of
the program. Jeremy Buhler wrote the majority of this file and the fft files. I
needed to take a lot of his code and put it into a nice little place. So, this
is it. It is not messy, but it does use a few static variables for speed and
simplicity. The spotfinder does use the FFTW3 library. See his paper for how the spotfinder works.
That said, I did change his
algorithm in a few ways:
1. It does not ‘learn’ what a good spot acts like.
2. It does not guess multiple times, erasing the erroneous data on each
   run, until it finds a good spot.
3. There is no bias for spots located near the center.
e. Finding the exact size of a spot
spotarea.h/spotsarea.cpp
The spotfinder finds the approximate radius of the spot. However, a radius
rounded to the nearest pixel is not precise enough to calculate the size of
the spot. Therefore, spotArea works to find the exact size of the spot.
There is only one function to reference. This function uses the approximate
location and size found by the spotfinder.
SpotArea gets the negative Laplacian of a particular grid area in the
image. Since the definition of a spot is that the outside pixel intensity
goes from light to dark, the negative Laplacian should have a ring of
negative values representing the outside of the spot.
SpotArea then finds the ring and traces its outline. The ring consistently
has a width of at least two pixels. So, we prod pixels that are located
where the spot should be until we find a ‘pool’ of negative pixels. When the
pool reaches a critical size, we know we have found the ring. This pool is
found by using the recursive function:
RecursiveFill(…) Algorithm
If this pixel is already filled in, return 0.
If this pixel is not filled in (as defined by the function’s parameters),
fill it in and call the neighbors to let them know that they too can be
filled.
Return 1 + whatever the neighbors return.
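The recursion above can be sketched as follows. The signature is hypothetical: here "fillable" is taken to mean a negative value in the negative Laplacian, and only the four direct neighbors are visited. The real RecursiveFill(…) decides what counts as fillable from its own parameters.

```cpp
#include <vector>

// lap: negative-Laplacian values of a w x h cutout (row-major).
// filled: 1 where a pixel already belongs to the pool, 0 elsewhere.
// Returns the number of pixels newly added to the pool.
int recursiveFillSketch(const std::vector<float>& lap,
                        std::vector<unsigned char>& filled,
                        int w, int h, int x, int y) {
    if (x < 0 || y < 0 || x >= w || y >= h) return 0;  // off the cutout
    const int i = y * w + x;
    if (filled[i] || lap[i] >= 0.0f) return 0;         // already filled, or not negative
    filled[i] = 1;
    return 1
        + recursiveFillSketch(lap, filled, w, h, x + 1, y)
        + recursiveFillSketch(lap, filled, w, h, x - 1, y)
        + recursiveFillSketch(lap, filled, w, h, x, y + 1)
        + recursiveFillSketch(lap, filled, w, h, x, y - 1);
}
```

The return value is the pool size, which is what gets compared against the critical size. Recursion depth grows with the pool, so this form suits small cutouts such as a single grid cell.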
Filling in the shape
Once we have our shape outlined,
we just need to fill in the pixels that may have been missed by the pool.
This is done in 2 steps,
vertical and horizontal filling.
Horizontal filling
Two ‘soldiers’ start at the opposite edges of a row. They march towards
each other and stop when they either meet or hit a bit of filled in
‘terrain’ (pixels). They then fire at each other. Any terrain between them
that is not filled in is scorched (filled in).
This operation is done on each row.
Vertical filling
Same as horizontal but up and down instead of left and right.
A tally of the number of filled
in pixels is returned as the area of the spot.
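The two filling passes can be sketched with one helper that walks a line of pixels with a stride, so that the same code serves rows (stride 1) and columns (stride = width). All names here are hypothetical, and the mask is assumed to hold 1 for filled pixels and 0 otherwise.

```cpp
#include <vector>

// Two 'soldiers' walk inwards from both ends of a line of n pixels and stop
// at the first filled pixel (or when they meet). Everything strictly between
// them is then filled. Returns how many pixels were newly filled.
static int fillLineSketch(unsigned char* line, int n, int stride) {
    int l = 0, r = n - 1;
    while (l < r && !line[l * stride]) ++l;
    while (r > l && !line[r * stride]) --r;
    int added = 0;
    for (int i = l + 1; i < r; ++i) {
        if (!line[i * stride]) {
            line[i * stride] = 1;
            ++added;
        }
    }
    return added;
}

// Horizontal pass: apply the soldiers to every row.
int fillRowsSketch(std::vector<unsigned char>& mask, int w, int h) {
    int added = 0;
    for (int y = 0; y < h; ++y) added += fillLineSketch(&mask[y * w], w, 1);
    return added;
}

// Vertical pass: the same thing on every column.
int fillColsSketch(std::vector<unsigned char>& mask, int w, int h) {
    int added = 0;
    for (int x = 0; x < w; ++x) added += fillLineSketch(&mask[x], h, w);
    return added;
}
```

A line with no filled pixels, or only one, is left alone because the soldiers meet before finding two pieces of terrain to fire between.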
f. Display
window.h/window.cpp
When showImage/showFltImage is
called, the Image/FltImage passed as a parameter is displayed in a window. The
window does not make a copy of the image, so care must be taken to not modify
the global Image while a window is open.
The function references the
static boolean variables s_gridShown, s_shrinkFactor to determine how the Image
should be shown. These variables are set using babyfunctions.
We use the Windows API for window management.
KillWindow() sends a WM_CLOSE
message to the window and then makes the window active so that the window will
close immediately.
There are handy functions used for debugging. ShowCircles, for example,
shows the location and size of each spot. ShowFltImage displays a
normalized version of an image represented by floats.
There are a few static variables
to keep track of where the user has clicked on the screen. These are accessed
by a simple function and set by WM_<EVENTs>.
In the actual DLL, ShowImage opens a new thread to display the window in.
This way POGO does not get locked up when it is run. The thread entry point
cannot use __stdcall, so we cast showImage to __cdecl.
g. MetaImage
vision.h/vision.cpp
The metaImage is stored as a
static variable in vision.cpp. Each tile is specified by three coordinates
<row, col, channel>. Whenever the global Image is added to the metaImage,
the Image is copied. The pointer to the new, copied image is put into the
metaImage.
The metaImage is actually stored in a 1-dimensional array of Image
pointers. To convert from a 3D coordinate to a 1D coordinate use the
formula:
1D <--> 3D mapping
Index = row * <total columns> * <total channels> + col * <total channels> + channel
Therefore, the length of the metaImage array is
<total rows> * <total columns> * <total channels>.
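The mapping above, together with its inverse, can be written as two small functions. The forward formula is the one given in the text; the inverse follows from it by integer division. TileCoord, metaIndexSketch and metaCoordSketch are hypothetical names, not functions from vision.cpp.

```cpp
struct TileCoord {
    int row;
    int col;
    int channel;
};

// 3D -> 1D, exactly the formula from the text.
inline int metaIndexSketch(int row, int col, int channel,
                           int totalCols, int totalChannels) {
    return row * totalCols * totalChannels + col * totalChannels + channel;
}

// 1D -> 3D, obtained by undoing the formula with division and remainder.
inline TileCoord metaCoordSketch(int index, int totalCols, int totalChannels) {
    TileCoord t;
    t.row = index / (totalCols * totalChannels);
    int rem = index % (totalCols * totalChannels);
    t.col = rem / totalChannels;
    t.channel = rem % totalChannels;
    return t;
}
```

For example, with 3 columns and 3 channels, tile <1, 2, 1> lands at index 1*3*3 + 2*3 + 1 = 16.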
h. File Saving
image.h/images.cpp
MetaImages can be saved in the
PNG format.
Internally, this is done in two steps:
1. Convert a metaImage into a regular Image. First, a blank Image is
   created that is the size of our metaImage plus room for borders. Then,
   each of the metaImage tiles is copied onto the large Image. Any channels,
   tiles and borders that are unfilled will remain black in the final PNG.
   If the PNG is to be RGB, then each row will be 3 times the length in
   pixels to contain all of the channel data.
2. Save the Image as a PNG. This is just a series of function calls to the
   libPNG library. LibPNG expects an Image in roughly the same format in
   which we store an Image.
We use the libPNG library to facilitate our saving. The libPNG (and zlib)
source code is included in the project; however, it is NOT modified. We
included it because we were having problems with the libPNG interface, and
it is difficult to debug without source code.
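Step 1 (copying tiles onto one large Image) can be pictured with the sketch below. The layout is an assumption made for illustration: equally sized tiles, a border of fixed width around and between them, and channel values interleaved within each pixel. Canvas, makeCanvasSketch and blitTileSketch are hypothetical names, and the real code may arrange borders and channels differently.

```cpp
#include <cstddef>
#include <vector>

struct Canvas {
    int width;     // in pixels
    int height;    // in pixels
    int channels;  // 1 for grayscale, 3 for RGB
    std::vector<unsigned char> data;  // width * height * channels bytes
};

// Blank (black) canvas big enough for rows x cols tiles plus borders.
Canvas makeCanvasSketch(int rows, int cols, int channels,
                        int tileW, int tileH, int border) {
    Canvas c;
    c.width = cols * tileW + (cols + 1) * border;
    c.height = rows * tileH + (rows + 1) * border;
    c.channels = channels;
    c.data.assign(static_cast<std::size_t>(c.width) * c.height * channels, 0);
    return c;
}

// Copy one grayscale tile into the given channel of its grid slot.
void blitTileSketch(Canvas& c, const unsigned char* tile, int tileW, int tileH,
                    int row, int col, int channel, int border) {
    const int x0 = border + col * (tileW + border);
    const int y0 = border + row * (tileH + border);
    for (int y = 0; y < tileH; ++y) {
        for (int x = 0; x < tileW; ++x) {
            const int dst = ((y0 + y) * c.width + (x0 + x)) * c.channels + channel;
            c.data[dst] = tile[y * tileW + x];
        }
    }
}
```

Because the canvas starts out all zero, any tile, channel or border that is never written stays black, matching the behavior described in step 1.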
4. Compiling
VISION compiles into a multithreaded DLL that needs to be accessed by
Visual Basic. Visual Basic needs the DLL functions to use the __stdcall
convention (the compiler default is __cdecl). These calling conventions
dictate how arguments are passed and which side, caller or callee, cleans
up the stack.
There was difficulty in using
__stdcall throughout VISION. While we did compile the entire project under
__stdcall, we had to internally cast specific functions to use __cdecl. Also,
we compiled fft.cpp using __cdecl. There were two reasons for these
inconsistencies:
1. New threads must use __cdecl.
2. fftw3.dll would only talk to functions using __cdecl. So, we had to cast
   the interacting file/functions to __cdecl.
We used MS Visual Studio 6.0. However, the code should be portable across
compilers and platforms (the exceptions are the Display module and the
supplementary DLLs, which are not platform independent).
Release Build Compiler Options
Preprocessor definitions:
WIN32,NDEBUG,_WINDOW,_MBCS,_USRDLL,_ATL_DLL
Code generation project options:
/nologo /Gz /MT /W3 /GX /O2 /D
"WIN32" /D "NDEBUG" /D "_WINDOW" /D
"_MBCS" /D "_USRDLL" /D "_ATL_DLL"
/Fp"Release/vision.pch" /YX /Fo"Release/"
/Fd"Release/" /FD /c
Linking project options:
kernel32.lib user32.lib gdi32.lib
winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib
uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib
comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib
odbc32.lib odbccp32.lib 1394camera.lib /nologo /subsystem:windows /dll
/incremental:no /pdb:"Release/vision.pdb"
/map:"Release/vision.map" /machine:I386
/out:"Release/vision.dll" /implib:"Release/vision.lib"
VISION has been tested on Windows 2000.
5. License
VISION is under the GNU General
Public License (GPL).
Copyright (C) 2004, ISB
See copying.
6. Credits
The author would like to acknowledge the following for
their contributions:
7. Contact
Dave Orendorff wrote this
manual. You can find him (until ~2007) at:
dweo@u.washington.edu
* babyfunction – any function a
baby could write, including functions like “newImage”, “getPixelAt”…