ISB imaging module

Dave Orendorff

8/31/04

API DOCUMENTATION

POGO VISION

 

 

Overview

This document assumes you have read over the user manual and are familiar with the basic purpose and operation of VISION.

VISION was designed to be intuitive in its logic and structure. The goal of this document is not to give an in-depth review of all of VISION’s functions but rather a synopsis good enough to enable the reader to understand or modify VISION.

 

Outline

1.      Roadmap of VISION

2.      Structs

a.      Image/FltImage

b.      Grid

3.      Modules

a.      Image acquisition

b.      Accessing an image

c.      Image processing

d.      Spotfinding

e.      Finding the exact size of a spot

f.       Display

g.      MetaImage

h.      File Saving

4.      Compiling

5.      License

6.      Credits

7.    Contact

 

 

1. Roadmap: your guide to fixing and modifying VISION

 

VISION is built around a sequential algorithm. First we capture the image, then we process it, cut it up, analyze it and maybe save it. Therefore, VISION is modular in nature and should be easy to modify. We defined two new struct types, which I recommend you look at. We wanted to keep VISION C-style as much as possible, so there is no class hierarchy to consider.

 

VISION uses a series of function calls. These function calls are usually not linked to each other except through the main caller (vision.h/vision.cpp). They exchange primitives or one of the defined struct types.

 

I imagine that the quickest way to get a handle on VISION is to read Modules and Structs. Then, if you require further information the source files are moderately documented and should be helpful.

 

 

 

2. Structs

 

The two defined structs represent an Image and the Grid (how the spots are arranged). All struct data members are named in the form _<variable name>.

 

 

 

 

Struct: Image

Image holds a grayscale, rectangular image represented by one byte per pixel. 0x00 is black and 0xFF is white. There are three data members:

Var: int _width

Comments: Pixel width of the Image.

Var: int _height

Comments: Pixel height of the Image.

Var: unsigned char * _data

Comments: A pointer to a long array of unsigned characters (length = _width * _height). It is laid out like a C string except that it is not NUL-terminated.

FltImage is also a struct. It is the same as Image except that pixels are floats instead of unsigned chars.
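As a rough sketch of what the two structs described above might look like in code: the field names come from this document, but the helper names makeBlackImage and freeImage are hypothetical and are not taken from the VISION source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Field names follow this document; the exact declarations in image.h may differ.
struct Image {
    int _width;            // pixel width
    int _height;           // pixel height
    unsigned char *_data;  // _width * _height bytes, one per pixel, not NUL-terminated
};

struct FltImage {
    int _width;
    int _height;
    float *_data;          // same layout as Image, but one float per pixel
};

// Hypothetical helper (not from VISION): allocate an all-black (0x00) Image.
Image *makeBlackImage(int width, int height) {
    assert(width > 0 && height > 0);
    Image *img = static_cast<Image *>(std::malloc(sizeof(Image)));
    if (!img) return nullptr;
    img->_width = width;
    img->_height = height;
    // calloc zero-fills, and 0x00 is black.
    img->_data = static_cast<unsigned char *>(
        std::calloc(static_cast<std::size_t>(width) * height, 1));
    if (!img->_data) {
        std::free(img);
        return nullptr;
    }
    return img;
}

// Hypothetical helper: release an Image made by makeBlackImage.
void freeImage(Image *img) {
    if (img) {
        std::free(img->_data);
        std::free(img);
    }
}
```

Because _data is not terminated, the only way to know its length is _width * _height, so the three members always have to be kept in step.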

 

 

 

Struct: Grid

Var: int _top, _left, _bottom, _right;

Comments: Pixel boundary coordinates.

Var: int _rows, _cols;

Comments: How many rows and columns of spots.

Var: int _dx, _dy;

Comments: Distance between adjacent vignettes.

Comments: You may notice that the struct embeds redundant information. This is because these variables are referenced many times and calculating them every time can be costly. So, we made a pact not to modify a Grid struct outside of grid.cpp. That is the reason for the babyfunctions.
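To illustrate why the redundancy calls for a single point of modification, here is a minimal sketch of a setter that keeps the derived fields consistent. The function name setGrid is hypothetical, and the relation spacing = extent / count is our assumption about how _dx and _dy relate to the other fields; grid.cpp may compute them differently.

```cpp
#include <cassert>

// Field names follow this document.
struct Grid {
    int _top, _left, _bottom, _right;  // pixel boundary coordinates
    int _rows, _cols;                  // rows and columns of spots
    int _dx, _dy;                      // distance between adjacent vignettes
};

// Hypothetical setter (not from VISION). ASSUMPTION: the spacing is the
// grid extent divided by the number of cells along that axis.
void setGrid(Grid *g, int top, int left, int bottom, int right,
             int rows, int cols) {
    assert(rows > 0 && cols > 0 && right > left && bottom > top);
    g->_top = top;
    g->_left = left;
    g->_bottom = bottom;
    g->_right = right;
    g->_rows = rows;
    g->_cols = cols;
    // Cache the derived values once so callers never recompute them.
    g->_dx = (right - left) / cols;
    g->_dy = (bottom - top) / rows;
}
```

If every write goes through one function like this, the cached spacing cannot drift out of date with the boundaries.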

 

 

 

 

 

3. Modules

 

There are independent tasks to be performed systematically in the library. Each one of the tasks represents a module.

 

 

a. Image acquisition

Image.h/image.cpp, cameraimage.h/cameraimage.cpp

There are three different methods to acquire an image. The first is to directly take the image from the camera. The second and third are to acquire the image from a PNG or PPM file.

 

CAMERA-

cameraimage.h

The function getGreyscale8BitCameraImage(…) uses 1394Camera.dll. It also uses a few headers with the string “1394Camera” embedded in their file names. Right now the settings for the camera are hardcoded into cameraimage.cpp and the global variables g_camWidth and g_camHeight.

If the camera settings ever need to be changed, the camera DLL we are using is nicely documented. We snipped a page from its manual that contains various settings for the camera. Changes should be made to the single function in cameraimage.cpp.

 

FROM FILE-

Image.h

In normal mode VISION never gets an image from a file; this code exists for debugging purposes. Two formats are supported: PPM and PNG.

          PPM

          PPM images are really just a long array of pixels, exactly as we store an image. So, reading a PPM requires reading the header of the file, reading the size and pixel type, and then putting the rest of the file into a string. The function in VISION has a known bug: it does not read long files consistently. It does, however, support both RGB (P6) and grayscale (P5) PPM types.
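The header step can be sketched roughly as follows. This is not VISION's reader: the struct PnmHeader and the functions below are hypothetical, and the sketch parses a file that has already been loaded into memory. It assumes the usual Netpbm binary layout (magic number, width, height, maximum value, one whitespace byte, then raw pixels).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
struct PnmHeader {
    int channels;            // 1 for P5 (grayscale), 3 for P6 (RGB)
    int width, height, maxval;
    std::size_t dataOffset;  // index of the first pixel byte
};

// Skip whitespace and '#' comment lines between header fields.
static void skipSpace(const std::string &buf, std::size_t &pos) {
    while (pos < buf.size()) {
        unsigned char c = static_cast<unsigned char>(buf[pos]);
        if (c == '#') {
            while (pos < buf.size() && buf[pos] != '\n') ++pos;
        } else if (std::isspace(c)) {
            ++pos;
        } else {
            break;
        }
    }
}

// Read one non-negative decimal integer.
static bool readInt(const std::string &buf, std::size_t &pos, int &out) {
    skipSpace(buf, pos);
    if (pos >= buf.size() ||
        !std::isdigit(static_cast<unsigned char>(buf[pos]))) return false;
    out = 0;
    while (pos < buf.size() &&
           std::isdigit(static_cast<unsigned char>(buf[pos]))) {
        out = out * 10 + (buf[pos] - '0');
        ++pos;
    }
    return true;
}

// Parse the header and check that enough pixel bytes follow it.
bool parsePnmHeader(const std::string &buf, PnmHeader &h) {
    if (buf.size() < 2 || buf[0] != 'P') return false;
    if (buf[1] == '5') h.channels = 1;
    else if (buf[1] == '6') h.channels = 3;
    else return false;
    std::size_t pos = 2;
    if (!readInt(buf, pos, h.width) || !readInt(buf, pos, h.height) ||
        !readInt(buf, pos, h.maxval)) return false;
    // Exactly one whitespace byte separates the header from the pixels.
    if (pos >= buf.size() ||
        !std::isspace(static_cast<unsigned char>(buf[pos]))) return false;
    h.dataOffset = pos + 1;
    if (h.width <= 0 || h.height <= 0 || h.maxval <= 0 || h.maxval > 255)
        return false;
    std::size_t need = static_cast<std::size_t>(h.width) * h.height * h.channels;
    return buf.size() - h.dataOffset >= need;
}
```

When loading the file itself on Windows, it must be opened in binary mode, since text mode translates some byte values. Whether that is related to the inconsistency noted above has not been established.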

          PNG

          PNG is a lossless compression format for images. It uses the zlib library for compression. This function is limited to reading grayscale images. Though it can read RGB files, it only takes the red channel out of the file.

          Unfortunately, libPNG seems touchy in that it can crash easily. We had to be extra careful not to do anything too fancy. 

 

b. Accessing an image

Image.h

Images are stored in the Image struct. There are some babyfunctions in Image.h. The data of an image is (usually) accessed by getPixel(x, y) or else by referencing the index of the pixel directly with something like: image->_data[42].

Sometimes, it is handy to get a little image cutout from a large one. This is done with subImage. Note that subImage does only a shallow copy of the image.
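The two access styles mentioned above are related by a simple index calculation. The sketch below assumes row-major storage (row 0 first, left to right), which is our assumption rather than something confirmed by the source; pixelIndex and pixelAt are hypothetical names, not the real getPixel signature.

```cpp
#include <cassert>

// Same layout as the Image struct described in section 2.
struct Image {
    int _width;
    int _height;
    unsigned char *_data;
};

// ASSUMPTION: pixels are stored row by row, so (x, y) lives at y * width + x.
inline int pixelIndex(const Image *img, int x, int y) {
    assert(x >= 0 && x < img->_width);
    assert(y >= 0 && y < img->_height);
    return y * img->_width + x;
}

// Hypothetical accessor in the spirit of getPixel(x, y).
inline unsigned char pixelAt(const Image *img, int x, int y) {
    return img->_data[pixelIndex(img, x, y)];
}
```

Under that assumption, image->_data[42] in a 10-pixel-wide image would be the pixel at x = 2, y = 4.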

 

c. Image Processing

processimage.h/processimage.cpp, affine.cpp

This module performs a variety of transforms on an Image. These functions are not directly related to spotfinding. They cover:

          Gamma correction, flattening, inverting, copying, flipping, negative Laplacian, affine transforms and bounds checking.

All of these functions are done in place.

Flipping does exactly n computations, and uses 0 temporary storage variables. It does this amazing feat by using an array-switching algorithm found in auxiliary.h that switches variables by using two XOR operations and no overhead.
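For readers unfamiliar with the trick, here is a sketch of an in-place flip built on an XOR swap. The textbook XOR swap shown here uses three XOR assignments; the routine in auxiliary.h may be written differently, and the function names below are hypothetical.

```cpp
// Textbook XOR swap: exchanges two bytes without a temporary variable.
// It must not be applied to the same address twice over, because
// x ^ x == 0 would zero the value, hence the guard.
inline void xorSwap(unsigned char *a, unsigned char *b) {
    if (a == b) return;
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

// Mirror each row left to right, in place.
// ASSUMPTION: row-major storage, width * height bytes.
void flipHorizontal(unsigned char *data, int width, int height) {
    for (int y = 0; y < height; ++y) {
        unsigned char *row = data + y * width;
        // Walk inward from both ends; the middle pixel of an odd-width
        // row stays where it is.
        for (int l = 0, r = width - 1; l < r; ++l, --r) {
            xorSwap(row + l, row + r);
        }
    }
}
```

Each pixel is touched once, so the work grows linearly with the number of pixels, and no scratch buffer is needed.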

NegativeLaplacian is from DAPPLE. The algorithm is nicely documented in the source code.
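As a point of reference only, a common discrete form of the negative Laplacian uses the four direct neighbors of each pixel. The sketch below uses that form; it is our assumption, and the DAPPLE-derived code in VISION may use a different kernel, handle borders differently, or work in place.

```cpp
#include <cstddef>
#include <vector>

// Four-neighbor negative Laplacian: 4 * center - (left + right + up + down).
// ASSUMPTIONS: row-major 8-bit input; border pixels are left at 0;
// the result is returned in a new float buffer rather than in place.
std::vector<float> negativeLaplacian(const unsigned char *data,
                                     int width, int height) {
    std::vector<float> out(static_cast<std::size_t>(width) * height, 0.0f);
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int i = y * width + x;
            out[i] = 4.0f * data[i]
                   - data[i - 1] - data[i + 1]
                   - data[i - width] - data[i + width];
        }
    }
    return out;
}
```

With this sign convention, a pixel darker than its surroundings yields a negative value and a pixel brighter than its surroundings yields a positive one.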

Gamma correction (see source) uses floats to normalize the pixels such that the corrected image spans the full available range (0x00 to 0xFF).
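One plausible reading of this, sketched below, is to stretch the pixels to the range 0 to 1, apply the gamma exponent, and scale back to 0 to 255. The formula and the function name gammaCorrect are our assumptions; the source file is the authority on what VISION actually does.

```cpp
#include <cmath>

// In-place gamma correction with range normalization.
// ASSUMED formula: out = 255 * ((p - min) / (max - min)) ^ gamma.
void gammaCorrect(unsigned char *data, int count, float gamma) {
    if (count <= 0) return;
    unsigned char lo = data[0], hi = data[0];
    for (int i = 1; i < count; ++i) {
        if (data[i] < lo) lo = data[i];
        if (data[i] > hi) hi = data[i];
    }
    if (hi == lo) return;  // flat image: nothing to stretch
    const float span = static_cast<float>(hi - lo);
    for (int i = 0; i < count; ++i) {
        float n = (data[i] - lo) / span;  // normalized to 0..1
        // Add 0.5 so the cast rounds to nearest instead of truncating.
        data[i] = static_cast<unsigned char>(255.0f * std::pow(n, gamma) + 0.5f);
    }
}
```

Working in floats avoids the banding that would result from applying the exponent to integer values directly.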

The affine transform is in a separate file for later portability. See the source code (affine.cpp) for the algorithm.

          The rest of the processing functions are straightforward and use intuitive algorithms.

 

d. Spotfinding

findspots.h/findspots.cpp, fft.h/fft.cpp

This is the core spotfinder of the program. Jeremy Buhler wrote the majority of this file and the fft files. I needed to take a lot of his code and put it into a nice little place. So, this is it. It is not messy, but it does use a few static variables for speed and simplicity. The spotfinder does use the FFTW3 library. See his paper for how the spotfinder works.

That said, I did change his algorithm in a few ways:

1.      It does not ‘learn’ what a good spot acts like.

2.      It does not guess multiple times, erasing the erroneous data on each run, until it finds a good spot.

3.      There is no bias for spots located near the center.

 

 

e. Finding the exact size of a spot

spotarea.h/spotsarea.cpp

The spotfinder finds the approximate radius of the spot. However, a radius rounded to the nearest pixel is not precise enough to calculate the size of the spot. Therefore, spotArea works to find the exact size of the spot. There is only one function to reference. This function uses the approximate location and size found by the spotfinder.

 

 

Algorithm…

 

[Figure: Negative Laplacian]

SpotArea gets the negative Laplacian of a particular grid area in the image. Since a spot is defined by pixel intensity going from light to dark at its outer edge, the negative Laplacian should have a ring of negative values marking the outside of the spot.

SpotArea then finds the ring and traces its outline. The ring is consistently at least two pixels wide. So, we probe pixels located where the spot should be until we find a ‘pool’ of negative pixels. When the pool reaches a critical size, we know we have found the ring. This pool is found by using the recursive function:

 

RecursiveFill(…) Algorithm

If this pixel is not filled in (as defined by the function’s parameters), fill it in and call the neighbors to let them know that they too can be filled.

Return 1 + whatever the neighbors return.

[Figure: pooled in spot]
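The recursion described above can be sketched as follows. The signature is hypothetical, and two details are our assumptions: that a pixel counts as fillable when its negative Laplacian value is below zero, and that neighbors means the four directly adjacent pixels.

```cpp
#include <vector>

// Returns the number of pixels newly filled, starting from (x, y).
// lap    : negative Laplacian values, row-major, width * height entries
// filled : mask of the same size, 1 = already filled, 0 = not yet
// ASSUMPTIONS: fillable means lap < 0; neighbors are 4-connected.
int recursiveFill(const std::vector<float> &lap,
                  std::vector<unsigned char> &filled,
                  int width, int height, int x, int y) {
    if (x < 0 || y < 0 || x >= width || y >= height) return 0;
    int i = y * width + x;
    if (filled[i] || lap[i] >= 0.0f) return 0;
    filled[i] = 1;  // mark before recursing so no pixel is counted twice
    return 1
         + recursiveFill(lap, filled, width, height, x + 1, y)
         + recursiveFill(lap, filled, width, height, x - 1, y)
         + recursiveFill(lap, filled, width, height, x, y + 1)
         + recursiveFill(lap, filled, width, height, x, y - 1);
}
```

The return value is the size of the pool, which is what gets compared against the critical size. Note that recursion depth grows with the pool size, so a very large pool could exhaust the stack.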

 

 

Filling in the shape

Once we have our shape outlined, we just need to fill in the pixels that may have been missed by the pool.

This is done in 2 steps, vertical and horizontal filling.

 

 

 

[Figure: filled in]

Horizontal filling

Two ‘soldiers’ start at the opposite edges of a row. They march towards each other and stop when they either meet or hit a bit of filled-in ‘terrain’ (pixels). They then fire at each other. Any terrain between them that is not filled in is scorched (filled in).

This operation is done on each row.

 

Vertical filling

Same as horizontal, but up and down instead of left and right.

 

A tally of the number of filled in pixels is returned as the area of the spot.
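The two filling passes and the final tally can be sketched like this. The names are hypothetical, and the order of the passes (horizontal first, then vertical) is our choice for the sketch; the source may run them the other way around.

```cpp
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Mask;  // 1 = filled, 0 = empty

// March two 'soldiers' inward along one line of n pixels and fill
// everything between the first filled pixel each of them hits.
// start: index of the first pixel; step: 1 for a row, width for a column.
static void fillLine(Mask &m, int start, int step, int n) {
    if (n <= 0) return;
    int a = 0, b = n - 1;
    while (a < b && !m[start + a * step]) ++a;
    while (b > a && !m[start + b * step]) --b;
    if (!m[start + a * step]) return;  // soldiers met without hitting terrain
    for (int k = a; k <= b; ++k) m[start + k * step] = 1;
}

// Fill every row, then every column, and return the filled-pixel count.
int fillAndCount(Mask &m, int width, int height) {
    for (int y = 0; y < height; ++y) fillLine(m, y * width, 1, width);
    for (int x = 0; x < width; ++x) fillLine(m, x, width, height);
    int area = 0;
    for (std::size_t i = 0; i < m.size(); ++i) area += m[i] ? 1 : 0;
    return area;
}
```

For a closed ring outline, the pixels inside the ring are filled while rows and columns that never touch the ring are left alone.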

 

 

f. Display

window.h/window.cpp

When showImage/showFltImage is called, the Image/FltImage passed as a parameter is displayed in a window. The window does not make a copy of the image, so care must be taken to not modify the global Image while a window is open.

The function references the static variables s_gridShown and s_shrinkFactor to determine how the Image should be shown. These variables are set using babyfunctions.

We use the windows API for windows management. 

KillWindow() sends a WM_CLOSE message to the window and then makes the window active so that the window will close immediately.

There are handy functions used for debugging. ShowCircles, for example, shows the location and size of the spots. ShowFltImage displays a normalized version of an image represented by floats.

There are a few static variables to keep track of where the user has clicked on the screen. These are accessed by a simple function and set by WM_<EVENTs>.

 

In the actual DLL, ShowImage opens a new thread in which to display the window. This way POGO does not get locked up while the window is shown. You cannot run a thread with __stdcall, so we cast showImage to __cdecl.

 

 

g. MetaImage

vision.h/vision.cpp

The metaImage is stored as a static variable in vision.cpp. Each tile is specified by three coordinates <row, col, channel>. Whenever the global Image is added to the metaImage, the Image is copied. The pointer to the new, copied image is put into the metaImage.

The metaImage is actually stored in a 1-dimensional array of Image pointers. To convert from a 3D coordinate to a 1D index, use the formula:

          1D <--> 3D mapping

          Index = row * <total columns> * <total channels> + col * <total channels> + channel

 

Therefore, the length of metaImage array is <total rows> * <total columns> * <total channels>
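The mapping in both directions can be written as two small functions. The formula is the one given above; the function and parameter names are hypothetical.

```cpp
// 3D -> 1D: the formula from this section.
inline int metaIndex(int row, int col, int channel,
                     int totalCols, int totalChannels) {
    return row * totalCols * totalChannels + col * totalChannels + channel;
}

// 1D -> 3D: the inverse, obtained by peeling off the fastest-varying
// coordinate (channel) first, then the column, then the row.
inline void metaCoords(int index, int totalCols, int totalChannels,
                       int *row, int *col, int *channel) {
    *channel = index % totalChannels;
    *col = (index / totalChannels) % totalCols;
    *row = index / (totalCols * totalChannels);
}
```

For example, with 4 columns and 3 channels, tile <1, 2, 1> lands at index 1 * 4 * 3 + 2 * 3 + 1 = 19, and the last tile of a 2-row metaImage, <1, 3, 2>, lands at index 23, one less than the array length of 24.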

         

 

h. File Saving

image.h/image.cpp

MetaImages can be saved in the PNG format.

Internally, this is done in two steps:

1.      Convert a metaImage into a regular Image.

First, a blank Image is created that is the size of our metaImage plus room for borders. Then, each of the metaImage tiles is copied onto the large Image. Any channels, tiles and borders that are unfilled will remain black in the final PNG. If the PNG is to be RGB, then each row will be 3 times the length in pixels to contain all of the channel data.

2.      Save the Image as a PNG. This is just a series of function calls to the libPNG library. LibPNG expects an Image in roughly the same format in which we store an Image.

 

We use the libPNG library to facilitate our saving. The libPNG (and zlib) source code is included in the project; however, it is NOT modified. We were having problems with the libPNG interface, and it is difficult to debug without source code.

 

 

 

4. Compiling

 

VISION compiles into a multithreaded DLL that needs to be accessed by Visual Basic. Visual Basic needs the DLL functions to use the __stdcall convention (the compiler default is __cdecl). These calling conventions dictate how arguments are passed on the stack and who cleans the stack up afterward: the callee under __stdcall, the caller under __cdecl.

 

There was difficulty in using __stdcall throughout VISION. While we did compile the entire project under __stdcall, we had to internally cast specific functions to use __cdecl. Also, we compiled fft.cpp using __cdecl. There were two reasons for these inconsistencies:

1.      New threads must use __cdecl

2.      fftw3.dll would only talk to functions using __cdecl. So, we had to cast the interacting file/functions to __cdecl.

 

Compiler

We used MS Visual Studio 6.0. However, the code should be portable across compilers and platforms. The exceptions are the Display module and the supplementary DLLs, which are not platform independent.

 

Release Build Compiler Options Preprocessor definitions:

WIN32,NDEBUG,_WINDOW,_MBCS,_USRDLL,_ATL_DLL

 

Code generation project options:

/nologo /Gz /MT /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_WINDOW" /D "_MBCS" /D "_USRDLL" /D "_ATL_DLL" /Fp"Release/vision.pch" /YX /Fo"Release/" /Fd"Release/" /FD /c

 

Linking project options:

kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib 1394camera.lib /nologo /subsystem:windows /dll /incremental:no /pdb:"Release/vision.pdb" /map:"Release/vision.map" /machine:I386 /out:"Release/vision.dll" /implib:"Release/vision.lib"

 

 

Platform

VISION has been tested on Windows 2000.

 

 

5. License

 

VISION is under the GNU General Public License (GPL).

 

Copyright (C) 2004, ISB

See copying.

 

 

6. Credits

 

The author would like to acknowledge the following for their contributions:

 

 

 

7. Contact

Dave Orendorff wrote this manual. You can find him (until ~2007) at:

dweo@u.washington.edu

 

 

 

* babyfunction – any function a baby could write, including functions like “newImage”, “getPixelAt”…