JPFix

 Digital Photo Retouching & Enhancement Service

FPGA-Based Image Processing Accelerator

Image manipulation (digital retouching) can consume a large share of processor time, and the same processor must also handle other tasks, such as rendering the image on the screen. This matters most for interactive applications such as photo editors, where we cannot rely on parallel multitasking or batch processing.

A lot of research has been carried out in recent years into accelerating image processing algorithms with dedicated graphics processors (GPUs). Unfortunately, this approach is very expensive, because creating new silicon chips requires a huge engineering effort. On the other hand, a GPU, like any processor, executes instructions step by step, and its performance is still limited by clock frequency.

We can dramatically increase performance by using reconfigurable hardware logic devices such as FPGAs (Field-Programmable Gate Arrays).
An FPGA device can contain millions of logic gates, which are the basic building blocks of digital circuits.
FPGAs hold several advantages over CPUs when it comes to image processing. While they often run at much lower clock speeds, the parallel nature of hardware logic allows FPGAs to execute certain algorithms much faster than a regular CPU.

For example, let's create a simple low-pass filter. Such a filter can be used to soften an image.

Fig. 1

Look at the code (the original C source and the CPU instructions generated by a compiler):

Fig. 2

We see that 14 CPU instructions are needed to process a single pixel, and during this time the CPU cannot do anything else.
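
As an illustration of the kind of filter discussed here, a one-dimensional, single-channel low-pass filter might be written in C as follows. This is a minimal sketch, not the code shown in Fig. 2: the 1-2-1 kernel, the function name lowpass_1d and the edge handling are our assumptions, so the instruction count above does not necessarily apply to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative 1-2-1 low-pass (softening) filter over one row of 8-bit
 * samples. Kernel, name and edge handling are assumptions made for this
 * sketch. Edge samples are copied unchanged. */
void lowpass_1d(const uint8_t *src, uint8_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    if (n == 0)
        return;

    dst[0] = src[0];
    if (n == 1)
        return;

    for (size_t i = 1; i + 1 < n; i++) {
        /* Weighted sum of the left neighbour, the pixel itself (counted
         * twice) and the right neighbour. */
        unsigned sum = (unsigned)src[i - 1] + 2u * src[i] + src[i + 1];
        /* Divide by 4 with rounding; a shift keeps the loop cheap. */
        dst[i] = (uint8_t)((sum + 2u) >> 2);
    }
    dst[n - 1] = src[n - 1];
}
```

Calling lowpass_1d on the row {0, 0, 100, 0, 0} gives {0, 25, 50, 25, 0}: the single bright pixel is spread over its neighbours.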

Our example demonstrates only a one-dimensional, single-color filter. A real image requires two-dimensional, three-color vector processing. But CPUs are oriented toward scalar data, and this dramatically increases the number of instructions.
Typical image processing operations such as Fourier transforms, statistical analysis and geometric transformations consume much more CPU time. As a result, each stage of the image editing process can take minutes, which is not acceptable for interactive applications.
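
To make the jump in cost concrete, here is a hedged sketch of a two-dimensional, three-channel filter. The 3 x 3 box kernel, the interleaved RGB layout and the name box_blur_rgb are assumptions chosen for illustration. Each interior output pixel now needs 9 reads and additions per channel, 27 in total, before rounding and dividing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CHANNELS = 3 }; /* assumed interleaved R, G, B samples */

/* Illustrative 3x3 box blur on an 8-bit interleaved RGB image.
 * Border pixels are copied unchanged to keep the sketch short. */
void box_blur_rgb(const uint8_t *src, uint8_t *dst,
                  size_t width, size_t height)
{
    assert(src != NULL && dst != NULL);

    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            for (size_t c = 0; c < CHANNELS; c++) {
                size_t idx = (y * width + x) * CHANNELS + c;

                if (x == 0 || y == 0 || x + 1 == width || y + 1 == height) {
                    dst[idx] = src[idx];
                    continue;
                }

                /* A scalar CPU visits the nine neighbours one by one,
                 * and repeats the whole walk for every channel. */
                unsigned sum = 0;
                for (size_t ky = y - 1; ky <= y + 1; ky++)
                    for (size_t kx = x - 1; kx <= x + 1; kx++)
                        sum += src[(ky * width + kx) * CHANNELS + c];

                dst[idx] = (uint8_t)((sum + 4u) / 9u); /* rounded mean */
            }
        }
    }
}
```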

Working with an FPGA, we can allocate a small part of the silicon chip to our filter. This part will work in parallel with the other parts, and the number of vector components will affect only the complexity of the circuit, not the performance.
FPGAs, being reprogrammable, allow us to program parts of the design and test them at any stage of the design process. If we need to change something, we can immediately reprogram that part. FPGAs also allow us to implement hardware computing functions that were previously impossible, especially for real-time applications.

The best result can be achieved by combining a CPU or GPU with FPGA devices. In such a configuration the FPGA processes a single pixel or multiple pixels (an array) within a few clock periods, while the CPU controls the data flow and overall operation. In this case, the processor acts as a "master" unit that directs the flow of data to the "slave" FPGA device and reassembles the image data at each stage of the job.
The design must incorporate enough programmable gates for processing and enough SRAM for holding data to support images of up to 4096 x 4096 pixels.
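
The division of labour between the CPU and the FPGA can be sketched in software. In the sketch below the FPGA is replaced by an ordinary C function passed as a pointer; the names strip_kernel, invert_strip and process_image, and the idea of splitting the data into fixed-size strips, are hypothetical and stand in for whatever transfer mechanism a real board would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the FPGA: processes one strip of samples. */
typedef void (*strip_kernel)(const uint8_t *in, uint8_t *out, size_t count);

/* Example kernel used only for demonstration: inverts every sample. */
void invert_strip(const uint8_t *in, uint8_t *out, size_t count)
{
    for (size_t i = 0; i < count; i++)
        out[i] = (uint8_t)(255 - in[i]);
}

/* CPU-side control loop: cuts the image into strips, hands each strip to
 * the kernel and lets the results land at the matching offset in dst, so
 * the image is reassembled as the strips come back.
 * Returns the number of strips dispatched. */
size_t process_image(const uint8_t *src, uint8_t *dst, size_t total,
                     size_t strip_len, strip_kernel kernel)
{
    assert(src != NULL && dst != NULL && kernel != NULL && strip_len > 0);

    size_t strips = 0;
    for (size_t off = 0; off < total; off += strip_len) {
        size_t remaining = total - off;
        size_t count = remaining < strip_len ? remaining : strip_len;
        kernel(src + off, dst + off, count);
        strips++;
    }
    return strips;
}
```

With real hardware the call to kernel would become a transfer to the device and a wait for its result; the control loop on the CPU would keep the same shape.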

An image of this size requires a large amount of memory: at least 50 MB for the original image alone. Calculations require 16-bit accuracy and up to 20 buffers for intermediate results, so we need 1-2 GB of memory. Fortunately, we do not need all of this on board: a PCI Express interface allows the device to use the host computer's conventional memory.
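
These figures can be checked with a few lines of arithmetic. The sketch below assumes three color channels and an 8-bit original image, which the text does not state explicitly; buffer_bytes and working_set_bytes are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of one image buffer. */
uint64_t buffer_bytes(uint64_t width, uint64_t height,
                      uint64_t channels, uint64_t bytes_per_sample)
{
    assert(width > 0 && height > 0 && channels > 0 && bytes_per_sample > 0);
    return width * height * channels * bytes_per_sample;
}

/* Total size of several intermediate buffers of identical shape. */
uint64_t working_set_bytes(uint64_t buffers, uint64_t width, uint64_t height,
                           uint64_t channels, uint64_t bytes_per_sample)
{
    return buffers * buffer_bytes(width, height, channels, bytes_per_sample);
}
```

Under these assumptions, buffer_bytes(4096, 4096, 3, 1) is 50,331,648 bytes, about 50 MB for the original image, and working_set_bytes(20, 4096, 4096, 3, 2) is 2,013,265,920 bytes, about 2 GB, which is the upper end of the 1-2 GB estimate.
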
The biggest obstacle to migrating from software to hardware is the difference in design concepts. Historically, software designers use the C language, while FPGA designers use VHDL, Verilog or even schematic diagrams. The newest design tools can turn C code into an FPGA design (see the article "Generate FPGA accelerators from C"), so we are now able to move software parts of our system to the hardware level.

© 2006 JPFix. All rights reserved. JPFix is a registered trademark of JPFix, LLC.