
Deep Learning Hardware and Software with FPGA


Deep neural networks such as CNNs are conquering image processing.

Deep learning complements conventional algorithm-based image processing and excels through highly reliable recognition rates at very high bandwidths.

In deep learning, a branch of artificial intelligence, a computer model automatically learns the characteristics that differentiate objects and then executes classification tasks directly on images, videos, audio, or text. Neural network architectures are typically used for this. If such a network has a large number of hidden layers, it is referred to as a deep neural network, such as a CNN (Convolutional Neural Network).

To classify objects or defects, such a network extracts increasingly specific characteristics: general features (edges, color, changes in light) in the upper layers through to particular, application-specific features in the deeper ones. In image processing, the CNN architecture is well suited for processing images and videos as both 2D and 3D data (e.g., stereo vision, laser triangulation, time of flight, and blob tables).
Deep learning requires large amounts of pre-classified data that the network learns during training, as well as high processing capacity. The more data used for training, the higher the predictive accuracy.
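The layered feature extraction described above starts from generic operations. Its core building block, the 2D convolution, can be sketched in a few lines; the Sobel kernel below is only an illustrative stand-in for the edge-like filters that early CNN layers typically learn (the article's actual network is not shown):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution, the core operation of a CNN layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Early CNN layers learn kernels resembling generic edge detectors,
# illustrated here with a fixed Sobel kernel (an assumption, not a
# learned weight):
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

# A toy 8x8 "image" with a vertical brightness step (a simple edge)
img = np.zeros((8, 8))
img[:, 4:] = 1.0

edges = np.maximum(conv2d(img, sobel_x), 0)  # ReLU activation
```

Deeper layers stack many such convolutions, which is why FPGA resources and bandwidth become the limiting factors discussed below.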

High-speed deep learning application on an FPGA for defect detection on reflective surfaces


Deep learning distinguishes between training the neural network, implementing the network — for example, on an FPGA — and inference, i.e., executing the CNN algorithm on images and outputting a classification result. FPGA technology offers little added value during training, but considerably more during inference. A demonstrator for defect detection on metallic reflective surfaces has shown how a deep learning solution built with VisualApplets runs on an FPGA in real time and at high bandwidth: a throughput of 220 MB/sec with 99.4% classification accuracy was achieved, with no additional CPU or ARM processors required. Bandwidth and accuracy can be increased markedly by using frame grabbers with greater FPGA resources. The example shown here involves inspecting metallic surfaces and analyzing six defect classes.


Deep Learning Hardware

Deep learning achieves the processing speed necessary for production and can already replace existing solutions today; inference on the FPGA satisfies the demands of inline inspection. Newer high-performance Camera Link frame grabbers such as the microEnable 5 marathon deepVCL, equipped with a CNN runtime license, already include larger FPGAs with the high processing power and bandwidth needed for deep learning applications. Larger FPGA resources make it possible to implement more complex architectures, and thus more complex applications, at concurrently higher bandwidths. Future generations of frame grabbers will be equipped from the outset with appropriate power for the inference of very large neural networks. Users can continue to use their existing image processing system consisting of cameras, cables, lighting, sensors, and actuators.

Image Preprocessing for Even More Resources
With the additional FPGA resources, various processing steps can be carried out at once, with greater precision, or at higher bit depth. The higher processing bandwidth enables processing of entire images, more complex user designs such as loop processing, several parallel image outputs, and more image preprocessing on the FPGA. Preprocessing that reduces resolution and localizes defect candidates (blob analysis) cuts the data throughput very effectively, allowing the bandwidth to be increased and smaller networks, deeper networks, or smaller FPGAs to be used, which often suffice for simple image processing tasks.
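The data-reduction step described above can be sketched as follows. This is an assumed, simplified model of the preprocessing (a thresholded bounding box standing in for blob analysis, plus subsampling), not VisualApplets code:

```python
import numpy as np

def preprocess(frame, thresh=0.5, scale=2):
    """Sketch of FPGA-style preprocessing: localize the region of
    interest via a simple blob analysis (thresholded bounding box),
    then reduce resolution. Only the small ROI is passed on to the CNN."""
    mask = frame > thresh
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # nothing to classify in this frame
    roi = frame[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    return roi[::scale, ::scale]  # resolution reduction

frame = np.zeros((1024, 1024))
frame[100:164, 200:264] = 1.0      # a bright 64x64 defect candidate
roi = preprocess(frame)
reduction = frame.size / roi.size  # data-throughput reduction factor
```

In this toy case the CNN receives roughly a thousandth of the raw pixel data, which is the effect that lets the bandwidth rise or the network grow deeper on the same FPGA.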

CNN-ready Camera Link frame grabber with greater FPGA resources


For embedded Vision applications, the approach supports, alongside frame grabbers, third-party FPGA devices that are compatible with VisualApplets, such as cameras and vision sensors. To this end, VisualApplets Embedder sets up a compatibility layer between the hardware and the VisualApplets programming core. This reserved part of the FPGA can be reprogrammed with VisualApplets as often as desired and can, for example, be used for decentralized deep learning applications.

Deep Learning Software

Integration of network architectures and weights in VisualApplets


Using graphical FPGA programming with VisualApplets, network architectures of varying sizes and complexities can be integrated and pre-trained configuration parameters for the network weights can be loaded for a variety of image processing applications. In doing so, network and parameter information from third-party software and training tools such as TensorFlow can also be imported. New weights are easy to load as long as the network itself remains unchanged, so retraining, for example for new workpieces in production, requires relatively little effort. Should the test environment or the objects change, the retraining results are loaded either as a new weights parameter set or as a new network.
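The separation between a fixed architecture and exchangeable weights can be illustrated as below. The file name, layer names, and shapes are hypothetical, and the real VisualApplets import format is not shown here; the sketch only demonstrates dumping all trained parameters as one flat set so a retrained network can be reloaded without touching the architecture:

```python
import numpy as np

# Hypothetical trained parameters for a small network with 6 output
# classes (matching the article's six defect classes). Shapes and
# names are illustrative assumptions only.
weights = {
    "conv1": np.random.rand(3, 3, 1, 8).astype(np.float32),
    "conv2": np.random.rand(3, 3, 8, 16).astype(np.float32),
    "dense": np.random.rand(16, 6).astype(np.float32),
}

# Flatten everything into one parameter set; as long as the network is
# unchanged, only this file needs replacing after retraining.
flat = np.concatenate([w.ravel() for w in weights.values()])
flat.tofile("weights.bin")
```

If the architecture itself changes, this scheme no longer suffices and the design must be loaded as a new network, as the text notes.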

Within the VisualApplets user interface, users see the mapping and configuration of a neural network as a CNN operator. The specific CNN is connected between the camera operator as the image source and the DMA operator, which transfers the data to the PC. Additional image optimizations can be integrated as preprocessing (such as image enhancement or detail extraction) or post-processing operators. Signal processing operators control external peripherals and can, for example, trigger the ejection of defective parts. The entire application design can be simulated as well as synthesized and always runs in real-time operation at the defined speed with the lowest possible latencies.
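The operator chain described above can be modeled abstractly as a data-flow pipeline. This is not VisualApplets code (which is graphical); the stand-in functions below merely illustrate how each operator consumes the previous operator's output, from image source to DMA transfer:

```python
# Illustrative stand-ins for the operators in the design
# (names and behavior are assumptions, not the VisualApplets API):
def camera():        return "raw frame"          # image source
def preprocess(x):   return f"enhanced({x})"     # optional preprocessing
def cnn(x):          return f"class_of({x})"     # CNN operator
def dma(x):          return f"to_pc({x})"        # transfer to the PC

# The application design as an ordered chain of operators
pipeline = [camera, preprocess, cnn, dma]

result = pipeline[0]()
for op in pipeline[1:]:
    result = op(result)
```

On the FPGA these stages run concurrently as a hardware pipeline rather than sequentially, which is what yields the fixed speed and low latency mentioned above.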
Graphical programming with VisualApplets using data flow diagrams opens up image processing projects such as deep learning to software developers and application engineers, greatly reducing development time, and thus time to market, compared with HDL programming (VHDL/Verilog).

Deep Learning in Test Mode

In a current demonstrator environment for identifying six different defect classes on steel surfaces, a microEnable 5 marathon VCL was used. The steel surfaces were captured by a color line scan camera at 220 MB/sec and analyzed by a neural network implemented with VisualApplets. Training used 200 images per defect class at a size of 300×300 pixels. Inference likewise ran at 220 MB/sec, processing all recorded data and achieving a predictive accuracy of 99.4%.
Training took 3 days; implementation on the FPGA using VisualApplets, including creation of a customized network and image enhancement steps, took 2 days.
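A rough plausibility check of the demonstrator's figures (assuming 1 byte per pixel per classified 300×300 patch; the camera is a color line scan camera, so the real per-image byte count may differ):

```python
# Back-of-the-envelope throughput estimate under assumed pixel format
bandwidth = 220e6            # bytes per second, from the demonstrator
patch = 300 * 300            # pixels per classified patch (1 byte each, assumed)
patches_per_sec = bandwidth / patch  # roughly 2,400 patches per second
```

Even under these simplified assumptions, thousands of patches per second must be classified, which is why inference has to run in the FPGA's hardware pipeline rather than on a CPU.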
By exploiting the potential of new frame grabbers with larger FPGAs and more resources (such as the microEnable 5 marathon deepVCL), the data throughput rate can be raised significantly while, at the same time, deeper networks can be designed and implemented.

Real-time classification of defects on metallic surfaces


Deep Learning with Silicon Software Has Many Advantages

Deep learning in VisualApplets enables the use of neural networks with FPGA technology for applications with industrial demands on real-time capability and low latencies (important for inline inspection), data throughput and bandwidth, and low heat output (important for embedded Vision).

Deep learning applications are easy to manage and quick to implement with VisualApplets — and the result is a hardware-programmed real-time application.

Deep learning on FPGAs achieves high speed and bandwidth concurrently with a high rate of identification precision (between 98.4 and 99% in demonstrators).

Users profit from long-term savings due to low overall system costs and rapid customizability.

The FPGA approach delivers immediate results in real time (with the lowest possible latencies) even for high-resolution images: from image acquisition to image preprocessing (data reduction, image optimization), directly to resulting images.

By optimizing network architectures, the FPGA can achieve prediction probabilities of over 98% even with small networks, at concurrent data throughputs of over 200 MB/sec (current demonstration environment using Camera Link). Deep learning thus becomes implementable for inline inspection and in the embedded area as well.

VisualApplets is based upon a graphical user interface using flow diagrams. FPGA programming with it requires no knowledge of hardware programming. For the use of deep learning in embedded devices (embedded Vision), VisualApplets can run on third-party FPGA devices such as cameras and vision sensors. These are flexibly equipped with the required application-specific intelligence and can be reprogrammed repeatedly. Application designs can be transferred to other devices.

The long-term availability (10 years and more) of FPGAs, frame grabbers, and the VisualApplets graphical programming environment guarantees a high degree of investment security.