Deep Learning on FPGAs
High Speed, Bandwidth, and Predictive Accuracy
Deep learning is a branch of artificial intelligence in which a computer model automatically learns the features that distinguish objects and then performs classification tasks directly on images, videos, audio, or text. Neural network architectures are typically used for this. A network with a large number of hidden layers is called a deep neural network; the CNN (Convolutional Neural Network) is one example.
To classify objects or defects, such a network extracts features hierarchically. The early layers capture general features (edges, color, changes in brightness), and the deeper layers capture specific, application-dependent features. In image processing, the CNN architecture is well suited to processing images and videos as 2D as well as 3D data (e.g., stereo vision, laser triangulation, time of flight, and blob tables).
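The "general features such as edges" that early CNN layers respond to come from convolving the image with small kernels. The following is a minimal pure-Python sketch of one such convolution. The kernel here is a hand-picked, Sobel-like vertical-edge filter chosen for illustration; in a trained CNN these weights are learned, and real layers apply many kernels across several channels.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (lists of lists) and return the
    valid-mode cross-correlation (no kernel flip, no padding, stride 1),
    which is what CNN convolution layers compute."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(acc)
        out.append(row)
    return out


# Illustrative vertical-edge kernel; a trained network learns its own weights.
VERTICAL_EDGE = [[-1, 0, 1],
                 [-2, 0, 2],
                 [-1, 0, 1]]

# 5x5 test image: dark left part, bright right part, i.e. one vertical edge.
image = [[0, 0, 10, 10, 10] for _ in range(5)]
response = conv2d(image, VERTICAL_EDGE)
```

Each output row is `[40.0, 40.0, 0.0]`: the filter responds strongly wherever its window straddles the edge and gives zero over the uniform bright region.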
Deep learning requires large amounts of pre-classified (labeled) training data as well as high processing capacity. In general, the more data used for training, the higher the predictive accuracy.
High-speed deep learning application on an FPGA for defect detection on reflective surfaces
A deep learning workflow has three stages: training the neural network; implementing the network, for example on an FPGA; and inference, i.e., executing the trained CNN on images and outputting a classification result. FPGA technology adds little value during training but considerably more during inference. A demonstrator for defect detection on reflective metallic surfaces has shown that a deep learning solution built with VisualApplets runs on an FPGA in real time and at high bandwidth. It achieved a throughput of 220 MB/sec with a classification accuracy of 99.4%, without any additional CPU or ARM processor. Bandwidth and accuracy can be increased further by using frame grabbers with larger FPGA resources. The example shown here involves inspecting metallic surfaces for six defect classes.
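The last step of inference, turning the network's output into a classification result, can be sketched as follows. This is an illustrative assumption, not the demonstrator's design: the article does not name the six defect classes (the labels below are placeholders), and it does not say whether the FPGA computes a softmax confidence or only picks the highest-scoring class. Both choices yield the same label, because softmax preserves the ordering of the scores.

```python
import math

# Placeholder labels; the article does not name the six defect classes.
DEFECT_CLASSES = [f"defect_class_{i}" for i in range(6)]


def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1.
    Subtracting the maximum keeps exp() numerically stable."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Map one image's final-layer scores to (label, confidence)."""
    if len(scores) != len(DEFECT_CLASSES):
        raise ValueError("expected one score per defect class")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return DEFECT_CLASSES[best], probs[best]


# Hypothetical scores for one 6-class output vector.
label, confidence = classify([0.1, 2.5, -1.0, 0.3, 0.0, 1.2])
```

Here the second score is the largest, so the result is `defect_class_1` with a confidence of roughly 0.63.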
Deep Learning Hardware
Deep learning achieves the processing speed that production requires and can already replace existing solutions today. For inline inspection, only inference is needed. New, higher-performance Camera Link frame grabbers such as the microEnable 5 marathon deepVCL, equipped with a CNN runtime license, already include larger FPGAs with the processing power and bandwidth that deep learning applications require. Larger FPGA resources make it possible to implement more complex architectures and thus applications that also run at higher bandwidths. Future generations of frame grabbers will be equipped from the outset with enough power for inference on very large neural networks. Users can continue to use their existing image processing system of cameras, cables, lighting, sensors, and actuators.
CNN-ready Camera Link frame grabber with greater FPGA resources
Image Preprocessing for Even More Resources
With the additional FPGA resources, several processing steps can run in parallel, with greater precision, or at higher bit depth. The higher processing bandwidth allows entire images to be processed and supports more complex user designs such as loop processing, several parallel image outputs, and more image preprocessing on the FPGA. Preprocessing such as reducing the resolution or localizing regions of interest (blob analysis) reduces the data volume very effectively. This frees bandwidth and makes it possible to use smaller or deeper networks, or smaller FPGAs, which often suffice for simple image processing tasks.
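The data reduction from preprocessing can be illustrated with a small sketch: 2×2 binning to reduce resolution, followed by a crude localization step that crops to the bounding box of bright pixels. This is a simplified stand-in written in Python for readability, not VisualApplets operators: `blob_bbox` treats every above-threshold pixel as one blob instead of performing real connected-component labeling, and the frame size and threshold are arbitrary example values.

```python
def bin2x2(image):
    """Reduce resolution by averaging each 2x2 block (halves width and
    height, so the pixel count drops by a factor of 4). An odd last row
    or column is ignored."""
    h = len(image) // 2 * 2
    w = len(image[0]) // 2 * 2
    return [[(image[y][x] + image[y][x + 1]
              + image[y + 1][x] + image[y + 1][x + 1]) / 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def blob_bbox(image, threshold):
    """Inclusive bounding box (x0, y0, x1, y1) of all pixels above
    `threshold`, or None if no pixel qualifies. A crude stand-in for
    blob analysis: all bright pixels are treated as a single blob."""
    coords = [(x, y) for y, row in enumerate(image)
              for x, v in enumerate(row) if v > threshold]
    if not coords:
        return None
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    return min(xs), min(ys), max(xs), max(ys)


def crop(image, box):
    """Cut out the region of interest given by an inclusive bounding box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1 + 1] for row in image[y0:y1 + 1]]


# 8x8 synthetic frame (64 pixels) with one bright 2x2 defect candidate.
frame = [[0] * 8 for _ in range(8)]
for y in (4, 5):
    for x in (2, 3):
        frame[y][x] = 200

small = bin2x2(frame)            # 4x4 = 16 pixels
box = blob_bbox(small, 50)
roi = crop(small, box)           # only the candidate region remains
```

In this toy case the data passed on shrinks from 64 pixels to 16 after binning and to a single pixel after cropping. Real reduction factors depend on the image content and the chosen resolution.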
For embedded vision applications, the approach supports not only frame grabbers but also third-party FPGA devices that are compatible with VisualApplets, such as cameras and vision sensors. To this end, VisualApplets Embedder sets up a compatibility layer between the hardware and the VisualApplets programming core. This reserved part of the FPGA can be reprogrammed with VisualApplets as often as desired and can be used, for example, for decentralized deep learning applications.
Deep Learning Software
Integration of network architectures and weights in VisualApplets
Using graphical FPGA programming with VisualApplets, network architectures of varying size and complexity can be integrated, and pre-trained network weights can be loaded as configuration parameters for a variety of image processing applications. Network definitions and parameters from third-party software and training tools such as TensorFlow can also be imported. As long as the network architecture remains unchanged, new weights are easy to load, so retraining, for example for new workpieces in production, requires relatively little effort. If the test environment or the objects change, the network is retrained and reloaded either as a new weight parameter set or as a new network.
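Retraining is cheap when the architecture stays fixed because a new weight set only has to fit the existing layers. The sketch below is hypothetical and does not reflect VisualApplets' API or weight file format: it describes an architecture as layer names mapped to weight-tensor shapes (example values) and accepts a new, flattened weight set only if every layer matches. If it does not match, the architecture itself has changed and the network would have to be loaded as a new network.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical architecture: layer name -> weight tensor shape.
ARCHITECTURE: Dict[str, Tuple[int, ...]] = {
    "conv1": (16, 3, 3, 3),   # 16 filters, 3 input channels, 3x3 kernels
    "conv2": (32, 16, 3, 3),  # 32 filters over the 16 conv1 feature maps
    "fc": (6, 32),            # six output classes
}


def validate_weights(arch: Dict[str, Tuple[int, ...]],
                     weights: Dict[str, List[float]]) -> None:
    """Raise ValueError unless `weights` has exactly the layers of `arch`
    and each flattened layer holds the expected number of values."""
    if set(weights) != set(arch):
        missing = sorted(set(arch) - set(weights))
        extra = sorted(set(weights) - set(arch))
        raise ValueError(f"layer mismatch: missing={missing}, extra={extra}")
    for name, shape in arch.items():
        expected = math.prod(shape)
        if len(weights[name]) != expected:
            raise ValueError(
                f"{name}: expected {expected} values, got {len(weights[name])}")


def make_weights(arch: Dict[str, Tuple[int, ...]],
                 value: float = 0.0) -> Dict[str, List[float]]:
    """Build a dummy flattened weight set that fits `arch`."""
    return {name: [value] * math.prod(shape) for name, shape in arch.items()}
```

A retrained weight set with the same shapes passes the check and can simply replace the old one; a set whose output layer has, say, seven classes instead of six is rejected.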
In the VisualApplets user interface, users see the mapping and configuration of a neural network as a CNN operator. The specific CNN is connected between the camera operator, which serves as the image source, and the DMA operator, which transfers the data to the PC. Additional image optimizations can be integrated as preprocessing operators (such as image enhancement or cropping) or as post-processing operators. Signal processing operators control external peripherals and can also trigger the ejection of rejected parts. The entire application design can be simulated and synthesized, and it runs in real time at the defined speed with the lowest possible latency.
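The operator chain described above (image source, preprocessing, CNN, DMA transfer) behaves like a streaming data flow. The following sketch models it with Python generators so that each frame passes through every stage before the next frame is pulled. All function names are hypothetical stand-ins for the VisualApplets operators, and `cnn` is only a mean-brightness threshold placeholder, not a neural network. On the FPGA, the stages run concurrently in hardware.

```python
def camera(frames):
    """Image source: emit frames one at a time."""
    yield from frames


def preprocess(stream):
    """Placeholder image enhancement: double the contrast, clip at 255."""
    for frame in stream:
        yield [[min(255, 2 * v) for v in row] for row in frame]


def cnn(stream):
    """Stand-in classifier (NOT a neural network): flag bright frames."""
    for frame in stream:
        mean = sum(map(sum, frame)) / (len(frame) * len(frame[0]))
        yield "defect" if mean > 100 else "ok"


def dma(stream):
    """Transfer to the host PC: here, simply collect the results."""
    return list(stream)


def run_pipeline(source, *operators):
    """Connect operators in order, like edges in a data flow diagram."""
    stream = source
    for op in operators:
        stream = op(stream)
    return stream


frames = [[[10, 10], [10, 10]], [[80, 80], [80, 80]]]
results = run_pipeline(camera(frames), preprocess, cnn, dma)
```

The first frame stays dark after enhancement and is classified `"ok"`; the second becomes bright and is classified `"defect"`. Swapping in a different preprocessing or post-processing function changes the design without touching the other stages, which mirrors the modularity of the data flow diagram.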
Graphical programming with VisualApplets using data flow diagrams gives software developers and application engineers access to image processing projects such as deep learning. Compared with HDL programming (VHDL/Verilog), it greatly reduces their development time and thus their time to market.
Deep Learning in Test Mode
A current demonstrator environment for identifying six different defect classes on steel surfaces used a microEnable 5 marathon VCL. The steel surfaces were captured with a color line-scan camera at 220 MB/sec and analyzed by a neural network implemented with VisualApplets. Training used 200 images per defect class, each 300×300 pixels. Inference likewise ran at 220 MB/sec, so all captured data could be processed, with a predictive accuracy of 99.4%.
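A quick back-of-the-envelope calculation shows what these figures imply. Several assumptions are not stated in the article and are labeled here: 1 MB = 10^6 bytes, 3 bytes per pixel (8-bit RGB from the color camera), and inference on non-overlapping tiles of the same 300×300 size used for training.

```python
# ASSUMPTIONS (not stated in the article): 1 MB = 10**6 bytes,
# 3 bytes per pixel (8-bit RGB), inference on non-overlapping 300x300 tiles.
THROUGHPUT_BYTES_PER_S = 220 * 10**6
TILE_W = TILE_H = 300
BYTES_PER_PIXEL = 3

# From the article: six defect classes, 200 training images per class.
CLASSES = 6
IMAGES_PER_CLASS = 200

tile_bytes = TILE_W * TILE_H * BYTES_PER_PIXEL        # bytes per tile
tiles_per_s = THROUGHPUT_BYTES_PER_S / tile_bytes     # tiles per second
time_budget_us = 1e6 / tiles_per_s                    # average microseconds per tile
training_images = CLASSES * IMAGES_PER_CLASS          # size of the training set
```

Under these assumptions each tile is 270,000 bytes, so the network must classify about 815 tiles per second, an average budget of roughly 1,227 µs per tile. Because an FPGA pipeline processes several tiles at once, this is a required average rate rather than the latency of a single tile. The training set comprises 1,200 images in total.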
Training took 3 days; implementing the design on the FPGA with VisualApplets, including creating a customized network and image enhancement steps, took 2 days.
New frame grabbers with larger FPGAs and more resources (such as the microEnable 5 marathon deepVCL) can raise the data throughput significantly while also allowing deeper networks to be designed and implemented.
Areas of Application
Deep learning is already used today for pattern and object recognition with classification. The method achieves its best results with variable objects, in identifying defects or anomalies, and on difficult surfaces, including transparent and reflective ones. In manufacturing, a machine can thus handle a wide range of product variants even under changing ambient conditions. Deep learning is also used successfully in condition monitoring and predictive maintenance. Further areas of application include inline inspection, robotics, pick and place, autonomous driving and driver assistance systems, drones, satellite imaging, agriculture, medical technology, cellular research, and cognitive systems that work with humans, such as those used in human-machine interaction (HMI).
| Machine Vision | Embedded Vision | Non-Industrial |
|---|---|---|
| Inline Inspection | Manufacturing | Medical Technology |
| Cognitive Systems | Drones | Satellite Imaging |
| Human-machine Interaction (HMI) | Pick and Place | Cellular Research |