arXiv:1705.02583 [cs.LG]

A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA

Xinyu Zhang, Srinjoy Das, Ojash Neopane, Ken Kreutz-Delgado

Published 2017-05-07, Version 1

In recent years, deep learning algorithms have shown extremely high performance on machine learning tasks such as image classification and speech recognition. In support of such applications, various FPGA accelerator architectures have been proposed for convolutional neural networks (CNNs) that enable high performance for classification tasks at lower power than CPU and GPU processors. However, to date, there has been little research on FPGA implementations of deconvolutional neural networks (DCNNs). DCNNs, also known as generative CNNs, encode high-dimensional probability distributions and have been widely used for computer vision applications such as scene completion, scene segmentation, image creation, image denoising, and super-resolution imaging. We propose an FPGA architecture for deconvolutional networks built around an accelerator that effectively handles the complex memory access patterns needed to perform strided deconvolutions and that also supports convolution. We also develop a three-step design optimization method that systematically exploits statistical analysis, design space exploration, and VLSI optimization. To verify our FPGA deconvolutional accelerator design methodology, we train DCNNs offline on two representative datasets using the generative adversarial network (GAN) method in TensorFlow, and then map these DCNNs to an FPGA DCNN-plus-accelerator implementation to perform generative inference on a Xilinx Zynq-7000 FPGA. Our DCNN implementation achieves a peak performance density of 0.012 GOPs/DSP.
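To make the term "strided deconvolution" concrete, here is a minimal pure-Python sketch of the textbook operation (a transposed convolution without padding). It is an illustration only and is not the accelerator described in the abstract; the function name `deconv2d`, the square-kernel assumption, and the sample values are all hypothetical. It shows that each input pixel scatters a scaled copy of the kernel into the output, and that these copies overlap whenever the kernel is larger than the stride, which is one plausible source of the irregular memory accesses the abstract refers to.

```python
def deconv2d(x, w, stride):
    """Strided transposed convolution of a 2D input x with a square kernel w.

    Hypothetical helper for illustration: no padding, single channel.
    """
    h_in, w_in = len(x), len(x[0])
    k = len(w)  # square k-by-k kernel assumed
    h_out = (h_in - 1) * stride + k
    w_out = (w_in - 1) * stride + k
    y = [[0.0] * w_out for _ in range(h_out)]
    for i in range(h_in):
        for j in range(w_in):
            for a in range(k):
                for b in range(k):
                    # Each input pixel writes a scaled kernel copy at an
                    # offset of `stride`; overlapping copies accumulate.
                    y[i * stride + a][j * stride + b] += x[i][j] * w[a][b]
    return y


x = [[1.0, 2.0], [3.0, 4.0]]          # 2x2 input (sample values)
w = [[1.0, 1.0, 1.0] for _ in range(3)]  # 3x3 all-ones kernel
y = deconv2d(x, w, stride=2)          # 5x5 output
```

With a 3x3 kernel and stride 2, the centre output element receives a contribution from all four input pixels, while the corners receive one each. A hardware design has to manage these accumulating, overlapping writes in some way.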

Categories: cs.LG, cs.NE

Keywords: deconvolutional neural networks, efficient implementation, memory access patterns, FPGA deconvolutional accelerator design methodology, high performance

Related articles:

arXiv:2104.01303 [cs.LG] (Published 2021-04-03)

Tight Compression: Compressing CNN Through Fine-Grained Pruning and Weight Permutation for Efficient Implementation

Xizi Chen, Jingyang Zhu, Jingbo Jiang, Chi-Ying Tsui

arXiv:2105.01196 [cs.LG] (Published 2021-05-03)

EBIC.JL -- an Efficient Implementation of Evolutionary Biclustering Algorithm in Julia

Paweł Renc, Patryk Orzechowski, Aleksander Byrski, Jarosław Wąs, Jason H. Moore

arXiv:1910.01382 [cs.LG] (Published 2019-10-03)

Silas: High Performance, Explainable and Verifiable Machine Learning

Hadrien Bride, Zhe Hou, Jie Dong, Jin Song Dong, Ali Mirjalili