{ "id": "2304.10891", "version": "v1", "published": "2023-04-21T11:15:31.000Z", "updated": "2023-04-21T11:15:31.000Z", "title": "Transformer-based models and hardware acceleration analysis in autonomous driving: A survey", "authors": [ "Juan Zhong", "Zheng Liu", "Xi Chen" ], "categories": [ "cs.LG", "cs.AI", "cs.CV", "cs.RO", "cs.SY", "eess.SY" ], "abstract": "Transformer architectures have exhibited promising performance in various autonomous driving applications in recent years. Meanwhile, their dedicated hardware acceleration on portable computational platforms has become the next critical step for practical deployment in real autonomous vehicles. This survey paper provides a comprehensive overview, benchmark, and analysis of Transformer-based models specifically tailored for autonomous driving tasks such as lane detection, segmentation, tracking, planning, and decision-making. We review different architectures for organizing Transformer inputs and outputs, such as encoder-decoder and encoder-only structures, and explore their respective advantages and disadvantages. Furthermore, we discuss Transformer-related operators and their hardware acceleration schemes in depth, taking into account key factors such as quantization and runtime. We specifically illustrate the operator level comparison between layers from convolutional neural networks, Swin-Transformer, and Transformer with 4D encoder. 
The paper also highlights the challenges, trends, and current insights in Transformer-based models, addressing their hardware deployment and acceleration issues within the context of long-term autonomous driving applications.", "revisions": [ { "version": "v1", "updated": "2023-04-21T11:15:31.000Z" } ], "analyses": { "keywords": [ "hardware acceleration analysis", "transformer-based models", "autonomous driving applications", "convolutional neural network", "operator level comparison" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }