AI and FPGA: Stronger Together? Hyperparameter Optimization for Deep Learning With FPGAs

A guest comment from Sebastian Gerstl | Translated by AI

The use of neural networks in embedded systems is challenging due to energy, resource, and runtime requirements. Thanks to their configurability, FPGAs offer more flexibility than GPUs without necessarily requiring more energy, provided they are optimized for the task at hand.

(Image: freely licensed / Pixabay)

Implementing neural networks in embedded systems is challenging due to constraints such as energy limitations, availability of computing resources, and runtime requirements. For example, an energy-autonomous and AI-supported sensor platform for local data processing must ensure battery operation and real-time functionality. On one hand, the quality and efficiency of inference depend on the task and the neural network used, while on the other, the hardware employed also plays a crucial role.

FPGAs are an interesting hardware approach here because their configurability allows the hardware for inference computation to be tailored specifically to the requirements. Microcontrollers, for example, do not offer such hardware flexibility; with FPGAs, it becomes an additional degree of freedom when optimizing neural networks on embedded systems. However, this additional degree of freedom introduces a cross-dependency: which model configurations work best with which hardware implementation? Answering this is not trivial, as the best individual hyperparameter configuration must be found for the specific application and its associated requirements, and the solution can hardly be generalized once the conditions change.

Therefore, it is advantageous to align the individual components of the system with each other. Using a method based on Bayesian optimization that includes both the parameterization of the neural network and the flexibility of the FPGA in its search space, high-quality compromises between competing optimization goals, such as high accuracy and low energy consumption, can be achieved to improve inference. By optimizing software and hardware together, better hyperparameter configurations can be determined.

FPGAs have several advantages in relation to deep learning. The devices are often connected directly to sensors or placed at interfaces, allowing them to process data with neural networks right where it is generated. Compared to traditional processors, FPGAs enable high parallelism, which is crucial for deep learning. Furthermore, they are characterized by lower redundancy, which makes them faster and more efficient than general-purpose hardware, since specialized functions can be implemented directly. Compared to ASICs in particular, FPGAs allow quicker and more cost-effective hardware adjustments.

Because of the hardware flexibility they provide, FPGAs usually run slower than GPUs; however, through specialization of the implementation and parallelism, they can achieve comparable results. Often more problematic is that FPGAs cannot be controlled directly with traditional deep learning frameworks.

Frameworks for FPGA Inference

There are, however, frameworks that address precisely this issue. They automate the implementation of the FPGA and generate an optimized description of the models for executing inference on it. The models are typically defined at a high level and trained, for example, with PyTorch. Examples of such tools are Vitis AI and FPGA AI Suite. The hardware implementation can be influenced by configurations that are set automatically or manually on the basis of the model description. The hardware configuration serves as the basis for the FPGA implementation on which the inference is executed. While the automatic generation of a suitable hardware description from the model description may initially seem appealing, it has several disadvantages:

  • The execution of inference on the hardware must be simulated depending on the respective model to make statements about computation time and energy consumption. This requires precise knowledge of the hardware and the framework's backend.
  • An accurate modeling of energy consumption is challenging.
  • Side effects, such as variable latency due to memory accesses, are very difficult to model.
  • Not all frameworks allow automatic configuration.

This means that real experiments are necessary for optimizing the hyperparameters of the model and the hardware in order to determine optimal configurations with concrete measurements. The following experiments were conducted representatively using the Vitis AI framework, which does not support model simulation. The Cifar-10 dataset was selected.
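As an illustration of the starting point of this flow, the following sketch defines and trains a small PyTorch model on Cifar-10. It is not the model used in the experiments, and the subsequent quantization and compilation for the DPU, handled by the Vitis AI toolchain, are only indicated in the closing comment.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

# Minimal CNN for Cifar-10 (illustrative only; the experiments use a
# configurable model family described later in the article).
class SmallCNN(nn.Module):
    def __init__(self, channels=32, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(channels, 2 * channels, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(2 * channels * 8 * 8, num_classes)

    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), 1))

train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True,
                                          transform=T.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = SmallCNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one epoch shown for brevity
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()

# The trained float model would then be quantized and compiled for the chosen
# DPU architecture with the Vitis AI toolchain before being deployed to the
# FPGA platform.
```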

Analysis of the Hardware Platform

In Vitis AI, a configurable accelerator architecture (Deep Learning Processor Unit, DPU) is implemented on a Zynq UltraScale+ MPSoC, and the trained model is exported into macro instructions and data for this accelerator. The platform combines a CPU and an FPGA that are interconnected. The exported model is executed on the FPGA part with the respective inputs via an API called from the CPU part of the platform. The configurations influence the computational and memory resources used by the accelerator and thus its performance. The various architectures differ in the maximum number of operations per clock cycle, which is also reflected in their names.
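A minimal sketch of how such an exported model could be invoked from the CPU side is shown below, loosely following the publicly documented Vitis AI Runtime (VART) Python API. The exact call sequence, attribute names, and input/output data types depend on the Vitis AI version and the quantized model, so treat this as an assumption rather than the exact procedure used in the article.

```python
import numpy as np
import vart
import xir

# Load the compiled model and find the subgraph that runs on the DPU.
graph = xir.Graph.deserialize("model.xmodel")
dpu_subgraphs = [s for s in graph.get_root_subgraph().toposort_child_subgraph()
                 if s.has_attr("device") and s.get_attr("device").upper() == "DPU"]
runner = vart.Runner.create_runner(dpu_subgraphs[0], "run")

# Allocate buffers matching the model's tensor shapes (dtype assumed int8 here;
# it depends on the quantization of the deployed model).
in_tensors = runner.get_input_tensors()
out_tensors = runner.get_output_tensors()
input_data = [np.zeros(tuple(in_tensors[0].dims), dtype=np.int8)]
output_data = [np.zeros(tuple(out_tensors[0].dims), dtype=np.int8)]

# Execute one inference on the FPGA part and wait for completion.
job_id = runner.execute_async(input_data, output_data)
runner.wait(job_id)
```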

To enable a better comparison between different systems, the energy cost is used as the metric for the energetic price of a single inference. When comparing the different accelerator architectures for inferring ResNet-50 on Cifar-10, it becomes evident that the energy cost does not decrease steadily with an increasing number of maximum possible operations (Image 1). This is related to the actual internal parameterization of the implementation, which scales parallelism and memory capacity. With higher nominal performance, the actual energy consumption and the size of the implementation also increase, which can be a disqualifying factor in some application scenarios. The prior selection of the architecture is therefore not trivial.
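The article does not spell out how the energy cost is computed; a common definition, assumed in the following sketch, is the average power draw multiplied by the time needed for one inference.

```python
def energy_cost_per_inference(avg_power_w: float, latency_s: float) -> float:
    """Energy per inference in joules, assuming energy = power * time."""
    return avg_power_w * latency_s

# Hypothetical measurements: 5 W board power, 2 ms per inference -> 0.01 J
print(energy_cost_per_inference(5.0, 0.002))
```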

Furthermore, the models ResNet-50, VGG16, and MobileNetV2 were executed on both the FPGA platform and a GPU for Cifar-10. Both the inference speed and the energy costs on the respective hardware vary significantly between the different models. The inference speed of the GPU was clearly superior to that of the FPGA platform in all cases. The FPGA platform was only more energy-efficient than the GPU in the case of MobileNetV2. The architecture and parameterization of the model, as well as the hardware configuration, strongly influence the inference.

This shows that the implementation of the hardware and the neural network used have a significant impact on the quality of the inference. To achieve the best compromise between both worlds, an optimization problem must be solved.

Optimization of Hyperparameters

On the one hand, the new layer of hardware configuration creates new and difficult-to-interpret dependencies, and on the other hand, evaluations on the actual system are time-consuming because the models must be trained and converted for the FPGA, and the FPGA itself must also be built. Therefore, an optimization should be chosen that can achieve usable results in a manageable number of iterations by making high-quality directional decisions based on previous evaluations of the system. Furthermore, the optimization should encompass multiple competing design objectives to find a good compromise. The maximization of accuracy and inference speed as well as the minimization of energy consumption were considered here. The search space of the various hyperparameters spans the architecture and parameterization of a Convolutional Neural Network and the various DPU architectures provided by Vitis AI. The optimization specifies a hyperparameter configuration that influences the neural network, the compilation of the model graph, and the implementation of the hardware (Image 2). The model is first generated, trained, and exported. Afterwards, the model is compiled for the hardware and the hardware implementation is built. Finally, both are integrated on the FPGA platform and the inference is executed. The inference measurements and the selected hyperparameter configuration form the basis for the next optimization step.
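A skeleton of one such iteration might look as follows; every function is a placeholder for the corresponding toolchain step (all names are hypothetical), and the stubs only return dummy values so the control flow can be read end to end.

```python
# Skeleton of one optimization iteration (cf. Image 2).

def build_and_train_model(config):
    # Would generate the CNN from the hyperparameters, train it, export it,
    # and return the achieved accuracy.
    return 0.0

def compile_and_build_hardware(config):
    # Would compile the model graph for the chosen DPU architecture and build
    # the FPGA implementation; returns a handle to the deployable artifacts.
    return "deployment"

def run_inference_and_measure(deployment):
    # Would execute inference on the FPGA platform and return real
    # measurements of (latency, energy cost).
    return 0.0, 0.0

def evaluate_configuration(config):
    accuracy = build_and_train_model(config)
    deployment = compile_and_build_hardware(config)
    latency, energy = run_inference_and_measure(deployment)
    # These measurements, together with the chosen configuration, form the
    # basis for the next optimization step.
    return accuracy, latency, energy
```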

For the requirements mentioned, Bayesian optimization is a suitable approach, as the selection of the next configuration to be examined is based on all past measurements, while the approach itself requires less parameterization and relies less on randomness than other methods such as genetic algorithms. In Bayesian optimization, a probabilistic model is built from past measurement observations to describe the unknown system function. This model provides estimates of the function at the various measurement points as well as their uncertainties. Evaluating this model is far cheaper than testing configurations on the real system. The model is updated after each measurement cycle, and an acquisition function is optimized on top of it to determine the best configuration to evaluate next. After an initial configuration is selected, iterations are performed until enough high-quality results have been achieved. In each iteration, a new configuration is determined, with which a neural network is generated and the hardware is implemented.
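For illustration, the following minimal sketch shows Bayesian optimization with a Gaussian-process surrogate and an expected-improvement acquisition function on a toy single-objective problem. The method described in the article additionally handles several competing objectives, which is omitted here for brevity; the objective function is a stand-in for the expensive evaluation on the real FPGA system.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Stand-in for the expensive evaluation on the real system (to minimize).
    return (x - 0.3) ** 2

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(3, 1))              # initial configurations
y = np.array([objective(x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(20):
    gp.fit(X, y)                                 # update the surrogate model
    candidates = np.linspace(0, 1, 200).reshape(-1, 1)
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    best = y.min()
    # Acquisition: expected improvement over the best observation so far.
    z = (best - mu) / sigma
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = candidates[np.argmax(ei)]
    # Expensive step: evaluate the chosen configuration on the real system.
    y = np.append(y, objective(x_next[0]))
    X = np.vstack([X, x_next])

print("best configuration:", X[np.argmin(y)], "cost:", y.min())
```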

Experiments and Results

A configurable model for Cifar-10 was created with a variable number of convolutions and their sizes, different positioning of pooling operators, and the option to use residual connections or more efficient depthwise-separable convolutions. On the hardware side, the various DPU architectures were defined as hyperparameters.
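A configurable model family of this kind could be sketched as follows; the hyperparameter names, value ranges, and the omission of residual connections are illustrative simplifications, not the search space actually used.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, depthwise_separable):
    # Either a standard 3x3 convolution or a depthwise-separable variant,
    # selected by a hyperparameter.
    if depthwise_separable:
        return nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch),
            nn.Conv2d(in_ch, out_ch, 1),
            nn.BatchNorm2d(out_ch), nn.ReLU(),
        )
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch), nn.ReLU(),
    )

class ConfigurableCNN(nn.Module):
    def __init__(self, widths, pool_after, depthwise_separable, num_classes=10):
        super().__init__()
        layers, in_ch = [], 3
        for i, out_ch in enumerate(widths):       # number and width of convolutions
            layers.append(conv_block(in_ch, out_ch, depthwise_separable))
            if i in pool_after:                    # pooling positions
                layers.append(nn.MaxPool2d(2))
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(in_ch, num_classes))

    def forward(self, x):
        return self.head(self.features(x))

# Example configuration drawn from the search space (values are illustrative).
model = ConfigurableCNN(widths=[32, 64, 128], pool_after={0, 1},
                        depthwise_separable=True)
print(model(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])
```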

The optimization algorithm was iterated 100 times (Images 3 and 4). Each point represents a specific configuration that performs differently with respect to the design objectives. The plots reveal a boundary of configurations that balance the design objectives particularly well. This boundary is called the Pareto front. Configurations on the Pareto front represent the best compromises, each with a different weighting of the design objectives. However, not all configurations on the Pareto front are relevant, as accuracy can only be reduced to a reasonable extent to improve throughput or energy consumption. Consequently, only the range of high accuracy is of interest. The results of Bayesian optimization were also compared with a random search within the same search space. Bayesian optimization more frequently examines configurations close to the Pareto front, and the front it finds is comparatively better developed. Considering both the extremes and the average values, random search achieves worse results than Bayesian optimization in terms of the speed and energy efficiency of configurations with high accuracy.
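Given a set of measured configurations, the Pareto front can be extracted with a simple non-dominance check, as in the following sketch (the numbers are made up for illustration).

```python
import numpy as np

def pareto_front(points):
    """Return a boolean mask of non-dominated points.

    Each row is one configuration; all objectives are assumed to be minimized
    (e.g. negative accuracy, latency, energy cost).
    """
    n = points.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        # Point i is dominated if some other point is at least as good in
        # every objective and strictly better in at least one.
        dominates_i = (np.all(points <= points[i], axis=1)
                       & np.any(points < points[i], axis=1))
        if dominates_i.any():
            mask[i] = False
    return mask

# Hypothetical measurements: columns are (-accuracy, latency in ms, energy in mJ)
results = np.array([[-0.91, 2.0, 12.0],
                    [-0.89, 1.5,  9.0],
                    [-0.93, 4.0, 20.0],
                    [-0.88, 3.5, 18.0]])
print(results[pareto_front(results)])  # the last configuration is dominated
```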

The ResNet-50 model for Cifar-10 was executed on both the FPGA platform and a GPU. Additionally, the high-accuracy configurations with the best speed or energy consumption from the optimization results were compared (see table): On the GPU, only the model configuration is relevant. It can be observed that the FPGA is generally slower and consumes less power than the GPU. In terms of potential applications in embedded systems, this would be entirely acceptable and desirable. For ResNet-50, which was not optimized for the FPGA, it is clear that the GPU achieves better results than the FPGA platform. For the optimized configurations, the difference is significantly smaller. The inferences are faster, and the energy costs are lower compared to the ResNet-50 model. For the fastest model, the GPU was even surpassed by the FPGA platform in terms of energy costs.

In this context, it becomes evident that the models were improved across the board, but the larger gains clearly manifest on the FPGA, as the models were specifically optimized for the hardware and the hardware was also adapted to them. It has been demonstrated that the FPGA can achieve energy efficiency similar to a GPU, meaning that with an optimized hyperparameter configuration an FPGA can be at least as energy-efficient as a GPU.

With the presented method, it is thus possible to simultaneously optimize the hyperparameters of the model architecture and the hardware to improve the quality and efficiency of FPGA-accelerated deep learning inferences. The hardware configurations influence the performance and efficiency of inference, making it sensible to include the hardware configuration in the search space. The presented method is independent of the framework and hardware used and can be flexibly integrated if the configurations of the platform and framework can be externally adjusted. The hyperparameters can be extracted from the model, framework, or hardware level and can be either discrete or continuous.

The runtime of an iteration is largely determined by the model training and the generation of the hardware implementation, so it makes sense to minimize the number of optimization iterations. Bayesian optimization is well suited to reducing convergence time thanks to its well-founded system model. Compared to methods such as genetic algorithms, which rely heavily on randomness, Bayesian optimization should therefore be superior in terms of runtime.

The results cannot be transferred to other hardware platforms because the optimization was performed only within the context of the system under consideration. Therefore, it is also difficult to compare the results to previous work, as the initial conditions differ too greatly. However, the optimization defines a standardized approach, enabling a comparison of different platforms or models. (sg)