Adaptive Load Balancer for Distributed FPGA Acceleration System

Wednesday March 7, 2018
Location: HH D-level Conference Room
Time: 4:30PM


FPGAs have been deployed in data centers to accelerate cloud applications. One example is the Microsoft Catapult FPGA system, in which an FPGA is attached to each server and all the FPGAs are interconnected through Ethernet. When a server's host CPU offloads a job to its attached FPGA, it actually submits the job to a pool of FPGAs. The question then is which FPGA should be chosen to accelerate that job. This is a load balancing problem: if some FPGAs are overloaded, the jobs sent to them will see high latency.

We propose an adaptive load balancer that uses the number of in-flight jobs to model the load of each FPGA. In our Bing search case study, we observed that our adaptive load balancer (1) balances the load evenly and dynamically, (2) is insensitive to load traffic types, and (3) keeps the system simple. Compared with the baseline hardware round-robin load balancer, our adaptive load balancer reduces the Bing search 99.9th-percentile tail latency by 15% to 35%. This work was done when I was a summer intern with the Microsoft Research Catapult team in 2017.
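
The idea of using in-flight job counts as a load signal can be illustrated with a minimal software sketch. This is not the hardware design presented in the talk: the class names, the `dispatch`/`complete` methods, and the lowest-index tie-breaking rule are all assumptions made for illustration. A round-robin baseline is included for comparison.

```python
class InFlightBalancer:
    """Hypothetical sketch: pick the FPGA with the fewest in-flight jobs."""

    def __init__(self, num_fpgas):
        # One counter per FPGA: jobs dispatched but not yet completed.
        self.in_flight = [0] * num_fpgas

    def dispatch(self):
        # Choose the least-loaded FPGA; ties go to the lowest index (assumption).
        target = min(range(len(self.in_flight)), key=lambda i: self.in_flight[i])
        self.in_flight[target] += 1
        return target

    def complete(self, fpga):
        # Called when a job's result comes back from the given FPGA.
        if self.in_flight[fpga] <= 0:
            raise ValueError("no in-flight job on FPGA %d" % fpga)
        self.in_flight[fpga] -= 1


class RoundRobinBalancer:
    """Baseline: cycle through FPGAs regardless of their current load."""

    def __init__(self, num_fpgas):
        self.num_fpgas = num_fpgas
        self.next_fpga = 0

    def dispatch(self):
        target = self.next_fpga
        self.next_fpga = (self.next_fpga + 1) % self.num_fpgas
        return target
```

In this sketch, an FPGA that finishes its jobs quickly sees its counter drop and therefore receives new jobs sooner, while a slow or overloaded FPGA is skipped until it catches up. The round-robin baseline has no such feedback.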


Zhipeng Zhao is a fourth-year Ph.D. student in the Department of Electrical and Computer Engineering at CMU, advised by Prof. James Hoe. His research focuses on high-level synthesis, distributed FPGA acceleration systems, and network function acceleration.