Recent breakthroughs in deep learning have made Deep Neural Network (DNN) models a key component of many AI applications, ranging from speech recognition and translation to face recognition and object/human detection and tracking. These DNN models are very resource demanding in terms of computation cycles, memory footprint, and power and energy consumption, and are mostly trained and deployed in the cloud/datacenters. However, there is a growing demand for pushing the deployment of these AI applications from the cloud to a wide variety of edge and IoT devices that are closer to the data and information generation sources, for reasons such as better user experience (latency- and throughput-sensitive apps), data privacy and security, and limited/intermittent network bandwidth. Compared to datacenters, these edge devices are very resource constrained and may not even be able to host these computationally expensive DNN models. Great efforts have been made to optimize the serving/inference of these DNN models to enable their deployment on edge devices and to reduce resource consumption/cost in datacenters.
  
We will talk about several research and product efforts at Microsoft on optimizing the DNN inference pipeline that touch upon hardware accelerators, compilers, model architecture, application requirements, and system dynamics. We will discuss how these efforts optimize different layers of the DNN system stack. Moreover, we will show the importance of looking at the DNN system stack holistically in order to achieve better tradeoffs between model performance and resource constraints.
  
=====Bio=====