======CHIPPER: A Low-complexity Bufferless Deflection Router======
Tuesday Feb. 8, 2011\\
Hamerschlag Hall D-210\\
4:00 pm\\
**[[http://www.ece.cmu.edu/~cfallin/|Chris Fallin]]**\\
Carnegie Mellon University\\

As Chip Multiprocessors (CMPs) scale to tens or hundreds of nodes, the
interconnect becomes a significant factor in cost, energy consumption
and performance. Recent work has explored many design tradeoffs for
networks-on-chip (NoCs) with novel router architectures to reduce
hardware cost. In particular, recent work proposes bufferless
deflection routing to eliminate router buffers. The high cost of
buffers makes this choice potentially appealing, especially for
low-to-medium network loads. However, current bufferless designs
usually add complexity to control logic. Deflection routing introduces
a sequential dependence in port allocation, yielding a slow critical
path. Explicit mechanisms are required for livelock freedom due to the
non-minimal nature of deflection. Finally, deflection routing can
fragment packets, and the reassembly buffers require large worst-case
sizing to avoid deadlock, due to the lack of network backpressure. The
complexity that arises out of these three problems has discouraged
practical adoption of bufferless routing.

To counter this, we propose CHIPPER (Cheap-Interconnect Partially
Permuting Router), a simplified router microarchitecture that
eliminates in-router buffers and the crossbar. We introduce three key
insights: first, that deflection routing port allocation maps naturally
to a permutation network within the router; second, that livelock
freedom requires only an implicit token-passing scheme, eliminating
expensive age-based priorities; and finally, that flow control can
provide correctness in the absence of network backpressure, avoiding
deadlock and allowing cache miss buffers (MSHRs) to be used as
reassembly buffers. Using multiprogrammed SPEC CPU2006, server, and
desktop application workloads and SPLASH-2 multithreaded workloads, we
achieve an average 54.9% network power reduction for 13.6% average
performance degradation (multiprogrammed) and 73.4% power reduction
for 1.9% slowdown (multithreaded), with minimal degradation and large
power savings at low-to-medium load. Finally, we show 36.2% router
area reduction relative to buffered routing, with comparable timing.

Chris Fallin is a second-year Ph.D. student in Electrical & Computer
Engineering at Carnegie Mellon University. He is advised by Dr. Onur
Mutlu and is a member of the SAFARI research group within CALCM
(Computer Architecture Laboratory at Carnegie Mellon). He studies
interconnect and memory system design in large CMPs. Chris is
currently supported by an NSF Graduate Research Fellowship. He
received a B.S. in Computer Engineering from the University of Notre
Dame in 2009.\\
**[[seminars|Back to the seminar page]]**