The confluence of disruptive technologies beyond CMOS and "Big Data" workloads calls for a fundamental paradigm shift. Homogeneous, compute-centric systems were designed for handling structured data. They must give way to heterogeneous, data-centric systems that can efficiently store and process large volumes of semi-structured or unstructured data, for better innovation, competition, and productivity. In a heterogeneous system, silicon CMOS (e.g., multi-core CPUs) will continue to play a major role in primary computing and essential bookkeeping. Tasks that are difficult, expensive, or even unachievable with standard CMOS within a fixed power/cost budget can be offloaded to hardware engines built with other technologies. By harnessing the potential of these new technologies, we can enable efficient data-centric computing with a cost-effective heterogeneous hardware substrate that significantly improves energy efficiency, performance, throughput, and scalability. With the objective of rethinking data-centric system design from the ground up, I will present a PCM-CMOS hardware accelerator based on ternary content-addressable memory (TCAM), built with an emerging memory technology, phase-change memory (PCM). In particular, a fully functional heterogeneous chip was designed and fabricated for the first time. It achieves a >10x cell-area reduction compared with a homogeneous CMOS-based design at the same technology node. The accelerator distributes compute units within storage elements in a cost-effective way, providing fine-grained control and high bandwidth close to the data sources and avoiding communication costs. It is particularly efficient at search operations, offering a high and deterministic lookup rate. It can also serve either as a monolithic compute unit that performs direct data-flow computation or as monolithic storage media in the form of storage-class memory.
Thus, it is an attractive solution for a wide range of data-intensive applications, e.g., genome matching in bioinformatics and intrusion detection in cloud computing. It is particularly useful for applications with real-time response demands (e.g., real-time pattern recognition for national security) that purely software-based approaches cannot meet. Despite its large advantages in performance, cost, and energy, designing with heterogeneous PCM/CMOS technologies poses new challenges during hardware implementation, because the technology itself inherently and severely degrades the operating margin. To address these challenges, I will present two enabling techniques in this talk: 1) a clocked self-referenced sensing scheme and 2) a two-bit encoding scheme. With these techniques, the fabricated chip operates reliably at a very low voltage (750 mV). Finally, I will briefly present two critical techniques for moving toward a more cost-effective design based on variable-bit storage.
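For readers unfamiliar with TCAM, the search operation the accelerator performs in hardware can be illustrated with a minimal software model. This is a sketch of generic TCAM semantics under stated assumptions, not of the PCM-CMOS chip's circuits: the class name `TernaryCAM` and the value/mask representation of the "don't care" state are illustrative choices, and the model does not capture the clocked self-referenced sensing or the two-bit encoding.

```python
from typing import List, Optional


class TernaryCAM:
    """Hypothetical software model of a TCAM: each stored bit is 0, 1, or X (don't care).

    Each row is kept as a (value, mask) pair; a mask bit of 1 means "compare this bit",
    and a mask bit of 0 means "don't care". This encoding is an illustrative assumption,
    not the chip's actual storage format.
    """

    def __init__(self, width: int) -> None:
        self.width = width
        self.values: List[int] = []
        self.masks: List[int] = []

    def write(self, pattern: str) -> int:
        """Store a ternary pattern such as '10X1' and return its row index."""
        if len(pattern) != self.width or set(pattern) - set("01X"):
            raise ValueError(f"pattern must be {self.width} characters from '0', '1', 'X'")
        value = int(pattern.replace("X", "0"), 2)
        mask = int("".join("0" if c == "X" else "1" for c in pattern), 2)
        self.values.append(value)
        self.masks.append(mask)
        return len(self.values) - 1

    def search(self, key: int) -> List[int]:
        """Return every row whose compared bits all equal the key's bits.

        A hardware TCAM compares all rows in parallel in a single search cycle, so its
        lookup time does not depend on the table contents; this model simply loops.
        """
        return [
            row
            for row, (value, mask) in enumerate(zip(self.values, self.masks))
            if (key ^ value) & mask == 0
        ]

    def first_match(self, key: int) -> Optional[int]:
        """Priority-encoder behavior: the lowest-index matching row, or None."""
        hits = self.search(key)
        return hits[0] if hits else None
```

For example, after writing `10X1`, `1XXX`, and `0000`, searching for `0b1011` matches rows 0 and 1, and `first_match` returns 0, mirroring the priority encoder commonly paired with a TCAM array.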
Dr. Jing Li is a Research Staff Member at the IBM T. J. Watson Research Center, Yorktown Heights, NY. She received her Ph.D. degree from the Department of Electrical and Computer Engineering at Purdue University in 2009 and her B.E. degree from the Department of Electrical Engineering at Shanghai Jiao Tong University in 2004. Her general research interest is developing new computing paradigms driven by technologies (bottom-up, including but not limited to emerging nonvolatile memories and flexible electronics), by workloads (top-down, including traditional commercial workloads as well as emerging data-centric workloads), or by both. Her primary area of interest is VLSI design-technology interaction, with a strong emphasis on "design for transformation" (rather than "design for replacement"). Dr. Li has received an IBM Research Division Outstanding Award in 2012 for achieving a CEO milestone, multiple IBM invention achievement awards since 2010, the IBM Ph.D. Fellowship Award in 2008, the Dean's and Semester Honors for outstanding scholastic performance from Purdue University in 2007, and the Meissner Fellowship from Purdue University in 2004, among others. She also received the 2005-2006 Magoon's Award for excellence in teaching from Purdue University. She has published more than 35 technical papers in refereed journals and conferences in the fields of computer design, CAD, VLSI circuits, device physics, and materials science, and has more than 35 patents filed or issued. She won the Best Paper Award from the IEEE Circuits and Systems Society VLSI Transactions in recognition of one of the very first papers tackling reliability issues in STT-RAM. She has served as a reviewer for numerous journals and conferences, including COMPUTER, JSSC, TVLSI, ACM JETC, TNANO, TED, and EDL, and was recognized as a Golden Reviewer by IEEE Electron Device Letters in 2012.
She has served on the technical committee of the IEEE Design Automation Conference (DAC) since 2011. She also represents IBM at a premier industry conference, the IEEE International Memory Workshop (IMW), as a member of its Scientific Committee and Organizing Committee.