Efficient Methods and Hardware for Deep Learning, Stanford CS231n, Song Han, 2017 link
Efficient Processing of Deep Neural Networks: A Tutorial and Survey, Sze, MIT pdf
Tutorial on Deep Learning Acceleration, Vivienne Sze, MIT link
High-Performance Hardware for Machine Learning, W. Dally, 2015 link
Acceleration of Deep Learning in Algorithm & CMOS-based Hardware by me (2019) pdf
Techniques for DL acceleration
Data Parallelism VS Model Parallelism in Distributed Deep Learning Training link
One weird trick for parallelizing convolutional neural networks, Alex Krizhevsky, 2014 link
BinaryConnect: Training Deep Neural Networks with binary weights during propagations, Matthieu Courbariaux, 2016 link
Measuring the Limits of Data Parallel Training for Neural Networks, J. Lee, Mar. 2019 link
Industry trends
BrainWave: Accelerating Persistent Neural Networks at Datacenter Scale, Microsoft, 2017 link
TPU: In-datacenter performance analysis of a tensor processing unit, Google, 2017 link
TPU: Machine Learning for Systems and Systems for Machine Learning, Google, 2017 link
Stanford CS217 reading list (2019)
Hardware Accelerators for Machine Learning, Stanford CS217 link
Is Dark Silicon Useful? by M. B. Taylor, 2012 pdf_annotated
Why Systolic Architecture? by H. T. Kung, 1982 pdf_annotated
Anatomy of High Performance Matrix Multiplication by K. Goto pdf_annotated
Dark Memory and Accelerator-Rich System Optimization in the Dark Silicon Era by A. Pedram, 2016 pdf_annotated
TABLA: A Unified Template-based Framework for Accelerating Statistical Machine Learning by D. Mahajan, 2016 pdf_annotated
Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures by A. Pedram, 2012 pdf_annotated
Spatial: A Language and Compiler for Application Accelerators by K. Olukotun, 2018 pdf_annotated
Aladdin: A Pre-RTL, Power-Performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures by Y. S. Shao, 2014 pdf_annotated
Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures by D. Patterson pdf_annotated
In-Datacenter Performance Analysis of a Tensor Processing Unit by Google pdf_annotated