Date of Award
Doctor of Philosophy (PhD)
Electrical and Computer Engineering
Povinelli, Richard J.
The computational demand for training deep learning models has recently doubled every three months. According to Moore’s Law, however, available computational power doubles only every two years. To bridge this demand-supply gap while optimizing energy consumption and carbon emissions, this dissertation proposes a novel cross-layer algorithm-architecture-hardware co-design approach for computing systems, from chip multicore to the cloud.
At the Chip Multicore Level: How can we design high-performance network-on-chip (NoC) based multiprocessors that remain reliable under uncertainty in design parameters? This dissertation answers this question by 1) laying the foundation for uncertainty modeling and robust multi-objective optimization in embedded systems design and 2) providing computer-aided design (CAD) automation tools that incorporate a novel design method to achieve this multi-level goal. Chapter 3 proposes the first uncertainty-aware reliability model for NoC-based chip multicores, integrating uncertainty models as a new design methodology. Chapter 4 applies info-gap theory, for the first time, to uncertainty modeling in the context of embedded systems design. We developed uncertainty-aware, reliability-oriented CAD tools that identify the most robust design solutions composing the 3D Pareto frontier, which had not been done before. We demonstrate that the actual values of design attributes differ significantly from their estimates when uncertainty in design parameters is considered.
At the Server and Cluster Levels: How should we build generic and effective machine learning models to improve datacenter scheduling algorithms? This dissertation answers this question by using deep learning models within a unified hierarchical scheduling approach that combines cluster-level and node-level scheduling, models interference and heterogeneity, and treats performance and energy usage as design objectives.
Chapter 5 combines cluster-level and node-level scheduling algorithms in a unified approach that can target specific optimization objectives, including job completion time, energy usage, and energy delay product (EDP). Experimental results demonstrate that this approach outperforms state-of-the-art schedulers from industry and academia by 41.98% in EDP, 38.65% in energy usage, and 10.2% in job completion time. Chapter 6 harnesses additional external knowledge about applications and servers and exploits simplicity to develop AI-assisted datacenter scheduling.
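
For readers unfamiliar with the metric, the energy delay product is conventionally the energy a job consumes multiplied by its completion time, so lower values are better. The minimal Python sketch below shows how a relative EDP improvement could be computed under that standard definition. The function names and the sample numbers are hypothetical and purely illustrative; they are not taken from the dissertation's experiments or tooling.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Standard EDP definition: energy consumed times completion time."""
    return energy_joules * delay_seconds


def percent_improvement(baseline: float, candidate: float) -> float:
    """Relative reduction of a lower-is-better metric, in percent."""
    return 100.0 * (baseline - candidate) / baseline


# Illustrative values only (assumed, not measured results).
baseline_edp = energy_delay_product(energy_joules=1000.0, delay_seconds=100.0)
candidate_edp = energy_delay_product(energy_joules=800.0, delay_seconds=90.0)

edp_gain = percent_improvement(baseline_edp, candidate_edp)
print(f"EDP improvement: {edp_gain:.2f}%")
```

Because EDP multiplies the two quantities, a scheduler that lowers both energy and completion time gains more in EDP than in either metric alone.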
Available for download on Monday, April 28, 2025