Artificial intelligence traiNing scheDuler foR disaggrEgAted resource clusterS
Today, artificial intelligence (AI) and deep learning (DL) methods are used in a wide range of products. DL models are trained on GPGPU systems, achieving a 5-40x speedup over CPU-based servers. ANDREAS develops advanced scheduling solutions that optimize the runtime performance and energy consumption of DL training in disaggregated GPGPU clusters. A 2x speedup and 50% energy savings are expected.