Multilevel training of deep neural networks

Speaker: Alena Kopanicakova (Division of Applied Mathematics, Brown University, Providence, USA)
Title: Multilevel training of deep neural networks
Time: Wednesday, 2022.09.21, 10:00 a.m. (CET)
Place: fully virtual (contact Dr. Jakub Lengiewicz to register)
Format: 30 min. presentation + 30 min. discussion

Abstract: Deep neural networks (DNNs) are routinely used across a wide range of application areas and scientific fields, as they make it possible to efficiently predict the behavior of complex systems. However, before a DNN can be used effectively for prediction, its parameters have to be determined during the training process. Traditionally, training is posed as the minimization of a loss function, which is commonly performed using variants of the stochastic gradient descent (SGD) method. Although SGD and its variants have a low computational cost per iteration, their convergence properties tend to deteriorate with increasing network size. In this talk, we will propose to alleviate the training cost of DNNs by leveraging nonlinear multilevel minimization methods. We will discuss how to construct a multilevel hierarchy and transfer operators by exploiting the structure of the DNN architecture, the properties of the loss function, and the form of the data. The dependency on a large number of hyper-parameters will be reduced by employing a trust-region globalization strategy; in this way, the sequence of step sizes will be induced automatically by the trust-region algorithm. Altogether, this will give rise to new classes of training methods, whose convergence properties will be analyzed through a series of numerical experiments.
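To illustrate the trust-region idea mentioned in the abstract, the following is a minimal, hypothetical sketch (not the speaker's actual method): a steepest-descent step clipped to a trust-region radius, with the radius adapted from the ratio of actual to predicted loss reduction. A toy quadratic stands in for a DNN training objective, and the quadratic model uses an identity Hessian for simplicity.

```python
import numpy as np

def loss(w):
    # Toy quadratic "loss" standing in for a DNN training objective.
    return 0.5 * np.sum(w ** 2)

def grad(w):
    return w

def trust_region_train(w, radius=1.0, max_radius=10.0, eta=0.1, iters=50):
    """Sketch of trust-region minimization: the step length is induced
    by the trust-region radius, not by a tuned learning-rate schedule."""
    for _ in range(iters):
        g = grad(w)
        g_norm = np.linalg.norm(g)
        if g_norm < 1e-10:
            break
        # Cauchy-point-style step: steepest descent clipped to the radius.
        step = -min(radius / g_norm, 1.0) * g
        # Predicted reduction from the local quadratic model (identity Hessian).
        pred = -(g @ step) - 0.5 * (step @ step)
        actual = loss(w) - loss(w + step)
        rho = actual / pred if pred > 0 else 0.0
        # Adapt the radius based on how well the model predicted the reduction.
        if rho > 0.75:
            radius = min(2.0 * radius, max_radius)
        elif rho < 0.25:
            radius *= 0.5
        if rho > eta:
            w = w + step  # accept the step
    return w
```

In a multilevel variant, coarse-level corrections would be computed within the same trust-region framework and prolonged back to the fine level; this sketch shows only the single-level radius-update mechanism.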