Tencent Conference ID:144-899-652
Mingrui Liu, Associate Professor, George Mason University
The current analysis of deep learning optimization algorithms typically requires the loss landscape to be smooth. However, there is a class of deep neural networks that is inherently nonsmooth, with a potentially unbounded smoothness parameter. Existing work addresses this issue with gradient clipping, but that algorithm is neither scalable in the multi-agent distributed learning setting nor adaptive for certain popular deep neural networks such as transformers.
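For intuition, gradient clipping caps the norm of each update so the effective step size stays bounded even when the gradient (and hence the local smoothness) blows up. Below is a minimal sketch of one clipped gradient step; the function name and the threshold are illustrative, not from the papers:

```python
import numpy as np

def clipped_gradient_step(w, grad, lr=0.1, clip_threshold=1.0):
    """One gradient-descent step with norm clipping.

    If the gradient norm exceeds the threshold, the gradient is rescaled
    to have exactly that norm, bounding the step size even when the
    smoothness parameter is unbounded.
    """
    norm = np.linalg.norm(grad)
    if norm > clip_threshold:
        grad = grad * (clip_threshold / norm)  # rescale to the threshold
    return w - lr * grad
```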
In this talk, I will showcase two results. First, I will consider the distributed deep learning setting with multiple machines and limited communication, and introduce a new communication-efficient local stochastic gradient clipping algorithm for the unbounded-smoothness setting. The main result is that this algorithm provably enjoys linear speedup and requires significantly fewer communication rounds.
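The communication-saving idea can be sketched as "local steps, then average": each machine runs several clipped gradient steps on its own data before a single averaging round. The sketch below is a hypothetical illustration of that pattern only; the actual clipping rule and averaging schedule in the paper differ:

```python
import numpy as np

def local_clipped_sgd(grad_fns, w0, lr=0.1, clip=1.0, local_steps=4):
    """Hypothetical sketch of local clipped SGD.

    Each machine m applies `local_steps` clipped gradient steps using its
    own gradient oracle grad_fns[m], then the models are averaged once,
    i.e. one communication round instead of one per step.
    """
    machines = [w0.copy() for _ in grad_fns]
    for m, grad_fn in enumerate(grad_fns):
        for _ in range(local_steps):
            g = grad_fn(machines[m])
            n = np.linalg.norm(g)
            if n > clip:
                g = g * (clip / n)  # clip each local gradient
            machines[m] = machines[m] - lr * g
    return np.mean(machines, axis=0)  # single averaging (communication) round
```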
Second, I will consider a class of deep neural networks with layer-wise unbounded smoothness parameters, such as transformers, and introduce a new adaptive algorithm that resembles the popular Adam algorithm. This result provably establishes the benefit of the proposed Adam-type algorithm over non-adaptive gradient methods such as gradient descent in the unbounded-smoothness setting.
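For readers unfamiliar with Adam-type updates: the per-coordinate step divides the gradient by a running estimate of its magnitude, which keeps the effective step size controlled per layer. The sketch below is plain Adam without bias correction, shown only for intuition; it is not the paper's algorithm:

```python
import numpy as np

def adam_like_step(w, grad, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam-style update (no bias correction, for brevity).

    m: exponential moving average of gradients (first moment)
    v: exponential moving average of squared gradients (second moment)
    The update normalizes each coordinate by sqrt(v), adapting the step
    size to the local gradient scale.
    """
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    w = w - lr * m / (np.sqrt(v) + eps)
    return w, m, v
```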
These two results are based on the following two papers, which will appear at Neural Information Processing Systems 2022 (NeurIPS 2022): https://arxiv.org/pdf/2205.05040.pdf, https://arxiv.org/pdf/2208.11195.pdf.
Brief introduction of the speaker:
Mingrui Liu has been an assistant professor in the Department of Computer Science at George Mason University since August 2021. Before that, he was a postdoctoral fellow at the Rafik B. Hariri Institute for Computing at Boston University from 2020 to 2021. He received his Ph.D. in Computer Science from The University of Iowa in August 2020. He has also spent time at the IBM T. J. Watson Research Center. His research interests include machine learning, mathematical optimization, statistical learning theory, and deep learning.