Width of Minima Reached by Stochastic Gradient Descent Is Influenced by Learning Rate to Batch Size Ratio

Lecture Notes in Computer Science - Germany
DOI: 10.1007/978-3-030-01424-7_39

Publisher: Springer International Publishing