
Scaling SGD Batch Size to 32K for ImageNet Training #15

@nocotan

Description


In a word

Proposes Layer-wise Adaptive Rate Scaling (LARS) for large-batch training.

Paper link

https://digitalassets.lib.berkeley.edu/techreports/ucb/text/EECS-2017-156.pdf

Authors / Affiliations

Yang You, Igor Gitman, Boris Ginsburg (UC Berkeley)

Submission date (yyyy/MM/dd)

2017/09/16

Overview

(Overview figure from the paper; screenshot omitted.)

Novelty / Differences

Presented as the first method to apply a different learning rate to each layer.
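For reference, the per-layer ("local") learning rate that LARS computes can be written as follows (my transcription of the paper's formula; here η is the trust coefficient and β the weight-decay term):

```latex
% Local learning rate for layer l:
\lambda^{l} = \eta \, \frac{\lVert w^{l} \rVert}{\lVert \nabla L(w^{l}) \rVert + \beta \, \lVert w^{l} \rVert}
```

The update magnitude is thus tied to the ratio of the weight norm to the gradient norm, so layers whose gradients are large relative to their weights are automatically damped.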

Method

(Method figures from the paper; screenshots omitted.)
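A minimal NumPy sketch of one LARS step for a single layer, assuming momentum SGD with weight decay; the function and variable names are my own, not the authors' code, and the small epsilon is added only for numerical safety:

```python
import numpy as np

def lars_update(w, grad, velocity, global_lr=0.1, trust_coef=0.001,
                weight_decay=5e-4, momentum=0.9, eps=1e-9):
    """One LARS step for a single layer's weights (illustrative sketch)."""
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(grad)
    # Layer-wise "local" learning rate: ratio of weight norm to gradient norm,
    # with weight decay folded into the denominator.
    local_lr = trust_coef * w_norm / (g_norm + weight_decay * w_norm + eps)
    # Standard momentum SGD update, scaled by the per-layer local LR.
    velocity = momentum * velocity + global_lr * local_lr * (grad + weight_decay * w)
    return w - velocity, velocity
```

Each layer calls this with its own `w`/`grad`/`velocity`, so layers with small weight-to-gradient ratios take proportionally smaller steps, which is what stabilizes very large batch sizes.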

Results

(Result figures and tables from the paper; screenshots omitted.)

Comments
