nntoolbox.optim.layerwise module
Scaling the learning rate layerwise (HIGHLY EXPERIMENTAL)
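Both classes on this page scale each layer's step by a trust ratio computed from the layer's weight norm and update norm. A minimal sketch of that idea, assuming LARS-style scaling (the function name trust_ratio and the fallback value are illustrative, not this module's actual API):

    import torch

    def trust_ratio(param: torch.Tensor, update: torch.Tensor,
                    trust_coefficient: float = 0.001, eps: float = 1e-08) -> float:
        # Ratio of weight norm to update norm, damped by the trust coefficient;
        # layers whose weights are large relative to their updates get larger steps.
        param_norm = param.norm()
        update_norm = update.norm()
        if param_norm > 0 and update_norm > 0:
            return (trust_coefficient * param_norm / (update_norm + eps)).item()
        return 1.0  # degenerate layers fall back to the global learning rate

    # The effective step for a layer is then lr * trust_ratio(w, g) * g.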
class nntoolbox.optim.layerwise.LAMB(params, lr: float = 0.001, betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0, scaling_fn: Callable[[torch.Tensor], torch.Tensor] = <function LAMB.<lambda>>, amsgrad: bool = False, correct_bias: bool = True)[source]

Bases: torch.optim.adam.Adam
Implements the LAMB algorithm for training with large batches and learning rates.

Note that in the second version of the paper, the bias correction for the betas is missing.
References:

    Yang You et al. "Large Batch Optimization for Deep Learning: Training BERT in 76 Minutes." https://arxiv.org/abs/1904.00962
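A usage sketch based on the signature above; the model and batch are placeholders, and a drop-in swap for torch.optim.Adam is assumed:

    import torch
    import torch.nn.functional as F
    from torch import nn
    from nntoolbox.optim.layerwise import LAMB

    model = nn.Linear(784, 10)
    optimizer = LAMB(model.parameters(), lr=1e-3, betas=(0.9, 0.999),
                     weight_decay=0.01, correct_bias=True)

    inputs = torch.randn(64, 784)           # placeholder batch
    targets = torch.randint(0, 10, (64,))
    loss = F.cross_entropy(model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()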
class nntoolbox.optim.layerwise.LARS(params, lr: float, momentum: float = 0.0, weight_decay: float = 0.0, trust_coefficient: float = 0.001, eps: float = 1e-08)[source]

Bases: torch.optim.sgd.SGD
Implements the Layer-wise Adaptive Rate Scaling (LARS) algorithm for training with large batches and learning rates.
References:

    Yang You, Igor Gitman, Boris Ginsburg. "Large Batch Training of Convolutional Networks." https://arxiv.org/abs/1708.03888
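A usage sketch based on the signature above; note that lr has no default here. The model and batch are placeholders, and a drop-in swap for torch.optim.SGD is assumed:

    import torch
    import torch.nn.functional as F
    from torch import nn
    from nntoolbox.optim.layerwise import LARS

    model = nn.Linear(784, 10)
    optimizer = LARS(model.parameters(), lr=0.1, momentum=0.9,
                     weight_decay=1e-4, trust_coefficient=0.001)

    inputs = torch.randn(64, 784)           # placeholder batch
    targets = torch.randint(0, 10, (64,))
    loss = F.cross_entropy(model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()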