nntoolbox.optim.layerwise module

Scaling the learning rate layer-wise (HIGHLY EXPERIMENTAL)

class nntoolbox.optim.layerwise.LAMB(params, lr: float = 0.001, betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0, scaling_fn: Callable[[torch.Tensor], torch.Tensor] = <function LAMB.<lambda>>, amsgrad: bool = False, correct_bias: bool = True)[source]

Bases: torch.optim.adam.Adam

Implements the LAMB algorithm for training with large batch sizes and large learning rates.

Note that in the second version of the paper, the bias correction for the betas is missing.

References:

Yang You et al. "Large Batch Optimization for Deep Learning: Training BERT in 76 Minutes." https://arxiv.org/abs/1904.00962

step(closure=None)[source]

Performs a single optimization step.

Arguments:
closure (callable, optional): A closure that reevaluates the model and returns the loss.
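Example (a minimal usage sketch based on the constructor signature above; the linear model, loss function, and synthetic batch are placeholder assumptions, not part of nntoolbox):

import torch
from torch import nn
from nntoolbox.optim.layerwise import LAMB

model = nn.Linear(784, 10)                           # any torch.nn.Module (placeholder)
optimizer = LAMB(model.parameters(), lr=1e-3, weight_decay=1e-2)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(4096, 784)                      # toy large batch (placeholder data)
targets = torch.randint(0, 10, (4096,))

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()                                     # single layer-wise scaled update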

class nntoolbox.optim.layerwise.LARS(params, lr: float, momentum: float = 0.0, weight_decay: float = 0.0, trust_coefficient: float = 0.001, eps: float = 1e-08)[source]

Bases: torch.optim.sgd.SGD

Implements the Layer-wise Adaptive Rate Scaling (LARS) algorithm for training with large batch sizes and large learning rates.

References:

Yang You, Igor Gitman, Boris Ginsburg. "Large Batch Training of Convolutional Networks." https://arxiv.org/abs/1708.03888

step(closure=None)[source]

Performs a single optimization step.

Arguments:
closure (callable, optional): A closure that reevaluates the model and returns the loss.
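Example (a minimal sketch showing step() with the optional closure; the model and data below are placeholder assumptions, not part of nntoolbox):

import torch
from torch import nn
from nntoolbox.optim.layerwise import LARS

model = nn.Linear(784, 10)                           # placeholder model
optimizer = LARS(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(8192, 784)                      # toy large batch (placeholder data)
targets = torch.randint(0, 10, (8192,))

def closure():
    # Reevaluates the model and returns the loss, as expected by step().
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    return loss

loss = optimizer.step(closure)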