nntoolbox.optim.layerwise module

Scaling the learning rate layer-wise (HIGHLY EXPERIMENTAL)

class nntoolbox.optim.layerwise.LAMB(params, lr: float = 0.001, betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0, scaling_fn: Callable[[torch.Tensor], torch.Tensor] = <function LAMB.<lambda>>, amsgrad: bool = False, correct_bias: bool = True)[source]

Bases: torch.optim.adam.Adam

Implements the LAMB algorithm for training with large batch sizes and large learning rates.

Note that in the second version of the paper, the bias correction for the betas is missing.

References:

Yang You et al. "Large Batch Optimization for Deep Learning: Training BERT in 76 Minutes." https://arxiv.org/abs/1904.00962

step(closure=None)[source]

Performs a single optimization step.

Arguments:
closure (callable, optional): A closure that reevaluates the model and returns the loss.
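Example (a minimal usage sketch based on the constructor signature above; the linear model, loss function, and synthetic batch are placeholder assumptions, not part of nntoolbox):

import torch
from torch import nn
from nntoolbox.optim.layerwise import LAMB

model = nn.Linear(784, 10)                           # any torch.nn.Module (placeholder)
optimizer = LAMB(model.parameters(), lr=1e-3, weight_decay=1e-2)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(4096, 784)                      # toy large batch (placeholder data)
targets = torch.randint(0, 10, (4096,))

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()                                     # single layer-wise scaled update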

class nntoolbox.optim.layerwise.LARS(params, lr: float, momentum: float = 0.0, weight_decay: float = 0.0, trust_coefficient: float = 0.001, eps: float = 1e-08)[source]

Bases: torch.optim.sgd.SGD

Implements the Layer-wise Adaptive Rate Scaling (LARS) algorithm for training with large batch sizes and large learning rates.

References:

Yang You, Igor Gitman, Boris Ginsburg. "Large Batch Training of Convolutional Networks." https://arxiv.org/abs/1708.03888

step(closure=None)[source]

Performs a single optimization step.

Arguments:
closure (callable, optional): A closure that reevaluates the model and returns the loss.
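Example (a minimal sketch showing step() with the optional closure; the model and data below are placeholder assumptions, not part of nntoolbox):

import torch
from torch import nn
from nntoolbox.optim.layerwise import LARS

model = nn.Linear(784, 10)                           # placeholder model
optimizer = LARS(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(8192, 784)                      # toy large batch (placeholder data)
targets = torch.randint(0, 10, (8192,))

def closure():
    # Reevaluates the model and returns the loss, as expected by step().
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    return loss

loss = optimizer.step(closure)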