nal): exponent in the eta (step size) decay schedule (default: 0.75)
t0 (float, optional): step count after which parameter averaging starts (default: 1e6)
weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
{foreach}
{maximize}
{differentiable}

.. _Acceleration of stochastic approximation by averaging: https://dl.acm.org/citation.cfm?id=131098
)
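As a rough illustration of how `alpha`, `t0`, and `weight_decay` interact, here is a pure-Python, single-scalar sketch. It assumes the update order and schedules used by PyTorch's `torch.optim.ASGD`: a step size eta = lr / (1 + lambd * lr * step) ** alpha and an averaging weight mu = 1 / max(1, step - t0). Treat that correspondence as an assumption rather than a statement of the library's exact code. `ASGDState` and `asgd_update` are hypothetical helpers. The `lr=1e-2` and `lambd=1e-4` defaults are assumed from the optimizer's constructor, which is not part of this excerpt.

```python
from dataclasses import dataclass


@dataclass
class ASGDState:
    """Per-parameter state for a scalar ASGD sketch (hypothetical helper)."""
    eta: float          # current step size; start it at lr
    mu: float = 1.0     # averaging coefficient; 1.0 means "not averaging yet"
    ax: float = 0.0     # averaged parameter
    step: int = 0


def asgd_update(x, grad, state, lr=1e-2, lambd=1e-4, alpha=0.75,
                t0=1e6, weight_decay=0.0):
    """Apply one ASGD step to a scalar parameter x and return the new x.

    Assumed to mirror torch.optim.ASGD's schedules; not the library code.
    """
    state.step += 1
    if weight_decay != 0:
        grad = grad + weight_decay * x         # L2 penalty folded into the gradient
    x = x * (1 - lambd * state.eta)            # decay term scaled by eta
    x = x - state.eta * grad                   # plain SGD step with the current eta
    if state.mu != 1:
        state.ax += state.mu * (x - state.ax)  # running mean once averaging is on
    else:
        state.ax = x                           # before averaging, ax just tracks x
    # Schedules for the next call: alpha sets how fast eta decays with the step
    # count; mu = 1 / max(1, step - t0) stays at 1 until step - t0 exceeds 1.
    state.eta = lr / (1 + lambd * lr * state.step) ** alpha
    state.mu = 1 / max(1, state.step - t0)
    return x


# Usage: minimize f(x) = 0.5 * (x - 3)**2, whose gradient is x - 3.
lr = 0.1
state = ASGDState(eta=lr)
x = 0.0
for _ in range(200):
    x = asgd_update(x, x - 3.0, state, lr=lr, t0=100)
print(round(x, 4), round(state.ax, 4), state.mu)
```

Raising `alpha` makes eta shrink faster, and lowering `t0` starts the running average `ax` sooner. With the default `t0=1e6`, short runs never enter the averaging phase, so `ax` simply equals `x`.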