NIPS paper review 2014: A Differential Equation for Modeling Nesterov’s Accelerated Gradient Method

Paper reading and presentation of a NIPS 2014 paper.
Published on: Mar 3, 2016
Published in: Data & Analytics      
Source: www.slideshare.net


Transcripts - NIPS paper review 2014: A Differential Equation for Modeling Nesterov’s Accelerated Gradient Method

  • 1. A Differential Equation for Modeling Nesterov’s Accelerated Gradient Method: Theory and Insights. Weijie Su, Stephen Boyd, Emmanuel J. Candes, NIPS Conference 2014. Speaker: kv, MCLab, CITI, Academia Sinica. May 14, 2015.
  • 2. Overview
    1 Introduction: Smooth Unconstrained Optimization; Accelerated Scheme; Ordinary Differential Equation
    2 Derivation of the ODE: Simple Properties
    3 Equivalence between the ODE and Nesterov’s scheme: Analogous Convergence Rate; Quadratic f and Bessel function; Equivalence between the ODE and Nesterov’s Scheme
    4 A family of generalized Nesterov’s schemes: Continuous Optimization; Composite Optimization
    5 New Restart Scheme
    6 Accelerating to linear convergence by restarting: Numerical examples
    7 Discussion
  • 3. Introduction
  • 4. Smooth Unconstrained Optimization
    We wish to minimize a smooth convex function,
      minimize f(x),
    where f: R^n → R has an L-Lipschitz continuous gradient:
      \|\nabla f(x) - \nabla f(y)\|_2 \le L \|x - y\|_2.
    µ-strong convexity: f(x) - \frac{\mu}{2}\|x\|^2 is convex.
    In this paper, F_L denotes the class of convex functions f with L-Lipschitz continuous gradients defined on R^n; S_µ denotes the class of µ-strongly convex functions f on R^n. We set S_{µ,L} = F_L ∩ S_µ.
  • 5. Introduction: Accelerated Scheme
    Nesterov’s Accelerated Gradient Scheme:
      x_k = y_{k-1} - s \nabla f(y_{k-1})            (1)
      y_k = x_k + \frac{k-1}{k+2} (x_k - x_{k-1})    (2)
    For fixed step size s = 1/L, where L is the Lipschitz constant of \nabla f, this scheme exhibits the convergence rate
      f(x_k) - f^* \le O\left( \frac{L \|x_0 - x^*\|^2}{k^2} \right).
    The improvement relies on the momentum term x_k - x_{k-1} and the particularly tuned coefficient (k-1)/(k+2) = 1 - 3/(k+2).
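To make updates (1)-(2) concrete, here is a minimal Python sketch of the scheme; the quadratic test function and step size are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def nesterov_agd(grad_f, x0, s, num_iters):
    """Nesterov's accelerated gradient scheme (1)-(2) with fixed step size s."""
    x_prev = x0.copy()
    y = x0.copy()
    for k in range(1, num_iters + 1):
        x = y - s * grad_f(y)                      # gradient step (1)
        y = x + (k - 1) / (k + 2) * (x - x_prev)   # momentum step (2)
        x_prev = x
    return x_prev

# Illustrative run on the slides' example f = 0.02*x1^2 + 0.005*x2^2 (L = 0.04, so s <= 25)
grad_f = lambda x: np.array([0.04 * x[0], 0.01 * x[1]])
x_final = nesterov_agd(grad_f, np.array([1.0, 1.0]), s=1.0, num_iters=500)
```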
  • 6. Accelerated Scheme: Oscillation Problem
    In general, Nesterov’s scheme is not monotone in the objective function (due to the momentum term).
  • 7. Introduction: Second Order ODE
    The paper derives a second-order ordinary differential equation (ODE), which is the exact limit of Nesterov’s scheme as the step size goes to zero:
      \ddot{X} + \frac{3}{t} \dot{X} + \nabla f(X) = 0    (3)
    for t > 0, with initial conditions X(0) = x_0, \dot{X}(0) = 0; here x_0 is the starting point of Nesterov’s scheme, \dot{X} denotes velocity, and \ddot{X} denotes acceleration.
    Small t: the large friction coefficient 3/t makes the ODE an over-damped system.
    Large t: as t increases, the system behaves like an under-damped system, oscillating with an amplitude that decreases to zero.
    The time parameter of the ODE is related to the step size by t \approx k\sqrt{s}.
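The ODE can be integrated numerically to visualize the damped oscillations described above. Below is a minimal sketch using a simple explicit integrator; starting one step away from t = 0 to avoid the 3/t singularity is an implementation choice, not part of the paper's derivation.

```python
import numpy as np

def ode_trajectory(grad_f, x0, t_max, dt=1e-3):
    """Integrate ODE (3): Xdd + (3/t) Xd + grad_f(X) = 0, X(0)=x0, Xd(0)=0.

    The friction coefficient 3/t blows up at t = 0, so integration starts
    at t = dt; near the origin X(t) ~ x0 - grad_f(x0) t^2 / 8 anyway.
    """
    X = x0.astype(float).copy()
    V = np.zeros_like(X)
    t, traj = dt, [X.copy()]
    while t < t_max:
        V += dt * (-3.0 / t * V - grad_f(X))  # velocity update from the ODE
        X += dt * V
        t += dt
        traj.append(X.copy())
    return np.array(traj)

# Same illustrative quadratic as before; the trajectory spirals into the minimum
grad_f = lambda x: np.array([0.04 * x[0], 0.01 * x[1]])
traj = ode_trajectory(grad_f, np.array([1.0, 1.0]), t_max=150.0)
```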
  • 8. An Example: Trajectories. Minimize f(x) = 0.02 x_1^2 + 0.005 x_2^2. [Figure: trajectories of Nesterov’s scheme and the ODE.]
  • 9. An Example: Zoomed Trajectories. Minimize f(x) = 0.02 x_1^2 + 0.005 x_2^2. [Figure: zoomed view of the trajectories.]
  • 10. An Example: Errors f - f^*. Minimize f(x) = 0.02 x_1^2 + 0.005 x_2^2. [Figure: error curves f - f^* over time.]
  • 11. Derivation of the ODE
  • 12. Derivation of the ODE
    Assume f ∈ F_L for some L > 0. Combining the two equations of (1)-(2) and rescaling gives
      \frac{x_{k+1} - x_k}{\sqrt{s}} = \frac{k-1}{k+2} \cdot \frac{x_k - x_{k-1}}{\sqrt{s}} - \sqrt{s}\, \nabla f(y_k)    (4)
    Introduce the ansatz x_k \approx X(k\sqrt{s}) for a smooth curve X(t) defined for t \ge 0. With these approximations, we get the Taylor expansions
      (x_{k+1} - x_k)/\sqrt{s} = \dot{X}(t) + \tfrac{1}{2} \ddot{X}(t)\sqrt{s} + o(\sqrt{s})
      (x_k - x_{k-1})/\sqrt{s} = \dot{X}(t) - \tfrac{1}{2} \ddot{X}(t)\sqrt{s} + o(\sqrt{s})
      \sqrt{s}\, \nabla f(y_k) = \sqrt{s}\, \nabla f(X(t)) + o(\sqrt{s}),
    where in the third equality we use y_k - X(t) = o(1).
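The difference-quotient expansions can be sanity-checked symbolically. A small sympy sketch (with h standing in for √s) reproduces the forward and backward expansions; this is an illustration, not part of the paper.

```python
import sympy as sp

t, h = sp.symbols('t h', positive=True)  # h plays the role of sqrt(s)
X = sp.Function('X')

# Forward difference (x_{k+1} - x_k)/sqrt(s) under the ansatz x_k ~ X(k sqrt(s))
fwd = ((X(t + h) - X(t)) / h).series(h, 0, 2)
# Backward difference (x_k - x_{k-1})/sqrt(s)
bwd = ((X(t) - X(t - h)) / h).series(h, 0, 2)
print(fwd)  # X'(t) + (h/2) X''(t) + O(h^2)
print(bwd)  # X'(t) - (h/2) X''(t) + O(h^2)
```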
  • 13. Derivation of the ODE
    Formula (4) can be rewritten as
      \dot{X}(t) + \tfrac{1}{2}\ddot{X}(t)\sqrt{s} + o(\sqrt{s}) = \left(1 - \frac{3\sqrt{s}}{t}\right)\left\{ \dot{X}(t) - \tfrac{1}{2}\ddot{X}(t)\sqrt{s} + o(\sqrt{s}) \right\} - \sqrt{s}\,\nabla f(X(t)) + o(\sqrt{s}).
    Comparing the coefficients of \sqrt{s}, we obtain
      \ddot{X} + \frac{3}{t}\dot{X} + \nabla f(X) = 0.
    Theorem (Well-posed ODE: existence and uniqueness)
    For any f ∈ F_∞ := \cup_{L>0} F_L and any x_0 ∈ R^n, the ODE (3) with initial conditions X(0) = x_0, \dot{X}(0) = 0 has a unique global solution X.
  • 14. Simple Properties
    Invariance: the ODE is invariant under time change and under rescaling transformations.
    Initial asymptotics: assume X is sufficiently smooth that \lim_{t \to 0} \ddot{X}(t) exists. The Mean Value Theorem guarantees the existence of some \zeta \in (0, t) such that
      \dot{X}(t)/t = (\dot{X}(t) - \dot{X}(0))/t = \ddot{X}(\zeta).
    Hence the ODE reduces to
      \ddot{X}(t) + 3\ddot{X}(\zeta) + \nabla f(X(t)) = 0.
    Taking t \to 0 gives, for small t,
      X(t) = x_0 - \frac{\nabla f(x_0)\, t^2}{8} + o(t^2).
    This is consistent with the empirical observation that Nesterov’s scheme moves slowly in the beginning.
  • 15. Connections and Interpretations
  • 16. Analogous Convergence Rate
    We now exhibit an approximate equivalence between the ODE and Nesterov’s scheme in terms of convergence rate.
    Theorem (3.1, discrete Nesterov scheme)
    For any f ∈ F_L, the sequence {x_k} in (1)-(2) with step size s \le 1/L obeys
      f(x_k) - f^* \le \frac{2\|x_0 - x^*\|^2}{s(k+1)^2}.
    The next result indicates that the trajectory of ODE (3) closely resembles the sequence {x_k} in terms of the convergence rate.
    Theorem (3.2, continuous ODE)
    For any f ∈ F_∞, let X(t) be the unique global solution to (3) with initial conditions X(0) = x_0, \dot{X}(0) = 0. For any t > 0,
      f(X(t)) - f^* \le \frac{2\|x_0 - x^*\|^2}{t^2}.
  • 17. Proof of Theorem 3.2
    Consider the energy functional
      \mathcal{E}(t) := t^2 (f(X(t)) - f^*) + 2\left\| X + \tfrac{t}{2}\dot{X} - x^* \right\|^2,
    whose time derivative is
      \dot{\mathcal{E}}(t) = 2t(f(X) - f^*) + t^2 \langle \nabla f, \dot{X} \rangle + 4\left\langle X + \tfrac{t}{2}\dot{X} - x^*,\ \tfrac{3}{2}\dot{X} + \tfrac{t}{2}\ddot{X} \right\rangle.
    Substituting \tfrac{3}{2}\dot{X} + \tfrac{t}{2}\ddot{X} = -\tfrac{t}{2}\nabla f(X) (from the ODE),
      \dot{\mathcal{E}}(t) = 2t(f(X) - f^*) + 4\langle X - x^*, -\tfrac{t}{2}\nabla f(X) \rangle = 2t(f(X) - f^*) - 2t\langle X - x^*, \nabla f(X) \rangle \le 0,
    where the last inequality follows from convexity. By the monotonicity of \mathcal{E} and the non-negativity of 2\|X + \tfrac{t}{2}\dot{X} - x^*\|^2, the gap obeys
      f(X(t)) - f^* \le \frac{\mathcal{E}(t)}{t^2} \le \frac{\mathcal{E}(0)}{t^2} = \frac{2\|x_0 - x^*\|^2}{t^2}.
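As a numerical illustration (not from the slides), one can track the energy functional along an integrated trajectory and check that it is non-increasing; the quadratic objective below is the slides' running example.

```python
import numpy as np

def energy_along_trajectory(grad_f, f, x0, x_star, f_star, t_max, dt=1e-3):
    """Track E(t) = t^2 (f(X) - f*) + 2||X + (t/2) Xdot - x*||^2 along a
    numerically integrated solution of ODE (3); E should be non-increasing."""
    X = x0.astype(float).copy()
    V = np.zeros_like(X)
    t, energies = dt, []
    while t < t_max:
        V += dt * (-3.0 / t * V - grad_f(X))
        X += dt * V
        t += dt
        E = t**2 * (f(X) - f_star) + 2 * np.linalg.norm(X + t / 2 * V - x_star)**2
        energies.append(E)
    return np.array(energies)

# Illustrative quadratic with minimizer 0 and f* = 0
f = lambda x: 0.02 * x[0]**2 + 0.005 * x[1]**2
grad_f = lambda x: np.array([0.04 * x[0], 0.01 * x[1]])
E = energy_along_trajectory(grad_f, f, np.array([1.0, 1.0]), np.zeros(2), 0.0, t_max=50.0)
print("max increase between samples:", np.diff(E).max())  # ~0 up to discretization error
```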
  • 18. Quadratic f and Bessel Function
    For quadratic f we have
      f(x) = \tfrac{1}{2}\langle x, Ax \rangle + \langle b, x \rangle,
    where A ∈ R^{n×n}. A simple translation can absorb the linear term \langle b, x \rangle. We assume A is positive semi-definite, admitting the spectral decomposition A = Q^T \Lambda Q; replacing x with Qx, we may assume f = \tfrac{1}{2}\langle x, \Lambda x \rangle. The ODE then decouples coordinate-wise:
      \ddot{X}_i + \frac{3}{t}\dot{X}_i + \lambda_i X_i = 0, \quad i = 1, \ldots, n.
  • 19. Quadratic f and Bessel Function
    Introduce Y_i(u) = u X_i(u/\sqrt{\lambda_i}), which satisfies
      u^2 \ddot{Y}_i + u \dot{Y}_i + (u^2 - 1) Y_i = 0.
    This is Bessel’s differential equation of order 1. Solving it and matching the initial condition gives
      X_i(t) = \frac{2 x_{0,i}}{t \sqrt{\lambda_i}} J_1(t \sqrt{\lambda_i}).
    For large t, the Bessel function has the asymptotic form
      J_1(t) = \sqrt{\frac{2}{\pi t}} \left( \cos(t - 3\pi/4) + O(1/t) \right).
  • 20. Quadratic f and Bessel Function: Example
    Minimize f(x) = 0.02 x_1^2 + 0.005 x_2^2, whose eigenvalues are \lambda_{1,2} = 0.02, 0.005. Then
      f(X) - f^* = f(X) = \sum_{i=1}^{n} \frac{2 x_{0,i}^2}{t^2} J_1(t\sqrt{\lambda_i})^2.
    Denote the two major periods by T_1, T_2. We get T_1 = \pi/\sqrt{\lambda_1} = 22.214 and T_2 = \pi/\sqrt{\lambda_2} = 44.429.
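The closed-form error is easy to evaluate with scipy's Bessel routines. A short sketch, with x_0 = (1, 1) chosen for illustration:

```python
import numpy as np
from scipy.special import j1  # Bessel function of the first kind, order 1

lam = np.array([0.02, 0.005])   # the slides' eigenvalues
x0 = np.array([1.0, 1.0])       # illustrative initial point

def error_gap(t):
    """f(X(t)) - f* = sum_i (2 x0_i^2 / t^2) * J1(t sqrt(lam_i))^2."""
    return np.sum(2 * x0**2 * j1(t * np.sqrt(lam))**2 / t**2)

ts = np.linspace(0.1, 200.0, 4000)
gap = np.array([error_gap(t) for t in ts])
# gap oscillates with the two periods T_i = pi / sqrt(lam_i) ~ 22.2 and 44.4
```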
  • 21. Equivalence between the ODE and Nesterov’s Scheme
    We study the stable step size for numerically solving the ODE. The finite difference approximation of (3) by the forward Euler method is
      \frac{X(t+\Delta t) - 2X(t) + X(t-\Delta t)}{\Delta t^2} + \frac{3}{t} \cdot \frac{X(t) - X(t-\Delta t)}{\Delta t} + \nabla f(X(t)) = 0,
    which is equivalent to
      X(t+\Delta t) = \left(2 - \frac{3\Delta t}{t}\right) X(t) - \Delta t^2\, \nabla f(X(t)) - \left(1 - \frac{3\Delta t}{t}\right) X(t - \Delta t).
    Assuming that f is sufficiently smooth, for a small perturbation the characteristic equation of this finite difference scheme is approximately (identifying k = t/\Delta t)
      \det\left\{ \lambda^2 - \left(2 - \Delta t^2\, \nabla^2 f - \frac{3\Delta t}{t}\right)\lambda + 1 - \frac{3\Delta t}{t} \right\} = 0.    (5)
    For numerical stability, all the roots of (5) should lie within the unit circle.
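The forward Euler recursion above translates directly into code. A minimal sketch (initializing X(0) = X(Δt) = x_0 to mimic the zero initial velocity is an implementation choice):

```python
import numpy as np

def ode_forward_euler(grad_f, x0, dt, n_steps):
    """Iterate X(t+dt) = (2 - 3dt/t) X(t) - dt^2 grad_f(X(t)) - (1 - 3dt/t) X(t-dt)."""
    X_prev = x0.copy()
    X = x0.copy()   # X(dt) = X(0) = x0, mirroring Xdot(0) = 0
    t = dt
    traj = [X.copy()]
    for _ in range(n_steps):
        X_next = (2 - 3 * dt / t) * X - dt**2 * grad_f(X) - (1 - 3 * dt / t) * X_prev
        X_prev, X = X, X_next
        t += dt
        traj.append(X.copy())
    return np.array(traj)

# Too large a dt makes the recursion blow up, matching the stability analysis in (5)
grad_f = lambda x: np.array([0.04 * x[0], 0.01 * x[1]])
stable = ode_forward_euler(grad_f, np.array([1.0, 1.0]), dt=0.5, n_steps=2000)
```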
  • 22. A Family of Generalized Nesterov’s Schemes
    To exploit the power of the ODE, we study (3) with the constant 3 in the coefficient of \dot{X}/t replaced by a general constant r:
      \ddot{X} + \frac{r}{t}\dot{X} + \nabla f(X) = 0, \quad X(0) = x_0,\ \dot{X}(0) = 0.    (6)
    By an argument similar to Theorem 2.1, this ODE is guaranteed to admit a unique global solution for any f ∈ F_∞.
  • 23. Generalized Nesterov’s Scheme: Continuous Optimization
    Theorem (4.1)
    Suppose r > 3 and let X be the unique solution to (6) for some f ∈ F_∞. Then X(t) obeys
      f(X(t)) - f^* \le \frac{(r-1)^2 \|x_0 - x^*\|^2}{2 t^2}
    and
      \int_0^\infty t\, (f(X(t)) - f^*)\, dt \le \frac{(r-1)^2 \|x_0 - x^*\|^2}{2(r-3)}.
    Theorem (4.2)
    For any f ∈ S_{µ,L}(R^n), the unique solution X to (6) with r \ge 9/2 obeys
      f(X(t)) - f^* \le \frac{C r^{5/2} \|x_0 - x^*\|^2}{t^3 \sqrt{\mu}}.
  • 24. Generalized Nesterov’s Scheme: Continuous Optimization
    For example, the solution to (6) with f(x) = \|x\|^2/2 is
      X(t) = \frac{2^{(r-1)/2}\, \Gamma((r+1)/2)\, J_{(r-1)/2}(t)}{t^{(r-1)/2}},
    where J_{(r-1)/2}(\cdot) is the Bessel function of the first kind of order (r-1)/2. For large t, this Bessel function obeys
      J_{(r-1)/2}(t) = \sqrt{\frac{2}{\pi t}} \left( \cos(t - r\pi/4) + O(1/t) \right).
    Hence
      f(X(t)) - f^* \le \frac{\|x_0 - x^*\|^2}{t^r}.
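For intuition, the generalized closed form can be evaluated with scipy (scalar case, X(0) = 1 chosen for illustration); the envelope of f(X(t)) - f^* then decays like t^{-r}, as claimed.

```python
import numpy as np
from scipy.special import jv, gamma  # Bessel J of real order; Gamma function

def X_generalized(t, r):
    """X(t) = 2^((r-1)/2) Gamma((r+1)/2) J_{(r-1)/2}(t) / t^((r-1)/2), for f(x) = x^2/2."""
    nu = (r - 1) / 2
    return 2**nu * gamma(nu + 1) * jv(nu, t) / t**nu  # note Gamma((r+1)/2) = Gamma(nu+1)

ts = np.linspace(0.1, 300.0, 10000)
gap = 0.5 * X_generalized(ts, r=5.0)**2   # f(X(t)) - f* for f(x) = x^2/2
# the peaks of `gap` shrink roughly like t^{-5}, i.e. like t^{-r}
```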
  • 25. Generalized Nesterov’s Scheme: Composite Optimization
    (Skipped in the talk.) Consider
      \min_{x \in R^n} f(x) = g(x) + h(x),
    where g ∈ F_L for some L > 0 and h is convex on R^n, possibly taking the extended value ∞. Define the proximal subgradient
      G_s(x) := \frac{x - \arg\min_z \left\{ \|z - (x - s\nabla g(x))\|^2 / (2s) + h(z) \right\}}{s}.
  • 26. New Restart Scheme
  • 27. Restart Scheme: Previous Works
    Restart: erase the memory of previous iterations and reset the momentum back to zero.
    Function scheme: restart whenever f(x_k) > f(x_{k-1}).
    Gradient scheme: restart whenever \nabla f(y_{k-1})^T (x_k - x_{k-1}) > 0.
    Refer to: Adaptive Restart for Accelerated Gradient Schemes, Brendan O’Donoghue and Emmanuel Candes, 2012.
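Both criteria drop straight into the basic scheme. A minimal sketch (the interface is an assumption):

```python
import numpy as np

def nag_with_restart(f, grad_f, x0, s, num_iters, scheme="gradient"):
    """NAG with adaptive restart (O'Donoghue & Candes, 2012): reset the
    iteration counter k, and hence the momentum, when the criterion fires."""
    x_prev = x0.copy()
    y = x0.copy()
    k = 1
    for _ in range(num_iters):
        g = grad_f(y)
        x = y - s * g                                    # gradient step (1)
        if scheme == "function":
            restart = f(x) > f(x_prev)                   # function scheme
        else:
            restart = g @ (x - x_prev) > 0               # gradient scheme
        if restart:
            k = 1                                        # momentum back to zero
        y = x + (k - 1) / (k + 2) * (x - x_prev)         # momentum step (2)
        x_prev = x
        k += 1
    return x_prev
```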
  • 28. Restart Scheme: Previous Works. [Figure: behavior of the restart heuristics.]
  • 29. Restart Scheme: Previous Works. [Figure: behavior of the restart heuristics, continued.]
  • 30. New Restart Scheme: Speed Restart
    This work provides a new restarting strategy, called the speed restarting scheme. The underlying motivation is to maintain a relatively high velocity \dot{X} along the trajectory.
    Definition 5.1. For ODE (3) with X(0) = x_0, \dot{X}(0) = 0, let
      T = T(f, x_0) = \sup\left\{ t > 0 : \forall u \in (0, t),\ \frac{d \|\dot{X}(u)\|^2}{du} > 0 \right\}
    be the speed restarting time. In words, T is the first time the velocity \|\dot{X}\| decreases.
    Indeed, f(X(t)) is decreasing before time T: for t < T,
      \frac{d f(X(t))}{dt} = \langle \nabla f(X), \dot{X} \rangle = -\frac{3}{t}\|\dot{X}\|^2 - \frac{1}{2}\frac{d \|\dot{X}\|^2}{dt} < 0.
  • 31. Accelerating to Linear Convergence by Restarting
    The speed restarting ODE is thus
      \ddot{X}(t) + \frac{3}{t_{sr}} \dot{X}(t) + \nabla f(X(t)) = 0,    (7)
    where t_{sr} is reset to zero whenever \langle \dot{X}, \ddot{X} \rangle = 0. We have the following observations:
    X_{sr}(t) is continuous for t \ge 0, with X_{sr}(0) = x_0.
    X_{sr}(t) satisfies (3) for 0 < t < T_1 := T(x_0; f).
    Recursively define T_{i+1} = T\big(X_{sr}\big(\sum_{j \le i} T_j\big); f\big) for i \ge 1; then \tilde{X}(t) = X_{sr}\big(\sum_{j \le i} T_j + t\big) satisfies (3) with initial point X_{sr}\big(\sum_{j \le i} T_j\big).
  • 32. Accelerating to Linear Convergence by Restarting
    Lemma 5.2. There is a universal constant C > 0 such that
      f(X(T)) - f(x^*) \le \left(1 - \frac{C\mu}{L}\right) (f(x_0) - f^*).
    This guarantees that each restart reduces the error by a constant factor.
    Lemma 5.3. There is a universal constant \tilde{C} such that
      T \le \frac{4 \exp(\tilde{C} L/\mu)}{5\sqrt{L}}.
    This is an upper bound on T; it confirms that restarts occur often enough.
  • 33. Accelerating to Linear Convergence by Restarting
    Applying Lemmas 5.2 and 5.3, we have:
    Theorem (5.1)
    There exist positive constants c_1 and c_2, which depend only on the condition number L/\mu, such that for any f ∈ S_{µ,L},
      f(X_{sr}(t)) - f(x^*) \le \frac{c_1 L \|x_0 - x^*\|^2}{2} \exp(-c_2 t \sqrt{L}),
    where c_1 = e^{C\mu/L} and c_2 = \frac{5C\mu}{4L} e^{-\tilde{C} L/\mu}.
    The theorem guarantees linear convergence of the solution to (7); this is a new result in the literature.
  • 34. Numerical Examples: Algorithm of Speed Restarting
    The slide presents a discrete analog of the restarted scheme. [Algorithm box shown as a figure on the original slide.]
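The algorithm box itself did not survive the transcript, so the following is only a plausible reconstruction of the idea: monitor the step length ||x_k - x_{k-1}|| as a discrete proxy for the speed ||Xdot||, and restart the momentum when it drops. The names and the exact test are assumptions.

```python
import numpy as np

def nag_speed_restart(grad_f, x0, s, num_iters):
    """Plausible discrete analog of speed restarting: reset momentum whenever
    the step length ||x_k - x_{k-1}|| (a proxy for ||Xdot||) decreases."""
    x_prev = x0.copy()
    y = x0.copy()
    k, last_speed = 1, 0.0
    for _ in range(num_iters):
        x = y - s * grad_f(y)
        speed = np.linalg.norm(x - x_prev)
        if speed < last_speed:
            k, last_speed = 1, 0.0        # speed dropped: restart
        else:
            last_speed = speed
        y = x + (k - 1) / (k + 2) * (x - x_prev)
        x_prev = x
        k += 1
    return x_prev
```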
  • 35. Quadratic. [Figure: numerical results.]
  • 36. Log-sum-exp. [Figure: numerical results.]
  • 37. Matrix completion. [Figure: numerical results.]
  • 38. Lasso in ℓ1-constrained form with large sparse design. [Figure: numerical results.]
  • 39. References
    W. Su, S. Boyd, E. Candes (2014). A Differential Equation for Modeling Nesterov’s Accelerated Gradient Method: Theory and Insights. NIPS 2014.
  • 40. Thanks!
