Review
Recall that the general form of the optimization problem is
$$\min_{x\in\R^n} f(x)\quad\text{subject to}\quad c_i(x)=0,\ i\in\mathcal{E};\qquad c_i(x)\ge 0,\ i\in\mathcal{I}.$$
In unconstrained optimization, we minimize an objective function that depends on real variables, with no restrictions at all on the values of these variables. The mathematical formulation is
$$\min_{x\in\R^n} f(x),$$
where $\ x\in\R^n\ $ is a real vector with $\ n\ge 1\ $ components and $\ f:\R^n\rightarrow\R\ $ is a smooth function.
What Is A Solution
global minimizer
We say a point $\ x^{*}\ $ is a global minimizer if
$$f(x^{*})\le f(x)\quad\text{for all}\ x\in\R^n.$$
local minimizer
A local minimizer, by contrast, achieves the smallest value of f only within its neighborhood. Formally, we say a point $\ x^{*}\ $ is a local minimizer if there is a neighborhood $\ \mathcal{N}\ $ of $\ x^{*}\ $ such that
$$f(x^{*})\le f(x)\quad\text{for all}\ x\in\mathcal{N}.$$
A point that satisfies this definition is sometimes called a weak local minimizer. A strict local minimizer is then the outright winner in its neighborhood. Formally, a point $\ x^{*}\ $ is a strict local minimizer if there is a neighborhood $\ \mathcal{N}\ $ of $\ x^{*}\ $ such that
$$f(x^{*})\lt f(x)\quad\text{for all}\ x\in\mathcal{N}\ \text{with}\ x\neq x^{*}.$$
The last type of minimizer is the isolated minimizer: a point $\ x^{*}\ $ for which there is a neighborhood $\ \mathcal{N}\ $ of $\ x^{*}\ $ such that $\ x^{*}\ $ is the only local minimizer in $\ \mathcal{N}$.
All isolated local minimizers are strict, but the converse does not hold.
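A classical one-dimensional example of a strict but non-isolated minimizer, from Nocedal and Wright, is the function
$$f(x)=x^4\cos(1/x)+2x^4,\qquad f(0)=0,$$
which is twice continuously differentiable and has a strict local minimizer at $\ x^{*}=0$; yet it also has strict local minimizers at infinitely many nearby points $\ x_j$, with $\ x_j\rightarrow 0\ $ as $\ j\rightarrow\infty$, so $\ x^{*}=0\ $ is not isolated.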
Recognizing A Local Minimum
It might seem that we have to check every point to make sure none of them attains a smaller function value than $\ f(x^{*})$. Fortunately, if the objective function is twice continuously differentiable, we may be able to tell whether $\ x^{*}\ $ is a local minimizer by examining just the gradient $\ \triangledown f(x^{*})\ $ and the Hessian matrix $\ \triangledown^2 f(x^{*})$.
Theorem (Taylor’s Theorem)
Suppose that $\ f:\R^n\rightarrow\R\ $ is continuously differentiable and that $p \in \R^n$. Then we have that
$$f(x+p)=f(x)+\triangledown f(x+tp)^{T}p$$
for some $t\in (0,1)$. Moreover, if f is twice continuously differentiable, we have that
$$\triangledown f(x+p)=\triangledown f(x)+\int_{0}^{1}\triangledown^{2}f(x+tp)\,p\ dt,$$
and that
$$f(x+p)=f(x)+\triangledown f(x)^{T}p+\frac{1}{2}p^{T}\triangledown^{2}f(x+tp)\,p$$
for some $t\in (0,1)$.
Theorem (First-Order Necessary Conditions)
If $\ x^{*}\ $ is a local minimizer and f is continuously differentiable in an open neighborhood of $\ x^{*}$, then $\ \triangledown f(x^{*})=0$.
Theorem (Second-Order Necessary Conditions)
If $\ x^{*}\ $ is a local minimizer of f and $\ \triangledown^2f\ $ exists and is continuous in an open neighborhood of $\ x^{*}$, then $\ \triangledown f(x^{*})=0\ $ and $\ \triangledown^2 f(x^{*})\ $ is positive semidefinite.
Theorem (Second-Order Sufficient Conditions)
Suppose that $\ \triangledown^2f\ $ is continuous in an open neighborhood of $\ x^{*}$, and that $\ \triangledown f(x^{*})=0\ $ and $\ \triangledown^2 f(x^{*})\ $ is positive definite. Then $\ x^{*}\ $ is a strict local minimizer of f.
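The sufficient conditions can be verified numerically. The sketch below, using a hypothetical quadratic objective $f(x) = (x_1-1)^2 + 4x_2^2$, checks that the gradient vanishes and that the Hessian is positive definite (all eigenvalues strictly positive) at a candidate point:

```python
import numpy as np

# Hypothetical example: f(x) = (x1 - 1)^2 + 4*x2^2, minimized at (1, 0).

def f(x):
    return (x[0] - 1.0) ** 2 + 4.0 * x[1] ** 2

def grad(x):
    return np.array([2.0 * (x[0] - 1.0), 8.0 * x[1]])

def hess(x):
    return np.array([[2.0, 0.0],
                     [0.0, 8.0]])

x_star = np.array([1.0, 0.0])

# First-order necessary condition: the gradient vanishes at x_star.
print(np.allclose(grad(x_star), 0.0))

# Second-order sufficient condition: the Hessian is positive definite,
# i.e. its smallest eigenvalue is strictly positive.
print(np.linalg.eigvalsh(hess(x_star)).min() > 0)
```

Together these confirm (up to floating-point tolerance) that the candidate point is a strict local minimizer.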
Theorem (Local Minimizers of Convex Functions Are Global)
When f is convex, any local minimizer $\ x^{*}\ $ is a global minimizer of f. If in addition f is differentiable, then any stationary point $\ x^{*}\ $ is a global minimizer of f.
Overview of Algorithm
Our idea is simple. Beginning at a starting point $\ x_0$, optimization algorithms generate a sequence of iterates $\ \{x_k\}_{k=0}^{\infty}\ $ that terminates when either no more progress can be made or a solution point appears to have been approximated by the current point $\ x_k\ $ with sufficient accuracy.
Intuitively, when we generate such a sequence of iterates, we hope that the corresponding sequence of function values $\ \{f(x_k)\}_{k=0}^{\infty}\ $ decreases (for a minimization problem; it should increase for a maximization problem), or at least that the objective value decreases after every m steps of iteration, that is, $\ f(x_k)\lt f(x_{k-m})\ $ for some integer $\ m>0$.
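A minimal sketch of such an iterate sequence is steepest descent with a fixed step length on a hypothetical quadratic $f(x) = x_1^2 + 10x_2^2$; here the function values decrease monotonically:

```python
import numpy as np

# Hypothetical example objective and its gradient.
def f(x):
    return x[0] ** 2 + 10.0 * x[1] ** 2

def grad(x):
    return np.array([2.0 * x[0], 20.0 * x[1]])

x = np.array([5.0, 1.0])   # starting point x_0
alpha = 0.05               # fixed step length, small enough for this Hessian
values = [f(x)]
for k in range(50):
    x = x - alpha * grad(x)   # x_{k+1} = x_k - alpha * grad f(x_k)
    values.append(f(x))

# For this step length the sequence {f(x_k)} is strictly decreasing.
print(all(a > b for a, b in zip(values, values[1:])))
```

A fixed step length only works when it is small relative to the curvature; the line search strategy below chooses the step length adaptively instead.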
There are two fundamental strategies for moving from the current point $\ x_k\ $ to a new iterate $\ x_{k+1}$. Most of the algorithms described in this book follow one of these approaches.
two strategies: line search and trust region
line search
In the line search strategy, the algorithm chooses a direction $\ p_k\ $ and searches along this direction from the current iterate $\ x_k\ $ for a new iterate with a lower function value. The distance to move along $\ p_k\ $ can be found by approximately solving the following one-dimensional minimization problem to find a step length $\ \alpha$:
$$\min_{\alpha>0} f(x_k+\alpha p_k).$$
By solving the problem above exactly, we would derive the maximum benefit from the direction $\ p_k$, but an exact minimization may be expensive. Instead, the line search algorithm generates a limited number of trial step lengths until it finds one that loosely approximates the minimum of the problem above.
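One common inexact line search of this kind is backtracking under an Armijo (sufficient-decrease) condition; the sketch below is illustrative, with hypothetical function names and constants:

```python
import numpy as np

def backtracking(f, grad_fk, xk, pk, alpha0=1.0, rho=0.5, c=1e-4):
    """Shrink the trial step until the Armijo sufficient-decrease
    condition f(xk + a*pk) <= f(xk) + c*a*(grad_fk^T pk) holds."""
    alpha = alpha0
    fk = f(xk)
    slope = grad_fk @ pk          # directional derivative; negative for descent
    while f(xk + alpha * pk) > fk + c * alpha * slope:
        alpha *= rho              # reduce the trial step length
    return alpha

# Hypothetical example: f(x) = x1^2 + 10*x2^2, steepest-descent direction.
f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
g = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])

xk = np.array([1.0, 1.0])
pk = -g(xk)                       # a descent direction
alpha = backtracking(f, g(xk), xk, pk)
print(f(xk + alpha * pk) < f(xk))  # the accepted step decreases f
```

Only a handful of trial step lengths are generated, which matches the point above: the accepted step loosely approximates the one-dimensional minimizer rather than solving for it exactly.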
trust region
In the second algorithmic strategy, known as trust region, the information gathered about f is used to construct a model function $\ m_k\ $ whose behavior near the current point $\ x_k\ $ is similar to that of the actual objective function f. Because the model $\ m_k\ $ may not be a good approximation of f when x is far from $\ x_k$, we restrict the search for a minimizer of $\ m_k\ $ to some region around $\ x_k$. In other words, we find the candidate step p by approximately solving the following subproblem:
$$\min_{p}\ m_k(x_k+p),\quad\text{where}\ x_k+p\ \text{lies inside the trust region}.$$
If the candidate solution does not produce a sufficient decrease in f, we conclude that the trust region is too large, so we shrink it and re-solve the problem.
The model $\ m_k\ $ in the problem above is usually defined to be a quadratic function of the form
$$m_k(x_k+p)=f_k+p^{T}\triangledown f_k+\frac{1}{2}p^{T}B_kp,$$
where $\ f_k$, $\triangledown f_k$, and $B_k\ $ are a scalar, a vector, and a matrix, respectively. The matrix $\ B_k\ $ is either the Hessian $\ \triangledown^2f_k\ $ or some approximation to it.
Reference
Nocedal, J., & Wright, S. J. *Numerical Optimization* (2nd ed.). Springer.