Volume 1, Issue 1, Year 2012 - Pages 1-8

DOI: 10.11159/ijecs.2012.001

### Constrained Nonlinear Least Squares: A Superlinearly Convergent Projected Structured Secant Method

Nezam Mahdavi-Amiri, Narges Bidabadi

Sharif University of Technology, Faculty of Mathematical Sciences

Azadi Street, Tehran, Iran

nezamm@sina.sharif.edu, n_bidabadi@mehr.sharif.ir

**Abstract** - *Numerical solution of nonlinear least squares problems is an important computational task in science and engineering. Effective algorithms have been developed for solving such problems. The structured secant method is a class of efficient methods developed in recent years for optimization problems in which the Hessian of the objective function has a special structure; a primary and typical application is the solution of nonlinear least squares problems. We present an exact penalty method for solving constrained nonlinear least squares problems, in which the structured projected Hessian is approximated by a projected version of the structured BFGS formula, and establish its local two-step Q-superlinear convergence. For robustness, we employ a special nonsmooth line search strategy, taking account of the least squares objective. We discuss comparative results of testing our programs against three nonlinear programming codes from KNITRO on randomly generated test problems due to Bartels and Mahdavi-Amiri. The numerical results also confirm the practical relevance of our special considerations for the inherent structure of the least squares.*

**Keywords:** Constrained Nonlinear Least Squares, Exact Penalty Methods, Projected Hessian Updating, Structured Secant Updates, Two-Step Superlinear Convergence

© Copyright 2012 Authors - This is an Open Access article published under the Creative Commons Attribution License terms. Unrestricted use, distribution, and reproduction in any medium are permitted, provided the original work is properly cited.

#### 1. Introduction

Consider the constrained nonlinear least squares (CNLLS) problem,

$$\min_{x\in\mathbb{R}^n}\ f(x)=\tfrac{1}{2}F(x)^TF(x)=\tfrac{1}{2}\sum_{i=1}^{m}F_i(x)^2$$
$$\text{subject to}\quad c_i(x)=0,\ i\in E,\qquad c_i(x)\ge 0,\ i\in I, \tag{1}$$

where $F(x)=(F_1(x),\dots,F_m(x))^T$, the sets $E$ and $I$ index the equality and inequality constraints, and the $F_i$ and $c_i$ are functions from $\mathbb{R}^n$ to $\mathbb{R}$, all assumed to be twice continuously differentiable. These problems arise as experimental data analysis problems in various areas of science and engineering such as electrical engineering, medical and biological imaging, chemistry, robotics, vision, and environmental sciences; e.g., see Nievergelt (2000), Golub and Pereyra (2003) and Mullen et al. (2007). The gradient and Hessian of $f$ can be expressed as

$$\nabla f(x)=A(x)F(x) \tag{2}$$

and

$$\nabla^2 f(x)=A(x)A(x)^T+B(x), \tag{3}$$

where $A(x)=[\nabla F_1(x),\dots,\nabla F_m(x)]$ is the $n\times m$ matrix whose columns are the gradients $\nabla F_i(x)$, and

$$B(x)=\sum_{i=1}^{m}F_i(x)\nabla^2F_i(x). \tag{4}$$
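To make this structure concrete, the following sketch evaluates the gradient $A(x)F(x)$ and the exactly computable Gauss-Newton term $A(x)A(x)^T$ for a small hypothetical residual vector (the functions `residuals` and `jacobian` below are our own illustrative choices, not taken from the paper):

```python
import numpy as np

def residuals(x):
    # Hypothetical residual vector F(x) with m = 2 components.
    return np.array([x[0]**2 + x[1] - 1.0, x[0] - x[1]**2])

def jacobian(x):
    # Jacobian of F: row i is the gradient of F_i, so A(x) is its transpose.
    return np.array([[2.0 * x[0], 1.0],
                     [1.0, -2.0 * x[1]]])

def grad_f(x):
    # Gradient of f(x) = (1/2) F^T F is A(x) F(x) = J(x)^T F(x).
    return jacobian(x).T @ residuals(x)

def gauss_newton_hessian(x):
    # First structured Hessian term A(x) A(x)^T = J^T J, available exactly;
    # only the second-order term B(x) needs to be approximated.
    J = jacobian(x)
    return J.T @ J

x = np.array([0.5, 0.5])
g = grad_f(x)
H_gn = gauss_newton_hessian(x)
```

The split matters because $A(x)A(x)^T$ is cheap and exact, so a secant method need only approximate the term $B(x)$ involving second derivatives of the residuals.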

Using this special structure for approximating the Hessian matrix has been the subject of numerous research papers; see Fletcher and Xu (1987), Dennis et al. (1989), Mahdavi-Amiri and Bartels (1989), Li et al. (2002), Mahdavi-Amiri and Ansari (2012a), and Bidabadi and Mahdavi-Amiri (2012). Here, we consider the structured BFGS approximation formula (Dennis et al., 1989) for the structured least squares projected Hessian and a special line search scheme given by Mahdavi-Amiri and Ansari (2012c), leading to a more efficient algorithm than general nonlinear programming schemes. An exact penalty function for problem (1) is defined to be

$$P(x,\mu)=\mu f(x)+\sum_{i\in E}\lvert c_i(x)\rvert-\sum_{i\in I}\min\bigl(0,c_i(x)\bigr), \tag{5}$$

where $\mu>0$ is a penalty parameter. If $x^*$ is a stationary point of (1) and the gradients of the active constraints at $x^*$ are linearly independent, then there exists a real number $\mu^*>0$ such that $x^*$ is also a stationary point of $P(\cdot,\mu)$, for each $\mu\in(0,\mu^*]$. Mahdavi-Amiri and Bartels (1989), based on a general approach for nonlinear problems (see Coleman and Conn, 1982a), proposed a special structured algorithm for minimization of $P$ with a fixed value of $\mu$ in solving the CNLLS problems. Let $\varepsilon$ be a small positive number used to identify the near-active ($\varepsilon$-active) constraint set. The algorithm obtains search directions by using an $\varepsilon$-active set

$$\mathcal{A}_\varepsilon(x)=\{\,i\in E\cup I:\ \lvert c_i(x)\rvert\le\varepsilon\,\} \tag{6}$$

and a corresponding merit function, obtained from (5) by removing the terms of the $\varepsilon$-active constraints:

$$\phi(x)=\mu f(x)+\sum_{i\in E\setminus\mathcal{A}_\varepsilon(x)}\lvert c_i(x)\rvert-\sum_{i\in I\setminus\mathcal{A}_\varepsilon(x)}\min\bigl(0,c_i(x)\bigr). \tag{7}$$

Step lengths are chosen by using the merit function, and the optimality conditions are checked by using it as well. For $x$ at which no constraint outside $\mathcal{A}_\varepsilon(x)$ vanishes, the gradient and Hessian of $\phi$ are:

$$\nabla\phi(x)=\mu\nabla f(x)+\sum_{i\notin\mathcal{A}_\varepsilon(x)}\theta_i\nabla c_i(x)$$

and

$$\nabla^2\phi(x)=\mu\nabla^2 f(x)+\sum_{i\notin\mathcal{A}_\varepsilon(x)}\theta_i\nabla^2 c_i(x),$$

where

$$\theta_i=\begin{cases}\operatorname{sgn}\bigl(c_i(x)\bigr), & i\in E,\\[2pt] -1, & i\in I \text{ and } c_i(x)<0,\\[2pt] 0, & \text{otherwise.}\end{cases}$$
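As an illustration, and under our reading of the $\ell_1$ exact penalty and the $\varepsilon$-active set (the constraint values and tolerances below are hypothetical), these quantities can be evaluated as follows:

```python
import numpy as np

def penalty(f_val, c_eq, c_ineq, mu):
    # l1 exact penalty: mu*f + sum of |c_i| over equalities
    # + sum of violations of the inequalities c_i(x) >= 0.
    return mu * f_val + np.sum(np.abs(c_eq)) + np.sum(np.maximum(0.0, -c_ineq))

def eps_active(c_eq, c_ineq, eps):
    # Indices i (equalities first, then inequalities) with |c_i(x)| <= eps.
    c = np.concatenate([c_eq, c_ineq])
    return np.where(np.abs(c) <= eps)[0]

# Hypothetical constraint values at some iterate x.
c_eq = np.array([0.001, -0.3])          # two equality constraints
c_ineq = np.array([0.002, 1.5, -0.4])   # three inequality constraints

p = penalty(1.0, c_eq, c_ineq, mu=0.1)  # 0.1 + 0.301 + 0.4 = 0.801
active = eps_active(c_eq, c_ineq, eps=0.01)
```

Removing the near-active terms from the penalty, as in the merit function, avoids the nondifferentiability of the absolute-value and min terms near constraint boundaries during the line search.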

It is well known that the necessary conditions for $x^*$ to be an isolated local minimizer of $P$, under the assumptions made above on $f$ and the $c_i$, are that there exist multipliers $\lambda_i^*$, $i\in\mathcal{A}(x^*)$, for the active constraints, such that

$$\mu\nabla f(x^*)=\sum_{i\in\mathcal{A}(x^*)}\lambda_i^*\nabla c_i(x^*) \tag{12}$$

and

$$\lvert\lambda_i^*\rvert\le 1,\ i\in E\cap\mathcal{A}(x^*),\qquad 0\le\lambda_i^*\le 1,\ i\in I\cap\mathcal{A}(x^*). \tag{13}$$

A point for which only (12) above is satisfied is called a stationary point of $P$. A minimizer must be a stationary point satisfying (13). One major premise of the algorithm is that the multipliers are only worth estimating in the neighborhoods of stationary points. Nearness to a stationary point is governed by a stationarity tolerance. The algorithm is considered to be in a local state if the norm of the projected or reduced gradient, $\lVert Z^T\nabla\phi(x)\rVert$ (where the columns of $Z$ span the null space of the $\varepsilon$-active constraint gradients), is smaller than this tolerance, and it is in a global state otherwise. Fundamental to the approach is the following quadratic problem:

$$\min_{d}\ \nabla\phi(x)^Td+\tfrac{1}{2}d^TW(x)d\qquad\text{subject to}\quad N(x)^Td=0, \tag{14}$$

where $N(x)$ is the matrix whose columns are the gradients of the $\varepsilon$-active constraints,

$$W(x)=\mu\nabla^2f(x)+\sum_{i\notin\mathcal{A}_\varepsilon(x)}\theta_i\nabla^2c_i(x)-\sum_{i\in\mathcal{A}_\varepsilon(x)}\lambda_i\nabla^2c_i(x)$$

(the $\theta_i$ being the signs attached to the non-$\varepsilon$-active penalty terms), the $\lambda_i$ are the Lagrange multipliers associated with (14) in a local state (in the proximity of a stationary point), and the $\lambda_i$ are taken to be zero when the algorithm is in a global state (far from a stationary point). In practice, the QR decomposition of $N(x)$ is used to solve the quadratic problem:

$$N(x)=\begin{bmatrix}Y & Z\end{bmatrix}\begin{bmatrix}R\\0\end{bmatrix}=YR,$$

where $\begin{bmatrix}Y & Z\end{bmatrix}$ is orthogonal and $R$ is upper triangular, so that the columns of $Z$ form an orthonormal basis for the null space of $N(x)^T$.

If we set $d=Zv$, for some $v\in\mathbb{R}^{n-t}$, then $v$ is to be found by solving

$$\bigl(Z^TW(x)Z\bigr)v=-Z^T\nabla\phi(x).$$

Therefore, for solving the quadratic problem (14), we need an approximation for the projected or reduced Hessian $Z^TW(x)Z$. We present a projected structured BFGS update formula for computing such an approximation by providing a quasi-Newton approximation $B\approx Z^TW(x)Z$.
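The null-space solve just described can be sketched as follows; the matrices `W`, `grad`, and `N` are hypothetical stand-ins, and NumPy's QR in `'complete'` mode supplies the basis $Z$:

```python
import numpy as np

def nullspace_step(W, grad, N):
    """Solve  min grad^T d + 0.5 d^T W d  s.t.  N^T d = 0
    via the QR decomposition N = [Y Z][R; 0]."""
    n, t = N.shape
    Q, _ = np.linalg.qr(N, mode='complete')   # Q = [Y  Z], orthogonal
    Z = Q[:, t:]                              # orthonormal basis of null(N^T)
    # Reduced system (Z^T W Z) v = -Z^T grad, then lift back: d = Z v.
    v = np.linalg.solve(Z.T @ W @ Z, -Z.T @ grad)
    return Z @ v

# Hypothetical data: n = 3 variables, one active constraint normal e_1.
W = np.diag([2.0, 3.0, 4.0])
grad = np.array([1.0, 1.0, 1.0])
N = np.array([[1.0], [0.0], [0.0]])

d = nullspace_step(W, grad, N)   # tangential step with N^T d = 0
```

Note that $d=-Z(Z^TWZ)^{-1}Z^T\nabla\phi$ is independent of the particular orthonormal basis $Z$ the QR factorization returns, so sign conventions in the factorization do not matter.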

We give the asymptotic convergence results of exact penalty methods using this projected structured BFGS updating scheme. Consider the asymptotic case, that is, the case in which the final active set has been identified, so that for all further iterates $x_k$, with $x^*$ designating the optimal point, we have

$$\mathcal{A}_\varepsilon(x_k)=\mathcal{A}(x^*).$$

Suppose that $B$ approximates the projected Hessian $Z^TW(x)Z$ at the current iterate, and we want to update $B$ to $\bar B$, approximating

$$\bar Z^TW(\bar x)\bar Z.$$

Note that $Z\in\mathbb{R}^{n\times(n-t)}$, where $t$ is the number of active constraints. We assume rank$(N(x))=t$, for all $x$ of interest. Letting $s=\bar x-x$ and $y=\nabla\phi(\bar x)-\nabla\phi(x)$, we have $s=YY^Ts+ZZ^Ts$, where, in order to simplify notation, the presence of a bar above a quantity indicates that it is taken at iteration $k+1$, and the absence of a bar indicates iteration $k$. If the constraints are linear, then $\bar Z=Z$, for all $k$. In the nonlinear case, asymptotically it is expected that the horizontal component $YY^Ts$ be negligible and thus we obtain (Mahdavi-Amiri and Bartels, 1989):

$$\bar B\,(\bar Z^Ts)\approx\bar Z^Ty.$$

Thus, we use the quasi-Newton formula to update $B$ to $\bar B$ according to the secant equation when the horizontal component $YY^Ts$ has actually become negligible, that is, when

$$\lVert YY^Ts\rVert\le\eta_k\,\lVert ZZ^Ts\rVert \tag{23}$$

for a tolerance sequence $\{\eta_k\}$, where $k$ is the iteration number and $\lVert\cdot\rVert$ denotes the Euclidean norm, as suggested by Nocedal and Overton (1985) for general constrained nonlinear programs. Therefore, an approximation to the projected structured Hessian is obtained by computing the projected Gauss-Newton term exactly and approximating only the projected second-order term by a secant update, where the corresponding structured secant vector is given by (28).

A general framework for an exact penalty algorithm to minimize (5) is given below (see Mahdavi-Amiri and Bartels, 1989).

*Algorithm 1: An Exact Penalty Method.*

*Step 0:* Give an initial point $x_0$ and a penalty parameter $\mu>0$.

*Step 1:* Determine $f(x)$, $F(x)$, the $c_i(x)$, and the $\varepsilon$-active set $\mathcal{A}_\varepsilon(x)$. Identify the matrix $Z$ by computing the QR decomposition of $N(x)$ and set global=true, optimal=false.

*Step 2:* If $\lVert Z^T\nabla\phi(x)\rVert$ is not smaller than the stationarity tolerance, then obtain the search direction $d$ that is a solution of the quadratic problem (14), where $B$ approximates the global Hessian (the $\lambda_i$ taken to be zero), and go to Step 5.

*Step 3:* Determine the Lagrange multipliers $\lambda_i$, $i\in\mathcal{A}_\varepsilon(x)$, as a least squares solution of

$$N(x)\lambda=\nabla\phi(x). \tag{26}$$

If conditions (13) are not satisfied, then choose an index $j$ for which one of (13) is violated, determine the search direction $d$ that satisfies the system of equations $N^Td=\sigma_je_j$ and $Z^Td=0$, where $e_j$ is the $j$th unit vector and $\sigma_j=\pm1$ is chosen according to the violated bound in (13), and go to Step 5.

*Step 4:* Set global=false. Determine the direction $d=d_h+d_v$, where the horizontal component $d_h=Zv$ is obtained from the solution $v$ of the quadratic problem (14), with $B$ approximating the local Hessian and the $\lambda_i$ being the Lagrange multipliers associated with (14) as determined in Step 3. The vertical component $d_v$ is the solution to the system

$$N^Td_v=-c_{\mathcal{A}}(x),$$

where $c_{\mathcal{A}}(x)$ is the vector of the $\varepsilon$-active constraint functions, ordered in accordance with the columns of $N$. Set the step length $\alpha=1$ and go to Step 6.

*Step 5:* Determine a step length $\alpha$ using a line search procedure on the merit function.

*Step 6:* Compute $x^+=x+\alpha d$. If a sufficient decrease has been obtained, then set $x=x^+$, else go to Step 8.

*Step 7:* If global=false, then check the optimality conditions for $x$. If $x$ is optimal, then set optimal=true and stop, else go to Step 1.

*Step 8:* If global=true and the stationarity tolerance can still be reduced, then reduce it to change the state of the algorithm, else reduce $\varepsilon$ so that the projected gradient norm becomes large when tested against the stationarity tolerance.

*Step 9:* If (global=true and no further reduction is possible) or $\alpha$ is too small or $\varepsilon$ is too small, then report failure and stop, else go to Step 1.

*Remarks:* In Step 6, we have used an appropriate nonsmooth line search strategy to determine the step length satisfying a sufficient decrease condition that is characterized by the line search assumption; see Coleman and Conn (1982b), part (v), p. 152. For a recent line search strategy for CNLLS problems, see Mahdavi-Amiri and Ansari (2012a, 2012c). In Step 7, the optimality conditions are checked as follows: if $\lVert Z^T\nabla\phi(x)\rVert$ is small enough, then determine the multipliers $\lambda_i$, $i\in\mathcal{A}_\varepsilon(x)$, as the least squares solution of (26). If the conditions (13) are satisfied, then $x$ is considered to satisfy the first order conditions, and thus, being a stationary point, the algorithm stops, as is commonly practiced in optimization algorithms. Of course, second order conditions need to be checked to ascertain optimality of $x$.

In the remainder of our work, we drop the iteration index from the quantities above, for simplicity.

For approximating the projected structured Hessians in solving the quadratic problem, we make use of the ideas of Dennis et al. (1989) in a different context. We consider the structured BFGS update of $B$, given by

$$\bar B=B-\frac{B\hat s\hat s^TB}{\hat s^TB\hat s}+\frac{\hat y\hat y^T}{\hat y^T\hat s}, \tag{32}$$

where $\hat s=\bar Z^Ts$ is the projected step and $\hat y$ is the projected structured secant vector obtained by using (28).

*Remark:* Clearly, $\bar B$, given by (32), satisfies the secant equation $\bar B\hat s=\hat y$.

Here, we use the BFGS secant updates for approximating the projected structured Hessians in solving the constrained nonlinear least squares problem.
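Since the update is a BFGS formula applied to projected quantities, its defining property, the secant equation noted in the Remark, can be checked numerically with a generic BFGS update (the vectors below are hypothetical, and the curvature safeguard is a common implementation choice, not part of the paper's formula):

```python
import numpy as np

def bfgs_update(B, s, y):
    """Standard BFGS update: returns B_new satisfying the secant
    equation B_new @ s == y, provided the curvature condition
    y^T s > 0 holds; otherwise the update is skipped."""
    Bs = B @ s
    sBs = s @ Bs
    ys = y @ s
    if ys <= 1e-12 * np.linalg.norm(y) * np.linalg.norm(s):
        return B  # skip: curvature condition fails
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys

# Hypothetical projected step and secant vector.
B = np.eye(3)
s = np.array([1.0, 0.0, 1.0])
y = np.array([2.0, 1.0, 1.0])   # y^T s = 3 > 0

B_new = bfgs_update(B, s, y)
# B_new @ s recovers y, and symmetry is preserved.
```

In the projected structured setting, `s` and `y` would be replaced by $\hat s=\bar Z^Ts$ and the structured secant vector $\hat y$; the algebra of the update is unchanged.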

The remainder of our work is organized as follows. In § 2, we give the asymptotic two-step superlinear convergence of the algorithm. Competitive numerical results are reported in § 3. The results are compared with the ones obtained by the three algorithms in the KNITRO software package for solving general nonlinear programs. We conclude in § 4.

#### 2. Local Convergence

Here, we give our local two-step superlinear convergence result. We make the following assumptions.

*Assumptions:*

*(A1)* Problem (1) has a local solution $x^*$.

*(A2)* The sequence $\{x_k\}$, generated by Algorithm 1 for minimizing $P$, is such that $x_k\to x^*$.

*(A3)* The functions $F_i$ and $c_i$, $i\in E\cup I$, are twice continuously differentiable on a compact set containing the iterates.

*(A4)* $\nabla^2f$ and the $\nabla^2c_i$ are locally Lipschitz continuous at $x^*$.

*(A5)* The projected Hessian is locally Lipschitz continuous at $x^*$, that is, there exists a constant $L>0$ such that

$$\bigl\lVert Z(x)^TW(x)Z(x)-Z(x^*)^TW(x^*)Z(x^*)\bigr\rVert\le L\,\lVert x-x^*\rVert$$

for all $x$ near $x^*$.

*(A6)* The gradients of the $\varepsilon$-active constraints at $x$, for all $x$ near $x^*$, are linearly independent.

*(A7)* There exist positive constants $m_1$ and $m_2$ such that

$$m_1\lVert v\rVert^2\le v^TZ(x^*)^TW(x^*)Z(x^*)v\le m_2\lVert v\rVert^2,\qquad\text{for all }v\in\mathbb{R}^{n-t},$$

where $Z(x^*)^TW(x^*)Z(x^*)$ is the projected Hessian at $x^*$.

We will make use of the following two theorems from Mahdavi-Amiri and Ansari (2012a) and Nocedal and Overton (1985).

*Theorem 1.* (Mahdavi-Amiri and Ansari, 2012a) There exists a neighborhood of $x^*$ such that, if the current and next iterates lie in it, then a bound of the stated form holds, with a constant independent of the iteration; moreover, if equation (8) holds, then the corresponding stronger bound holds as well.

*Theorem 2.* (Nocedal and Overton, 1985) Suppose that Algorithm 1 is applied with any update rule for approximating the projected Hessian matrix. If the iterates remain in a neighborhood of $x^*$, the unit step length is accepted for all $k$ sufficiently large, and condition (34) holds, then $x_k\to x^*$ at a two-step Q-superlinear rate.

We now give the so-called bounded deterioration inequality for the structured BFGS update formula.

*Theorem 3.* Suppose that the inequality (23) and Assumptions (A5) and (A7) hold. Let $\bar B$ be the projected structured BFGS secant update of $B$, that is, the BFGS update obtained by using (28). Then, there exist a positive constant and a neighborhood of $x^*$ on which the bounded deterioration inequality holds for all iterates.

Now, we give a linear convergence result.

*Theorem 4.* Suppose that Assumptions (A1)-(A7) hold. Let $\bar B$ be the structured BFGS secant update of $B$, that is, the BFGS update obtained by using (28), and let the sequence $\{x_k\}$ be generated by Algorithm 1. For any $r\in(0,1)$, there exist positive constants $\delta$ and $\epsilon$ such that if $x_0$ is within $\delta$ of $x^*$ and $B_0$ is within $\epsilon$ of the projected Hessian at $x^*$, then

$$\lVert x_{k+1}-x^*\rVert\le r\,\lVert x_{k-1}-x^*\rVert,$$

that is, the iterates $x_k$ converge to $x^*$ at least at a two-step Q-linear rate.

The next theorem shows the satisfaction of (34) in Theorem 2 for the structured BFGS update formula.

*Theorem 5.* If there exists $\delta>0$ such that the stated closeness condition holds for every iteration $k$, and we update using the structured BFGS secant formula for each $k$, that is, $\bar B$ is the BFGS update of $B$ obtained by using (28), then condition (34) of Theorem 2 is satisfied.

In §1, we pointed out that we use the structured quasi-Newton update formula for approximating the projected Hessian in Algorithm 1, if the inequality (23) holds (this is expected to happen when the algorithm is in its local phase with the iterate being close to a stationary point). Now, we give the superlinear convergence result for Algorithm 1.

*Theorem 6.* Suppose that Assumptions (A1)-(A7) hold. Let the sequence $\{x_k\}$ be generated by Algorithm 1 and let $\bar B$ be obtained by the projected structured BFGS secant formula, that is, the BFGS update of $B$ obtained by using (28). Then, $\{x_k\}$ converges to $x^*$ at a two-step superlinear rate, that is,

$$\lim_{k\to\infty}\frac{\lVert x_{k+1}-x^*\rVert}{\lVert x_{k-1}-x^*\rVert}=0.$$

#### 3. Numerical Experiments

We coded our algorithm in MATLAB 7.6.0. In the global steps, a line search strategy is necessary. In our implementation, for the line search strategy, we used the approach specially designed for nonlinear least squares given by Mahdavi-Amiri and Ansari (2012a, b, c). We chose the tolerance in (23) as suggested by Nocedal and Overton (1985). The initial matrix $B_0$ is set to be the identity matrix. For robustness, we followed the computational considerations provided by Mahdavi-Amiri and Bartels (1989). We tested our algorithm on 35 randomly generated test problems using the test problem generation scheme given by Bartels and Mahdavi-Amiri (1986).

A simple test generating scheme of Bartels and Mahdavi-Amiri (1986) is described here. The following parameters are set for the least squares problem:

- $n$: number of variables;
- $m$: number of components in $F(x)$;
- $t$: number of equality constraints;
- $s$: number of inequality constraints;
- $s_a$: number of active inequality constraints.

The random values for the components of an optimizer $x^*$ and the corresponding Lagrange multipliers of the equality constraints are set between -1 and 1, while the Lagrange multipliers of the active inequality constraints are chosen between 0 and 1 (the Lagrange multipliers of the inactive constraints are set to zero). Then, the Lagrangian Hessian matrix at $x^*$ is determined from a product of two upper triangular factors: except for one prescribed entry, which is set to 2, all the components of the first factor are random values in the range -1 to 1; the second factor is also an upper triangular matrix with random components set between -1 and 1. The scalar $\sigma$ is under user control, and it dictates whether the Hessian of the Lagrangian of the generated problem is positive definite ($\sigma>0$) or indefinite ($\sigma<0$).

Next, for generating the random least squares problem, the components of the residual vector $F$ are constructed from the generated data, and the remaining free values are determined so that the gradient of the Lagrangian vanishes at $x^*$ and the second order sufficient conditions for local optimality of $x^*$ are satisfied. The constraints of the problem are set with constant terms chosen so that $c_i(x^*)=0$ for the active constraints and $c_i(x^*)>0$ for the inactive constraints. For setting the constraint gradients, the gradient of $c_i$ at $x^*$ is set to the $i$th column of the identity matrix, to have the linear independence of the active constraints at hand.

The parameters of these random problems are reported in Table 1. All random numbers needed for the random problems were generated by the function "rand" in MATLAB. We generated 35 random problems, composed of 5 problems in each one of the categories numbered in Table 1 along with the parameter settings. For the generated problem sets 1-3, 4-5 and 6-7, all quantities were exactly the same within each set, the sets differing only in the value of $\sigma$. A variety of problems having different numbers of variables and constraints were used. We generated problems not only having positive definite, but also indefinite, Hessians of the Lagrangian (with positive definite projected Hessians).
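A much simplified sketch of the generator's idea (not the exact Bartels and Mahdavi-Amiri scheme; the linear constraint form and all names here are our own simplifications) is: draw a random optimizer and multipliers, give the constraints identity-column gradients, and choose the constants so that exactly the intended constraints are active at $x^*$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameters as in problem set 1 of Table 1 (without the Hessian part).
n, t, s, s_active = 5, 2, 3, 2

x_star = rng.uniform(-1.0, 1.0, size=n)      # random optimizer
lam_eq = rng.uniform(-1.0, 1.0, size=t)      # equality multipliers
lam_ineq = np.zeros(s)
lam_ineq[:s_active] = rng.uniform(0.0, 1.0, size=s_active)  # active ones

def c(x):
    # Linear constraints c_i(x) = x_i - x_i^* + const_i: gradients are
    # identity columns, so active-constraint gradients are independent
    # by construction. Inactive inequalities get const_i = 1 > 0.
    consts = np.zeros(n)
    consts[t + s_active: t + s] = 1.0
    return (x - x_star)[:t + s] + consts[:t + s]

vals = c(x_star)
# Equalities and active inequalities vanish at x*; inactive ones are positive.
```

A full generator would additionally build the Lagrangian Hessian from the triangular factors and back out the residual data, but the activity pattern above is the part that fixes the problem's combinatorial structure.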

Table 1. The parameters of the random problems.

| Problem Number | $n$ | $m$ | $t$ | $s$ | $s_a$ | $\sigma$ |
|---|---|---|---|---|---|---|
| 1 | 5 | 5 | 2 | 3 | 2 | 1 |
| 2 | 5 | 5 | 2 | 3 | 2 | -1 |
| 3 | 5 | 5 | 2 | 3 | 2 | -10 |
| 4 | 10 | 10 | 5 | 5 | 2 | 1 |
| 5 | 10 | 10 | 5 | 5 | 2 | -1 |
| 6 | 20 | 20 | 8 | 12 | 2 | 1 |
| 7 | 20 | 20 | 8 | 12 | 2 | -1 |

For comparison, we solved these random problems by the three algorithms in KNITRO 6.0 (Interior-Point/Direct, Interior-Point/CG and Active Set). To keep the three KNITRO algorithms in line with our computing features, we set the parameters 'GradObj' and 'GradConstr' to 'on', so that exact gradients are used; the other parameters were set to their default values (this way, the BFGS updating rule was used for the Hessian approximations).

For our comparisons, we explain the notion of a performance profile (Dolan and Moré, 2002) as a means to evaluate and compare the performance of a set $S$ of solvers on a test set $P$. We assume that we have $n_s$ solvers and $n_p$ problems. We are interested in using the number of function evaluations as a performance measure. Suppose $t_{p,s}$ is the number of function evaluations required to solve problem $p$ by solver $s$. We compare the performance of solver $s$ on problem $p$ with the best performance by any solver on this problem; that is, we use the performance ratio

$$r_{p,s}=\frac{t_{p,s}}{\min\{t_{p,s}:s\in S\}}.$$

We assume that a parameter $r_M$ is chosen so that $r_M\ge r_{p,s}$ for all $p$ and $s$, and we put $r_{p,s}=r_M$ if and only if solver $s$ does not solve problem $p$. If we define

$$\rho_s(\tau)=\frac{1}{n_p}\,\bigl|\{\,p\in P:r_{p,s}\le\tau\,\}\bigr|,$$

then $\rho_s(\tau)$ is the probability for solver $s$ that a performance ratio $r_{p,s}$ is within a factor $\tau$ of the best possible ratio. The function $\rho_s$ is the distribution function for the performance ratio, and the value of $\rho_s(1)$ is the probability that the solver will win over the rest of the solvers (Dolan and Moré, 2002). Figure 1 shows the performance profiles of the five solvers. The most significant aspect of Figure 1 is that on this test set our algorithm outperforms all the other solvers: its performance profile lies above all the others for all performance ratios. According to Figure 1, our algorithm is substantially more efficient more often than the three KNITRO programs.
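The performance profile computation follows directly from these definitions; the evaluation counts below are made-up illustrations, with `np.inf` marking a failure:

```python
import numpy as np

def performance_profiles(T, taus):
    """T[p, s]: cost (e.g., function evaluations) of solver s on
    problem p; np.inf marks a failure. Returns rho with
    rho[s, j] = fraction of problems where r_{p,s} <= taus[j]."""
    best = T.min(axis=1, keepdims=True)   # best cost per problem
    R = T / best                          # performance ratios r_{p,s}
    return np.array([[np.mean(R[:, s] <= tau) for tau in taus]
                     for s in range(T.shape[1])])

# 3 problems x 2 solvers; solver 0 fails on the last problem.
T = np.array([[10.0, 20.0],
              [30.0, 15.0],
              [np.inf, 25.0]])
rho = performance_profiles(T, taus=[1.0, 2.0])
# rho[s, 0] is the fraction of problems on which solver s is best.
```

Since a failed run has an infinite ratio, it never counts toward $\rho_s(\tau)$ for any finite $\tau$, which matches the convention of assigning failures the cap $r_M$.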

#### 4. Conclusion

We proposed a projected structured BFGS scheme for approximating the projected structured Hessian matrix in an exact penalty method for solving constrained nonlinear least squares problems. We established the local two-step superlinear convergence of the proposed algorithm. Comparative numerical results using the performance profile of Dolan and Moré showed the efficiency and robustness of the algorithm.

#### Acknowledgements

The authors thank the Iran National Science Foundation (INSF) for supporting this work under grant no. 88000800.

#### References

Bartels, R. H., Mahdavi-Amiri, N. (1986). On generating test problems for nonlinear programming algorithms. SIAM Journal on Scientific and Statistical Computing, 7(3), pp. 769–798.

Bidabadi, N., Mahdavi-Amiri, N. (2012). A two-step superlinearly convergent projected structured BFGS method for constrained nonlinear least squares. To appear in Optimization.

Coleman, T. F., Conn, A. R. (1982a). Nonlinear programming via an exact penalty function: Asymptotic analysis. Mathematical Programming, 24, pp. 123–136.

Coleman, T. F., Conn, A. R. (1982b). Nonlinear programming via an exact penalty function: Global analysis. Mathematical Programming, 24, pp. 137–161.

Dennis, J. E., Martinez, H. J., Tapia, R. A. (1989). Convergence theory for the structured BFGS secant method with an application to nonlinear least squares. Journal of Optimization Theory and Applications, 61, pp. 161–178.

Dolan, E. D., Moré, J. J. (2002). Benchmarking optimization software with performance profiles. Mathematical Programming, 91, pp. 201–213.

Fletcher, R., Xu, C. (1987). Hybrid methods for nonlinear least squares. IMA Journal of Numerical Analysis, 7, pp. 371–389.

Golub, G., Pereyra, V. (2003). Separable nonlinear least squares: the variable projection method and its applications, topical review. Inverse Problems, 19, pp. R1–R26.

Li, Z. F., Osborne, M. R., Prvan, T. (2002). Adaptive algorithm for constrained least-squares problems. Journal of Optimization Theory and Applications, 114(2), pp. 423–441.

Mahdavi-Amiri, N., Ansari, M. R. (2012a). Superlinearly convergent exact penalty projected structured Hessian updating schemes for constrained nonlinear least squares: asymptotic analysis. To appear in Bulletin of the Iranian Mathematical Society.

Mahdavi-Amiri, N., Ansari, M. R. (2012b). Superlinearly convergent exact penalty projected structured Hessian updating schemes for constrained nonlinear least squares: global analysis. To appear in Optimization.

Mahdavi-Amiri, N., Ansari, M. R. (2012c). A superlinearly convergent penalty method with nonsmooth line search for constrained nonlinear least squares. SQU Journal for Science, 17(1), pp. 103–124.

Mahdavi-Amiri, N., Bartels, R. H. (1989). Constrained nonlinear least squares: An exact penalty approach with projected structured quasi-Newton updates. ACM Transactions on Mathematical Software, 15(3), pp. 220–242.

Mullen, K. M., Vengris, M., van Stokkum, I. H. M. (2007). Algorithms for separable nonlinear least squares with application to modeling time-resolved spectra. Journal of Global Optimization, 38, pp. 201–213.

Nievergelt, Y. (2000). A tutorial history of least squares with applications to astronomy and geodesy. Journal of Computational and Applied Mathematics, 121, pp. 37–72.

Nocedal, J., Overton, M. L. (1985). Projected Hessian updating algorithms for nonlinearly constrained optimization. SIAM Journal on Numerical Analysis, 22(5), pp. 821–850.