Volume 1, Issue 1, Year 2012 - Pages 1-8
DOI: 10.11159/ijecs.2012.001
Constrained Nonlinear Least Squares: A Superlinearly Convergent Projected Structured Secant Method
Nezam Mahdavi-Amiri, Narges Bidabadi
Sharif University of Technology, Faculty of Mathematical Sciences
Azadi Street, Tehran, Iran
nezamm@sina.sharif.edu, n_bidabadi@mehr.sharif.ir
Abstract - The numerical solution of nonlinear least squares problems is an important computational task in science and engineering, and effective algorithms have been developed for it. Structured secant methods are a class of efficient methods, developed in recent years, for optimization problems in which the Hessian of the objective function has a special structure; a primary and typical application is the solution of nonlinear least squares problems. We present an exact penalty method for solving constrained nonlinear least squares problems in which the structured projected Hessian is approximated by a projected version of the structured BFGS formula, and we establish its local two-step Q-superlinear convergence. For robustness, we employ a special nonsmooth line search strategy that takes account of the least squares objective. We discuss comparative results of testing our programs and three nonlinear programming codes from KNITRO on randomly generated test problems due to Bartels and Mahdavi-Amiri. The numerical results also confirm the practical relevance of our special treatment of the inherent structure of the least squares objective.
Keywords: Constrained Nonlinear Least Squares, Exact Penalty Methods, Projected Hessian Updating, Structured Secant Updates, Two-Step Superlinear Convergence
© Copyright 2012 Authors - This is an Open Access article published under the Creative Commons Attribution License terms. Unrestricted use, distribution, and reproduction in any medium are permitted, provided the original work is properly cited.
1. Introduction
Consider the constrained nonlinear least squares (CNLLS) problem,
where, , , , , and, , are functions from to, all assumed to be twice continuously differentiable. These problems arise as experimental data analysis problems in various areas of science and engineering such as electrical engineering, medical and biological imaging, chemistry, robotics, vision, and environmental sciences; e.g., see Nievergelt (2000), Golub and Pereyra (2003) and Mullen et al. (2007). The gradient and Hessian of can be expressed as
and
where, is the matrix whose columns are the gradients, and
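To make this structure explicit in a standard notation that we assume here (our symbols; the paper may scale the objective differently), write the least squares objective as $f(x) = \tfrac{1}{2}\lVert F(x)\rVert^{2}$ with residual vector $F:\mathbb{R}^{n}\to\mathbb{R}^{m}$ and Jacobian $J(x)$; then
$$\nabla f(x) = J(x)^{\top} F(x), \qquad \nabla^{2} f(x) = J(x)^{\top} J(x) + S(x), \qquad S(x) = \sum_{i=1}^{m} F_{i}(x)\,\nabla^{2} F_{i}(x),$$
so the Hessian splits into a first-order term available from the residual gradients and a second-order term $S(x)$ that structured secant methods approximate.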
Using this special structure for approximating the Hessian matrix has been the subject of numerous research papers; see Fletcher and Xu (1987), Dennis et al. (1989), Mahdavi-Amiri and Bartels (1989), Li et al. (2002), Mahdavi-Amiri and Ansari (2012a), and Bidabadi and Mahdavi-Amiri (2012). Here, we consider the structured BFGS approximate formula (Dennis et al., 1989) for the structured least squares projected Hessian and a special line search scheme given by Mahdavi-Amiri and Ansari (2012c) leading to a more efficient algorithm than general nonlinear programming schemes. An exact penalty function for nonlinear programming problems is defined to be
where, is a penalty parameter. If is a stationary point of (1) and the gradients of the active constraints at are linearly independent, then there exists a real number such that is also a stationary point of, for each. Mahdavi-Amiri and Bartels (1989), based on a general approach for nonlinear problems (see Coleman and Conn, 1982a), proposed a special structured algorithm for minimizing with a fixed value of when solving CNLLS problems. Let be a small positive number used to identify the near-active (-active) constraint set. The algorithm obtains search directions by using an -active set
and a corresponding merit function:
where,
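To fix ideas, one common form of such a penalty and merit function in this family of methods, written in our own notation (the paper's convention for where the penalty parameter enters may differ), is
$$P_{\mu}(x) = f(x) + \mu\Big(\sum_{i\in E}\lvert h_{i}(x)\rvert + \sum_{i\in I}\max\{0,\,-g_{i}(x)\}\Big),$$
for equality constraints $h_{i}(x)=0$, $i\in E$, and inequality constraints $g_{i}(x)\ge 0$, $i\in I$. The $\varepsilon$-active set then collects the constraints whose values lie within $\varepsilon$ of zero, and the merit function is obtained from $P_{\mu}$ by omitting the $\varepsilon$-active terms, whose violation is controlled separately through the vertical step of the algorithm.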
Step lengths are chosen by using, which is with. The optimality conditions are checked by using, as well. The gradient and Hessian of are:
and
where,
It is well known that the necessary conditions for to be an isolated local minimizer for, under the assumptions made above on and the, are that there exist multipliers, , for, such that
A point for which only (12) above is satisfied is called a stationary point of. A minimizer must be a stationary point satisfying (13). One major premise of the algorithm is that the multipliers are only worth estimating in the neighborhoods of stationary points. Nearness to a stationary point is governed by a stationary tolerance. The algorithm is considered to be in a local state if the norm of the projected (reduced) gradient is smaller than this tolerance, and it is in a global state otherwise. Fundamental to the approach is the following quadratic problem:
where,
the are the Lagrange multipliers associated with (14) in a local state (in the proximity of a stationary point) and the are taken to be zero when the algorithm is in a global state (far from a stationary point). In practice, the QR decomposition of is used to solve the quadratic problem:
If we set , for some , then is to be found by solving
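As a minimal sketch of this computation (our own code and names, not the authors' implementation), the QR decomposition of the matrix of $\varepsilon$-active constraint gradients yields an orthonormal basis of its null space, and the horizontal direction is recovered from a reduced linear system involving the projected Hessian approximation:

import numpy as np

def horizontal_step(A, grad, B_reduced):
    """Reduced (null-space) step: illustrative sketch only.

    A          -- n x t matrix whose columns are the epsilon-active constraint gradients
    grad       -- gradient of the merit function at the current point (length n)
    B_reduced  -- (n - t) x (n - t) approximation of the projected (reduced) Hessian
    Returns the horizontal direction lying in the null space of A^T.
    """
    n, t = A.shape
    Q, _ = np.linalg.qr(A, mode="complete")  # full QR; Q is n x n orthogonal
    Z = Q[:, t:]                             # columns of Z span the null space of A^T
    reduced_grad = Z.T @ grad                # projected (reduced) gradient
    v = np.linalg.solve(B_reduced, -reduced_grad)
    return Z @ v                             # horizontal search direction

The vertical direction of Step 4 below would similarly use the complementary block Q[:, :t] together with the triangular factor to move toward feasibility of the active constraints.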
Therefore, for solving the quadratic problem (14), we need an approximation for the projected or reduced Hessian . We are to present a projected structured BFGS update formula for computing an approximation by providing a quasi-Newton approximation, where,
with the setting . We give the asymptotic convergence results of exact penalty methods using this projected structured BFGS updating scheme. Consider the asymptotic case, that is, the case that the final active set has been identified, so that for all further , with designating the optimal point, we have
Suppose that
and we want to update to , approximating
Note that, where is the number of active constraints. We assume rank () =, for all. Letting and , we have , where, in order to simplify notation, the presence of a bar above a quantity indicates that it is taken at iteration , and the absence of a bar indicates iteration . If the constraints are linear, then, for. In the nonlinear case, asymptotically it is expected that be negligible and thus we obtain (Mahdavi-Amiri and Bartels, 1989):
Thus, we use the quasi-Newton formula to update to according to the secant equation, when has actually become negligible, that is, when
for, where is the iteration number and denotes the Euclidean norm, as suggested by Nocedal and Overton (1985) for general constrained nonlinear programs. Therefore, an approximation to is
where,
A general framework for the exact penalty algorithm to minimize (5) is given below (see Mahdavi-Amiri and Bartels, 1989).
Algorithm 1: An Exact Penalty Method.
Step 0: Give an initial point and .
Step 1: Determine, ,, . Identify the matrix by computing the QR decomposition of and set global=true, optimal=false.
Step 2: If , then obtain the search direction that is a solution of the quadratic problem (14), where, approximates the global Hessian , and go to Step 5.
Step 3: Determine the Lagrange multipliers, , as a solution of
If conditions (13) are not satisfied, then choose an index for which one of (13) is violated, and determine the search direction that satisfies the system of equations , where, is the th unit vector, and , and go to Step 5.
Step 4: Set global=false. Determine the direction , where, is the solution to the quadratic problem (14), approximates the local Hessian with the being the Lagrange multipliers associated with (14) as determined in Step 3. The vertical direction , is the solution to the system
where, is the vector of the constraint functions, ordered in accordance with the columns of . Set the step length and go to Step 6.
Step 5: Determine the step length using a line search procedure on .
Step 6: Compute . If a sufficient decrease has been obtained, then set , else go to Step 8.
Step 7: If global=false, then check the optimality conditions for . If is optimal, then set optimal=true and stop, else go to Step 1.
Step 8: If global=true and , then reduce to change , else reduce so that becomes large tested against .
Step 9: If (global=true and ) or is too small or is too small, then report failure and stop, else go to Step 1.
Remarks: In Step 6, we have used an appropriate nonsmooth line search strategy to determine a step length satisfying a sufficient decrease in that is characterized by the line search assumption; see Coleman and Conn (1982b), part (v), p. 152. For a recent line search strategy for CNLLS problems, see Mahdavi-Amiri and Ansari (2012a, 2012c). In Step 7, the optimality conditions are checked as follows: if is small enough, then determine the multipliers , , as the least squares solution of (26). If the conditions (13) are satisfied, then is considered to satisfy the first order conditions and thus to be a stationary point, and the algorithm stops, as is common practice in optimization algorithms. Of course, second order conditions would need to be checked to ascertain optimality of .
In the remainder of our work, we drop the index from and , for simplicity.
For approximating the projected structured Hessians in solving the quadratic problem, we make use of the ideas of Dennis et al. (1989) in a different context. We consider the structured BFGS update of , given by
where,
Remark: Clearly, , given by (32), satisfies the secant equation .
Here, we use the BFGS secant updates for approximating the projected structured Hessians in solving the constrained nonlinear least squares problem.
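The following sketch shows one standard realization of a structured BFGS update in the spirit of Dennis et al. (1989); the particular structured secant vector and the function names are our own assumptions, and the paper applies the analogous update to the projected quantities rather than to the full-space matrices shown here:

import numpy as np

def structured_bfgs_update(J, J_plus, F_plus, A, s):
    """One form of the structured BFGS update for least squares Hessians
    (illustrative sketch in assumed notation, not the authors' code).

    J, J_plus -- Jacobians of the residual vector at the current and new points
    F_plus    -- residual vector at the new point
    A         -- current approximation of the second-order term S(x)
    s         -- step taken, s = x_plus - x
    Returns (A_plus, B_plus), where B_plus = J_plus^T J_plus + A_plus.
    """
    G_plus = J_plus.T @ J_plus           # exact Gauss-Newton part at the new point
    y_sharp = (J_plus - J).T @ F_plus    # structured secant vector approximating S(x_plus) s
    y = G_plus @ s + y_sharp             # right-hand side of the structured secant equation
    B_sharp = G_plus + A                 # structure-exploiting intermediate matrix

    Bs = B_sharp @ s
    sBs = s @ Bs
    ys = y @ s
    if ys <= 1e-12 * np.linalg.norm(y) * np.linalg.norm(s) or sBs <= 0.0:
        return A, B_sharp                # skip the update when curvature information is unreliable
    B_plus = B_sharp - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys
    return B_plus - G_plus, B_plus

By construction, the updated matrix satisfies the secant equation B_plus s = y, mirroring the remark above.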
The remainder of our work is organized as follows. In § 2, we give the asymptotic two-step superlinear convergence of the algorithm. Competitive numerical results are reported in § 3. The results are compared with the ones obtained by the three algorithms in the KNITRO software package for solving general nonlinear programs. We conclude in § 4.
2. Local Convergence
Here, we give our local two-step superlinear convergence result. We make the following assumptions.
Assumptions:
(A1) Problem (1) has a local solution. Let.
(A2) , generated by Algorithm 1 for minimizing is such that , .
(A3) The function and , , are twice continuously differentiable on the compact set .
(A4) and are locally Lipschitz continuous at .
(A5) is locally Lipschitz continuous at , that is, there exists a constant such that
(A6) The gradients of the active constraints at , for all , are linearly independent on .
(A7) There exist positive constants and such that
where, is the projected Hessian at .
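As a hedged illustration in our own notation, (A7) can be read as uniform positive definiteness and boundedness of the projected Hessian at the solution: there exist constants $0 < m \le M$ with
$$m\,\lVert z\rVert^{2} \;\le\; z^{\top}\big(Z^{\top}\nabla_{xx}^{2}\mathcal{L}(x^{*})\,Z\big)\,z \;\le\; M\,\lVert z\rVert^{2} \quad \text{for all } z,$$
where the columns of $Z$ span the null space of the active constraint gradients at $x^{*}$.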
We will make use of the following two theorems from Mahdavi-Amiri and Ansari (2012a) and Nocedal and Overton (1985).
Theorem 1. (Mahdavi-Amiri and Ansari, 2012a) There exists such that if and , then
where, , is a constant independent of , and if equation (8) holds, then
where,
and .
Theorem 2. (Nocedal and Overton, 1985) Suppose that Algorithm 1 is applied with any update rule for approximating the projected Hessian matrix. If ,, for all , and
then , at a two-step Q-superlinear rate.
We now give the so-called bounded deterioration inequality for the structured BFGS update formula.
Theorem 3. Suppose that the inequality (23) and Assumptions (A5) and (A7) hold. Let be the projected structured BFGS secant update formula, that is, where, is the BFGS update formula of obtained by using (28). Then, there exist a positive constant and a neighborhood such that
for all.
Now, we give a linear convergence result.
Theorem 4. Suppose that Assumptions (A1)-(A7) hold. Let be the structured BFGS secant update formula, that is, where, is the BFGS update formula of obtained by using (28). Let the sequence be generated by Algorithm 1. For any , there exist positive constants and such that if and , then
that is, converges to at least at a two-step Q-linear rate.
The next theorem shows the satisfaction of (34) in Theorem 2 for the structured BFGS update formula.
Theorem 5. If exists so that for every iteration , we have
and we update the using the structured BFGS secant formula for each , that is, , where, is the BFGS update formula of obtained by using (28), then
In §1, we pointed out that we use the structured quasi-Newton update formula for approximating the projected Hessian in Algorithm 1, if the inequality (23) holds (this is expected to happen when the algorithm is in its local phase with the iterate being close to a stationary point). Now, we give the superlinear convergence result for Algorithm 1.
Theorem 6. Suppose that Assumptions (A1)-(A7) hold. Let the sequence be generated by Algorithm 1 and be obtained by
where, is the BFGS update formula of obtained by using (28). Then, converges to at a two-step superlinear rate, that is,
3. Numerical Experiments
We coded our algorithm in MATLAB 7.6.0. In the global steps, a line search strategy is necessary. In our implementation, for the line search strategy, we used the approach specially designed for nonlinear least squares given by Mahdavi-Amiri and Ansari (2012a, b, c). We put in (23), as suggested by Nocedal and Overton (1985) and set and . The initial matrix is set to be the identity matrix. For robustness, we followed the computational considerations provided by Mahdavi-Amiri and Bartels (1989). We tested our algorithm on 35 randomly generated test problems using the test problem generation scheme given by Bartels and Mahdavi-Amiri (1986).
The simple test problem generating scheme of Bartels and Mahdavi-Amiri (1986) is described here. The following parameters are set for the least squares problem:
- : number of variables;
- : number of components in F(x);
- : number of equality constraints;
- : number of inequality constraints;
- : number of active inequality constraints.
The random values for the components of an optimizer and the corresponding Lagrange multipliers, the , of the active equality constraints are set between -1 and 1, while the Lagrange multipliers of the active inequality constraints are chosen between 0 and 1 (the Lagrange multipliers of the inactive constraints are set to zero). Then, the Lagrangian Hessian matrix at is determined as follows:
where, is an upper triangular matrix. Except for , which is set to 2, all the components of are random values between -1 and 1. is also an upper triangular matrix with random components set between -1 and 1. The scalar is under user control, and it dictates whether the Hessian of the Lagrangian of the generated problem is positive definite () or indefinite ().
Next, for generating the random least squares problem, we consider ,where,
The values of and are determined such that and , to satisfy the sufficient conditions for local optimality of . The constraints of the problem are set as follows:
where, the constants are set so that for the active constraints and for the inactive constraints. For setting the, the gradient of at is set to the th column of the identity matrix, so that linear independence of the active constraint gradients is guaranteed.
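A minimal illustrative sketch of the constraint part of such a generator (our own simplification, assuming linear constraints with identity-column gradients; the generator of Bartels and Mahdavi-Amiri (1986) is more general) could look as follows:

import numpy as np

def random_constraint_data(n, n_eq, n_ineq, n_active_ineq, seed=0):
    """Illustrative sketch only: draws a random optimizer and multipliers as described
    above and builds linear constraints whose gradients are distinct identity columns,
    so the active constraint gradients are linearly independent (requires n_eq + n_ineq <= n)."""
    rng = np.random.default_rng(seed)
    x_star = rng.uniform(-1.0, 1.0, size=n)                # random optimizer
    lam_eq = rng.uniform(-1.0, 1.0, size=n_eq)             # multipliers of the equality constraints
    lam_ineq = np.zeros(n_ineq)
    lam_ineq[:n_active_ineq] = rng.uniform(0.0, 1.0, size=n_active_ineq)  # active inequalities only

    def constraints(x):
        c_eq = x[:n_eq] - x_star[:n_eq]                            # zero at x_star
        c_ineq = x[n_eq:n_eq + n_ineq] - x_star[n_eq:n_eq + n_ineq]
        c_ineq = c_ineq + np.concatenate([np.zeros(n_active_ineq),
                                          np.ones(n_ineq - n_active_ineq)])  # inactive ones strictly positive at x_star
        return c_eq, c_ineq

    return x_star, lam_eq, lam_ineq, constraints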
The parameters of these random problems are reported in Table 1. All random numbers needed for the random problems were generated by the function "rand" in MATLAB. We generated 35 random problems, composed of 5 problems in each one of the categories numbered in Table 1 along with the parameter settings. Within each of the generated problem sets 1-3, 4-5, and 6-7, all quantities were exactly the same, and the problems differed only in the value of the scalar controlling the definiteness of the Hessian of the Lagrangian. A variety of problems with different numbers of variables and constraints were used. We generated problems having not only positive definite but also indefinite Hessians of the Lagrangian (with positive definite projected Hessians).
Table 1. The parameters of random problems.
Problem Number | Variables | Components of F(x) | Equality Constraints | Inequality Constraints | Active Inequality Constraints | Definiteness Scalar
1 | 5 | 5 | 2 | 3 | 2 | 1
2 | 5 | 5 | 2 | 3 | 2 | -1
3 | 5 | 5 | 2 | 3 | 2 | -10
4 | 10 | 10 | 5 | 5 | 2 | 1
5 | 10 | 10 | 5 | 5 | 2 | -1
6 | 20 | 20 | 8 | 12 | 2 | 1
7 | 20 | 20 | 8 | 12 | 2 | -1
For comparison, we solved these random problems with the three algorithms in KNITRO 6.0 (Interior-point/Direct, Interior-point/CG, and Active-set). To keep the three KNITRO algorithms in line with our computational setting, we set the parameters 'GradObj' and 'GradConstr' to 'on', so that exact gradients are used, and left the other parameters at their default values (this way, the BFGS updating rule was used for the Hessian approximations).
For our comparisons, we explain the notion of a performance profile (Dolan and Moré, 2002) as a means to evaluate and compare the performance of a set of solvers on a test set . We assume that we have solvers and problems. We are interested in using the number of function evaluations as a performance measure. Suppose is the number of function evaluations required to solve problem by algorithm. We compare the performance of solver on problem with the best performance by any solver on this problem; that is, we use the performance ratio
We assume that a parameter is chosen so that, for all and, and we put if and only if algorithm does not solve problem. If we define
then is the probability for solver that a performance ratio is within a factor of the best possible ratio. The function is the distribution function for the performance ratio. The value of is the probability that the solver will win over the rest of the solvers (Dolan and Moré, 2002). Figure 1 shows the performance profiles of the five solvers. The most significant aspect of Figure 1 is that on this test set our algorithm outperforms all the other solvers: its performance profile lies above all the others for all performance ratios. According to Figure 1, our algorithm has proved to be substantially more efficient more often than the three programs in KNITRO.
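The following short sketch (our own code, not the authors') computes such a profile from a matrix of function evaluation counts, with np.inf marking a failure of a solver on a problem:

import numpy as np

def performance_profile(T, taus):
    """T[p, s] is the number of function evaluations solver s needed on problem p
    (np.inf if it failed). Returns rho[s, k], the fraction of problems on which
    the performance ratio of solver s is at most taus[k]."""
    best = np.min(T, axis=1, keepdims=True)      # best performance on each problem
    ratios = T / best                            # performance ratios r_{p,s}
    return np.array([[np.mean(ratios[:, s] <= tau) for tau in taus]
                     for s in range(T.shape[1])])

# Example with 3 problems and 2 solvers:
T = np.array([[10.0, 12.0],
              [8.0, np.inf],
              [20.0, 15.0]])
print(performance_profile(T, taus=[1.0, 1.5, 2.0]))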
4. Conclusion
We proposed a projected structured BFGS scheme for approximating the projected structured Hessian matrix in an exact penalty method for solving constrained nonlinear least squares problems. We established the local two-step superlinear convergence of the proposed algorithm. Comparative numerical results using the performance profile of Dolan and Moré showed the efficiency and robustness of the algorithm.
Acknowledgements
The authors thank the Iran National Science Foundation (INSF) for supporting this work under grant no. 88000800.
References
Bartels, R. H., Mahdavi-Amiri, N. (1986). On generating test problems for nonlinear programming algorithms. SIAM Journal on Scientific and Statistical Computing, 7(3), pp. 769–798.
Bidabadi, N., Mahdavi-Amiri, N. (2012). A two-step superlinearly convergent projected structured BFGS method for constrained nonlinear least squares. To appear in Optimization.
Coleman, T. F., Conn, A. R. (1982a). Nonlinear programming via an exact penalty function: Asymptotic analysis. Mathematical Programming, 24, pp. 123–136.
Coleman, T. F., Conn, A. R. (1982b). Nonlinear programming via an exact penalty function: Global analysis. Mathematical Programming, 24, pp. 137–161.
Dennis, J. E., Martinez, H. J., Tapia, R. A. (1989). Convergence theory for the structured BFGS secant method with an application to nonlinear least squares. Journal of Optimization Theory and Applications, 61, pp. 161–178.
Dolan, E. D., Moré, J. J. (2002). Benchmarking optimization software with performance profiles. Mathematical Programming, 91, pp. 201–213.
Fletcher, R., Xu, C. (1987). Hybrid methods for nonlinear least squares. IMA Journal of Numerical Analysis, 7, pp. 371–389.
Golub, G., Pereyra, V. (2003). Separable nonlinear least squares: the variable projection method and its applications, topical review. Inverse Problems, 19, pp. R1–R26.
Li, Z. F., Osborne, M. R., Prvan, T. (2002). Adaptive algorithm for constrained least-squares problems. Journal of Optimization Theory and Applications, 114(2), pp. 423–441.
Mahdavi-Amiri, N., Ansari, M. R. (2012a). Superlinearly convergent exact penalty projected structured Hessian updating schemes for constrained nonlinear least squares: asymptotic analysis. To appear in Bulletin of the Iranian Mathematical Society.
Mahdavi-Amiri, N., Ansari, M. R. (2012b). Superlinearly convergent exact penalty projected structured Hessian updating schemes for constrained nonlinear least squares: global analysis. To appear in Optimization.
Mahdavi-Amiri, N., Ansari, M. R. (2012c). A superlinearly convergent penalty method with nonsmooth line search for constrained nonlinear least squares. SQU Journal for Science, 17(1), pp. 103–124.
Mahdavi-Amiri, N., Bartels, R. H. (1989). Constrained nonlinear least squares: An exact penalty approach with projected structured quasi-Newton updates. ACM Transactions on Mathematical Software, 15(3), pp. 220–242.
Mullen, K. M., Vengris, M., van Stokkum, I. H. M. (2007). Algorithms for separable nonlinear least squares with application to modeling time-resolved spectra. Journal of Global Optimization, 38, pp. 201–213.
Nievergelt, Y. (2000). A tutorial history of least squares with applications to astronomy and geodesy. Journal of Computational and Applied Mathematics, 121, pp. 37–72.
Nocedal, J., Overton, M. L. (1985). Projected Hessian updating algorithms for nonlinearly constrained optimization. SIAM Journal on Numerical Analysis, 22(5), pp. 821–850.