Volume 1, Issue 1, Year 2012 - Pages 1-8
DOI: 10.11159/ijecs.2012.001

Constrained Nonlinear Least Squares: A Superlinearly Convergent Projected Structured Secant Method

Nezam Mahdavi-Amiri, Narges Bidabadi

Sharif University of Technology, Faculty of Mathematical Sciences
Azadi Street, Tehran, Iran
nezamm@sina.sharif.edu, n_bidabadi@mehr.sharif.ir

Abstract - The numerical solution of nonlinear least squares problems is an important computational task in science and engineering, and effective algorithms have been developed for it. Structured secant methods form a class of efficient methods, developed in recent years, for optimization problems in which the Hessian of the objective function has a special structure; a primary and typical application is the solution of nonlinear least squares problems. We present an exact penalty method for solving constrained nonlinear least squares problems, in which the structured projected Hessian is approximated by a projected version of the structured BFGS formula, and establish its local two-step Q-superlinear convergence. For robustness, we employ a special nonsmooth line search strategy that takes account of the least squares objective. We discuss comparative results of testing our programs and three nonlinear programming codes from KNITRO on randomly generated test problems due to Bartels and Mahdavi-Amiri. The numerical results also confirm the practical relevance of our special treatment of the inherent structure of the least squares objective.

Keywords: Constrained Nonlinear Least Squares, Exact Penalty Methods, Projected Hessian Updating, Structured Secant Updates, Two-Step Superlinear Convergence

© Copyright 2012 Authors - This is an Open Access article published under the Creative Commons Attribution License terms. Unrestricted use, distribution, and reproduction in any medium are permitted, provided the original work is properly cited.

1. Introduction

Consider the constrained nonlinear least squares (CNLLS) problem,

$$\min_{x \in \mathbb{R}^n}\; f(x) = \tfrac{1}{2}\|F(x)\|_2^2 = \tfrac{1}{2}\sum_{i=1}^{m} F_i(x)^2 \quad \text{subject to} \quad g_i(x) = 0,\ i \in E, \qquad g_i(x) \ge 0,\ i \in I, \tag{1}$$

where $F_1, \dots, F_m$ and the $g_i$, $i \in E \cup I$, are functions from $\mathbb{R}^n$ to $\mathbb{R}$, all assumed to be twice continuously differentiable, and $E$ and $I$ index the equality and inequality constraints, respectively. These problems arise as experimental data analysis problems in various areas of science and engineering such as electrical engineering, medical and biological imaging, chemistry, robotics, vision, and environmental sciences; e.g., see Nievergelt (2000), Golub and Pereyra (2003) and Mullen et al. (2007). The gradient and Hessian of $f$ can be expressed as

$$\nabla f(x) = J(x) F(x), \tag{2}$$

$$\nabla^2 f(x) = J(x) J(x)^T + S(x), \tag{3}$$

where $J(x)$ is the matrix whose columns are the gradients $\nabla F_i(x)$, $i = 1, \dots, m$, and

$$S(x) = \sum_{i=1}^{m} F_i(x)\, \nabla^2 F_i(x). \tag{4}$$
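To make this structure concrete, the following sketch (with a hypothetical two-component residual of our own; not an example from the paper) evaluates the Gauss-Newton term and the second-order term, and checks the gradient against central differences:

```python
import numpy as np

# Hypothetical residual F: R^2 -> R^2 (our own illustration, not from the paper)
def F(x):
    return np.array([x[0]**2 + x[1] - 1.0, x[0] - x[1]**2])

def Jac(x):
    # Jacobian of F: row i is the gradient of F_i transposed
    return np.array([[2.0 * x[0], 1.0],
                     [1.0, -2.0 * x[1]]])

def hess_Fi(x, i):
    # Hessians of the individual residuals
    return (np.array([[2.0, 0.0], [0.0, 0.0]]) if i == 0
            else np.array([[0.0, 0.0], [0.0, -2.0]]))

def grad_f(x):
    # grad f = J^T F for f = 0.5 * ||F||^2 (Jacobian-row convention)
    return Jac(x).T @ F(x)

def hess_f(x):
    # Hess f = J^T J + S,  with  S = sum_i F_i * Hess F_i
    J = Jac(x)
    S = sum(F(x)[i] * hess_Fi(x, i) for i in range(2))
    return J.T @ J + S

x = np.array([0.5, 0.3])
f = lambda z: 0.5 * np.linalg.norm(F(z)) ** 2
h = 1e-6
fd_grad = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])
print(np.allclose(grad_f(x), fd_grad, atol=1e-5))   # True
```

Dropping S(x) gives the familiar Gauss-Newton approximation; structured methods instead keep the exact first term and approximate only S(x).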
Using this special structure for approximating the Hessian matrix has been the subject of numerous research papers; see Fletcher and Xu (1987), Dennis et al. (1989), Mahdavi-Amiri and Bartels (1989), Li et al. (2002), Mahdavi-Amiri and Ansari (2012a), and Bidabadi and Mahdavi-Amiri (2012). Here, we consider the structured BFGS approximation formula (Dennis et al., 1989) for the structured least squares projected Hessian, together with a special line search scheme given by Mahdavi-Amiri and Ansari (2012c), leading to a more efficient algorithm than general nonlinear programming schemes. An exact penalty function for nonlinear programming problems is defined to be

$$P_\mu(x) = \mu f(x) + \sum_{i \in E} |g_i(x)| - \sum_{i \in I} \min(g_i(x), 0), \tag{5}$$

where $\mu > 0$ is a penalty parameter. If $x^*$ is a stationary point of (1) and the gradients of the active constraints at $x^*$ are linearly independent, then there exists a real number $\bar{\mu} > 0$ such that $x^*$ is also a stationary point of $P_\mu$, for each $\mu \in (0, \bar{\mu}]$. Mahdavi-Amiri and Bartels (1989), based on a general approach for nonlinear problems (see Coleman and Conn, 1982a), proposed a special structured algorithm for minimization of $P_\mu$ with a fixed value of $\mu$ in solving the CNLLS problems. Let $\varepsilon$ be a small positive number used to identify the near-active ($\varepsilon$-active) constraint set. The algorithm obtains search directions by using the $\varepsilon$-active set

$$A_\varepsilon(x) = \{\, i \in E : |g_i(x)| \le \varepsilon \,\} \cup \{\, i \in I : g_i(x) \le \varepsilon \,\}$$

and a corresponding merit function:




Step lengths are chosen by using, which is  with. The optimality conditions are checked by using, as well. The gradient and Hessian of  are:






It is well known that the necessary conditions for  to be an isolated local minimizer for, under the assumptions made above on  and the, are that there exist multipliers, , for, such that


A point for which only (12) above is satisfied is called a stationary point. A minimizer must be a stationary point satisfying (13). One major premise of the algorithm is that the multipliers are only worth estimating in the neighborhoods of stationary points. Nearness to a stationary point is governed by a stationarity tolerance: the algorithm is considered to be in a local state if the norm of the projected (reduced) gradient is smaller than this tolerance, and in a global state otherwise. Fundamental to the approach is the following quadratic problem:

$$\min_{d}\; \tfrac{1}{2}\, d^T B\, d + \nabla \phi(x)^T d \quad \text{subject to} \quad \nabla g_i(x)^T d = 0,\ \ i \in A_\varepsilon(x), \tag{14}$$

where $\phi$ denotes the merit function above and $B$ approximates its Hessian;
the  are the Lagrange multipliers associated with (14) in a local state (in the proximity of a stationary point) and the  are taken to be zero when the algorithm is in a global state (far from a stationary point). In practice, the QR decomposition of  is used to solve the quadratic problem:


If we set , for some , then  is to be found by solving


Therefore, for solving the quadratic problem (14), we need an approximation for the projected (reduced) Hessian . We present a projected structured BFGS update formula for computing an approximation  by providing a quasi-Newton approximation, where,


with the setting . We give the asymptotic convergence results of exact penalty methods using this projected structured BFGS updating scheme. Consider the asymptotic case, that is, the case that the final active set has been identified, so that for all further , with  designating the optimal point, we have
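The null-space mechanics behind solving the quadratic problem via a QR decomposition can be sketched as follows (a generic reduced-Hessian solve, assuming the reduced Hessian is positive definite; variable names are ours, not the paper's):

```python
import numpy as np

def nullspace_step(A_active, B, g):
    # Reduced (null-space) step for:  min 0.5 d^T B d + g^T d
    # subject to A_active^T d = 0, where A_active is the n x t matrix whose
    # columns are the epsilon-active constraint gradients and B is an n x n
    # Hessian approximation with Z^T B Z assumed positive definite.
    n, t = A_active.shape
    Q, _ = np.linalg.qr(A_active, mode='complete')
    Z = Q[:, t:]                       # orthonormal basis for the null space
    B_red = Z.T @ B @ Z                # projected (reduced) Hessian
    v = np.linalg.solve(B_red, -Z.T @ g)
    return Z @ v

# One active constraint gradient e_1 in R^3: the step must stay in its null space.
A = np.array([[1.0], [0.0], [0.0]])
B = 2.0 * np.eye(3)
g = np.array([1.0, 2.0, 3.0])
d = nullspace_step(A, B, g)
print(np.allclose(A.T @ d, 0.0))   # True: step is tangent to the active constraints
```

Only the reduced matrix Z^T B Z enters the solve, which is why the algorithm needs an approximation of the projected Hessian rather than of the full Hessian.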


Suppose that


and we want to update  to , approximating


Note that, where  is the number of active constraints. We assume rank () =, for all. Letting  and , we have , where, in order to simplify notation, the presence of a bar above a quantity indicates that it is taken at iteration , and the absence of a bar indicates iteration . If the constraints are linear, then, for. In the nonlinear case, asymptotically it is expected that  be negligible and thus we obtain (Mahdavi-Amiri and Bartels, 1989):


Thus, we use the quasi-Newton formula to update  to  according to the secant equation, when  has actually become negligible, that is, when


for, where  is the iteration number and  denotes the Euclidean norm, as suggested by Nocedal and Overton (1985) for general constrained nonlinear programs. Therefore, an approximation  to  is




A general framework for an exact penalty algorithm to minimize (5) is given below (see Mahdavi-Amiri and Bartels, 1989).

Algorithm 1: An Exact Penalty Method.

Step 0: Give an initial point  and .

Step 1: Determine, ,, . Identify the matrix  by computing the QR decomposition of  and set global=true, optimal=false.

Step 2: If , then obtain the search direction  that is a solution of the quadratic problem (14), where,  approximates the global Hessian , and go to Step 5.

Step 3: Determine the Lagrange multipliers, , as a solution of


If conditions (13) are not satisfied, then choose an index  for which one of (13) is violated, and determine the search direction  that satisfies the system of equations , where,  is the th unit vector, and , and go to Step 5.

Step 4: Set global=false. Determine the direction , where,  is the solution to the quadratic problem (14),  approximates the local Hessian  with the  being the Lagrange multipliers associated with (14) as determined in Step 3. The vertical direction ,  is the solution to the system


where,  is the vector of the constraint functions, ordered in accordance with the columns of . Set the step length  and go to Step 6.

Step 5: Determine step length  using a line search procedure on .

Step 6: Compute . If a sufficient decrease has been obtained, then set , else go to Step 8.

Step 7: If global=false, then check the optimality conditions for . If  is optimal, then set optimal=true and stop, else go to Step 1.

Step 8: If global=true and , then reduce  to change , else reduce  so that  becomes large tested against .

Step 9: If (global=true and ) or  is too small or  is too small, then report failure and stop, else go to Step 1.

Remarks: In Step 6, we have used an appropriate nonsmooth line search strategy to determine a step length satisfying a sufficient decrease in  that is characterized by the line search assumption; see Coleman and Conn (1982b), part (v), p. 152. For a recent line search strategy for CNLLS problems, see Mahdavi-Amiri and Ansari (2012a, 2012c). In Step 7, the optimality conditions are checked as follows: if  is small enough, then determine the multipliers , , as the least squares solution of (26). If the conditions (13) are satisfied, then  is considered to satisfy the first order conditions; thus  is taken to be a stationary point and the algorithm stops, as is common practice in optimization algorithms. Of course, second order conditions need to be checked to ascertain optimality of .

 In the remainder of our work, we drop the index  from  and , for simplicity.

For approximating the projected structured Hessians in solving the quadratic problem, we make use of the ideas of Dennis et al. (1989) in a different context. We consider the structured BFGS update of ,  given by




Remark: Clearly, , given by (32), satisfies the secant equation .

Here, we use the BFGS secant updates for approximating the projected structured Hessians in solving the constrained nonlinear least squares problem.
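As a minimal sketch of such a structured update (our own notation; the curvature safeguard is common practice rather than a detail taken from the paper), only the second-order part of the approximation is carried through a BFGS-type formula, while the Gauss-Newton part is recomputed from the new Jacobian factor:

```python
import numpy as np

def bfgs_update(A, s, y):
    # Standard BFGS update of A along the pair (s, y); the update is skipped
    # when the curvature s^T y is not safely positive (a common safeguard,
    # not a detail from the paper).
    sy = s @ y
    if sy <= 1e-12 * np.linalg.norm(s) * np.linalg.norm(y):
        return A
    As = A @ s
    return A - np.outer(As, As) / (s @ As) + np.outer(y, y) / sy

def structured_hessian_update(A, J_new, s, y_hat):
    # Structured approximation B = J J^T + A (J's columns are the residual
    # gradients): the first-order part is recomputed exactly from J_new,
    # while only the second-order part A is updated by the secant formula
    # along a structured difference vector y_hat (cf. Dennis et al., 1989).
    A_new = bfgs_update(A, s, y_hat)
    return J_new @ J_new.T + A_new, A_new

A = np.eye(3)
s = np.array([1.0, 0.0, 0.0])
y_hat = np.array([2.0, 1.0, 0.0])
A_new = bfgs_update(A, s, y_hat)
print(np.allclose(A_new @ s, y_hat))   # True: the secant equation holds
```

The point of the structure is that the exactly known term J J^T is never degraded by the quasi-Newton approximation; only the hard-to-compute second-order term is estimated.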

The remainder of our work is organized as follows. In § 2, we give the asymptotic two-step superlinear convergence of the algorithm. Competitive numerical results are reported in § 3. The results are compared with the ones obtained by the three algorithms in the KNITRO software package for solving general nonlinear programs. We conclude in § 4.

2. Local Convergence

Here, we give our local two-step superlinear convergence result. We make the following assumptions.


(A1) Problem (1) has a local solution. Let.

(A2) , generated by Algorithm 1 for minimizing is so that , .

(A3) The function  and , , are twice continuously differentiable on the compact set .

(A4)  and  are locally Lipschitz continuous at .

(A5)  is locally Lipschitz continuous at , that is, there exists a constant  such that


(A6) The gradients of the active constraints at , for all , are linearly independent on .

(A7) There exist positive constants  and  such that


where,  is the projected Hessian at .

We will make use of the following two theorems from Mahdavi-Amiri and Ansari (2012a) and Nocedal and Overton (1985).

Theorem 1. (Mahdavi-Amiri and Ansari, 2012a) There exists such that if  and , then


where, ,  is a constant independent of , and if equation (8) holds, then




and .

Theorem 2. (Nocedal and Overton, 1985) Suppose that Algorithm 1 is applied with any update rule for approximating the projected Hessian matrix. If ,, for all , and


then , at a two-step Q-superlinear rate.

We now give the so-called bounded deterioration inequality for the structured BFGS update formula.

Theorem 3. Suppose that inequality (23) and Assumptions (A5) and (A7) hold. Let  be the projected structured BFGS secant update formula, that is, where,  is the BFGS update formula of  obtained by using (28). Then, there exist a positive constant  and a neighborhood  such that


for all.

Now, we give a linear convergence result.

Theorem 4. Suppose that Assumptions (A1)-(A7) hold. Let  be the structured BFGS secant update formula, that is,  where,  is the BFGS update formula of  obtained by using (28). Let the sequence  be generated by Algorithm 1. For any , there exist positive constants  and  such that if  and , then


that is,  converges to , at least at a two-step Q-linear rate.

The next theorem shows the satisfaction of (34) in Theorem 2 for the structured BFGS update formula.

Theorem 5. If exists so that for every iteration , we have


and we update the  using the structured BFGS secant formula for each , that is, , where,  is the BFGS update formula of  obtained by using (28), then


In §1, we pointed out that we use the structured quasi-Newton update formula for approximating the projected Hessian in Algorithm 1, if the inequality (23) holds (this is expected to happen when the algorithm is in its local phase with the iterate being close to a stationary point). Now, we give the superlinear convergence result for Algorithm 1.

Theorem 6. Suppose that Assumptions (A1)-(A7) hold. Let the sequence  be generated by Algorithm 1 and  be obtained by


where,  is the BFGS update formula of  obtained by using (28). Then,  converges to  at a two-step superlinear rate, that is,


3. Numerical Experiments

We coded our algorithm in MATLAB 7.6.0. In the global steps, a line search strategy is necessary. In our implementation, for the line search strategy, we used the approach specially designed for nonlinear least squares given by Mahdavi-Amiri and Ansari (2012a, b, c). We put  in (23), as suggested by Nocedal and Overton (1985) and set and . The initial matrix is set to be the identity matrix. For robustness, we followed the computational considerations provided by Mahdavi-Amiri and Bartels (1989). We tested our algorithm on 35 randomly generated test problems using the test problem generation scheme given by Bartels and Mahdavi-Amiri (1986).

A simple test generation scheme from Bartels and Mahdavi-Amiri (1986) is described here. The following parameters are set for the least squares problem:

  • : number of variables;
  • : number of components in F(x);
  • : number of equality constraints;
  • : number of inequality constraints;
  • : number of active inequality constraints.

 The random values for the components of an optimizer  and the corresponding  Lagrange multipliers, the , of the active equality constraints  are set between -1 and 1, while the Lagrange multipliers of the active inequality constraints are chosen between 0 and 1 (the Lagrange multipliers of the inactive constraints are set to zero). Then,  the Lagrangian Hessian matrix at  is determined as follows:


where,  is a  upper triangular matrix. Except for , which is set to 2, all the components of  are random values in the range −1 to 1.  is also an upper triangular matrix with random components in the range −1 to 1. The scalar  is under user control, and it dictates whether the Hessian of the Lagrangian of the generated problem is positive definite () or indefinite ().

Next, for generating the random least squares problem, we consider , where,


The values of  and  are determined such that  and , to satisfy the sufficient conditions for local optimality of . The constraints of the problem are set as follows:


where, the constants  are set so that  for the active constraints and  for the inactive constraints. For setting the , the gradient of  at  is set to the th column of the identity matrix, ensuring linear independence of the active constraints.

The parameters of these random problems are reported in Table 1. All random numbers needed for the random problems were generated by the function ''rand'' in MATLAB. We generated 35 random problems, 5 in each of the categories numbered in Table 1 along with the parameter settings. For the generated problem sets 1-3, 4-5 and 6-7, all quantities were exactly the same within each set, differing only in the value of . A variety of problems with different numbers of variables and constraints were used. We generated problems having not only positive definite but also indefinite Hessians of the Lagrangian (with positive definite projected Hessians).

Table 1. The parameters of random problems.

Problem Number

For comparison, we solved these random problems by the three algorithms in KNITRO 6.0 (Interior-point/Direct, Interior-point/CG and Active-set algorithms). To keep the three KNITRO algorithms in line with our computational setting, we set the parameters 'GradObj' and 'GradConstr' to 'on', so that exact gradients are used, and left the other parameters at their default values (this way, the BFGS updating rule was used for the Hessian approximations).

For our comparisons, we explain the notion of a performance profile (Dolan and Moré, 2002) as a means to evaluate and compare the performance of a set of solvers  on a test set . We assume that we have  solvers and  problems, and use the number of function evaluations as the performance measure. Suppose  is the number of function evaluations required to solve problem  by algorithm . We compare the performance on problem  by solver  with the best performance by any solver on this problem; that is, we use the performance ratio


We assume that a parameter  is chosen so that, for all  and, and we put  if and only if algorithm  does not solve problem. If we define


then  is the probability for solver  that the performance ratio  is within a factor  of the best possible ratio. The function  is the distribution function for the performance ratio. The value of  is the probability that the solver will win over the rest of the solvers (Dolan and Moré, 2002). Figure 1 shows the performance profiles of the five solvers. The most significant aspect of Figure 1 is that, on this test set, our algorithm outperforms all the other solvers: its performance profile lies above all the others for all performance ratios. Thus, our algorithm is substantially more efficient more often than the three programs in KNITRO.
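Computing such a profile is straightforward; the following sketch (array layout and names are ours) follows the Dolan and Moré (2002) definitions:

```python
import numpy as np

def performance_profile(T, taus):
    # T[p, s] holds the cost (e.g. function-evaluation count) of solver s on
    # problem p, with np.inf marking a failure. Returns rho[s, k], the fraction
    # of problems whose performance ratio for solver s is at most taus[k]
    # (Dolan and Moré, 2002).
    best = T.min(axis=1, keepdims=True)   # best solver on each problem
    ratios = T / best                     # the performance ratios r_{p,s}
    return np.array([[np.mean(ratios[:, s] <= tau) for tau in taus]
                     for s in range(T.shape[1])])

# Two solvers on two problems: each solver wins once.
T = np.array([[10.0, 20.0],
              [30.0, 15.0]])
rho = performance_profile(T, taus=[1.0, 2.0])
# rho[s, 0] is each solver's win fraction; both reach 1.0 by tau = 2.
```

A solver's curve at the left edge gives its win probability; the curve's height for large ratios measures robustness (fraction of problems solved at all).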

Figure 1. Performance profile for the algorithms.

4. Conclusion

We proposed a projected structured BFGS scheme for approximating the projected structured Hessian matrix in an exact penalty method for solving constrained nonlinear least squares problems. We established the local two-step superlinear convergence of the proposed algorithm. Comparative numerical results using the performance profile of Dolan and Moré showed the efficiency and robustness of the algorithm.


Acknowledgements

The authors thank the Iran National Science Foundation (INSF) for supporting this work under grant no. 88000800.


References

Bartels, R. H., Mahdavi-Amiri, N. (1986). On generating test problems for nonlinear programming algorithms. SIAM Journal on Scientific and Statistical Computing, 7(3), pp. 769–798.

Bidabadi, N., Mahdavi-Amiri, N. (2012). A two-step superlinearly convergent projected structured BFGS method for constrained nonlinear least squares. To appear in Optimization.

Coleman, T. F., Conn, A. R. (1982a). Nonlinear programming via an exact penalty function: Asymptotic analysis. Mathematical Programming, 24, pp. 123–136.

Coleman, T. F., Conn, A. R. (1982b). Nonlinear programming via an exact penalty function: Global analysis. Mathematical Programming, 24, pp. 137–161.

Dennis, J. E., Martinez, H. J., Tapia, R. A. (1989). Convergence theory for the structured BFGS secant method with an application to nonlinear least squares. Journal of Optimization Theory and Applications, 61, pp. 161–178.

Dolan, E. D., Moré, J. J. (2002). Benchmarking optimization software with performance profiles. Mathematical Programming, 91, pp. 201–213.

Fletcher, R., Xu, C. (1987). Hybrid methods for nonlinear least squares. IMA Journal of Numerical Analysis, 7, pp. 371–389.

Golub, G., Pereyra, V. (2003). Separable nonlinear least squares: the variable projection method and its applications, topical review. Inverse Problems, 19, pp. R1–R26.

Li, Z. F., Osborne, M. R., Prvan, T. (2002). Adaptive algorithm for constrained least-squares problems. Journal of Optimization Theory and Applications, 114(2), pp. 423–441.

Mahdavi-Amiri, N., Ansari, M. R. (2012a). Superlinearly convergent exact penalty projected structured Hessian updating schemes for constrained nonlinear least squares: asymptotic analysis. To appear in Bulletin of the Iranian Mathematical Society.

Mahdavi-Amiri, N., Ansari, M. R. (2012b). Superlinearly convergent exact penalty projected structured Hessian updating schemes for constrained nonlinear least squares: global analysis. To appear in Optimization.

Mahdavi-Amiri, N., Ansari, M. R. (2012c). A superlinearly convergent penalty method with nonsmooth line search for constrained nonlinear least squares. SQU Journal for Science, 17(1), pp. 103–124.

Mahdavi-Amiri, N., Bartels, R. H. (1989). Constrained nonlinear least squares: An exact penalty approach with projected structured quasi-Newton updates. ACM Transactions on Mathematical Software, 15(3), pp. 220–242.

Mullen, K. M., Vengris, M., van Stokkum, I. H. M. (2007). Algorithms for separable nonlinear least squares with application to modeling time-resolved spectra. Journal of Global Optimization, 38, pp. 201–213.

Nievergelt, Y. (2000). A tutorial history of least squares with applications to astronomy and geodesy. Journal of Computational and Applied Mathematics, 121, pp. 37–72.

Nocedal, J., Overton, M. L. (1985). Projected Hessian updating algorithms for nonlinearly constrained optimization. SIAM Journal on Numerical Analysis, 22(5), pp. 821–850.