Centre for Informatics and Applied Optimization, School of Science, Information Technology and Engineering, University of Ballarat, Victoria, Australia
Received date: February 2010, Revised date: September 2011, Accepted date: May 2012
Visit for more related articles at Global Journal of Technology and Optimization
Solving systems of nonlinear equations is a relatively complicated problem for which a number of different approaches have been presented. In this paper, a new algorithm is proposed for solving systems of nonlinear equations. The algorithm combines the gradient method and Newton’s method. A novel dynamic combination is developed to determine the contribution of each method, and this contribution can be adjusted through parameters of the algorithm. We use the gradient method for its global convergence property, and Newton’s method to speed up the convergence rate. We consider two different combinations. In the first, a step length is determined only along the gradient direction. In the second, a step length is determined along a combination of the gradient and Newton directions. The performance of the proposed algorithm is compared with Newton’s method, the gradient method and an existing combination method on several well-known test problems for systems of nonlinear equations. The numerical results provide evidence that the proposed combination algorithm is generally more robust and efficient than the other methods on some important and difficult problems.
Systems of nonlinear equations, Newton’s Method, Gradient method, Line search, Global convergence
The solution of systems of equations has a well-developed mathematical and computational theory for linear systems and for a single nonlinear equation. The situation is much more complicated when the equations in the system do not exhibit nice linear or polynomial properties. In this general case, both the mathematical theory and computational practice are far from a complete understanding of the solution process.
Systems of nonlinear equations arise in various domains of practical importance such as engineering, medicine, chemistry, and robotics [15, 21, 37]. They also appear in many geometric computations such as intersections, minimum distance, creation of centenary curves, and when solving initial or boundary value problems in ordinary or partial differential equations. Nonlinear systems have been applied to load flow calculation in power systems by Spong et al., who compare the results of block Gauss-Seidel iteration with those of Newton-Raphson iteration. Solving such a system means finding all solutions of the equations it contains.
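In this paper, we consider the problem of finding solutions to a system of nonlinear equations of the form
$$F(x) = 0 \tag{1}$$
where $F = (f_1, \dots, f_n)^T : \mathbb{R}^n \to \mathbb{R}^n$ and $x = (x_1, \dots, x_n)^T$ refers to n variables. We denote the i-th component of F by $f_i$, where $f_i : \mathbb{R}^n \to \mathbb{R}$ is a nonlinear function, twice continuously differentiable on a convex set $D \subset \mathbb{R}^n$.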
There is a class of methods for the numerical solution of the system (1) that arises from iterative procedures used for systems of linear equations. These methods use reduction to simpler one-dimensional nonlinear equations for the components $f_1, f_2, \dots, f_n$.
Some iterative methods for solving systems of nonlinear equations are described in the book by Kelley. A wide class of iterative methods for solving systems of nonlinear equations has been suggested in the papers [2, 11, 25, 26].
Most of the methods for solving (1) are optimization-based methods [1, 4, 6, 11, 17, 22, 37]. In one such approach, the system (1) is transformed into a constrained optimization problem: at each step, the equations that are satisfied at the current point are treated as constraints and the remaining ones as objective functions. In a strategy based on optimization methods, at each iteration, a quadratic function is minimized to determine the next feasible point to step to. The quadratic function is the squared norm of the original system.
To find a solution of (1), one can transform the system (1) into an unconstrained optimization problem and then solve this problem with an optimization method. The transformed problem is formulated as
$$\min_{x \in \mathbb{R}^n} f(x) = \|F(x)\|^2 \tag{2}$$
where, here and throughout the paper, $\|\cdot\|$ stands for the Euclidean norm. Obviously, optimal solutions of problem (2) with zero objective value correspond to solutions of system (1).
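As a small illustration of this reformulation, the sketch below builds the merit function for a hypothetical two-equation system (it is not one of the paper's test problems). It assumes f is the plain squared Euclidean norm of F; a constant factor in front of the norm would not change the minimizers. The names `F`, `f`, `root` and `other` are illustrative only.

```python
import math

def F(x):
    """Hypothetical 2x2 system: a circle of radius 2 intersected with the line x0 = x1."""
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]

def f(x):
    """Merit function (assumed form): squared Euclidean norm of the residual F(x)."""
    return sum(r * r for r in F(x))

root = [math.sqrt(2.0), math.sqrt(2.0)]   # a solution of F(x) = 0
other = [1.0, 2.0]                        # not a solution of F(x) = 0

print(f(root))    # zero up to rounding error
print(f(other))   # strictly positive
```

A point solves the system exactly when the merit function vanishes there, which is why a minimizer of (2) with a nonzero objective value is only a local solution of the optimization problem and not a root of the system.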
In recent decades, many publications, addressing both theoretical and especially numerical issues, have been devoted to solving problem (2) [3, 5, 9, 10, 18, 24, 27, 31, 33, 35]. Many search direction methods, such as the gradient method, Newton’s method, the quasi-Newton methods, the conjugate gradient method and coordinate direction methods, have been applied to find a minimizer of (2).
The steepest descent method (or gradient method) is a commonly used method. It has the global convergence property; however, it suffers from slow convergence and is easily trapped in local minima. Many methods have been used to overcome these difficulties. One way is to combine different local optimization methods. Such combinations have been found to reduce significantly the number of iterations and the cost of function evaluations. In recent years, there has been a growing interest in applying these combination methods [7, 29, 30, 36]. Buckley proposed a strategy of using a conjugate gradient search direction for most iterations and periodically using a quasi-Newton step to improve the convergence. This algorithm offers the user the opportunity to specify the amount of available storage. Wang et al. proposed a revised conjugate gradient projection method, that is, a combination of the conjugate projection gradient and the quasi-Newton methods, for nonlinear inequality constrained optimization problems. Recently, Y. Shi proposed a method that combines Newton’s method and the steepest descent method within each iteration for solving nonlinear systems of equations. Further, for unconstrained problems, combinations of the steepest descent method with the Newton and the quasi-Newton methods were developed and compared with some traditional and existing methods.
Our procedure for solving systems of nonlinear equations is based on a combination of local optimization methods. We apply the gradient method and Newton’s method in our combination algorithm. They are combined into an integrated procedure, and the dynamic combination is our main interest and challenge. The combined algorithms proposed in this paper are different from the existing algorithms [7, 29, 30, 36]. In other words, we propose a novel algorithm with a new combination that lets the user specify how much each method contributes.
The rest of the paper is organized as follows: Section 2 gives a brief review of preliminaries about optimization. In Section 3, we review the descent methods. We present the proposed combination algorithm in Section 4. The global convergence property of this algorithm is proved in Section 5. We demonstrate the efficiency of the proposed algorithm with some experiments in Section 6. Section 7 concludes the paper.
Usually, optimization methods are iterative. The basic idea is that, starting from an initial guess of the optimal values of the variables, an optimization method generates a sequence $\{x_k\}$ of improved estimates until it reaches a solution. When $\{x_k\}$ is a finite sequence, the last point is the optimal solution; when $\{x_k\}$ is infinite, it has a limit point which is the optimal solution of the problem. The strategy used to move from one iterate to the next distinguishes one algorithm from another. An algorithm is regarded as acceptable if its iterates move steadily towards the neighborhood of a local minimizer and then converge rapidly to that point. The iteration is terminated when a given convergence rule is satisfied. In general, the most natural stopping criterion is
$$\|g_k\| \le \varepsilon \tag{3}$$
where $g_k$ stands for $\nabla f(x_k)$, f is defined by (2), and $\varepsilon > 0$ is a prescribed error tolerance.
Let $x_k$ be the k-th iterate, $d_k$ the k-th search direction, and $\alpha_k$ the k-th step length; then the k-th iteration is
$$x_{k+1} = x_k + \alpha_k d_k \tag{4}$$
In the trust region strategy, the information gathered about f is used to construct a model function whose behavior near the current point xk is similar to that of the actual objective function f. When x is far from xk the model may not be a good approximation of f. Therefore, the search for a minimizer of the model is restricted to some region around xk.
In the line search strategy, the algorithm chooses a direction dk and searches along this direction from the current iterate xk for a new iterate with a lower function value.
The line search and trust-region approaches differ in the order in which they choose the direction and distance of the move to the next iterate. Line search starts by fixing the direction $d_k$ and then identifying an appropriate distance, namely the step length $\alpha_k$. In the trust region approach, a maximum distance, the trust region radius, is chosen first, and then a direction and a step are sought that attain the best possible improvement subject to this distance constraint. If this step proves to be unsatisfactory, the distance measure is reduced and the process is tried again.
A trust region method is effective since it limits the step to a region of greater confidence in the local model and attempts to utilize more information from the local model for finding a shortened step. However, trust region models are more difficult to formulate and solve than a line search strategy. In this paper, we focus on line search strategies.
The success of a line search method depends on effective choices of both the direction $d_k$ and the step length $\alpha_k$. The search direction plays the main role in the algorithm, while the step length guarantees global convergence in some cases.
There are two alternatives for finding the distance to move along $d_k$, namely the exact line search and the inexact line search [19, 28, 31, 33]. In the exact line search, the following one-dimensional minimization problem is solved to find a step length $\alpha_k$:
$$\alpha_k = \arg\min_{\alpha > 0} f(x_k + \alpha d_k) \tag{5}$$
If we choose $\alpha_k$ such that the objective function has an acceptable amount of descent, i.e., the descent
$$f(x_k) - f(x_k + \alpha_k d_k) > 0 \tag{6}$$
is acceptable to the user, the line search is called an inexact line search. In practical computation, the exact optimal step length generally cannot be found, and it is expensive to find even an almost exact one; the inexact line search, with its lower computational load, is therefore highly popular.
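One case where the exact line search can be carried out in closed form is a convex quadratic. The sketch below assumes a hypothetical objective f(x) = 0.5 xᵀAx − bᵀx with a symmetric positive definite matrix A (the matrix, vector and starting point are illustrative, not taken from the paper). Setting the derivative of f(x + αd) with respect to α to zero gives α = −gᵀd / (dᵀAd).

```python
def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical convex quadratic f(x) = 0.5 x^T A x - b^T x (A symmetric positive definite)
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]

def grad(x):
    Ax = matvec(A, x)
    return [Ax[i] - b[i] for i in range(len(x))]

def exact_step(x, d):
    """Exact minimizer of alpha -> f(x + alpha d) for the quadratic above."""
    g = grad(x)
    return -dot(g, d) / dot(d, matvec(A, d))

x = [2.0, 1.0]
d = [-gi for gi in grad(x)]                       # steepest-descent direction
alpha = exact_step(x, d)
x_new = [x[i] + alpha * d[i] for i in range(2)]

# At the exact minimizer along d, the new gradient is orthogonal to d.
print(alpha, dot(grad(x_new), d))
```

For a general nonlinear f no such formula exists, which is the practical reason for falling back on inexact rules.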
A simple condition we could impose on $\alpha_k$ in an inexact line search is to require a reduction in f:
$$f(x_k + \alpha_k d_k) < f(x_k) \tag{7}$$
It has been shown that this requirement is not enough to produce convergence to an optimal point [24, 33]. The difficulty is that there is not always a sufficient reduction in f at each step, a concept we discuss next.
There are several inexact line search rules for choosing an appropriate step length $\alpha_k$, for example the Armijo rule, the Goldstein rule, and the Wolfe-Powell rules [24, 31, 33], which are described briefly in the following.
Armijo Rule and Goldstein Rule
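The Armijo rule is as follows: given $\beta \in (0, 1)$, $\rho \in (0, \tfrac{1}{2})$ and $\tau > 0$, there exists a least nonnegative integer $m_k$ such that
$$f(x_k) - f(x_k + \beta^{m} \tau d_k) \ge -\rho \beta^{m} \tau \, g_k^T d_k \tag{8}$$
The step lengths $\tau, \beta\tau, \beta^2\tau, \dots$ are tried successively until the above inequality is satisfied for $m = m_k$.
Goldstein presented the following rule. Let
$$I = \{\alpha > 0 : f(x_k + \alpha d_k) < f(x_k)\}$$
be an interval. In order to guarantee that the function decreases sufficiently, we want to choose $\alpha$ such that it is away from the two end points of the interval I.
The Goldstein conditions are
$$f(x_k + \alpha d_k) \le f(x_k) + \rho \alpha \, g_k^T d_k \tag{9}$$
and
$$f(x_k + \alpha d_k) \ge f(x_k) + (1 - \rho) \alpha \, g_k^T d_k \tag{10}$$
with $0 < \rho < \tfrac{1}{2}$, which exclude those points near the right end point and the left end point.
It is possible that rule (10) excludes the minimizing value of $\alpha$ from the acceptable interval. Instead, the Wolfe-Powell rule replaces (10) by
$$g_{k+1}^T d_k \ge \sigma \, g_k^T d_k, \qquad \sigma \in (\rho, 1).$$
Therefore, the step length $\alpha_k$ in the Wolfe-Powell rule is determined along the direction $d_k$ so that it satisfies
$$f(x_k + \alpha_k d_k) \le f(x_k) + \rho \alpha_k \, g_k^T d_k \tag{11}$$
$$\nabla f(x_k + \alpha_k d_k)^T d_k \ge \sigma \, g_k^T d_k \tag{12}$$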
The Wolfe-Powell rule is a popular inexact line search rule. We will use it in our algorithm and all experiments in this paper.
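To make the rules concrete, the following sketch implements Armijo backtracking and a check of the two Wolfe-Powell conditions. It is an illustration under stated assumptions, not the line search used in the experiments: the test function is hypothetical, the parameter names `beta`, `rho`, `tau` and `sigma` and their default values are assumed, and the check only verifies a candidate step rather than searching for one that satisfies both conditions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def f(x):
    # Hypothetical smooth convex test function with minimizer at the origin
    return x[0] ** 4 + x[0] ** 2 + 2.0 * x[1] ** 2

def grad(x):
    return [4.0 * x[0] ** 3 + 2.0 * x[0], 4.0 * x[1]]

def step(x, alpha, d):
    return [xi + alpha * di for xi, di in zip(x, d)]

def armijo(x, d, beta=0.5, rho=1e-4, tau=1.0, max_m=60):
    """Try tau, beta*tau, beta^2*tau, ... until the sufficient-decrease test holds."""
    fx, slope = f(x), dot(grad(x), d)
    for m in range(max_m):
        alpha = tau * beta ** m
        if fx - f(step(x, alpha, d)) >= -rho * alpha * slope:
            return alpha
    raise RuntimeError("no acceptable step found")

def satisfies_wolfe_powell(x, d, alpha, rho=1e-4, sigma=0.9):
    """Check the sufficient-decrease and curvature conditions for a given alpha."""
    slope = dot(grad(x), d)
    x_new = step(x, alpha, d)
    decrease = f(x_new) <= f(x) + rho * alpha * slope
    curvature = dot(grad(x_new), d) >= sigma * slope
    return decrease and curvature

x = [2.0, 1.0]
d = [-g for g in grad(x)]        # gradient direction
alpha = armijo(x, d)
print(alpha, satisfies_wolfe_powell(x, d, alpha))
```

The curvature test is what separates the two rules: backtracking alone can accept very short steps, whereas the second Wolfe-Powell condition rejects steps at which the slope along the direction is still steeply negative.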
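The search direction in gradient-based methods often has the form
$$d_k = -B_k^{-1} g_k \tag{13}$$
where $B_k$ is a symmetric and nonsingular matrix. The choice $B_k = I$ gives the gradient method, and $B_k = H_k$ corresponds to Newton’s method with $H_k^{-1}$ being available, where $H_k$ is the exact Hessian of f [24, 33]. In quasi-Newton methods, $B_k$ is an approximation to the Hessian $H_k$ that is updated at every iteration by means of a low-rank formula [5, 9, 24, 33]. In the conjugate gradient method, $d_k$ is defined by
$$d_k = -g_k + \beta_k d_{k-1}$$
where $\beta_k$ is a scalar parameter.
When $d_k$ is defined by (13) and $B_k$ is positive definite, we have $g_k^T d_k = -g_k^T B_k^{-1} g_k < 0$, and therefore $d_k$ is a descent direction.
The search direction $d_k$ is generally required to satisfy the descent condition:
$$g_k^T d_k < 0 \tag{14}$$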
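The role of positive definiteness can be checked numerically. The sketch below forms a direction d = −B⁻¹g for three hypothetical 2×2 matrices B and tests the sign of gᵀd; the gradient vector and the matrices are made up for illustration.

```python
def solve2(B, rhs):
    """Solve a 2x2 linear system B d = rhs by Cramer's rule."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    if abs(det) < 1e-14:
        raise ValueError("matrix is (numerically) singular")
    return [(rhs[0] * B[1][1] - B[0][1] * rhs[1]) / det,
            (B[0][0] * rhs[1] - rhs[0] * B[1][0]) / det]

def direction(B, g):
    """Search direction d = -B^{-1} g."""
    return solve2(B, [-g[0], -g[1]])

def is_descent(g, d):
    """Descent test: the directional derivative g^T d must be negative."""
    return g[0] * d[0] + g[1] * d[1] < 0.0

g = [1.0, 3.0]                           # hypothetical gradient at the current point
identity = [[1.0, 0.0], [0.0, 1.0]]      # B = I: gradient direction
spd = [[2.0, 0.5], [0.5, 1.0]]           # a symmetric positive definite B
indefinite = [[1.0, 0.0], [0.0, -1.0]]   # a symmetric but indefinite B

for name, B in [("identity", identity), ("spd", spd), ("indefinite", indefinite)]:
    print(name, is_descent(g, direction(B, g)))
```

With an indefinite matrix the computed direction can point uphill, which is why a Newton-type step needs a safeguard when the Hessian is not positive definite.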
Many techniques have been devoted to solving (2), as well as (1). These problems are usually solved with iterative methods because there are generally no analytical methods for them. Among the variety of existing methods, the descent direction methods are the most popular techniques because of their fast convergence property. A general descent direction algorithm is given in Algorithm 1.
Algorithm 1. A General Descent Framework
0. Let $x_0$ be a given initial point and $\varepsilon > 0$ an error tolerance. Each iteration $k = 0, 1, 2, \dots$ of a descent direction method contains the following steps:
1. If $\|g_k\| \le \varepsilon$ then stop.
2. Compute a descent direction $d_k$ at $x_k$ satisfying (14).
3. Determine an appropriate step length $\alpha_k > 0$.
4. Set $x_{k+1} = x_k + \alpha_k d_k$ and go to the next iteration.
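The steps of Algorithm 1 can be sketched as follows. This is a minimal illustration under assumptions that are not the paper's: the direction in step 2 is simply the negative gradient, step 3 uses Armijo backtracking instead of the Wolfe-Powell rules, and the objective is a hypothetical convex test function. The 500-iteration cap mirrors the limit mentioned for the experiments.

```python
import math

def f(x):
    # Hypothetical convex test function with minimizer at the origin
    return x[0] ** 4 + x[0] ** 2 + 2.0 * x[1] ** 2

def grad(x):
    return [4.0 * x[0] ** 3 + 2.0 * x[0], 4.0 * x[1]]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def backtracking(x, d, g, rho=1e-4, beta=0.5):
    """Armijo backtracking from a unit step (a simplification of rules (11)-(12))."""
    alpha, fx = 1.0, f(x)
    slope = sum(gi * di for gi, di in zip(g, d))
    while f([xi + alpha * di for xi, di in zip(x, d)]) > fx + rho * alpha * slope:
        alpha *= beta
        if alpha < 1e-16:
            raise RuntimeError("line search failed")
    return alpha

def descent(x0, eps=1e-8, max_iter=500):
    x = list(x0)
    for k in range(max_iter):
        g = grad(x)
        if norm(g) <= eps:                              # step 1: stopping test
            return x, k
        d = [-gi for gi in g]                           # step 2: descent direction
        alpha = backtracking(x, d, g)                   # step 3: step length
        x = [xi + alpha * di for xi, di in zip(x, d)]   # step 4: update
    return x, max_iter

x_star, iters = descent([2.0, 1.0])
print(x_star, iters)
```

Swapping the line in step 2 for another rule that returns a descent direction changes the method without touching the rest of the loop, which is the sense in which the framework is general.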
Let $\Omega = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ be the level set, and consider the Wolfe-Powell conditions (11) and (12) to determine $\alpha_k$. Then the global convergence of Algorithm 1 is given by the following theorem.
Theorem 1. Let $\alpha_k$ in the above descent direction algorithm be defined by (11) and (12). Let also $d_k$ satisfy
$$\theta_k \le \frac{\pi}{2} - \mu \tag{15}$$
for some $\mu > 0$ and for all k, where $\theta_k$ is the angle between $d_k$ and $-g_k$. If $\nabla f$ exists and is uniformly continuous on the level set $\Omega$, then either $g_k = 0$ for some k, or $f_k \to -\infty$, or $g_k \to 0$.
Proof can be found in , Theorem 2.5.4.
One of the most widely used methods satisfying Theorem 1 is the gradient method, in which $d_k = -g_k$. Although the method is globally convergent and usually works well in the early steps, it may descend very slowly as a stationary point is approached. In fact, it is shown that the convergence rate of the gradient method is at least linear, and the following bound holds:
$$\frac{f(x_{k+1}) - f(x^*)}{f(x_k) - f(x^*)} \le \left( \frac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}} \right)^2 \tag{16}$$
where $x^*$ is the minimizer and $\lambda_{\max}$ and $\lambda_{\min}$ are the largest and the smallest eigenvalues of the Hessian matrix, respectively.
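The slow convergence can be observed on an ill-conditioned quadratic. The sketch below runs the gradient method with exact line search on a hypothetical function f(x) = 0.5(λ_max x0² + λ_min x1²), whose Hessian eigenvalues are λ_max and λ_min and whose minimum value is zero. It assumes the classical bound for this setting, namely that each step reduces f by a factor of at most ((λ_max − λ_min)/(λ_max + λ_min))²; the starting point is chosen as the known worst case.

```python
def rate_ratios(lam_max, lam_min, x, steps=20):
    """Gradient method with exact line search on 0.5*(lam_max*x0^2 + lam_min*x1^2).

    Returns the per-iteration reduction ratios f(x_{k+1}) / f(x_k).
    """
    def f(p):
        return 0.5 * (lam_max * p[0] ** 2 + lam_min * p[1] ** 2)

    ratios = []
    for _ in range(steps):
        g = [lam_max * x[0], lam_min * x[1]]
        # Exact step for a quadratic: alpha = g^T g / (g^T H g)
        alpha = (g[0] ** 2 + g[1] ** 2) / (lam_max * g[0] ** 2 + lam_min * g[1] ** 2)
        x_new = [x[0] - alpha * g[0], x[1] - alpha * g[1]]
        ratios.append(f(x_new) / f(x))
        x = x_new
    return ratios

lam_max, lam_min = 10.0, 1.0
bound = ((lam_max - lam_min) / (lam_max + lam_min)) ** 2
ratios = rate_ratios(lam_max, lam_min, [1.0, 10.0])
print(bound, max(ratios))
```

As the ratio λ_max/λ_min grows, the bound approaches 1, so each iteration removes only a small fraction of the remaining error.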
In order to cope with the above-mentioned difficulties, one can use Newton’s method, which has the quadratic convergence property. At the k-th iteration, the classical Newton’s direction is the solution of the following system:
$$H_k d = -g_k \tag{17}$$
where Hk is the Hessian matrix at xk. If Hk is positive definite, then the system has a unique solution and the Newton’s direction is a descent direction. Even when Hk is positive definite, it is not guaranteed that Newton’s method will be globally convergent. Although Newton’s method generally converges faster than the gradient method, it depends strongly on the starting point. Moreover, applying Newton’s method to nonlinear equations is expensive because the second-order derivatives of the function, H, must be computed directly. A number of techniques that avoid the direct computation of H may be used, and different approximations lead to different methods. In this category are the quasi-Newton methods, which approximate second derivatives in a subtle and efficient way. Another alternative is a fusion of different local optimization methods, which leads naturally to powerful algorithms and has attracted extensive attention in recent years. One of the most successful methods of this category, introduced by Shi, uses a combination of the gradient method and Newton’s method. This algorithm is efficient for solving problem (2) due to its global convergence property. In our experiments, we compare our results with this combination algorithm and refer to it as ShA. The direction in algorithm ShA is very close to the Newton’s direction. However, practical implementations show that, in some cases, the gradient method can be a more suitable choice than Newton’s method. For instance, when the difference between the function values in the two previous iterations and the norm of the gradient in the previous iteration are both large enough, the gradient method may work better than Newton’s method.
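As an illustration of computing the Newton’s direction, the sketch below solves the 2×2 Newton system for the two-variable Rosenbrock function, which is the squared norm of the residuals (10(x1 − x0²), 1 − x0). The fallback to the gradient direction when the Hessian is singular, the determinant tolerance `tol`, and the function name `newton_or_gradient` are assumptions made for this example.

```python
def grad(x):
    """Gradient of the Rosenbrock function 100*(x1 - x0^2)^2 + (1 - x0)^2."""
    return [-400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
            200.0 * (x[1] - x[0] ** 2)]

def hessian(x):
    """Exact Hessian of the Rosenbrock function."""
    return [[1200.0 * x[0] ** 2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
            [-400.0 * x[0], 200.0]]

def newton_or_gradient(g, H, tol=1e-12):
    """Solve H d = -g by Cramer's rule; fall back to d = -g if H is singular.

    The absolute determinant tolerance is a crude, illustrative singularity test.
    Returns the direction and a flag telling whether the Newton system was solved.
    """
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    if abs(det) < tol:
        return [-g[0], -g[1]], False
    d = [(-g[0] * H[1][1] + H[0][1] * g[1]) / det,
         (-H[0][0] * g[1] + g[0] * H[1][0]) / det]
    return d, True

x = [-1.2, 1.0]                    # standard starting point for this function
g, H = grad(x), hessian(x)
d, used_newton = newton_or_gradient(g, H)
slope = g[0] * d[0] + g[1] * d[1]  # negative when d is a descent direction
print(d, used_newton, slope)
```

At this starting point the Hessian is positive definite, so the Newton’s direction is a descent direction; the fallback branch is only reached when the determinant vanishes.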
Our aim here is to present an algorithm with two different combinations for solving problem (2), and hence problem (1). Both proposed combinations are constructed so that they satisfy the descent condition as well as the assumptions of Theorem 1.
Let be four parameters so that Take any positive constants such that and initialize by 1. Let also T and be very large and small positive numbers, respectively, and let The steps of the proposed algorithm are as follows.
Algorithm 2. A Combination of the Gradient and Newton’s Methods
0. Choose a starting point $x_0$ and an error tolerance $\varepsilon > 0$. For $k = 0, 1, 2, \dots$ do
1. If $\|g_k\| \le \varepsilon$ then stop.
2. If the Newton’s direction d1 is not computable, due to the singularity of the Hessian, then compute the gradient direction
3. Compute the gradient direction d2 and the Newton’s direction d1 at xk that satisfies (17).
4. Set and
5. If step 6.
7. If go to step 9, otherwise go to the next step.
8. Use rules (11) and (12) to determine a step length along the direction and go to step 12.
9. Compute as follows:
10. If and and go back to step 9.
11. Consider one of the following two versions to calculate Sk:
a. Use rules (11) and (12) to determine a step length along the gradient direction d2. If otherwise set
b. Use rules (11) and (12) to determine a step length along the direction
Parameters b1, b2 and b3 are positive constants that let the user specify how much each method contributes. More precisely, when the slope of the function is slight, the algorithm tends to Newton’s method; otherwise the contribution of the gradient is increased and the algorithm behaves more like the gradient method. When the difference between the two previous values of the function is high, the quantity computed in (18) is close to 0, so the gradient method contributes more. Moreover, this equation is dynamic and plays a crucial role in the algorithm, since it specifies the contribution of each method. It also guarantees that, near the solution, the optimal point is reached with a super-linear convergence rate.
In step 11 of the above algorithm, we use two different strategies for the combination. Step 11.a is a new combination, different from the existing methods in the literature. In this combination, the step length αk is determined only along the gradient direction. In other words, we use a novel combination of the pure Newton’s method (i.e., αk = 1) and the gradient method. Step 11.b is the usual combination, which has been developed in some research works. The step length in this case is found along a combination of the gradient and the Newton’s directions.
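The idea of a line search along a combined direction can be sketched as below. This is not Algorithm 2: the weight function is a hypothetical stand-in for formula (18), the parameter `b` is invented for the example, a convex combination of the two directions is assumed, the first iteration uses the pure gradient direction by choice, and Armijo backtracking replaces the Wolfe-Powell rules. The test function is hypothetical, with a Hessian that is positive definite everywhere, so both directions and any convex combination of them are descent directions.

```python
import math

def f(x):
    # Hypothetical convex test function with minimizer at the origin
    return x[0] ** 4 + x[0] ** 2 + 2.0 * x[1] ** 2

def grad(x):
    return [4.0 * x[0] ** 3 + 2.0 * x[0], 4.0 * x[1]]

def newton_dir(x, g):
    # The Hessian of f is diagonal and positive definite: diag(12*x0^2 + 2, 4)
    return [-g[0] / (12.0 * x[0] ** 2 + 2.0), -g[1] / 4.0]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def backtrack(x, d, g, rho=1e-4, beta=0.5):
    """Armijo backtracking; a simplification of the Wolfe-Powell rules."""
    alpha, fx, slope = 1.0, f(x), dot(g, d)
    while f([xi + alpha * di for xi, di in zip(x, d)]) > fx + rho * alpha * slope:
        alpha *= beta
        if alpha < 1e-16:
            raise RuntimeError("line search failed")
    return alpha

def weight(delta_f, b=1.0):
    """HYPOTHETICAL weight on the Newton direction (not the paper's formula):
    close to 1 when f changed little, close to 0 when f changed a lot."""
    return 1.0 / (1.0 + b * abs(delta_f))

def combined(x0, eps=1e-8, max_iter=200):
    x, f_prev = list(x0), None
    for k in range(max_iter):
        g = grad(x)
        if math.sqrt(dot(g, g)) <= eps:
            return x, k
        d_grad = [-gi for gi in g]
        d_newton = newton_dir(x, g)
        # No previous function value on the first iteration: use the gradient direction.
        lam = 0.0 if f_prev is None else weight(f(x) - f_prev)
        d = [lam * dn + (1.0 - lam) * dg for dn, dg in zip(d_newton, d_grad)]
        alpha = backtrack(x, d, g)
        f_prev = f(x)
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x, max_iter

x_star, iters = combined([2.0, 1.0])
print(x_star, iters)
```

Because the weight approaches 1 as successive function values get closer, the iteration behaves like Newton’s method near the solution and like the gradient method while the function is still changing rapidly.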
Global convergence Theorem
Here, we establish the global convergence of the proposed combination algorithm based on the global convergence property of the Theorem 1.
Theorem 2. Consider using Algorithm 2 to solve problem (2). Assume that $\nabla f$ exists and is uniformly continuous on the level set $\Omega$. Then either $g_k = 0$ for some k, or $f_k \to -\infty$, or $g_k \to 0$.
Proof. Assume that $f_k$ is bounded below for all k. It is clear that in this case, for all k. Denote We will show that the direction dk obtained by the algorithm satisfies condition (15) of Theorem 1
for all where is the angle between
Suppose Sk is obtained at Step 8. Then (Step 8), and it is easy to see that it means (15) holds. Now, we consider the other cases: case 1, in which Sk is obtained via Step 11.b, and case 2, in which Sk is obtained via Step 11.a. We prove each case separately as follows.
Take any integer k
1. In this case, we assume that Sk is obtained at Step 11.b. Then is chosen as a descent direction and according to steps 9-10, the number can be chosen so that the inequality Therefore, we have
that is, (19) holds and therefore the obtained direction, dk, satisfies the assumption of Theorem 1; hence the remainder of the proof is similar to the proof of Theorem 1 in .
2. In this case, we assume Sk is obtained at Step 11.a, i.e. Sk =
If the number of iterations in which Sk is obtained by Step 11.a is finite, then Sk is defined by the gradient direction, d2, for all sufficiently large k, and the proof follows easily.
Now suppose it is not finite, i.e., there is a subsequence such that is obtained via Step 11.a.
By considering the first condition in 11.a, since fk is bounded below we have
In addition, from we obtain
Now, we are going to show Suppose it is not true. Then there exists a subsequence such that
Here we consider two cases: does not converge to zero. Case (i) leads to a contradiction by applying the second condition in Step 11.a. In case (ii), let us consider which is a contradiction by Therefore, the proof is complete.
Experiments and Results
We have evaluated the performance of the proposed algorithm on several well-known benchmark test problems given in [20, 34]. In the proposed algorithm, we use two different combinations, as described in steps 11.a and 11.b; we refer to these cases as 'Ala' and 'Alb', respectively. The methods compared are Ala, Alb, the gradient method (GM), the Newton’s method (NM), and ShA. In all algorithms we use the Wolfe-Powell line search rules to find an acceptable step length.
The calculations were carried out using MATLAB. The comparison of the methods is based on the following criteria: all methods are terminated if the gradient converges to a predefined tolerance, or the iteration number exceeds 500.
The parameters used in this paper are:
1. Dimension n.
2. Function definition,
3. Standard initial point x0.
Problem 1. Helical Valley function
Problem 2. Powell Singular function
Problem 3. Wood function
Problem 4. Watson function
Problem 5. Extended Kearfott function
Problem 6. Extended Eiger-Sikorski-Stenger
Problem 7. Variably dimensional function
Problem 8. Discrete Boundary Value function
Problem 9. Extended Rosenbrock function
Problem 10. Trigonometric function
Table 1 lists the performance of the above-mentioned algorithms in terms of the number of iterations used. We have multiplied the given initial points by 10 to obtain an additional initial point. In this table, “TP” and “IP” stand for test problem and initial point, respectively. Table 2 summarizes the convergence results of Table 1. In order to compare the algorithms with more initial points, we have generated 50 random initial points uniformly distributed from their domains with the intersection of . The summary of the convergence results of the algorithms for these random initial points is given in Table 3. In these tables, “AC” and “NC” stand for almost convergence and no convergence, respectively. Convergence means that the method finds the solution; almost convergence means that the method finds a point close to a local optimal solution; otherwise the method is counted as not convergent.
The numerical results in Tables 1 to 3 demonstrate the high performance of the proposed combination algorithm compared with the other methods. This is confirmed by the number of iterations and by the convergence properties. For example, the proposed algorithm Ala converges on all test problems for both initial points. Alb converges on nine test problems out of ten; on the Wood function, it finds a point close to the optimal solution. The algorithm proposed by Shi, ShA, converges on nine test problems out of ten but fails to find the solution of Problem 3. Also, on average, ShA needs more iterations than the proposed algorithms. The Newton’s and gradient methods perform worse, with more AC and NC outcomes.
A combined algorithm of the gradient method and Newton’s method has been presented for solving systems of nonlinear equations. We have considered two different combinations. One of them is the usual case, which has recently been introduced in some research works. The other is a new combination, different from others in the literature. According to the numerical experiments, the proposed algorithm, especially with the new combination, is more efficient than the other methods considered.