Karush-Kuhn-Tucker (KKT) conditions
What do you need to know to understand this topic?
- The importance of gradients in finding the minimum/maximum of functions
- The method of Lagrange Multipliers
What are the Karush-Kuhn-Tucker (KKT) conditions?
The method of Lagrange Multipliers is used to solve optimization problems constrained by one or more equalities. When the constraints also include inequalities, we need to extend the method to the KKT conditions.
The new problem can be formulated as:
$$x^* = \arg\min_x f(x)$$
$$\text{subject to } h_i(x) = 0, \quad \forall i = 1, \ldots, m$$
$$\text{subject to } g_i(x) \le 0, \quad \forall i = 1, \ldots, n$$
In words, find the solution that minimizes $f(x)$, as long as all equalities $h_i(x) = 0$ and all inequalities $g_i(x) \le 0$ hold. Any equality or inequality constraint can be written in this form, as long as all terms are moved to the left side of the equation. The inequality constraints are added to the method of Lagrange Multipliers in a similar way to the equalities: put the cost function and the constraints into a single minimization problem, multiplying each equality constraint by a factor $\lambda_i$ and each inequality constraint by a factor $\mu_i$ (the KKT multipliers). In our example, we have $m$ equalities and $n$ inequalities. The expression for the optimization problem hence becomes:
$$x^* = \arg\min_x L(x, \lambda, \mu) = \arg\min_x f(x) + \sum_{i=1}^m \lambda_i h_i(x) + \sum_{i=1}^n \mu_i g_i(x),$$
where $L(x, \lambda, \mu)$ is the Lagrangian, which also depends on $\lambda$ and $\mu$, the vectors of the multipliers.
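For concreteness, here is a small hand-picked instance (my own toy example, not from the original problem): with $f(x) = x_1^2 + x_2^2$, one equality $h_1(x) = x_1 + x_2 - 1$ and one inequality $g_1(x) = -x_1$, the Lagrangian reads:

$$L(x, \lambda, \mu) = x_1^2 + x_2^2 + \lambda_1 (x_1 + x_2 - 1) + \mu_1 (-x_1).$$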
As usual, we find the roots of the gradient of the loss function with respect to $x$ to locate the extremum. However, the constraints make $x$ depend on $\lambda$ and $\mu$. Furthermore, the number of unknowns equals the number of elements in $x$ (say $k$) plus the number of multipliers ($m + n$), while, so far, we only have $k$ equations coming from the gradient with respect to $x$. We have seen before that we can differentiate the function with respect to each Lagrange multiplier $\lambda_i$ to get $m$ more equations. These equations restrict the set of solutions to those that satisfy the equality constraints.
The new challenge is how to come up with $n$ more equations from the inequality constraints. To do so, think about what an inequality constraint means. If the extremum of the original function satisfies $g_i(x^*) < 0$, then the constraint plays no role in changing the extremum compared with the unconstrained problem, so its coefficient $\mu_i$ can be set to zero. If, on the other hand, the new solution lies on the border of the constraint, then $g_i(x^*) = 0$. The graphical representation in Fig. 1 helps to understand this concept.
Fig. 1 - Graphical explanation for the KKT conditions.
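To see the two cases in miniature, consider a toy problem of my own choosing: minimize $f(x) = x^2$. With the constraint $g(x) = -1 - x \le 0$ (i.e., $x \ge -1$), the unconstrained minimum $x^* = 0$ already satisfies $g(0) = -1 < 0$, so the constraint is inactive and $\mu$ can be set to zero. With $g(x) = 1 - x \le 0$ (i.e., $x \ge 1$) instead, the unconstrained minimum is infeasible and the solution moves to the border, $x^* = 1$, where $g(x^*) = 0$. In both cases the product $\mu \, g(x^*)$ is zero.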
In both situations, the equation:
$$\mu_i g_i(x) = 0$$
is necessary for the solution of our new problem. Therefore, we get $n$ equations from the inequality constraints. The constraint terms are always zero on the set of possible solutions, thereby not affecting the value of the loss function there. The coefficients $\lambda_i$ can take any value, but the coefficients $\mu_i$ are limited to nonnegative values. To see why, and with the aid of Fig. 2, imagine $x^*$ lies on the boundary $g_i(x) = 0$, so that $\mu_i$ can be different from zero.
Fig. 2 - Graphical explanation for the sign of $\mu$.
$$x^* = \arg\min_x f(x) + \mu_i g_i(x)$$
$$0 = \nabla f(x) + \mu_i \nabla g_i(x)$$
$$\mu_i = -\frac{\nabla f(x)}{\nabla g_i(x)} \quad (1)$$
At such a point $x^*$, the gradients of $f(x)$ and $g_i(x)$ with respect to $x$ point in opposite directions. Therefore, according to (1), $\mu_i$ must be positive.
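Continuing the toy example from above with $g(x) = 1 - x$: at $x^* = 1$ we have $\nabla f(x^*) = 2$ and $\nabla g(x^*) = -1$, which indeed point in opposite directions, and (1) gives $\mu = -\frac{2}{-1} = 2 > 0$, as expected.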
The KKT conditions
We are now ready to enumerate the KKT conditions:
- Stationarity
$$\nabla_x f(x) + \sum_{i=1}^m \lambda_i \nabla_x h_i(x) - \sum_{i=1}^n \mu_i \nabla_x g_i(x) = 0 \quad \text{(maximization)}$$
$$\nabla_x f(x) + \sum_{i=1}^m \lambda_i \nabla_x h_i(x) + \sum_{i=1}^n \mu_i \nabla_x g_i(x) = 0 \quad \text{(minimization)}$$
- Equality constraints
$$\nabla_\lambda \left( f(x) + \sum_{i=1}^m \lambda_i h_i(x) + \sum_{i=1}^n \mu_i g_i(x) \right) = 0$$
(which simply recovers $h_i(x) = 0, \forall i = 1, \ldots, m$, since only the equality terms depend on $\lambda$)
- Inequality constraints, a.k.a. complementary slackness condition
$$\mu_i g_i(x) = 0, \quad \forall i = 1, \ldots, n$$
$$\mu_i \ge 0, \quad \forall i = 1, \ldots, n$$
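To make these conditions concrete, the sketch below assembles and solves the KKT system symbolically with SymPy for a small toy problem of my own choosing (minimize $(x-2)^2 + (y-1)^2$ subject to $x + y - 2 = 0$ and $-x \le 0$); it is an illustration under those assumptions, not part of the original article:

import sympy as sp

x, y, lam, mu = sp.symbols('x y lam mu', real=True)

f = (x - 2)**2 + (y - 1)**2   # objective
h = x + y - 2                 # equality constraint: h = 0
g = -x                        # inequality constraint: g <= 0

L = f + lam*h + mu*g          # Lagrangian

# KKT system: stationarity in x and y, the equality constraint,
# and complementary slackness
eqs = [sp.diff(L, x), sp.diff(L, y), h, mu*g]

# keep only candidates with mu >= 0 (dual feasibility) and g <= 0
for s in sp.solve(eqs, [x, y, lam, mu], dict=True):
    if s[mu] >= 0 and g.subs(s) <= 0:
        print(s)   # expected: {x: 3/2, y: 1/2, lam: 1, mu: 0}

Solving $\mu_i g_i(x) = 0$ as part of the system makes SymPy branch over the active/inactive cases automatically; the dual-feasibility filter then discards the branch with a negative multiplier.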
An example
Suppose we are trying to maximize the transmission rate of a multi-carrier communication system with $N$ channels. Each carrier/channel can carry a signal power $p_i \ge 0$ under noise $n_i > 0$. The total power must be smaller than or equal to $P$. The transmission rate of each carrier is proportional to:
$$\log_2\left(1 + \frac{p_i}{n_i}\right)$$
Given this information, and noting that maximizing $\ln(x)$ also maximizes $\log_2(x)$, the problem is:
$$\max \sum_{i=1}^N \ln\left(1 + \frac{p_i}{n_i}\right)$$
$$\text{subject to } \sum_{i=1}^N p_i \le P$$
$$\text{subject to } p_i \ge 0, \quad \forall i = 1, \ldots, N$$
Changing $p_i \ge 0$ to $-p_i \le 0$ and noting that this is a maximization problem, the Lagrangian is then:
$$L(p, \mu) = \sum_{i=1}^N \ln\left(1 + \frac{p_i}{n_i}\right) - \mu_0 \left(\sum_{i=1}^N p_i - P\right) - \sum_{i=1}^N \mu_i (-p_i)$$
$$L(p, \mu) = \sum_{i=1}^N \ln\left(1 + \frac{p_i}{n_i}\right) + \mu_0 \left(P - \sum_{i=1}^N p_i\right) + \sum_{i=1}^N \mu_i p_i$$
Taking the stationarity condition, we get:
$$\nabla_{p_i} L(p, \mu) = \frac{1}{p_i + n_i} - \mu_0 + \mu_i = 0$$
$$p_i + n_i = \frac{1}{\mu_0 - \mu_i}$$
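As a quick check of the derivative above, a minimal SymPy snippet (with scalars p and n standing in for $p_i$ and $n_i$):

import sympy as sp

p, n = sp.symbols('p n', positive=True)
rate = sp.log(1 + p/n)
print(sp.simplify(sp.diff(rate, p)))   # prints 1/(n + p)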
Since $n_i > 0$ and $p_i \ge 0$, the left side is positive, so $\mu_0 > \mu_i$; combined with $\mu_i \ge 0$, this also means that $\mu_0 > 0$. From the complementary slackness conditions:
$$\mu_0 \left(P - \sum_{i=1}^N p_i\right) = 0$$
$$\mu_i p_i = 0$$
$$\mu_0, \mu_i \ge 0$$
and since $\mu_0 > 0$, we know that
$$P - \sum_{i=1}^N p_i = 0$$
$$P = \sum_{i=1}^N p_i$$
which means that the $p_i$ cannot be zero (all of them, since they all play the same role in the optimization problem), forcing $\mu_i = 0, \forall i = 1, \ldots, N$ by complementary slackness. Then
$$p_i + n_i = \frac{1}{\mu_0 - \mu_i} = \frac{1}{\mu_0}$$
$$p_i = \frac{1}{\mu_0} - n_i$$
The final equations to solve the problem are:
$$p_i = \frac{1}{\mu_0} - n_i, \quad \forall i = 1, \ldots, N$$
$$\sum_{i=1}^N p_i = P$$
which are easily solved: summing the first equation over $i$ and substituting into the second gives $N/\mu_0 - \sum_{i=1}^N n_i = P$, hence $\mu_0 = N / \left(P + \sum_{i=1}^N n_i\right)$.
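Putting those two equations into code, a minimal sketch (with made-up values for the $n_i$ and $P$, and assuming, as above, that all $p_i$ turn out positive):

import numpy as np

# made-up noise levels and total power budget
n = np.array([0.1, 0.2, 0.3, 0.4])
P = 10.0
N = len(n)

# summing p_i = 1/mu_0 - n_i over i and using sum(p_i) = P:
# N/mu_0 - sum(n_i) = P  =>  mu_0 = N / (P + sum(n_i))
mu0 = N / (P + n.sum())
p = 1.0/mu0 - n

print('mu_0 =', mu0)
print('p =', p)
print('total power =', p.sum())   # equals P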
Sufficiency and regularity
The KKT conditions are necessary for an optimum, but not always sufficient. One class of problems where they are also sufficient is the one where $f(x)$ and the $g_i(x)$ are continuously differentiable and convex, and the $h_i(x)$ are linear.
Furthermore, a given value of $x$ satisfies these conditions only if it is regular. There are several ways of determining the regularity of $x$, some of which are listed on the Wikipedia page, while a more extensive explanation can be found in the book Nonlinear Programming by Bertsekas.
The table below summarizes how the KKT conditions behave depending on these two properties: the problem either satisfies the sufficiency conditions or not, and $x$ is either regular or not.
| Sufficient conditions? | $x$ is not regular | $x$ is regular |
| --- | --- | --- |
| No | Do not hold | Necessary |
| Yes | Do not hold | Sufficient |
Exercise
The following Python code solves a constrained optimization problem with SciPy's SLSQP solver: minimize $x_1 x_4 (x_1 + x_2 + x_3) + x_3$ subject to $x_1 x_2 x_3 x_4 \ge 25$, $x_1^2 + x_2^2 + x_3^2 + x_4^2 = 40$, and $1 \le x_i \le 5$.
import numpy as np
from scipy.optimize import minimize

# objective: x1*x4*(x1 + x2 + x3) + x3
def objective(x):
    return x[0]*x[3]*(x[0] + x[1] + x[2]) + x[2]

# inequality constraint: x1*x2*x3*x4 >= 25
# (SciPy's 'ineq' type expects fun(x) >= 0)
def constraint1(x):
    return x[0]*x[1]*x[2]*x[3] - 25.0

# equality constraint: x1^2 + x2^2 + x3^2 + x4^2 = 40
def constraint2(x):
    sum_eq = 40.0
    for i in range(4):
        sum_eq = sum_eq - x[i]**2
    return sum_eq

# initial guess
x0 = np.array([1.0, 5.0, 5.0, 1.0])

# show initial objective
print('Initial objective: ' + str(objective(x0)))

# optimize: every variable is bounded to [1, 5]
b = (1.0, 5.0)
bnds = (b, b, b, b)
con1 = {'type': 'ineq', 'fun': constraint1}
con2 = {'type': 'eq', 'fun': constraint2}
cons = [con1, con2]
solution = minimize(objective, x0, method='SLSQP',
                    bounds=bnds, constraints=cons)
x = solution.x

# show final objective
print('Final objective: ' + str(objective(x)))

# print solution
print('Solution')
print('x1 = ' + str(x[0]))
print('x2 = ' + str(x[1]))
print('x3 = ' + str(x[2]))
print('x4 = ' + str(x[3]))
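Since primal feasibility is part of the KKT requirements, a quick sanity check is to evaluate both constraints at the returned solution:

# verify the constraints at the solution
print('constraint1 (should be >= 0): ' + str(constraint1(x)))
print('constraint2 (should be ~ 0): ' + str(constraint2(x)))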