<<Up     Contents

Lagrange multiplier

Lagrange multipliers are a method for solving certain optimization problems. The class of problems to be solved deals with finding local extrema of multidimensional functions with one-dimensional contraints in the form of a curve. The method reduces the constrained problem to an unconstrained problem which can be solved by the usual gradient method.

Formalism of the method of Lagrange multipliers

Let f (x1, x2, … xn) be a function defined on an n-dimensional open set. f has to have continuous partial derivatives on that set. The constraint is defined by the curve which satisfies g (x1, x2, … xn) = C which is contained in the open set (C is a constant). Additionally, Del.gif g0 everywhere on the curve (where Del.gif is the gradient). The task is now to find a local maximum (or minimum) of f(x) on the set {xRn | g (x) = C}. By setting the derivative of f along the curve g (x) = C to zero, one can show that there exists a real number λ (the Lagrange multiplier) such that

<math>\nabla (f+\lambda g)=0</math>
or - written differently
<math>\frac{\partial f}{\partial x_i} + \lambda \frac{\partial g}{\partial x_i} = 0 \mbox{ for all } i = 1,\cdots,n</math>
for each x that is such a local extremum. Because the solution of this equation still gives values outside of the curve, the constraint g (x) = C has to be used again. This results in all possible solutions.

Example

Suppose we wish to find the discrete probability distribution with maximal information entropy. Then

<math>f(p_1,p_2,\cdots,p_n) = -\sum_{k=1}^n p_k\log_2 p_k</math>
Of course, the sum of these probabilities equals 1, so our constraint function is
<math>g(p_1,p_2,\cdots,p_n)=\sum_{k=1}^n p_k=1.</math>

To find the point of maximum entropy (depending on the probabilities), we can use the Lagrange multiplier. We set:

<math>\mbox{For all } i = 1,\cdots,n</math>

<math>\frac{\partial}{\partial p_i}\left(-\sum_{k=1}^n p_k \log_2 p_k + \lambda\sum_{k=1}^n p_k\right) = 0</math>
Carrying out the differentation of these n equations, we get
<math>-\left(\frac{1}{\ln 2}+\log_2 p_i \right) + \lambda = 0</math>
This shows that all pi are equal (because they depend on λ only). By using the constraint ∑k pk = 1 again, we get the uniform distribution:
<math>p_i = \frac{1}{n}</math>

For another example, see also Derivation of the partition function.

wikipedia.org dumped 2003-03-17 with terodump