Aug 1, 2021

An intro to NLP (not the one that you think!)



In the data science era, the first thing that probably comes to your mind when you hear "NLP" is Natural Language Processing, isn't it? I have studied natural language processing to some extent, and it is without doubt a really interesting topic to dive into! But there is another (much older) NLP: Non-Linear Programming, i.e. minimizing a non-linear function subject to non-linear (equality or inequality) constraints. In this post, I am going to talk about this NLP, but quite briefly; a more in-depth discussion would require a book, not a blog post!

If you have passed an elementary machine learning or optimization course, you should have learned about the Gradient Descent (GD) method, which is very popular in deep learning these days. But the GD method (without any modification) is not well suited for constrained optimization problems, where you need to minimize a function while satisfying a set of equality and/or inequality constraints. Let me give you an example:
  • minimize \(f(x_1,x_2) = -x_1\,x_2\)
  • subject to
    • \(h(x_1,x_2) = 20x_1+15x_2-30=0\)
    • \(g(x_1,x_2) = (x_1/2)^2 + x_2^2-1\leq 0\)
    • \(0\leq x_1,x_2\leq 3\)
In this example, we have three types of constraints: an equality constraint \(h(x_1,x_2)=0\), an inequality constraint \(g(x_1,x_2)\leq 0\) and bound constraints on the design (or state) variables \(x_1,x_2\in [0,3]\). So, the main question is how to take these constraints into the minimization problem.
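To make the snippets later in this post concrete, here is a minimal Python sketch of this example problem; the function names are simply my own choice and will be reused below.

# Example problem: minimize f subject to h = 0, g <= 0 and 0 <= x1, x2 <= 3
def f(x1, x2):
    """Objective function to be minimized."""
    return -x1 * x2

def h(x1, x2):
    """Equality constraint h(x1, x2) = 0 (a straight line)."""
    return 20 * x1 + 15 * x2 - 30

def g(x1, x2):
    """Inequality constraint g(x1, x2) <= 0 (inside an ellipse)."""
    return (x1 / 2) ** 2 + x2 ** 2 - 1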

Lagrange multiplier method

The Lagrange multiplier method is well known for equality-constrained optimization. Simply speaking, we minimize \(f(\mathbf{x})-\lambda\, h(\mathbf{x})\) over \(\mathbf{x}\) and \(\lambda\). I am not going to discuss the details, so if you feel you need more explanation, do not hesitate to watch a video tutorial on the topic first.
If we did not have the inequality constraint, the problem would be almost solved with the Lagrange multiplier method alone. But how do we bring that constraint into play? This is where a very interesting concept of NLP appears: the slack variable, whose role is to transform the inequality constraint into an equality constraint. We define a new variable \(z\) such that \[g(x_1,x_2)+z^2 = (x_1/2)^2 + x_2^2-1+z^2= 0\] and now we have an equality constraint!

We can then apply the Lagrange multiplier method with this additional slack variable \(z\) and write the Lagrangian as: \[ \mathcal{L}(x_1,x_2,z,\lambda,\beta) = f(x_1,x_2) - \lambda\,h(x_1,x_2) -\beta\, \left(g(x_1,x_2)+z^2\right) \] We use \(\lambda\) for the Lagrange multiplier of the equality constraint and \(\beta\leq 0\) for the inequality constraint (for a maximization problem, \(\beta\geq 0\)).

So, the first-order optimality conditions are obtained as: \[ \partial_{x_1}\mathcal{L} = \partial_{x_1}f-\lambda\,\partial_{x_1}h-\beta\,\partial_{x_1}g=0\\ \partial_{x_2}\mathcal{L} = \partial_{x_2}f-\lambda\,\partial_{x_2}h-\beta\,\partial_{x_2}g=0\\ \partial_{z}\mathcal{L} =-2\beta\,z=0\\ \partial_{\lambda}\mathcal{L} = -h(x_1,x_2)=0\\ \partial_{\beta}\mathcal{L} = -\left(g(x_1,x_2)+z^2\right)=0 \] Note that the slack variable can easily be removed from the system: the third and last equations give \(\beta\,z=0\) and \(z^2=-g\), hence \(\beta\,g=0\), and we are left with: \[ \partial_{x_1}f-\lambda\,\partial_{x_1}h-\beta\,\partial_{x_1}g=0\\ \partial_{x_2}f-\lambda\,\partial_{x_2}h-\beta\,\partial_{x_2}g=0\\ \beta\,g=0\\ h = 0 \] Now we can see that this system is equivalent to applying the Lagrange multiplier method to \(\mathcal{L} = f - \lambda\,h -\beta\, g\) subject to \(h=0\) and \(\beta\, g = 0\), as if the slack variable had never existed!
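If you prefer to let the computer do the bookkeeping, here is a small sympy sketch (my own illustration, not part of the original derivation) that builds the Lagrangian with the slack variable and prints exactly these first-order conditions:

import sympy as sp

x1, x2, z, lam, beta = sp.symbols('x1 x2 z lambda beta', real=True)

f = -x1 * x2
h = 20 * x1 + 15 * x2 - 30
g = (x1 / 2) ** 2 + x2 ** 2 - 1

# Lagrangian with the slack variable z turning g <= 0 into g + z**2 = 0
L = f - lam * h - beta * (g + z ** 2)

# Stationarity conditions: derivative of L w.r.t. every variable must vanish
for v in (x1, x2, z, lam, beta):
    print(f'dL/d{v} =', sp.diff(L, v), '= 0')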


The equation \(\beta\,g=0\) brings up two cases:
  • \(\beta = 0\) and \(g< 0\) (non-binding or inactive constraint)
  • \(\beta\neq 0\) and \(g=0\) (binding or active constraint)

Non-binding constraint case

For the non-binding constraint case, the constraint \(g\leq 0\) is removed completely from the system and we should solve \[ \partial_{x_1}f-\lambda\,\partial_{x_1}h=0\\ \partial_{x_2}f-\lambda\,\partial_{x_2}h=0\\ h = 0 \] or \[ -x_2-20\lambda = 0\\ -x_1-15\lambda = 0\\ 20x_1+15x_2=30 \] which gives the result \[\boxed{x_1=0.75, \qquad x_2=1, \qquad \lambda = -0.05}\] 
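Since the reduced system in this case is linear in \((x_1,x_2,\lambda)\), the boxed result can be checked quickly; here is a minimal numpy sketch of my own for that verification:

import numpy as np

# Linear system for the non-binding case, unknowns ordered as (x1, x2, lambda):
#   -x2 - 20*lam          = 0
#   -x1         - 15*lam  = 0
#   20*x1 + 15*x2         = 30
A = np.array([[ 0.0, -1.0, -20.0],
              [-1.0,  0.0, -15.0],
              [20.0, 15.0,   0.0]])
b = np.array([0.0, 0.0, 30.0])

x1, x2, lam = np.linalg.solve(A, b)
print(x1, x2, lam)   # 0.75 1.0 -0.05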
Notice that this is the same solution one could have found naively by substituting \(x_1\) or \(x_2\) in the function \(f\) using the equality constraint (which is a line). However, as one can see in the following picture, this solution is not feasible, since it does not respect the inequality constraint: \(g(0.75,\,1) = 0.140625 > 0\).

Binding constraint case

For the binding constraint case, one gets a non-linear system of equations to solve: \[ -x_2-20\lambda-0.5\,\beta\,x_1 = 0\\ -x_1-15\lambda-2\,\beta\,x_2 = 0\\ 20x_1+15x_2=30\\ 0.25x_1^2+x_2^2=1 \] which can be solved, for example, with the fsolve function from scipy.optimize, resulting in \[\boxed{ x_1 = 0.81511541, \qquad x_2=0.91317946, \qquad \lambda = -0.04391383, \qquad \beta = -0.08563926}\] which satisfies all the constraints, with \(\beta<0\) as required.
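Here is a small sketch of how one might set this up with scipy.optimize.fsolve; the initial guess is my own choice, taken close to the non-binding solution found above:

import numpy as np
from scipy.optimize import fsolve

def kkt_system(unknowns):
    x1, x2, lam, beta = unknowns
    return [
        -x2 - 20 * lam - 0.5 * beta * x1,   # dL/dx1 = 0
        -x1 - 15 * lam - 2.0 * beta * x2,   # dL/dx2 = 0
        20 * x1 + 15 * x2 - 30,             # h = 0
        0.25 * x1 ** 2 + x2 ** 2 - 1,       # g = 0 (binding)
    ]

# Start near the non-binding solution (0.75, 1, -0.05) with a small negative beta
sol = fsolve(kkt_system, [0.75, 1.0, -0.05, -0.1])
print(sol)   # approximately [0.8151, 0.9132, -0.0439, -0.0856]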

Closing remarks

  • A very similar procedure applies in the general case with several equality and inequality constraints, under the name of the Karush–Kuhn–Tucker (KKT) conditions.
  • One can also take the design-variable bounds into the Lagrangian; for this, check out, for example, the interior-point (or barrier) method.
  • For numerical purposes, rather than handling all the different binding and non-binding cases separately, one would use, for example, the Exterior Penalty Function method or the Augmented Lagrange Multiplier method, which adds a penalty term to the main minimization problem and enforces the constraints, like: \[ \alpha_h \Big[\sum_k h_k(\mathbf{x})^2\Big]+\alpha_g\Big[\big(\max(0,g(\mathbf{x}))\big)^2\Big] \] for some positive constants \(\alpha_h\) and \(\alpha_g\), which penalizes non-zero values of the equality constraints \(h_k\) (indexed by \(k\)) and positive values of \(g\). Then, an unconstrained optimization method (such as GD) can be used on the penalized objective; this is, for example, how the minimize function in scipy.optimize is used here (see the sketch after this list). In the jupyter notebook you can see that the result is the same as the "analytical" one.
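As an illustration of the exterior penalty idea, here is a short sketch using scipy.optimize.minimize; the penalty weights \(\alpha_h=\alpha_g=10^3\) and the starting point are my own assumptions, not necessarily the values used in the notebook:

import numpy as np
from scipy.optimize import minimize

ALPHA_H = 1e3   # assumed penalty weight for the equality constraint
ALPHA_G = 1e3   # assumed penalty weight for the inequality constraint

def penalized_objective(x):
    x1, x2 = x
    f = -x1 * x2
    h = 20 * x1 + 15 * x2 - 30
    g = (x1 / 2) ** 2 + x2 ** 2 - 1
    # Penalize non-zero h and positive g
    return f + ALPHA_H * h ** 2 + ALPHA_G * max(0.0, g) ** 2

# The bound constraints 0 <= x1, x2 <= 3 are handled directly by the optimizer
res = minimize(penalized_objective, x0=[1.0, 1.0], bounds=[(0, 3), (0, 3)])
print(res.x)   # close to [0.815, 0.913], in agreement with the binding-case solution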
