What I Learnt From AUT Optimization Course

I took the optimization course this semester (Fall 2018), which was offered to master's students under the supervision of Dr. Maryam Amirmazlaghani at AUT. Since many of my friends and other students have asked me how the course was taught, what its main topics are, and how they relate to machine learning, I decided to write this article to summarize what I learnt and to clarify what you can expect to learn by taking it. Feel free to contribute to this article, and I hope you'll find it useful!

Sources

First, I mention the source materials that were used in this course.

The source books

Slides

Also, the slides used to teach in the classroom can be found here (available for AUT students only!).

Introduction

The course started with an introduction to optimization problems, then moved on to formally defining an optimization problem and the algorithms used to solve it, and to categorizing these problems into sub-classes.

What is optimization?

Optimization is portrayed in terms of:

  • Defining an objective (a quantitative measure of performance)
  • Finding the variables the objective depends on
  • Finding the values of the variables that optimize the objective

Mathematically, we model our problem as an optimization problem, and then use an optimization algorithm to solve it.

What are optimization problems?

An optimization problem has the form below, in which \(x\) is the vector of optimization variables, \(f\) is the objective function, and the \(g_i\) s and \(h_j\) s are the constraint functions (scalar functions of \(x\) ) that define the inequalities and equalities that \(x\) must satisfy.

$$\begin{aligned}
\underset{x}{\text{minimize}} \quad & f(x) \\
\text{subject to} \quad & g_i(x) \le 0, \quad i=1,\cdots,m \\
& h_j(x) = 0, \quad j=1,\cdots,p
\end{aligned}$$

The set of \(x\) 's that satisfy all of the constraints and lie in the domain of \(f\) is called the feasible set of the problem. If there are no constraints, the problem is called an unconstrained optimization problem; otherwise, it's a constrained (or general) optimization problem.
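
To make the standard form concrete, here is a minimal sketch (my own toy example, not taken from the course material) that poses a tiny problem with one inequality and one equality constraint and hands it to `scipy.optimize.minimize`:

```python
# Toy problem in the standard form above (made-up example):
#   minimize    f(x)  = (x1 - 1)^2 + (x2 - 2)^2
#   subject to  g1(x) = x1 + x2 - 2 <= 0
#               h1(x) = x1 - x2      = 0
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

constraints = [
    # SciPy's "ineq" convention is fun(x) >= 0, so we pass -g1(x)
    {"type": "ineq", "fun": lambda x: -(x[0] + x[1] - 2.0)},
    {"type": "eq",   "fun": lambda x: x[0] - x[1]},
]

result = minimize(f, x0=np.zeros(2), constraints=constraints)
print(result.x)  # a feasible point that (locally) minimizes f; here roughly [1, 1]
```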

How do optimization algorithms work?

Optimization algorithms are typically iterative: they begin with an initial guess for \(x\) and generate a sequence of improved estimates until they terminate (i.e., satisfy a certain criterion). The strategy used to improve the estimate is the key difference between algorithms.
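
For a feel of what such an iterative scheme looks like, here is a minimal sketch of my own (with plain gradient descent standing in for the "improvement strategy"):

```python
# Generic iterative scheme: start from an initial guess and keep improving it
# until a termination criterion is met. Here the strategy is plain gradient
# descent with a fixed step size; other algorithms differ mainly in how they
# produce the next estimate.
import numpy as np

def iterative_minimize(grad, x0, step=0.1, tol=1e-6, max_iters=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # termination criterion
            break
        x = x - step * g              # improved estimate
    return x

# Example: minimize f(x) = ||x - [3, -1]||^2, whose gradient is 2(x - [3, -1])
x_star = iterative_minimize(lambda x: 2 * (x - np.array([3.0, -1.0])), x0=[0.0, 0.0])
print(x_star)  # approximately [3, -1]
```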

Where is optimization used?

As you might've guessed already, optimization is used in a vast variety of fields to solve many kinds of problems, from a company's employee-allocation problems, to choosing the best investment portfolio for a given budget, to some well-known (and, these days at least, hot) topics like finding the optimal parameters for a wide variety of machine learning algorithms, such as:

  • Different Regression Algorithms
  • Linear and Kernel SVMs
  • K-Means Clustering
  • Neural Network Training
  • and so many more!

As they say nowadays:

Optimization is the heart of machine learning.

Solving Optimization Problems

Solving optimization problems in their most general form is a really tough challenge! There are currently (and probably always :D) no efficient algorithms for solving the general optimization problem. Instead, we can efficiently solve certain specific classes of problems, namely linear programs, least squares problems and convex problems. General problems are usually tackled by formulating and solving convex subproblems.
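
As a tiny illustration of one of those efficiently solvable classes, a least squares problem \(\min_x \lVert Ax - b \rVert_2^2\) can be solved directly (the data below is made up purely for illustration):

```python
# Least squares is one of the problem classes that can be solved efficiently:
# minimize ||Ax - b||_2^2 reduces to the normal equations A^T A x = A^T b.
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # toy design matrix
b = np.array([1.0, 2.0, 2.5])                       # toy targets

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)        # numerically stable LS solve
print(x_ls)
```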

Defining Convex Problems

The course then goes through defining convex sets, convex functions and some related linear algebra, along with methods for checking whether a set or function is convex, including some problem equivalences that help establish convexity. It then uses these concepts to define and verify convex problems, and studies some of their properties that help us build efficient algorithms for solving them.
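
One convexity check such material typically relies on: a twice-differentiable function is convex on a convex domain exactly when its Hessian is positive semidefinite everywhere. A small sketch for a quadratic, whose Hessian is constant (the matrix below is a made-up example):

```python
# For f(x) = 0.5 x^T Q x + c^T x, the Hessian is simply Q, so convexity reduces
# to checking that Q is positive semidefinite.
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])

eigenvalues = np.linalg.eigvalsh(Q)          # eigenvalues of the (symmetric) Hessian
is_convex = np.all(eigenvalues >= -1e-12)    # PSD up to numerical tolerance => convex
print(eigenvalues, is_convex)
```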

Multi Objective Optimization Problems

Another type of convex problem discussed is the multi-objective optimization problem, which introduces the idea of vector-valued objective functions (i.e., optimizing several objectives at the same time). This leads to the notions of Pareto-optimal points and scalarization for these kinds of problems.
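
A minimal sketch of the scalarization idea (the standard weighted-sum form, written from memory rather than copied from the slides): fix positive weights, minimize the weighted combination of the objectives, and each choice of weights yields a Pareto-optimal point.

```python
# Weighted-sum scalarization of two toy objectives f1 and f2: sweeping the
# weight lam traces out different Pareto-optimal points.
import numpy as np
from scipy.optimize import minimize

f1 = lambda x: (x[0] - 1.0) ** 2          # toy objective 1
f2 = lambda x: (x[0] + 1.0) ** 2          # toy objective 2

for lam in (0.1, 0.5, 0.9):
    scalarized = lambda x, lam=lam: lam * f1(x) + (1 - lam) * f2(x)
    res = minimize(scalarized, x0=[0.0])
    print(lam, res.x)                     # each weight gives one Pareto-optimal point
```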

Unconstrained Optimization

Then the course moves on to introducing algorithms for solving unconstrained problems. The algorithms discussed are divided into two types:

Line Search Methods

These algorithms include two main steps in each iteration:

  1. Choosing the direction in which to take the next step towards the optimal point; several approaches for choosing this direction are covered.

  2. Calculating the step length to take along the chosen direction; again, several approaches are covered (a sketch of a typical iteration follows this list).
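
Here is a small sketch of one line-search iteration (my own illustration; the function and parameter names are made up): the direction is the negative gradient and the step length is found by backtracking until the Armijo sufficient-decrease condition holds.

```python
import numpy as np

def backtracking_step(f, grad, x, alpha0=1.0, rho=0.5, c=1e-4):
    g = grad(x)
    direction = -g                      # step 1: choose a descent direction
    alpha = alpha0                      # step 2: choose a step length by backtracking
    while f(x + alpha * direction) > f(x) + c * alpha * g @ direction:
        alpha *= rho                    # shrink until sufficient decrease holds
    return x + alpha * direction

f = lambda x: x @ x                     # toy objective ||x||^2
grad = lambda x: 2 * x
x = np.array([2.0, -3.0])
for _ in range(20):
    x = backtracking_step(f, grad, x)
print(x)                                # approaches the minimizer at the origin
```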

Trust Region Methods

These algorithms define a region around the current point and find the direction and the step length simultaneously, moving to a new point within the defined region; the course then discusses several specific algorithms in this family.
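
As a simplified stand-in for the methods covered in class (my own sketch, using the Cauchy point as the approximate subproblem solution; all names and values are illustrative):

```python
# Minimal trust-region iteration: minimize a quadratic model of f within a
# region of the given radius, then grow or shrink the region depending on how
# well the model's predicted decrease matches the actual decrease.
import numpy as np

def cauchy_point(g, B, radius):
    gBg = g @ B @ g
    tau = 1.0 if gBg <= 0 else min(1.0, np.linalg.norm(g) ** 3 / (radius * gBg))
    return -tau * radius * g / np.linalg.norm(g)

def trust_region_minimize(f, grad, hess, x0, radius=1.0, eta=0.1, max_iters=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < 1e-8:
            break
        p = cauchy_point(g, B, radius)                 # step within the region
        predicted = -(g @ p + 0.5 * p @ B @ p)         # model decrease
        actual = f(x) - f(x + p)                       # true decrease
        ratio = actual / predicted if predicted > 0 else 0.0
        if ratio < 0.25:
            radius *= 0.25                             # shrink the region
        elif ratio > 0.75 and np.isclose(np.linalg.norm(p), radius):
            radius *= 2.0                              # expand the region
        if ratio > eta:
            x = x + p                                  # accept the step
    return x

f = lambda x: (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2   # toy quadratic
grad = lambda x: np.array([2 * (x[0] - 1), 20 * (x[1] + 2)])
hess = lambda x: np.diag([2.0, 20.0])
print(trust_region_minimize(f, grad, hess, x0=[0.0, 0.0]))
```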

Implementation of these algorithms:

I’ve implemented these algorithms and demonstrated their example usage in this repository as homework for this course: Unconstrained Optimization on GitHub

Unconstrained Optimization Applications

Then two main applications of unconstrained optimization are described:

Approximation And Fitting

These applications include several topics (a small sketch of the regularized case follows the list):

  • Norm Approximation
  • Least Norm
  • Least Penalty
  • Regularized Least Penalty (Multi Objective)
  • Optimal Input Design
  • Robust Approximation
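
As a small illustration of the regularized (multi-objective) case, a ridge-style regularized norm approximation trades off the residual norm against the solution norm; the data and weight below are made up:

```python
# minimize ||Ax - b||_2^2 + lam * ||x||_2^2  has the closed-form solution
#   x = (A^T A + lam * I)^{-1} A^T b
import numpy as np

A = np.random.default_rng(0).normal(size=(20, 5))   # toy data
b = np.random.default_rng(1).normal(size=20)
lam = 0.5                                           # regularization weight

x_reg = np.linalg.solve(A.T @ A + lam * np.eye(5), A.T @ b)
print(x_reg)
```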

Statistical Estimation

These problems include estimating the parameters of a known distribution and solving the well-known MLE and MAP problems (using the methods described above), which are used extensively in machine learning.
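
A small MLE sketch (my own example with synthetic data): estimate the mean and standard deviation of a Gaussian by minimizing the negative log-likelihood with an unconstrained solver.

```python
import numpy as np
from scipy.optimize import minimize

data = np.random.default_rng(42).normal(loc=3.0, scale=2.0, size=500)

def negative_log_likelihood(theta):
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)                    # keeps sigma > 0 without constraints
    return np.sum(0.5 * ((data - mu) / sigma) ** 2 + log_sigma)

result = minimize(negative_log_likelihood, x0=[0.0, 0.0])
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)                         # should be close to 3.0 and 2.0
```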

Constrained Optimization

The course then moves on to algorithms for solving constrained optimization problems. Since duality is used in these algorithms, and is also used elsewhere for solving such problems (as we'll see shortly), the duality theorem and some of its applications in optimization are discussed first, and then applied to solving constrained optimization problems.

Duality Theorem

According to the duality theorem, we can form a dual problem for every optimization problem, whose optimal value \(d^*\) gives a lower bound on \(p^*\) (the optimal value of the original problem). There are two types of duality:

  • Weak Duality: \(p^* \ge d^*\)
  • Strong Duality: \(p^* = d^*\)

We call \(p^* - d^*\) the duality gap between the original and dual problems. Thus, the dual problem can be used to find (exact or inexact) lower bounds on the optimal value of the original problem. Conditions such as Slater's condition and the KKT conditions are then used to check for strong duality.
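
As a small worked illustration (standard material, written here in the notation of the general problem above rather than copied from the slides), the Lagrangian and the dual function make the lower bound explicit:

$$L(x, \lambda, \nu) = f(x) + \sum_{i=1}^{m} \lambda_i\, g_i(x) + \sum_{j=1}^{p} \nu_j\, h_j(x), \qquad \lambda_i \ge 0$$

$$d(\lambda, \nu) = \inf_{x} L(x, \lambda, \nu) \le p^*, \qquad d^* = \sup_{\lambda \ge 0,\, \nu} d(\lambda, \nu) \le p^*$$

For any feasible \(x\) and any \(\lambda \ge 0\), every term \(\lambda_i g_i(x)\) is non-positive and every \(h_j(x)\) is zero, so \(L(x, \lambda, \nu) \le f(x)\); taking the infimum over \(x\) is exactly what produces the weak-duality lower bound.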

SVM, for example, is an algorithm that can be solved via its dual problem.

Solving Equality Constrained Problems

Finally, the course moves on to solving constrained problems. First we consider equality constrained problems, in which every constraint is an equality constraint and there is at least one constraint. The methods for solving these problems include:

  • Eliminating Equality Constraints
  • Solving Dual Problem of the original problem
  • Newton’s method
    • This is a modification of the original Newton algorithm which takes the equality constraints into account and uses duality to find a step that both moves towards the optimal point and keeps the iterate feasible with respect to the equality constraints (a sketch of a single step follows this list).
    • There is also another modification that allows the method to start from an infeasible point.
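
A sketch of one Newton step for an equality constrained problem (my own toy example: the quadratic objective, constraint matrix and starting point below are made up). For \(\min f(x)\) subject to \(Ax = b\), starting from a feasible point, the step \(\Delta x\) and dual variable \(w\) solve the KKT system \(\begin{bmatrix} H & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} \Delta x \\ w \end{bmatrix} = \begin{bmatrix} -\nabla f(x) \\ 0 \end{bmatrix}\).

```python
import numpy as np

H = np.array([[4.0, 1.0], [1.0, 2.0]])          # Hessian of the toy objective f(x) = 0.5 x^T H x
grad = lambda x: H @ x                          # its gradient

A = np.array([[1.0, 1.0]])                      # equality constraint: x1 + x2 = 1
b = np.array([1.0])
x = np.array([0.5, 0.5])                        # feasible starting point (A x = b)

n, p = H.shape[0], A.shape[0]
kkt = np.block([[H, A.T], [A, np.zeros((p, p))]])
rhs = np.concatenate([-grad(x), np.zeros(p)])
sol = np.linalg.solve(kkt, rhs)
dx, w = sol[:n], sol[n:]

x_next = x + dx                                 # stays feasible since A dx = 0
print(x_next, A @ x_next)                       # the constrained minimizer (f is quadratic)
```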

Solving General Constrained Problems

As the last topic, we move on to solving the general constrained optimization problem. The approach taken here is to approximately reformulate the inequality constrained problem as an equality constrained one. We use a logarithmic barrier to make this approximation, which gives us a central path parameterized by the barrier parameter (called \(t\)). Two algorithms are then used to compute the optimal value using the central-path idea:

  • Log Barrier Method
    • The idea behind this method is to choose a small \(t\) and a strictly feasible starting point, then repeatedly solve the centering problem and increase \(t\) until the desired tolerance is reached (a minimal sketch follows this list).
  • Primal Dual Interior Point Method
    • The idea behind this algorithm is to use a modified version of the KKT conditions to compute a primal-dual search direction at each step, combine it with a line search to take a step in each iteration, and continue until the primal and dual residuals reach the desired tolerance.
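
A minimal log-barrier sketch (not the course's exact implementation; the objective, constraint, and parameter values are made up, and the inner centering solves simply reuse SciPy):

```python
# The inequality constrained problem  minimize f(x) s.t. g_i(x) <= 0  is replaced
# by a sequence of unconstrained problems  minimize t*f(x) - sum_i log(-g_i(x)),
# and t is increased until m/t (the duality-gap bound) drops below a tolerance.
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 2.0) ** 2                      # toy objective
gs = [lambda x: x[0] - 1.0]                          # toy constraint: x <= 1

def barrier_objective(x, t):
    slack = [-g(x) for g in gs]
    if min(slack) <= 0:                              # outside the barrier's domain
        return np.inf
    return t * f(x) - sum(np.log(s) for s in slack)

x, t, mu, tol = np.array([0.0]), 1.0, 10.0, 1e-6     # strictly feasible start
while len(gs) / t > tol:                             # stop when the gap bound is small
    x = minimize(lambda z: barrier_objective(z, t), x, method="Nelder-Mead").x
    t *= mu                                          # increase t for the next centering step
print(x)                                             # approaches the constrained optimum x = 1
```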

I’ve implemented these two algorithms for a quadratic program (QP) and demonstrated their example usage in this repository as homework for this course: Constrained Optimization for QP on GitHub