Optimization is a frequently encountered problem in real life. We need to make a decision to achieve something within a set of constraints, and we need to maximize or minimize our objective based on some measurement. For example, a restaurant may need to decide how many workers (of each position) to hire to serve the customers, with the constraint that workers cannot work overtime, and the objective is to minimize the cost. A car manufacturer may need to decide how many units of each model to be produced, within the constraint of the storage capacity of its warehouse, and maximize the profit of its sales.
If there aren’t a lot of choices, an exhaustive search (e.g. breath-first, depth-first, best-first, A* ... etc.) can be employed to evaluate each option, see if it meets all the constraints (i.e., a feasible solution), and then compute its objective value. Then we sort all feasible solutions based on their corresponding objective value and pick as our decision the solution that has the max (or min) objective value. Unfortunately, real world problems usually involve a large number (exponentially large due to combinatorial explosion) of choices, making the exhaustive search in many cases impractical.
When this happens, two other solution approaches are commonly used:
1) Mathematical Programming
2) Local Greedy Search
Mathematical programming is a classical way to solve an optimization problem. It is a family of approaches including linear programming, integer programming, quadratic programming, and even non-linear programming. The development process usually goes through the following steps.
From a problem description, the modeler will express the problem into a mathematical structure containing three parts:
- Variables: There are two types of variables.
- Data variables contain the current value of your business environment (e.g. cost of hiring a waiter, price of a car).
- Decision variables hold the decision you make to optimize your objective (e.g. how many staff to hire, how many cars to make).
- Constraints: A set of rules that you cannot break. Effectively, constraints disallow certain combinations of your decision variables and are mainly used to filter out infeasible solutions. In typical settings, constraints are expressed as a system of linear inequality equations where a linear expression of decision variables are specified on the left-hand side and a value is specified on the right-hand side after an inequality comparison.
- Objective function: Encapsulates the quantitative measure of how our goal has been achieved. In a typical setting, the objective function is expressed as a single linear (or quadratic) combination of decision variables
After the mathematical structure is defined, the modeler will submit it to a solver, which will output the best solution. The process can be depicted as follows:
Expressing the problem in the mathematical structure is the key design of the solution. There are many elements to be considered, which I describe below.
The first consideration is how to represent your decision, especially whether the decision is a quantity (real number), a number of units (integer) or a binary decision (integer 0, 1).
The next step is to represent the constraints in terms of inequality equations of linear combination of decision variables. You need to think whether the constraints are hard or soft constraints. Hard constraints should be expressed in the constraint part of the problem. Notice the solver will not consider any solution once it violates any constraints. Therefore, if the solver cannot find a feasible solution that fulfills all constraints, it will simply abort. In other words, it won't return you a solution that violates the smallest number of constraints. If you want the solver to tell you that because you have rooms to relax them, you should model these soft constraints in the objective function rather than in the constraint section. Typically, you define an objective function that quantifies the degree of violation. The solver will then give you the optimal solution (violating least number of constraints) rather than just telling you no solution is found.
Finally, you define the objective function. In most real-life problems, you have multiple goals in mind (e.g. you want to minimize your customers' wait time in the queue, but you also want to minimize your cost of hiring staff). First, you express each goal as a linear expression of decision variables and then take the weighted average among different goals (which is also a linear combination of all decision variables) to form the final objective function. How to choose the weights among different goals is a business question, based on how much you are willing to trade off between conflicting goals.
There are some objectives that cannot be expressed by a linear expression of decision variables. One frequently encountered example is about minimizing the absolute deviation from a target value (i.e. no matter the deviation is a positive and negative value). A common way is to minimize the sum of the square of the difference. But after we square it, it is no longer a linear expression. To address this requirement, there is a more powerful class of solver called "quadratic programming" which relaxes the objective function to allow a 2-degree polynomial expression.
After you express the problem in the mathematical structure, you can pass it to a solver (there are many open source and commercial solvers available), which you can treat it as a magical black box. The solver will output an optimal solution (with a value assigned to each decision variable) that fulfills all constraints and maximizes (or minimizes) the objective function.
Once you receive the solution from the solver, you need to evaluate how "reliable" this optimal solution is. There may be fluctuations in the data variables due to collection errors, the data may have a volatile value that changes rapidly, or the data is an estimation of another unknown quantity.
Ideally, the optimal solution doesn't change much when we fluctuate the data values within its error bound. In this case, our optimal solution is stable against errors in our data variables and therefore is a good one. However, if the optimal solution changes drastically when the data variable fluctuates, we say the optimal solution is unreliable and cannot use it. In this case, we usually modify each data variable one at a time to figure out which data variable is causing a big swing of optimal solutions and try to reduce our estimation error of that data variable.
The sensitivity analysis is an important step to evaluate the stability and hence the quality of our optimal solution. It also provides guidance on which area we need to invest effort to make the estimation more accurate.
Mathematical Programming allows you to specify your optimization problem in a very declarative manner and output an optimal solution if it exists. It should be the first-to-go solution. The downside of Mathematical Programming is that it requires linear constraints and linear (or quadratic) objectives. Additionally, it has limits in terms of the number of decision variables and constraints that it can store (and this limitation varies among different implementations). Although there are non-linear solvers, the number of variables it can take is even smaller.
Usually the first thing to do is to test out if your problem is small enough to fit in the mathematically programming solver. If not, you may need to use another approach.
Greedy Local Search
The key words here are "local" and "greedy." A greedy local search starts at a particular selection of decision variables, and then it looks around the surrounding neighborhood (hence the term "local") and within that neighborhood finds the best solution (hence the term "greedy"). Then it moves the current point to this best neighbor and then the process repeats until the new solution stays at the same place (i.e. there is no better neighbor found).
If the initial point is selected to be a non-feasible solution, the greedy search will first try to find a feasible solution by looking for neighbors that has fewer constraint violations. After the feasible solution is found, the greedy search will only look for neighbors that fulfill all constraints and within which finds a neighbor with the best objective value. Another good initialization strategy is to choose the initial point to be a feasible (of course not optimal) solution and then start the local search from there.
Because local search limits its search within a neighborhood, it can control the degree of complexity by simply limiting the scope of neighbors. Greedy local search can evaluate a large number of variables by walking across a chain of neighborhoods. However, because a local search only navigates towards the best neighbor within a limited scope, it loses the overall picture and relies on the path to be a convex shape. If there are valleys along the path, the local search will stop there and never reach the global optimum. This is called a local optimal trap because the search is trapped within a local maximum and not able to escape. There are some techniques to escape from local optimum that I describe in my previous blog post.
When performing a greedy local search, it is important to pick the right greedy function (also called the heuristic function) as well as the right neighborhood. Although it is common to choose the greedy function to be the same objective function itself, they don't have to be the same. In many good practices, the greedy function is chosen to be the one that has a high correlation with the objective function, but can be computed much faster. On the other hand, evaluating objective functions of every point in the neighborhood can also be a very expensive operation, especially when the data has a high dimension and the number of neighbors can be exponentially large. A good neighbor function can be combined with a good greedy function such that we don't need to evaluate each neighbor to figure out which one has the best objective value.
Combining the Two Approaches
In my experience, combining the two approaches can be a very powerful solution itself. At the outer layer, we are using the greedy local search to navigate the direction towards a better solution. However, when we search for the best neighbor within the neighborhood, we use Mathematical Programming by taking the greedy function as the objective function, and we use the constraints directly. This way, we can use a very effective approach to search for the best solution within the neighborhood and can use the flexible neighborhood scoping of local search to control the complexity of the mathematical programming solver.