## Bipartite Matching Assignment Problem Matrix

Here’s a problem: Your business assigns contractors to fulfill contracts. You look through your rosters and decide which contractors are available for a one month engagement and you look through your available contracts to see which of them are for one month long tasks. Given that you know how effectively each contractor can fulfill each contract, how do you assign contractors to maximize the overall effectiveness for that month?

This is an example of the assignment problem, and the problem can be solved with the classical Hungarian algorithm.

The Hungarian algorithm (also known as the Kuhn-Munkres algorithm) is a polynomial time algorithm that maximizes the weight matching in a weighted bipartite graph. Here, the contractors and the contracts can be modeled as a bipartite graph, with their effectiveness as the weights of the edges between the contractor and the contract nodes.

In this article, you will learn about an implementation of the Hungarian algorithm that uses the Edmonds-Karp algorithm to solve the linear assignment problem. You will also learn how the Edmonds-Karp algorithm is a slight modification of the Ford-Fulkerson method and how this modification is important.

## The Maximum Flow Problem

The maximum flow problem itself can be described informally as the problem of moving some fluid or gas through a network of pipes from a single source to a single terminal. This is done with an assumption that the pressure in the network is sufficient to ensure that the fluid or gas cannot linger in any length of pipe or pipe fitting (those places where different lengths of pipe meet).

By making certain changes to the graph, the assignment problem can be turned into a maximum flow problem.

## Preliminaries

The ideas needed to solve these problems arise in many mathematical and engineering disciplines, often similar concepts are known by different names and expressed in different ways (e.g., adjacency matrices and adjacency lists). Since these ideas are quite esoteric, choices are made regarding how generally these concepts will be defined for any given setting.

This article will not assume any prior knowledge beyond a little introductory set theory.

The implementations in this post represent the problems as directed graphs (digraph).

### DiGraphs

A **digraph** has two attributes, **setOfNodes** and **setOfArcs**. Both of these attributes are sets (unordered collections). In the code blocks on this post, I’m actually using Python’s frozenset, but that detail isn’t particularly important.

(Note: This, and all other code in this article, are written in Python 3.6.)

### Nodes

A **node** is composed of two attributes:

- : This represents any data object.

### Arcs

An **arc** is composed of three attributes:

: This is a

**node**, as defined above.: This is a

**node**, as defined above.: This represents any data object.

The set of **arcs** in the **digraph** represents a binary relation on the **nodes** in the **digraph**. The existence of **arc** implies that a relationship exists between and .

In a directed graph (as opposed to an undirected graph), the existence of a relationship between and does **not** imply that a similar relationship between and exists.

This is because in an undirected graph, the relationship being expressed is not necessarily symmetric.

### DiGraphs

**Nodes** and **arcs** can be used to define a **digraph**, but for convenience, in the algorithms below, a **digraph** will be represented using as a dictionary.

Here’s a method that can convert the graph representation above into a dictionary representation similar to this one:

And here’s another one that converts it into a dictionary of dictionaries, another operation that will be useful:

When the article talks about a **digraph** as represented by a dictionary, it will mention to refer to it.

Sometimes it’s helpful to fetch a **node** from a **digraph** by it up through its (unique identifier):

In defining graphs, people sometimes use the terms **node** and vertex to refer to the same concept; the same is true of the terms **arc** and edge.

Two popular graph representations in Python are this one which uses dictionaries and another which uses objects to represent graphs. The representation in this post is somewhere in between these two commonly used representations.

This is my **digraph** representation. There are many like it, but this one is mine.

### Walks and Paths

Let be a finite sequence (ordered collection) of **arcs** in a **digraph** such that if is any **arc** in except for the last, and follows in the sequence, then there must be a **node** in such that .

Starting from the first **arc** in , and continuing until the last **arc** in , collect (in the order encountered) all **nodes** as defined above between each two consecutive **arcs** in . Label the ordered collection of **nodes** collected during this operation .

If any

**node**appears more than once in the sequence then call a Walk on**digraph**.Otherwise, call a path from to on

**digraph**.

### Source Node

Call **node** a **source node** in **digraph** if is in and contains no **arc** such that .

### Terminal Node

Call **node** a **terminal node** in **digraph** if is in and contains no **arc** such that .

### Cuts and s-t Cuts

A cut of a connected**digraph** is a subset of **arcs** from which partitions the set of **nodes** in . is connected if every **node** in and has at least one **arc** in such that either or , but .

The definition above refers to a subset of **arcs**, but it can also define a partition of the **nodes** of .

For the functions and , is a **node** in set **G.setOfNodes** of **digraph**, and is a **cut** of :

Let be a **cut** of **digraph**.

is a **cut** of **digraph** if: is called an **x-y cut** if . When the **node** in a **x-y cut** is a **source node** and **node** in the **x-y cut** is a **terminal node**, then this **cut** is called a **s-t cut**.

### Flow Networks

You can use a **digraph** to represent a flow network.

Assign each **node**, where is in an that is a :

Assign each **arc**, where is in and that is a .

and are positivereal numbers.

and are also positive real numbers.

For every node **node** in :

**Digraph** now represents a **flow network**.

The **flow** of refers to the for all **arcs** in .

### Feasible Flows

Let **digraph** represent a **flow network**.

The **flow network** represented by has **feasible flows** if:

For every

**node**in except for**source nodes**and**terminal nodes**: .For every

**arc**in : .

Condition 1 is called a conservation constraint.

Condition 2 is called a capacity constraint.

## Cut Capacity

The **cut capacity** of an **s-t cut** with **source node** and **terminal node** of a **flow network** represented by a **digraph** is:

## Minimum Capacity Cut

Let be an **s-t cut** of a **flow network** represented by a **digraph**.

is the **minimum capacity cut** of the **flow network** represented by if there is no other **s-t cut** in this **flow network** such that:

## Stripping the Flows Away

I would like to refer to the **digraph** that would be the result of taking a **digraph** and stripping away all the flow data from all the **nodes** in and also all the **arcs** in .

## Maximum Flow Problem

A **flow network** represented as a **digraph**, a **source node** in and a **terminal node** in , can represent a **maximum flow problem** if:

Label this representation:

Where , , and is an identifier for the problem instance.

## Maximum Flow Solution

Let represent a **maximum flow problem**. The solution to can be represented by a **flow network** represented as a **digraph**.

**Digraph** is a **feasible** solution to the **maximum flow problem** on input if:

.

is a

**flow network**and has**feasible flows**.

If in addition to 1 and 2:

- There can be no other
**flow network**represented by**digraph**such that and .

Then is also an **optimal** solution to .

In other words a **feasible maximum flow solution** can be represented by a **digraph**, which:

Is identical to

**digraph**of the corresponding**maximum flow problem**with the exception that the , and the of any of the**nodes**and**arcs**may be different.Represents a

**flow network**that has**feasible flows**.

And, it can represent an **optimal maximum flow solution** if additionally:

- The for the
**node**corresponding to the**terminal node**in the**maximum flow problem**is as large as possible (when conditions 1 and 2 are still satisfied).

If **digraph** represents a **feasible maximum flow solution** : this follows from the **max flow, min cut theorem** (discussed below). Informally, since is assumed to have **feasible flows** this means that **flow** can neither be ‘created’ (except at **source node**) nor ‘destroyed’ (except at **terminal node**) while crossing any (other) **node** (**conservation constraints**).

Since a **maximum flow problem** contains only a single **source node** and a single **terminal node**, all flow ‘created’ at must be ‘destroyed’ at or the **flow network** does **not** have **feasible flows** (the **conservation constraint** would have been violated).

Let **digraph** represent a **feasible maximum flow solution**; the value above is called the **s-t Flow Value** of .

Let:

This means that is a **successor state** of , which just means that is exacly like with the exception that the values of for arcs in may be different than for arcs in .

Here’s a visualization of a along with its associated . Each **arc** in the image has a label, these labels are , each **node** in the image has a label, and these labels are .

## s-t Cut Flow

Let represent a and let represent a **cut** of . The **cut flow** of is defined:

**s-t cut flow** is the sum of flows from the partition containing the **source node** to the partition containing the **terminal node** minus the sum of flows from the partition containing the **terminal node** to the partition containing the **source node**.

## Max Flow, Min Cut

Let represent a **maximum flow problem** and let the solution to be represented by a **flow network** represented as **digraph**.

Let be the **minimum capacity cut** of the **flow network** represented by .

Because in the **maximum flow problem** flow originates in only a single **source node** and terminates at a single **terminal node** and, because of the **capacity constraints** and the **conservation constraints**, we know that the all of the flow entering must cross any **s-t cut**, in particular it must cross . This means:

## Solving the Maximum Flow Problem

The basic idea for solving a **maximum flow problem** is to start with a **maximum flow solution** represented by **digraph**. Such a starting point can be . The task is then to use and by some greedy modification of the values of some **arcs** in to produce another **maximum flow solution** represented by **digraph** such that cannot still represent a **flow network** with **feasible flows** and . As long as this process continues, the quality () of the most recently encountered **maximum flow solution** () is better than any other **maximum flow solution** that has been found. If the process reaches a point that it knows that no other improvement is possible, the process can terminate and it will return the optimal**maximum flow solution**.

The description above is general and skips many proofs such as whether such a process is possible or how long it may take, I’ll give a few more details and the algorithm.

## The Max Flow, Min Cut Theorem

From the book Flows in Networks by Ford and Fulkerson, the statement of the **max flow, min cut theorem** (Theorem 5.1) is:

For any network, the maximal flow value from to is equal to the minimum cut capacity of all cuts separating and .

Using the definitions in this post, that translates to:

The solution to a represented by a **flow network** represented as **digraph** is optimal if:

I like this proof of the theorem and Wikipedia has another one.

The **max flow, min cut theorem** is used to prove the correctness and completeness of the **Ford-Fulkerson method**.

I’ll also give a proof of the theorem in the section after **augmenting paths**.

## The Ford-Fulkerson Method and the Edmonds-Karp Algorithm

CLRS defines the Ford-Fulkerson method like so (section 26.2):

## Residual Graph

The Residual Graph of a **flow network** represented as the **digraph** can be represented as a **digraph**:

returns the sum of for all

**arcs**in the subset of where**arc**is in the subset if and .returns the sum of for all

**arcs**in the subset of where**arc**is in the subset if and .

Briefly, the **residual graph** represents certain actions which can be performed on the **digraph**.

Each pair of **nodes** in of the **flow network** represented by **digraph** can generate 0, 1, or 2 **arcs** in the **residual graph** of .

The pair does not generate any

**arcs**in if there is no**arc**in such that and .The pair generates the

**arc**in where represents an**arc**labeled a**push flow arc**from to if .The pair generates the

**arc**in where represents an**arc**labeled a**pull flow arc**from to if .

Each

**push flow arc**in represents the action of adding a total of flow to**arcs**in the subset of where**arc**is in the subset if and .Each

**pull flow arc**in represents the action of subtracting a total of flow to**arcs**in the subset of where**arc**is in the subset if and .

Performing an individual **push** or **pull** action from on the applicable **arcs** in might generate a **flow network** without **feasible flows** because the **capacity constraints** or the **conservation constraints** might be violated in the generated **flow network**.

Here’s a visualization of the **residual graph** of the previous example visualization of a **maximum flow solution**, in the visualization each **arc** represents .

## Augmenting Path

Let be a **max flow problem**, and let be the **residual graph** of .

An augmenting path for is any **path** from to .

It turns out that an augmenting **path** can be applied to a **max flow solution** represented by **digraph** generating another **max flow solution** represented by **digraph** where if is not **optimal**.

Here’s how:

In the above, is some tolerance value for rounding the flow values in the network. This is to avoid cascading imprecision of floating point calculations. So, for example, I used to mean round to 10 significant digits.

Let , then represents a **feasible max flow solution** for . For the statement to be true, the **flow network** represented by must have **feasible flows** (not violate the **capacity constraint** or the **conservation constraint**.

Here’s why: In the method above, each **node** added to the new **flow network** represented by **digraph** is either an exact copy of a **node** from **digraph** or a **node** which has had the same number added to its as its . This means that the **conservation constraint** is satisfied in as long as it was satisfied in . The **conservation constraint** is satisfied because we explicitly check that any new **arc** in the network has ; thus, as long as the **arcs** from the set which were copied unmodified into do not violate the **capacity constraint**, then does not violate the **capacity constraint**.

It’s also true that if is not **optimal**.

Here’s why: For an **augmenting path** to exist in the **digraph** representation of the **residual graph** of a **max flow problem** then the last **arc** on must be a ‘push’ **arc** and it must have . An **augmenting path** is defined as one which terminates at the **terminal node** of the **max flow problem** for which it is an **augmenting path**. From the definition of the **residual graph**, it is clear that the last **arc** in an **augmenting path** on that **residual graph** must be a ‘push’ **arc** because any ‘pull’ **arc** in the **augmenting path** will have and from the definition of **path**. Additionally, from the definition of **path**, it is clear that the **terminal node** is only modified once by the method. Thus modifies exactly once and it increases the value of because the last **arc** in the must be the **arc** which causes the modification in during . From the definition of as it applies to ‘push’ **arcs**, the can only be increased, not decreased.

## Some Proofs from Sedgewick and Wayne

The book Algorithms, fourth edition by Robert Sedgewich and Kevin Wayne has some wonderful and short proofs (pages 892-894) that will be useful. I’ll recreate them here, though I’ll use language fitting in with previous definitions. My labels for the proofs are the same as in the Sedgewick book.

**Proposition E:** For any **digraph** representing a **feasible maximum flow solution** to a **maximum flow problem**, for any .

**Proof:** Let . **Proposition E** holds for directly from the definition of **s-t flow value**. Suppose that there we wish to move some **node** from the s-partition () and into the t-partition , to do so we need to change , which could change and invalidate **proposition E**. However, let’s see how the value of will change as we make this change. **node** is at equilibrium meaning that the sum of flow into **node** is equal to the sum of flow out of it (this is necessary for to represent a **feasible solution**). Notice that all flow which is part of the entering **node** enters it from the s-partition (flow entering **node** from the t-partition either directly or indirectly would not have been counted in the value because it is heading the wrong direction based on the definition). Additionally, all flow exiting will eventually (directly or indirectly) flow into the **terminal node** (proved earlier). When we move **node** into the t-partition, all the flow entering from the s-partition must be added to the new value; however, all flow exiting must the be subtracted from the new value; the part of the flow heading directly into the t-partition is subtracted because this flow is now internal to the new t-partition and is not counted as . The part of the flow from heading into **nodes** in the s-partition must also be subtracted from : After is moved into the t-partition, these flows will be directed from the t-partition and into the s-partition and so must not be accounted for in the , since these flows are removed the inflow into the s-partition and must be reduced by the sum of these flows, and the outflow from the s-partition into the t-partition (where all flows from s-t must end up) must be reduced by the same amount. As **node** was at equilibrium prior to the process, the update will have added the same value to the new value as it subtracted thus leaving **proposition E** true after the update. The validity of **proposition E** then follows from induction on the size of the t-partition.

Here are some example **flow networks** to help visualize the less obvious cases where **proposition E** holds; in the image, the red areas indicate the s-partition, the blue areas represent the t-partition, and the green **arcs** indicate an **s-t cut**. In the second image, flow between **node** A and **node** B increases while the flow into **terminal node** t doesn’t change.:

**Corollary:** No **s-t cut flow** value can exceed the capacity of any **s-t cut**.

**Proposition F. (max flow, min cut theorem):** Let be an **s-t flow**. The following 3 conditions are equivalent:

There exists an

**s-t cut**whose capacity equals the value of the flow .is a

**max flow**.There is no

**augmenting path**with respect to .

Condition 1 implies condition 2 by the corollary. Condition 2 implies condition 3 because the existence of an augmenting path implies the existence of a flow with larger values, contradicting the maximality of . Condition 3 implies condition 1: Let be the set of all **nodes** that can be reached from with an **augmenting path** in the **residual graph**. Let be the remaining **arcs**, then must be in (by our assumption). The **arcs** crossing from to then form an **s-t cut** which contains only **arcs** where either or . If this were otherwise then the **nodes** connected by an **arc** with remaining residual capacity to would be in the set since there would then be an **augmenting path** from to such a **node**. The flow across the **s-t cut** is equal to the **s-t cut’s** capacity (since **arcs** from to have flow equal to capacity) and also to the value of the **s-t flow** (by **proposition E**).

This statement of the **max flow, min cut theorem** implies the earlier statement from Flows in Networks.

**Corollary (integrality property):** When capacities are integers, there exists an integer-valued max flow, and the Ford-Fulkerson algorithm finds it.

Proof: Each **augmenting path** increases the flow by a positive integer, the minimum of the unused capacities in the ‘push’ **arcs** and the flows in the ‘pull’ **arcs**, all of which are always positive integers.

This justifies the **Ford-Fulkerson method** description from **CLRS**. The method is to keep finding **augmenting paths** and applying to the latest coming up with better solutions, until no more **augmenting path** meaning that the latest **maximum flow solution** is **optimal**.

## From Ford-Fulkerson to Edmonds-Karp

The remaining questions regarding solving **maximum flow problems** are:

How should

**augmenting paths**be constructed?Will the method terminate if we use real numbers and not integers?

How long will it take to terminate (if it does)?

The **Edmonds-Karp algorithm** specifies that each **augmenting path** is constructed by a **breadth first search** (BFS) of the **residual graph**; it turns out that this decision of point 1 above will also force the algorithm to terminate (point 2) and allows the asymptotic time and space complexity to be determined.

First, here’s a **BFS** implementation:

I used a deque from the python collections module.

To answer question 2 above, I’ll paraphrase another proof from Sedgewick and Wayne: **Proposition G.** The number of **augmenting paths** needed in the **Edmonds-Karp** algorithm with **nodes** and **arcs** is at most . Proof: Every **augmenting path** has a *bottleneck***arc**- an **arc** that is deleted from the **residual graph** because it corresponds either to a ‘push’ **arc** that becomes filled to capacity or a ‘pull’ **arc** through which the flow becomes 0. Each time an **arc** becomes a bottleneck **arc**, the length of any **augmenting path** through it must increase by a factor of 2. This is because each **node** in a **path** may appear only once or not at all (from the definition of **path**) since the **paths** are being explored from shortest **path** to longest that means that at least one more **node** must be visited by the next path that goes through the particular bottleneck **node** that means an additional 2 **arcs** on the path before we arrive at the **node**. Since the **augmenting path** is of length at most each **arc** can be on at most **augmenting paths**, and the total number of **augmenting paths** is at most .

The **Edmonds-Karp algorithm** executes in . If at most **paths** will be explored during the algorithm and exploring each **path** with **BFS** is then the most significant term of the product and hence the asymptotic complexity is .

Let be a .

The version above is inefficient and has worse complexity than since it constructs a new **maximum flow solution** and new a **residual graph** each time (rather than modifying existing **digraphs** as the algorithm advances). To get to a true solution the algorithm must maintain both the **digraph** representing the **maximum flow problem state** and its associated **residual graph**. So the algorithm must avoid iterating over **arcs** and **nodes** unnecessarily and update their values and associated values in the **residual graph** only as necessary.

To write a faster **Edmonds Karp** algorithm, I rewrote several pieces of code from the above. I hope that going through the code which generated a new **digraph** was helpful in understanding what’s going on. In the fast algorithm, I use some new tricks and Python data structures that I don’t want to go over in detail. I will say that and are now treated as strings and uids to **nodes**. For this code, let be a

Here’s a visualization of how this algorithm solves the example **flow network** from above. The visualization shows the steps as they are reflected in the **digraph** representing the most up-to-date **flow network** and as they are reflected in the **residual graph** of that flow network. **Augmenting paths** in the **residual graph** are shown as red paths, and the **digraph** representing the problem the set of **nodes** and **arcs** affected by a given **augmenting path** is highlighted in green. In each case, I’ll highlight the parts of the graph that will be changed (in red or green) and then show the graph after the changes (just in black).

Here’s another visualization of how this algorithm solving a different example **flow network**. Notice that this one uses real numbers and contains multiple **arcs** with the same and values.

**Also notice that because Arcs with a ‘pull’ ResidualDatum may be part of the Augmenting Path, the nodes affected in the DiGraph of the Flown Network _may not be on a path in .

## Bipartite Graphs

Suppose we have a **digraph**, is bipartite if it’s possible to partition the **nodes** in into two sets ( and ) such that for any **arc** in it **cannot be true** that in and in . It **also cannot be true** that in and in .

In other words is **bipartite** if it can be partitioned into two sets of **nodes** such that every **arc** must connect a **node** in one set to a **node** in the other set.

## Testing Bipartite

Suppose we have a **digraph**, we want to test if it is **bipartite**. We can do this in by greedy coloring the graph into two colors.

First, we need to generate a new **digraph**. This graph will have will have the same set of **nodes** as , but it will have more **arcs** than . Every **arc** in will create 2 **arcs** in ; the first **arc** will be identical to , and the second **arc** reverses the director of ( ).

## Matchings and Maximum Matchings

Suppose we have a **digraph** and is a subset of **arcs** from . is a matching if for any two **arcs** and in : . In other words, no two **arcs** in a **matching** share a **node**.

**Matching**, is a maximum matching if there is no other **matching** in such that . In other words, is a **maximum matching** if it is the largest set of **arcs** from that still satisfies the definition of **matching** (the addition of any **arc** not already in the matching will break the **matching** definition).

A **maximum matching** is a **perfect matching** if every for **node** in there exists an **arc** in where .

## Maximum Bipartite Matching

A **maximum bipartite matching** is a **maximum matching** on a **digraph** which is **bipartite**.

Given that is **bipartite**, the problem of finding a **maximum bipartite matching** can be transformed into a **maximum flow problem** solvable with the **Edmonds-Karp** algorithm and then the **maximum bipartite matching** can be recovered from the solution to the **maximum flow problem**.

Let be a **bipartition** of .

To do this, I need to generate a new **digraph** () with some new **nodes** () and some new **arcs** (). contains all the **nodes** in and two more **nodess**, (a **source node**) and (a **terminal node**).

will contain one **arc** for each . If an **arc** is in and is in and is in then include in (adding a ).

If is in and is in , then include in .

The definition of a **bipartite** graph ensures that no **arc** connects any **nodes** where both **nodes** are in the same partition. also contains an **arc** from **node** to each **node** in . Finally, contains an **arc** each **node** in to **node**. for all in .

First partition the **nodes** in the two disjoint sets ( and ) such that no **arc** in is directed from one set to the same set (this partition is possible because is **bipartite**). Next, add all **arcs** in which are directed from to into . Then create a single **source node** and a single **terminal node** and create some more **arcs**

Then, construct a .

## Minimal Node Cover

A node cover in a **digraph** is a set of **nodes** () from such that for any **arc** of this must be true: .

A minimal node cover is the smallest possible set of **nodes** in the graph that is still a **node cover**. König’s theorem states that in a **bipartite** graph, the size of the **maximum matching** on that graph is equal to the size of the **minimal node cover**, and it suggests how the **node cover** can recovered from a **maximum matching**:

Suppose we have the **bipartition** and the **maximum matching**. Define a new **digraph**, , the **arcs** in are the union of two sets.

The first set is **arcs** in , with the change that if and then and are swapped in the created **arc** give such **arcs** a attribute to indicate that they were derived from **arcs** in a **matching**.

The second set is **arcs** NOT in , with the change that if and then and are swapped in the created **arc** (give such **arcs** a attribute).

Next, run a **depth first search** (DFS) starting from each **node** in which is neither nor for any **arc** in . During the DFS, some **nodes** are visited and some are not (store this information in a field). The **minimum node cover** is the union of the **nodes** and the **nodes**.

This can be shown to lead from a **maximum matching** to a **minimal node cover** by a proof by contradiction, take some **arc** that was supposedly not covered and consider all four cases regarding whether and belong (whether as or ) to any **arc** in **matching**. Each case leads to a contradiction due to the order that DFS visits **nodes** and the fact that is a **maximum matching**.

Suppose we have a function to execute these steps and return the set of **nodes** comprising the **minimal node cover** when given the **digraph**, and the **maximum matching**:

## The Linear Assignment Problem

The linear assignment problem consists of finding a maximum weight matching in a weighted bipartite graph.

Problems like the one at the very start of this post can be expressed as a **linear assignment problem**. Given a set of workers, a set of tasks, and a function indicating the profitability of an assignment of one worker to one task, we want to maximize the sum of all assignments that we make; this is a **linear assignment problem**.

Assume that the number of tasks and workers are equal, though I will show that this assumption is easy to remove. In the implementation, I represent **arc weights** with an attribute for an **arc**.

## Kuhn-Munkres Algorithm

The Kuhn-Munkres Algorithm solves the **linear assignment problem**. A good implementation can take time, (where is the number of **nodes** in the **digraph** representing the problem). An implementation that is easier to explain takes (for a version which regenerates **DiGraphs**) and for (for a version which maintains **DiGraphs**). This is similar to the two different implementations of the **Edmonds-Karp** algorithm.

For this description, I’m only working with complete bipartite graphs (those where half the **nodes** are in one part of the **bipartition** and the other half in the second part). In the worker, task motivation this means that there are as many workers as tasks.

This seems like a significant condition (what if these sets are not equal!) but it is easy to fix this issue; I talk about how to do that in the last section.

The version of the algorithm described here uses the useful concept of **zero weight arcs**. Unfortunately, this concept only makes sense when we are solving a **minimization** (if rather than maximizing the profits of our worker-task assignments we were instead minimizing the cost of such assignments).

Fortunately, it is easy to turn a **maximum linear assignment problem** into a **minimum linear assignment problem** by setting each the **arc** weights to where . The solution to the original **maximizing problem** will be identical to the solution **minimizing problem** after the **arc** weights are changed. So for the remainder, assume that we make this change.

The **Kuhn-Munkres algorithm** solves **minimum weight matching in a weighted bipartite graph** by a sequence of **maximum matchings** in unweighted **bipartite** graphs. If a we find a **perfect matching** on the **digraph** representation of the **linear assignment problem**, and if the weight of every **arc** in the **matching** is zero, then we have found the **minimum weight matching** since this matching suggests that all **nodes** in the **digraph** have been **matched** by an **arc** with the lowest possible cost (no cost can be lower than 0, based on prior definitions).

No other **arcs** can be added to the **matching** (because all **nodes** are already matched) and no **arcs** should be removed from the **matching** because any possible replacement **arc** will have at least as great a weight value.

If we find a **maximum matching** of the subgraph of which contains only **zero weight arcs**, and it is not a **perfect matching**, we don’t have a full solution (since the **matching** is not **perfect**). However, we can produce a new **digraph** by changing the weights of **arcs** in in a way that new 0-weight **arcs** appear and the optimal solution of is the same as the optimal solution of . Since we guarantee that at least one **zero weight arc** is produced at each iteration, we guarantee that we will arrive at a **perfect matching** in no more than **|G.setOfNodes|^{2}=N^{2}** such iterations.

Suppose that in **bipartition**, contains **nodes** representing workers, and represents **nodes** representing tasks.

The algorithm starts by generating a new **digraph**. . Some **arcs** in are generated from **nodes** in . Each such **node**

With the cost matrix from the example above in mind, the Hungarian algorithm operates on this key idea: if a number is added to or subtracted from all of the entries of any one row or column of a cost matrix, then on optimal assignment for the resulting cost matrix is also an optimal assignment for the original cost matrix.

**Solve for the optimal solution for the example in the introduction using the Hungarian algorithm described above.**

Here is the initial adjacency matrix.

Subtract the smallest value in each row from the other values in the row.

Now subtract the smallest value in each column from all other values in the column.

Draw lines through the row and columns that have the 0 entries such that the fewest lines possible are drawn.

There are 2 lines drawn, and 2 is less than 3 so there is not yet the optimal number of zeroes.

Find the smallest entry not covered by any line. Subtract this entry from each row that isn’t crossed out, and then add it to each column that is crossed out. Then, go back to Step 3.

2 is the smallest entry.

First, subtract from the uncovered rows.

Now add to the covered columns.

Now go back to step 3, drawing lines through the rows and columns that have 0 entries.

There are 3 lines (which is ) so we are done. The assignment will be where the 0's are in the matrix such that only one 0 per row and column is part of the assignment.

Replace the original values.

The Hungarian algorithm tells us that it is cheapest to go with the musician from company C, the chef from company B, and the cleaners from company A.

We can verify this by brute-force.

108 + 135 + 250 = 493

108 + 148 + 175 = 431

150 + 125 + 250 = 525

150 + 148 + 150 = 448

122 + 125 + 175 = 422

122 + 135 + 150 = 407

We can see that 407 is the lowest price and matches the assignment the Hungarian algorithm determined.

## Leave a Comment

(0 Comments)