Understanding Kadane’s solution to the maximum subarray problem
There is something called the maximum subarray problem, and it goes something like this.
You are given an array of numbers: something like 3,5,-2,-1,5,3,1,-4,-6,5. Your job is to come up with the subarray with the highest sum.
For this example it would be the subarray [3,5,-2,-1,5,3,1], which has a sum of 14. No other subarray gives a higher sum.
Let’s look at a few more:
- The sequence -3,-3,-2,-4,-2,-5,-4 has the highest sum being -2, given by the subarray [-2].
- The sequence 1,2,-1,-2,1 has the highest sum being 3, given by the subarray [1,2].
- The sequence 3,2,1,4 has the highest sum being 10, given by the subarray [3,2,1,4].
It’s an interesting problem with many applications. How do we solve it?
Solving the problem Link to heading
While there are many ways to solve this problem, the best solution is Kadane’s Algorithm. It’s a deceptively simple algorithm, yet also deceptively difficult to understand.
Here’s the Python algorithm given on Wikipedia:
def max_subarray(A):
max_ending_here = max_so_far = A[0]
for x in A[1:]:
max_ending_here = max(x, max_ending_here + x)
max_so_far = max(max_so_far, max_ending_here)
return max_so_far
Here’s an R implementation, with a trace added to help figure out what’s going on.
find_max_subarray = function(arr){
max_ending_here = arr[1]
max_so_far = arr[1]
for(i in 2:length(arr)){
max_ending_here = max(arr[i], max_ending_here + arr[i])
max_so_far = max(max_so_far, max_ending_here)
print(paste( max_ending_here,max_so_far,arr[i])) #for tracing
}
return(max_so_far)
}
Try running the algorithm yourself. Feed it different subsequences and look at the trace to see if you can figure out how it works.
Understanding this algorithm is not a simple task. Here’s how I understand it.
Understanding Kadane’s Algorithm Link to heading
For each number in the sequence, we do this:
Each number “decides” if it wants to use the previous subarray sum, or if it wants to start a new subarray from scratch. At every comparison, it also checks to see if its subarray sum is higher than any previous one.
Let’s look at an example – the sequence 1,-2,3,-1,2.
We iterate through the numbers, starting at the second number in the sequence (-2).
At this point, the maximum subarray sum seen so far is 1 - the sum of the first element in the sequence. This is also the previous subarray sum.
The new subarray sum of -1 is passed to the next number. This is the sum of the current subarray [1,-2].
The next number (3) gets the previous subarray sum of-1:
We start a new sequence at this point. The current subarray is [3] and the current subarray sum is 3. The maximum subarray sum is also updated, since 3 is higher than everything we had before.
The next number (-1) is passed the previous subarray sum of 3:
Now the subarray is [3,-1] and the subarray sum is 2. Do you see how this goes now?
The last number is passed the subarray sum of 2:
The subarray at this point is [3,-1,2] and the subarray sum is 4. The max_so_far variable is updated to read 4.
At this point the sequence has ended and max_so_far is returned - which is 4. This is the highest subarray sum of the sequence, and the subarray was [3,-1,2].
And that’s it!