Introduction
In many problems, we are given a set of elements that can be divided into two parts. To solve the problem, we are interested in knowing the smallest element in one part and the biggest element in the other part. This pattern is an efficient approach to solving such problems.
This pattern uses two Heaps to solve these problems: a Min Heap to find the smallest element and a Max Heap to find the biggest element.
Let’s jump onto our first problem to see this pattern in action.
Find the Median of a Number Stream (medium)
Top Interview 150 | 295. Find Median from Data Stream Design Gurus Educative.io
Problem Statement
Design a class to calculate the median of a number stream. The class should have the following two methods:
- insertNum(int num): stores the number in the class
- findMedian(): returns the median of all numbers inserted in the class
If the count of numbers inserted in the class is even, the median will be the average of the middle two numbers.
Example 1:
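1. insertNum(3)
2. insertNum(1)
3. findMedian() -> output: 2.0
4. insertNum(5)
5. findMedian() -> output: 3
6. insertNum(4)
7. findMedian() -> output: 3.5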
Constraints:
- -10^5 <= num <= 10^5
- There will be at least one element in the data structure before calling findMedian.
- At most 5 * 10^4 calls will be made to insertNum and findMedian.
Solution
As we know, the median is the middle value in an ordered integer list. So a brute force solution could be to maintain a sorted list of all numbers inserted in the class so that we can efficiently return the median whenever required. Inserting a number in a sorted list will take O(N) time if there are ‘N’ numbers in the list. This insertion will be similar to the Insertion sort. Can we do better than this? Can we utilize the fact that we don’t need the fully sorted list - we are only interested in finding the middle element?
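The brute-force baseline can be made concrete with a minimal Python sketch that keeps every number in a sorted list. The class name SortedListMedian and the snake_case methods insert_num/find_median are hypothetical. They were chosen to mirror insertNum() and findMedian(). bisect.insort is the standard-library helper for inserting into a sorted list.

```python
import bisect


class SortedListMedian:
    """Hypothetical brute-force median finder backed by a sorted Python list."""

    def __init__(self):
        self.nums = []

    def insert_num(self, num):
        # bisect.insort finds the slot in O(logN), but shifting the later
        # elements to make room costs O(N), like one step of insertion sort.
        bisect.insort(self.nums, num)

    def find_median(self):
        n = len(self.nums)
        mid = n // 2
        if n % 2 == 1:
            return float(self.nums[mid])
        # Even count: average of the two middle numbers.
        return (self.nums[mid - 1] + self.nums[mid]) / 2.0
```

The O(N) shifting cost on every insertion is what the heap-based approach avoids.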
Assume ‘x’ is the median of a list. This means that half of the numbers in the list will be smaller than (or equal to) ‘x’ and half will be greater than (or equal to) ‘x’. This leads us to an approach where we can divide the list into two halves: one half to store all the smaller numbers (let’s call it smallNumList) and one half to store the larger numbers (let’s call it largeNumList). The median of all the numbers will either be the largest number in smallNumList or the smallest number in largeNumList. If the total number of elements is even, the median will be the average of these two numbers.
The best data structure that comes to mind to find the smallest or largest number among a list of numbers is a Heap. Let’s see how we can use a heap to find a better algorithm.
- We can store the first half of numbers (i.e., smallNumList) in a Max Heap. We should use a Max Heap as we are interested in knowing the largest number in the first half.
- We can store the second half of numbers (i.e., largeNumList) in a Min Heap, as we are interested in knowing the smallest number in the second half.
- Inserting a number in a heap will take O(logN), which is better than the brute force approach.
- At any time, the median of the current list of numbers can be calculated from the top element of the two heaps.
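Python’s heapq module only provides a min-heap. One common workaround, assumed in these sketches and not prescribed by the text, is to store negated numbers to get a Max Heap. The snippet below hand-fills the two halves of the list [1, 3, 4, 5] and reads the median from the two tops. The variable names are illustrative.

```python
import heapq

# heapq is a min-heap only, so smallNumList (a Max Heap) stores negated values:
# the largest original number sits at index 0 as its negative.
small_num_list = []  # Max Heap via negation
large_num_list = []  # ordinary Min Heap

for num in [1, 3]:
    heapq.heappush(small_num_list, -num)
for num in [4, 5]:
    heapq.heappush(large_num_list, num)

largest_small = -small_num_list[0]  # 3
smallest_large = large_num_list[0]  # 4
# Even number of elements: the median is the average of the two tops.
median = (largest_small + smallest_large) / 2.0  # 3.5
```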
Let’s take the Example-1 mentioned above to go through each step of our algorithm:
- insertNum(3): The Max Heap is empty, so ‘3’ goes into it. In general, we insert a number into the Max Heap (i.e., the first half) if the heap is empty or the number is smaller than or equal to its top (largest) number. Otherwise, it goes into the Min Heap. After every insertion, we will balance the two heaps so that they have an equal number of elements. If the count of numbers is odd, let’s decide to keep one more number in the Max Heap than in the Min Heap.
- insertNum(1): As ‘1’ is smaller than ‘3’, let’s insert it into the Max Heap. Now we have two elements in the Max Heap and no elements in the Min Heap. To balance the two heaps, we take the largest element (‘3’) from the Max Heap and insert it into the Min Heap.
- findMedian(): As we have an even number of elements, the median will be the average of the top elements of both heaps -> (1+3)/2 = 2.0
- insertNum(5): As ‘5’ is greater than the top element of the Max Heap, we insert it into the Min Heap. After the insertion, the total count of elements is odd. Since we decided to keep more numbers in the Max Heap than in the Min Heap, we take the top (smallest) number from the Min Heap and insert it into the Max Heap.
- findMedian(): Since we have an odd number of elements, the median will be the top element of the Max Heap -> 3. An odd number of elements also means that the Max Heap will have one more element than the Min Heap.
- insertNum(4): As ‘4’ is greater than ‘3’, the top of the Max Heap, insert ‘4’ into the Min Heap.
- findMedian(): As we have an even number of elements, the median will be the average of the top elements of both heaps -> (3+4)/2 = 3.5
Code
Here is what our algorithm will look like:
```python
from heapq import *
```
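Here is a minimal, self-contained sketch of the two-heap median finder. It uses the negated-value Max Heap workaround (an assumption). The names MedianOfAStream, insert_num and find_median are hypothetical stand-ins for the class, insertNum() and findMedian(). The insertion and rebalancing rules follow the walk-through above.

```python
import heapq


class MedianOfAStream:
    def __init__(self):
        self.max_heap = []  # smaller half, stored negated so heapq acts as a Max Heap
        self.min_heap = []  # larger half, ordinary Min Heap

    def insert_num(self, num):
        # Goes to the Max Heap if it is empty or num <= its largest element.
        if not self.max_heap or num <= -self.max_heap[0]:
            heapq.heappush(self.max_heap, -num)
        else:
            heapq.heappush(self.min_heap, num)

        # Rebalance: the Max Heap holds as many elements as the Min Heap, or one more.
        if len(self.max_heap) > len(self.min_heap) + 1:
            heapq.heappush(self.min_heap, -heapq.heappop(self.max_heap))
        elif len(self.max_heap) < len(self.min_heap):
            heapq.heappush(self.max_heap, -heapq.heappop(self.min_heap))

    def find_median(self):
        if len(self.max_heap) == len(self.min_heap):
            # Even count: average of the two middle numbers.
            return (-self.max_heap[0] + self.min_heap[0]) / 2.0
        # Odd count: the Max Heap holds the extra (middle) element.
        return float(-self.max_heap[0])
```

Replaying Example 1 (insert 3 and 1, then 5, then 4) yields the medians 2.0, 3.0 and 3.5.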
Time complexity
The time complexity of insertNum() will be O(logN) due to the insertion in the heap. The time complexity of findMedian() will be O(1), as we can find the median from the top elements of the heaps.
Space complexity
The space complexity will be O(N) because, at any time, we will be storing all the numbers.
Sliding Window Median (hard)
480. Sliding Window Median Design Gurus Educative.io
Problem Statement
Given an array of numbers and a number ‘k’, find the median of all the ‘k’ sized sub-arrays (or windows) of the array.
Example 1:
Input: nums=[1, 2, -1, 3, 5], k = 2
Output: [1.5, 0.5, 1.0, 4.0]
Explanation: Let’s consider all windows of size ‘2’:
- [1, 2] -> median is 1.5
- [2, -1] -> median is 0.5
- [-1, 3] -> median is 1.0
- [3, 5] -> median is 4.0
Example 2:
Input: nums=[1, 2, -1, 3, 5], k = 3
Output: [1.0, 2.0, 3.0]
Explanation: Let’s consider all windows of size ‘3’:
- [1, 2, -1] -> median is 1.0
- [2, -1, 3] -> median is 2.0
- [-1, 3, 5] -> median is 3.0
Constraints:
- 1 <= k <= nums.length <= 10^5
- -2^31 <= nums[i] <= 2^31 - 1
Solution
This problem follows the Two Heaps pattern and shares similarities with Find the Median of a Number Stream. We can follow a similar approach of maintaining a max-heap and a min-heap for the list of numbers to find their median.
The only difference is that we need to keep track of a sliding window of ‘k’ numbers. This means, in each iteration, when we insert a new number in the heaps, we need to remove one number from the heaps which is going out of the sliding window. After the removal, we need to rebalance the heaps in the same way that we did while inserting.
Code
Here is what our algorithm will look like:
```python
from heapq import *
```
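Here is a sketch of the sliding-window version, assuming the same negated-value Max Heap. The function find_sliding_window_median and the helpers _rebalance and _remove are hypothetical names. Every number in the Max Heap is less than or equal to every number in the Min Heap. So comparing the outgoing number with the Max Heap’s top tells us which heap holds it. Removal uses list.remove followed by heapq.heapify, which is O(K) and consistent with the complexity analysis. It is a straightforward choice, not the fastest possible one.

```python
import heapq


def _rebalance(max_heap, min_heap):
    # Keep len(max_heap) equal to len(min_heap) or exactly one larger.
    if len(max_heap) > len(min_heap) + 1:
        heapq.heappush(min_heap, -heapq.heappop(max_heap))
    elif len(max_heap) < len(min_heap):
        heapq.heappush(max_heap, -heapq.heappop(min_heap))


def _remove(heap, value):
    # O(K): linear search for the value, then restore the heap property.
    heap.remove(value)
    heapq.heapify(heap)


def find_sliding_window_median(nums, k):
    max_heap = []  # smaller half of the window, stored negated
    min_heap = []  # larger half of the window
    result = []
    for i, num in enumerate(nums):
        # Insert the incoming number into the correct half.
        if not max_heap or num <= -max_heap[0]:
            heapq.heappush(max_heap, -num)
        else:
            heapq.heappush(min_heap, num)
        _rebalance(max_heap, min_heap)

        if i >= k - 1:  # the window nums[i-k+1 .. i] is complete
            if len(max_heap) == len(min_heap):
                result.append((-max_heap[0] + min_heap[0]) / 2.0)
            else:
                result.append(float(-max_heap[0]))

            # Remove the number sliding out of the window from the heap holding it.
            outgoing = nums[i - k + 1]
            if outgoing <= -max_heap[0]:
                _remove(max_heap, -outgoing)
            else:
                _remove(min_heap, outgoing)
            _rebalance(max_heap, min_heap)
    return result
```

On the examples, find_sliding_window_median([1, 2, -1, 3, 5], 2) returns [1.5, 0.5, 1.0, 4.0]. With k = 3, it returns [1.0, 2.0, 3.0].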
Time complexity
The time complexity of our algorithm is O(N*K), where ‘N’ is the total number of elements in the input array and ‘K’ is the size of the sliding window. This is due to the fact that we are going through all the ‘N’ numbers and, while doing so, we are doing two things:
- Inserting/removing numbers from heaps of size ‘K’. This will take O(logK).
- Removing the element going out of the sliding window. This will take O(K) as we will be searching for this element in an array of size ‘K’ (i.e., a heap).
Space complexity
Ignoring the space needed for the output array, the space complexity will be O(K) because, at any time, we will be storing all the numbers within the sliding window.
Maximize Capital (hard)
Top Interview 150 | 502. IPO Design Gurus Educative.io
Given a set of investment projects with their respective profits, we need to find the most profitable projects. We are given an initial capital and are allowed to invest only in a fixed number of projects. Our goal is to choose projects that give us the maximum profit. Write a function that returns the maximum total capital after selecting the most profitable projects.
We can start an investment project only when we have the required capital. After completing a project, we get our invested capital back and its profit is added to our capital.
Example 1:
Input: Project Capitals=[0,1,2], Project Profits=[1,2,3], Initial Capital=1, Number of Projects=2
Output: 6
Example 2:
Input: Project Capitals=[0,1,2,3], Project Profits=[1,2,3,5], Initial Capital=0, Number of Projects=3
Output: 8
Constraints:
- 1 <= numberOfprojects <= 10^5
- 0 <= initialCapital <= 10^9
- n == profits.length
- n == capital.length
- 1 <= n <= 10^5
- 0 <= profits[i] <= 10^4
- 0 <= capital[i] <= 10^9
Solution
While selecting projects we have two constraints:
- We can select a project only when we have the required capital.
- There is a maximum limit on how many projects we can select.
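Since we don’t have any constraint on time, among the projects for which we have enough capital, we should choose the one that gives us the maximum profit. Profits are never negative, so our capital can only grow. Taking the most profitable affordable project therefore never shrinks the set of projects we can afford later. Following this greedy approach will give us the best solution.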
While selecting a project, we will do two things:
- Find all the projects that we can choose with the available capital.
- From the list of projects in the 1st step, choose the project that gives us a maximum profit.
We can follow the Two Heaps approach similar to Find the Median of a Number Stream. Here are the steps of our algorithm:
- Add all project capitals to a min-heap, so that we can select a project with the smallest capital requirement.
- Go through the top projects of the min-heap and filter the projects that can be completed within our available capital. Insert the profits of all these projects into a max-heap, so that we can choose a project with the maximum profit.
- Finally, select the top project of the max-heap for investment.
- Repeat the 2nd and 3rd steps for the required number of projects.
Code
Here is what our algorithm will look like:
```python
from heapq import *
```
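Here is a minimal sketch of the greedy two-heap selection described above. The function name find_maximum_capital and its parameter order are assumptions. The min-heap holds (capital, index) pairs, and the max-heap holds negated profits, since heapq is a min-heap only.

```python
import heapq


def find_maximum_capital(capital, profits, number_of_projects, initial_capital):
    min_capital_heap = []  # (required capital, project index)
    max_profit_heap = []   # negated profits of projects we can currently afford

    for i in range(len(profits)):
        heapq.heappush(min_capital_heap, (capital[i], i))

    available_capital = initial_capital
    for _ in range(number_of_projects):
        # Move every project we can now afford into the max-profit heap.
        while min_capital_heap and min_capital_heap[0][0] <= available_capital:
            _, i = heapq.heappop(min_capital_heap)
            heapq.heappush(max_profit_heap, -profits[i])

        if not max_profit_heap:  # nothing affordable, so capital cannot grow further
            break

        # Greedily take the most profitable affordable project.
        available_capital += -heapq.heappop(max_profit_heap)

    return available_capital
```

For the two examples, find_maximum_capital([0, 1, 2], [1, 2, 3], 2, 1) gives 6 and find_maximum_capital([0, 1, 2, 3], [1, 2, 3, 5], 3, 0) gives 8.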
Time complexity
Since, at most, each project will be pushed to both heaps once, the time complexity of our algorithm is O(NlogN + KlogN), where ‘N’ is the total number of projects and ‘K’ is the number of projects we are selecting.
Space complexity
The space complexity will be O(N) because we will be storing all the projects in the heaps.
Problem Challenge 1
436. Find Right Interval Design Gurus Educative.io
Next Interval (hard)
Given an array of intervals, find the next interval of each interval. In a list of intervals, for an interval ‘i’ its next interval ‘j’ will have the smallest ‘start’ greater than or equal to the ‘end’ of ‘i’.
Write a function to return an array containing indices of the next interval of each input interval. If there is no next interval of a given interval, return -1. It is given that none of the intervals have the same start point.
Example 1:
Input: Intervals [[2,3], [3,4], [5,6]]
Output: [1, 2, -1]
Explanation: The next interval of [2,3] is [3,4] having index ‘1’. Similarly, the next interval of [3,4] is [5,6] having index ‘2’. There is no next interval for [5,6] hence we have ‘-1’.
Example 2:
Input: Intervals [[3,4], [1,5], [4,6]]
Output: [2, -1, -1]
Explanation: The next interval of [3,4] is [4,6] which has index ‘2’. There is no next interval for [1,5] and [4,6].
Constraints:
- 1 <= intervals.length <= 2 * 10^4
- intervals[i].length == 2
- -10^6 <= start_i <= end_i <= 10^6
- The start point of each interval is unique.
Solution
A brute force solution could be to take one interval at a time and go through all the other intervals to find the next interval. This algorithm will take O(N^2) where ‘N’ is the total number of intervals. Can we do better than that?
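For comparison, the O(N^2) brute force can be sketched as follows. The function name is hypothetical, and each interval is a [start, end] pair. The sketch also considers interval ‘i’ itself as a candidate, which only matters when its start equals its end. That is an assumption on our part (LeetCode 436 allows it).

```python
def find_next_interval_brute_force(intervals):
    result = []
    for _, end in intervals:
        best_index = -1
        for j, (start, _) in enumerate(intervals):
            # A candidate must start at or after this interval's end; keep the smallest start.
            if start >= end and (best_index == -1 or start < intervals[best_index][0]):
                best_index = j
        result.append(best_index)
    return result
```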
We can utilize the Two Heaps approach. We can push all intervals into two heaps: one heap to sort the intervals on maximum start time (let’s call it maxStartHeap) and the other on maximum end time (let’s call it maxEndHeap). We can then iterate through all intervals of the maxEndHeap to find their next interval. Our algorithm will have the following steps:
- Take out the top (having the highest end) interval from the maxEndHeap to find its next interval. Let’s call this interval topEnd.
- Find an interval in the maxStartHeap with the closest start greater than or equal to the end of topEnd. Since maxStartHeap is sorted by the ‘start’ of intervals, we can keep popping intervals while the top interval’s ‘start’ is greater than or equal to the ‘end’ of topEnd. The last interval popped has the smallest such ‘start’. Let’s call this interval topStart.
- Add the index of topStart in the result array as the next interval of topEnd. If we can’t find the next interval, add ‘-1’ in the result array.
- Put topStart back in the maxStartHeap, as it could be the next interval of other intervals. The other popped intervals can be discarded. Every later topEnd has a smaller or equal end, so topStart stays a valid candidate with a smaller start than any of them.
- Repeat steps 1-4 until we have no intervals left in maxEndHeap.
Code
Here is what our algorithm will look like:
```python
from heapq import *
```
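Here is a sketch of the maxStartHeap/maxEndHeap algorithm. Both max heaps are simulated by pushing (-value, index) tuples into heapq. The function name find_next_interval is hypothetical, and each interval is a [start, end] pair.

```python
import heapq


def find_next_interval(intervals):
    max_start_heap = []  # (-start, index): highest start on top
    max_end_heap = []    # (-end, index): highest end on top
    for i, (start, end) in enumerate(intervals):
        heapq.heappush(max_start_heap, (-start, i))
        heapq.heappush(max_end_heap, (-end, i))

    result = [-1] * len(intervals)
    while max_end_heap:
        neg_end, end_index = heapq.heappop(max_end_heap)
        top_end = -neg_end
        if max_start_heap and -max_start_heap[0][0] >= top_end:
            top_start = heapq.heappop(max_start_heap)
            # Keep popping while the next start still satisfies start >= top_end;
            # the last one popped has the smallest qualifying start.
            while max_start_heap and -max_start_heap[0][0] >= top_end:
                top_start = heapq.heappop(max_start_heap)
            result[end_index] = top_start[1]
            # Put it back: it may also be the next interval of a later (smaller-end) interval.
            heapq.heappush(max_start_heap, top_start)
    return result
```

On the examples, find_next_interval([[2, 3], [3, 4], [5, 6]]) returns [1, 2, -1] and find_next_interval([[3, 4], [1, 5], [4, 6]]) returns [2, -1, -1].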
Time complexity
The time complexity of our algorithm will be O(NlogN), where ‘N’ is the total number of intervals.
Space complexity
The space complexity will be O(N) because we will be storing all the intervals in the heaps.