Leetcode 710. Random Pick with Blacklist

Guy Alster
4 min readMay 14, 2021

--

This is another great question from Leetcode. I love it very much because it is a question that requires you to think differently and outside the box. As with the rest of my attempts, I will try to teach the underlying algorithm for this question from “first principles”. What makes this question great, and which in my opinion makes this question so interesting, is that it has a very simple solution that is very costly (in terms of runtime), and it has a non-trivial solution that reduces that complexity dramatically. This problem is tagged “Hard” because it requires us to come up with that non-trivial solution.

Problem Description

given a number: n and a list: blacklist we need to return a uniformly random number in the range: [0,n) such that this number does not appear in the blacklist.

Uniform Distribution

First, let's remind ourselves of what uniform distribution is. Given a range: [a,b] of natural numbers. A uniform distribution is a discrete distribution (working on natural numbers as opposed to real numbers), that is described by the following probability function:

In other words, each number in the range has the exact same probability to be selected. That probably is one over the range. For example, if we have 5 numbers then the probability will be 1/5.

The Simple Solution

As mentioned the simple solution is very straightforward, we simply generate a random uniform number in the range of [0,n-1] and check if that number is on the blacklist. If it is, we simply generate another one. We repeat this process until we get a number that is not on the list. This technique is called rejection sampling and is used quite often in different scenarios when there are numbers to choose from and others to reject. However, rejection sampling comes at a cost. That cost is that we need to call our random number generator multiple times when rejecting numbers. Now imagine the case when the list is so large that most of the numbers need to be rejected? In such a case we would spend most of our time rejecting numbers and trying to generate new random ones. Conceptually, we can spend infinite cycles repeating these steps. But if we want to be more accurate let’s ask ourselves what is the expected (average) number of trials we will have until we reach a number that we don’t need to reject. To do so, let's say that we have n number and our list’s size is m. The probability of choosing a number from the list is m/n. Hence, the probability of not choosing a number from it is 1-m/n.
There is a famous probability distribution that describes precisely what we’re looking for: The number of attempts until (and including) a success in an experiment. That is the Geometric Distribution. The Expected value of distribution gives us a probabilistic average which sometimes is more interesting to look at. In this case, the Geometric distribution’s expected value for our experiment is 1/(1- m/n) which gives us: n/(n-m). As m approaches n we get a divergent limit.

The Advanced Solution

We know that if the list is of size m, then there are n-m elements that we can draw numbers from and won’t be rejected. It would be wonderful if we could simply draw numbers from the range [0,n-m-1] but unfortunately, some of those numbers may be rejected. In addition, the list may contain numbers that are greater than n-m, and we would want to be able to return them as well. This leads to the following idea: What if we map rejected numbers from the range [0,n-m-1] to legit numbers that are greater than n-m? that is for every number X that belongs to the range [0,n-m-1] and which is in the list, we map it to a number Y ≥ n-m and is not in the list. Then, we simply draw random numbers from the range [0,n-m-1] and check if the number we drew is mapped or not. If not, we return it, if it is, we return its mapping. That is it, no rejecting anymore. Every time we are called to get a random number we need to call our random number generator once and only once.

The Code

In this code, the range that we care about is denoted by the variable sampleSize. In the constructor of the class, we basically insert elements into the mapper that maps elements from the range [0,sampleSize] which appear in the list to elements ≥ sampleSize which don’t appear in the list. We do so with the help of another hash set: greater. This hash set stores all elements in the range [sampleSize,n) which are not on the list. We then iterate through the list, looking for elements ≤ sampleSize and map them to elements in the greater hash set. Finally, the pick() method simply draws a number uniformly in the range [0,sampleSize-1] and if that number appears in our mapper, we simply return what it maps to.

--

--