The most prominent randomized algorithm, and one that almost everyone who has taken an algorithms course knows by heart, is the famous Randomized QuickSort sorting algorithm. Since there are countless resources for this algorithm, this post serves as a foundation for an upcoming post on its detailed analysis.

Procedure
Here we explore a very simplified version of QuickSort, different from the more computationally optimized implementations, since we wish to explore the theory of QuickSort rather than its practical efficiency.
Let’s consider some array $A$ of $n$ elements. Now choose the last element as the pivot, denoted by $p$, i.e. $p = A[n]$. Now collect all the elements in $A$ (excluding $p$) that are less than or equal to $p$, and let this collection be defined by the array $L$. Similarly, collect all the elements in $A$ (excluding $p$) that are strictly greater than $p$, and let this be the array $R$. The reason we call $p$ a pivot is because we are essentially pivoting around it to divide the array into $L$, $p$, and $R$.
Now we recurse on both $L$ and $R$, so that we get $L'$ and $R'$, the sorted versions of $L$ and $R$. Notice that by how we constructed $L$ and $R$, we know that all elements in $L$ are less than or equal to $p$ and all elements in $R$ are strictly greater than $p$. So the sorted version of $A$ is given by the concatenation $L' \cdot p \cdot R'$, which is clearly sorted by our above logic. Thus this both explains the QuickSort algorithm very briefly and proves its correctness.
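To make this concrete, here is a minimal Python sketch of the simplified procedure described above (the list-based representation and the function name `quicksort` are my own illustrative choices, not from the original description):

```python
def quicksort(A):
    """Simplified QuickSort: pivot on the last element, then recurse on both sides."""
    if len(A) <= 1:                       # a singleton (or empty) array is vacuously sorted
        return A
    p = A[-1]                             # the pivot p = A[n]
    rest = A[:-1]                         # all elements of A excluding p
    L = [x for x in rest if x <= p]       # elements less than or equal to p
    R = [x for x in rest if x > p]        # elements strictly greater than p
    return quicksort(L) + [p] + quicksort(R)   # the concatenation L' . p . R'

print(quicksort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]
```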
Now the difficulty with this algorithm is that we want $L$ and $R$ to be balanced in their sizes, and this doesn’t happen if $p$ is close to the largest (or smallest) element of $A$. Thus we add in some randomness: choose $p$ uniformly at random from all the elements in $A$ and continue with the same procedure. This new algorithm is called Randomized QuickSort, whose analysis we touch on below.
Analysis
Firstly, for the deterministic QuickSort algorithm, let the run-time be represented by the function $T(n)$, where $n$ is the number of elements in our given array $A$. Finding $p$ and combining $L'$, $p$, and $R'$ at the end both take a constant amount of time. However, generating $L$ and $R$ takes linear time, i.e. $O(n)$ time, which follows from the fact that we need to compare every element in $A$ to $p$ (and $A$ has $n$ elements). Thus we can immediately form the following recurrence:

$$T(n) = T(n_L) + T(n_R) + O(n),$$

where $n_L$ and $n_R$ signify the number of elements in $L$ and $R$, respectively. Also note that in the trivial case of a singleton array, i.e. $n = 1$, we have $T(1) = O(1)$ since the array is vacuously sorted.
The reason that a more balanced partition into $L$ and $R$ is beneficial is that it produces a recurrence close to the form

$$T(n) = 2\,T(n/2) + O(n),$$

which by the Master Theorem (more information can be found here) implies that $T(n) = O(n \log n)$, which is optimal for a comparison-based sorting algorithm (why this is asymptotically optimal will be discussed in another post). On the other hand, a heavily imbalanced partition produces a recurrence close to the form

$$T(n) = T(n-1) + O(n),$$

which, by unrolling the recurrence (the Master Theorem does not apply to this form), implies that $T(n) = O(n^2)$, which is definitely not optimal for sorting algorithms. Thus this provides some intuition as to why balanced partitions are desired in QuickSort.
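As a quick sanity check of both claims, here is a sketch of the standard recursion-tree and unrolling arguments, with $c$ an assumed constant standing in for the $O(n)$ term:

```latex
\begin{align*}
% Balanced case: a tree of depth about \log_2 n, with total work cn per level.
T(n) = 2\,T(n/2) + cn
  &\;\Longrightarrow\; T(n) \le cn \log_2 n + O(n) = O(n \log n), \\
% Imbalanced case: unrolling one level at a time gives an arithmetic series.
T(n) = T(n-1) + cn
  &\;\Longrightarrow\; T(n) = c\bigl(n + (n-1) + \cdots + 1\bigr) = c\,\tfrac{n(n+1)}{2} = O(n^2).
\end{align*}
```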
Now let’s see how balanced partitions are formed. Notice that saying $L$ and $R$ are balanced in size is equivalent to saying that the pivot $p$ is very close to being the median of the array. Thus by choosing a pivot $p$ that has almost an equal number of elements less than or equal to it and strictly greater than it, we can generate a balanced partition as desired, as the toy example below illustrates.
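Here is a tiny illustration (a hypothetical setup with distinct elements, not part of the algorithm itself) showing that a median pivot splits the array almost evenly:

```python
A = [8, 3, 5, 9, 1, 6, 2]
p = sorted(A)[len(A) // 2]              # the median of A, here 5
L = [x for x in A if x != p and x <= p] # elements <= p, excluding the pivot
R = [x for x in A if x > p]             # elements strictly > p
print(len(L), len(R))                   # 3 3: an almost perfectly balanced partition
```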
This observation implies that when our pivot is very large or very small relative to our array $A$, we get a heavily imbalanced partition. Since we choose $p$ as the last element of the array, constructing an array whose last element is a maximal or minimal element makes the first partition very imbalanced. For the worst case to fully materialize, we want this to happen at every step of the recursion, from which it follows that either a sorted or a reverse-sorted array is the worst-case input for deterministic QuickSort (the reason we say “either” is because, depending on how the partition is implemented, one can be a best case and the other the worst case). In this case, our recurrence is actually given by

$$T(n) = T(n-1) + O(n),$$

which we know from above gives $T(n) = O(n^2)$, and this is very bad for a sorting algorithm. We conclude that deterministic QuickSort has a worst-case running time of $O(n^2)$, which is surprising considering the popularity of this algorithm.
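To see this worst case concretely, here is a small, purely illustrative experiment using the deterministic sketch from above, modified to count comparisons; the input size $n = 200$ is an arbitrary choice:

```python
import random

def quicksort_count(A):
    """Deterministic QuickSort (last-element pivot) that also counts comparisons."""
    if len(A) <= 1:
        return A, 0
    p, rest = A[-1], A[:-1]
    L = [x for x in rest if x <= p]       # building L and R is one partition pass,
    R = [x for x in rest if x > p]        # so we count len(rest) comparisons with p
    sorted_L, cL = quicksort_count(L)
    sorted_R, cR = quicksort_count(R)
    return sorted_L + [p] + sorted_R, cL + cR + len(rest)

n = 200
already_sorted = list(range(n))
shuffled = already_sorted[:]
random.shuffle(shuffled)

print(quicksort_count(already_sorted)[1])  # n(n-1)/2 = 19900 comparisons: the quadratic case
print(quicksort_count(shuffled)[1])        # on the order of n log n: typically ~1500-2000 here
```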
However, note that the algorithm we analyzed above is the deterministic version without any randomness, so let’s now see intuitively how randomness could help. Recall that the randomness is added only in the choice of the pivot, which we now make randomly rather than always choosing the last element of the array. This makes it impossible for any specific input to always perform badly, and we also expect our random pivot $p$ to be far from maximal or minimal most of the time, precisely because it is random (this is in fact the crux of the analysis of Randomized QuickSort). This provides the intuition as to why randomness might help, and we dive into it in a follow-up post focusing solely on the analysis of Randomized QuickSort.
To simply state the result of this randomness: the expected runtime is $O(n \log n)$, which is optimal for comparison-based sorting algorithms. However, by the same reasoning as for the deterministic algorithm, the worst-case runtime still stands at $O(n^2)$, though this is known to occur with very low probability. Moreover, with very high probability Randomized QuickSort runs in $O(n \log n)$ time, which is what makes this algorithm so viable for sorting.
The intuitive derivation of these statements has been briefly developed here, but to convey the true scope of the analysis of Randomized QuickSort, I will be writing a follow-up post going through the full mathematics and probabilistic analysis required to reach the above conclusions.