In this post, I would like to start a topic about functional algorithm design. I will use OCaml language to implement the designs. Each post is about a “functional pearl” from the book “Pearls of Functional Algorithm Design” by Richard Bird.

## Introduction

Given a set `X`

of `n`

natural numbers, we want to find the smallest natural number such that it is not in `X`

. In practice, this problem can
be considered as the following common programming task. There is a number of object ids and `X`

is the set of objects currently in use. The task
is to find some object not in use, say the one with the smallest id.

Our objective is to design an algorithm whose asymptotic complexity is **linear**, where `n`

is the number of elements in `X`

.

## Non-linear Solution

We assume that `X`

is a list without duplicates in no particular order. Otherwise, let’s say `X`

is in increasing order, then the **linear** solution is trivial: **simply
look for the first gap and check the first element is different from 0**. For example,

`X`

= [1; 2; 3; 6; 9], then we first check `1`

is different from `0`

, thus `0`

is the smallest number we are looking for.
Consider an other example, `X`

= [0; 1; 2; 3; 6; 9], then we check the gap between two continuous elements from the beginning of the list. In this case,
`6 - 3 > 1`

, thus `3+1`

is the number we are looking for.

The implementation is given as follows.

```
(* l is in increasing order *)
let trivial_smallest l =
let rec aux l res =
match l with
| [] -> res
| x::[] -> aux [] (x + 1)
| x1::x2::xs -> if x2 - x1 > 1 then (x1 + 1) else aux (x2::xs) res
in
if (List.length l = 0 ) || (List.hd l <> 0) then 0
else aux l 0;;
```

Consider the following quadratic algorithm where `filter : int list -> int list -> int list`

takes the list `[0;1;...;n]`

and input list `X`

and
returns the list of elements in the first input list in which any occurrence of `x`

in `X`

is removed. Then, the number we are looking for is the head of
the returned list. The implementation is given below.

```
(* use the List's mem and filter functions *)
let quadratic_smallest l =
let n = List.length l in
let rec loop i l =
if i = 0 then (i::l)
else
loop (pred i) (i::l)
in
let pre_l = loop n [] in
let filter l1 l2 =
List.filter (fun x -> not (List.mem x l2)) l1
in
List.hd (filter pre_l l)
```

```
(* test function *)
let remove_duplicate l =
let rec aux l res =
match l with
| [] -> List.rev res
| x::xs -> let filtered = List.filter (fun y -> not (x = y)) xs in
aux filtered (x::res)
in
aux l [];;
let create_input n =
let rec loop i l =
if i = 0 then (i::l)
else
loop (pred i) (i::l)
in
loop n [];;
let test f n =
let l = create_input n in
let t = Sys.time() in
let s = f l in
Printf.printf "Execution time: %fs\n" (Sys.time() -. t);
s
```

The funtion `quadratic_smallest`

works, however whose asymptotic complexity is quadratic because `List.mem`

and `List.filter`

are where `n`

is the length of the input list `X`

.

One can think about the other solution is that we first sort the input list `X`

, then employ the above **linear** trivial algorithm. However, the complexity
of the sort algorithm is at least . Hence, this solution is . In the following, I will discuss two **linear** algorithms based on array data structure and divide and conquer.

The function `test`

is used to estimate the execution time of the `quadratic_smallest`

function. The following results show the runtime with different lengths of the input list `X`

. For example, with `n = 10000`

, the runtime is almost `1`

second. If we increase the length `10`

times, then the runtime is `101`

seconds. Roundly speaking, the runtime is squared, .

```
test quadratic_smallest 10000;;
Execution time: 0.953102s
- : int = 10001
test quadratic_smallest 100000;;
Execution time: 100.936317s
- : int = 100001
```

## An Array-based Solution

The key idea of the *array-based* and *divide and conquer* solutions is that not every natural number in the range `[0...len]`

where `len`

is the length of the input list `X`

can be in `X`

. Thus, the smallest number not in `X`

is the number we are looking for. The array-based solution uses an array with `n + 1`

Boolean value (0: not present and 1: present) elements as a checklist of those numbers present in `X`

. Then the index of the first `0`

element in the checklist array is the finding smallest number. The array elements are initialized to `0`

.

The algorithm can be implemented with `Array`

data structure in OCaml as follows. In which, we assume that the access function of array is **constant**.

```
(* returns a checklist array *)
let array_filter a l len =
if (Array.length a <> (len + 1)) then
failwith "Input is not consistent";
let rec aux a l len =
match l with
| [] -> a
| x::xs ->
if x < 0 || x > len then
aux a xs len
else
begin
a.(x) <- 1;
aux a xs len
end
in
aux a l len
in
(* returns the first `0` index *)
let get_smallest a =
let n = Array.length a in
let rec loop i =
if i = n then -1
else if a.(i) = 0 then i
else loop (succ i)
in
loop 0
in
let linear_smallest l =
let n = List.length l in
let a = Array.make (n + 1) 0 in
get_smallest (array_filter a l n)
```

Because the array access function is , thus the asymptotic complexity of the `array_filter`

function is where `n`

is the length of the input list. The function `get_smallest`

is where `n`

is the length of the array. Hence, the complexity of the `linear_smallest`

function is , in other words, it is linear on the length of the input list.

We run the function `test`

with `linear_smallest`

as parameter to see its execution time. The results show that if we increase the length of the input list `10`

times (from `10000`

to `100000`

), the the runtime is `0.01`

compared to `0.001`

. Thus it is almost `10`

time longer that reflects the linear complexity.
We also see that the algorithm scales much better than the quadratic version. For instance, for the list of `100000000`

elements, it takes only `17.5`

seconds where the quadratic version seems to run forever (in fact, it is more than one day)! The experiments were run on a machine with an `Intel Core i5 2.4 GHz`

processor and `8GB`

of RAM under Mac OS X `10.12.5`

.

```
test linear_smallest 10000;;
Execution time: 0.001119s
- : int = 10001
test linear_smallest 100000;;
Execution time: 0.010943s
- : int = 100001
test linear_smallest 100000000;;
Execution time: 17.463857s
- : int = 100000001
```

## Divide and Conquer

As I’ve mentioned before, the key idea of *divide and conquer* solution is that not every natural number in the range `[0...len]`

where `len`

is the length of the input list `X`

can be in `X`

. We first observe the following properties of `filter`

(denoted `\\`

) and `append`

(denoted `@`

) functions. Recall that `filter u v`

returns the first list `u`

in which the occurrences of elements in the second list `v`

are filtered out. `append u v`

concatenates two lists.

`(u @ v) \\ t = (u \\ t) @ (v \\ t)`

`u \\ (v @ t) = (u \\ v) \\ t`

`(u \\ v) \\ t = (u \\ t) \\ v`

If we have `u1`

, `u2`

, `v1`

, and `v2`

such that `u1 \\ v2 = u1`

and `u2 \\ v1 = u2`

. In other words, `u1`

and `v2`

(respectively `u2`

and `v1`

) are disjoint. Then, following the properties above, we get

`(u1 @ u2) \\ (v1 @ v2) = (u1 \\ v1) @ (u2 \\ v2)`

Given a natural number `a`

and an input list `X`

whose elements are not duplicated and bigger than `a`

. For any natural number `b`

that is strictly bigger than `a`

, we have `[a; ... ; a + len] \\ X = [a; ... ; b - 1] \\ X1 @ [b; ... ; a + len] \\ X2`

where `len`

is the length of `X`

, `X1`

is the sublist of `X`

containing all elements that are smaller than `b`

, and `X2`

is the sublist of `X`

containing all elements that are bigger than `b`

, that is, `X`

is divided into two partitions. Hence, the smallest number from `a`

is `List.hd [a; ... ; a + len] \\ X = List.hd ([a; ... ; b - 1] \\ X1 @ [b; ...; a + len] \\ X2)`

. If `[a; ... ; b - 1] \\ X1`

is empty then the smallest number is `List.hd [b; ... ; a + len] \\ X2)`

. Otherwise, it is `List.hd [a; ... ; b - 1] \\ X1`

. In case that `X`

is empty, the smallest number is `a`

.

The next questions are: Can we implement the checking that `[a; ... ; b - 1] \\ X1`

is empty with a **linear** runtime complexity algorithm (the direct computation, e.g., using `List.mem`

, takes **quadratic** runtime in the length of `X1`

? And how to choose `b`

such that the **recurrence relation** of _divice and conquer__ gives us a **linear solution**?

Because `X1`

is a list of distinct natural numbers less than `b`

and `[a; ... ; b - 1]`

contains `b - a`

numbers from `a`

to `b - 1`

. Hence, to check that `[a; ... ; b - 1] \\ X1`

is empty, we only need to compute the length of `X1`

and show that it equals to `b - a`

. This algorithm takes **linear** runtime in the length of `X1`

in the worst-case. For the second question, if we choose `b = a + 1 + len div 2`

where `len`

is the length of `X`

, then we have length of the first partition is smaller or equal to `len div 2`

, the length of the second partition is `len - len div 2 - 1`

and smaller than `len div 2`

. Hence, we get the following recurrence relation:

`T(len) = T(len div 2) + O(len)`

By the **Master Theorem**, the solution is `T(len) = O(len)`

. The implementation using `List.partition`

function is given as follows.

```
let rec smallest_from a l =
match l with
| [] -> a
| list ->
let b = a + 1 + List.length l / 2 in
let x1, x2 = List.partition (fun y -> (y < b)) list in
if (List.length x1 = b - a) then
smallest_from b x2
else
smallest_from a x1
in
let divide_smallest l =
smallest_from 0 l
```

We can avoid to compute the length of the input list in the function `smallest_from`

by giving it as an argument like

```
let rec smallest_from a (len, l) =
if len = 0 then a
else
begin
let b = a + 1 + len / 2 in
let x1, x2 = List.partition (fun y -> (y < b)) l in
let len1 = List.length x1 in
if len1 = (b - a) then
smallest_from b ((len - len1), x2)
else
smallest_from a (len1, x1)
end
in
let divide_smallest l =
smallest_from 0 (List.length l, l)
```

As before we run the function `test`

with `divide_smallest`

as parameter to see its execution time. The results show that if we increase the length of the input list `10`

times (from `1000000`

to `10000000`

), the the runtime is `4.570747`

compared to `0.461642`

. Thus it is almost `10`

time longer that reflects the linear complexity.
We also see that this algorithm scales much better than the quadratic version. For instance, for the list of `100000000`

elements, it takes only `94.211405`

seconds. It turns out that it is a bit less efficient than the **array-based** solution.

```
test divide_smallest 1000000;;
Execution time: 0.461642s
- : int = 10001
test divide_smallest 10000000;;
Execution time: 4.570747s
- : int = 100001
test divide_smallest 100000000;;
Execution time: 47.099375s
- : int = 100000001
```