Probabilistic Functional Programs

Van C. Ngo · February 23, 2017

In this post, I would like to start a topic about probabilistic programming in OCaml.

Probabilistic Programming

Probabilistic programing languages are standard programming languages like C, Java, or ML, with two additional constructs:

  • The sampling construct for drawing a value at random from probability distributions

  • The probabilistic branching construct for controlling the flow in a program through observations.

The applications of probabilistic programming is wide, for instance, probabilistic programs are used in machine learning and robotics to describe distribution functions that are analyzed using Bayesian inference. In security, probabilistic programming has played a central role in cryptography, i.e., probabilistic security guarantees and probabilistic encryption. Probabilistic programs are also used to model the performance and reliability properties of a variety of systems, in which the uncertainties can come from the reliability of components (hardwares or softwares), the data from sensors, the reliability of the communication channels in the systems (e.g., the aircraft’s control systems).

This raises the question of how to analyze and verify that a system exhibiting probabilistic behavior satisfies a certain property. There is a rapidly growing trend in research on probabilistic programs which focuses on many aspects such as static analysis, program compilation, and program verification. To mention a few, the extension the framework of abstract interpretation to probabilistic programs, the development of a calculus for obtaining upper bounds on the expected value of run-time for probabilistic programs, and the developments of probabilistic and statistic model checking techniques for verifying probabilistic temporal properties of probabilistic and timed systems.


The following implements a simple example of probabilistic program for finding a given value in a list of integer values. The fist function find represents the standard program, e.g., without probability. It is easy to see that in the worst-case, the variable acc (initialized to 0) will have the value as the length of the input list l.

let rec find a l acc =
  match l with
  | [] -> (false, acc)
  | x::xs -> 
    if a = x then 
      (true,(acc + 1)) 
      find a xs (acc + 1);; 
let rec pfind a l acc =
  match l with
  | [] -> (false, acc)
  | x::xs as l -> let c = 2 in
    if c = 0 then 
      pfind a l (acc + 1) 
      (if a = x then 
         (true,(acc + 1)) 
         pfind a xs (acc + 1));; 
let test n f a l =
  let res = ref [] in
  for i = 1 to n do
    let (found,cost) = f a l 0 in
	res := cost::(!res)
  let sum = List.fold_left (fun s x -> s + x) 0 !res in 
  (float_of_int sum) /. (float_of_int n);;

In the second function pfind, the integer value c is sampled from a uniform distribution from 0 (inclusive) to 2 (exclusive). In other words, c is 0 with probability 0.5 and is 1 with probability 1 - 0.5 = 0.5. A a result, with probability 0.5, the recursive call on the tail of the list is skipped and with the same probability the recursion is called on the tail of the list.

We can see that the recursion can be infinite, e.g., the number of times such that c is 0 is infinite. However, we can calculate that the probability that the number of times such that c is 0 is infinite, is \(\frac{1}{2^n}\) with \(n \rightarrow +\infty\) which is 0. It is said that pfind is almost surely terminated.

The question is what is the behavior of pfind. For example,

what is the expected value of the variable acc?

For this simple program, we can compute manually that this expected value is 2.|l|, where |l| is the number of elements in the list in the worst-case. My on-going work is to automatically and statically infer a symbolic upper bound on this expected value. The following are the average values of acc when we run find and pfind 10000000 times.

# test 10000000 find 10 [0;1;2;3;4;5;6];;
- : float = 7.
# test 10000000 pfind 10 [0;1;2;3;4;5;6];;
- : float = 13.998363