# Formal Verification of a Synchronous Data-flow Compiler: from Signal to C

Van-Chan Ngo

Jean-Pierre Talpin

INRIA Rennes, France

Advisor INRIA Rennes, France

PhD Defense

Construct a translation validation-based verification framework to check the correctness of the synchronous dataflow compiler, Signal.

#### Agenda

Motivation Related Work Approach

Cock Semantics Preservation Data Dependency Preservation Value-Equivalence Preservation

Detected Bugs Conclusion

#### Motivation



Compilers always might have bugs!

## **Development of critical software**



- Safety requirements have to be implemented correctly
- Formal verification is applied at source level (static analysis, model checking, proof)
- The guarantees are obtained at source program might be broken due to the compiler bugs
- Raise awareness about the importance of compiler verification in critical software development

## Related work on compiler verification

- SuperTest: test and validation suite
- DO-178: certification standards
- Astrée: a static analyzer
- Static analysis of Signal programs for efficient code generation (Gamatié et al.)
- Translation validation for optimizing compiler (Berkeley, US)
- CompCert: a certified C compiler (Inria, France)
- Verified LLVM compiler (Harvard, US)

# **Compiler verification**

#### Testing-based approach

- Test and validation suite to verify compilers
- Test suite to qualify the compiler's output

#### Formal method-based approach

- Formal verification of compilers
- Formal verification of compiler's output
- Translation validation to check the correctness of the compilation

#### **Translation validation**



- Takes the source and compiled programs as input
- Checks that the source program semantics is preserved in the compiled program

#### Translation validation: Main components

#### Model builder

- Defines common semantics
- Captures the semantics of the source and compiled programs

#### Analyzer

- Formalizes the notion of "correct implementation"
- Provides an automated proof method
- Generates a proof scripts or a counter-example

#### **Translation validation: Features**

- Avoiding redoing the proof with changes of compiler
- Independence of how the compiler works
- Less to prove (in general, the validator is much more simple than the compiler)
- Verification process is fully automated

# Signal compiler



- Syntax and type checking
- Clock analysis
- Data dependency analysis
- Executable code generation

## Objective

A method to formally verify the Signal compiler such that

- light weight
- scalable: deals with 500K lines of code of the implementation
- modularity
- accuracy: the proof is separated w.r.t the data structure (clock, data dependency, value-equivalence)



- Adopt translation validation approach
- Prove the correctness of each phase w.r.t the data structure carrying the semantics relevant to that phase
- Decompose the preservation of the semantics into the preservation of clock semantics, data dependency, and value-equivalence









# Signal language

- Signal x: sequences x(t), t ∈ N of typed values (⊥ is absence)
- Clock  $C_x$  of x: instants at which  $x(t) \neq \bot$
- Process: set of equations representing relations between signals
- Parallelism: processes run concurrently
- Example:  $y := x + 1, \forall t \in C_y, y(t) = x(t) + 1$
- Other languages: Esterel, Lustre, Scade, ...

#### **Primitive operators**

- Stepwise functions:  $y := f(x_1, ..., x_n)$  $\forall t \in C_y, y(t) = f(x_1(t), ..., x_n(t)), C_y = C_{x_1} = ... = C_{x_n}$
- Delay: y := x\$1 init a

 $y(t_0) = a, \forall t \in C_x \land t > t_0, y(t) = x(t_-), C_y = C_x$ 

- Merge: y := x default z
  - y(t) = x(t) if  $t \in C_x, y(t) = z(t)$  if  $t \in C_z \setminus C_x$ ,  $C_y = C_x \cup C_z$

#### **Primitive operators**

• Sampling: y := x when b

 $\forall t \in C_x \cap C_b \wedge b(t) = true, y(t) = x(t), C_y = C_x \cap [b]$ 

• Composition:  $P_1|P_2$ 

Denotes the parallel composition of two processes

• Restriction: P where x

Specifies that x as a local signal to P

#### Example



- Emits a sequence of values FB, FB 1, ..., 1
- Execution traces

| t  | $t_0$ | <i>t</i> <sub>1</sub> | $t_2$   | $t_3$ | $t_4$  | $t_5$   | $t_6$ | • • • |
|----|-------|-----------------------|---------|-------|--------|---------|-------|-------|
| FΒ | 6     | $\bot$                | $\perp$ | 3     | $\bot$ | $\perp$ | 2     | • • • |
| ΖN | 1     | 6                     | 5       | 4     | 3      | 2       | 1     | • • • |
| Ν  | 6     | 5                     | 4       | 3     | 2      | 1       | 2     | • • • |

#### Preservation of clock semantics





#### Clock model of DEC

 $FB \ = when (ZN <= 1)$ ZN := N init 1 N := FB default (ZN - 1)

 $(FB \Leftrightarrow ZN1 \land ZN1)$  $\wedge (\widehat{ZN1} \Leftrightarrow v_{\leq \pm}^1 \Leftrightarrow \widehat{ZN}) \wedge (\widehat{ZN1} \Rightarrow (\widetilde{ZN1} = v_{\leq \pm}^1))$  $\wedge (\widetilde{ZN} \Leftrightarrow \widehat{N}) \wedge (\widetilde{ZN} \Rightarrow (\widetilde{ZN} = m.N \wedge m.N' = \widetilde{N}))$  $\wedge (m.N_0 = 1)$  $\wedge (\widehat{N} \Leftrightarrow \widehat{FB} \lor \widehat{ZN2}) \land (\widehat{N} \Rightarrow ((\widehat{FB} \land \widetilde{N} = \widetilde{FB})))$  $\vee (\neg FB \land N = ZN2)))$  $\wedge (\widehat{ZN2} \Leftrightarrow \widehat{v_{-}^{1}} \Leftrightarrow \widehat{ZN}) \wedge (\widehat{ZN2} \Rightarrow (\widetilde{ZN2} = \widehat{v_{-}^{1}}))$ 

#### **Clock refinement**

- Clock event: A clock event is an interpretation over X. The set of clock events denoted by  $\mathcal{E}c_X$
- Clock trace: A clock trace  $T_c : \mathbb{N} \longrightarrow \mathcal{E}c_X$  is a chain of clock events. The concrete clock semantic of  $\Phi(P)$  is a set of clock trace denoted by  $\Gamma(\Phi(P))_{\setminus X}$
- Clock refinement:  $\Phi(C) \sqsubseteq_{clk} \Phi(A)$  on X iff  $\forall X.T_c. (X.T_c \in \Gamma(\Phi(C))_{\backslash X} \Rightarrow X.T_c \in \Gamma(\Phi(A))_{\backslash X})$

#### Proof method

- Define a variable mapping  $\widehat{X_A} \setminus \widehat{X_{IO}} = \alpha(\widehat{X_C} \setminus \widehat{X_{IO}})$
- Given  $\alpha$ , prove  $\Phi(C) \sqsubseteq_{clk} \Phi(A)$  on  $X_{IO}$

#### Premise

$$\widehat{X_A} \setminus \widehat{X_{IO}} = \alpha(\widehat{X_C} \setminus \widehat{X_{IO}})$$
  
$$\forall \widehat{I} \text{ over } \widehat{X_A} \cup \widehat{X_C}. (\widehat{I} \models \Phi(C) \Rightarrow \widehat{I} \models \Phi(A))$$

Conclusion

 $\Phi(C) \sqsubseteq_{clk} \Phi(A)$  on  $X_{IO}$ 

#### Implementation with SMT



- Construct  $\Phi(A)$  and  $\Phi(C)$
- Establish  $(\Phi(C) \land \widehat{X_A} \setminus \widehat{X_{IO}} = \alpha(\widehat{X_C} \setminus \widehat{X_{IO}}) \Rightarrow \Phi(A))$
- Check the validity of  $\models (\Phi(C) \land \widehat{X_A} \setminus \widehat{X_{IO}} = \alpha(\widehat{X_C} \setminus \widehat{X_{IO}}) \Rightarrow \Phi(A))$

#### Preservation of data dependency



#### Synchronous data-flow dependency graph (SDDG)

- Data dependency is represented as a labeled directed graph
- Nodes are signals or clocks
- Edges express the dependencies among signals and clocks
- Clock constraints are first-order logic formulas to label the edges
- A dependency is effective iff its clock constraint has the value true

## SDDG of primitive operators

y := x default z



- y depends on x when  $\hat{x} = \text{true}$
- Clock constraint of a dependency path  $dp = (x_0, x_1, ..., x_n)$  is defined by  $\bigwedge^{n-1} \widehat{c_i}$
- A cyclic path is a deadlock if and only if  $M \models \bigwedge \widehat{c_i}$

i=0

n-1

## SDDG of DEC



#### **Dependency refinement**

SDDG(C) is a **dependency refinement** of SDDG(A) if:

- For every path dp1 in SDDG(A), there exits a path dp2 in SDDG(C) such that dp2 is a reinforcement of dp1
- For every path in dp1 SDDG(A), for any path dp2 in SDDG(C), dp2 is deadlock-consistent with dp1

#### Reinforcement

A path  $dp_2$  is a reinforcement of a path  $dp_1$  iff the following formula is valid



At any instant, if  $dp_1$  is effective, then  $dp_2$  is effective too

## **Deadlock-consistency**

A path dp<sub>2</sub> is a deadlock-consistent with a path dp<sub>1</sub> if for any path dp'<sub>1</sub> such that

- **dp**<sub>1</sub> and **dp**'<sub>1</sub> form a cycle path,
- then for every path dp'<sub>2</sub> that forms a cycle path the following formula is valid

$$\models (\bigwedge_{i=0}^{n-1} \widehat{c_i} \land \bigwedge_{j=0}^{p-1} \widehat{c_j^{(inv)}}) \Leftrightarrow \texttt{false} \Rightarrow (\bigwedge_{k=0}^{m-1} \widehat{c_k'} \land \bigwedge_{l=0}^{q-1} \widehat{c_l^{(inv')}}) \Leftrightarrow \texttt{false}$$
$$\bigcirc \texttt{Clock constraint of (dp_1, dp'_1)} \qquad \texttt{Clock constraint of (dp_2, dp'_2)}$$

It indicates that if (dp1, dp'1) doesn't stand for a deadlock, then (dp2, dp'2) doesn't either

## Implementation with SMT



- **Construct** SDDG(A) and SDDG(C)
- Establish the formulas for checking the reinforcement and deadlock-consistency
- Check the validity of the checking formulas

#### Preservation of value-equivalence



### Synchronous data-flow value-graph (SDVG)

- Signal and clock computation is represented as a labeled directed graph
- Nodes are clocks, signals, variables, operators, or gated -node function
- Edges describe the computation relation between the nodes
- The computation of both Signal program and generated C code is represented by a shared graph

## SDVG of Signal

- For each signal x, its computation is represented as  $x = \phi(\hat{x}, \tilde{x}, \bot)$
- The computation of y in  $y:=x \; \texttt{default} \; z$  is graphically represented by

$$y = \phi(\hat{y}, \phi(\hat{x}, \tilde{x}, \tilde{z}), \bot); \hat{y} \Leftrightarrow (\hat{x} \lor \hat{z})$$



## SDVG of generated C

 The computation of the corresponding variable, denoted by x<sup>c</sup>, is implemented as follows

• This computation can be represented by the graph

$$x^c = \phi(C\_x, \widetilde{x^c}, \bot)$$

## SDVG translation validation: Normalizing

#### **Objective**

 Prove that for every output signal x and its corresponding variable x<sup>c</sup>, they have the same value

Principle

- Define a set of rewrite rules
- Apply the rewrite rules to each graph node individually
- When there is no more rules can be applied to resulting graph, maximized the shared nodes
- Terminate when there exists no more sharing or rewrite rules can be applied

# SDVG of DEC



## Normalize SDVG of DEC

#### There might have more than one normalization scenario



### Normalize SDVG of DEC



## Normalize SDVG of DEC: Final graph



#### Detected bugs: Multiple constraints on a clock

```
// P.SIG
| x ^{=} when (y <= 9)
| x ^{=} when (y >= 1)
// P_BASIC_TRA.SIG
| CLK_x := when (y <=
   9)
| CLK := when (y >= 1)
CLK_x ^= CLK
 CLK ^{=} XZX_24
// P_BOOL_TRA.SIG
| when Tick ^= C_z ^=
   C_CLK
when C_z ^= x ^= z
| C_z := y <= 9
 C_{CLK} := y >= 1
```

Cause: The synchronization between CLK and XZX\_24

- In P\_BASIC\_TRA, x might be absent when XZX\_24 is absent, which is not the case in P and P\_BOOL\_TRA
- XZX\_24 is introduced without declaration

#### **Detection:**

 $\Phi(P\_BOOL\_TRA) \not\sqsubseteq_{clk} \Phi(P\_BASIC\_TRA)$ 

## **Detected bugs: XOR operator**

```
// P.SIG
| b3 := (true xor true)
        and b1
// P_BASIC_TRA.SIG
...
| CLK_b1 := ^b1
| CLK_b1 ^= b1 ^= b3
| b3 := b1
...
```

Cause: wrong implementation of XOR operator

 In P\_BASIC\_TRA, true xor true is true

```
Detection:

\Phi(P\_BASIC\_TRA) \not\sqsubseteq_{clk} \Phi(P)
```

### Detected bugs: Merge with constant signal

```
// P.SIG
| y := 1 default x
// C code
if (C_y)
{
    y = 1; else y = x;
    w_P_y(y);
}
```

Cause: wrong implementation of merge operator with constant signal

In the generated C code, a syntax error y = 1; else y = x;

**Detection:** when constructing the SDVG graph



A method to formally verify the Signal compiler

- Adopts the translation validation
- Is light-weight, scalable, modular
- Separates the proof into three smaller and independent sub-proofs: clock semantic, data dependency, and value-equivalence preservations

## Future work

- Fully implementation of the validator: benchmarks and integration into Polychrony toolset
- Extended the proof to use with the other code generation schemes (e.g., modular and distributed code generations)
- Use an SMT solver to reason on the rewrite rules in SDVG transformations

## Publication

- Translation validation for clock transformations in a synchronous compiler - FASE - ETAPS 2015
- Precise deadlock detection for polychronous data-flow specifications ESLsyn DAC 2014
- Formal verification of synchronous data-flow program transformations towards certified compilers - FCS 2013
- Formal verification of compiler transformations on polychronous equations IFM 2012
- Formal verification of automatically generated C-code from polychronous data-flow equations HLDVT 2012

Thank you!