





# Accurate and Extensible Symbolic Execution of Binary Code based on Formal ISA Semantics

Sören Tempel, Tobias Brandt, Christoph Lüth, Christian Dietrich, Rolf Drechsler soeren.tempel@tu-braunschweig.de

## → Motivation

### **Symbolic Execution:** SMT-based software verification and testing technique

- Enumerates execution through the SUT by reasoning about branches
- Requires custom software simulation to operate on symbolic input values
- Goal: Employing symbolic execution for testing low-level binary code

```
(Y \ge 0 \land Y \le 10)
```

DIVU rd, rs1, rs2

 $(X \ge 3 \land X \le 5)$ 

#### **Challenges:**

- 1. Requires a *correct* symbolic implementation of ISA instruction semantics
- → Given the complexity of modern ISAs, a manual implementation is error-prone
- 2. The analysis must be easily extendable to support design space exploration

# Formal Specifications of ISA Semantics

Idea: Describe ISA in a formal, machine-readable language

- Advantageous for various use-cases, e.g. code generation, fault injection, ...
- Newer ISAs (e.g. RISC-V) provide an official formal specification
- All instructions are formally described in terms of several language primitives

## ∃ Formal Specifications of ISA Semantics

Idea: Describe ISA in a formal, machine-readable language

- Advantageous for various use-cases, e.g. code generation, fault injection, ...
- Newer ISAs (e.g. RISC-V) provide an official formal specification
- All instructions are formally described in terms of several language primitives

```
instrSemantics DIVU = do
  (rs1-val, rs2-val, rd) <- decodeAndReadRType

runIfElse (rs2-val `EqInt` 0x000000000)
  do $ WriteRegister rd 0xffffffff
  do $ WriteRegister rd (rs1-val `UDiv` rs2-val)</pre>
```

**Problem:** Formal specifications are not directly executable

- Requires maintaining compiler tooling for each use-case
- Executable formal specification have emerged recently
- → We facilitate our own prior work on LibRISCV

**Approach:** Implement symbolic execution as an ISA specification interpreter

Decode → ReadRegister → RunIfElse → UDiv → …

**Problem:** Formal specifications are not directly executable

- Requires maintaining compiler tooling for each use-case
- Executable formal specification have emerged recently
- → We facilitate our own prior work on LibRISCV

**Approach:** Implement symbolic execution as an ISA specification interpreter



**Problem:** Formal specifications are not directly executable

- Requires maintaining compiler tooling for each use-case
- Executable formal specification have emerged recently
- → We facilitate our own prior work on LibRISCV

**Approach:** Implement symbolic execution as an ISA specification interpreter



**Problem:** Formal specifications are not directly executable

- Requires maintaining compiler tooling for each use-case
- Executable formal specification have emerged recently
- $\hookrightarrow$  We facilitate our own prior work on LibRISCV

**Approach:** Implement symbolic execution as an ISA specification interpreter



## → Contribution: BINSYM

BINSYM: Prototype implementation of the proposed approach for the RISC-V ISA

- Maps Library language primitives to SMT bit-vector theory with Z3
- Provides symbolic implementations of the register file, memory, ...
- ⇒ Supports any custom instr. described in term of existing primitives



Source Code: https://github.com/agra-uni-bremen/binsym

- Provides an implementation of Dynamic Symbolic Execution (DSE)
- Written in roughly 1000 LOC in Haskell (excluding the formal model)!

- Empirical comparison with prior work on symbolic execution of RV32 binaries
- Research questions:
- **RQ1** Do we discover the same amount of paths as prior work?
- **RQ2** Does our work achieve competitive SE performance

- Empirical comparison with prior work on symbolic execution of RV32 binaries
- Research questions:

**RQ1** Do we discover the same amount of paths as prior work?

RQ2 Does our work achieve competitive SE performance?

- Empirical comparison with prior work on symbolic execution of RV32 binaries
- Research questions:

**RQ1** Do we discover the same amount of paths as prior work?

RQ2 Does our work achieve competitive SE performance?

| Benchmark      | angr  | BINSEC | SYMEX-VP | BINSYM |
|----------------|-------|--------|----------|--------|
| base64-encode  | 125   | 6250   | 6250     | 6250   |
| bubble-sort    | 720   | 720    | 720      | 720    |
| clif-parser    | 11424 | 11424  | 11424    | 11424  |
| insertion-sort | 5040  | 5040   | 5040     | 5040   |
| uri-parser     | 8194  | 8240   | 8240     | 8240   |

Table: Empirical comparison with prior work on synthetic benchmarks.

- Empirical comparison with prior work on symbolic execution of RV32 binaries
- Research questions:

**RQ1** Do we discover the same amount of paths as prior work?

RQ2 Does our work achieve competitive SE performance?

| Benchmark      | angr  | BINSEC | SYMEX-VP | BINSYM |
|----------------|-------|--------|----------|--------|
| base64-encode  | 125   | 6250   | 6250     | 6250   |
| bubble-sort    | 720   | 720    | 720      | 720    |
| clif-parser    | 11424 | 11424  | 11424    | 11424  |
| insertion-sort | 5040  | 5040   | 5040     | 5040   |
| uri-parser     | 8194  | 8240   | 8240     | 8240   |

Table: Empirical comparison with prior work on synthetic benchmarks.



## → Conclusion

#### **Key Insights:**

- 1. Executable formal semantics reduce manual effort and the margin for errors
- 2. Formal semantics ease extending the analysis to additional instructions
- 3. Execution speed with executable formal ISA specifications is competitive

**Future Work:** Formally prove the correctness of the symbolic semantics

 $\hookrightarrow$  Using theorem-prover definitions provided by prior work







# Accurate and Extensible Symbolic Execution of Binary Code based on Formal ISA Semantics

Sören Tempel, Tobias Brandt, Christoph Lüth, Christian Dietrich, Rolf Drechsler soeren.tempel@tu-braunschweig.de