Skip to content

Latest commit

 

History

History
115 lines (93 loc) · 3.99 KB

README.md

File metadata and controls

115 lines (93 loc) · 3.99 KB

AgdaBench

A tool for benchmarking compile-time (type-checking time) performance of Agda programs. It is implemented as an Agda backend, which means that it functions as a full Agda compiler, with additional arguments for the benchmarking. The benchmarking is done using the Criterion library.

Example

A criterion benchmark is created for each top-level value in the given Agda module whose name starts with bench, and calling the tool with an Agda file will run each of the benchmarks and measure the time it takes to reduce them to weak-head normal form (or normal form if the --nf flag is given). For instance, given the module

module Example where

open import Agda.Builtin.Nat
open import Agda.Builtin.List
open import Agda.Builtin.Equality

downFrom : Nat  List Nat
downFrom zero    = []
downFrom (suc n) = n ∷ downFrom n

sum-rec : List Nat  Nat
sum-rec []       = 0
sum-rec (x ∷ xs) = x + sum-rec xs

sum-acc : Nat  List Nat  Nat
sum-acc z [] = z
sum-acc z (x ∷ xs) = sum-acc (z + x) xs

sum-acc! : Nat  List Nat  Nat
sum-acc! z [] = z
sum-acc! 0 (x ∷ xs) = sum-acc! x xs
sum-acc! z (x ∷ xs) = sum-acc! (z + x) xs

n = 10000
bench-rec  = sum-rec    (downFrom n)
bench-acc  = sum-acc  0 (downFrom n)
bench-acc! = sum-acc! 0 (downFrom n)

we can invoke agda-bench as

$ agda-bench Example.agda
Benchmarks:
  bench-rec = 49995000
  bench-acc = 49995000
  bench-acc! = 49995000

benchmarking bench-rec
time                 16.71 ms   (16.12 ms .. 17.60 ms)
                     0.986 R²   (0.968 R² .. 0.997 R²)
mean                 17.19 ms   (16.78 ms .. 17.76 ms)
std dev              1.203 ms   (822.0 μs .. 1.709 ms)
variance introduced by outliers: 29% (moderately inflated)

benchmarking bench-acc
time                 31.69 ms   (30.80 ms .. 32.71 ms)
                     0.997 R²   (0.994 R² .. 0.999 R²)
mean                 32.35 ms   (31.67 ms .. 33.92 ms)
std dev              2.064 ms   (1.061 ms .. 3.637 ms)
variance introduced by outliers: 22% (moderately inflated)

benchmarking bench-acc!
time                 10.61 ms   (10.52 ms .. 10.71 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 10.83 ms   (10.73 ms .. 10.93 ms)
std dev              266.5 μs   (203.0 μs .. 381.2 μs)

Individual benchmarks can be run by giving the -B name-of-benchmark flag. The -B flag passes along its argument to the Criterion driver. Use -B --help for more information.

To benchmark an expression that isn't bound to a top-level bench definition you can use the -C EXPR flag:

$ agda-bench Example.agda -C "sum-rec (downFrom 100)"
benchmarking sum-rec (downFrom 100)
time                 107.0 μs   (105.5 μs .. 108.7 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 107.6 μs   (106.7 μs .. 109.0 μs)
std dev              3.532 μs   (2.379 μs .. 4.833 μs)
variance introduced by outliers: 32% (moderately inflated)

Instead to measuring evaluation time you can measure type checking time of an expression using the -T (or --type-check) flag:

$ agda-bench Example.agda -T "refl : sum-rec (downFrom 100) ≡ 4950"
time                 252.0 μs   (248.6 μs .. 255.0 μs)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 249.7 μs   (247.1 μs .. 253.3 μs)
std dev              9.637 μs   (7.431 μs .. 13.56 μs)

The full list of options (not including the Criterion options) are:

benchmark backend options
  -B ARGS         --bench-options=ARGS      Benchmarking options. Use -B --help for more information.
  -s EXPR         --single=EXPR             Evaluate a single expression
  -C EXPR         --custom=EXPR             Add a custom benchmark for EXPR
  -T EXPR : TYPE  --type-check=EXPR : TYPE  Add a custom benchmark to type check EXPR against TYPE
  -n              --nf                      Full normalisation instead of weak-head reduction
                  --call-by-name            Use call-by-name