Skip to content

Merkle Root

roconnor-blockstream edited this page Apr 11, 2023 · 6 revisions

WORK IN PROGRESS

When we regard a Simplicity expression as a tree (or, more concisely as, a DAG), we can view it as a Merkle Tree and recursively hash subexprssions, using the Merkle Root of those sub expressions to compute the Merkle Root of the whole expression. Computing a Merkle Root offers two advantages over simply serializing a Simplicity expression in a linear format an simply hashing that.

  1. It allows for pruning of case expressions. Thanks to the Merkle structure we need only those branches that are involved in execution and leave evaluated branches hidden. This both increases privacy, and usually reduces the amount of on-chain data that needs to be revealed in a Blockchain context.
  2. It naturally identifies repeated subexpressions. Merkle roots allow one to transparently identify repeated subexpressions in an expression because, by definition, they have the same root.

Commitment Merkle Root

The Commitment Merkle Root (CMR) is one mode of hashing a Simplicity expression for inclusion inside a tapleaf and is “committed to” by an address. This hashing mode specifically excludes some parts of Simplicity expressions:

  • The content of witness expressions are excluded.
  • The right subexpresssion of disconnect expressions are excluded.
  • All typing annotations are excluded. (However jet expressions which always include the type of the jet’s inputs and outputs as part of the jet’s specification).

Witness content is excluded since this data is intended to be malleable and determined at redemption time. For example, signatures would be part of witness content.

Similarly, the right subexpression of disconnect is excluded because this subexpression is determined at redemption time in order to support delegation.

Typing information is excluded because the inferred principal types of subexpressions changes after pruning unused branches from case expressions. These changes can only generalize the inferred types and cannot affect the results of evaluation.

CMR specification

The CMR for basic expressions and combinators is defined as a BIP-0340 tagged hash midstate.

The tag is the string ”Simplicity- <version> \x1F Commitment \x1F <combinator-name>”, where \x1F is the ASCII character for “unit separator” and the current Simplicity version is ”Draft”. For example, we would use the string ”Simplicity-Draft\x1FCommitment\x1Fcase” as the tag for the case combinator.

For basic expressions without children we the CMR is the SHA-256 midstate that just includes the BIP-0340 tag:

SHA256_midstate(SHA256(tag)·SHA256(tag))

For combinators with one child, the CMR is the SHA-256 midstate of the BIP-0340 tag plus a block consisting of 32 0x00 bytes followed by the child’s CMR:

SHA256_midstate(SHA256(tag)·SHA256(tag)·32×'0x00'·CMR(child))

⚠ Note that for the purposes of the CMR, the ‘disconnect’ combinator is considered to have one child.

For combinators with two childeren, the CMR is the SHA-256 midstate of the BIP-0340 tag plus a block consisting of the first child’s CMR followed by the second child’s CMR.

SHA256_midstate(SHA256(tag)·SHA256(tag)·CMR(child1)·CMR(child2))

We use midstates to avoid the extra work that would otherwise be needed if we used full SHA-256 with padding. Because of our use of tags, there is no risk of length extension attacks in our application, and thus padding is not required.

CMR examples