Implementation available at https://github.com/yahoo/bftkv
Reliable data storage is one of the fundamental problems.
BFTKV uses b-masking quorum based read
/write
operations to
ensure Byzantine fault-tolerance and GPG's Web of Trust
mechanism to build trust relationships between entities. Trust relationships are used to build quorums.
Moreover, BFTKV provides the following guarantees:
- Value corresponding to the key is up to date and not forged
- Entities trying to deceive users will be revoked immediately
- Entities can join and leave the system dynamically
- Communications between entities are encrypted using public keys
BFTKV leverages integration of three concepts to provide a Byzantine fault tolerant distributed key-value storage:
- Byzantine Quorum Systems
- Web of Trust
- Quorum Certificate
In this document, we will first describe PGP's Web of Trust mechanism and then build the other concepts on top of it.
Web of Trust is a way of building trust between entities without a central authority, unlike Public Key Infrastructure (PKI). Trust is established by signing public keys (implies the signer trusts the owner of the signed public key). A Web of Trust is created by exchanging signed keys between entities.
Trust relationships can be represented with a graph, such as this:
The graph can be transcribed as:
Alice trusts Bob and Erin
Bob trusts Erin and Dave
Carol trusts Bob and Erin
Erin trusts Alice, Carol and Dave
Web of Trust mechanism plays a huge role in BFTKV's quorum selection mechanism.
In a network system, servers might be inaccessible or return wrong/not up to date data. Byzantine failure refers to both of these failures and the naming is based on The Byzantine General's Problem.
BFTKV uses b-masking quorum
s to tolerate Byzantine failures where b
is the number of failure nodes.
b-masking quorum
s are due to Malkhi and Reiter (Paper).
Castro and Liskov (Paper) introduced
a Byzantine fault tolerant replication mechanism that expects f+1
responses
from the servers to verify the data, where f
is the number of faulty nodes. We use the Web of Trust mechanism to specify
the nodes that their responses will be accepted by a quorum member.
The following parts of this document deals with how previously discussed concepts are used in BFTKV.
Quorum selection is based on the trust graph built using the Web of Trust mechanism. BFTKV, usually, chooses the maximal cliques that are
L
hops away from the clients. For example,
Client1 has two cliques: Clique 1 and Clique 2 where L=1
Client2 has two cliques: Clique 2 and Clique 3 where L=1
The Write
procedure saves a value associated with a key in the system. A high-level pseudocode for the procedure:
- Choose a quorum.
- Get times for the key from quorum members.
- Pick a new time that is higher than the maximum time returned by the quorum members.
- Request and gather signatures from quorum members for new value for the key with the new timestamp.
- Choose another quorum that includes the first quorum.
- Write the key, value and signature set to the new quorum members.
The read
procedure reads a value associated with a key in the system. A high level pseudocode for the procedure:
- Choose a quorum.
- Collect values associated with the key
- Revoke signers who signed different values with the same timestamp.
- Return value having signatures more than the number of faulty nodes and has the maximum timestamp.
In this section, we will go over somewhat unclear points in the chapters we discussed read
/write
operations.
- The quorum
Q
chosen here should have the property|Q| >= 3b + 1
whereb
is the number of faulty nodes. This is required for Byzantine fault tolerance sincef
nodes may be inaccessible andf
nodes may be returning a previous value for the key. The remaining honestf + 1
nodes will keep the system in a safe condition. AsQ
, BFTKV uses a maximal clique that a client is connected to in the trust graph. - Number of timestamps should be greater than or equal to
2b + 1
since we will tolerateb
inaccessible nodes. -
- The number of signatures
m
should be greater thanb + (n - b) / 2
. Please see the security analysis for details. - All nodes may be chosen which is BFTKV's current strategy.
- Before writing, each server verifies the signature, checks the number of valid signatures gathered from quorum members and accepts write if the number
is greater than
b + (n - b) / 2
. Moreover, every server makes sure that they haven't signed this key with the same timestamp before. write
operation succeeds if the received acks from server is greater than2f + 1
.
- BFTKV chooses a random quorum
Q
that has the property|Q| > b + (n - b) / 2
. - Collect pairs up to
2f + 1
. The reason is the same withwrite
operation second explanation. - A server should not sign the same key, same timestamp and a different value. This is equivocation attack. It is very important that servers revoke these nodes at this phase for the system to survive.
- To return a value for the key, the value should have at least
b + 1
signatures, which guarantees that the value is valid, and have the maximum timestamp.
Equivocation Attack: An adversary can try to create two different views of a quorum by trying to store different values for a key in half of the quorum and
another value in the other half (Half is the best option from an attacker's perspective if he wants to succeed). With the help of the b
faulty nodes, the basic check for
b + 1
signatures will succeed. However, if the quorum size is chosen carefully, this can be prevented.
Consider the node states below (n
= the number of nodes in the quorum, b
= faulty nodes in the quorum):
Maximum number of signatures an attacker can get is b + (n - b) / 2
. To make sure that the majority has the correct value n - b > b + (n - b) / 2
should hold. Therefore n
should be greater than 3b
.
Let Fp
define the failure probability of an adversary (i.e., he won't be detected); H1
and H2
honest node sets, F
the set of faulty nodes and N = H1 U H2 U F
.
Then the nodes chosen from the quorum Q
should be either from H1 U F
or H2 U F
to prevent detection. This probability is
Fp ~= 1 - ((|F| + |N|) / 2|N|)^|Q|
In a reqular quorum system after if the number of faulty nodes exceed n/3
trust to data drops down to 0. BFTKV can keep the adversary's failure probability
close to 1 for more than f
failing nodes. Below two graphs represent this: