Log
Log replication utilizes the `msg` in `IORequest:Send`; a log entry value looks like below:
function RaftLogEntryValue:new(query_type, collection, query, index, serverId)
    local o = {
        query_type = query_type,
        collection = collection:ToData(),
        query = query,
        -- cb_index identifies the client callback waiting for the response
        cb_index = index,
        -- serverId is the server that should receive the response
        serverId = serverId,
    };
    setmetatable(o, self);
    self.__index = self;
    return o;
end
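For illustration, constructing a log entry value for a hypothetical `insertOne` call could look like this (the `index`/`serverId` arguments are assumed from the fields the constructor stores; all concrete values are made up):

```lua
-- Illustrative only: build the log entry value that would be
-- replicated for an insertOne on a "User" collection.
local entry = RaftLogEntryValue:new(
    "insertOne",                    -- query_type: collection method to invoke on commit
    userCollection,                 -- collection object; its :ToData() form is stored
    { query = { name = "LXZ" } },   -- query table forwarded to the state machine
    1,                              -- index -> cb_index, matches the response to the caller
    1                               -- serverId: where RTDBRequestRPC sends the response
);
-- entry would then be serialized (cf. RaftLogEntryValue:fromBytes in commit)
-- and appended to the Raft log.
```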
and the commit in the state machine looks like below:
--[[
 * Commit the log data at the {@code logIndex}
 * @param logIndex the log index in the logStore
 * @param data the log entry value bytes
]]--
function RaftTableDB:commit(logIndex, data)
    -- data is logEntry.value
    local raftLogEntryValue = RaftLogEntryValue:fromBytes(data);
    -- the collection object is restored from raftLogEntryValue.collection (omitted here)
    local cbFunc = function(err, data)
        local msg = {
            err = err,
            data = data,
            cb_index = raftLogEntryValue.cb_index,
        };
        -- send the response back to the originating server
        RTDBRequestRPC(nil, raftLogEntryValue.serverId, msg)
    end;
    collection[raftLogEntryValue.query_type](collection, raftLogEntryValue.query.query,
        raftLogEntryValue.query.update or raftLogEntryValue.query.replacement,
        cbFunc);
    self.commitIndex = logIndex;
end
The client interface remains unchanged; we provide `script/TableDB/RaftSqliteStore.lua`. This also requires adding a `StorageProvider:SetStorageClass(raftSqliteStore)` method to `StorageProvider`. In each interface, `RaftSqliteStore` sends the log entry above to the Raft cluster; consistency levels could also be considered. Like the original interfaces, callbacks are implemented asynchronously, but we make `connect` synchronous to alleviate the effect.
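A minimal sketch of how a client might switch TableDB to the Raft store, assuming the `SetStorageClass` method is added as described (the `insertOne` call follows the original TableDatabase client API, which stays unchanged; `db` is a TableDatabase instance):

```lua
-- Sketch, not the actual wiring: enable the Raft-backed store.
NPL.load("(gl)npl_mod/TableDB/RaftSqliteStore.lua");
local RaftSqliteStore = commonlib.gettable("TableDB.RaftSqliteStore");

-- SetStorageClass is the new method this change adds to StorageProvider.
StorageProvider:SetStorageClass(RaftSqliteStore);

-- The client interface is unchanged: callbacks remain asynchronous;
-- only the initial connect is made synchronous.
db.User:insertOne(nil, { name = "LXZ", password = "123" }, function(err, data)
    -- invoked once the corresponding log entry is committed by the cluster
end);
```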
Like rqlite, we use SQLite's Online Backup API to make snapshots.
The Raft core logic is in `RaftServer`. TableDB is implemented as a Raft state machine; see `RaftTableDBStateMachine`.
The flows below follow Figure 1 in the Raft thesis, which describes the replicated state machine architecture.
Server side:
- TableDBApp:App -> create RpcListener && TableDB StateMachine -> set RaftParameters -> create RaftContext -> RaftConsensus.run
- RaftConsensus.run -> create RaftServer -> start TableDB StateMachine -> RpcListener:startListening
- RaftServer: has several PeerServers, and sends heartbeats at a small random interval when it is the leader
- RpcListener: at the Raft level, routes messages to RaftServer
- TableDB StateMachine: at the TableDB level, calls messageSender to send messages to RaftServer

Client side:
- TableDBApp:App -> create RaftSqliteStore && TableDB StateMachine -> create RaftClient -> RaftSqliteStore:setRaftClient -> send commands to the cluster
- commands: appendEntries, addServer, removeServer
- send commands to the cluster: will retry on failure
- RaftSqliteStore: uses RaftClient to send commands to the cluster and handles responses
- TableDB StateMachine: sends the response to the client when the command is committed
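The server-side flow above can be sketched as follows (constructor names follow the flow; parameter lists and values are assumptions, not the exact NPL API):

```lua
-- Illustrative sketch of TableDBApp:App following the flow above.
function TableDBApp:App(baseDir, serverId)
    -- create the Raft-level RPC listener and the TableDB state machine
    local rpcListener = RpcListener:new(baseDir, serverId);
    local stateMachine = RaftTableDBStateMachine:new(baseDir);

    -- set RaftParameters (election/heartbeat timing etc.; values made up)
    local params = RaftParameters:new();
    params.heartbeatInterval = 100;

    -- create the RaftContext and run: RaftConsensus creates the RaftServer,
    -- starts the state machine, and then RpcListener:startListening
    local context = RaftContext:new(rpcListener, stateMachine, params);
    RaftConsensus.run(context);
end
```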
Below is the structure of a server directory:
.
├── assets.log
├── cluster.json
├── config.properties
├── log.txt
├── server.state
├── snapshot
│ ├── 12-1_User_s.db
│ ├── 12.cnf
│ ├── 18-1_User_s.db
│ ├── 18.cnf
│ ├── 24-1_User_s.db
│ ├── 24.cnf
│ ├── 6-1_User_s.db
│ └── 6.cnf
├── store.data
├── store.data.bak
├── store.idx
├── store.idx.bak
├── store.sti
├── store.sti.bak
└── temp
└── test_raft_database
├── User.db
├── User.db-shm
└── User.db-wal
- `assets.log` and `log.txt` are NPL's log files.
- `cluster.json` is the cluster servers' information, see below.
- `config.properties` is this server's id.
- `server.state` stores this server's Raft state, which is `term`, `commitIndex`, `votedFor`.
- `snapshot` is the directory containing the snapshots of the state machine. Each snapshot consists of 2 files:
  - `$logIndex-$term_$collection_s.db` is the snapshot data of the state machine.
  - `$logIndex.cnf` is the cluster configuration at that `logIndex`.
- `store.data` stores this server's Raft log entry data, and `store.data.bak` is its backup.
- `store.idx` stores the index data of `store.data`, i.e. the file position of each log entry in `store.data`; `store.idx.bak` is its backup.
- `store.sti` stores this server's log start index, and `store.sti.bak` is its backup.
- `temp/test_raft_database` is the rootFolder of TableDatabase.
Note: we may move some of the state above into SQLite, like actordb does.
To use TableDB Raft, we need to add a `tabledb.config.xml` file in the TableDB `rootFolder`.

Note: `npl_package/main` should not have the `rootFolder`, otherwise it will affect the application's `rootFolder` and its config. Also take care with the `Development directory` and `WorkingDir`.
<tabledb>
<providers>
<provider type="TableDB.RaftSqliteStore" name="raft" file="(g1)npl_mod/TableDB/RaftSqliteStore.lua">./,localhost,9004,4,rtdb
</provider>
</providers>
<tables>
<table provider="raft" name="User1"/>
<table provider="raft" name="User2"/>
<table provider="raft" name="default" />
</tables>
</tabledb>
Above is an example of the Raft provider config.

- A `provider` has 4 configs:
  - `type` is used in `commonlib.gettable(provider.attr.type)`.
  - `file` is used in `NPL.load(provider.attr.file)`.
  - `name` is the name of the provider, which is used in the `table` config.
  - `values` (the element text) is the init args used in the provider `type`'s init method.
- A `table` has 2 configs:
  - `provider` is the name of the provider, corresponding to the `name` in a `provider`.
  - `name` is the table name. If the name is `default`, the corresponding provider will be set as the default provider of the database. If `default` is not set, the default provider will be `sqlite`.
The values (args) above are used for the Raft init and are separated by `,`:

- `./` is the working directory for the node. The working directory should contain the `cluster.json` file described below.
- `localhost` is the hostname of the node.
- `9004` is the port number for the node.
- `rtdb` is the name of the thread running the `RaftServer`.
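The comma-separated `values` string could be split like this (an illustrative helper, not the actual RaftSqliteStore code; the fourth field `4` in the example string is left uninterpreted, as above):

```lua
-- Illustrative helper: split the provider init-arg string
-- "./,localhost,9004,4,rtdb" into its fields.
local function parseInitArgs(values)
    local fields = {};
    for field in string.gmatch(values, "[^,]+") do
        fields[#fields + 1] = field;
    end
    return {
        baseDir    = fields[1],           -- working directory, e.g. "./"
        host       = fields[2],           -- hostname, e.g. "localhost"
        port       = tonumber(fields[3]), -- port number, e.g. 9004
        threadName = fields[#fields],     -- RaftServer thread name, e.g. "rtdb"
    };
end
```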
At start-up, each server should know all of its peers; this is a JSON config file like below (`setup/init-cluster.json`, or the `cluster.json` in the node's working directory):
{
"logIndex": 0,
"lastLogIndex": 0,
"servers":[
{
"id": 1,
"endpoint": "tcp://localhost:9001"
},
{
"id": 2,
"endpoint": "tcp://localhost:9002"
},
{
"id": 3,
"endpoint": "tcp://localhost:9003"
}
]
}
There are several scripts (`setup.bat`, `addsrv.bat`, `stopNPL.bat`) to facilitate deployment; see the `setup` folder.
The communication uses `npl_mod/Raft/Rpc.lua`, including `RaftRequestRPC` and `RTDBRequestRPC`. `RaftRequestRPC` is used at the Raft level and `RTDBRequestRPC` at the TableDB level. Both are used in a full-duplex way, that is, not only to send requests but also to receive responses.

- For `RaftRequestRPC`, see `RpcListener:startListening` and `PeerServer:SendRequest`.
- For `RTDBRequestRPC`, see `RaftTableDBStateMachine:start`, `RaftClient:tryCurrentLeader` and `RaftTableDBStateMachine:start2`.
- A snapshot is made for every collection in TableDB, that is, one collection, one snapshot. Each collection can have multiple snapshots at different log indices. Snapshots are not deleted at the moment (should we?).
- Snapshots are created in a different thread; see `RaftTableDBStateMachine:createSnapshot`. If a snapshot fails to be created, we try to copy the previous snapshot to the current log index.
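Given the snapshot naming scheme from the directory listing (`$logIndex-$term_$collection_s.db`), the fallback copy could look like this (illustrative; `copy_file` stands in for whatever file utility the implementation uses):

```lua
-- Illustrative fallback: if creating a snapshot at currIndex fails,
-- reuse the previous snapshot by copying it to the current log index.
local function fallbackToPrevSnapshot(snapshotDir, prevIndex, currIndex, term, collection)
    local fmt = "%s/%d-%d_%s_s.db";
    local prev = string.format(fmt, snapshotDir, prevIndex, term, collection);
    local curr = string.format(fmt, snapshotDir, currIndex, term, collection);
    copy_file(prev, curr);  -- assumed file-copy helper
end
```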