An experimental database written in Rust, based on the example in Chapter 7 of the book Rust in Action.
It also borrows ideas from Chapter 3 of the book Designing Data-Intensive Applications.
For now, it exposes a REST API that accepts JSON as input.
PUT and POST accept objects. Each object needs an id and at least one other field; otherwise, a 400 error is returned with information about the problem.
GET and DELETE accept a string, a number, or an object with an id.
After cloning this repo, you can run:
➜ rustdb git:(master) cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/rustdb_rest`
Loading database...
Database ready at 7887
As shown above, the server listens on port 7887. It also creates a directory called storage
inside the application's folder. You can test the database by inserting some data:
➜ rustdb git:(master) curl --request POST \
  --url http://localhost:7887/ \
  --header 'content-type: application/json' \
  --data '{ "id": "1237", "name": "Lucas", "email": "[email protected]" }'
And you can request data with a GET request:
➜ rustdb git:(master) curl --request GET \
  --url http://localhost:7887/ \
  --header 'content-type: application/json' \
  --data 1237
{"email":"[email protected]","id":"1237","name":"Lucas"}%
When the server starts, it spawns a separate thread to compress the log. This guarantees that the database occupies the smallest number of log files that can represent all the data. The process runs every 5 seconds, creating and deleting log files in the storage folder.
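A background job like this can be sketched with the standard library alone. The function name `spawn_periodic`, the stop flag and the closure are hypothetical, chosen for illustration; the actual rest_api code may be organized differently.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Runs `task` on a separate thread, once per `interval`, until `stop` is set.
/// Hypothetical helper: in RustDB the task would be the log compression step
/// and the interval would be 5 seconds.
pub fn spawn_periodic<F>(interval: Duration, stop: Arc<AtomicBool>, mut task: F) -> JoinHandle<()>
where
    F: FnMut() + Send + 'static,
{
    thread::spawn(move || loop {
        task();
        // Check the flag after each run so the thread can be shut down cleanly.
        if stop.load(Ordering::Relaxed) {
            break;
        }
        thread::sleep(interval);
    })
}
```

With `Duration::from_secs(5)` as the interval this matches the schedule described above. Setting the flag and joining the handle stops the thread after its current run.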
RustDB is a simple key/value store with a single collection and persisted data. The keys are kept in memory in a hash map. The values are stored in log files split into data segments. Each time you request a key, it gets the file position from the hash map and loads the value from disk to return it.
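The lookup path can be illustrated with a minimal sketch. The `Store` type, its methods and the length-prefixed value layout are assumptions made for the example, not RustDB's actual types; any `Read + Write + Seek` value (a file, or an in-memory `Cursor`) can act as the log.

```rust
use std::collections::HashMap;
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Hypothetical store: the hash map holds, for each key, the byte offset
/// of its latest value inside the log.
pub struct Store<F: Read + Write + Seek> {
    log: F,
    index: HashMap<Vec<u8>, u64>,
}

impl<F: Read + Write + Seek> Store<F> {
    pub fn new(log: F) -> Self {
        Store { log, index: HashMap::new() }
    }

    /// Appends a length-prefixed value at the end of the log and
    /// points the in-memory index at the new entry.
    pub fn put(&mut self, key: &[u8], value: &[u8]) -> io::Result<()> {
        let offset = self.log.seek(SeekFrom::End(0))?;
        self.log.write_all(&(value.len() as u32).to_le_bytes())?;
        self.log.write_all(value)?;
        self.index.insert(key.to_vec(), offset);
        Ok(())
    }

    /// Finds the offset in the hash map, then reads the value from the log.
    pub fn get(&mut self, key: &[u8]) -> io::Result<Option<Vec<u8>>> {
        let offset = match self.index.get(key) {
            Some(&offset) => offset,
            None => return Ok(None),
        };
        self.log.seek(SeekFrom::Start(offset))?;
        let mut len = [0u8; 4];
        self.log.read_exact(&mut len)?;
        let mut value = vec![0u8; u32::from_le_bytes(len) as usize];
        self.log.read_exact(&mut value)?;
        Ok(Some(value))
    }
}
```

Writing the same key twice leaves both entries in the log, but `get` only returns the newer one, because the index was overwritten.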
The log file contains, for each record:
- Checksum
- Key length
- Value length
- Key data
- Value data
This way, we can guarantee that the database will not deliver corrupted data. The data segments are filled in an append-only way, allowing very fast inserts. When you update a record, a new entry is created at the end of the log file and the hash map index for that key is updated in memory.
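As an illustration of the record layout listed above, here is a minimal sketch. The field widths (three little-endian u32 values), the use of CRC-32 and the choice to checksum the key and value bytes together are assumptions; the README does not state which checksum or widths RustDB uses.

```rust
/// Size of the assumed header: checksum + key length + value length, 4 bytes each.
pub const HEADER_LEN: usize = 12;

/// Bitwise CRC-32 (IEEE polynomial), written out to avoid external crates.
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Serializes one record as: checksum, key length, value length, key, value.
pub fn encode_record(key: &[u8], value: &[u8]) -> Vec<u8> {
    let mut payload = Vec::with_capacity(key.len() + value.len());
    payload.extend_from_slice(key);
    payload.extend_from_slice(value);

    let mut record = Vec::with_capacity(HEADER_LEN + payload.len());
    record.extend_from_slice(&crc32(&payload).to_le_bytes());
    record.extend_from_slice(&(key.len() as u32).to_le_bytes());
    record.extend_from_slice(&(value.len() as u32).to_le_bytes());
    record.extend_from_slice(&payload);
    record
}

/// Parses one record and rejects it if the stored checksum does not match.
pub fn decode_record(bytes: &[u8]) -> Result<(Vec<u8>, Vec<u8>), String> {
    if bytes.len() < HEADER_LEN {
        return Err("record is shorter than its header".to_string());
    }
    let read_u32 = |at: usize| {
        u32::from_le_bytes([bytes[at], bytes[at + 1], bytes[at + 2], bytes[at + 3]])
    };
    let stored_checksum = read_u32(0);
    let key_len = read_u32(4) as usize;
    let value_len = read_u32(8) as usize;

    let end = HEADER_LEN as u64 + key_len as u64 + value_len as u64;
    if (bytes.len() as u64) < end {
        return Err("record is truncated".to_string());
    }
    let payload = &bytes[HEADER_LEN..end as usize];
    if crc32(payload) != stored_checksum {
        return Err("checksum mismatch".to_string());
    }
    let (key, value) = payload.split_at(key_len);
    Ok((key.to_vec(), value.to_vec()))
}
```

Flipping any bit of the key or value after encoding makes `decode_record` return an error instead of the damaged data.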
Due to the append-only nature of writes, log files grow fast, accumulating many old versions of each key. We break each file into 3MB chunks in a struct called DataSegment. Besides the record structure, each data segment log file contains its name in the first 8 bytes and a reference to the next segment in the following 8 bytes.
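A sketch of that 16-byte header follows. The `SegmentHeader` type, the big-endian byte order, the use of 0 for "no next segment" and the reading of 3MB as 3 * 1024 * 1024 bytes are all assumptions for illustration; the real DataSegment may differ.

```rust
/// Assumed header size: 8 bytes of name followed by 8 bytes of next-segment reference.
pub const SEGMENT_HEADER_LEN: usize = 16;

/// Assumed segment size limit (3MB read as 3 * 1024 * 1024 bytes).
pub const SEGMENT_SIZE: u64 = 3 * 1024 * 1024;

/// Hypothetical in-memory view of a data segment header.
#[derive(Debug, PartialEq)]
pub struct SegmentHeader {
    pub name: u64,
    /// Name of the next segment; 0 is treated here as "no next segment".
    pub next: u64,
}

impl SegmentHeader {
    /// Serializes the header as name (8 bytes) then next (8 bytes).
    pub fn to_bytes(&self) -> [u8; SEGMENT_HEADER_LEN] {
        let mut out = [0u8; SEGMENT_HEADER_LEN];
        out[..8].copy_from_slice(&self.name.to_be_bytes());
        out[8..].copy_from_slice(&self.next.to_be_bytes());
        out
    }

    /// Reads the header back from the first 16 bytes of a segment file.
    pub fn from_bytes(bytes: &[u8; SEGMENT_HEADER_LEN]) -> Self {
        let mut name = [0u8; 8];
        let mut next = [0u8; 8];
        name.copy_from_slice(&bytes[..8]);
        next.copy_from_slice(&bytes[8..]);
        SegmentHeader {
            name: u64::from_be_bytes(name),
            next: u64::from_be_bytes(next),
        }
    }
}

/// Returns true if a record of `record_len` bytes still fits in a segment
/// that currently holds `current_len` bytes.
pub fn fits(current_len: u64, record_len: u64) -> bool {
    current_len + record_len <= SEGMENT_SIZE
}
```

When `fits` returns false, a writer would start a new segment and store its name in the `next` field of the current one, which is how the segments form a chain.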
The storage directory contains a file called initial_segment
that contains 8 bytes pointing to the first data segment. A segment name is a u64 value, formatted with {:016x}
as a hex string to produce the file name.
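The naming scheme can be shown in a few lines. The function names are hypothetical; only the {:016x} format comes from the description above.

```rust
/// Turns a segment name into its file name: 16 lowercase hex digits, zero-padded.
pub fn segment_file_name(name: u64) -> String {
    format!("{:016x}", name)
}

/// Recovers the segment name from a file name, or None if it is not valid hex.
pub fn parse_segment_file_name(file_name: &str) -> Option<u64> {
    u64::from_str_radix(file_name, 16).ok()
}
```

For example, `segment_file_name(255)` yields `00000000000000ff`. Padding every name to 16 digits keeps all file names the same length.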
To deal with the ever-growing log files, we have a struct called LogCompressor
that takes a list of segments and recreates the database without duplicates. This way, we can remove the old segments and update the reference in the initial_segment
file. In the rest_api implementation, we run this compression function every 5 seconds.
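The core idea of the compression step, keeping only the newest value of each key, can be sketched as follows. This is a simplified in-memory model with hypothetical names (`Record`, `compact`), not the LogCompressor implementation: it assumes segments are given oldest first and ignores deletes and file handling.

```rust
use std::collections::BTreeMap;

/// A key/value pair as it would appear in a segment (simplified).
pub type Record = (Vec<u8>, Vec<u8>);

/// Merges segments, ordered oldest first, into one list with a single
/// entry per key. Later writes overwrite earlier ones.
pub fn compact(segments: &[Vec<Record>]) -> Vec<Record> {
    let mut latest: BTreeMap<&[u8], &[u8]> = BTreeMap::new();
    for segment in segments {
        for (key, value) in segment {
            // Inserting again replaces the older value for the same key.
            latest.insert(key.as_slice(), value.as_slice());
        }
    }
    latest
        .into_iter()
        .map(|(key, value)| (key.to_vec(), value.to_vec()))
        .collect()
}
```

A BTreeMap is used here only to make the output order deterministic. Once the compacted records are written to new segments, the old ones can be deleted and initial_segment can be pointed at the new chain.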
RustDB has just a few acceptance tests covering DataSegments, LogCompression and basic database operations. All tests perform real I/O, creating and deleting storage folders.
If you want to try some volume, you can use the JMeter tests configured inside the jmeter
folder. They use a CSV file inside load_test. There are currently 2 tests: one for writing and reading, and a second one that only reads data from the database.
➜ jmeter git:(master) jmeter -n -t test_read_write.jmx -l result01.jtl -e -o ./result01
➜ jmeter git:(master) jmeter -n -t test_read.jmx -l result02.jtl -e -o ./result02