talna

Ilocano: Peace, Serenity

Icelandic, Old Norse: Numbers

A simple, embeddable time series database.

About

It uses https://github.com/fjall-rs/fjall as its underlying storage engine, allowing roughly 1M data points per second to be ingested.

Because the storage engine is LSM-based, write ingestion speed does not degrade (even for datasets much larger than RAM), write amplification is low (good for SSDs), and on-disk data is compressed (again, good for SSDs).

Data model

The tagging and querying mechanism is modelled after Datadog's metrics service (https://www.datadoghq.com/blog/engineering/timeseries-indexing-at-scale/).

A (time) series is a list of data points.

Each data point has

a nanosecond timestamp, which is also its primary key (stored negated in big-endian byte order, because we want to scan from newest to oldest, and forward scans are faster than reverse scans)
the actual value (float)
a tagset (list of key-value pairs, e.g. service=db; env=prod)
a metric name (e.g. cpu.total)
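To illustrate why the timestamp is stored negated and big-endian, here is a minimal sketch. It assumes timestamps are `u64` nanoseconds and that the storage engine orders keys by their raw bytes; `encode_ts` and `decode_ts` are hypothetical helpers, not part of talna's API.

```rust
// Hypothetical key encoding: newer timestamps must sort first in byte order,
// so that an ascending (forward) scan returns the newest data point first.
fn encode_ts(ts_nanos: u64) -> [u8; 8] {
    // Bitwise NOT equals u64::MAX - ts, so larger (newer) timestamps
    // become smaller numbers.
    let negated = !ts_nanos;
    // Big-endian makes lexicographic byte order match numeric order.
    negated.to_be_bytes()
}

// Reverses the encoding above.
fn decode_ts(key: [u8; 8]) -> u64 {
    !u64::from_be_bytes(key)
}

fn main() {
    let mut keys = vec![encode_ts(10), encode_ts(30), encode_ts(20)];
    // Simulates the storage engine keeping keys in ascending byte order.
    keys.sort();
    let scanned: Vec<u64> = keys.into_iter().map(decode_ts).collect();
    println!("{scanned:?}"); // [30, 20, 10]: newest first
}
```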

A Database is contained in a single Fjall Keyspace and consists of a couple of partitions (prefixed by _talna#). This way, it can be integrated into an existing application that already uses Fjall.

Every distinct combination of { metric, tagset } is assigned a SeriesKey, which maps to a Series ID.
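A minimal sketch of the idea, assuming the SeriesKey is built from the metric name plus the tagset sorted by key, so that tag order does not matter. The key layout (`metric#k:v;k:v`), `series_key` and `SeriesRegistry` are hypothetical; talna's actual encoding and ID allocation may differ, and a real registry would persist the mapping in a partition instead of an in-memory `HashMap`.

```rust
use std::collections::HashMap;

// Builds a canonical key: metric name followed by the tags sorted by key.
// The "metric#k:v;k:v" layout is an illustrative assumption.
fn series_key(metric: &str, tags: &[(&str, &str)]) -> String {
    let mut sorted = tags.to_vec();
    sorted.sort(); // tag order must not change the key
    let joined: Vec<String> = sorted.iter().map(|(k, v)| format!("{k}:{v}")).collect();
    format!("{metric}#{}", joined.join(";"))
}

// Hands out an incrementing series ID the first time a key is seen.
struct SeriesRegistry {
    ids: HashMap<String, u64>,
    next_id: u64,
}

impl SeriesRegistry {
    fn new() -> Self {
        Self { ids: HashMap::new(), next_id: 0 }
    }

    fn get_or_create(&mut self, metric: &str, tags: &[(&str, &str)]) -> u64 {
        let key = series_key(metric, tags);
        if let Some(&id) = self.ids.get(&key) {
            return id;
        }
        let id = self.next_id;
        self.next_id += 1;
        self.ids.insert(key, id);
        id
    }
}

fn main() {
    let mut registry = SeriesRegistry::new();
    let a = registry.get_or_create("cpu.total", &[("env", "prod"), ("host", "h-1")]);
    let b = registry.get_or_create("cpu.total", &[("host", "h-1"), ("env", "prod")]);
    let c = registry.get_or_create("cpu.total", &[("env", "prod"), ("host", "h-2")]);
    // Same tagset in a different order resolves to the same series.
    println!("a={a} b={b} c={c}"); // a=0 b=0 c=1
}
```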

Each series’ tagset is stored in the Tagsets partition, which is used during aggregation.

Lastly, each metric and tag is indexed in an inverted index (TagIndex). Queries look up that index to get the list of series IDs that match the query. This way, any query AST can be modelled by union-ing or intersecting the postings lists of that inverted index.
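The union and intersection steps can be sketched with the classic linear merge over sorted postings lists. This assumes each postings list is a sorted, deduplicated list of `u64` series IDs; `intersect_sorted` and `union_sorted` are illustrative helpers, not talna functions.

```rust
use std::cmp::Ordering::{Equal, Greater, Less};

// AND: keep only the IDs present in both sorted lists.
fn intersect_sorted(a: &[u64], b: &[u64]) -> Vec<u64> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Less => i += 1,
            Greater => j += 1,
            Equal => {
                out.push(a[i]);
                i += 1;
                j += 1;
            }
        }
    }
    out
}

// OR: merge both sorted lists, emitting shared IDs only once.
fn union_sorted(a: &[u64], b: &[u64]) -> Vec<u64> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::with_capacity(a.len() + b.len());
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Less => {
                out.push(a[i]);
                i += 1;
            }
            Greater => {
                out.push(b[j]);
                j += 1;
            }
            Equal => {
                out.push(a[i]);
                i += 1;
                j += 1;
            }
        }
    }
    // Whatever remains in either list is larger than everything emitted so far.
    out.extend_from_slice(&a[i..]);
    out.extend_from_slice(&b[j..]);
    out
}

fn main() {
    // Hypothetical postings lists: series IDs carrying each tag.
    let env_prod: Vec<u64> = vec![1, 2, 3, 5, 8];
    let service_db: Vec<u64> = vec![2, 3, 4, 8];
    let service_api: Vec<u64> = vec![5, 6];

    // env:prod AND service:db
    println!("{:?}", intersect_sorted(&env_prod, &service_db)); // [2, 3, 8]
    // service:db OR service:api
    println!("{:?}", union_sorted(&service_db, &service_api)); // [2, 3, 4, 5, 6, 8]
}
```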

Data points are f32 by default, but can be switched to f64 using the high_precision feature flag.
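As a rough illustration of how a Cargo feature can switch the value type at compile time, consider the sketch below. Only the feature name `high_precision` comes from talna; the `Value` alias is hypothetical. In a real project, the feature would be enabled on the talna dependency in `Cargo.toml` (via `features = ["high_precision"]`).

```rust
// Hypothetical alias: f64 when the feature is enabled, f32 otherwise.
#[cfg(feature = "high_precision")]
pub type Value = f64;

#[cfg(not(feature = "high_precision"))]
pub type Value = f32;

fn main() {
    let v: Value = 25.42;
    // Without the feature this reports 4 bytes; with it, 8 bytes.
    println!("value = {v}, size = {} bytes", std::mem::size_of::<Value>());
}
```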

Benchmark: 1 billion data points

Hyper mode, jemalloc, i9-11900K, Samsung PM9A3:

ingested 1 billion data points in 769s
write speed: 1,300,390 writes per second
peak mem: 158 MiB
disk space: 10 GiB
query [1M latest data points] in 197ms
reopened DB in 140ms
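As a quick sanity check, the reported figures are consistent with each other: the write speed is the data point count divided by the ingestion time, and the disk footprint works out to roughly 10.7 bytes per data point (treating the rounded 10 GiB as exact, so this per-point figure is only approximate).

```latex
\frac{10^{9}\ \text{points}}{769\ \text{s}} \approx 1\,300\,390\ \text{points/s}
\qquad
\frac{10 \times 2^{30}\ \text{B}}{10^{9}\ \text{points}} = \frac{10\,737\,418\,240\ \text{B}}{10^{9}\ \text{points}} \approx 10.7\ \text{B/point}
```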

Run with:

cd billion
cargo run -r

Basic usage

use talna::{Database, Duration, MetricName, tagset, timestamp};

let db = Database::builder().open(path)?;
// or: Database::from_keyspace(existing_keyspace)

let metric_name = MetricName::try_from("cpu.total").unwrap();

db.write(
    metric_name,
    25.42, // actual value (float)
    tagset!(
        "env" => "prod",
        "service" => "db",
        "host" => "h-1",
    ),
)?;

db.write(
    metric_name,
    42.42, // actual value (float)
    tagset!(
        "env" => "prod",
        "service" => "db",
        "host" => "h-2",
    ),
)?;

let buckets = db
    .avg(metric_name, /* group by tag */ "host")
    .filter("env:prod AND service:db")
    // use .start() and .end() to set the time bounds
    .start(timestamp() - Duration::months(1.0))
    // use .granularity() to set the granularity (bucket width in nanoseconds)
    .granularity(Duration::days(1.0))
    .build()?
    .collect()?;

println!("{buckets:#?}");
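The query above groups by the `host` tag and splits the time range into buckets of the given granularity. The sketch below shows what such an average-per-bucket computation looks like on plain in-memory data. It assumes buckets are aligned to multiples of the granularity (`ts / granularity * granularity`), which is an assumption about the semantics; `Point` and `avg_by_host` are hypothetical and this is not talna's implementation.

```rust
use std::collections::BTreeMap;

// Hypothetical data point: nanosecond timestamp, value, and the tag to group by.
struct Point<'a> {
    ts: u64,
    value: f64,
    host: &'a str,
}

// Averages values per (host, bucket start); timestamps and granularity in nanoseconds.
fn avg_by_host(points: &[Point<'_>], granularity: u64) -> BTreeMap<(String, u64), f64> {
    // (host, bucket start) -> (sum, count)
    let mut acc: BTreeMap<(String, u64), (f64, u64)> = BTreeMap::new();
    for p in points {
        let bucket_start = p.ts / granularity * granularity;
        let entry = acc.entry((p.host.to_string(), bucket_start)).or_insert((0.0, 0));
        entry.0 += p.value;
        entry.1 += 1;
    }
    acc.into_iter()
        .map(|(key, (sum, count))| (key, sum / count as f64))
        .collect()
}

fn main() {
    const DAY: u64 = 86_400 * 1_000_000_000; // one day in nanoseconds
    let points = [
        Point { ts: 10, value: 20.0, host: "h-1" },
        Point { ts: DAY - 1, value: 30.0, host: "h-1" },
        Point { ts: DAY + 5, value: 40.0, host: "h-1" },
        Point { ts: 7, value: 42.0, host: "h-2" },
    ];
    for ((host, start), avg) in avg_by_host(&points, DAY) {
        println!("host={host} bucket_start={start} avg={avg}");
    }
}
```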

Filter query operators

The filter query DSL supports the following operators:

AND

env:prod AND service:db

OR

db:postgres OR db:mariadb

NOT

!db:postgres AND !db:mariadb
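A negated term such as `!db:postgres` can be sketched as a set difference: start from the sorted list of all series IDs and drop those found in the term's postings list. `difference_sorted` is a hypothetical helper, and having a full list of series IDs at hand is an assumption of this sketch.

```rust
// Returns the IDs in `all` that do not appear in `excluded` (both sorted).
fn difference_sorted(all: &[u64], excluded: &[u64]) -> Vec<u64> {
    let mut out = Vec::new();
    let mut j = 0;
    for &id in all {
        // Advance through the excluded list until it catches up with `id`.
        while j < excluded.len() && excluded[j] < id {
            j += 1;
        }
        if j < excluded.len() && excluded[j] == id {
            continue; // excluded by the negated term
        }
        out.push(id);
    }
    out
}

fn main() {
    let all: Vec<u64> = (1..=6).collect();
    let postgres: Vec<u64> = vec![1, 4];
    let mariadb: Vec<u64> = vec![2];
    // !db:postgres AND !db:mariadb == all minus postgres minus mariadb
    let result = difference_sorted(&difference_sorted(&all, &postgres), &mariadb);
    println!("{result:?}"); // [3, 5, 6]
}
```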

Wildcard

service:db.postgres.v* OR service:db.mariadb.v*

Note that wildcards can only be applied on the right side (as a trailing wildcard), so tags need to be designed hierarchically, in order of increasing cardinality:

BAD!: loc:munich.bavaria.germany.eu.earth

GOOD!: loc:earth.eu.germany.bavaria.munich, which allows queries like loc:earth.eu.germany.*
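Trailing wildcards fit a sorted index well because they become a prefix range scan: seek to the first key at or after the prefix, then read until keys stop matching. A leading wildcard would instead require checking every key, which is why the hierarchical layout matters. The sketch below uses a `BTreeMap` as a stand-in for the sorted tag index; `prefix_scan` is hypothetical and not talna's API.

```rust
use std::collections::BTreeMap;

// Returns every indexed tag starting with `prefix`, in sorted order.
// Assumes `index` maps "key:value" tag strings to series IDs.
fn prefix_scan<'a>(index: &'a BTreeMap<String, Vec<u64>>, prefix: &str) -> Vec<&'a str> {
    index
        // Seek to the first key >= prefix.
        .range(prefix.to_string()..)
        // Matching keys are contiguous, so stop at the first non-match.
        .take_while(|(tag, _)| tag.starts_with(prefix))
        .map(|(tag, _)| tag.as_str())
        .collect()
}

fn main() {
    let mut index: BTreeMap<String, Vec<u64>> = BTreeMap::new();
    index.insert("loc:earth.eu.france.paris".to_string(), vec![1]);
    index.insert("loc:earth.eu.germany.bavaria.munich".to_string(), vec![2]);
    index.insert("loc:earth.eu.germany.berlin.berlin".to_string(), vec![3]);
    index.insert("loc:earth.na.usa.newyork".to_string(), vec![4]);

    // loc:earth.eu.germany.* with the trailing '*' stripped off
    let hits = prefix_scan(&index, "loc:earth.eu.germany.");
    println!("{hits:?}");
}
```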

Nesting

env:prod AND (service:db OR service:rest-api OR service:graphql-api)
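Nested filters map naturally onto a recursive evaluator over a small AST. The sketch below is a hypothetical illustration: the `Filter` enum, its constructor helpers, and `eval` are not talna types, the index is an in-memory `HashMap` of `BTreeSet`s, and NOT is evaluated as a difference from the set of all series IDs.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical filter AST.
enum Filter {
    Tag(&'static str), // e.g. "env:prod"
    Not(Box<Filter>),
    And(Box<Filter>, Box<Filter>),
    Or(Box<Filter>, Box<Filter>),
}

// Small constructors to keep nested queries readable.
fn tag(t: &'static str) -> Filter { Filter::Tag(t) }
fn not(f: Filter) -> Filter { Filter::Not(Box::new(f)) }
fn and(a: Filter, b: Filter) -> Filter { Filter::And(Box::new(a), Box::new(b)) }
fn or(a: Filter, b: Filter) -> Filter { Filter::Or(Box::new(a), Box::new(b)) }

// Recursively resolves a filter to the set of matching series IDs.
fn eval(f: &Filter, index: &HashMap<&str, BTreeSet<u64>>, all: &BTreeSet<u64>) -> BTreeSet<u64> {
    match f {
        // Unknown tags match no series.
        Filter::Tag(t) => index.get(*t).cloned().unwrap_or_default(),
        // NOT: every series except those matched by the inner filter.
        Filter::Not(inner) => {
            let matched = eval(inner, index, all);
            all.difference(&matched).copied().collect()
        }
        Filter::And(a, b) => {
            let (left, right) = (eval(a, index, all), eval(b, index, all));
            left.intersection(&right).copied().collect()
        }
        Filter::Or(a, b) => {
            let (left, right) = (eval(a, index, all), eval(b, index, all));
            left.union(&right).copied().collect()
        }
    }
}

fn main() {
    let mut index: HashMap<&str, BTreeSet<u64>> = HashMap::new();
    index.insert("env:prod", BTreeSet::from([1, 2, 3, 4]));
    index.insert("env:dev", BTreeSet::from([5, 6]));
    index.insert("service:db", BTreeSet::from([1, 5]));
    index.insert("service:rest-api", BTreeSet::from([2, 6]));
    index.insert("service:graphql-api", BTreeSet::from([3]));
    let all: BTreeSet<u64> = (1..=6).collect();

    // env:prod AND (service:db OR service:rest-api OR service:graphql-api)
    let query = and(
        tag("env:prod"),
        or(or(tag("service:db"), tag("service:rest-api")), tag("service:graphql-api")),
    );
    println!("{:?}", eval(&query, &index, &all)); // {1, 2, 3}
    // !env:dev
    println!("{:?}", eval(&not(tag("env:dev")), &index, &all)); // {1, 2, 3, 4}
}
```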

marvin-j97/talna
