docs update for refs/tags/5.21.1-preview

nosqlbench · Jun 24, 2024 · fd4a07a
1 parent 4a0fda8
commit fd4a07a

Showing 3 changed files with 158 additions and 0 deletions.
diff --git a/exported_docs.zip b/exported_docs.zip
diff --git a/site/content/reference/bindings/binding-concepts.md b/site/content/reference/bindings/binding-concepts.md
@@ -0,0 +1,119 @@
+---
+title: Binding Concepts
+weight: 1
+---
+
+NoSQLBench has a built-in library for the flexible management and expressive use of
+procedural generation libraries. This section explains the core concepts
+of this library, known as _Virtual Data Set_.
+
+## Basic Example
+
+The functions in this library can be stitched together in small recipes. When
+you give these mapping functions useful names in your workloads, they are called
+bindings.
+
+Here is an example:
+
+```yaml
+bindings:
+ numbers: NumberNameToString()
+ names: FirstNames()
+```
+
+These are two bindings that you can use in your workloads. The names on the left
+are the _binding names_ and the functions on the right are the _binding recipes_.
+Together, we just call them _bindings_.
+
+
+## Variates (Samples)
+
+A numeric sample that is drawn from a distribution for the purpose
+of simulation or analysis is called a *Variate*.
+
+## Procedural Generation
+
+Procedural generation is a category of algorithms and techniques which take
+a set or stream of inputs and produce an output in a different form or structure.
+While it may appear that procedural generation actually _generates_ data, no output
+can come from a void. These techniques simply perturb a value in some stateful way,
+or map a coordinate system to another representation. Sometimes, both techniques are
+combined.
+
+## Uniform Variate
+
+A variate (sample) drawn from a uniform (flat) distribution is what we are used
+to seeing when we ask a system for a "random" value. These are often produced in
+one of two very common forms: either a register full of bits, as with most hashing
+functions, or a floating point value between 0.0 and 1.0 (this range is called the
+_unit interval_).
+
+Uniform variates are not really random. Without careful attention to API usage,
+such random samples are not even unique from session to session. In many systems,
+the programmer has to be very careful to seed the random generator or they will
+get the same sequence of numbers every time they run their program. This turns out
+to be a useful property, and the random number generators that behave this way are
+usually called Pseudo-Random Number Generators, or PRNGs.
+
+## Apparently Random Variates
+
+Uniform variates produced by PRNGs are not actually random, even though they may
+pass certain tests for randomness. The streams of values produced nearly always
+appear random by some meaningful statistical standard. However, the same stream
+can be reproduced exactly by starting from the same initial seed.
+
+## Deterministic Variates
+
+If you intentionally avoid randomizing the initial seed for a PRNG (for example,
+by not seeding it with the current timestamp), then you gain a way to replay a sequence.
+You can think of each initial seed as a _bank_ of values which you can go back
+to at any time. However, when using stateful PRNGs as a way to provide these
+variates, your results will be order dependent.
+
+## Randomly Accessible Determinism
+
+Instead of using a PRNG, it is possible to use a hash function. With a 64-bit
+register, you have 2^64 (2^63 in practice due to available implementations) possible
+values. If your hash function has high dispersion, then you will effectively
+get the same apparent randomness along with deterministic sequences, even
+when you use simple sequences of inputs to your _random()_ function. This allows
+you to access a random value in bucket 57, for example, and go back to it at any
+time and in any order to get the same value again.
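+
+As a minimal sketch of this idea in binding form, assuming the library's `Hash()` and
+`Mod()` functions chained with `;` (the binding name `bucket_value` is hypothetical),
+each input is hashed first and then reduced to a small range:
+
+```yaml
+bindings:
+ # hash the input for dispersion, then reduce it to the range 0..999
+ bucket_value: Hash(); Mod(1000)
+```
+
+Under these assumptions, input 57 maps to the same value every time it is requested,
+no matter which inputs were evaluated before it.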
+
+## Data Mapping Functions
+
+Data mapping functions are the core building blocks of Virtual Data Set.
+They are generally pure functions. This simply means that
+a data mapping function will always provide the same result given the same input.
+The parameters that you will see on some binding recipes are not representative
+of volatile state. These parameters are initializer values which are part of a
+function's definition. For example, a `Mod(5)` will always behave like a `Mod(5)`,
+as a pure function. But a `Mod(7)` will behave differently than a `Mod(5)`, although
+each function will always produce its own stable result for a given input.
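+
+To illustrate, here is a small sketch with two hypothetical binding names. It assumes
+that `Mod` is an ordinary modulo over the input value:
+
+```yaml
+bindings:
+ # the initializer 5 is part of the function definition, not mutable state
+ mod_five: Mod(5)
+ # a different initializer defines a different, equally stable function
+ mod_seven: Mod(7)
+```
+
+Under that assumption, an input of 12 always yields 2 for `mod_five` and 5 for `mod_seven`.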
+
+## Combining RNGs and Data Mapping Functions
+
+Because pure functions play such a key part in procedural generation techniques,
+the terms "data mapping function", "data mapper" and "data mapping library" will
+be more common in the library than "generator". Conceptually, mapping functions
+do not generate anything. It makes more sense to think of mapping data from one
+domain to another. Even so, the data that is yielded by mapping functions can
+appear quite realistic.
+
+Because good RNGs do generally contain internal state, they aren't purely
+functional. This means that in some cases, namely those in which you need
+random access to a virtual data set, hash functions make more sense. This
+toolkit allows you to choose between the two in some cases. However, it
+generally favors using hashing and pure-function approaches where possible. Even
+the statistical curve simulations do this.
+
+## Bindings Template
+
+It is often useful to have a template that describes a set of generator
+functions that can be reused across many threads or other application scopes. A
+bindings template is a way to capture the requested generator functions for
+re-use, with actual scope instantiation of the generator functions controlled by
+the usage point. For example, in a JEE app, you may have a bindings template in
+the application scope, and a set of actual bindings within each request (thread
+scope).
+
diff --git a/site/content/reference/drivers/driver-discovery.md b/site/content/reference/drivers/driver-discovery.md
@@ -0,0 +1,39 @@
++++
+title= "Built-In Adapters"
+weight= 1
++++
+
+# Built-In Adapters
+
+NoSQLBench supports a variety of different operations. For operations like
+sending a query to a database, a native driver is typically used with the help of the
+DriverAdapter API. For basic operations, like writing the content of a templated
+message to stdout, no native driver is needed, although the mechanism of stdout
+is still implemented via the same Adapter API. In effect, if you want to
+allow NoSQLBench to understand your op templates in a new way, you add an Adapter
+and program it to interpret op templates in a specific way.
+
+Each op template of an activity can be configured to use a specific adapter. The `driver=...`
+parameter sets the default adapter to use for all op templates in an activity. However,
+this can be overridden per op template with the `driver` field.
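+
+A minimal sketch of such an override follows. The op names, the `stmt` field, and the
+choice of the `stdout` adapter are assumptions for illustration, not taken from a real workload:
+
+```yaml
+ops:
+  # uses whatever adapter the activity-level driver=... parameter selects
+  default_op:
+    stmt: "handled by the default adapter\n"
+  # overrides the adapter for this op template only
+  printed_op:
+    driver: stdout
+    stmt: "handled by the stdout adapter\n"
+```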
+
+# Discovering Driver Adapters
+
+NoSQLBench comes with some drivers built-in. You can discover these by running:
+
+    nb5 --list-drivers
+
+Each one comes with its own built-in documentation. It can be accessed with this command:
+
+    nb5 help <driver>
+
+This section contains the per-driver documentation that you get when you run the above command.
+These driver docs are auto-populated when NoSQLBench is built, so they are exactly the same as
+what you will see with the above command, only rendered in HTML.
+
+# External Adapter jars
+
+It is possible to load an adapter from a jar at runtime. If the environment variable `NBLIBDIR`
+is set, it is taken as a library search path whose elements are separated by colons. Each element
+of the path that exists is added to the classpath. If the element is a named .jar file, that file
+is added. If it is a directory, then all jar files in that directory are added.