The data generation process uses this analogy: generated data flows from source to sink.
To generate data it is then necessary to define:
source
: what data is generated, e.g. a data model
sink
: where data is sent to, e.g. an ES index
flow
: how data is transmitted, e.g. how fast or how much
schema
: the fields definition, e.g. ECS 8.2.0
Each of the above is handled by its own REST API endpoint. An arbitrary number of sources, sinks, flows and schemas can be defined on the same server.
Currently Geneve is packaged only for Homebrew. You first need to install the Geneve tap
$ brew tap elastic/geneve
then the tool itself
$ brew install geneve
Data is generated by the Geneve server, which you start with
$ geneve serve
2023/01/31 16:40:23 Control: http://localhost:9256
The server keeps the terminal busy with its logs; to stop it, just press ^C.
The first line in the log shows where to reach the server: this is its base url, and all the API endpoints are reachable (but not browseable) under api/.
For the rest of this document we'll assume that the following shell variables are set:
$GENEVE
: points to the Geneve server, url http://localhost:9256
$TARGET_ES
: is the url of the target Elasticsearch instance
$TARGET_KIBANA
: is the url of the corresponding Kibana
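For example, with a stack running locally on the default ports (the Elasticsearch and Kibana addresses below are assumptions, adjust them to your setup):
$ export GENEVE=http://localhost:9256
$ export TARGET_ES=http://localhost:9200
$ export TARGET_KIBANA=http://localhost:5601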
Now open a separate terminal to operate on the server with curl.
The schema describes the fields that can be present in a generated document. At the moment it needs to be explicitly loaded into the server.
Download the latest version (or any other, if you have preferences) from https://github.com/elastic/ecs/releases and search for the file ecs_flat.yml in the folder ecs-X.Y.Z/generated/ecs/.
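Alternatively, the same file can be fetched straight from the repository; a sketch, assuming the v8.2.0 tag and its generated/ecs/ layout:
$ curl -fsSL -o ecs_flat.yml https://raw.githubusercontent.com/elastic/ecs/v8.2.0/generated/ecs/ecs_flat.yml
$ export SCHEMA_YAML=ecs_flat.yml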
Supposing that the path of said file is in the shell variable $SCHEMA_YAML, you load it with
$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/schema/ecs" --data-binary "@$SCHEMA_YAML"
The ecs in the endpoint api/schema/ecs is an arbitrary name; it's how the loaded schema is addressed by the server.
In the data model you describe the data that shall be generated. It can be as simple as a list of fields that need to be present, or more complex, defining also the relations among them.
How to write a data model is a separate subject (see Data model); here we focus on how to configure one on the server. You use the api/source endpoint.
$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/source/mydata" --data-binary @- <<EOF
schema: ecs
queries:
- 'network where cidrMatch(destination.ip, "10.0.0.0/8", "192.168.0.0/16")'
EOF
Note the reference to the previously loaded schema ecs and the name of this newly defined source, mydata. Also, queries is a list: you can add as many queries as you need, and at each iteration Geneve will select one randomly.
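For instance, a source with two queries; both are made up for illustration, any queries valid in the loaded schema work:
$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/source/mydata" --data-binary @- <<EOF
schema: ecs
queries:
  - 'network where destination.port == 443'
  - 'process where process.name == "curl.exe"'
EOF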
You can generate some data right in the terminal for early inspection
$ curl -s "$GENEVE/api/source/mydata/_generate?count=1" | jq
[
{
"@timestamp": "2023-01-31T18:19:20.197+01:00",
"destination": {
"ip": "192.168.130.52"
},
"event": {
"category": [
"network"
]
}
}
]
If all you need is security alerts, then you can use security detection rules as data models; the generated events will make the detection engine create alerts for you. You can select rules by name, tags or (rule) id.
Be sure to direct data to one of the indices monitored by the chosen rule(s).
Example of source configuration where the rule is selected by name:
$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/source/mydata" --data-binary @- <<EOF
schema: ecs
rules:
- name: IPSEC NAT Traversal Port Activity
kibana:
url: $TARGET_KIBANA
EOF
Note how the queries entry is now replaced by rules, which specifies the rule name and the Kibana URL the rule shall be downloaded from.
Similarly with rule tags; they can be combined with the boolean operators or and and:
$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/source/mydata" --data-binary @- <<EOF
schema: ecs
rules:
- tags: AWS or Azure or GCP
kibana:
url: $TARGET_KIBANA
EOF
Once more with rule_id, as defined on a per-rule basis (not to be confused with the id of the rule's Kibana saved object):
$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/source/mydata" --data-binary @- <<EOF
schema: ecs
rules:
- rule_id: a9cb3641-ff4b-4cdc-a063-b4b8d02a67c7
kibana:
url: $TARGET_KIBANA
EOF
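To find which indices a given rule monitors, and therefore where to point your sink, one option is Kibana's detection engine API; a sketch, reusing the rule_id from above:
$ curl -s "$TARGET_KIBANA/api/detection_engine/rules?rule_id=a9cb3641-ff4b-4cdc-a063-b4b8d02a67c7" | jq .index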
Once you're happy with the data model, it's time to configure where data shall be sent to. The endpoint api/sink serves the purpose.
The command is rather unsophisticated:
$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/sink/mydest" --data-binary @- <<EOF
url: $TARGET_ES/myindex/_doc
EOF
The generated documents are POSTed to the configured url one by one. The name of this sink is mydest; the destination index is myindex.
Flow configuration is also quite basic: you just need a source and a sink, both already defined in the server.
Use count to specify how many documents should be generated and sent to the stack. This flow is named myflow.
$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/flow/myflow" --data-binary @- <<EOF
source:
name: mydata
sink:
name: mydest
count: 1000
EOF
All that is left to do is to initiate the generation with
$ curl -s -XPOST "$GENEVE/api/flow/myflow/_start"
You can also check the progress with
$ curl -s "$GENEVE/api/flow/myflow"
params:
source:
name: mydata
sink:
name: mydest
count: 1000
state:
alive: true
documents: 250
documents_per_second: 350
Or stop it with
$ curl -s -XPOST "$GENEVE/api/flow/myflow/_stop"
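To cross-check on the Elasticsearch side, you can count the documents landed in the index with the standard _count API (an Elasticsearch endpoint, not a Geneve one):
$ curl -s "$TARGET_ES/myindex/_count" | jq .count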
Geneve assumes the target stack and index to be ready to accept documents; it seems pointless and expensive to duplicate the stack and indices configuration functionality.
Depending on your needs and the configuration of your stack, you may or may not need extra steps before actually pumping any document into the stack.
If your target index does not exist and is not managed by any index template, then you may want to create it and configure its mappings.
Geneve can help you with the mappings: the api/source/<name>/_mappings endpoint returns the mappings of all the possible fields that can be encountered in the documents generated by that source.
Use the Elasticsearch index API to create the index
$ curl -s -XPUT -H "Content-Type: application/json" $TARGET_ES/myindex --data @- <<EOF
{
"mappings": $(curl -fs "$GENEVE/api/source/mydata/_mappings")
}
EOF
Note the embedded Geneve source API call to get the mappings; its output is merged into the index API request.
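You can then verify what was actually configured with the standard Elasticsearch mapping API:
$ curl -s "$TARGET_ES/myindex/_mapping" | jq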
If you want to use Kibana Security to analyze the generated data, you need a data view in place. If your target index is not already included in some existing data view, then you need to create one yourself.
Use the following command to create it from the command line
$ curl -s -XPOST -H "Content-Type: application/json" -H "kbn-xsrf: true" $TARGET_KIBANA/api/data_views/data_view --data @- <<EOF
{
"data_view": {
"title": "myindex"
}
}
EOF
While Geneve is well capable of generating fields with IPv4 and IPv6 addresses, the same does not apply to their geographical location.
As a workaround you can leverage the stack geoip processor to enrich the data.
First create the ingest pipeline (e.g. geoip-info)
$ curl -s -XPUT -H "Content-Type: application/json" $TARGET_ES/_ingest/pipeline/geoip-info --data @- <<EOF
{
"description": "Add geoip info",
"processors": [
{
"geoip": {
"field": "client.ip",
"target_field": "client.geo",
"ignore_missing": true
}
},
{
"geoip": {
"field": "source.ip",
"target_field": "source.geo",
"ignore_missing": true
}
},
{
"geoip": {
"field": "destination.ip",
"target_field": "destination.geo",
"ignore_missing": true
}
},
{
"geoip": {
"field": "server.ip",
"target_field": "server.geo",
"ignore_missing": true
}
},
{
"geoip": {
"field": "host.ip",
"target_field": "host.geo",
"ignore_missing": true
}
}
]
}
EOF
Next, append ?pipeline=geoip-info to the url of your sink (see Set the destination). This instructs the stack to pass the generated data through the just created geoip-info pipeline.
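Reusing the mydest sink defined earlier, the updated configuration would look like this:
$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/sink/mydest" --data-binary @- <<EOF
url: $TARGET_ES/myindex/_doc?pipeline=geoip-info
EOF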
Optionally, ensure that your stack keeps the GeoIP database up to date
$ curl -s -XPUT -H "Content-Type: application/json" $TARGET_ES/_cluster/settings --data @- <<EOF
{
"transient": {
"ingest": {
"geoip": {
"downloader": {
"enabled": "true"
}
}
}
}
}
EOF
At last, update your data model to include the fields you want the geoip processor to fill in. Geneve will generate them with random content, and the ingest pipeline will replace that content with a better one.
$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/source/mydata" --data-binary @- <<EOF
schema: ecs
queries:
- 'network where
cidrMatch(destination.ip, "10.0.0.0/8", "192.168.0.0/16") and
destination.geo.city_name != null and
destination.geo.country_name != null and
destination.geo.location != null
'
EOF
In case the generated IP does not have any entry in the geoip database, the ingest pipeline will leave the content generated by Geneve as is. This will result in completely bogus random city, country etc. names. If you read them, you'll know where they come from. We have issue #115 to deal with this.
For more details read GeoIP processor.