Merge pull request #24 from HewlettPackard/regex-parser
Add a generic regex parser
timothyb89 authored Jul 9, 2019
2 parents 987f498 + 73cfd7b commit d813e86
Showing 16 changed files with 602 additions and 22 deletions.
5 changes: 5 additions & 0 deletions .circleci/config.yml
@@ -17,6 +17,9 @@ jobs:
              cargo update -p woodchipper
            fi
      - run:
          name: Run unit tests on linux x86_64
          command: cargo test
      - run:
          name: Build static linux x86_64
          command: cargo build --release --locked
@@ -62,6 +65,8 @@ jobs:
              cargo update -p woodchipper
            fi
      # running unit tests on cross-compiled windows binaries doesn't seem
      # especially productive
      - run:
          name: Build windows x86_64
          command: |
4 changes: 4 additions & 0 deletions README.md
@@ -14,6 +14,7 @@ terminal.
  reflow, and clipboard support
* Built-in Kubernetes support follows multiple pods and containers at
  once
* User-customizable output styles and custom log formats (see [customization])

## Quick Start

@@ -121,6 +122,7 @@ potentially mixed together:
* [logrus]-style key/value pair logs, e.g. `time="..." msg="hello world"`
* [klog] logs for Kubernetes components
* Plaintext logs with inferred timestamps and log levels
* User-specified custom formats with the [regex parser][regex]

## Similar Projects

@@ -150,9 +152,11 @@ end of your commit message.
Additionally, the [design documentation][design] may be a helpful resource for
understanding how woodchipper works.

[customization]: ./doc/customization.md
[plugin]: ./misc/kubectl-woodchipper
[releases]: https://github.com/HewlettPackard/woodchipper/releases/latest
[klog]: https://github.com/kubernetes/klog
[regex]: ./doc/customization.md#log-formats
[stern]: https://github.com/wercker/stern
[logrus]: https://github.com/sirupsen/logrus
[slog]: https://github.com/slog-rs/slog
147 changes: 147 additions & 0 deletions doc/customization.md
@@ -0,0 +1,147 @@
# Customization

woodchipper can be customized using command-line flags and environment
variables.

A few of the more complex options are discussed here, but for a full list of
options, refer to `woodchipper --help`.

## Color Schemes

woodchipper can use any [base16 color scheme][base16]. To use:

* Save a scheme's `.yaml` configuration somewhere local, e.g.
  [`classic-dark.yaml`][classic-dark]
* Run woodchipper with `--style=base16:path/to/classic-dark.yaml`
* Once satisfied with results, set:

  ```
  export WD_STYLE=base16:path/to/classic-dark.yaml
  ```

  ...in your environment.

[base16]: https://github.com/chriskempson/base16#scheme-repositories
[classic-dark]: https://github.com/detly/base16-classic-scheme/blob/master/classic-dark.yaml

## Log Formats

In addition to the built-in formats, woodchipper supports custom regex-based
parsers. These can be used to support many application-specific log formats that
don't require a more advanced parser.

To add a custom log format, create a `.yaml` file containing a list of regexes:

```yaml
- pattern: ...
  datetime: ...

- pattern: ...
  datetime: ...
  datetime_prepend: ...
```

Each `pattern` field should contain a regex with various
[named capture groups][groups]:

* `(?P<datetime>...)`

  Captures the datetime string; this is parsed further using the format set in
  the `datetime` field.
* `(?P<level>...)`

  Captures the log level (`I`, `INFO`, etc.; case-insensitive).
* `(?P<text>...)`

  Captures the main message text.

Any additional named capture groups will be added as message metadata. Certain
classifiers may have special display-time rules for metadata fields; for
example, the `file` or `caller` fields will be shown as right-aligned context
if there's enough available screen width.
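
For instance, here is a minimal sketch of a mapping whose extra `caller` group
becomes metadata; the input format shown in the comment is hypothetical, not
one of the built-in parsers:

```yaml
# hypothetical input: "2019-07-03T12:02:13Z INFO main.rs:42 starting up"
- pattern: '^(?P<datetime>\S+) (?P<level>\w+) (?P<caller>\S+:\d+) (?P<text>.*)$'
  datetime: rfc3339
```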

The `datetime` field contains parsing rules for the captured `datetime` group.
It has two built-in formats, `rfc2822` and `rfc3339`, but a free-form
[chrono `strftime`][strftime] string can be set here as well.

Note that chrono requires fully-formed datetime strings, and won't fill in
missing fields for you. If your log format omits some fields (e.g. `klog`
doesn't output the year), you can use `datetime_prepend` to add missing fields
to the incoming datetime string based on the current UTC time. This field should
contain another strftime format string with only the missing fields from the
original input.
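
As a sketch (not woodchipper's built-in klog parser, and assuming the prepended
text is joined directly onto the captured string), a klog-style line such as
`I0703 12:02:13.977000 12345 main.go:42] hello` could be handled with:

```yaml
- pattern: '^(?P<level>[IWEF])(?P<datetime>\d{4} \d{2}:\d{2}:\d{2}\.\d+) +\d+ (?P<file>\S+:\d+)\] (?P<text>.*)$'
  # the captured timestamp has no year, so one is prepended from the current
  # UTC time before parsing
  datetime: '%Y %m%d %H:%M:%S%.6f'
  datetime_prepend: '%Y '
```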

Finally, to make use of the regex config, first test with:

```
woodchipper --regexes path/to/regexes.yaml
```

... and once satisfied with the results, add:

```bash
export WD_REGEXES=path/to/regexes.yaml
```

... to your environment.

[groups]: https://docs.rs/regex/1.1.7/regex/#grouping-and-flags
[strftime]: https://docs.rs/chrono/0.4.7/chrono/format/strftime/index.html

### Example

As an example, take this small Python logging script, `test.py`:

```python
import logging
logging.basicConfig(
    format='%(asctime)-15s - %(levelname)-8s - %(filename)s:%(lineno)d - %(message)s',
    level='DEBUG'
)
logger = logging.getLogger('test')
logger.debug('this is a debug message')
logger.info('this is an info message')
logger.warning('this is a warning message')
logger.error('this is an error message')
```

It produces log messages like this:
```
$ python3 -u test.py 2>&1
2019-07-03 12:02:13,977 - DEBUG - test.py:9 - this is a debug message
2019-07-03 12:02:13,977 - INFO - test.py:10 - this is an info message
2019-07-03 12:02:13,977 - WARNING - test.py:11 - this is a warning message
2019-07-03 12:02:13,977 - ERROR - test.py:12 - this is an error message
```

Create a YAML file, e.g. `~/.woodchipper-regexes.yaml`, with the following
content:

```yaml
- pattern: |-
    ^(?P<datetime>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2})(?:,[0-9]+) - (?P<level>\w+)\s* - (?P<file>\S+)\s* -(?P<text>.+)$
  datetime: '%Y-%m-%d %H:%M:%S'
```

Now pipe it through `woodchipper` with the `--regexes` flag set to point to your
YAML file:

```
$ python3 -u test.py 2>&1 | woodchipper -r json --regexes ~/.woodchipper-regexes.yaml
{"kind":"regex","timestamp":"2019-07-03T12:07:15Z","level":"debug","text":" this is a debug message","metadata":{"file":"test.py:9"}}
{"kind":"regex","timestamp":"2019-07-03T12:07:15Z","level":"info","text":" this is an info message","metadata":{"file":"test.py:10"}}
{"kind":"regex","timestamp":"2019-07-03T12:07:15Z","level":"warning","text":" this is a warning message","metadata":{"file":"test.py:11"}}
{"kind":"regex","timestamp":"2019-07-03T12:07:15Z","level":"error","text":" this is an error message","metadata":{"file":"test.py:12"}}
```

The three primary fields (timestamp, level, text) were captured, along with an
additional metadata field containing the `file`.

Note that chrono's strftime doesn't seem to support custom millisecond
separator characters unless they're right-aligned to a particular width, and
Python's `%(asctime)s` uses a comma as the separator by default. The example
pattern above simply excludes the milliseconds, as they won't be displayed
anyway.

Finally, `WD_REGEXES` may be set in your environment to make use of this regex
configuration without needing to manually pass in `--regexes`.
85 changes: 84 additions & 1 deletion src/config.rs
@@ -1,10 +1,18 @@
// (C) Copyright 2019 Hewlett Packard Enterprise Development LP

use std::sync::Arc;
use std::error::Error;
use std::fmt;
use std::fs::File;
use std::io::BufReader;
use std::str::FromStr;
use std::sync::Arc;

use atty::{self, Stream};
use regex::Regex;
use serde::Deserialize;
use serde::de::{self, Visitor, Deserializer};
use shellexpand;
use simple_error::SimpleError;
use structopt::StructOpt;

use crate::style::StyleConfig;
@@ -146,6 +154,76 @@ pub struct KubernetesConfig {
  pub poll_interval: u64
}

struct RegexFromStr;

impl<'de> Visitor<'de> for RegexFromStr {
  type Value = Regex;

  fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
    f.write_str("a string containing a valid regular expression")
  }

  fn visit_str<E>(self, s: &str) -> Result<Self::Value, E>
  where
    E: de::Error
  {
    match Regex::new(s) {
      Ok(r) => Ok(r),
      Err(e) => Err(de::Error::custom(format!(
        "could not compile regex: {:?}", e
      )))
    }
  }
}

fn de_regex<'de, D>(deserializer: D) -> Result<Regex, D::Error>
where
  D: Deserializer<'de>
{
  deserializer.deserialize_str(RegexFromStr)
}

#[derive(Debug, Deserialize)]
pub struct RegexMapping {
  /// a Regex pattern to parse an incoming line
  #[serde(deserialize_with = "de_regex")]
  pub pattern: Regex,

  /// A Chrono datetime format string, will be applied to the `datetime` capture
  /// group
  pub datetime: Option<String>,

  /// An optional Chrono strftime string used to prepend missing fields to the
  /// timestamp before parsing
  ///
  /// Chrono isn't able to parse datetimes with missing fields (e.g. year), but
  /// some log formats (e.g. klog) leave certain fields out. This allows these
  /// formats to be parsed anyway.
  pub datetime_prepend: Option<String>
}

#[derive(Debug)]
pub struct RegexConfig {
  pub mappings: Vec<RegexMapping>
}

impl FromStr for RegexConfig {
  type Err = SimpleError;

  fn from_str(path: &str) -> Result<Self, Self::Err> {
    let expanded_path = shellexpand::full(path).map_err(SimpleError::from)?;
    let file = File::open(&expanded_path.to_string()).map_err(SimpleError::from)?;
    let reader = BufReader::new(file);

    match serde_yaml::from_reader(reader) {
      Ok(mappings) => Ok(RegexConfig { mappings }),
      Err(e) => Err(SimpleError::new(
        format!("error loading regexes {}: {:?}", path, e)
      ))
    }
  }
}

#[derive(Debug, StructOpt)]
#[structopt(
name = "woodchipper",
@@ -206,6 +284,11 @@ pub struct Config {
  #[structopt(long, short = "s", default_value = "default", env = "WD_STYLE")]
  pub style: StyleConfig,

  /// A path to a regexes config file, which may contain custom parsing regexes
  /// for application-specific log formats.
  #[structopt(long, env = "WD_REGEXES")]
  pub regexes: Option<RegexConfig>,

  #[structopt(flatten)]
  pub kubernetes: KubernetesConfig
}
4 changes: 3 additions & 1 deletion src/parser/json.rs
@@ -2,11 +2,13 @@

use std::collections::HashMap;
use std::error::Error;
use std::sync::Arc;

use chrono::prelude::*;
use regex::Regex;
use serde_json::{self, Value, Map};

use crate::config::Config;
use super::types::{
LogLevel, MappingField, Message, MessageKind, ReaderMetadata
};
@@ -156,7 +158,7 @@ pub fn parse_document(
}

pub fn parse_json(
  line: &str, meta: Option<ReaderMetadata>
  _config: Arc<Config>, line: &str, meta: Option<ReaderMetadata>
) -> Result<Option<Message>, Box<Error>> {
  // skip anything that doesn't at least vaguely look like json
  if !line.starts_with('{') || !line.ends_with('}') {
4 changes: 3 additions & 1 deletion src/parser/klog.rs
@@ -2,11 +2,13 @@

use std::collections::HashMap;
use std::error::Error;
use std::sync::Arc;

use chrono::prelude::*;
use regex::Regex;
use serde_json::Value;

use crate::config::Config;
use super::types::{LogLevel, Message, MessageKind, ReaderMetadata};

fn map_klog_level(level: &str) -> Option<LogLevel> {
@@ -25,7 +27,7 @@ fn map_klog_level(level: &str) -> Option<LogLevel> {
// based on the format description at:
// https://github.com/kubernetes/klog/blob/master/klog.go#L592-L602
pub fn parse_klog(
  line: &str, meta: Option<ReaderMetadata>
  _config: Arc<Config>, line: &str, meta: Option<ReaderMetadata>
) -> Result<Option<Message>, Box<Error>> {
  lazy_static! {
    static ref RE: Regex = Regex::new(
4 changes: 3 additions & 1 deletion src/parser/logrus.rs
@@ -1,11 +1,13 @@
// (C) Copyright 2019 Hewlett Packard Enterprise Development LP

use std::error::Error;
use std::sync::Arc;

use pest::Parser;
use serde_json::{self, Value, Map};
use simple_error::SimpleError;

use crate::config::Config;
use super::types::{Message, ReaderMetadata};
use super::json::parse_document;

@@ -64,7 +66,7 @@ pub fn logrus_to_document(
}

pub fn parse_logrus(
  line: &str, meta: Option<ReaderMetadata>
  _config: Arc<Config>, line: &str, meta: Option<ReaderMetadata>
) -> Result<Option<Message>, Box<Error>> {
  match logrus_to_document(line) {
    Ok(doc) => parse_document(doc, meta),