Design Overview

Woodchipper has a four-stage processing pipeline for incoming messages:

  1. Reading: reads raw messages as strings from some source
  2. Parsing: converts raw messages into a standardized format
  3. Classification: converts standardized messages into human-readable chunks with rendering metadata
  4. Rendering: displays messages to the screen, possibly applying styles and providing interactive features

Each stage may have multiple implementations. The implementation to use is either selected by the user (readers and renderers) or determined automatically (parsers and classifiers).
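As a rough illustration of how the four stages compose, here is a minimal sketch. Every name in it (`Message`, `Chunk`, `parse`, `classify`, `render`, `run_pipeline`) and the toy `LEVEL: text` input format are assumptions made for this example, not Woodchipper's actual types or behavior.

```rust
// Hypothetical, heavily simplified stand-ins for the pipeline's data types.
#[derive(Debug, PartialEq)]
struct Message {
    level: Option<String>,
    text: String,
}

#[derive(Debug, PartialEq)]
struct Chunk {
    kind: &'static str,
    text: String,
}

// Stage 2 (parsing): convert a raw line into a standardized message.
// Assumption: a toy "LEVEL: text" format stands in for the real parsers.
fn parse(raw: &str) -> Message {
    match raw.split_once(": ") {
        Some((level, text))
            if !level.is_empty() && level.chars().all(|c| c.is_ascii_uppercase()) =>
        {
            Message {
                level: Some(level.to_lowercase()),
                text: text.to_string(),
            }
        }
        _ => Message {
            level: None,
            text: raw.to_string(),
        },
    }
}

// Stage 3 (classification): turn a message into chunks with rendering metadata.
fn classify(message: &Message) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    if let Some(level) = &message.level {
        chunks.push(Chunk { kind: "level", text: level.clone() });
    }
    chunks.push(Chunk { kind: "text", text: message.text.clone() });
    chunks
}

// Stage 4 (rendering): here, plain whitespace-separated output.
fn render(chunks: &[Chunk]) -> String {
    chunks
        .iter()
        .map(|chunk| chunk.text.as_str())
        .collect::<Vec<_>>()
        .join(" ")
}

// Stage 1 (reading) is represented by any iterator of raw lines.
fn run_pipeline<'a>(lines: impl Iterator<Item = &'a str>) -> Vec<String> {
    lines.map(|raw| render(&classify(&parse(raw)))).collect()
}
```

In the real pipeline the stages are decoupled by channels and threads rather than chained in a single iterator; the sketch only shows the data flow.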

Reading

Readers fetch messages from some input source as text and pass them along for parsing. Input sources may be local (stdin, file, subprocess) or may fetch log messages via sockets or some API.

Existing implementations include:

  • stdin.rs: reads lines from standard input / pipes
  • stdin_hack.rs: reads lines from /dev/stdin to avoid conflicts with the interactive renderer on Unix
  • null.rs: a dummy reader that prints an error and quits, used as a fallback if no other reader is available
  • the Kubernetes reader: fetches log messages from Kubernetes pods via the Kubernetes API

Readers run in a dedicated thread and send messages over a channel for further processing. If needed, they may accept arguments via the Config to, for example, set the Kubernetes namespace.

Rust's blocking IO means that reader threads cannot be reliably terminated at the user's request, so we can't necessarily expect readers to be capable of responding to an exit request. However, readers that require cleanup actions may use the optional exit request and response channels to listen for exit requests, perform cleanup actions, and notify the main thread that it's safe to terminate.
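The thread-and-channel arrangement, including the optional exit handshake, might look roughly like the following sketch. The function name `spawn_reader`, its parameters, and the use of `String` and `()` as channel payloads are assumptions for illustration; the actual reader signature may differ.

```rust
use std::sync::mpsc::{Receiver, Sender};
use std::thread;

// Hypothetical reader: forwards lines over `tx` until the input is exhausted
// or an exit request arrives, then acknowledges on the response channel.
fn spawn_reader(
    lines: Vec<String>,
    tx: Sender<String>,
    exit_req_rx: Receiver<()>,
    exit_resp_tx: Sender<()>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for line in lines {
            // Poll for an exit request without blocking. A reader stuck in a
            // blocking read would only notice the request after that read returns.
            if exit_req_rx.try_recv().is_ok() {
                break;
            }
            // Stop if the receiving side has gone away.
            if tx.send(line).is_err() {
                break;
            }
        }
        // Cleanup actions would go here (closing connections, and so on).
        // Then tell the main thread it is safe to terminate; ignore the error
        // if nobody is listening any more.
        let _ = exit_resp_tx.send(());
    })
}
```

The main thread would send `()` on the request channel and wait, ideally with a timeout, on the response channel before exiting.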

Rather than pushing just a raw message string over the channel, lines are instead wrapped in a LogEntry, allowing some additional metadata to be sent along the channel:

  • LogEntry::eof() can be sent to notify renderers that the end of input has been reached

  • LogEntry::message() is used to send normal messages

    Optionally, a ReaderMetadata may be provided to pass along datatype hints if they're available at read time, e.g. a source name if reading from multiple sources or a timestamp if tracked via the input API (e.g. Docker and Kubernetes).

  • LogEntry::internal() is used to send internal messages to the user as our own logging ability is restricted, particularly in the interactive renderer
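One way to picture the wrapper is the sketch below. Only the constructor names `eof()`, `message()`, and `internal()` and the type names `LogEntry` and `ReaderMetadata` come from the text above; the fields, the `EntryKind` enum, and the constructor parameters are assumptions for illustration.

```rust
use std::time::SystemTime;

// Hypothetical layout of read-time hints.
#[derive(Debug, Clone, PartialEq, Default)]
struct ReaderMetadata {
    source: Option<String>,
    timestamp: Option<SystemTime>,
}

// Hypothetical discriminator for the three kinds of entry.
#[derive(Debug, Clone, Copy, PartialEq)]
enum EntryKind {
    Message,
    Internal,
    Eof,
}

#[derive(Debug, Clone, PartialEq)]
struct LogEntry {
    kind: EntryKind,
    text: Option<String>,
    meta: Option<ReaderMetadata>,
}

impl LogEntry {
    // Signals that the end of input has been reached.
    fn eof() -> Self {
        LogEntry { kind: EntryKind::Eof, text: None, meta: None }
    }

    // A normal message, optionally carrying read-time metadata.
    fn message(text: &str, meta: Option<ReaderMetadata>) -> Self {
        LogEntry { kind: EntryKind::Message, text: Some(text.to_string()), meta }
    }

    // A message from Woodchipper itself, shown to the user alongside the logs.
    fn internal(text: &str) -> Self {
        LogEntry { kind: EntryKind::Internal, text: Some(text.to_string()), meta: None }
    }
}
```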

Parsing

Woodchipper parses lines independently to better support applications that output multiple formats (e.g. startup scripts, 3rd party libraries, or multiple separate Kubernetes containers). Parsers must quickly determine if messages are supported or hand them off to the next parser in the chain.

If the parser can parse the input message, it returns a normalized Message instance with as much metadata as it could extract.
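The chain-of-parsers idea can be sketched as follows. The `Parser` type alias, the function names, and the reduced `Message` struct are hypothetical; the real parsers extract far more metadata, and the JSON check here is only a cheap shape test, not real JSON parsing.

```rust
// Hypothetical, reduced message type; `kind` records which parser matched.
#[derive(Debug, PartialEq)]
struct Message {
    kind: &'static str,
    text: String,
}

// A parser either recognizes the line or returns None so the next one can try.
type Parser = fn(&str) -> Option<Message>;

// Cheap rejection first: anything that is not shaped like `{...}` is handed on.
fn parse_json_like(line: &str) -> Option<Message> {
    let trimmed = line.trim();
    if trimmed.starts_with('{') && trimmed.ends_with('}') {
        Some(Message { kind: "json", text: trimmed.to_string() })
    } else {
        None
    }
}

// The fallback parser accepts every line.
fn parse_plain(line: &str) -> Option<Message> {
    Some(Message { kind: "plain", text: line.trim_end().to_string() })
}

// Try each parser in order and return the first successful result.
fn parse_line(parsers: &[Parser], line: &str) -> Option<Message> {
    parsers.iter().find_map(|parse| parse(line))
}
```

Because each line goes through the chain independently, a stream that mixes JSON and plain-text lines is handled without any per-source configuration.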

Existing implementations include:

  • json.rs: parses JSON log lines, i.e. lines like {...}\n

    It specifically aims to support logrus-like JSON output formats, but various other field mappings are also supported.

    Prefers RFC-3339-style timestamps but falls back to dtparse.

    Unidentified fields are copied to the metadata field for use later in the pipeline.

  • plain.rs: the fallback parser; renders the raw message, but opportunistically includes metadata if it can be identified.

    Where possible, timestamps are parsed out of messages using dtparse, with some simple checks to discard timestamps for common failure cases. Log levels are identified where possible.

Parsers may refer to the reader's metadata to include or override their parsed contextual info. For example, the plain parser prefers to use the reader's timestamp rather than using the significantly slower and less accurate dtparse free-form parser.
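The preference for reader-supplied timestamps can be expressed as a small fallback, sketched below. `resolve_timestamp` and the single-field `ReaderMetadata` are hypothetical, and the closure stands in for whatever slow free-form parser is used; it is only invoked when the reader provided no timestamp.

```rust
use std::time::SystemTime;

// Hypothetical, reduced reader metadata.
struct ReaderMetadata {
    timestamp: Option<SystemTime>,
}

// Prefer the reader's timestamp; fall back to the (slow) free-form parser
// only when the reader did not supply one.
fn resolve_timestamp(
    meta: Option<&ReaderMetadata>,
    slow_parse: impl FnOnce() -> Option<SystemTime>,
) -> Option<SystemTime> {
    meta.and_then(|m| m.timestamp).or_else(slow_parse)
}
```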

Classification

Given a normalized Message instance, a classifier generates some number of Chunks. They are responsible for determining various rendering-specific attributes:

  • the formatted text content
  • the kind, used mainly for highlighting and aligning text segments
  • the slot, used to place the segment within a screen region (left, center, right)
  • the alignment of text within a chunk
  • padding, wrapping, and line break hints
  • the weight, used to hide less important chunks on smaller displays

At the moment, chunks are arranged based on the order in which classifiers are executed. Chunks may contain children to individually apply styles to different sub-sections of a text segment while avoiding improper line wrapping.
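To make the attribute list concrete, here is a hedged sketch of what a chunk might carry. The field names, the enum variants, and the convention that a higher `weight` means a more important chunk are all assumptions for this example, not the actual definitions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Date,
    Time,
    Level,
    Text,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Slot {
    Left,
    Center,
    Right,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Alignment {
    Left,
    Right,
}

// Hypothetical chunk layout mirroring the attributes listed above.
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    text: String,
    kind: Kind,
    slot: Slot,
    alignment: Alignment,
    weight: u8,
    // Children allow separately styled sub-sections within one text segment.
    children: Vec<Chunk>,
}

impl Chunk {
    fn new(kind: Kind, text: &str, weight: u8) -> Self {
        Chunk {
            text: text.to_string(),
            kind,
            slot: Slot::Left,
            alignment: Alignment::Left,
            weight,
            children: Vec::new(),
        }
    }
}

// Assumed pruning rule: on narrow displays, drop chunks below a weight threshold.
fn prune_by_weight(chunks: Vec<Chunk>, min_weight: u8) -> Vec<Chunk> {
    chunks
        .into_iter()
        .filter(|chunk| chunk.weight >= min_weight)
        .collect()
}
```

With a low-weight date chunk and a higher-weight time chunk, pruning would drop the date while keeping the time visible.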

Classifiers may mark metadata fields as "consumed" by adding their keys to a shared HashSet, allowing later classifiers in the chain to skip them.

Existing implementations include:

  • timestamp.rs: formats timestamps into two chunks, allowing the lower priority date chunk to be pruned while still displaying the time.
  • level.rs: adds the log level using its level-specific kind
  • text.rs: adds force-wrapped chunks per line of input text, allowing strings with newlines to be displayed sensibly
  • logrus.rs: extracts logrus's file field for display in the right column, trimming the path to the last few components
  • metadata.rs: adds all un-processed metadata fields to the message as [key]=[value] pairs
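The interaction between an early classifier that consumes a field and the catch-all metadata classifier can be sketched like this. The function names, the reduced `Message` type, the string-only output, and the choice to keep the last two path components are assumptions for illustration.

```rust
use std::collections::{BTreeMap, HashSet};

// Hypothetical, reduced message: only the leftover metadata fields.
struct Message {
    metadata: BTreeMap<String, String>,
}

// Logrus-style classifier sketch: shows the `file` field with a trimmed path
// and marks the key as consumed.
fn classify_file(message: &Message, consumed: &mut HashSet<String>) -> Vec<String> {
    match message.metadata.get("file") {
        Some(path) => {
            consumed.insert("file".to_string());
            // Keep only the last two path components (an arbitrary choice here).
            let parts: Vec<&str> = path.rsplit('/').take(2).collect();
            vec![parts.into_iter().rev().collect::<Vec<_>>().join("/")]
        }
        None => Vec::new(),
    }
}

// Catch-all classifier sketch: emits every field nobody has consumed yet.
fn classify_metadata(message: &Message, consumed: &HashSet<String>) -> Vec<String> {
    message
        .metadata
        .iter()
        .filter(|(key, _)| !consumed.contains(*key))
        .map(|(key, value)| format!("{}={}", key, value))
        .collect()
}
```

Running the catch-all classifier last keeps it from duplicating anything an earlier classifier already displayed.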

Rendering

Existing implementations include:

  • json.rs: writes the normalized parsed messages back to standard output, discarding classifier results. Useful for normalizing log messages in scripting applications.

  • plain.rs: writes classified messages to standard output with basic (whitespace-only) formatting, suitable for sharing.

    This renderer is automatically selected if output is piped. The interactive renderer will re-format messages using this renderer when copying to the clipboard.

  • styled.rs: writes classified and styled output to standard output.

    If terminal width can be detected, lines will be wrapped and a right-side column may display contextual information.

    This output is less suitable for sharing as it contains ANSI escape sequences and right-aligned text.

  • the interactive renderer: a performant custom pager with interactive features, including text reflow, searching, filtering, and improved browsing.
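Renderer selection could be sketched as below. The text above only states that the plain renderer is chosen automatically when output is piped; the `RendererKind` enum, both function names, and the choice of the interactive renderer as the default on a terminal are assumptions. Terminal detection uses the standard library's `std::io::IsTerminal` (Rust 1.70+), which is not necessarily what Woodchipper itself uses.

```rust
use std::io::IsTerminal;

#[derive(Debug, Clone, Copy, PartialEq)]
enum RendererKind {
    Json,
    Plain,
    Styled,
    Interactive,
}

// Pure selection logic: an explicit user choice always wins; otherwise piped
// output falls back to the plain renderer.
fn select_renderer(requested: Option<RendererKind>, stdout_is_terminal: bool) -> RendererKind {
    match requested {
        Some(kind) => kind,
        None if !stdout_is_terminal => RendererKind::Plain,
        // Assumption: default to the interactive renderer on a real terminal.
        None => RendererKind::Interactive,
    }
}

// Convenience wrapper that inspects the real standard output.
fn select_for_stdout(requested: Option<RendererKind>) -> RendererKind {
    select_renderer(requested, std::io::stdout().is_terminal())
}
```

Keeping the decision in a pure function makes it easy to test without a real terminal.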