High level view of the Healthcheck system

cdybedahl edited this page Oct 28, 2011 · 2 revisions

Introduction

Healthcheck is a system used at the Swedish Internet Infrastructure Foundation to gather information about the .se zone, with an eye to estimating the general health of the zone. It consists of two major logical parts: an engine that gathers data and extracts (some) information, and a web application through which the user can control which domain names will be scanned and see the results. All the data is stored in a CouchDB database instance.

Logical structure

The two major conceptual units the Healthcheck system works with are domainsets and testruns. A domainset is a list of domain names, and a testrun is the data that results from running the gathering process on that list. So a domainset can have zero or more testruns, while a testrun belongs to exactly one domainset. For selection purposes, the web interface presents testruns grouped by their domainsets.
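
The relationship described above can be sketched as a small data model. This is only an illustration, not the actual CouchDB document layout: the class names, field names and the `group_testruns` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Testrun:
    """Data from one gathering pass over a domainset (hypothetical shape)."""
    testrun_id: int
    domainset_name: str  # a testrun belongs to exactly one domainset
    results: Dict[str, dict] = field(default_factory=dict)


@dataclass
class Domainset:
    """A named list of domain names with zero or more testruns."""
    name: str
    domains: List[str]
    testruns: List[Testrun] = field(default_factory=list)

    def new_testrun(self, testrun_id: int) -> Testrun:
        """Create a testrun tied to this domainset and remember it."""
        run = Testrun(testrun_id=testrun_id, domainset_name=self.name)
        self.testruns.append(run)
        return run


def group_testruns(domainsets: List[Domainset]) -> Dict[str, List[int]]:
    """Group testrun ids by domainset name, as a selection list might."""
    return {ds.name: [run.testrun_id for run in ds.testruns] for ds in domainsets}
```

A freshly created `Domainset` has an empty `testruns` list, and each call to `new_testrun` adds one entry that points back to its owner by name.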

Intended work cycle

  • Someone gathers a list of domain names, and enters it into the system (giving it a name in the process).

  • They hit "Start Gathering" in the web interface, whereupon a new testrun id is allocated and the domain names are added to the queue marked with that testrun id.

  • The dispatcher daemon picks names off the queue and spawns child processes that gather data. It runs many of them in parallel, for efficiency. The ceiling on how many can run concurrently is usually set by the amount of RAM in the server or by the disk I/O needed to store the results.

  • CouchDB runs its map/reduce process over the results.

  • The web interface, on request, shows the results to the user.
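
The work cycle above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Healthcheck implementation: an in-memory list stands in for the queue, a thread pool stands in for the dispatcher's child processes, `gather` is a stub that performs no DNS queries, and `count_by_testrun` only imitates the kind of summary a CouchDB map/reduce view could produce. All function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import count

_testrun_ids = count(1)  # hypothetical testrun id allocator


def start_gathering(domains, queue):
    """Allocate a new testrun id and queue each name marked with that id."""
    testrun_id = next(_testrun_ids)
    for name in domains:
        queue.append((testrun_id, name))
    return testrun_id


def gather(item):
    """Stand-in for a gathering child process; does no real DNS work."""
    testrun_id, name = item
    return {"testrun": testrun_id, "domain": name, "tld": name.rsplit(".", 1)[-1]}


def dispatch(queue, max_workers=4):
    """Drain the queue, running at most max_workers gatherers at once."""
    items = list(queue)
    queue.clear()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input items
        return list(pool.map(gather, items))


def count_by_testrun(results):
    """Map/reduce-style summary: number of result documents per testrun."""
    return Counter(doc["testrun"] for doc in results)
```

Here `max_workers` plays the role of the concurrency ceiling; in the real system that limit would be chosen with the server's RAM and disk I/O in mind.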