Quilt packages are not only reproducible units of data and metadata, but also units of reporting. You can use the following features to include interactive visualizations and light applications inside packages.
Importantly, relative references to data are resolved relative to the parent package. This means that all of your reports are backed by immutable, versioned data, providing a common frame of reference that is lacking in BI applications that read from fast-moving databases and file systems.
In addition to rendering a wide variety of images, binary files, and text files, the Quilt catalog supports the following libraries for visualization and exploration:
The above systems provide you with hundreds of charts out of the box.
quilt_summarize.json is a configuration file that renders one or more dashboard elements in both Bucket view and Packages view. The contents of quilt_summarize.json are a JSON array of files that you wish to preview in the catalog. Each file may be represented as a string or, if you wish to provide more configuration, as an object.
The simplest summary is a list of relative paths to files that you wish to preview:
// quilt_summarize.json
[
  "file1.json",
  "file2.csv",
  "file3.ipynb"
]
By default, each list element renders in its own row.
For multi-column layouts, you can provide an array instead of a string for a given row:
// quilt_summarize.json
[
  "file1.json",
  [{
    "path": "file2.csv",
    "width": "200px"
  }, {
    "path": "file3.ipynb",
    "title": "Scientific notebook",
    "description": "[See docs](https://docs.com)"
  }]
]
Each element of an array in quilt_summarize.json can be either a path string or an object with one or more of the following properties:
- path - file path relative to quilt_summarize.json
- title - title rendered instead of the file path
- description - description in markdown format
- expand - display the file (true) or display a preview in an expandable box (false, default)
- width - column width, either in pixels or as a ratio (default is ratio 1)
- types - a list of render types (at present only singleton lists are supported):
  - ["echarts"] to render JSON as an EChart
  - ["perspective"] to render tabular data (csv, xlsx etc.) with Perspective
  - ["igv"] to render JSON with Integrative Genomics Viewer
  - ["voila"] to render a Jupyter notebook as an interactive Voila dashboard
  - ["html"] to render HTML in iframes. See also Advanced HTML rendering
  - ["text"] to render anything as text with syntax highlighting
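As a minimal sketch of the properties above, the following Python script assembles a summary with one full-width row and one two-column row, then serializes it. The `summary_entry` helper is hypothetical and the file names are placeholders; only the property names come from the list above.

```python
import json


def summary_entry(path, **props):
    """Build one quilt_summarize.json object entry (hypothetical helper)."""
    allowed = {"title", "description", "expand", "width", "types"}
    unknown = set(props) - allowed
    if unknown:
        raise ValueError(f"unsupported properties: {sorted(unknown)}")
    return {"path": path, **props}


summary = [
    # A plain string renders in its own row with default settings.
    "file1.json",
    # A nested list renders its elements as columns of a single row.
    [
        summary_entry("file2.csv", width="200px", types=["perspective"]),
        summary_entry(
            "file3.ipynb",
            title="Scientific notebook",
            description="[See docs](https://docs.com)",
            expand=True,
        ),
    ],
]

summary_json = json.dumps(summary, indent=2)
print(summary_json)
```

Saving `summary_json` to a file named quilt_summarize.json yields a summary equivalent to one written by hand.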
If you need to control the height of an element (useful for Voila dashboards), use the following extended syntax:
// quilt_summarize.json
[
  {
    "path": "file1.json",
    "types": [
      {
        "name": "echarts",
        "style": { "height": "1000px" }
      }
    ]
  }
]
At present height is the only supported style element.
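Because a typo in a property name is easy to miss, it can help to check a summary before publishing it. The sketch below is an illustrative validator, not part of Quilt: it only knows the properties, render types, and style element documented in this section, and it assumes that object entries must include `path`.

```python
ALLOWED_KEYS = {"path", "title", "description", "expand", "width", "types"}
RENDER_TYPES = ("echarts", "perspective", "igv", "voila", "html", "text")


def check_entry(entry):
    """Return a list of problems found in one summary entry (empty if none)."""
    if isinstance(entry, str):
        return []  # a bare path string needs no further checks
    if not isinstance(entry, dict):
        return [f"expected a string or object, got {type(entry).__name__}"]
    problems = []
    if "path" not in entry:  # assumption: object entries require a path
        problems.append("missing 'path'")
    for key in sorted(set(entry) - ALLOWED_KEYS):
        problems.append(f"unknown property '{key}'")
    if "types" not in entry:
        return problems
    types = entry["types"]
    if not isinstance(types, list) or len(types) != 1:
        problems.append("'types' must be a list with exactly one element")
        return problems
    render_type = types[0]
    # A render type is either a name string or the extended {"name": ...} object.
    extended = isinstance(render_type, dict)
    name = render_type.get("name") if extended else render_type
    if name not in RENDER_TYPES:
        problems.append(f"unknown render type '{name}'")
    if extended:
        for style_key in sorted(set(render_type.get("style", {})) - {"height"}):
            problems.append(f"unsupported style '{style_key}'")
    return problems


def check_summary(summary):
    """Check every row; a row is one entry or a list of column entries."""
    problems = []
    for row in summary:
        for entry in (row if isinstance(row, list) else [row]):
            problems.extend(check_entry(entry))
    return problems


example = [
    "file1.json",
    [
        {"path": "file2.csv", "width": "200px"},
        {"path": "file1.json",
         "types": [{"name": "echarts", "style": {"height": "1000px"}}]},
    ],
]
print(check_summary(example))  # []
```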
Limitations:
- Objects linked via quilt_summarize.json are always the latest version, even when browsing an older package version.
- Object titles and image thumbnails link to the file view, even in the package view.
If your Amazon S3 bucket contains images, by default the Quilt Catalog displays a preview of those images before any files referenced in quilt_summarize.json.
In the Overview tab, the Catalog parses the entire Amazon S3 bucket contents and displays thumbnail image previews in a paginated grid (25 per page by default) of all supported image types.
To hide this block, use the gallery field in your bucket preferences file.
In the Bucket tab, the Catalog displays thumbnail image previews in a similarly paginated grid, but only for images in the directory currently being viewed.
In the Packages tab, when a specific package is open, the Catalog displays thumbnail image previews in a similarly paginated grid, but only for image files in the selected package.
The Quilt catalog uses vega-embed to render Vega and Vega-Lite visualizations. See package.json for specific library versions and compatibility.
To display a Vega or Vega-Lite visualization, reference a JSON file with a library-compatible schema in your quilt_summarize.json file as follows:
[
  "relative/path/to/my/vega.json",
  "optionally/some/other/file.csv"
]
For both Vega and Vega-Lite, you may specify relative paths to package files as data sources, and the Quilt catalog resolves them correctly. Vega treats any data source as JSON by default. To use a format other than JSON, specify the file type. For example:
{
  "data": {
    "url": "./datasource.csv",
    "format": {
      "type": "csv"
    }
  }
}
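To illustrate the CSV data source pattern, here is a minimal sketch that writes a CSV file and a Vega-Lite specification that reads it by relative path, so the rows are never embedded in the specification itself. The file names, column names, schema version, and temporary output directory are illustrative assumptions rather than anything Quilt prescribes.

```python
import csv
import json
import random
import tempfile
from pathlib import Path

# Stand-in for your package directory.
out_dir = Path(tempfile.mkdtemp())

# Write the data next to the spec so the relative URL resolves inside the package.
rows = [
    {"score": random.randint(60, 99), "density": random.random()}
    for _ in range(1000)
]
with open(out_dir / "datasource.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["score", "density"])
    writer.writeheader()
    writer.writerows(rows)

# The spec points at the CSV and names its format explicitly.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"url": "./datasource.csv", "format": {"type": "csv"}},
    "mark": "point",
    "encoding": {
        "x": {"field": "score", "type": "quantitative"},
        "y": {"field": "density", "type": "quantitative"},
    },
}
(out_dir / "vega.json").write_text(json.dumps(spec, indent=2))
print(f"Wrote {out_dir / 'vega.json'}")
```

Both files would then be added to the package, with vega.json listed in quilt_summarize.json.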
The easiest way to create Vega-Lite visualizations for Quilt packages is with Altair.
Here's a simple example:
import pandas as pd
from numpy import random
import altair as alt
# Create Dataframe with two columns of random values
scores = random.randint(60, 100, size=5)
densities = random.random_sample(5)
df = pd.DataFrame({'score': scores, 'density': densities})
# Create Chart with two Quantitative axes
alt.Chart(df).mark_area(
    color="gray",
    opacity=.2
).encode(
    x="score:Q",
    y='density:Q',
    tooltip=['count(score):Q']
).save("vega.json")
If you create plots that directly embed a dataset with more than 5000 rows (a large dataset), you will encounter a MaxRowsError. You can get around this error in several different ways.
- Interactive map of California with slider scale
- Interactive map of 2015 United States by-county smoking & poverty data
To render an EChart, provide a JSON file (a dictionary that specifies the ECharts option parameter) and set the "types" property to ["echarts"].
// quilt_summarize.json
[
  {
    "path": "echarts-option-file.json",
    "title": "Awesome line chart",
    "types": ["echarts"]
  }
]
The following example is a simple line chart from the ECharts documentation.
// echarts.json
{
  "dataset": {
    "source": [
      ["Mon", 150],
      ["Tue", 230],
      ["Wed", 224],
      ["Thu", 218],
      ["Fri", 135],
      ["Sat", 147],
      ["Sun", 250]
    ]
  },
  "xAxis": {
    "type": "category"
  },
  "yAxis": {
    "type": "value"
  },
  "series": [
    {
      "type": "line"
    }
  ]
}
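If your data already lives in Python, an option file like the one above can be generated rather than written by hand. This is a minimal sketch; the `line_chart_option` helper is hypothetical and simply reproduces the structure of the example.

```python
import json


def line_chart_option(points):
    """Build an ECharts option dict for a line chart from (category, value) pairs."""
    return {
        "dataset": {"source": [[label, value] for label, value in points]},
        "xAxis": {"type": "category"},
        "yAxis": {"type": "value"},
        "series": [{"type": "line"}],
    }


weekly = [
    ("Mon", 150), ("Tue", 230), ("Wed", 224), ("Thu", 218),
    ("Fri", 135), ("Sat", 147), ("Sun", 250),
]
option = line_chart_option(weekly)

# Serialize to the text you would store as the option file in the package.
option_json = json.dumps(option, indent=2)
print(option_json)
```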
As with Vega, you can provide either a relative path or URL to the dataset file.
// echarts.json
{
  "dataset": {
    "source": "./dataset.csv"
  },
  "xAxis": {
    "type": "category"
  },
  "yAxis": {
    "type": "value"
  },
  "series": [
    {
      "type": "line"
    }
  ]
}
Relative paths are resolved relative to your echarts.json file and relative to the parent package.
At present, ECharts in Quilt does not support custom JavaScript. You are therefore limited to JSON types (numbers, strings, objects, arrays, etc.). Options that take a function, such as a symbolSize callback, are not available.
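One way to catch unsupported values before publishing is to confirm that an option dictionary built in Python survives JSON serialization. The sketch below is an illustrative check, not part of Quilt; `is_json_only` is a hypothetical helper, and the Python lambda merely stands in for a JavaScript callback.

```python
import json


def is_json_only(option):
    """Return True if `option` can be serialized as strict JSON."""
    try:
        # allow_nan=False also rejects NaN and Infinity, which are not valid JSON.
        json.dumps(option, allow_nan=False)
    except (TypeError, ValueError):
        return False
    return True


static_option = {"series": [{"type": "scatter", "symbolSize": 12}]}
dynamic_option = {
    "series": [{"type": "scatter", "symbolSize": lambda value: value[1] / 10}]
}

print(is_json_only(static_option))   # True
print(is_json_only(dynamic_option))  # False
```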
This feature is a developer preview; details are subject to change.
Enterprise deployments of Quilt support interactive Jupyter notebooks with Voilà.
In brief, a Voila dashboard version of your notebook will display all of the output cells and none of the input cells from the underlying notebook. This enables you to create interactive, Jupyter-driven apps for your Quilt catalog users.
The Voila libraries execute a remote Jupyter kernel and stream the results to the browser with tornado. Jupyter kernels run on a single EC2 instance (t3.small by default) in Linux containers that have network access but do not have access to persistent storage. The catalog user's AWS credentials are passed to the Jupyter kernel as environment variables.
When you have a Voila dashboard inside of a Quilt package, you may wish to reference files in the current package revision. The Quilt catalog sets the following environment variables and passes them to the Voila kernel:
QUILT_PKG_BUCKET
QUILT_PKG_NAME
QUILT_PKG_TOP_HASH
You can access these variables in Python and browse the package:
import io
import os
import pandas as pd
import quilt3 as q3
# https://open.quiltdata.com/b/allencell/packages/aics/data_handoff_4dn/tree/260c3a46581a324e3a495570886e07b62cb4ff54f20b334c5d73a5a370e678c1/
bucket = os.environ.get("QUILT_PKG_BUCKET") or "allencell"
handle = os.environ.get("QUILT_PKG_NAME") or "aics/data_handoff_4dn"
top_hash = os.environ.get("QUILT_PKG_TOP_HASH") or "260c3a46581a324e3a495570886e07b62cb4ff54f20b334c5d73a5a370e678c1"
pkg = q3.Package.browse(handle, registry=f"s3://{bucket}", top_hash=top_hash)
# Read metadata.csv from the current package revision
df = pkg["metadata.csv"].deserialize()
By default, Quilt Voila containers provide the following modules:
altair
bqplot
ipykernel
ipyvolume
ipywidgets
pandas
perspective-python
PyYAML
quilt3
scipy
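To confirm from inside a notebook which of these modules are importable, a check along the following lines can be run in the first cell. This is a sketch only: the import names are assumptions derived from the package list above (perspective-python is taken to import as `perspective` and PyYAML as `yaml`), and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Import names assumed for the packages listed above.
EXPECTED_MODULES = [
    "altair", "bqplot", "ipykernel", "ipyvolume", "ipywidgets",
    "pandas", "perspective", "yaml", "quilt3", "scipy",
]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing modules:", missing_modules(EXPECTED_MODULES))
```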
Quilt renders tabular data formats into a Perspective Datagrid, including the following file extensions: .csv, .xls, .xlsx, .jsonl, .parquet, and .tsv.
For speed, Quilt loads the first few rows stored in S3. Click Load More to fetch up to about 6MB of zipped data. To see the entire file contents for large files, download the file (lower left).
Click Filter and Plot to open the side drawer. Drag and drop columns from the sidebar to Group By, Split By, Order By, and Where to pivot, filter, and more.
Select from a variety of visualizations by clicking the upper left menu that initially displays "Datagrid".
Click Toggle Theme to use a fixed-width font (useful for comparing strings).
Use the controls along the bottom to reset, download, copy, resize the grid, and more.
To open the drawer by default, set the config.settings property in quilt_summarize.json as follows:
// quilt_summarize.json
[
  {
    "path": "file1.csv",
    "types": [
      {
        "name": "perspective",
        "config": {
          "settings": true
        }
      }
    ]
  }
]
You can save the state of the datagrid. To restore a saved datagrid, use the config property of quilt_summarize.json. All filters and columns will be restored:
// quilt_summarize.json
[
  {
    "path": "file1.csv",
    "types": [
      {
        "name": "perspective",
        "config": {
          "columns": ["YOUR_COLUMN_0", "YOUR_COLUMN_1"],
          "group_by": ["YOUR_COLUMN_1"],
          "settings": true,
          "theme": "Material Light Mono"
        }
      }
    ]
  }
]
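A saved configuration can also be attached programmatically. The following sketch builds the entry above from a Python dictionary; `perspective_entry` is a hypothetical helper, and the column names are placeholders as in the example.

```python
import json


def perspective_entry(path, config=None, open_drawer=False):
    """Build a quilt_summarize.json entry that renders `path` with Perspective."""
    merged = dict(config or {})  # copy so the caller's dict is not modified
    if open_drawer:
        merged["settings"] = True  # opens the Filter and Plot drawer by default
    # Use the extended object syntax only when there is a config to carry.
    render_type = {"name": "perspective", "config": merged} if merged else "perspective"
    return {"path": path, "types": [render_type]}


saved_state = {
    "columns": ["YOUR_COLUMN_0", "YOUR_COLUMN_1"],
    "group_by": ["YOUR_COLUMN_1"],
    "theme": "Material Light Mono",
}
summary = [perspective_entry("file1.csv", config=saved_state, open_drawer=True)]
print(json.dumps(summary, indent=2))
```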
Several customers have reported that Perspective Datagrids fail to automatically render in the Quilt web catalog. We have isolated this problem to clashes with third-party browser extensions in both Mozilla Firefox and Google Chrome. At least one extension, Zotero Connector, has been reported and the error reproduced.
If you encounter a rendering error, first try a different browser (Firefox, Safari, Edge) on the same machine. If the error persists, disable all third-party extensions, then turn them back on one by one until the problem extension is identified. Please then notify Quilt support with the extension name and version.
To render genome tracks, you can select "View as IGV" in the catalog, or you can invoke igv.js in quilt_summarize.json, as shown below:
// quilt_summarize.json
[
  {
    "path": "igv-options-file.json",
    "title": "Awesome genome",
    "types": ["igv"]
  }
]
In the above example, igv-options-file.json is an IGV browser configuration.
You may specify relative paths to package files or absolute S3 URLs as data sources, and the Quilt catalog will resolve them. HTTP URLs will remain unchanged.
Note: Please be mindful of rendering large sequences. You can limit the downloaded file size of the sequence by using the visibilityWindow parameter (-1 downloads the whole file, which could be several gigabytes in size; this may impact rendering speed and interactive performance).
Note that tracks are grouped by type and file format.
// igv-options-file.json
{
  "tracks": [{
    "name": "Absolute URL track",
    "url": "s3://bucket/file" // will be resolved
  }, {
    "name": "Relative path track",
    "url": "./file" // will be resolved
  }, {
    "name": "HTTP URL track",
    "url": "https://some-url-even-url-to-s3-file" // will stay intact
  }]
}
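Because JSON itself does not allow comments, an options file like the annotated example above can be easier to maintain when generated from code that carries the notes instead. The sketch below builds an equivalent file in Python and sets visibilityWindow on two of the tracks; the `make_track` helper, the window size of 100,000, and the track URLs are illustrative assumptions.

```python
import json


def make_track(name, url, visibility_window=None):
    """Build one IGV track dict, optionally limiting its visibility window."""
    track = {"name": name, "url": url}
    if visibility_window is not None:
        track["visibilityWindow"] = visibility_window
    return track


options = {
    "tracks": [
        # S3 URLs and relative paths are resolved by the catalog.
        make_track("Absolute URL track", "s3://bucket/file", 100_000),
        make_track("Relative path track", "./file", 100_000),
        # HTTP URLs are left unchanged.
        make_track("HTTP URL track", "https://some-url-even-url-to-s3-file"),
    ]
}

options_json = json.dumps(options, indent=2)
print(options_json)
```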