Quilt supports queries in the Elasticsearch DSL, as well as SQL queries in Athena.

Elasticsearch

The objects in Amazon S3 buckets connected to Quilt are synchronized to an Elasticsearch cluster, which provides Quilt's search features.

Quilt uses Elasticsearch 6.7 (docs).

Indexing

Quilt maintains a near-realtime index of the objects in your S3 bucket in Elasticsearch. Each bucket corresponds to one or more Elasticsearch indexes. As objects are mutated in S3, Quilt uses an event-driven system (via SNS and SQS) to update Elasticsearch.

There are two types of indexing in Quilt:

shallow indexing includes object metadata (such as the file name and size)
deep indexing includes object contents. Quilt supports deep indexing for the following file extensions:
- .csv, .html, .json, .md, .rmd, .rst, .tab, .txt, .tsv (plain-text formats)
- .fcs (FlowJo)
- .ipynb (Jupyter notebooks)
- .parquet
- .pdf
- .pptx
- .xls, .xlsx

By default, Quilt deep-indexes at most 100KB per document for the specified file formats. Both the maximum number of bytes per document and the set of file formats to deep index can be customized per bucket in the Catalog Admin settings.
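
The shallow/deep distinction and the per-document byte limit can be illustrated with a minimal sketch. This is not Quilt's indexer: the function names are hypothetical, the case-insensitive extension match is an assumption, and "100KB" is read as 100 * 1024 bytes.

```python
from pathlib import PurePosixPath

# Extensions the text lists as eligible for deep indexing.
DEEP_INDEX_EXTENSIONS = {
    ".csv", ".html", ".json", ".md", ".rmd", ".rst", ".tab", ".txt", ".tsv",
    ".fcs", ".ipynb", ".parquet", ".pdf", ".pptx", ".xls", ".xlsx",
}

# Assumption: "100KB" is interpreted here as 100 * 1024 bytes.
DEFAULT_MAX_BYTES = 100 * 1024


def indexing_mode(key: str, extensions=DEEP_INDEX_EXTENSIONS) -> str:
    """Return "deep" if the key's final extension is in the set, else "shallow"."""
    # Assumption: extensions are compared case-insensitively.
    suffix = PurePosixPath(key).suffix.lower()
    return "deep" if suffix in extensions else "shallow"


def bytes_to_index(size: int, max_bytes: int = DEFAULT_MAX_BYTES) -> int:
    """Number of content bytes that would be indexed for an object of `size` bytes."""
    return min(size, max_bytes)
```

Under these assumptions, `indexing_mode("reads/sample.fastq.gz")` is `"shallow"` because only the final `.gz` suffix is considered, and a 5 MB CSV would contribute only its first `DEFAULT_MAX_BYTES` bytes to the content index.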

Search Bar

The search bar on every page in the catalog provides a convenient shortcut for searching objects and packages in an Amazon S3 bucket.

Quilt uses Elasticsearch 6.7 query string syntax.

The following are all valid search parameters:

Fields

Syntax	Description	Example
`comment`	Package comment	`comment:TODO`
`content`	Object content	`content:Hello`
`ext`	Object extension	`ext:*.fastq.gz`
`handle`	Package name	`handle:examples\/metadata`
`hash`	Package hash	`hash:3192ac1*`
`key`	Object key	`key:phase*`
`key_text`	Analyzed object key	`key_text:"phase"`
`last_modified`	Last modified date	`last_modified:[2022-02-04 TO 2022-02-20]`
`metadata`	Package metadata	`metadata:dapi`
`size`	Object size in bytes	`size:>=4096`
`version_id`	Object version id	`version_id:t.LVVCx*`
`pointer_file`	Package revision tag in S3; either "latest" or a timestamp	`pointer_file:latest`
`package_stats.total_files`	Package total files	`package_stats.total_files:>100`
`package_stats.total_bytes`	Package total bytes	`package_stats.total_bytes:<100`
`workflow.id`	Package workflow ID	`workflow.id:verify-metadata`
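
The `handle` example in the table escapes the `/` with a backslash, because `/` is reserved in the query string syntax. A small helper for building such clauses might look like the sketch below. The function names are hypothetical, and the escaping is deliberately conservative: it escapes every character the Elasticsearch documentation lists as reserved, including single `&` and `|`.

```python
# Characters that the Elasticsearch query string syntax treats as reserved.
# "<" and ">" are omitted because the Elasticsearch docs state they cannot be escaped.
RESERVED = set('+-=&|!(){}[]^"~*?:\\/')


def escape_value(value: str) -> str:
    """Backslash-escape reserved characters so the value is matched literally."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in value)


def field_clause(field: str, value: str) -> str:
    """Build a `field:value` clause whose value is matched literally."""
    return f"{field}:{escape_value(value)}"


# Prints handle:examples\/metadata, matching the table's example.
print(field_clause("handle", "examples/metadata"))
```

Because `*` and `?` are escaped too, this helper only suits literal values; a wildcard search such as `key:phase*` would need to be written by hand.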

Logical operators and grouping

Syntax	Description	Example
`AND`	Conjunction	`a AND b`
`OR`	Disjunction	`a OR b`
`NOT`	Negation	`NOT a`
`_exists_`	Matches any non-null value for the given field	`_exists_:content`
`()`	Group terms	`(a AND b) NOT c`
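
The operators above compose into larger search strings. The following sketch assembles them programmatically; the helper names (`all_of`, `any_of`, `negate`) are hypothetical and are not part of Quilt or Elasticsearch.

```python
def all_of(*clauses: str) -> str:
    """Join clauses with AND and wrap them in a group."""
    if not clauses:
        raise ValueError("at least one clause is required")
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses: str) -> str:
    """Join clauses with OR and wrap them in a group."""
    if not clauses:
        raise ValueError("at least one clause is required")
    return "(" + " OR ".join(clauses) + ")"


def negate(clause: str) -> str:
    """Prefix a clause with NOT."""
    return f"NOT {clause}"


query = all_of(
    "key:phase*",
    any_of("size:>=4096", "_exists_:content"),
    negate("comment:TODO"),
)
print(query)
```

Always wrapping groups in parentheses is a design choice that keeps precedence explicit, at the cost of some redundant brackets.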

Wildcard and regular expressions

Syntax	Description	Example
`*`	Zero or more characters; avoid a leading `*`, which slows performance	`ext:config.y*ml`
`?`	Exactly one character	`ext:React.?sx`
`//`	Regular expression (slows performance)	`content:/lmnb[12]/`
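
To build intuition for these patterns, the sketch below approximates them locally with Python's standard library. It is only an approximation and does not call Elasticsearch: `fnmatchcase` also gives `[` a special meaning that the query string wildcard syntax does not, and Python's `re` dialect differs from Lucene's. It assumes regular expressions must match a whole term, which is how Lucene anchors them.

```python
import re
from fnmatch import fnmatchcase


def wildcard_match(term: str, pattern: str) -> bool:
    """Approximate `*` (zero or more characters) and `?` (exactly one character)."""
    return fnmatchcase(term, pattern)


def regexp_match(term: str, pattern: str) -> bool:
    """Approximate a /regex/ clause; the pattern must match the entire term."""
    return re.fullmatch(pattern, term) is not None


# `*` may match nothing at all, so both spellings match.
print(wildcard_match("config.yml", "config.y*ml"), wildcard_match("config.yaml", "config.y*ml"))
# `?` needs exactly one character, so "React.sx" does not match.
print(wildcard_match("React.jsx", "React.?sx"), wildcard_match("React.sx", "React.?sx"))
# The regular expression is anchored to the whole term.
print(regexp_match("lmnb1", "lmnb[12]"), regexp_match("xlmnb1", "lmnb[12]"))
```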

QUERIES > ELASTICSEARCH tab

Quilt Elasticsearch queries support the following keys:

index: comma-separated list of indexes to search (learn more)
filter_path: reduces response nesting (learn more)
_source: boolean that adds or removes the _source field, or a list of fields to return (learn more)
size: limits the number of hits (learn more)
from: starting offset for pagination (learn more)
body: the search query body as a JSON dictionary (learn more)
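
As a rough illustration of how these keys fit together, the sketch below assembles a query document and serializes it to JSON. The function name, the index names, and the chosen `filter_path` are hypothetical; the `query_string` body is standard Elasticsearch DSL, but the exact document shape the tab accepts should be checked against the catalog.

```python
import json


def build_search_request(indexes, query_string, size=10, offset=0,
                         source_fields=None, filter_path=None) -> dict:
    """Assemble a query document from the keys listed above."""
    request = {
        "index": ",".join(indexes),  # comma-separated list of indexes
        "size": size,                # maximum number of hits
        "from": offset,              # pagination offset
        "body": {"query": {"query_string": {"query": query_string}}},
    }
    if source_fields is not None:
        request["_source"] = source_fields   # bool or list of field names
    if filter_path is not None:
        request["filter_path"] = filter_path  # trims the response
    return request


request = build_search_request(
    ["bucket-a", "bucket-b"],  # hypothetical index names
    "key:phase*",
    size=5,
    source_fields=["key", "size"],
    filter_path="hits.hits._source",
)
print(json.dumps(request, indent=2))
```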

Saved queries

You can provide pre-canned queries for your users by placing a configuration file at s3://YOUR_BUCKET/.quilt/queries/config.yaml:

version: "1"
queries:
  query-1:
    name: My first query
    description: Optional description
    url: s3://BUCKET/.quilt/queries/query-1.json
  query-2:
    name: Second query
    url: s3://BUCKET/.quilt/queries/query-2.json

The Quilt catalog displays your saved queries in a drop-down for your users to select, edit, and execute.
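
Before uploading such a configuration, it can help to sanity-check its structure. The sketch below validates an already-parsed config (for example, one loaded with a YAML library). It is not Quilt's own validation: the function name is hypothetical, and the rules that `name` and `url` are required while `description` is optional are inferred from the example above.

```python
def validate_saved_queries(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    if config.get("version") != "1":
        problems.append('version must be the string "1"')
    queries = config.get("queries")
    if not isinstance(queries, dict) or not queries:
        problems.append("queries must be a non-empty mapping")
        return problems
    for query_id, entry in queries.items():
        if not isinstance(entry, dict):
            problems.append(f"{query_id}: entry must be a mapping")
            continue
        if not entry.get("name"):
            problems.append(f"{query_id}: missing name")
        url = entry.get("url")
        if not (isinstance(url, str) and url.startswith("s3://")):
            problems.append(f"{query_id}: url must start with s3://")
    return problems


example = {
    "version": "1",
    "queries": {
        "query-1": {
            "name": "My first query",
            "description": "Optional description",
            "url": "s3://BUCKET/.quilt/queries/query-1.json",
        },
        "query-2": {
            "name": "Second query",
            "url": "s3://BUCKET/.quilt/queries/query-2.json",
        },
    },
}
print(validate_saved_queries(example))
```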

Athena

You can store reusable Athena queries in the Quilt catalog so that your users can run them. You must first set up an Athena workgroup and saved queries per AWS's Athena documentation.

Configuration

You can hide the "Queries" tab by setting ui > nav > queries: false. You can also set the default workgroup with ui > athena > defaultWorkgroup: 'your-default-workgroup'. Learn more.

The tab remembers the last selected workgroup, catalog name, and database.

Basics

"Run query" executes the selected query and waits for the result.