Commit

Delete the obsolete "mqm_viewer" code files, leaving only a README.md file that redirects to Marot.

PiperOrigin-RevId: 585453986
vratnakar authored and copybara-github committed Nov 26, 2023
1 parent 9c8dd01 commit b8d46d3
Showing 6 changed files with 8 additions and 5,671 deletions.
257 changes: 8 additions & 249 deletions mqm_viewer/README.md
@@ -1,251 +1,10 @@
# MQM Viewer
# MQM Viewer has been renamed "Marot"

This repository contains a web app that can be used to analyze
[Multidimensional Quality Metrics (MQM)](http://www.qt21.eu/mqm-definition/definition-2015-06-16.html)
data from a human evaluation of translation quality. The web app can also
display metrics computed by automated evaluations, such as BLEURT.

To use it, download the files `mqm-viewer.html`, `mqm-viewer.js`,
`mqm-sigtests.js`, and `mqm-viewer.css` to your computer:

```
wget https://raw.githubusercontent.com/google-research/google-research/master/mqm_viewer/{mqm-viewer.html,mqm-viewer.js,mqm-viewer.css,mqm-sigtests.js}
```

Then, simply open the `mqm-viewer.html` file in a web browser, and use
the "Choose files" button to pick one or more MQM data files. MQM data spans
several columns, so it's best to use a desktop or laptop computer with a wide
screen.

A simpler option may be to just download the `mqm-viewer-lite.html` file and
open it in a web browser (it loads the needed JavaScript and CSS files from
a Google-hosted server).

This is not an officially supported Google product.

## Data file format

The data file should have tab-separated UTF-8-encoded data with the following
ten columns, one line per marked error:

- **system**: Name of the translation system.
- **doc**: Name of the document. It's useful to suffix this with the language
pair (e.g., "doc42:English-German"), especially as you may want to view the
data from several evaluations together.
- **docSegId**: Id of segment (sentence or group of sentences) within the
document.
- **globalSegId**: Id of segment across all documents. If you do not have
such numbering available, set this to a constant value, say 0.
- **rater**: Rater who evaluated segment. If this row only carries metadata
such as automated metrics and/or references, then `rater` will be the empty
string (as will be `category` and `severity`).
- **source**: Source text for segment.
- **target**: Translated text for segment.
- **category**: MQM error category (or "no-error").
- **severity**: MQM error severity (or "no-error").
- **metadata**: JSON-formatted object that may contain the following fields,
  among others:
  - **timestamp**: Time at which this annotation was obtained (milliseconds
    since Unix epoch).
  - **note**: Free-form text note provided by the rater with some annotations
    (notably, with the "Other" error category).
  - **corrected_translation**: If the rater provided a corrected translation
    for the segment, it will be included here.
  - **source_not_seen**: This will be set to true if this annotation was marked
    without the source text of the segment being visible.
  - **source_spans**: Array of pairs of 0-based indices (usually just one
    pair) identifying the indices of the first and last source tokens in the
    marked span. These indices refer to the source_tokens array in the segment
    object.
  - **target_spans**: Array of pairs of 0-based indices (usually just one
    pair) identifying the indices of the first and last target tokens in the
    marked span. These indices refer to the target_tokens array in the segment
    object.
  - **marked_text**: The text that has been marked by the rater (or the
    empty string if this metadata is not associated with a marked span). This
    field is computed from source_spans/target_spans. It can be useful
    when filtering.
  - **segment**: An object that has information about the segment (from the
    current doc+docSegId+system) that is not specific to any particular
    annotation/rater. This object may not necessarily be repeated across
    multiple ratings for the same segment. The segment object may contain the
    following fields:
    - **references**: A mapping from names of references to the references
      themselves (e.g., {"ref_A": "The reference", "ref_B": "..."}). This
      field need not be repeated across different systems.
    - **primary_reference**: The name of the primary reference, which is
      a key in the "references" mapping (e.g., "ref_A"). This field is
      required if "references" is present. This field too need not be repeated
      across different systems.
    - **metrics**: A dictionary in which the keys are the names of metrics
      (such as "Bleurt-X") and values are the numbers for those metrics. The
      metric name "MQM" is used for the MQM score. Note that this MQM score
      for the segment is computed *without any filtering*.
    - **source_tokens**: An array of source text tokens.
    - **target_tokens**: An array of target text tokens.
    - **source_sentence_tokens**: An array specifying sentence segmentation
      in the source segment. Each entry is the number of tokens in one
      sentence.
    - **target_sentence_tokens**: An array specifying sentence segmentation
      in the target segment. Each entry is the number of tokens in one
      sentence.
    - **starts_paragraph**: A boolean that is true if this segment is the
      start of a new paragraph.
    - In addition, any text annotation fields present in the input data are
      copied here. In [Anthea's data format](https://github.com/google-research/google-research/blob/master/anthea/anthea-help.html),
      this would be all the fields present in the optional last column.
  - **feedback**: An object optionally present in the metadata of the first
    segment of a doc. This captures any feedback the rater may have provided.
    It can include a free-form text field (keyed by **notes**) and a string
    keyed by **thumbs** that is set to either "up" or "down".
  - **evaluation**: An object that has information about the evaluation used.
    This field is typically only present in the very first data row, and is
    not repeated, in order to save space. This object may contain the following
    fields:
    - **template**: The name of the template used ("MQM", "MQM-WebPage",
      etc.).
    - **config**: The configuration parameters that define the template. This
      includes "errors" and "severities". Some bulky fields, notably
      "instructions" and "description", may have been stripped out from this
      object.
    - **source_language**, **target_language**: Language codes.

    In MQM Viewer, each metadata.evaluation object found is logged in the
    JavaScript debug console.

The "metadata" column used to be an optional "note" column, and MQM Viewer
continues to support that legacy format. Going forward, the metadata object
may be augmented to contain additional information about the rating/segment.

An optional header line in the data file will be ignored (identified by the
presence of the text "system\tdoc").

Example data files and details on score computations can be found in this
[GitHub repository](https://github.com/google/wmt-mqm-human-evaluation).
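
As a rough illustration of the format described above, the following sketch reads tab-separated MQM data into row objects. The function name `parseMqmData` and the `{note: ...}` wrapping of legacy plain-text notes are assumptions made for this example; this is not the viewer's own parsing code.

```javascript
// Column names, in the order listed in the format description.
const MQM_COLUMNS = [
  'system', 'doc', 'docSegId', 'globalSegId', 'rater',
  'source', 'target', 'category', 'severity', 'metadata',
];

/**
 * Parse tab-separated MQM data into an array of row objects.
 * Hypothetical helper for illustration; not part of MQM Viewer.
 * @param {string} text The contents of an MQM data file.
 * @return {!Array<!Object>} One object per data row.
 */
function parseMqmData(text) {
  const rows = [];
  for (const line of text.split(/\r?\n/)) {
    // Skip blank lines and the optional header line. This sketch only
    // checks the start of the line for the header text.
    if (!line.trim() || line.startsWith('system\tdoc')) continue;
    const cells = line.split('\t');
    const row = {};
    MQM_COLUMNS.forEach((name, i) => {
      row[name] = cells[i] ?? '';
    });
    // The last column is normally a JSON object. In the legacy format it is
    // a plain-text note, which this sketch wraps as {note: ...}.
    const raw = row.metadata;
    try {
      const parsed = JSON.parse(raw);
      row.metadata =
          (parsed !== null && typeof parsed === 'object') ? parsed : {note: raw};
    } catch (e) {
      row.metadata = {note: raw};
    }
    rows.push(row);
  }
  return rows;
}
```

All column values other than `metadata` are kept as strings here; a caller that needs numeric segment ids would convert them explicitly.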

## Data format conversion

You can easily add format conversion code that can convert arbitrarily
formatted data (for example, JSON lines from a BLEURT decoder), by adding a
JavaScript function with the following name and behavior:

```
/**
 * Transform data (that may be in some custom format) into the MQM data format.
 * Pass through the data if no conversion was appropriate or necessary.
 * @param {string} sourceName The file name or URL source for the data.
 * @param {string} data The original data.
 * @return {string} The MQM-data-formatted data.
 */
function mqmDataConvertor(sourceName, data) {
  // Custom conversion code goes here; unconverted data is passed through.
  return data;
}
```
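
For illustration only, here is one way such a convertor might look for JSON-lines input. The input field names (`system`, `doc`, `seg_id`, `source`, `target`, `score`), the `.jsonl` file-name check, and the choice of "Bleurt-X" as the metric name are all assumptions for this sketch, not a format defined by BLEURT or by MQM Viewer. Each input record becomes a metadata-only row, so `rater`, `category`, and `severity` are left empty.

```javascript
/**
 * Example convertor: JSON lines with hypothetical fields into MQM data rows.
 * @param {string} sourceName The file name or URL source for the data.
 * @param {string} data The original data.
 * @return {string} The MQM-data-formatted data.
 */
function mqmDataConvertor(sourceName, data) {
  // Assumption: only sources named *.jsonl need conversion.
  if (!sourceName.endsWith('.jsonl')) return data;

  // Tabs and newlines would break the tab-separated layout, so replace them.
  const clean = (s) => String(s ?? '').replace(/[\t\r\n]+/g, ' ');

  const lines = [];
  for (const line of data.split(/\r?\n/)) {
    if (!line.trim()) continue;
    const rec = JSON.parse(line);
    const metadata = {segment: {metrics: {'Bleurt-X': rec.score}}};
    lines.push([
      clean(rec.system),
      clean(rec.doc),
      clean(rec.seg_id),
      0,   // globalSegId: constant, as no global numbering is available.
      '',  // rater: empty for a metadata-only row.
      clean(rec.source),
      clean(rec.target),
      '',  // category
      '',  // severity
      JSON.stringify(metadata),
    ].join('\t'));
  }
  return lines.join('\n') + '\n';
}
```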

## Data from URLs

You can pass a `?dataurls=<url1>,...` parameter to MQM Viewer, to load data
from the URLs listed. Note that any URLs have to be hosted on the same site
as the viewer itself, or need to have a CORS exception.

If your domain stores data in some custom way (Google uses the CNS file
system, for example) in which data names must be converted to URLs, and you
wish to pass such data names directly to `?dataurls=`, then you can add a
JavaScript function with the following name and behavior:
```
/**
 * Transform a data name (that may be in some custom format) to a URL.
 * @param {string} dataName The name or identifier for the data.
 * @return {string} The URL from which the data can be loaded.
 */
function mqmURLMaker(dataName) {
  // Custom code to convert dataName into a URL goes here; this placeholder
  // passes the name through unchanged.
  const url = dataName;
  return url;
}
```
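
A minimal sketch of such a function follows. The base URL `https://data.example.com/mqm/` is a made-up placeholder, and the rule "anything that is not already an http(s) URL is a path under that base" is an assumption chosen for the example.

```javascript
// Hypothetical base URL under which data files are served.
const DATA_BASE_URL = 'https://data.example.com/mqm/';

/**
 * Example URL maker: map a slash-separated data name to a URL.
 * @param {string} dataName The name or identifier for the data.
 * @return {string} The URL from which the data can be loaded.
 */
function mqmURLMaker(dataName) {
  // Names that are already URLs are left untouched.
  if (/^https?:\/\//.test(dataName)) return dataName;
  // Encode each path component so that spaces and similar characters
  // survive inside the URL.
  const path = dataName
      .split('/')
      .filter(Boolean)
      .map(encodeURIComponent)
      .join('/');
  return DATA_BASE_URL + path;
}
```

The resulting URL still has to satisfy the same-site or CORS requirement described above.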

## Filtering

This web app facilitates interactive slicing and dicing of the data to identify
interesting subsets, to compare translation systems along various dimensions,
etc. The scores shown are always updated to reflect the currently active
filters.

- You can click on any System/Doc/ID/Rater/Category/Severity (or pick
  from the drop-down list under the column name) to set its **column
  filter** to that specific value.
- You can provide **column filter** regular expressions for filtering
  one or more columns, in the input fields provided under the column names.
- You can create sophisticated filters (involving multiple columns, for
  example) using a **JavaScript filter expression**.
  - This allows you to filter using any expression
    involving the columns. It can use the following
    variables: **system**, **doc**, **docSegId**,
    **globalSegId**, **rater**, **category**, **severity**,
    **source**, **target**, **metadata**.
  - Filter expressions also have access to three aggregated objects in
    variables named **aggrDoc**, **aggrDocSeg**, and **aggrDocSegSys**.
    The aggrDocSegSys dict also contains aggrDocSeg (with the key
    "aggrDocSeg"), which in turn similarly contains aggrDoc.
  - **aggrDoc** has the following properties:
    **doc**, **thumbsUpCount**, **thumbsDownCount**.
  - **aggrDocSeg** is an object with the following properties:
    - **aggrDocSeg.catsBySystem**,
    - **aggrDocSeg.catsByRater**,
    - **aggrDocSeg.sevsBySystem**,
    - **aggrDocSeg.sevsByRater**,
    - **aggrDocSeg.sevcatsBySystem**,
    - **aggrDocSeg.sevcatsByRater**,
    - **aggrDocSeg.source_tokens**,
    - **aggrDocSeg.source_sentence_tokens**,
    - **aggrDocSeg.starts_paragraph**,
    - **aggrDocSeg.references** (if available),
    - **aggrDocSeg.primary_reference** (if available).

    Each of the "\*BySystem" and "\*ByRater" properties is an object keyed by
    system or rater, with the values being arrays of strings. The "sevcats\*"
    values look like "Minor/Fluency/Punctuation" or are just the same as
    severities if categories are empty. This segment-level aggregation allows
    you to select specific segments rather than just specific error ratings.
  - **aggrDocSeg.metrics** is an object keyed by the metric name and then by
    system name. It provides the segment's metric scores (including MQM) for
    all systems for which a metric is available for that segment.
  - **aggrDocSegSys** is just an alias for metadata.segment.
  - **Example**: `docSegId > 10 || severity == 'Major'`
  - **Example**: `target.indexOf('thethe') >= 0`
  - **Example**: `metadata.marked_text.length >= 10`
  - **Example**: `aggrDocSeg.sevsBySystem['System-42'].includes('Major')`
  - **Example**: `aggrDocSegSys.metrics['MQM'] > 4 &&
    (aggrDocSegSys.metrics['BLEURT-X'] ?? 1) < 0.1`
  - **Example**: `JSON.stringify(aggrDocSeg.sevcatsBySystem).includes('Major/Fl')`
- You can examine the metadata associated with any row using the **Log
  metadata** interface shown in the **Filters** section. This can be useful
  for crafting filter expressions.
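
To make the idea of a JavaScript filter expression concrete, the sketch below compiles an expression string into a predicate over row objects, exposing the variable names listed above. It illustrates the general technique (the `Function` constructor) and is not the viewer's actual implementation. The helper name `makeRowFilter`, the assumption that each row object carries the aggregated objects as properties, and the choice to treat expressions that throw as non-matching are all specific to this example.

```javascript
// Variable names that a filter expression may refer to.
const FILTER_VARS = [
  'system', 'doc', 'docSegId', 'globalSegId', 'rater', 'category',
  'severity', 'source', 'target', 'metadata',
  'aggrDoc', 'aggrDocSeg', 'aggrDocSegSys',
];

/**
 * Compile a filter expression into a row predicate.
 * Hypothetical helper for illustration.
 * @param {string} expr A JavaScript expression over FILTER_VARS.
 * @return {function(!Object): boolean} Predicate applied to one row object.
 */
function makeRowFilter(expr) {
  // Each variable becomes a named parameter of the generated function.
  const fn = new Function(...FILTER_VARS, `return (${expr});`);
  return (row) => {
    try {
      return Boolean(fn(...FILTER_VARS.map((name) => row[name])));
    } catch (e) {
      // For example, a property access on a missing object.
      return false;
    }
  };
}

// Usage: keep only the rows that match the expression.
const exampleRows = [
  {docSegId: 12, severity: 'Minor', metadata: {marked_text: 'ab'}},
  {docSegId: 3, severity: 'Major', metadata: {marked_text: ''}},
  {docSegId: 3, severity: 'Minor', metadata: {marked_text: ''}},
];
const matches = exampleRows.filter(
    makeRowFilter("docSegId > 10 || severity == 'Major'"));
```

Because the expression is executed as code, this approach is only appropriate for expressions typed by the person using the page.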

## Significance tests
When there are multiple systems that have been evaluated on common document
segments, significance tests are run for each pair of systems and the resulting
p-values are displayed in a table. The testing is done via paired one-sided
approximate randomization (PAR), which corresponds to 'alternative="greater"'
in [scipy's API](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.permutation_test.html).

The significance tests are recomputed with any filtering that is applied. The
computations are run in a background Worker thread. The tests include any
available automated metrics in addition to MQM.
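
The general shape of a paired one-sided approximate randomization test can be sketched as follows. This is an illustration of the technique under stated assumptions, not the code in `mqm-sigtests.js`: the test statistic is taken to be the difference in summed per-segment scores, each trial randomly swaps the two systems' scores within a segment, and the p-value uses the common "add one" estimate. The function name and the optional `rng` parameter are made up for this example.

```javascript
/**
 * Paired one-sided approximate randomization test.
 * Estimates the p-value for the alternative that scores in `a` are
 * greater, on average, than the paired scores in `b`.
 * @param {!Array<number>} a Per-segment scores for the first system.
 * @param {!Array<number>} b Per-segment scores for the second system,
 *     aligned with `a`.
 * @param {number=} trials Number of random trials.
 * @param {function(): number=} rng Returns a number in [0, 1).
 * @return {number} The estimated p-value.
 */
function pairedOneSidedRandomizationTest(
    a, b, trials = 10000, rng = Math.random) {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('Need two non-empty arrays of equal length');
  }
  const diffs = a.map((x, i) => x - b[i]);
  const observed = diffs.reduce((sum, d) => sum + d, 0);
  let atLeastAsLarge = 0;
  for (let t = 0; t < trials; t++) {
    let stat = 0;
    for (const d of diffs) {
      // Swapping the pair (a[i], b[i]) flips the sign of its difference.
      stat += rng() < 0.5 ? d : -d;
    }
    if (stat >= observed) atLeastAsLarge++;
  }
  // Adding one to both counts keeps the estimate away from exactly zero.
  return (atLeastAsLarge + 1) / (trials + 1);
}
```

Since lower MQM scores indicate better translations, which system is passed as `a` determines the direction being tested.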

## Data Notes
There are some nuances to the data format which are useful to be aware of:

- Marked spans are noted in the source/target text using `<v>...</v>` tags
to enclose them. For example: `The error is <v>here</v>.`
- Except in some legacy data, error spans are also identified at precise
token-level using the `metadata.source_spans` and `metadata.target_spans`
fields.
- Severity and category names come directly from annotation tools and may
have subtle variations (such as lowercase/uppercase differences or
space-underscore changes).
- Error spans may include leading/trailing whitespace if the annotation tool
allows for this, which may or may not be part of the actual errors.
For example, `The error is<v> here</v>.`
The error spans themselves can also be entirely whitespace.
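
The following sketch shows one way code consuming this data might account for these nuances. Both helper names are hypothetical, and the normalization rule (lowercase, with runs of spaces or underscores collapsed to a single underscore) is only one possible convention.

```javascript
/**
 * Separate the first <v>...</v> marked span from the surrounding text.
 * @param {string} text Source or target text that may contain <v> tags.
 * @return {{plainText: string, markedText: string, whitespaceOnly: boolean}}
 */
function extractMarkedSpan(text) {
  const m = text.match(/<v>([\s\S]*?)<\/v>/);
  const markedText = m ? m[1] : '';
  return {
    plainText: text.replace(/<\/?v>/g, ''),
    // Leading/trailing whitespace is kept, as it may be part of the error.
    markedText,
    whitespaceOnly: markedText.length > 0 && markedText.trim() === '',
  };
}

/**
 * Normalize a severity or category name so that case and
 * space/underscore variants compare equal.
 * @param {string} name The name as found in the data.
 * @return {string} The normalized name.
 */
function normalizeLabel(name) {
  return name.trim().toLowerCase().replace(/[\s_]+/g, '_');
}
```
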
The "MQM Viewer" tool has been renamed "Marot". Marot allows you to view not
just [Multidimensional Quality Metrics
(MQM)](http://www.qt21.eu/mqm-definition/definition-2015-06-16.html) human
evaluations of translation quality, but also automated evaluations, such as
BLEURT.

[Please follow this link to find the Marot
project.](https://github.com/google-research/google-research/tree/master/marot)