forked from google-research/google-research
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Delete the obsolete "mqm_viewer" code files, leaving only a README.md…
… file that redirects to Marot. PiperOrigin-RevId: 585453986
- Loading branch information
1 parent
9c8dd01
commit b8d46d3
Showing
6 changed files
with
8 additions
and
5,671 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,251 +1,10 @@ | ||
# MQM Viewer | ||
# MQM Viewer has been renamed "Marot" | ||
|
||
This repository contains a web app that can be used to analyze | ||
[Multidimensional Quality Metrics (MQM)](http://www.qt21.eu/mqm-definition/definition-2015-06-16.html) | ||
data from a human evaluation of translation quality. The web app can also | ||
display metrics computed by automated evaluations, such as BLEURT. | ||
|
||
To use it, download the files `mqm-viewer.html`, `mqm-viewer.js`, | ||
`mqm-sigtests.js`, and `mqm-viewer.css` to your computer: | ||
|
||
``` | ||
wget https://raw.githubusercontent.com/google-research/google-research/master/mqm_viewer/mqm-viewer.{html,js,css} | ||
``` | ||
|
||
Then, simply open the `mqm-viewer.html` file in a web browser, and use | ||
the "Choose files" button to pick one or more MQM data files. MQM data spans | ||
several columns, so it's best to use a desktop or laptop computer with a wide | ||
screen. | ||
|
||
A simpler option may be to just download the `mqm-viewer-lite.html` file and | ||
open it in a web browser (it loads the needed JavaScript and CSS files from | ||
a Google-hosted server). | ||
|
||
This is not an officially supported Google product. | ||
|
||
## Data file format | ||
|
||
The data file should have tab-separated UTF-8-encoded data with the following | ||
ten columns, one line per marked error: | ||
|
||
- **system**: Name of the translation system. | ||
- **doc**: Name of the document. It's useful to suffix this with language-pair, | ||
(eg., "doc42:English-German"), especially as you may want to view the data | ||
from several evaluations together. | ||
- **docSegId**: Id of segment (sentence or group of sentences) within the | ||
document. | ||
- **globalSegId**: Id of segment across all documents. If you do not have | ||
such numbering available, set this to a constant value, say 0. | ||
- **rater**: Rater who evaluated segment. If this row only carries metadata | ||
such as automated metrics and/or references, then `rater` will be the empty | ||
string (as will be `category` and `severity`). | ||
- **source**: Source text for segment. | ||
- **target**: Translated text for segment. | ||
- **category**: MQM error category (or "no-error"). | ||
- **severity**: MQM error severity (or "no-error"). | ||
- **metadata**: JSON-formatted object that may contain the following fields, | ||
among others: | ||
- **timestamp**: Time at which this annotation was obtained (milliseconds | ||
since Unix epoch) | ||
- **note**: Free-form text note provided by the rater with some annotations | ||
(notably, with the "Other" error category) | ||
- **corrected_translation**: If the rater provided a corrected translation, | ||
for the segment, it will be included here. | ||
- **source_not_seen**: This will be set to true if this annotation was marked | ||
without the source text of the segment being visible. | ||
- **source_spans**: Array of pairs of 0-based indices (usually just one) | ||
identifying the indices of the first and last source tokens in the marked | ||
span. These indices refer to the source_tokens array in the segment | ||
object. | ||
- **target_spans**: Array of pairs of 0-based indices (usually just one) | ||
identifying the indices of the first and last target tokens in the marked | ||
span. These indices refer to the target_tokens array in the segment | ||
object. | ||
- **marked_text**: The text that has been marked by the rater (or the | ||
empty string if this metadata is not associated with an marked span). This | ||
field is computed from source_spans/target_spans. It can be useful | ||
when filtering. | ||
- **segment**: An object that has information about the segment (from the | ||
current doc+docSegId+system) that is not specific to any particular | ||
annotation/rater. This object may not necessarily be repeated across | ||
multiple ratings for the same segment. The segment object may contain the | ||
following fields: | ||
- **references**: A mapping from names of references to the references | ||
themselves (e.g., {"ref_A": "The reference", "ref_B": "..."}). This | ||
field need not be repeated across different systems. | ||
- **primary_reference**: The name of the primary reference, which is | ||
a key in the "references" mapping (e.g., "ref_A"). This field is | ||
required if "references" is present. This field too need not be repeated | ||
across different systems. | ||
- **metrics**: A dictionary in which the keys are the names of metrics | ||
(such as "Bleurt-X") and values are the numbers for those metrics. The | ||
metric name "MQM" is used for the MQM score. Note that this MQM score | ||
for the segment is computed *without any filtering*. | ||
- **source_tokens**: An array of source text tokens. | ||
- **target_tokens**: An array of target text tokens. | ||
- **source_sentence_tokens**: An array specifying sentence segmentation | ||
in the source segment. Each entry is the number of tokens in one | ||
sentence. | ||
- **target_sentence_tokens**: An array specifying sentence segmentation | ||
in the target segment. Each entry is the number of tokens in one | ||
sentence. | ||
- **starts_paragraph**: A boolean that is true if this segment is the | ||
start of a new paragraph. | ||
- In addition, any text annotation fields present in the input data are | ||
copied here. In [Anthea's data format](https://github.com/google-research/google-research/blob/master/anthea/anthea-help.html), | ||
this would be all the fields present in the optional last column. | ||
- **feedback**: An object optionally present in the metadata of the first | ||
segment of a doc. This captures any feedback the rater may have provided. | ||
It can include a free-form text field (keyed by **notes**) and a string | ||
keyed by **thumbs** that is set to either "up" or "down". | ||
- **evaluation**: An object that has information about the evaluation used. | ||
This field is typically only present in the very first data row, and is | ||
not repeated, in order to save space. This object may contain the following | ||
fields: | ||
- **template**: The name of the template used ("MQM", "MQM-WebPage", | ||
etc.). | ||
- **config**: The configuration parameters that define the template. This | ||
includes "errors" and "severities". Some bulky fields, notably | ||
"instructions" and "description" may have been stripped out from this | ||
object. | ||
- **source_language**, **target_language**: Language codes. | ||
In MQMViewer, each metadata.evaluation object found is logged in the | ||
JavaScript debug console. | ||
|
||
The "metadata" column used to be an optional "note" column, and MQM Viewer | ||
continues to support that legacy format. Going forward, the metadata object | ||
may be augmented to contain additional information about the rating/segment. | ||
|
||
An optional header line in the data file will be ignored (identified by the | ||
presence of the text "system\tdoc"). | ||
|
||
Example data files and details on score computations can be found in this | ||
[GitHub repository](https://github.com/google/wmt-mqm-human-evaluation). | ||
|
||
## Data format conversion | ||
|
||
You can easily add format conversion code that can convert arbitrarily | ||
formatted data (for example, JSON lines from a BLEURT decoder), by adding a | ||
JavaScript function with the following name and behavior: | ||
|
||
``` | ||
/** | ||
* Transform data (that may be in some custom format) into the MQM data format. | ||
* Pass through the data if no conversion was appropriate or necessary. | ||
* @param {string} sourceName The file name or URL source for the data. | ||
* @param {string} data The original data. | ||
* @return {string} The MQM-data-formatted data. | ||
*/ | ||
function mqmDataConvertor(sourceName, data) { | ||
... | ||
return data; | ||
} | ||
``` | ||
|
||
## Data from URLs | ||
|
||
You can pass a `?dataurls=<url1>,...` parameter to MQM Viewer, to load data | ||
from the URLs listed. Note that any URLs have to be hosted on the same site | ||
as the viewer itself, or need to have a CORS exception. | ||
|
||
If your domain uses some custom way of storing data (Google uses the CNS file | ||
system, for example) that uses a way to convert data names to URLs, and you wish | ||
to directly pass such data names as URLs (to `?dataurls=`), then you can add a | ||
JavaScript function with the following name and behavior: | ||
``` | ||
/** | ||
* Transform a data name (that may be in some custom format) to a URL. | ||
* @param {string} dataName The name or identifier for the data. | ||
* @return {string} The URL from which the data can be loaded. | ||
*/ | ||
function mqmURLMaker(dataName) { | ||
/** Code to convert dataName into url */ | ||
let url = ...; | ||
return url; | ||
} | ||
``` | ||
|
||
## Filtering | ||
|
||
This web app facilitates interactive slicing and dicing of the data to identify | ||
interesting subsets, to compare translation systems along various dimensions, | ||
etc. The scores shown are always updated to reflect the currently active | ||
filters. | ||
|
||
- You can click on any System/Doc/ID/Rater/Category/Severity (or pick | ||
from the drop-down list under the column name) to set its **column | ||
filter** to that specific value. | ||
- You can provide **column filter** regular expressions for filtering | ||
one or more columns, in the input fields provided under the column names. | ||
- You can create sophisticated filters (involving multiple columns, for | ||
example) using a **JavaScript filter expression**. | ||
- This allows you to filter using any expression | ||
involving the columns. It can use the following | ||
variables: **system**, **doc**, **docSegId**, | ||
**globalSegId**, **rater**, **category**, **severity**, | ||
**source**, **target**, **metadata**. | ||
- Filter expressions also have access to three aggregated objects in | ||
variables named **aggrDoc**, **aggrDocSeg**, and **aggrDocSegSys**. | ||
The aggrDocSegSys dict also contains aggrDocSeg (with the key | ||
"aggrDocSeg"), which in turn similarly contains aggrDoc. | ||
- **aggrDoc** has the following properties: | ||
**doc**, **thumbsUpCount**, **thumbsDownCount**. | ||
- **aggrDocSeg** is an object with the following properties: | ||
- **aggrDocSeg.catsBySystem**, | ||
- **aggrDocSeg.catsByRater**, | ||
- **aggrDocSeg.sevsBySystem**, | ||
- **aggrDocSeg.sevsByRater**, | ||
- **aggrDocSeg.sevcatsBySystem**, | ||
- **aggrDocSeg.sevcatsByRater**, | ||
- **aggrDocSeg.source_tokens**, | ||
- **aggrDocSeg.source_sentence_tokens**, | ||
- **aggrDocSeg.starts_paragraph**, | ||
- **aggrDocSeg.references** (if available), | ||
- **aggrDocSeg.primary_reference** (if available), | ||
Each of these properties is an object keyed by system or rater, with the | ||
values being arrays of strings. The "sevcats\*" values look like | ||
"Minor/Fluency/Punctuation" or are just the same as severities if | ||
categories are empty. This segment-level aggregation allows you | ||
to select specific segments rather than just specific error ratings. | ||
- **aggrDocSeg.metrics** is an object keyed by the metric name and then by | ||
system name. It provides the segment's metric scores (including MQM) for | ||
all systems for which a metric is available for that segment. | ||
- **aggrDocSegSys** is just an alias for metadata.segment. | ||
- **Example**: docSegId > 10 || severity == 'Major' | ||
- **Example**: target.indexOf('thethe') >= 0 | ||
- **Example**: metadata.marked_text.length >= 10 | ||
- **Example**: aggrDocSeg.sevsBySystem['System-42'].includes('Major') | ||
- **Example**: aggrDocSegSys.metrics['MQM'] > 4 && | ||
(aggrDocSegSys.metrics['BLEURT-X'] ?? 1) < 0.1. | ||
- **Example**: JSON.stringify(aggrDocSeg.sevcatsBySystem).includes('Major/Fl') | ||
- You can examine the metadata associated with any using the **Log metadata** | ||
interface shown in the **Filters** section. This can be useful for crafting | ||
filter expressions. | ||
|
||
## Significance tests | ||
When there are multiple systems that have been evaluated on common document | ||
segments, significance tests are run for each pair of systems and the resulting | ||
p-values are displayed in a table. The testing is done via paired one-sided | ||
approximate randomization (PAR), which corresponds to 'alternative="greater"' | ||
in [scipy's API](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.permutation_test.html). | ||
|
||
The significance tests are recomputed with any filtering that is applied. The | ||
computations are run in a background Worker thread. The tests include any | ||
available automated metrics in addition to MQM. | ||
|
||
## Data Notes | ||
There are some nuances to the data format which are useful to be aware of: | ||
|
||
- Marked spans are noted in the source/target text using `<v>...</v>` tags | ||
to enclose them. For example: `The error is <v>here</v>.` | ||
- Except in some legacy data, error spans are also identified at precise | ||
token-level using the `metadata.source_spans` and `metadata.target_spans` | ||
fields. | ||
- Severity and category names come directly from annotation tools and may | ||
have subtle variations (such as lowercase/uppercase differences or | ||
space-underscore changes). | ||
- Error spans may include leading/trailing whitespace if the annotation tool | ||
allows for this, which may or may not be part of the actual errors. | ||
For example, `The error is<v> here</v>.` | ||
The error spans themselves can also be entirely whitespace. | ||
The "MQM Viewer" tool has been renamed "Marot". Marot allows you to view not | ||
just [Multidimensional Quality Metrics | ||
(MQM)](http://www.qt21.eu/mqm-definition/definition-2015-06-16.html) human | ||
evaluations of translation quality, but also automated evaluations, such as | ||
BLEURT. | ||
|
||
[Please follow this link to find the Marot | ||
project.](https://github.com/google-research/google-research/tree/master/marot) |
Oops, something went wrong.