diff --git a/.dockerignore b/.dockerignore
index eb8ef92aab8d..3ee0ef375bc1 100644
--- a/.dockerignore
+++ b/.dockerignore
@@ -1,32 +1,11 @@
-node_modules
-npm-debug.log
-Dockerfile
-.dockerignore
-build
-dist
-docs
-tests
-tox.ini
-public
-label_studio.egg-info
-.git
-.github
-.vscode
-.editorconfig
-.gitignore
-*.md
-!README.md
-*.txt
-!requirements.txt
-*.yml
-*.json
-*.pem
-.python-version
-
-# shell scripts
-update_pypi.sh
+# Ignore everything:
+**
-# misc folders
-tmp
-etc
-my_*
+# Except:
+!images
+!label_studio
+!scripts
+!tools
+!setup.py
+!requirements.txt
+!README.md
\ No newline at end of file
diff --git a/README.md b/README.md
index 13a8d0840013..ca4a4962e9ea 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
# Label Studio · ![GitHub](https://img.shields.io/github/license/heartexlabs/label-studio?logo=heartex) [![Build Status](https://travis-ci.com/heartexlabs/label-studio.svg?branch=master)](https://travis-ci.com/heartexlabs/label-studio) [![codecov](https://codecov.io/gh/heartexlabs/label-studio/branch/master/graph/badge.svg)](https://codecov.io/gh/heartexlabs/label-studio) ![GitHub release](https://img.shields.io/github/v/release/heartexlabs/label-studio?include_prereleases) · :sunny:
-[Website](https://labelstud.io/) • [Docs](https://labelstud.io/guide) • [Twitter](https://twitter.com/heartexlabs) • [Join Slack Community ](https://join.slack.com/t/label-studio/shared_invite/zt-cr8b7ygm-6L45z7biEBw4HXa5A2b5pw)
+[Website](https://labelstud.io/) • [Docs](https://labelstud.io/guide/) • [Twitter](https://twitter.com/heartexlabs) • [Join Slack Community ](https://join.slack.com/t/label-studio/shared_invite/zt-cr8b7ygm-6L45z7biEBw4HXa5A2b5pw)
We send a periodic newsletter announcing new features as well as some ML-related papers, label techniques that are innovative and funny stories.
diff --git a/docs/public/search.xml b/docs/public/search.xml
index 583574d8a8b5..923fbd0fb17a 100644
--- a/docs/public/search.xml
+++ b/docs/public/search.xml
@@ -18,22 +18,22 @@
You can create relations between labeled regions. For example, if you draw two bounding boxes, you can connect them with a relation. We’ve extended the functionality to include the direction of the relation and, optionally, a label for the relation. Here is an example config for that:
<View> <Relations> <Relation value="Is A" /> <Relation value="Has Function" /> <Relation value="Involved In" /> <Relation value="Related To" /> </Relations> <Labels name="lbl-1" toName="txt-1"> <Label value="Subject"></Label> <Label value="Object"></Label> </Labels> <Text name="txt-1" value="$text"></Text></View>
NER got an update, nested entities representation is more apparent now, and it’s optimized to support large texts.
Initial implementation of the image segmentation using masks. You get two controls, brush with configurable size, and eraser. The output format is RLE implemented by rle-pack library.
There is a new template available that provides more information about the setup.
Changing the labels of the existing regions is now easy and supported for any of the data types.
Simple validation to protect you from empty results. When choices or labels are required you can specify required=true
parameter for the
That enables you to build more complex interfaces. Here is an example that puts labels into different groups:
<View> <Choices name="label" toName="audio" required="true" choice="multiple" > <View style="display: flex; flex-direction: row; padding-left: 2em; padding-right: 2em; margin-bottom: 3em"> <View style="padding: 1em 4em; background: rgba(255,0,0,0.1)"> <Header size="4" value="Speaker Gender" /> <Choice value="Business" /> <Choice value="Politics" /> </View> <View style="padding: 1em 4em; background: rgba(255,255,0,0.1)"> <Header size="4" value="Speech Type" /> <Choice value="Legible" /> <Choice value="Slurred" /> </View> <View style="padding: 1em 4em; background: rgba(0,0,255,0.1)"> <Header size="4" value="Additional" /> <Choice value="Echo" /> <Choice value="Noises" /> <Choice value="Music" /> </View> </View> </Choices> <Audio name="audio" value="$url" /></View>
A significant contribution from @lrlunin implements ellipse labeling for images; check out the template.
zoomControl, brightnessControl and contrastControl for the image tag - zoom has been available for some time, but now there is an additional toolbar that can be created if one of the above params is provided to the
select each region with shift+alt+number - hotkeys to quickly navigate the regions
settings now show the hotkeys - show the defined and available hotkeys inside the Hotkeys tab in the Settings
simplifying the creation of concave polygons - polygons are not closed unless fully defined, that enables you to create concave polygons easily
HyperText now works with its body: you can put HTML right into the HyperText tag. Here is an example config:
<View> <HyperText><h1>Hello</h1></HyperText></View>
Support for Windows, MacOSX, Linux with Python 3.5 or greater
There are now several ways to import your tasks for labeling:
Previously, changing a config after importing or labeling tasks could be dangerous because it invalidated the created tasks/completions, so this was switched off. Now you no longer need to worry about that: the labeling config is validated on the fly against the data already created. You can freely change the appearance of your project on the setup page and even add new labels; when you modify something crucial, you’ll be alerted about it.
When finishing your project, go to the export page and choose among the common export formats valid for your current project configuration.
Connecting to a running machine learning backend allows you to retrain your model continually and visually inspect how its predictions behave on tasks. Just specify ML backend URL when launching Label Studio, and start labeling.
Now Label Studio is also maintained and distributed as Docker container - run one-liner to build your own cloud labeling solution.
You can launch Label Studio in multisession mode - then each browser session dynamically creates its own project.
]]>You can create relations between labeled regions. For example, if you draw two bounding boxes, you can connect them with a relation. We’ve extended the functionality to include the direction of the relation and, optionally, a label for the relation. Here is an example config for that:
<View> <Relations> <Relation value="Is A" /> <Relation value="Has Function" /> <Relation value="Involved In" /> <Relation value="Related To" /> </Relations> <Labels name="lbl-1" toName="txt-1"> <Label value="Subject"></Label> <Label value="Object"></Label> </Labels> <Text name="txt-1" value="$text"></Text></View>
NER got an update, nested entities representation is more apparent now, and it’s optimized to support large texts.
Initial implementation of the image segmentation using masks. You get two controls, brush with configurable size, and eraser. The output format is RLE implemented by rle-pack library.
There is a new template available that provides more information about the setup.
Changing the labels of the existing regions is now easy and supported for any of the data types.
Simple validation to protect you from empty results. When choices or labels are required you can specify required=true
parameter for the
That enables you to build more complex interfaces. Here is an example that puts labels into different groups:
<View> <Choices name="label" toName="audio" required="true" choice="multiple" > <View style="display: flex; flex-direction: row; padding-left: 2em; padding-right: 2em; margin-bottom: 3em"> <View style="padding: 1em 4em; background: rgba(255,0,0,0.1)"> <Header size="4" value="Speaker Gender" /> <Choice value="Business" /> <Choice value="Politics" /> </View> <View style="padding: 1em 4em; background: rgba(255,255,0,0.1)"> <Header size="4" value="Speech Type" /> <Choice value="Legible" /> <Choice value="Slurred" /> </View> <View style="padding: 1em 4em; background: rgba(0,0,255,0.1)"> <Header size="4" value="Additional" /> <Choice value="Echo" /> <Choice value="Noises" /> <Choice value="Music" /> </View> </View> </Choices> <Audio name="audio" value="$url" /></View>
A significant contribution from @lrlunin implements ellipse labeling for images; check out the template.
zoomControl, brightnessControl and contrastControl for the image tag - zoom has been available for some time, but now there is an additional toolbar that can be created if one of the above params is provided to the
select each region with shift+alt+number - hotkeys to quickly navigate the regions
settings now show the hotkeys - show the defined and available hotkeys inside the Hotkeys tab in the Settings
simplifying the creation of concave polygons - polygons are not closed unless fully defined, that enables you to create concave polygons easily
HyperText now works with its body: you can put HTML right into the HyperText tag. Here is an example config:
<View> <HyperText><h1>Hello</h1></HyperText></View>
Support for Windows, MacOSX, Linux with Python 3.5 or greater
There are now several ways to import your tasks for labeling:
Previously, changing a config after importing or labeling tasks could be dangerous because it invalidated the created tasks/completions, so this was switched off. Now you no longer need to worry about that: the labeling config is validated on the fly against the data already created. You can freely change the appearance of your project on the setup page and even add new labels; when you modify something crucial, you’ll be alerted about it.
When finishing your project, go to the export page and choose among the common export formats valid for your current project configuration.
Connecting to a running machine learning backend allows you to retrain your model continually and visually inspect how its predictions behave on tasks. Just specify ML backend URL when launching Label Studio, and start labeling.
Now Label Studio is also maintained and distributed as Docker container - run one-liner to build your own cloud labeling solution.
You can launch Label Studio in multisession mode - then each browser session dynamically creates its own project.
]]>We send a periodic newsletter announcing new features as well as some ML-related papers, label techniques that are innovative and funny stories.
my_project_name/completions
directory, one file per labeled task named as task_id.json
.You can optionally convert and export raw completions to a more common format by doing one of the following:
my_project_name/completions
directoryThe output data is stored in completions - JSON formatted files, one per each completed task saved in project directory in completions
folder or in the "output_dir"
option The example structure of completion is the following:
{ "completions": [ { "id": "1001", "lead_time": 15.053, "result": [ { "from_name": "tag", "id": "Dx_aB91ISN", "source": "$image", "to_name": "img", "type": "rectanglelabels", "value": { "height": 10.458911419423693, "rectanglelabels": [ "Moonwalker" ], "rotation": 0, "width": 12.4, "x": 50.8, "y": 5.869797225186766 } } ] } ], "data": { "image": "https://htx-misc.s3.amazonaws.com/opensource/label-studio/examples/images/nick-owuor-astro-nic-visuals-wDifg5xc9Z4-unsplash.jpg" }, "id": 1, "predictions": [ { "created_ago": "3 hours", "model_version": "model 1", "result": [ { "from_name": "tag", "id": "t5sp3TyXPo", "source": "$image", "to_name": "img", "type": "rectanglelabels", "value": { "height": 11.612284069097889, "rectanglelabels": [ "Moonwalker" ], "rotation": 0, "width": 39.6, "x": 13.2, "y": 34.702495201535505 } } ] }, { "created_ago": "4 hours", "model_version": "model 2", "result": [ { "from_name": "tag", "id": "t5sp3TyXPo", "source": "$image", "to_name": "img", "type": "rectanglelabels", "value": { "height": 33.61228406909789, "rectanglelabels": [ "Moonwalker" ], "rotation": 0, "width": 39.6, "x": 13.2, "y": 54.702495201535505 } } ] } ]}
That’s where the list of labeling results per one task is stored.
Unique completion identifier
Time in seconds spent to create this completion
Completion result data
Unique completion result identifier
Name of the tag that was used to label region (control tags)
Name of the object tag that provided the region to be labeled (object tags)
Type of the labeling/tag
Tag specific value that includes the labeling result details. The exact structure of value depends on the chosen labeling tag.
Explore each tag for more details.
Data copied from input task
Task identifier
Machine learning predictions (aka pre-labeling results). Follows the same format as completion, with some additional fields related to machine learning inference:
List of items in raw completion format stored in JSON file
List of items where only "from_name", "to_name"
values from raw completion format are kept:
{ "image": "https://htx-misc.s3.amazonaws.com/opensource/label-studio/examples/images/nick-owuor-astro-nic-visuals-wDifg5xc9Z4-unsplash.jpg", "tag": [{ "height": 10.458911419423693, "rectanglelabels": [ "Moonwalker" ], "rotation": 0, "width": 12.4, "x": 50.8, "y": 5.869797225186766 }]}
Results are stored in comma-separated tabular file with column names specified by "from_name"
"to_name"
values
Results are stored in tab-separated tabular file with column names specified by "from_name"
"to_name"
values
Popular format used for CoNLL-2003 named entity recognition challenge
Popular machine learning format used by COCO dataset for object detection and image segmentation tasks
Popular XML-formatted task data used for object detection and image segmentation tasks
You can use an API to request a file with exported results, e.g.
curl http://localhost:8080/api/export?format=JSON > exported_results.tar.gz
The format
parameter could be one of available export formats
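The relationship between a raw completion and the reduced form shown earlier in this section can be sketched in Python. This is only an illustration, not Label Studio's own converter: the function name `to_min_format` is hypothetical, and the sketch assumes that the task `data` is copied as is, that each result's `value` is grouped under its `from_name`, and that only the first completion is used.

```python
import json


def to_min_format(task):
    """Flatten a raw task into a reduced dict (hypothetical helper, illustrative only)."""
    # Start from the input data copied into the task.
    item = dict(task.get("data", {}))
    completions = task.get("completions") or []
    if not completions:
        return item
    # Assumption: only the first completion is kept.
    for result in completions[0].get("result", []):
        # Group the tag-specific values under the control tag name.
        item.setdefault(result["from_name"], []).append(result["value"])
    return item


# A shortened version of the raw completion example from this section.
raw_task = {
    "completions": [{
        "id": "1001",
        "lead_time": 15.053,
        "result": [{
            "from_name": "tag",
            "id": "Dx_aB91ISN",
            "source": "$image",
            "to_name": "img",
            "type": "rectanglelabels",
            "value": {
                "height": 10.458911419423693,
                "rectanglelabels": ["Moonwalker"],
                "rotation": 0,
                "width": 12.4,
                "x": 50.8,
                "y": 5.869797225186766,
            },
        }],
    }],
    "data": {"image": "https://htx-misc.s3.amazonaws.com/opensource/label-studio/examples/images/nick-owuor-astro-nic-visuals-wDifg5xc9Z4-unsplash.jpg"},
    "id": 1,
}

print(json.dumps(to_min_format(raw_task), indent=2))
```

Tasks without completions come back with only their input data, which keeps the helper safe to run over a partially labeled project directory.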
We’re also very interested to learn more from you about your ML pipelines, if you’re interested in having a conversation, please ping us on Slack.
You can configure label studio to synchronize labeling tasks with your s3 or gcp bucket, potentially filtering by a specific prefix or a file extension. Label Studio will take that list and generate pre-signed URLs each time the task is shown to the annotator.
There are several ways Label Studio can load a file, either as a URL or as a blob; therefore, you can store either the list of tasks or the assets themselves and load them.
You can configure it to store the results back to s3/gcp, making Label Studio a part of your data processing pipeline. Read more about the configuration in the docs here.
Finally, with a lot of work from Andrew, there is an implementation of frontend testing. This will make sure that we don’t break things when we introduce new features. Along with that comes another important part: an improved building and publishing process and configured CI. Now the npm frontend package will be published along with the pip package.
Introducing a new object tag called “Paragraphs”. A paragraph is a piece of text with potentially additional metadata like the author and the timestamp. With this tag we’re also experimenting with the idea of providing predefined layouts. For example, to label a dialogue you can use the following config: <Paragraphs name="conversation" value="$conv" layout="dialogue" />
This feature is available in the enterprise version only
One limitation label studio had was the ability to use only one shape on the same image, for example, you were able to put either bounding boxes or polygons. Now this limitation is waived and you can define different label groups and connect those to the same image.
There are a couple of ways to make sure that the annotation is performed in full. One of these concepts is a required
flag, and we’ve created a new one called maxUsages
. For some datasets you know how many objects of a particular type there are, so you can limit the usage of specific labels.
Label Studio is a self-contained Web application for multi-typed data labeling and exploration. The backend is written in pure Python powered by Flask. The frontend part is a backend-agnostic React + MST app, included as a precompiled script.
Here are the main concepts behind Label Studio’s workflow:
Label Studio is supported for Python 3.5 or greater, running on Linux, Windows and MacOSX.
Note: for Windows users the default installation may fail to build
lxml
package. Consider manually installing it from the unofficial Windows binaries, e.g. if you are running on x64 with Python 3.8, run pip install lxml-4.5.0-cp38-cp38-win_amd64.whl
.
To install Label Studio via pip, you need Python>=3.5 and run:
pip install label-studio
Then launch a new project which stores all labeling data in a local directory my_labeling_project
:
label-studio start my_labeling_project --init
The default browser opens automatically at http://localhost:8080.
Label Studio is also distributed as a docker container. Make sure you have Docker installed on your local machine.
Install and start Label Studio at http://localhost:8080 storing all labeling data in ./my_labeling_project
directory:
docker run --rm -p 8080:8080 -v `pwd`/my_labeling_project:/label-studio/my_labeling_project --name label-studio heartexlabs/label-studio:latest
Note: if the
./my_labeling_project
folder exists, an exception will be thrown. Please delete this folder or use the --force
option.
Note: for Windows, you have to modify the volume paths set by the -v
option.
You can override the default startup command by appending any of the available command line arguments:
docker run -p 8080:8080 -v `pwd`/my_project:/label-studio/my_project --name label-studio heartexlabs/label-studio:latest label-studio start my_project --init --force --template image_mixedlabel
If you want to build a local image, run:
docker build -t heartexlabs/label-studio:latest .
If you want to use nightly builds or extend the functionality, consider downloading the source code using Git and running Label Studio locally:
git clone https://github.com/heartexlabs/label-studio.git
cd label-studio
python setup.py develop
Then create a new project, which stores all labeling data in a local directory my_labeling_project
:
label-studio start my_labeling_project --init
The default browser will open automatically at http://localhost:8080.
You can start Label Studio in multisession mode - each browser session creates its own project, named after the associated session ID.
In order to launch Label Studio in multisession mode and keep all projects in a separate directory session_projects
, run
label-studio start-multi-session --root-dir ./session_projects
You can specify input tasks, project config, machine learning backend and other options via the command line interface. Run label-studio start --help
to see all available options.
Its repository is located at https://github.com/heartexlabs/label-studio-frontend
npm install label-studio
<!-- Theme included stylesheets --><link href="https://unpkg.com/browse/label-studio@0.4.0/build/static/css/main.14acfaa5.css" rel="stylesheet"><!-- Main Label Studio library --><script src="https://unpkg.com/browse/label-studio@0.4.0/build/static/js/main.0249ea16.js"></script>
Instantiate a new Label Studio object with a selector for the div that should become the editor.
<!-- Include Label Studio stylesheet --><link href="https://unpkg.com/label-studio@0.4.0/build/static/css/main.14acfaa5.css" rel="stylesheet"><!-- Create the Label Studio container --><div id="label-studio"></div><!-- Include the Label Studio library --><script src="https://unpkg.com/label-studio@0.4.0/build/static/js/main.0249ea16.js"></script><!-- Initialize Label Studio --><script> var labelStudio = new LabelStudio('label-studio', { config: ` <View> <Image name="img" value="$image"></Image> <RectangleLabels name="tag" toName="img"> <Label value="Hello"></Label> <Label value="World"></Label> </RectangleLabels> </View> `, interfaces: [ "panel", "update", "controls", "side-column", "completions:menu", "completions:add-new", "completions:delete", "predictions:menu", ], user: { pk: 1, firstName: "James", lastName: "Dean" }, task: { completions: [], predictions: [], id: 1, data: { image: "https://htx-misc.s3.amazonaws.com/opensource/label-studio/examples/images/nick-owuor-astro-nic-visuals-wDifg5xc9Z4-unsplash.jpg" } }, onLabelStudioLoad: function(LS) { var c = LS.completionStore.addCompletion({ userGenerate: true }); LS.completionStore.selectCompletion(c.id); } });</script>
You can use Playground to test out different types of config.
To see all the available options for the initialization of LabelStudio, please check the Reference.
]]>my_project_name/completions
directory, one file per labeled task named as task_id.json
.You can optionally convert and export raw completions to a more common format by doing one of the following:
my_project_name/completions
directoryThe output data is stored in completions - JSON formatted files, one per each completed task saved in project directory in completions
folder or in the "output_dir"
option The example structure of completion is the following:
{ "completions": [ { "id": "1001", "lead_time": 15.053, "result": [ { "from_name": "tag", "id": "Dx_aB91ISN", "source": "$image", "to_name": "img", "type": "rectanglelabels", "value": { "height": 10.458911419423693, "rectanglelabels": [ "Moonwalker" ], "rotation": 0, "width": 12.4, "x": 50.8, "y": 5.869797225186766 } } ] } ], "data": { "image": "https://htx-misc.s3.amazonaws.com/opensource/label-studio/examples/images/nick-owuor-astro-nic-visuals-wDifg5xc9Z4-unsplash.jpg" }, "id": 1, "predictions": [ { "created_ago": "3 hours", "model_version": "model 1", "result": [ { "from_name": "tag", "id": "t5sp3TyXPo", "source": "$image", "to_name": "img", "type": "rectanglelabels", "value": { "height": 11.612284069097889, "rectanglelabels": [ "Moonwalker" ], "rotation": 0, "width": 39.6, "x": 13.2, "y": 34.702495201535505 } } ] }, { "created_ago": "4 hours", "model_version": "model 2", "result": [ { "from_name": "tag", "id": "t5sp3TyXPo", "source": "$image", "to_name": "img", "type": "rectanglelabels", "value": { "height": 33.61228406909789, "rectanglelabels": [ "Moonwalker" ], "rotation": 0, "width": 39.6, "x": 13.2, "y": 54.702495201535505 } } ] } ]}
That’s where the list of labeling results per one task is stored.
Unique completion identifier
Time in seconds spent to create this completion
Completion result data
Unique completion result identifier
Name of the tag that was used to label region (control tags)
Name of the object tag that provided the region to be labeled (object tags)
Type of the labeling/tag
Tag specific value that includes the labeling result details. The exact structure of value depends on the chosen labeling tag.
Explore each tag for more details.
Data copied from input task
Task identifier
Machine learning predictions (aka pre-labeling results). Follows the same format as completion, with some additional fields related to machine learning inference:
List of items in raw completion format stored in JSON file
List of items where only "from_name", "to_name"
values from raw completion format are kept:
{ "image": "https://htx-misc.s3.amazonaws.com/opensource/label-studio/examples/images/nick-owuor-astro-nic-visuals-wDifg5xc9Z4-unsplash.jpg", "tag": [{ "height": 10.458911419423693, "rectanglelabels": [ "Moonwalker" ], "rotation": 0, "width": 12.4, "x": 50.8, "y": 5.869797225186766 }]}
Results are stored in comma-separated tabular file with column names specified by "from_name"
"to_name"
values
Results are stored in tab-separated tabular file with column names specified by "from_name"
"to_name"
values
Popular format used for CoNLL-2003 named entity recognition challenge
Popular machine learning format used by COCO dataset for object detection and image segmentation tasks
Popular XML-formatted task data used for object detection and image segmentation tasks
You can use an API to request a file with exported results, e.g.
curl http://localhost:8080/api/export?format=JSON > exported_results.tar.gz
The format
parameter could be one of available export formats
Label Studio is a self-contained Web application for multi-typed data labeling and exploration. The backend is written in pure Python powered by Flask. The frontend part is a backend-agnostic React + MST app, included as a precompiled script.
Here are the main concepts behind Label Studio’s workflow:
Label Studio is supported for Python 3.5 or greater, running on Linux, Windows and MacOSX.
Note: for Windows users the default installation may fail to build
lxml
package. Consider manually installing it from the unofficial Windows binaries, e.g. if you are running on x64 with Python 3.8, run pip install lxml-4.5.0-cp38-cp38-win_amd64.whl
.
To install Label Studio via pip, you need Python>=3.5 and run:
pip install label-studio
Then launch a new project which stores all labeling data in a local directory my_labeling_project
:
label-studio start my_labeling_project --init
The default browser opens automatically at http://localhost:8080.
Label Studio is also distributed as a docker container. Make sure you have Docker installed on your local machine.
Install and start Label Studio at http://localhost:8080 storing all labeling data in ./my_labeling_project
directory:
docker run --rm -p 8080:8080 -v `pwd`/my_labeling_project:/label-studio/my_labeling_project --name label-studio heartexlabs/label-studio:latest
Note: if the
./my_labeling_project
folder exists, an exception will be thrown. Please delete this folder or use the --force
option.
Note: for Windows, you have to modify the volume paths set by the -v
option.
You can override the default startup command by appending any of the available command line arguments:
docker run -p 8080:8080 -v `pwd`/my_project:/label-studio/my_project --name label-studio heartexlabs/label-studio:latest label-studio start my_project --init --force --template image_mixedlabel
If you want to build a local image, run:
docker build -t heartexlabs/label-studio:latest .
If you want to use nightly builds or extend the functionality, consider downloading the source code using Git and running Label Studio locally:
git clone https://github.com/heartexlabs/label-studio.git
cd label-studio
python setup.py develop
Then create a new project, which stores all labeling data in a local directory my_labeling_project
:
label-studio start my_labeling_project --init
The default browser will open automatically at http://localhost:8080.
You can start Label Studio in multisession mode - each browser session creates its own project, named after the associated session ID.
In order to launch Label Studio in multisession mode and keep all projects in a separate directory session_projects
, run
label-studio start-multi-session --root-dir ./session_projects
You can specify input tasks, project config, machine learning backend and other options via the command line interface. Run label-studio start --help
to see all available options.
Its repository is located at https://github.com/heartexlabs/label-studio-frontend
npm install label-studio
<!-- Theme included stylesheets --><link href="https://unpkg.com/browse/label-studio@0.4.0/build/static/css/main.14acfaa5.css" rel="stylesheet"><!-- Main Label Studio library --><script src="https://unpkg.com/browse/label-studio@0.4.0/build/static/js/main.0249ea16.js"></script>
Instantiate a new Label Studio object with a selector for the div that should become the editor.
<!-- Include Label Studio stylesheet --><link href="https://unpkg.com/label-studio@0.4.0/build/static/css/main.14acfaa5.css" rel="stylesheet"><!-- Create the Label Studio container --><div id="label-studio"></div><!-- Include the Label Studio library --><script src="https://unpkg.com/label-studio@0.4.0/build/static/js/main.0249ea16.js"></script><!-- Initialize Label Studio --><script> var labelStudio = new LabelStudio('label-studio', { config: ` <View> <Image name="img" value="$image"></Image> <RectangleLabels name="tag" toName="img"> <Label value="Hello"></Label> <Label value="World"></Label> </RectangleLabels> </View> `, interfaces: [ "panel", "update", "controls", "side-column", "completions:menu", "completions:add-new", "completions:delete", "predictions:menu", ], user: { pk: 1, firstName: "James", lastName: "Dean" }, task: { completions: [], predictions: [], id: 1, data: { image: "https://htx-misc.s3.amazonaws.com/opensource/label-studio/examples/images/nick-owuor-astro-nic-visuals-wDifg5xc9Z4-unsplash.jpg" } }, onLabelStudioLoad: function(LS) { var c = LS.completionStore.addCompletion({ userGenerate: true }); LS.completionStore.selectCompletion(c.id); } });</script>
You can use Playground to test out different types of config.
To see all the available options for the initialization of LabelStudio, please check the Reference.
]]>That gives you the opportunities to use:
Here is a quick example tutorial on how to do that with simple text classification:
git clone https://github.com/heartexlabs/label-studio
label-studio-ml init my_ml_backend --script label-studio/ml/examples/simple_text_classifier.py
label-studio-ml start my_ml_backend
label-studio start text_classification_project --init --template text_sentiment --ml-backend-url http://localhost:9090
Check examples in label-studio/ml/examples
directory.
That gives you the opportunities to use:
Here is a quick example tutorial on how to run the ML backend with a simple text classifier:
git clone https://github.com/heartexlabs/label-studio
cd label-studio
pip install -e .
cd label_studio/ml/examples
pip install -r requirements.txt
label-studio-ml init my_ml_backend --script label-studio/ml/examples/simple_text_classifier.py
label-studio-ml start my_ml_backend
label-studio start text_classification_project --init --template text_sentiment --ml-backend-url http://localhost:9090
Check examples in label-studio/ml/examples
directory.
Cloud storage type and bucket need to be configured during the start of the server, and further configured during the runtime via UI.
You can configure one or both:
The connection to both storages is synced, so you can see new tasks after uploading them to the bucket without restarting Label Studio.
The parameters like prefix or matching filename regex could be changed any time from the webapp interface.
To connect your S3 bucket with Label Studio, be sure you have programmatic access enabled. Check this link to learn more how to set up access to your S3 bucket.
The following commands launch Label Studio, configure the connection to your S3 bucket, scan for existing tasks, and load them into the labeling app.
label-studio start --init --source s3 --source-path my-s3-bucket
label-studio start --init --target s3-completions --target-path my-s3-bucket
When you are storing BLOBs in your S3 bucket (like images or audio files), you might want to use them as is, by generating URLs pointing to those objects (e.g. s3://my-s3-bucket/image.jpg
)
Label Studio allows you to generate input tasks with corresponding URLs automatically on-the-fly. You can do this by specifying --source-params
when launching the app:
label-studio start --init --source s3 --source-path my-s3-bucket --source-params "{\"data_key\": \"my-object-tag-$value\", \"use_blob_urls\": true}"
You can leave "data_key"
empty (or skip it altogether); LS then generates it automatically from the first task key in the label config (this is useful when you have only one object tag exposed).
You can specify additional parameters with the command line escaped JSON string via --source-params
/ --target-params
or from UI.
Bucket prefix (typically used to specify internal folder/container)
A regular expression for filtering bucket objects
If set to true, a local copy of the remote storage will be created.
Generate task data with URLs pointing to your bucket objects (for resources like jpg, mp3, etc). If not selected, bucket objects will be interpreted as tasks in Label Studio JSON format, one object per task.
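Escaping the JSON string for --source-params by hand is easy to get wrong, so the command can also be assembled programmatically. The following Python sketch is an illustration under stated assumptions: the helper name `build_start_command` is hypothetical, and it reuses only the flags and the "data_key" and "use_blob_urls" parameters shown above.

```python
import json
import shlex


def build_start_command(bucket, data_key=None, use_blob_urls=True):
    """Return (argv, shell_string) for an S3-backed start command (hypothetical helper)."""
    params = {"use_blob_urls": use_blob_urls}
    if data_key:
        # "data_key" is optional; it may be left out entirely.
        params["data_key"] = data_key
    argv = [
        "label-studio", "start", "--init",
        "--source", "s3",
        "--source-path", bucket,
        "--source-params", json.dumps(params),
    ]
    # shlex.quote wraps each argument so a POSIX shell passes it through unchanged,
    # including the double quotes and any "$" inside the JSON.
    return argv, " ".join(shlex.quote(arg) for arg in argv)


argv, command = build_start_command("my-s3-bucket", data_key="my-object-tag-$value")
print(command)
```

Passing `argv` directly to `subprocess.run` avoids the shell entirely; the quoted string is only needed when the command is pasted into a terminal.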
To connect your GCS bucket with Label Studio, be sure you have enabled programmatic access. Check this link to learn more about how to set up access to your GCS bucket.
The following commands launch Label Studio, configure the connection to your GCS bucket, scan for existing tasks, and load them into the labeling app.
label-studio start --init --source gcs --source-path my-gcs-bucket
label-studio start --init --target gcs-completions --target-path my-gcs-bucket
When you store BLOBs in your GCS bucket (like images or audio files), you might want to use them as is, by generating URLs pointing to those objects (e.g. gs://my-gcs-bucket/image.jpg
)
Label Studio allows you to generate input tasks with corresponding URLs automatically on the fly. You can do this by specifying --source-params
when launching the app:
label-studio start --init --source gcs --source-path my-gcs-bucket --source-params "{\"data_key\": \"my-object-tag-$value\", \"use_blob_urls\": true}"
You can leave "data_key"
empty (or omit it entirely); Label Studio then generates it automatically from the first task key in the label config (this is useful when only one object tag is exposed).
You can specify additional parameters as an escaped JSON string on the command line via --source-params
/ --target-params
or from the UI.
Bucket prefix (typically used to specify internal folder/container)
A regular expression for filtering bucket objects
If set to true, a local copy of the remote storage is created.
Generate task data with URLs pointing to your bucket objects (for resources like jpg, mp3, etc.). If not selected, bucket objects are interpreted as tasks in Label Studio JSON format, one object per task.
]]>Label Studio expects the JSON-formatted list of tasks as input. Each task is a dictionary-like structure, with some specific keys reserved for internal use:
{"key": "value"}
. It is possible to store any number of key-value pairs within task data, but there should be source keys defined by label config (i.e. what is defined by object tag’s attribute value="$key"
).
<Text value="$key">: value is taken as plain text
<HyperText value="$key">: value is HTML markup
<HyperText value="$key" encoding="base64">: value is base64-encoded HTML markup
<Audio value="$key">: value is taken as a valid URL to an audio file
<AudioPlus value="$key">: value is taken as a valid URL to an audio file with CORS policy enabled on the server side
<Image value="$key">: value is a valid URL to an image file
Note: if the "data" field is missing in an imported task object, the whole task body is interpreted as task["data"], i.e. [{"my_key": "my_value"}] is internally converted to [{"data": {"my_key": "my_value"}}]
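The conversion described in the note can be sketched as follows. This is an illustration of the stated rule only, not Label Studio's actual import code; `normalize_tasks` is a hypothetical name.

```python
def normalize_tasks(tasks):
    """Wrap task bodies that lack a "data" field, as described in the note."""
    normalized = []
    for task in tasks:
        if "data" in task:
            normalized.append(task)            # already in full task format
        else:
            normalized.append({"data": task})  # whole body becomes task["data"]
    return normalized

print(normalize_tasks([{"my_key": "my_value"}]))
```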
Here is an example of a config and a task list with one element, for a text classification project:
<View>
  <Text name="message" value="$my_text"/>
  <Choices name="sentiment_class" toName="message">
    <Choice value="Positive"/>
    <Choice value="Neutral"/>
    <Choice value="Negative"/>
  </Choices>
</View>
[{
  # "id" is a reserved field, avoid using it when importing tasks
  "id": 123,
  # "data" must contain the "my_text" field defined by the labeling config,
  # and can optionally include other fields
  "data": {
    "my_text": "Opossum is great",
    "ref_id": 456,
    "meta_info": {
      "timestamp": "2020-03-09 18:15:28.212882",
      "location": "North Pole"
    }
  },
  # "completions" is the list of annotation results matching the labeling config schema
  "completions": [{
    "result": [{
      "from_name": "sentiment_class",
      "to_name": "message",
      "type": "choices",
      "value": {
        "choices": ["Positive"]
      }
    }]
  }],
  # "predictions" are pretty similar to "completions"
  # except that they also include some ML-related fields like prediction "score"
  "predictions": [{
    "result": [{
      "from_name": "sentiment_class",
      "to_name": "message",
      "type": "choices",
      "value": {
        "choices": ["Neutral"]
      }
    }],
    # score is used for active learning sampling mode
    "score": 0.95
  }]
}]
There are a few possible ways to import data files to your labeling project:
Start Label Studio without specifying an input path and then import through the web interface available at http://127.0.0.1:8080/import
Initialize Label Studio project and directly specify the paths, e.g. label-studio init --input-path my_tasks.json --input-format json
The --input-path
argument points to a file or a directory where your labeling tasks reside. By default it expects JSON-formatted tasks, but you can also specify any of the other formats listed below by using the --input-format
option.
label-studio init --input-path=my_tasks.json
my_tasks.json
contains tasks in a basic Label Studio JSON format
label-studio init --input-path=dir/with/json/files --input-format=json-dir
Instead of putting all tasks into one file, you can split your input data into several JSON files and specify the directory path. Each JSON file contains tasks in a basic Label Studio JSON format.
Note: if you add more files to the directory, you need to restart the Label Studio server.
When a CSV / TSV formatted text file is used, column names are interpreted as task data keys:
my_text,optional_field
this is a first task,123
this is a second task,456
Note: Currently CSV / TSV files can be imported only through the UI.
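The mapping from column names to task data keys can be sketched with Python's standard csv module. This is only an illustration of the rule above, not Label Studio's importer: `csv_to_tasks` is a hypothetical name, and the sketch keeps every value as a string, since the text does not say whether types are converted.

```python
import csv
import io

def csv_to_tasks(text, delimiter=","):
    """Turn CSV/TSV text into tasks, using the header row as data keys."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{"data": dict(row)} for row in reader]

csv_text = (
    "my_text,optional_field\n"
    "this is a first task,123\n"
    "this is a second task,456\n"
)
for task in csv_to_tasks(csv_text):
    print(task)
```

For TSV input, pass `delimiter="\t"`.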
label-studio init --input-path=my_tasks.txt --input-format=text --label-config=config.xml
In a typical scenario, you may use only one input data stream (or in other words only one object tag specified in label config). In this case, you don’t need to use JSON format, but simply write down your values in a plain text file, line by line, e.g.
this is a first task
this is a second task
label-studio init --input-path=dir/with/text/files --input-format=text-dir --label-config=config.xml
You can split your input data into several plain text files, and specify the directory path. Then Label Studio scans each file line-by-line, creating one task per line. Each plain text file is formatted the same as above.
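The one-task-per-line rule can be sketched as below. The key name `my_text` is taken from the example config earlier in this page; `text_to_tasks` is a hypothetical helper, and skipping blank lines is an assumption of this sketch, not documented behavior.

```python
def text_to_tasks(lines, data_key):
    """Create one task per non-empty line, stored under `data_key`."""
    tasks = []
    for line in lines:
        value = line.rstrip("\n")
        if value.strip():  # assumption: blank lines do not become tasks
            tasks.append({"data": {data_key: value}})
    return tasks

lines = ["this is a first task\n", "this is a second task\n"]
print(text_to_tasks(lines, "my_text"))
```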
label-studio init --input-path=dir/with/images --input-format=image-dir --label-config=config.xml --allow-serving-local-files
WARNING: --allow-serving-local-files is intended only for locally running instances; avoid using it on remote servers unless you are sure what you're doing.
You can point to a local directory, which is scanned recursively for image files. Each file is used to create one task. Since Label Studio works only with URLs, a web link is created for each task, pointing to your local directory as follows:
http://<host:port>/data/filename?d=<path/to/the/local/directory>
Supported formats are: .png, .jpg, .jpeg, .tiff, .bmp, .gif
label-studio init --input-path=my/audios/dir --input-format=audio-dir --label-config=config.xml --allow-serving-local-files
WARNING: --allow-serving-local-files is intended only for locally running instances; avoid using it on remote servers unless you are sure what you're doing.
You can point to a local directory, which is scanned recursively for audio files. Each file is used to create one task. Since Label Studio works only with URLs, a web link is created for each task, pointing to your local directory as follows:
http://<host:port>/data/filename?d=<path/to/the/local/directory>
Supported formats are: .wav, .aiff, .mp3, .au, .flac
Use the API to import tasks in Label Studio basic format if for any reason you can access neither the local filesystem nor the web UI (e.g. if you are creating a data stream):
curl -X POST -H Content-Type:application/json http://localhost:8080/api/import \
  --data "[{\"my_key\": \"my_value_1\"}, {\"my_key\": \"my_value_2\"}]"
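The same request can be prepared from Python with the standard library. This is a sketch that mirrors the curl call above: the endpoint and payload come from that example, `build_import_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request

def build_import_request(tasks, base_url="http://localhost:8080"):
    """Build a POST request for /api/import with a JSON task list."""
    body = json.dumps(tasks).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/import",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_import_request([{"my_key": "my_value_1"}, {"my_key": "my_value_2"}])
print(request.get_method(), request.full_url)

# Sending requires a running Label Studio instance:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```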
You can define how your imported tasks are exposed to annotators. Several options are available. To enable one of them, specify --sampling=<option>
as a command line option.
Tasks are ordered ascending by their "id"
fields. This is the default mode.
Tasks are sampled with equal probabilities.
The task with the minimum average prediction score is taken. When this option is set, the task["predictions"]
list must be present, with a "score"
field within each prediction.
The task with the maximum average prediction score is taken. When this option is set, the task["predictions"]
list must be present, with a "score"
field within each prediction.
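The four sampling behaviors described above can be sketched as a single selection function. The option names used here are assumptions made for this illustration, and the logic follows only the descriptions in this section, not Label Studio's implementation.

```python
import random

def average_score(task):
    """Mean of the "score" fields across a task's predictions."""
    scores = [prediction["score"] for prediction in task["predictions"]]
    return sum(scores) / len(scores)

def next_task(tasks, sampling="sequential", rng=random):
    """Pick the next task to show, per the sampling descriptions above."""
    if sampling == "sequential":            # ascending by "id" (default)
        return min(tasks, key=lambda task: task["id"])
    if sampling == "uniform":               # equal probabilities
        return rng.choice(tasks)
    if sampling == "prediction-score-min":  # lowest average score first
        return min(tasks, key=average_score)
    if sampling == "prediction-score-max":  # highest average score first
        return max(tasks, key=average_score)
    raise ValueError("unknown sampling option: " + sampling)

tasks = [
    {"id": 2, "predictions": [{"score": 0.9}]},
    {"id": 1, "predictions": [{"score": 0.4}, {"score": 0.6}]},
    {"id": 3, "predictions": [{"score": 0.1}]},
]
print(next_task(tasks, "prediction-score-min")["id"])
```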
...
...
...
...
Param | Type | Default | Description |
---|---|---|---|
name | string | | name of the element |
toName | string | | name of the image to label |
[opacity] | float | 0.6 | opacity of ellipse |
[fillColor] | string | | ellipse fill color, default is transparent |
[strokeColor] | string | | stroke color |
[strokeWidth] | number | 1 | width of stroke |
[canRotate] | boolean | true | show or hide rotation handle |
<View> <EllipseLabels name="labels" toName="image"> <Label value="Person" /> <Label value="Animal" /> </EllipseLabels> <Image name="image" value="$image" /></View>
]]>Param | Type | Default | Description |
---|---|---|---|
name | string | | name of the element |
toName | string | | name of the image to label |
[opacity] | float | 0.6 | opacity of ellipse |
[fillColor] | string | | ellipse fill color, default is transparent |
[strokeColor] | string | "#f48a42" | stroke color |
[strokeWidth] | number | 1 | width of the stroke |
[canRotate] | boolean | true | show or hide rotation handle |
<View> <Ellipse name="ellipse1-1" toName="img-1" /> <Image name="img-1" value="$img" /></View>
]]>Param | Type | Default | Description |
---|---|---|---|
name | string | | name of the element |
toName | string | | name of the image to label |
[opacity] | float | 0.6 | opacity of ellipse |
[fillColor] | string | | ellipse fill color, default is transparent |
[strokeColor] | string | | stroke color |
[strokeWidth] | number | 1 | width of stroke |
[canRotate] | boolean | true | show or hide rotation handle |
<View> <EllipseLabels name="labels" toName="image"> <Label value="Person" /> <Label value="Animal" /> </EllipseLabels> <Image name="image" value="$image" /></View>
]]>Label Studio expects the JSON-formatted list of tasks as input. Each task is a dictionary-like structure, with some specific keys reserved for internal use:
{"key": "value"}
. It is possible to store any number of key-value pairs within task data, but there should be source keys defined by label config (i.e. what is defined by object tag’s attribute value="$key"
).
<Text value="$key">: value is taken as plain text
<HyperText value="$key">: value is HTML markup
<HyperText value="$key" encoding="base64">: value is base64-encoded HTML markup
<Audio value="$key">: value is taken as a valid URL to an audio file
<AudioPlus value="$key">: value is taken as a valid URL to an audio file with CORS policy enabled on the server side
<Image value="$key">: value is a valid URL to an image file
Note: if the "data" field is missing in an imported task object, the whole task body is interpreted as task["data"], i.e. [{"my_key": "my_value"}] is internally converted to [{"data": {"my_key": "my_value"}}]
Here is an example of a config and a task list with one element, for a text classification project:
<View>
  <Text name="message" value="$my_text"/>
  <Choices name="sentiment_class" toName="message">
    <Choice value="Positive"/>
    <Choice value="Neutral"/>
    <Choice value="Negative"/>
  </Choices>
</View>
[{
  # "id" is a reserved field, avoid using it when importing tasks
  "id": 123,
  # "data" must contain the "my_text" field defined by the labeling config,
  # and can optionally include other fields
  "data": {
    "my_text": "Opossum is great",
    "ref_id": 456,
    "meta_info": {
      "timestamp": "2020-03-09 18:15:28.212882",
      "location": "North Pole"
    }
  },
  # "completions" is the list of annotation results matching the labeling config schema
  "completions": [{
    "result": [{
      "from_name": "sentiment_class",
      "to_name": "message",
      "type": "choices",
      "value": {
        "choices": ["Positive"]
      }
    }]
  }],
  # "predictions" are pretty similar to "completions"
  # except that they also include some ML-related fields like prediction "score"
  "predictions": [{
    "result": [{
      "from_name": "sentiment_class",
      "to_name": "message",
      "type": "choices",
      "value": {
        "choices": ["Neutral"]
      }
    }],
    "score": 0.95
  }]
}]
There are a few possible ways to import data files to your labeling project:
Start Label Studio without specifying an input path and then import through the web interface available at http://127.0.0.1:8080/import
Initialize Label Studio project and directly specify the paths, e.g. label-studio init --input-path my_tasks.json --input-format json
The --input-path
argument points to a file or a directory where your labeling tasks reside. By default it expects JSON-formatted tasks, but you can also specify any of the other formats listed below by using the --input-format
option.
label-studio init --input-path=my_tasks.json
my_tasks.json
contains tasks in a basic Label Studio JSON format
label-studio init --input-path=dir/with/json/files --input-format=json-dir
Instead of putting all tasks into one file, you can split your input data into several JSON files and specify the directory path. Each JSON file contains tasks in a basic Label Studio JSON format.
Note: if you add more files to the directory, you need to restart the Label Studio server.
When a CSV / TSV formatted text file is used, column names are interpreted as task data keys:
my_text,optional_field
this is a first task,123
this is a second task,456
Note: Currently CSV / TSV files can be imported only through the UI.
label-studio init --input-path=my_tasks.txt --input-format=text --label-config=config.xml
In a typical scenario, you may use only one input data stream (or in other words only one object tag specified in label config). In this case, you don’t need to use JSON format, but simply write down your values in a plain text file, line by line, e.g.
this is a first task
this is a second task
label-studio init --input-path=dir/with/text/files --input-format=text-dir --label-config=config.xml
You can split your input data into several plain text files, and specify the directory path. Then Label Studio scans each file line-by-line, creating one task per line. Each plain text file is formatted the same as above.
label-studio init --input-path=dir/with/images --input-format=image-dir --label-config=config.xml
You can point to a local directory, which is scanned recursively for image files. Each file is used to create one task. Since Label Studio works only with URLs, a web link is created for each task, pointing to your local directory as follows:
http://<host:port>/static/filename?d=<path/to/the/local/directory>
Supported formats are: .png, .jpg, .jpeg, .tiff, .bmp, .gif
label-studio init --input-path=my/audios/dir --input-format=audio-dir --label-config=config.xml
You can point to a local directory, which is scanned recursively for audio files. Each file is used to create one task. Since Label Studio works only with URLs, a web link is created for each task, pointing to your local directory as follows:
http://<host:port>/static/filename?d=<path/to/the/local/directory>
Supported formats are: .wav, .aiff, .mp3, .au, .flac
Use the API to import tasks in Label Studio basic format if for any reason you can access neither the local filesystem nor the web UI (e.g. if you are creating a data stream):
curl -X POST -H Content-Type:application/json http://localhost:8080/api/import \
  --data "[{\"my_key\": \"my_value_1\"}, {\"my_key\": \"my_value_2\"}]"
]]>Param | Type | Default | Description |
---|---|---|---|
name | string | | name of tag |
toName | string | | name of image to label |
[opacity] | number | 0.6 | opacity of polygon |
[fillColor] | string | | polygon fill color, default is transparent |
[strokeColor] | string | | stroke color |
[strokeWidth] | number | 1 | width of stroke |
[pointSize] | small | medium | large | medium | size of polygon handle points |
[pointStyle] | rectangle | circle | rectangle | style of points |
<View> <Image name="image" value="$image" /> <PolygonLabels name="labels" toName="image"> <Label value="Car" /> <Label value="Sign" /> </PolygonLabels></View>
]]>Param | Type | Default | Description |
---|---|---|---|
name | string | | name of tag |
toName | string | | name of image to label |
[opacity] | number | 0.6 | opacity of polygon |
[fillColor] | string | | polygon fill color, default is transparent |
[strokeColor] | string | | stroke color |
[strokeWidth] | number | 1 | width of stroke |
[pointSize] | small | medium | large | medium | size of polygon handle points |
[pointStyle] | rectangle | circle | circle | style of points |
<View> <Polygon name="rect-1" toName="img-1" /> <Image name="img-1" value="$img" /></View>
]]>Param | Type | Default | Description |
---|---|---|---|
name | string | | name of tag |
toName | string | | name of image to label |
[opacity] | number | 0.6 | opacity of polygon |
[fillColor] | string | | polygon fill color, default is transparent |
[strokeColor] | string | | stroke color |
[strokeWidth] | number | 1 | width of stroke |
[pointSize] | small | medium | large | medium | size of polygon handle points |
[pointStyle] | rectangle | circle | circle | style of points |
<View> <Polygon name="rect-1" toName="img-1" /> <Image name="img-1" value="$img" /></View>
]]>Param | Type | Default | Description |
---|---|---|---|
name | string | | name of tag |
toName | string | | name of image to label |
[opacity] | number | 0.6 | opacity of polygon |
[fillColor] | string | | polygon fill color, default is transparent |
[strokeColor] | string | | stroke color |
[strokeWidth] | number | 1 | width of stroke |
[pointSize] | small | medium | large | medium | size of polygon handle points |
[pointStyle] | rectangle | circle | rectangle | style of points |
<View> <Image name="image" value="$image" /> <PolygonLabels name="labels" toName="image"> <Label value="Car" /> <Label value="Sign" /> </PolygonLabels></View>
]]>Param | Type | Description |
---|---|---|
value | string | value of the relation |
[background] | string | background color of active label |
<View> <Relations> <Relation value="Name 1" /> <Relation value="Name 2" /> </Relations></View>
]]>Param | Type | Default | Description |
---|---|---|---|
name | string | | name of the element |
toName | string | | name of the image to label |
[opacity] | float | 0.6 | opacity of rectangle |
[fillColor] | string | | rectangle fill color, default is transparent |
[strokeColor] | string | | stroke color |
[strokeWidth] | number | 1 | width of stroke |
[canRotate] | boolean | true | show or hide rotation handle |
<View> <RectangleLabels name="labels" toName="image"> <Label value="Person" /> <Label value="Animal" /> </RectangleLabels> <Image name="image" value="$image" /></View>
]]>Param | Type | Default | Description |
---|---|---|---|
name | string | | name of the element |
toName | string | | name of the image to label |
[opacity] | float | 0.6 | opacity of rectangle |
[fillColor] | string | | rectangle fill color, default is transparent |
[strokeColor] | string | | stroke color |
[strokeWidth] | number | 1 | width of stroke |
[canRotate] | boolean | true | show or hide rotation handle |
<View> <RectangleLabels name="labels" toName="image"> <Label value="Person" /> <Label value="Animal" /> </RectangleLabels> <Image name="image" value="$image" /></View>
]]>Param | Type | Description |
---|---|---|
value | string | value of the relation |
[background] | string | background color of active label |
<View> <Relations> <Relation value="Name 1" /> <Relation value="Name 2" /> </Relations></View>
]]>Param | Type | Default | Description |
---|---|---|---|
name | string | | name of the element |
value | string | | value of the element |
[selectionEnabled] | boolean | true | enable or disable selection |
[highlightColor] | string | | hex string with highlight color; if not provided, uses the label's color |
[granularity] | symbol | word | symbol | control per symbol or word selection |
[showLabels] | boolean | true | show labels next to the region |
[encoding] | string | "string | base64" |
<Text name="text-1" value="$text" granularity="symbol" highlightColor="#ff0000" />
]]>value
[string] <View> <Table name="text-1" value="$text"></Table></View>
]]>value
[string] <View> <Table name="text-1" value="$text"></Table></View>
]]>Param | Type | Default | Description |
---|---|---|---|
name | string | | name of the element |
value | string | | value of the element |
[selectionEnabled] | boolean | true | enable or disable selection |
[highlightColor] | string | | hex string with highlight color; if not provided, uses the label's color |
[granularity] | symbol | word | symbol | control per symbol or word selection |
[showLabels] | boolean | true | show labels next to the region |
[encoding] | string | "string | base64" |
<Text name="text-1" value="$text" granularity="symbol" highlightColor="#ff0000" />
]]>Param | Type | Default | Description |
---|---|---|---|
name | string | | name of the element |
toName | string | | name of the image to label |
[opacity] | float | 0.6 | opacity of ellipse |
[fillColor] | string | | ellipse fill color, default is transparent |
[strokeColor] | string | "#f48a42" | stroke color |
[strokeWidth] | number | 1 | width of the stroke |
[canRotate] | boolean | true | show or hide rotation handle |
<View> <Ellipse name="ellipse1-1" toName="img-1" /> <Image name="img-1" value="$img" /></View>
]]>