Add documentation for framework conversion (#1659)
- Update documentation for framework conversion
- Add the same notebook on Kaggle: [Kaggle notebook](https://www.kaggle.com/code/sooahleex/fer-2013-dataset-training-with-datumaro/notebook)
sooahleex authored Oct 28, 2024
1 parent a29c23b commit 028f9f7
Showing 9 changed files with 530 additions and 9 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -18,6 +18,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
(<https://github.com/openvinotoolkit/datumaro/pull/1635>)
- Optimize path assignment to handle point cloud in JSON without images
(<https://github.com/openvinotoolkit/datumaro/pull/1643>)
- Add documentation for framework conversion
(<https://github.com/openvinotoolkit/datumaro/pull/1659>)

### Bug fixes
- Fix assertion to compare hashkeys against expected value
8 changes: 8 additions & 0 deletions docs/source/docs/jupyter_notebook_examples/e2e_example.rst
@@ -11,6 +11,7 @@ Here we provide E2E examples from Datumaro to model trainers.
notebooks/10_noisy_label_detection_cls
notebooks/13_noisy_label_detection_det
notebooks/16_missing_annotation_detection
notebooks/22_framework_converter

.. grid:: 1 2 2 2
:gutter: 2
@@ -42,3 +43,10 @@ Here we provide E2E examples from Datumaro to model trainers.
:color: primary
:outline:
:expand:

.. grid-item-card::

.. button-ref:: notebooks/22_framework_converter
:color: primary
:outline:
:expand:
@@ -1,5 +1,5 @@
============================
-Level 12: Project Versioning
+Level 13: Project Versioning
============================

Project versioning is a concept unique to Datumaro. Datumaro project includes a data source and revision tree,
@@ -1,5 +1,5 @@
=================================
-Level 13: Pseudo Label Generation
+Level 14: Pseudo Label Generation
=================================

TBD
@@ -1,5 +1,5 @@
=====================================================
-Level 14: Dataset Pruning
+Level 15: Dataset Pruning
=====================================================


12 changes: 6 additions & 6 deletions docs/source/docs/level-up/advanced_skills/index.rst
@@ -5,16 +5,16 @@ Advanced Skills
:maxdepth: 1
:hidden:

-12_project_versioning
-13_pseudo_label_generation
-14_data_pruning
+13_project_versioning
+14_pseudo_label_generation
+15_data_pruning

.. grid:: 1 2 2 2
:gutter: 2

.. grid-item-card::

-.. button-ref:: 12_project_versioning
+.. button-ref:: 13_project_versioning
:color: primary
:outline:
:expand:
@@ -25,7 +25,7 @@ Advanced Skills

.. grid-item-card::

-.. button-ref:: 13_pseudo_label_generation
+.. button-ref:: 14_pseudo_label_generation
:color: primary
:outline:
:expand:
@@ -36,7 +36,7 @@ Advanced Skills

.. grid-item-card::

-.. button-ref:: 14_data_pruning
+.. button-ref:: 15_data_pruning
:color: primary
:outline:
:expand:
@@ -0,0 +1,56 @@
==============================
Level 12: Framework Conversion
==============================

Datumaro can convert datasets for use with popular deep learning frameworks, such as PyTorch and TensorFlow.
This is particularly useful when the same dataset needs to be used across different frameworks
without manual reformatting.

Datumaro provides the ``FrameworkConverter`` class, which can be used to convert a dataset for various tasks
like classification, detection, and segmentation.

Supported tasks:

- Classification
- Multilabel Classification
- Detection
- Instance Segmentation
- Semantic Segmentation
- Tabular Data

.. tab-set::

    .. tab-item:: Python

        With the PyTorch framework, you can convert a Datumaro dataset like this:

        .. code-block:: python

            from datumaro.plugins.framework_converter import FrameworkConverter
            from torchvision import transforms

            # Convert each image to a tensor
            transform = transforms.Compose([transforms.ToTensor()])
            dm_dataset = ...  # Load your dataset here

        First, specify the dataset, subset, and task, then convert the dataset to PyTorch format:

        .. code-block:: python

            multi_framework_dataset = FrameworkConverter(dm_dataset, subset="train", task="classification")
            train_dataset = multi_framework_dataset.to_framework(framework="torch", transform=transform)

        The resulting ``train_dataset`` can now be used with the PyTorch ``DataLoader``:

        .. code-block:: python

            from torch.utils.data import DataLoader

            train_loader = DataLoader(train_dataset, batch_size=32)

In this example:

- ``subset="train"`` indicates that we are working with the training portion of the dataset.

- ``task="classification"`` specifies that this is a classification task.

- The ``to_framework`` method converts the dataset to a PyTorch-compatible format.
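
For intuition about what a PyTorch-compatible dataset has to offer, the sketch below shows the map-style protocol (``__len__`` and ``__getitem__``) that the PyTorch ``DataLoader`` consumes. It is only an illustration under stated assumptions, not Datumaro's implementation: ``ToyItem``, ``ToyMapStyleDataset``, and ``scale`` are hypothetical names introduced here, and plain Python lists stand in for images.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class ToyItem:
    """Hypothetical stand-in for a dataset item (not a Datumaro class)."""

    subset: str
    image: Any
    label: int


class ToyMapStyleDataset:
    """Hypothetical wrapper exposing the map-style protocol: __len__ and __getitem__."""

    def __init__(
        self,
        items: List[ToyItem],
        subset: str,
        transform: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        # Keep only the items that belong to the requested subset.
        self._items = [item for item in items if item.subset == subset]
        self._transform = transform

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: int) -> Tuple[Any, int]:
        item = self._items[index]
        image = item.image
        # Apply the optional transform lazily, when the item is fetched.
        if self._transform is not None:
            image = self._transform(image)
        return image, item.label


def scale(pixels: List[int]) -> List[float]:
    """Toy transform: map 8-bit pixel values into the range [0, 1]."""
    return [p / 255 for p in pixels]


items = [
    ToyItem("train", [0, 255], 0),
    ToyItem("val", [128, 128], 1),
    ToyItem("train", [255, 0], 1),
]

train_dataset = ToyMapStyleDataset(items, subset="train", transform=scale)
print(len(train_dataset))
print(train_dataset[0])
```

In this toy version, selecting ``subset="train"`` drops the validation item, and the transform runs only when an item is indexed, which is roughly the role the ``transform`` argument plays in the example above.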
12 changes: 12 additions & 0 deletions docs/source/docs/level-up/intermediate_skills/index.rst
@@ -13,6 +13,7 @@ Intermediate Skills
09_data_filtering
10_data_exploration
11_data_generation
12_framework_conversion

.. grid:: 1 2 2 2
:gutter: 2
@@ -102,3 +103,14 @@ Intermediate Skills

:bdg-info:`CLI`
:bdg-warning:`Python`

.. grid-item-card::

.. button-ref:: 12_framework_conversion
:color: primary
:outline:
:expand:

Level 12: Framework Conversion

:bdg-warning:`Python`