At the moment, datasets return objects of type Any, and preprocessing and transform functions take in an argument of type Any and return a value of type Any. Generic types should be used so the type of data the dataset returns can be changed upon dataset creation.
Known challenges:
If a dataset has no preprocessor or no transform function, its return type may differ. It might be worth looking into function overloads, class methods used as alternative constructors for the sake of typing, or defining multiple generically typed classes, e.g.
from abc import abstractmethod
from pathlib import Path
from typing import Callable, Generic, TypeVar

import torch.utils.data

_T = TypeVar("_T")


class TypedDataset(torch.utils.data.Dataset, Generic[_T]):
    def __init__(self, root: Path):
        self._root = root

    @abstractmethod
    def __getitem__(self, index: int) -> _T:
        raise NotImplementedError()


_IT = TypeVar("_IT")
_OT = TypeVar("_OT")


class TransformedDataset(TypedDataset[_OT], Generic[_IT, _OT]):
    def __init__(self, root: Path, source: TypedDataset[_IT], transform: Callable[[_IT], _OT]):
        super().__init__(root)
        self._source = source
        self._transform = transform

    def __getitem__(self, index: int) -> _OT:
        # Apply the transform lazily, one sample at a time.
        result = self._source[index]
        return self._transform(result)

# Do a similar thing for a `PreprocessedDataset`, except the preprocessor function
# is applied to all samples on load, data is saved to file for later access and
# references to the original dataset are not kept.
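A rough sketch of what such a `PreprocessedDataset` might look like is given below. It is only an illustration under several assumptions that are not part of this issue: the source exposes `__len__` as well as `__getitem__`, the cache is a single pickle file whose name (`preprocessed.pkl`) is made up here, and the class is left standalone rather than deriving from `TypedDataset[_OT]` so the snippet runs without torch.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Callable, Generic, List, Optional, Sequence, TypeVar

_IT = TypeVar("_IT")
_OT = TypeVar("_OT")


class PreprocessedDataset(Generic[_OT]):
    """Hypothetical sketch: preprocess every sample once and cache the result on disk."""

    def __init__(
        self,
        root: Path,
        source: Sequence[_IT],
        preprocessor: Callable[[_IT], _OT],
        filename: str = "preprocessed.pkl",  # assumed cache file name
    ):
        self._path = root / filename
        if not self._path.exists():
            # Apply the preprocessor to all samples up front and save them to file.
            samples = [preprocessor(source[i]) for i in range(len(source))]
            with self._path.open("wb") as f:
                pickle.dump(samples, f)
        # Deliberately keep no reference to `source`; samples are read back lazily.
        self._samples: Optional[List[_OT]] = None

    def _load(self) -> List[_OT]:
        if self._samples is None:
            with self._path.open("rb") as f:
                self._samples = pickle.load(f)
        return self._samples

    def __len__(self) -> int:
        return len(self._load())

    def __getitem__(self, index: int) -> _OT:
        return self._load()[index]


# Usage: a type checker should infer PreprocessedDataset[int] from `len`.
with tempfile.TemporaryDirectory() as tmp:
    ds = PreprocessedDataset(Path(tmp), ["a", "bb", "ccc"], len)
    lengths = [ds[i] for i in range(len(ds))]
```

Because only the output type `_OT` is a class-level parameter, the input type is forgotten once construction finishes, which mirrors the requirement that the original dataset is not retained.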
We want to be able to reuse a GQADataset (for example) for both general data analysis (not preprocessed, optionally transformed) and model training (preprocessed and transformed). This indicates that the dataset factory should be responsible for creating the original GQADataset and, when needed for training, wrapping it first in a PreprocessedDataset and then in a TransformedDataset.
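One way the factory's typing could work is with `typing.overload`, so the return type depends on whether a transform is supplied. The following is a minimal sketch, not a proposed implementation: `IndexedDataset`, `Transformed` and `create_dataset` are hypothetical names, plain lists stand in for a real GQADataset, and only the transform step is shown (a preprocessing step would need further overloads).

```python
from typing import Any, Callable, Generic, Optional, Protocol, TypeVar, overload

_T_co = TypeVar("_T_co", covariant=True)
_IT = TypeVar("_IT")
_OT = TypeVar("_OT")


class IndexedDataset(Protocol[_T_co]):
    """Hypothetical structural type: anything indexable with a length."""

    def __getitem__(self, index: int) -> _T_co: ...
    def __len__(self) -> int: ...


class Transformed(Generic[_IT, _OT]):
    """Lazy wrapper that applies `transform` on each access."""

    def __init__(self, source: IndexedDataset[_IT], transform: Callable[[_IT], _OT]):
        self._source = source
        self._transform = transform

    def __getitem__(self, index: int) -> _OT:
        return self._transform(self._source[index])

    def __len__(self) -> int:
        return len(self._source)


@overload
def create_dataset(source: IndexedDataset[_IT]) -> IndexedDataset[_IT]: ...
@overload
def create_dataset(
    source: IndexedDataset[_IT], transform: Callable[[_IT], _OT]
) -> IndexedDataset[_OT]: ...
def create_dataset(source: Any, transform: Optional[Callable[[Any], Any]] = None) -> Any:
    # Without a transform the source is returned unchanged (data analysis case).
    if transform is None:
        return source
    # With a transform the source is wrapped (training case).
    return Transformed(source, transform)


questions = ["what", "colour"]        # stand-in for a real dataset
raw = create_dataset(questions)       # seen by a type checker as IndexedDataset[str]
training = create_dataset(questions, transform=len)  # seen as IndexedDataset[int]
```

With this shape, callers never see `Any`: the overloads pick the element type from the arguments, while the single runtime implementation stays untyped internally.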