Sparsh: Self-supervised touch representations for vision-based tactile sensing

Carolina Higuera*, Akash Sharma*, Chaithanya Krishna Bodduluri, Taosha Fan, Patrick Lancaster, Mrinal Kalakrishnan, Michael Kaess, Byron Boots, Mike Lambeta, Tingfan Wu, Mustafa Mukadam

*Equal contribution

AI at Meta, FAIR; The Robotics Institute, CMU; University of Washington


Sparsh is a family of general touch representations trained via self-supervised learning algorithms such as MAE, DINO, and JEPA. Sparsh generates useful representations for the DIGIT, GelSight'17, and GelSight Mini sensors. It outperforms end-to-end models on the downstream tasks proposed in TacBench by a large margin and enables data-efficient training for new downstream tasks.

This repository contains the PyTorch implementation, pre-trained models, and datasets released with Sparsh.


🛠️ Installation and setup

Clone this repository:

git clone https://github.com/facebookresearch/sparsh.git
cd sparsh

and create a conda environment with dependencies:

mamba env create -f environment.yml
mamba activate tactile_ssl

🚀 Pretrained models

Pretrained model weights are available for download from our Hugging Face: facebook/sparsh

Model             small      base
Sparsh (MAE)      backbone   backbone only
Sparsh (DINO)     backbone   backbone
Sparsh (DINOv2)   ❌         backbone
Sparsh (IJEPA)    backbone   backbone
Sparsh (VJEPA)    backbone   backbone
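
If you prefer to fetch a checkpoint programmatically, the sketch below uses the huggingface_hub client; the checkpoint filename is a placeholder, so check the file listing of the facebook/sparsh repository for the exact names and the expected state-dict layout:

# Minimal sketch: download a Sparsh backbone checkpoint from Hugging Face.
# The filename below is hypothetical; look up the real checkpoint names in
# the facebook/sparsh repository before running this.
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(repo_id="facebook/sparsh", filename="sparsh_dino_base.pth")
state = torch.load(ckpt_path, map_location="cpu")  # inspect the checkpoint contents
print(type(state), list(state)[:5] if isinstance(state, dict) else state)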

📥 Datasets

Pretraining datasets

For pretraining, we curate datasets from multiple sources containing unlabeled data from DIGIT and GelSight sensors.

For DIGIT, the dataset is a mixture of the YCB-Slide dataset and in-house collected data (Touch-Slide). It contains approximately 360k tactile images, along with a diverse set of no-contact (background) images.

For GelSight, we use open-source datasets available online: Touch and Go and ObjectFolder-Real.

DIGIT

To download the dataset, please edit path_dataset in the bash script scripts/download_digitv1_dataset.sh. The script will download and extract the data for both the YCB-Slide and Touch-Slide datasets into path_dataset.

The structure of the dataset is:

digitv1/Object-Slide
├── object_0            # e.g. 004_sugar_box
│   ├── dataset_0.pkl
│   ├── dataset_1.pkl
│   ├── dataset_2.pkl
│   ├── dataset_3.pkl
│   └── dataset_4.pkl
├── object_1            # e.g. bread
├── ...
└── bgs
    ├── bg_0.jpg
    ├── ...
    └── bg_18.jpg

The bgs/ folder contains no-contact (background) images from several DIGIT sensors, which are needed for pre-processing the data. If you are adding new tactile data, please add background images from your own sensor to this folder.
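
As a rough illustration of how these background images can be used (this is not the repository's exact preprocessing, just a sketch with an illustrative threshold and a hypothetical frame path):

# Sketch: subtract a no-contact background image from a tactile frame to
# highlight contact regions. Threshold and file paths are illustrative only.
import numpy as np
from PIL import Image

bg = np.asarray(Image.open("digitv1/Object-Slide/bgs/bg_0.jpg"), dtype=np.float32)
frame = np.asarray(Image.open("my_frame.jpg"), dtype=np.float32)  # hypothetical new sample

diff = np.abs(frame - bg)                  # per-pixel difference to the background
contact_mask = diff.mean(axis=-1) > 15.0   # crude contact mask; tune the threshold
print("contact pixels:", int(contact_mask.sum()))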

To load this dataset, use tactile_ssl/data/vision_tactile.py

GelSight dataset

We use Touch and Go to pretrain on GelSight'17 (with markers). The dataset consists of short video clips of the sensor making contact with in-the-wild objects. We use all frames from these clips, including no-contact frames. We do not perform any preprocessing, since the markers carry relevant static shear information.

We also use sequences from the ObjectFolder-Real dataset for pre-training. We preprocess the data by extracting only the tactile images (GelSight Mini), as we do not use the other modalities.

We provide a script to download preprocessed versions of these datasets that are compatible with our pipeline. To do so, edit path_dataset in the bash script scripts/download_gelsight_dataset.sh and run it; it will download and extract the data.

The structure of the dataset is:

gelsight/touch_go
├── 20220410_031604.pkl
├── 20220410_031843.pkl
├── ...
└── 20220607_133934.pkl
gelsight/object_folder
├── 001.pkl
├── 002.pkl
├── ...
└── 051.pkl

If you would like to download the data directly from Touch and Go and ObjectFolder-Real, you can do so by running the bash scripts scripts/download_datasets_scratch/download_gelsight_object_folder.sh and scripts/download_datasets_scratch/download_gelsight_touchgo.sh. You can then make the data compatible with our pipeline by running the Python scripts scripts/download_datasets_scratch/compress_object_folder.py and scripts/download_datasets_scratch/compress_touch_go.py. Please update the corresponding paths in all scripts accordingly.

To load this dataset, use tactile_ssl/data/vision_tactile.py
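
Before wiring the data into the loader, it can be handy to peek at one of the downloaded pickle files; the snippet below only inspects the top-level structure, since the exact contents depend on the preprocessing scripts:

# Inspect one of the preprocessed GelSight pickle files.
import pickle

with open("gelsight/touch_go/20220410_031604.pkl", "rb") as f:
    sample = pickle.load(f)

print(type(sample))
if isinstance(sample, dict):
    print("keys:", list(sample.keys()))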

Downstream task datasets

We open-source the data that we collected in-house for the force estimation, slip detection, and pose estimation downstream tasks. The datasets can be downloaded from the Sparsh collection on Hugging Face.

Please place these datasets in a single directory designated for hosting all downstream task datasets.

T1 Force estimation and T2 slip detection

This dataset contains paired tactile and force data, intended for predicting the 3-axis normal and shear forces applied to the sensor's elastomer. We used three different indenter shapes to collect force-labeled data: hemisphere, sharp, and flat. To measure ground-truth forces, we employed an ATI Nano17 force/torque sensor. The protocol consisted of applying a random normal load followed by a shear load, achieved by sliding the probe 2 mm across the sensor's elastomer.

The dataset consists of a collection of normal/shear load trajectories for each probe. The structure is as follows (example for the DIGIT dataset):

T1_force/digit/sphere
├── batch_1
│   ├── dataset_digit_00.pkl
│   ├── ...
│   ├── dataset_digit_03.pkl
│   └── dataset_slip_forces.pkl
├── batch_2
│   └── ...
T1_force/digit/flat
├── batch_1
│   ├── dataset_digit_00.pkl
│   ├── ...
│   ├── dataset_digit_03.pkl
│   └── dataset_slip_forces.pkl
├── ...
T1_force/digit/sharp
├── ...

For each batch:

  • dataset_digit_xy.pkl: contains only the binarized tactile images.
  • dataset_slip_forces.pkl: a dictionary where each key corresponds to a sliding trajectory; each trajectory holds the corresponding force and slip labels.

To load this dataset (DIGIT and GelSight Mini), use tactile_ssl/data/vision_based_forces_slip_probes.py
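
For reference, a common way to split such a 3-axis reading into normal and shear components is sketched below; it assumes the z axis is normal to the elastomer, which is an assumption on our part, so check the loader above for the convention actually used:

# Sketch: decompose a 3-axis force sample (fx, fy, fz) into normal and shear
# components, assuming z is normal to the elastomer (assumption, not verified).
import numpy as np

f = np.array([0.12, -0.05, 1.30])      # hypothetical force sample in newtons
normal = f[2]                          # component along the sensor normal
shear = np.linalg.norm(f[:2])          # in-plane (shear) magnitude
shear_dir = np.arctan2(f[1], f[0])     # shear direction in radians
print(normal, shear, shear_dir)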

T3 Pose estimation

This dataset contains time-synchronized pairs of DIGIT images and SE(3) object poses. In our setup, the robot hand is stationary with its palm facing downwards and pressing against the object on a table. The robot hand has DIGIT sensors mounted on the index, middle, and ring fingertips, all of which are in contact with the object. A human manually perturbs the object's pose by translating and rotating it in SE(2). We use tag tracking to obtain the object's pose. We collect data using two objects: a Pringles can and the YCB sugar box, both of which have a tag fixed to their top surfaces.

The dataset is a collection of such sequences. Each sequence corresponds to a pickle file containing the following labeled data:

  • DIGIT tactile images for the index, middle, and ring fingers
  • Object pose tracked from the tag, in the format (x, y, z, qw, qx, qy, qz)
  • Robot hand joint positions
  • object_index_rel_pose_n5: the pose change over the last 5 samples as a transformation matrix, with the object pose expressed with respect to the index finger.
  • object_middle_rel_pose_n5: the same, with respect to the middle finger.
  • object_ring_rel_pose_n5: the same, with respect to the ring finger.

We also provide reference (no-contact) images for each of the DIGITs to facilitate pre-processing such as background subtraction. The structure of the dataset is:

T3_pose/digit/train
├── pringles
│   ├── bag_00.pkl
│   ├── ...
│   ├── bag_37.pkl
│   └── bag_38.pkl
├── sugar
│   └── ...
T3_pose/digit/test
├── pringles
│   ├── bag_00.pkl
│   ├── ...
│   ├── bag_05.pkl
│   └── bag_06.pkl
├── sugar
│   └── ...
T3_pose/digit/bgs
├── digit_index.png
├── digit_middle.png
└── digit_ring.png

To load this dataset, use tactile_ssl/data/vision_based_pose_probes.py
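
The *_rel_pose_n5 labels are relative transforms; as a sketch of how such a pose change can be derived from two absolute poses in the (x, y, z, qw, qx, qy, qz) format above (note that SciPy expects quaternions in (qx, qy, qz, qw) order, and the dataset's frame conventions may differ from this toy example):

# Sketch: relative 4x4 transform between two absolute poses given as
# (x, y, z, qw, qx, qy, qz). Poses below are made up for illustration.
import numpy as np
from scipy.spatial.transform import Rotation as R

def pose_to_matrix(p):
    x, y, z, qw, qx, qy, qz = p
    T = np.eye(4)
    T[:3, :3] = R.from_quat([qx, qy, qz, qw]).as_matrix()  # SciPy uses (x, y, z, w)
    T[:3, 3] = [x, y, z]
    return T

T_prev = pose_to_matrix([0.10, 0.02, 0.30, 1.0, 0.0, 0.0, 0.0])     # pose 5 samples ago
T_now = pose_to_matrix([0.11, 0.02, 0.30, 0.999, 0.0, 0.0, 0.045])  # current pose
T_rel = np.linalg.inv(T_prev) @ T_now                               # pose change
print(T_rel)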

T4 Grasp stability

We use the Feeling of Success dataset. It contains approximately 9k grasp trials over roughly 100 objects, using GelSight'17 sensors mounted on a parallel gripper.

You can download the data directly from the dataset webpage, or run the bash script scripts/download_datasets_scratch/download_gelsight_feeling_success.sh to download the data and then the Python script scripts/download_datasets_scratch/compress_feeling_success.py to preprocess the dataset into a format compatible with our pipeline. Please update the paths in the scripts accordingly.

T5 Textile recognition

Please download the Clothing dataset. It consists of 4467 short video clips (10-25 frames) of a robot with a GelSight'17 sensor grasping several types of textiles (20 classes), such as leather, cotton, and polyester.

πŸ‹οΈβ€β™‚οΈ Training Sparsh

We use hydra for configuration management in this repository. The configuration files are located in config/.

The config folder is organized as follows:

├── config
│   ├── data          # contains dataset configs
│   ├── experiment    # contains the full config for a specific experiment (e.g. Sparsh(DINO) or a downstream task)
│   ├── model         # contains configs for each SSL algorithm
│   ├── paths         # add your config here with paths to datasets / checkpoints / outputs / etc.
│   ├── task          # contains downstream_task configs for each downstream task in TacBench
│   ├── wandb         # add your wandb config here for experiment tracking
│   └── default.yaml  # the SSL training default config, overridden by experiment/dino_vit.yaml and the like
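
To peek at a particular config before launching a run, you can load it with OmegaConf (Hydra's configuration backend); note that loading a single file this way only shows the raw YAML and does not perform Hydra's defaults-list composition, and the file path below is just taken from the layout above:

# Print the raw contents of an experiment config (no Hydra composition).
from omegaconf import OmegaConf

cfg = OmegaConf.load("config/experiment/dino_vit.yaml")
print(OmegaConf.to_yaml(cfg))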

Follow these instructions to train a Sparsh model:

  • Set up the pretraining datasets according to the instructions above.
  • Add paths/${YOUR_PATHS}.yaml, similar to the existing examples, and point it to your data root.
  • Similarly, add wandb/${YOUR_CONFIG}.yaml for experiment tracking.
  • Choose an experiment, for example dino_vit.yaml, and use the script below.

You may need to adjust the batch size according to your GPU. All training experiments were done with 8 A100-80GB GPUs.

python train.py +experiment=${YOUR_EXP_NAME} paths=${YOUR_PATHS} wandb=${YOUR_CONFIG}

Training downstream tasks

For downstream tasks, we largely follow frozen evaluation in our paper: we freeze the weights of the Sparsh encoder and only train a lightweight decoder for each task. Training downstream tasks is similar to the instructions above, but it additionally requires a pre-trained encoder checkpoint, specified via the task.checkpoint_encoder field in the config, as well as a labeled dataset for the corresponding task.
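
Schematically, frozen evaluation boils down to the pattern below; the modules are placeholders (the real wiring lives in train_task.py and the task configs), shown only to illustrate that gradients flow into the decoder alone:

# Schematic frozen-evaluation setup with placeholder modules: the encoder is
# frozen and only the lightweight task decoder receives gradient updates.
import torch
import torch.nn as nn

encoder = nn.Sequential(                  # stand-in for a pretrained Sparsh backbone
    nn.Conv2d(3, 64, 3), nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 384)
)
decoder = nn.Linear(384, 3)               # stand-in for a lightweight task head

for p in encoder.parameters():
    p.requires_grad = False               # freeze the backbone
encoder.eval()

optimizer = torch.optim.AdamW(decoder.parameters(), lr=1e-4)  # decoder-only updates

x = torch.randn(8, 3, 224, 224)           # dummy batch of tactile images
with torch.no_grad():
    feats = encoder(x)                    # frozen features
loss = decoder(feats).pow(2).mean()       # placeholder loss
loss.backward()
optimizer.step()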

Use the following script to train downstream tasks:

python train_task.py --config-name=experiment/downstream_task/${EXPERIMENT} paths=${YOUR_PATH_CONFIG} wandb=${YOUR_WANDB_CONFIG}

Once you have trained a decoder for each downstream task, you can test the checkpoints using the test_task.py script, which follows the same format as above. For convenience, we also provide a submit_task.sh bash script to train and test downstream tasks on a SLURM-based cluster.

Finally, we also provide tacbench_report.ipynb, where we compute metrics for all the downstream tasks once the results have been generated by the test_task.py script.

🤹‍♀️ Sparsh demo: force field visualization


To test Sparsh(DINO) + the force field decoder live, you only need a DIGIT or GelSight Mini sensor. Follow these steps to run the demo:

  1. Create a folder for the task checkpoints, for example ${YOUR_PATH}/outputs_sparsh/checkpoints.

  2. Download the decoder checkpoints from Hugging Face for DIGIT and GelSight Mini.

  3. Connect the sensor to your PC. For DIGIT, please make sure you have digit-interface installed.

  4. Make sure the device is recognized by the OS (on Linux, you can use Cheese to view the video the sensor is streaming).

  5. Run the demo for DIGIT:

python demo_forcefield.py +experiment=digit/downstream_task/forcefield/digit_dino paths=${YOUR_PATH_CONFIG} paths.output_dir=${YOUR_PATH}/outputs_sparsh/checkpoints/ test.demo.digit_serial=${YOUR_DIGIT_SERIAL}

The DIGIT serial number is printed on the back of the sensor and has the format DXXXXX.
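
If you want to double-check the DIGIT stream from Python first, a minimal check with the digit-interface package (assuming its Digit class API; replace the serial with yours) looks like:

# Quick DIGIT sanity check using the digit-interface package (assumed API).
from digit_interface import Digit

d = Digit("DXXXXX")    # replace with the serial printed on the back of the sensor
d.connect()
frame = d.get_frame()  # a single image from the sensor stream
print(frame.shape)
d.disconnect()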

  6. Run the demo for GelSight Mini:

python demo_forcefield.py +experiment=digit/downstream_task/forcefield/gelsight_dino paths=${YOUR_PATH_CONFIG} paths.output_dir=${YOUR_PATH}/outputs_sparsh/checkpoints/ test.demo.gelsight_device_id=${YOUR_GELSIGHT_VIDEO_ID}

The GelSight Mini is recognized as a webcam. You can get the video ID by running ls -l /dev/video* in a terminal.
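
A quick way to confirm the video ID works before launching the demo is to grab a frame with OpenCV (the device index below is just an example):

# Sanity check: read one frame from the GelSight Mini webcam stream.
import cv2

cap = cv2.VideoCapture(2)  # replace 2 with the index from /dev/videoN
ok, frame = cap.read()
print("stream ok:", ok, "frame shape:", None if frame is None else frame.shape)
cap.release()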

  7. Take the sensor and slide it across the edge of a table, or across objects with interesting textures! Look at the normal field to localize where you're making contact on the sensor's surface. Look at the shear field to gather an intuition about the direction of the shear force that you applied while sliding the sensor. For example, slide the sensor over an edge up and down to get translational shear or rotate the sensor in place to see torsional slip!

License

This project is licensed under the terms given in the LICENSE file.

📚 Citing Sparsh

If you find this repository useful, please consider giving it a star ⭐ and citing:

@inproceedings{higuera2024sparsh,
    title={Sparsh: Self-supervised touch representations for vision-based tactile sensing},
    author={Carolina Higuera and Akash Sharma and Chaithanya Krishna Bodduluri and Taosha Fan and Patrick Lancaster and Mrinal Kalakrishnan and Michael Kaess and Byron Boots and Mike Lambeta and Tingfan Wu and Mustafa Mukadam},
    booktitle={8th Annual Conference on Robot Learning},
    year={2024},
    url={https://openreview.net/forum?id=xYJn2e1uu8}
}

🤝 Acknowledgements

We thank Ishan Misra and Mahmoud Assran for insightful discussions on SSL for vision that informed this work, and Changhao Wang, Dhruv Batra, Jitendra Malik, Luis Pineda, and Tess Hellebrekers for helpful discussions on the research.

We also thank the teams behind the YCB-Slide, Touch and Go, ObjectFolder-Real, Feeling of Success, and Clothing datasets for contributing to the research community by open-sourcing their tactile data.
