Skip to content

Code for ICRA24 paper "Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation" Paper:https://arxiv.org/abs/2310.07968 Video:https://www.youtube.com/watch?v=rN5S8QIhhQc

License

Notifications You must be signed in to change notification settings

sled-group/navchat

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ORION Framework for Open-World Interactive Personalized Robot Navigation

Paper: Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation

We propsoe a general framework for Open-woRld Interactive persOnalized Navigation (ORION) that uses Large Language Models (LLMs) to make sequential decisions to manipulate different modules, so the robot can search, detect and navigate in the environment and talk with the user in natural language.

Local Image

Contents

Demo: ORION interacts with human in a household scene

YouTube Video

Demo: ORION interacts with a user simulator

YouTube Video

Demo: ORION contains spatial reasoning ability

YouTube Video

Installation

Git clone the project and cd to the project main directory.

  1. Install habitat-sim. We use 0.2.2 version.
conda create -n <name> python=3.7 cmake=3.14.0
conda activate <name>
conda install habitat-sim==0.2.2 withbullet headless -c conda-forge -c aihabitat
  1. Install pytorch. We use 1.13.1+cu117 version. But you can change the version according to your own cuda version.
conda install install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
  1. Install submodules
git submodule update --init --recursive
  • 3.1. habitat-lab. We use 0.2.2 version.
cd third_party/habitat-lab/
pip install -r requirements.txt
python setup.py develop --all
  • 3.2. Install dependency for LSeg, openclip, openai chatgpt api and other dependencies.
cd <project_main_dir>
pip install -r requirements.txt
cd third_party/Grounded-Segment-Anything
export AM_I_DOCKER=False
export BUILD_WITH_CUDA=True
export CUDA_HOME=/path/to/cuda-11.7/    # make sure this cuda version is same as pytorch cuda version

python -m pip install -e segment_anything
python -m pip install -e GroundingDINO
  1. Finally, install the project itself
cd <project_main_dir>
python setup.py develop

Data

Build a data directory in the project main directory. The structure of the data folder should be as follows:

.
├── data
│   ├── datasets                  # soft link to habitat-lab/data/
│   │   └── objectnav_hm3d_v2
│   ├── experiments               # experment results save_dir
│   ├── pretrained_ckpts          # pretrained model checkpoints
│   │   ├── groundingdino_swint_ogc.pth
│   │   ├── lseg_demo_e200.ckpt
│   │   └── sam_vit_h_4b8939.pth
│   └── scene_datasets            # soft link to habitat-lab/data/scene_datasets
│       └── hm3d_v0.2
├── demos
├── orion
├── README.md
├── requirements.txt
├── scripts
├── setup.py
├── tests
└── third_party

Habitat Data

We use (Habitat HM3D dataset)[https://github.com/matterport/habitat-matterport-3dresearch]. Please apply through the website for permission. After downloading the HM3D v0.2, add soft link to the data/scene_datasets/hm3d_v0.2 following the data practice of habiat-lab. Downlaod objectnav_hm3d_v2 dataset and save it to data/datasets/objectnav_hm3d_v2.

In our experiment, we use 4ok3usBNeis, MHPLjHsuG27, mL8ThkuaVTM, QaLdnwvtxbs, TEEsavR23oF, qyAac8rV8Zk, h1zeeAwLh9Z, cvZr5TUy5C5, LT9Jq6dN3Ea, y9hTuugGdiq 10 scenes from the val set.

Pre-trained Model Checkpoints

  1. Download SAM ckpt sam_vit_h_4b8939.pth to data/pretrained_ckpts/sam_vit_h_4b8939.pth.

  2. Download grounding-dino ckpt groundingdino_swint_ogc.pth to data/pretrained_ckpts/groundingdino_swint_ogc.pth.

  3. Download Lseg ckpt to data/pretrained_ckpts/lseg_demo_e200.ckpt.

Experiment

  1. cd <project_main_dir>. All python scripts should be run in the project main directory.
  2. python scripts/collect_scene_fbe.py --scene_id==<scene_id> to collect rgbd frames in habitat scenes using frotier-based exploration. You can set args to decide which scenes to collect. Optional: use scripts/create_video.py to create video from the collected frames.
  3. python scripts/build_vlmap.py --scene_id=<scene_id> --feature_type=lseg to build vlmap for each scene. It takes 30 min for LSeg and 60min for ConceptFusion to build one scene (around 500 frames).
  4. Now after set your openai api key in orion/config/chatgpt_config.py and prepare the data and pre-trained model ckpts, you can directly run python demos/play_interactive_terminal.py to talk with ORION in the terminal.
  5. After build VLMap or ConceptFusionMap for the scenes, you can run complete experiments:
  • python scripts/user_agent_talk_orion.py run the ORION method
  • python scripts/user_agent_talk_cow.py run the CoW method
  • python scripts/user_agent_talk_vlmap.py run the VLMap method
  • python scripts/user_agent_talk_cf.py run the Conceptfusion method

Make sure to set suitable arguments and the chatgpt config for both the user simulator and the agent.

ChatGPT API key

We use OpenAI's ChatGPT for the chatbot, using either Azure or OpenAI's API. Please edit the orion/config/chatgpt_config.py with your own api keys.

Demo

  1. To test the chagpt api, you can try to run python demos/play_chatgpt_api.py to talk with the chatbot. Need to set your openai api key in orion/config/chatgpt_config.py.
  2. To test the LSeg model, you can run python demos/play_lseg.py to see the LSeg model in action.
  3. To test the GroundingSAM model, you can run python demos/play_groundingSAM.py to see the GroundingSAM model in action.
  4. To test GradCAM, you can run python demos/play_gradcam.py to see the GradCAM in action.
  5. To play with keyboard control, you can run python demos/play_habitat_teleop.py to control the agent with keyboard in the habitat-sim window, where "w/s/a/d/p" denotes forward/backward/left/right/finish.
  6. To play with a Gradio GUI, you can run python demos/play_interactive_gradio.py with orion agent. You can interactive with the agent in natural language and see the agent's response and actions.

Trouble Shooting

  1. If the model downloading is very slow, it might be the network issue of IPV6 connect. Try to disble the IPV6
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
  1. The LSeg model can only process certain size of image, try to resize the image to HxW=480x640.
  2. If GroundingDINO is not installed successfully, you can try to build and install it manually
cd third_party/Grounded-Segment-Anything/GroundingDINO
python setup.py build
python setup.py install

Citation

@article{dai2023think,
  title={Think, act, and ask: Open-world interactive personalized robot navigation},
  author={Dai, Yinpei and Peng, Run and Li, Sikai and Chai, Joyce},
  journal={ICRA},
  year={2024}
}

Acknowledgement

This work is supported by Amazon Consumer Robotics, NSF IIS-1949634, NSF SES-2128623, and has benefited from the Microsoft Accelerate Foundation Models Research (AFMR) grant program.

About

Code for ICRA24 paper "Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation" Paper:https://arxiv.org/abs/2310.07968 Video:https://www.youtube.com/watch?v=rN5S8QIhhQc

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages