
Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization

This repository contains the code for the paper Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization. [arXiv]

Instructions

  • install the requirements (Python 3.6):
pip install -r requirements.txt
  • run sc-sac in the Walker2d-v2 task with default configs:
python launch.py -e Walker2d-v2 
  • run sac in the HalfCheetah-v2 task:
python launch.py -n sac -e HalfCheetah-v2 
  • run pr-sac in the Hopper-v2 task:
python launch.py -n pr-sac -e Hopper-v2
  • plot a heatmap of the trained sc-sac policies in the HalfCheetah-v2 task:
python plot.py HalfCheetah-v2 sc-sac /path/to/data/save/dir
  • note that before plotting the heatmap, you must manually replace the code in /path_to_gym_module/envs/mujoco with the code in ./mujoco_env_enhancing_codes. The modified environments let you change the relative mass and the relative friction of these environments at test time.
cp -r ./mujoco_env_enhancing_codes/* /path_to_gym_module/envs/mujoco/
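The heatmap evaluates a trained policy under perturbed dynamics, with one cell per combination of relative mass and relative friction. The following is a minimal, hypothetical sketch of that idea, not the code in plot.py or ./mujoco_env_enhancing_codes. It assumes every body mass and every friction coefficient is scaled uniformly by a single factor. In mujoco-py the nominal values live in arrays such as `model.body_mass` and `model.geom_friction`; here plain Python lists stand in for them. `evaluate` is a hypothetical callable that would roll out the policy in the perturbed environment and return its average return.

```python
def scale_dynamics(body_mass, geom_friction, rel_mass, rel_friction):
    """Return scaled copies of nominal dynamics parameters (inputs are not modified).

    body_mass     -- list of per-body masses
    geom_friction -- list of per-geom [sliding, torsional, rolling] coefficients
    Uniform scaling of every body and geom is an assumption of this sketch.
    """
    new_mass = [m * rel_mass for m in body_mass]
    new_friction = [[c * rel_friction for c in coeffs] for coeffs in geom_friction]
    return new_mass, new_friction


def build_heatmap(evaluate, mass_scales, friction_scales):
    """Evaluate a policy on every (relative mass, relative friction) pair.

    evaluate -- hypothetical callable (rel_mass, rel_friction) -> average return
    Returns a nested list: rows follow mass_scales, columns follow friction_scales.
    """
    return [[evaluate(m, f) for f in friction_scales] for m in mass_scales]


def toy_eval(rel_mass, rel_friction):
    # Toy stand-in for real rollouts: the "return" peaks at the nominal dynamics (1.0, 1.0).
    return -abs(rel_mass - 1.0) - abs(rel_friction - 1.0)


if __name__ == "__main__":
    mass, friction = scale_dynamics([1.0, 2.0], [[1.0, 0.005, 0.0001]], 1.5, 0.5)
    print(mass, friction)

    scales = [0.5, 1.0, 1.5]  # hypothetical test range, not the one used in the paper
    for row in build_heatmap(toy_eval, scales, scales):
        print(row)
```

Returning scaled copies instead of mutating the nominal arrays keeps the original dynamics available, so every heatmap cell is measured relative to the same baseline.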