Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization

This is the code of paper Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization. [arXiv]

pip install -r requirements.txt

python launch.py -e Walker2d-v2

python launch.py -n sac -e HalfCheetah-v2

python launch.py -n pr-sac -e Hopper-v2

python plot.py HalfCheetah-v2 sc-sac /path/to/data/save/dir

note that before ploting the heatmap, you have to manually replace the codes in /path_to_gym_module/envs/mujoco with the codes in ./mujoco_env_enhancing_codes, which enables us to change the relative mass and the relative friction of these environments during test.

cp -r ./mujoco_env_enhancing_codes/* /path_to_gym_module/envs/mujoco/

Provide feedback