Experiment with Policy Gradient methods (description), as well as variance reduction.
Current implementation:
- Continuous and discrete environments
- Baseline network for variance reduction
$ conda env create -f [environment.yml | environment_cuda.yml]
$ conda activate [policy_grad | policy_grad_cuda]
$ python main.py --config_filename config_filename