An implementation of a vanilla policy gradient algorithm for TensorFlow and PyTorch.
Detailed Documentation
Implementation