Exploring OpenAI Gym: A Platform for Reinforcement Learning Algorithms


According to the OpenAI Gym GitHub repository, “OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. This is the gym open-source library, which gives you access to a standardized set of environments.”

OpenAI Gym follows an agent-environment arrangement. Gym provides the “environment”, and you supply an “agent” that performs specific actions in it. In return for each action, the agent receives an observation and a reward as a consequence of performing that action in the environment.


The environment returns four values for every “step” the agent takes:

  1. Observation (object): an environment-specific object representing your observation of the environment, for example the board state in a board game.

  2. Reward (float): the amount of reward/score achieved by the previous action. The scale varies between environments, but the goal is always to increase your total reward/score.

  3. Done (boolean): whether it’s time to reset the environment, for example because you lost your last life in the game.

  4. Info (dict): diagnostic information useful for debugging. However, official evaluations of your agent are not allowed to use this for learning.
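To make this loop concrete, here is a minimal sketch of the agent-environment cycle in plain Python. `CoinFlipEnv` and `run_episode` are hypothetical names made up for illustration and are not part of Gym; the toy environment only mimics the classic `reset()`/`step()` contract described above, so the example runs without Gym installed.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment (not part of Gym) that follows the classic
    Gym contract: reset() -> observation, step(action) -> (observation, reward, done, info)."""

    def __init__(self, max_steps=10, seed=None):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0
        self.coin = 0

    def reset(self):
        # Begin a new episode; the first observation is an initial coin flip.
        self.steps = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin

    def step(self, action):
        # The action is a guess (0 or 1) of the next flip.
        self.coin = self.rng.randint(0, 1)
        reward = 1.0 if action == self.coin else 0.0
        self.steps += 1
        done = self.steps >= self.max_steps   # time to reset the environment
        info = {"steps": self.steps}          # diagnostics only, not for learning
        return self.coin, reward, done, info


def run_episode(env, policy):
    """Agent-environment loop: act, receive the four values, accumulate reward."""
    observation = env.reset()
    total_reward, done, info = 0.0, False, {}
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward, info


env = CoinFlipEnv(max_steps=10, seed=0)
# A trivial agent that guesses the next flip will repeat the last one.
total, info = run_episode(env, policy=lambda obs: obs)
print(total, info)
```

With a real Gym environment you would obtain `env` from `gym.make(...)` instead, and the loop itself stays the same. Note that Gym 0.26 and later, as well as its successor Gymnasium, changed this contract: `step()` returns five values (observation, reward, terminated, truncated, info) and `reset()` returns `(observation, info)`.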

The following families of environments are available in Gym:

  1. Classic control and toy text

  2. Algorithmic

  3. Atari

  4. 2D and 3D robots

The full list of environments is available on the Gym website.

Cart-Pole Problem

Here we will try to solve a classic control problem from the reinforcement learning literature: “The Cart-pole Problem”.

The Cart-pole problem is defined as follows:
“A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. The episode ends when the pole is more than 15 degrees from vertical, or the cart moves more than 2.4 units from the center.”
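The dynamics behind this description are compact enough to sketch in plain Python. `CartPoleSketch` is a hypothetical class, not Gym's API; its physical constants (gravity 9.8, cart mass 1.0, pole mass 0.1, half pole length 0.5, push magnitude 10.0, time step 0.02 s) and equations of motion follow Gym's classic-control CartPole source as we understand it. We read the “+1 or -1” in the description as the direction of a fixed-size push, and the angle limit is a parameter defaulting to the quoted 15 degrees (recent Gym versions use 12 degrees).

```python
import math
import random


class CartPoleSketch:
    """Hypothetical pure-Python cart-pole simulator (not Gym's own class).
    step(action) returns (observation, reward, done, info) like classic Gym."""

    def __init__(self, angle_limit_deg=15.0, x_limit=2.4, seed=None):
        self.gravity = 9.8
        self.mass_cart = 1.0
        self.mass_pole = 0.1
        self.total_mass = self.mass_cart + self.mass_pole
        self.half_length = 0.5                 # half of the pole's length
        self.pole_mass_length = self.mass_pole * self.half_length
        self.force_mag = 10.0                  # magnitude of each push on the cart
        self.tau = 0.02                        # seconds per time step
        self.theta_limit = math.radians(angle_limit_deg)
        self.x_limit = x_limit
        self.rng = random.Random(seed)
        self.state = [0.0, 0.0, 0.0, 0.0]

    def reset(self):
        # Near-upright start: x, x_dot, theta, theta_dot each uniform in [-0.05, 0.05].
        self.state = [self.rng.uniform(-0.05, 0.05) for _ in range(4)]
        return tuple(self.state)

    def step(self, action):
        x, x_dot, theta, theta_dot = self.state
        # Action 1 pushes the cart right, action 0 pushes it left.
        force = self.force_mag if action == 1 else -self.force_mag
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        temp = (force + self.pole_mass_length * theta_dot ** 2 * sin_t) / self.total_mass
        theta_acc = (self.gravity * sin_t - cos_t * temp) / (
            self.half_length * (4.0 / 3.0 - self.mass_pole * cos_t ** 2 / self.total_mass)
        )
        x_acc = temp - self.pole_mass_length * theta_acc * cos_t / self.total_mass
        # Explicit Euler integration over one time step.
        x += self.tau * x_dot
        x_dot += self.tau * x_acc
        theta += self.tau * theta_dot
        theta_dot += self.tau * theta_acc
        self.state = [x, x_dot, theta, theta_dot]
        # The episode ends when the cart leaves the track or the pole tips too far.
        done = abs(x) > self.x_limit or abs(theta) > self.theta_limit
        # +1 for every step, including the one that ends the episode.
        return tuple(self.state), 1.0, done, {}


# A random agent: push left or right at random until the episode ends.
env = CartPoleSketch(seed=1)
agent_rng = random.Random(2)
lengths = []
for _ in range(5):
    env.reset()
    done, steps = False, 0
    while not done and steps < 500:           # cap keeps the loop bounded
        _, reward, done, _ = env.step(agent_rng.randint(0, 1))
        steps += 1
    lengths.append(steps)
print(lengths)  # number of steps each episode lasted
```

Running it prints how many steps each of the five random episodes lasted, which is also the total reward of that episode.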

The following code lets you quickly see what the problem looks like on your computer.

This is what the output will look like:


Coding the neural network 

This is what the result will look like:


Although we haven’t used a reinforcement learning algorithm in this blog, a plain fully connected neural network gave us a satisfactory accuracy of 60%. We used tflearn, a higher-level API on top of TensorFlow, to speed up experimentation. We hope this blog gives you a head start with OpenAI Gym.
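As a dependency-free illustration of what a fully connected network does at prediction time, here is a sketch of a forward pass that maps the four cart-pole observation values to probabilities for the two actions. This is not the tflearn model from this blog: the layer sizes, the ReLU and softmax choices, and all function names are hypothetical, and the weights are random rather than trained, so the chosen action is meaningless until the network is fitted to data.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    m = max(values)                      # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    # Hypothetical initialisation: small uniform random weights, zero biases.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
# Hypothetical sizes: 4 observation values -> 16 hidden units -> 2 actions.
hidden_w, hidden_b = init_layer(4, 16, rng)
out_w, out_b = init_layer(16, 2, rng)


def action_probabilities(observation):
    hidden = relu(dense(observation, hidden_w, hidden_b))
    return softmax(dense(hidden, out_w, out_b))


probs = action_probabilities([0.01, -0.02, 0.03, 0.04])
action = probs.index(max(probs))         # 0 = push left, 1 = push right
print(probs, action)
```

Training would adjust `hidden_w`, `hidden_b`, `out_w` and `out_b` so that the predicted action matches labelled examples; libraries such as tflearn handle that step for you.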

We look forward to seeing exciting implementations using Gym and reinforcement learning. Happy coding!


Vipul is an R&D Engineer at Velotio. He is interested in the areas of Deep Learning and Distributed Systems. He has worked on a variety of technologies including containers, virtualization and machine learning. His hobbies include motorcycles, photography and playing the violin. Also, he is an amateur boxer.