Cart Pole, life is sometimes disappointing – Educational thought experiments in AI/ML written up with fun in mind along with source-code in GitHub

A while back I came across the “Gym Library”, and noticed it had various games that one could teach an AI to play. I took a fancy to solving the “Cart Pole”, here:

https://www.gymlibrary.dev/environments/classic_control/cart_pole

I have no idea what drew me to it, but in future I will spend more effort vetting the challenges I set myself. So why does it sound like I have a sour taste in my mouth? Because it took 2 nanoseconds to train, after I had spent an hour or more converting the Python code to C# (rendering to a bitmap) and a little more time adding the framework around it.

The game itself (I made a user-playable version, an AI-controlled version, and a training app) takes most mere mortals a little while to perfect. Even then, the AI destroys them.

As with the Python source, the AI needs to return “0” (steer cart left) or “1” (steer cart right). It’s as easy as follows:

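// The four observations from the environment; all but the cart position are divided by 4.
double[] input = [
  env.State.CartPosition,
  env.State.CartVelocity / 4,
  env.State.PoleAngle / 4,
  env.State.PoleAngularVelocity / 4];

// One weight per input: a weighted sum, clamped to the range -1..1.
double nn =
 (HardTANH((0.017999999225139618 * input[0]) +
           (0.4300000071525574 * input[1]) +
           (0.3140000104904175 * input[2]) +
           (0.39100000262260437 * input[3])));

// Negative output steers the cart left (0); anything else steers it right (1).
direction = (nn < 0) ? 0 : 1;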

HardTANH is nothing more than a function that ensures the value is between -1 and 1.

internal static double HardTANH(double value)
{
  if (value < -1) return -1;
  if (value > 1) return 1;

  return value;
}

It doesn’t even really take any neurons per se. That formula is little more than multiplying the inputs by a few magic, easily derived numbers and adding up the results…
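To make the point concrete, here is a minimal sketch showing that the clamp never changes the decision: HardTANH keeps the sign of its argument, so the direction depends only on the sign of the weighted sum. The class and method names below are hypothetical (they are not from the GitHub source), and the weights are the ones from the formula above, rounded to three decimal places.

```csharp
using System;

internal static class LinearPolicySketch
{
    // Weights from the formula above, rounded to three decimal places (an assumption
    // made for readability; the original values are float-to-double conversions).
    private static readonly double[] Weights = { 0.018, 0.430, 0.314, 0.391 };

    // Clamps the value to the range -1..1, like HardTANH in the post.
    internal static double HardTanh(double value) => Math.Max(-1.0, Math.Min(1.0, value));

    // Multiplies each (already scaled) input by its weight and adds up the results.
    internal static double WeightedSum(double[] input)
    {
        double sum = 0;
        for (int i = 0; i < Weights.Length; i++)
        {
            sum += Weights[i] * input[i];
        }
        return sum;
    }

    // The form used in the post: clamp first, then test the sign.
    internal static int DirectionWithClamp(double[] input) =>
        HardTanh(WeightedSum(input)) < 0 ? 0 : 1;

    // Equivalent form: clamping never flips the sign, so the clamp can be skipped.
    internal static int DirectionWithoutClamp(double[] input) =>
        WeightedSum(input) < 0 ? 0 : 1;
}
```

Both methods return the same direction for any input; the clamp would only matter if the raw output value were used for something other than its sign.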

Rather unsatisfying, isn’t it? OK, so it’s awesome at the game, but I am not the first to solve it; hopefully, I am the last.

The game loop and scoring are as follows. Rewarding the AI based on the number of steps is not going to win any awards for smarts. Clearly, the goal is to reach the 501 steps, and the more steps, the closer you are to that goal. I punished it based on the average cart velocity because the goal should be to minimise the amount the cart moves (i.e. keep the pole very upright).

 internal void PlayGame()
 {
     int steps = 0;
     float totVelocity = 0;

     while (!Terminated)
     {
         int action = GetActionFromAI();
         Step(action);

         steps++;
         totVelocity += Math.Abs(State.CartVelocity);
     }

     // 1000 points per step, minus 100 points per unit of average cart speed
     Score = (int)(steps * 1000 - 100f * totVelocity / steps);
 }
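As a rough worked example of that scoring, the sketch below pulls the arithmetic into a standalone function. The name ComputeScore and the guard for zero steps are assumptions added for illustration; they are not part of the original PlayGame.

```csharp
internal static class ScoreSketch
{
    // Same arithmetic as the Score line above: 1000 points per step,
    // minus 100 points per unit of average absolute cart velocity.
    internal static int ComputeScore(int steps, float totVelocity)
    {
        // Assumption: a game that ends before any step is taken scores zero.
        // This also avoids dividing by zero.
        if (steps == 0) return 0;

        return (int)(steps * 1000 - 100f * totVelocity / steps);
    }
}
```

With these numbers, ComputeScore(500, 250f) is 500,000 minus 100 × 0.5, which gives 499950, and ComputeScore(10, 20f) is 10,000 minus 100 × 2, which gives 9800. One extra step is worth 1000 points, so the average speed would have to exceed 10 before the penalty cost as much as a single step; in practice the velocity term mostly separates runs that survive equally long.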

The games (user and AI) can be found on GitHub here.

Because it seems such a waste, I will repeat this challenge using the “video” screen rather than the known inputs.

If you are to learn anything from this post, it should be to find something worthy of your AI skills; that might mean doing a little homework first.
