Background
Sing along to the Elvis song… Ok, so he didn’t sing “convolution”, that’s not my fault.
If you’re into AI, I probably don’t need to introduce this topic. If you missed my original post, it’s here.
This post introduces playing CartPole from the video display (not the state object), without any fancy neural network or other advanced techniques.
By now you’ll hopefully know I am never afraid to challenge myself. After the huge disappointment with the first challenge, I set out to solve it using the video display rather than the “state” object. How hard can it be? Yeah. Don’t ask.
In keeping with my mantra of “don’t use advanced AI for simple tasks”, here’s the first part. A lot of time went into this journey, and new scars were gained along the way. As I previously wrote, we learn from failure, so try to see failures as beneficial.
I am still at a loss as to why someone would be silly enough to attach a pole to a cart expecting it won’t fall over. Simple fix: attach it with multiple bolts.
Before we start
I made updates to the original GitHub code.
In making this follow-up post, I chose to add features to the original – tracking the min/max of the state object, higher-resolution drawing, frame-by-frame stepping, and arrows showing the AI’s chosen direction in real time.
Not everything goes to plan
Before I explain (skip if you want), maybe we should poke fun at my dumb mistakes.
Dave’s lesson of the day: “never spend hours training something with incorrect training data”. Obvious, right? It isn’t always that clear-cut.
After laughing at my evil genius in solving “Cart-Pole” using the “state” object, while others felt the need to use RL or deep learning, my smugness got karma’d. (Yes, you can use that as a verb – you have my permission.)
I didn’t think to run my simple formula for long (thousands of games); complacency kicked in. Had I done so, I would have found that whilst it works for many variations of the initial “state” parameters, it doesn’t work for all of them. Not a huge crime – until I then used that imperfect algorithm for supervised learning. If you can relate, you’re probably feeling my pain; if you haven’t done something this silly, well done.
Then, during my fun journey, I made another silly mistake – a Bob Ross-style “there are no mistakes, just happy accidents” [external].
It refused to train past generation 2416?!!! I thoroughly checked everything, and was bamboozled for longer than I care to admit.
That got me modifying my random number generator +/-0.5 code to be truly cryptographic, not pseudo. Inadvertently, I missed a zero, and my weights ended up “int” rather than “float”, and greater than 1. The scream was short-lived when I noticed it still worked – it beat the game!
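For illustration only – the helper name and exact scaling below are my guesses at the shape of that code, not the original – a cryptographic ±0.5 generator in Python can use the standard `secrets` module, and the comment shows how one missing zero in the divisor causes exactly this kind of blow-up:

```python
import secrets

def random_offset() -> float:
    """Cryptographically random weight tweak in [-0.5, 0.5)."""
    # Correct scaling: divide by 1000.0.
    # Drop one zero (divide by 100) and the result is scaled up
    # tenfold; in C#, int/int division also truncates to a whole
    # number - which is how weights can end up as ints greater than 1.
    return secrets.randbelow(1000) / 1000.0 - 0.5
```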
I therefore present my contribution to AI. “Awesome, thank you, Dave”, said no one.
private int GetCorrectDirection()
{
    // 0 = push cart left, 1 = push cart right.
    float direction = State.CartVelocity + State.PoleAngle + State.PoleAngularVelocity;
    return (direction < 0) ? 0 : 1;
}
Don’t believe it can be that simple? Check out my GitHub, or try it yourself – the “user” version has an “AI Play” button that uses it.
It seems to work for the 15,000+ games I tried it on.
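For anyone who’d rather poke at the rule in Python, here’s a minimal sketch of the same logic. The function name and the argument names are my own; they assume the standard CartPole state values (cart velocity, pole angle, pole angular velocity) rather than anything from the repo:

```python
def correct_direction(cart_velocity: float,
                      pole_angle: float,
                      pole_angular_velocity: float) -> int:
    """Return the CartPole action: 0 = push left, 1 = push right.

    Mirrors the C# GetCorrectDirection() above: sum three of the
    state values and push in the direction the pole is heading.
    """
    direction = cart_velocity + pole_angle + pole_angular_velocity
    return 0 if direction < 0 else 1


# Pole leaning/falling right -> push right to get back under it.
print(correct_direction(0.0, 0.05, 0.1))    # 1
# Pole leaning/falling left -> push left.
print(correct_direction(0.0, -0.05, -0.1))  # 0
```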
Disclaimer: If someone is sad/bored enough to leave it running for weeks, and it eventually loses, please don’t come crying to me.
On the next page, we’ll look at going from video to action.