1969 Lunar Lander

Understanding the AI

I’d love to say it’s because I have achieved something awesome. Alas, it’s most definitely not.

The simulation lets you choose how many hidden neurons to use. 0 means no hidden layer: the output is connected directly to the inputs via a weight and bias. You’ll probably have noticed that in the previous code – the burn is 200 * Math.TanH(weight * input + bias).

200 = max fuel burn.

The code for it is:

    /// <summary>
    /// Ask the neural network how much fuel to burn.
    /// </summary>
    private double GetAIBurnAmount()
    {
        // If we're going for a suicide burn, then don't ask the AI until the required altitude is reached.
        // Of course you could make this part of the training, punishing it for a premature burn, and adding the 
        // min altitude as an input. It'll take longer to train, and probably need more neurons.
        if (_altitudeMiles > c_minimumAltitudeInMilesToBurn) return 0;

        // What data does it require to land without running out of fuel, and not causing the occupants to die?

        // Available Inputs  are: elapsed time, altitude, downward speed, fuel remaining
        //           Outputs are: fuel burn rate

        // The burn happens at specific points in time, so "time" would be a logical dimension. As it's
        // trying to reduce the downward velocity, it probably needs to know that. It is also designed
        // to minimise fuel usage.
        // There are many questions as to what it really needs. Play with the constants and see how it does.

        List<double> inputs = [];

        // add the chosen inputs to the neural network "inputs" list
        if (c_altitudeIsNetworkInput) inputs.Add(_altitudeMiles / 150);
        if (c_downwardSpeedIsNetworkInput) inputs.Add(_downwardSpeedMilesPerSec);
        if (c_fuelRemainingIsNetworkInput) inputs.Add(FuelRemaining / c_weightOfFullTankOfFuelLBs);
        if (c_timeRemainingIsNetworkInput) inputs.Add(_elapsedTimeSec / 200);

        double[] neuralNetworkInput = [.. inputs];
        double[] outputFromNeuralNetwork = Brain.FeedForward(neuralNetworkInput); // process inputs

        // AI should return 0..1. It might return less than 0 (tanh min = -1), but we don't care - we override that.
        // 0 = don't burn, 1 = full burn (tanh max). Therefore we scale it 0..200
        _fuelRateLBsPerSec = outputFromNeuralNetwork[0] * 200;

        // the thrusters won't even fire without at least a rate of 8, so anything below that is 0. If AI returned minus values, this takes care of them.
        if (_fuelRateLBsPerSec < 8) return 0;

        // the thrusters can't fire at more than 200 lbs per second.
        if (_fuelRateLBsPerSec > 200) return 200;

        // valid values are 0, 8-200
        return _fuelRateLBsPerSec;
    }
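Stripped of the simulator scaffolding, the zero-hidden-neuron case boils down to a single tanh neuron. Here is a minimal Python sketch of that idea (the simulator itself is C#; the function name and example numbers here are invented for illustration):

```python
import math

def burn_rate(inputs, weights, bias, max_burn=200.0, min_burn=8.0):
    # One output neuron, no hidden layer: tanh(weighted sum + bias),
    # scaled to lb/s and clamped to the thruster's valid range.
    activation = math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
    rate = activation * max_burn        # tanh 0..1 -> 0..200 lb/s
    if rate < min_burn:                 # thrusters won't fire below 8 lb/s;
        return 0.0                      # this also absorbs negative tanh output
    return min(rate, max_burn)

# A strongly positive weighted sum saturates tanh near 1 -> near-full burn.
print(burn_rate([0.5], weights=[10.0], bias=0.0))
```

Note how the clamp gives exactly the valid outputs described below: 0, or anything in the 8–200 range.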

Evolution / Generations

The AI is blocked from using the thrusters above 48 miles (“suicide burn” enabled). It can burn 0, or 8-200 (in the original game, the thrusters don’t fire on less than 8 lb of fuel per second). It’s then prompted for a fuel rate at each of the 10-second intervals that follow. With a randomly picked weight, the outcome lies somewhere between the creation of a new crater and a gentle landing (or, worse still, flying upwards). If we create 3,000 landers, all with different random weights, we can score which one does best (comes closest to the desired outcome).

The best one is, at worst, part way there – or, at best, good. If we discard the worst-performing 50% and replace them with clones of the best 50% (1,500), then mutate each clone a little (change the multiplier), one or more of them may perform better. If one does, we use its neural network as the reference network to clone next time. If none of the clones improves, the next attempt will hopefully give a different outcome.

With each epoch (generation) the multiplier is adjusted until the sum of all the burns results in a better landing. How long that takes, and what the ultimate best impact speed ends up being, depends on luck – mutation uses a random number generator.

It’s worth remembering that, alongside the multiplier (weight), there is also a bias. Both numbers can change.
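The select-clone-mutate loop described above can be sketched in a few lines of Python. This is a hypothetical sketch of one epoch, not the repo’s actual code; each lander is reduced here to a `(weights, bias)` pair:

```python
import random

def next_generation(landers, score, mutation_amount=0.1):
    # One epoch: rank by fitness, keep the best half, and replace the
    # worst half with mutated clones of the survivors.
    ranked = sorted(landers, key=score, reverse=True)
    survivors = ranked[: len(ranked) // 2]
    clones = []
    for weights, bias in survivors:
        clones.append((
            [w + random.uniform(-mutation_amount, mutation_amount) for w in weights],
            bias + random.uniform(-mutation_amount, mutation_amount),  # bias mutates too
        ))
    return survivors + clones
```

Because the survivors are carried over unmutated, the best network found so far is never lost – a generation can fail to improve, but it can’t regress.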

Elapsed time as an input

A simple way to look at this is that it cannot burn fuel before 47 miles.

Time is linear (10-second intervals), so the input to weight × time + bias increases at each step. Imagine it chose 100 as the multiplier – it would slam into the Moon very hard. We already know that a burn of 200 for the whole descent from 47 miles gives the right amount. We have 7 burns after 47 miles, so if we don’t do 200 for those 7, we’ll need larger than 164 at 47 miles.

Say we had 200 on our final burn, 199 on the one preceding it, and so on – we can’t start at 200 − 8 at 47 miles, as that would result in upwards movement. Instead it opts for an increase of 0.8694499936791544 per 10 seconds plus a bias, which means it starts at 190.66 and ends just below 200.
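To see how a single time-weighted tanh neuron produces a rising burn schedule like that, here is a toy Python sketch. The weight and bias below are invented for illustration – they are not the trained values quoted above:

```python
import math

# Hypothetical weight and bias - NOT the trained values from the simulator.
w, b = 0.004, 1.4

def burn_at(t_sec, max_burn=200.0):
    rate = max_burn * math.tanh(w * t_sec + b)
    return 0.0 if rate < 8 else min(rate, max_burn)

schedule = [round(burn_at(t), 2) for t in range(0, 70, 10)]
print(schedule)  # gently rising burns that approach, but never reach, 200
```

Because tanh saturates towards 1, the schedule climbs towards the 200 lb/s ceiling without ever exceeding it.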

It doesn’t matter what the altitude is, because altitude is proportional to time. It doesn’t care what fuel it has or hasn’t, because if it runs out, the network will have scored badly and been eliminated.

The thing to remind yourself is that it isn’t smart. I would liken it to trying on a shoe. Once you’ve tried the shoe on you know it fits. If it didn’t fit, you’d have abandoned that shoe.

There is nothing smart about it. Any small delay in firing the thruster or mismeasurement could cause a fatal outcome. It won’t adjust the next burn to compensate – please never build a control system this dumb.

Fuel as an input

This works similarly to elapsed time.

As time passes, fuel reduces (proportionally to the fuel rate). We observe a burn pattern inverted relative to the elapsed-time one: 197.7, 196.6, 194.9, 192.5, 189.0, 184.1… That’s because fuel remaining decreases over time, whereas elapsed time increases.

Through refinement, it worked out that, starting at 197.7 and applying a specific weight, the sum of the subsequent burns produces the desired outcome.

It’s possibly less brittle as long as the fuel measurement is accurate. But I still wouldn’t want to be on a lander controlled by it.

Altitude and Velocity

Given the neural network has no hidden neurons, and is therefore input × weight + bias, the same overall process applies to altitude and velocity. Through guessing and refinement, evolution has discarded the networks that don’t burn proportionally enough to the altitude or velocity, and with enough generations it comes up with a formula that works.

Does having all 4 inputs make for a better solution?

No. Or at least I believe that to be the case.

With 4 inputs feeding one output, the overall “fuel rate” depends on all 4 being correct. If one of the inputs is wrong, I would be hugely surprised if the neural network gave the correct result – for example, time is opposite in sign to the others. Just one bad reading is likely to break things.
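A quick Python sketch makes the fragility concrete. The weights and bias here are invented, but the structure matches the zero-hidden-layer case: corrupt one of the four inputs and the single weighted sum – and therefore the burn – shifts:

```python
import math

# Invented weights/bias for a 4-input, zero-hidden-layer network.
weights = [0.5, 1.2, -0.8, 0.3]   # altitude, speed, fuel, time
bias = 0.1

def burn(inputs, max_burn=200.0):
    rate = max_burn * math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
    return 0.0 if rate < 8 else min(rate, max_burn)

good = [0.3, 0.8, 0.5, 0.2]    # all sensors normalised and healthy
bad  = [0.3, 0.8, 0.5, -0.2]   # one corrupted reading (wrong sign)

print(burn(good), burn(bad))
```

There is no redundancy: a single neuron cannot cross-check one sensor against another, so one bad reading changes the burn for every step it persists.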

If I had to rely on one input, it would be downward vertical velocity. As long as the burn proportionally arrests the downward speed, it shouldn’t damage the craft. Of course, if the thruster doesn’t give the lift expected (a nozzle/valve fails to open on one burn), the subsequent burn *could* compensate – although, as stated earlier, corrective action is unlikely to be possible during a suicide burn (as it cannot exceed 200).

Scoring

To improve with each generation, we need to score the landers. This is remarkably trivial. Higher = Better.

A 40 mph vertical impact is the cut-off; any higher and you’ve made a crater. So the score is (40 − impact velocity): if it lands at 0 mph, it receives 40 points, and if it lands at 39 mph, it receives 1 point. The score is then multiplied by 100,000 so that the fuel points (below) can never outweigh the impact speed.

We then add the remaining proportion of fuel (0 to 100 points). If the score is zero or below, the lander crashed, so instead we subtract points based on the volume of fuel left (fuel it should have used).

    ImpactVelocityMPH = 3600 * _downwardSpeedMilesPerSec;

    // Don't reward it for a smoother landing than the acceptable impact. This enables it to maximise fuel.
    if (ImpactVelocityMPH >= 0 && ImpactVelocityMPH < c_acceptableImpactMPH) ImpactVelocityMPH = c_acceptableImpactMPH;

    double score = (40 - ImpactVelocityMPH) * 100000; // multiplier to stop fuel overriding the mph. 40mph = 0 points, 0mph = 4,000,000 points.

    // More points for fuel left. It must never get points for fuel left if it crashes (score <= 0).
    // Points are 0-100, the ratio of fuel remaining : total fuel.
    // We apply a negative bonus if it crashed with remaining fuel, positive for a good attempt.
    score += ((score <= 0) ? -1 : 1) * (int)(FuelRemaining / c_weightOfFullTankOfFuelLBs * 100f); // 100 points for a full tank of fuel left.

    Score = (int)score;
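For clarity, the same scoring logic translated into standalone Python (the tank capacity and acceptable-impact values below are placeholders – the real `c_weightOfFullTankOfFuelLBs` and `c_acceptableImpactMPH` live in the simulator’s C# source):

```python
def score_lander(impact_mph, fuel_remaining_lbs,
                 tank_capacity_lbs=8165.0, acceptable_impact_mph=4.0):
    # Placeholder constants: the real c_weightOfFullTankOfFuelLBs and
    # c_acceptableImpactMPH are defined in the simulator's C# source.
    if 0 <= impact_mph < acceptable_impact_mph:
        impact_mph = acceptable_impact_mph   # don't reward over-gentle landings
    score = (40 - impact_mph) * 100_000      # 40 mph -> 0 points, crater -> negative
    fuel_points = int(fuel_remaining_lbs / tank_capacity_lbs * 100)  # 0..100
    score += (-1 if score <= 0 else 1) * fuel_points  # punish crashing with fuel left
    return int(score)

print(score_lander(0, 8165.0))   # soft landing, full tank -> maximum score
print(score_lander(50, 8165.0))  # crater with fuel left -> extra punishment
```

The 100,000 multiplier keeps the two terms on separate scales: fuel can only ever break ties between landers whose impact speeds are effectively equal.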

How many neurons is optimal?

It seems to work just fine with the input connected directly to the output. I wouldn’t say adding 10 hidden neurons improves it dramatically. Feel free to edit AddNewBrain() and try additional layers (it supports just zero or one hidden layer at the moment). If you find something that works much better, please add a comment.

Let’s finish up with a quick conclusion.
