Consonant or Vowel, sounds like Countdown!

I wrote this post not because it’s a difficult task, but because it’s one I wanted to explain properly.

Whilst experimenting, I saw someone set the task of training a neural network to differentiate between consonants and vowels. I am quite appalled anyone even thought of this; it’s a perfect example of where using AI/ML is like using an RS-28 Sarmat to wipe out a scurry of squirrels.

There are many ways to do this task; a basic approach is this (conveniently ignoring accented characters):

char c = ...
bool isVowel = "aeiou".IndexOf(c.ToString(), StringComparison.InvariantCultureIgnoreCase) >= 0; // true when c is a/e/i/o/u, any case

This is not a post about the fastest or coolest way to do it. But I can claim that we can do it a lot less efficiently using so-called AI/ML!

How not to separate vowels from consonants

We make a perceptron neural network with 26 inputs, 1 output and a tanh activation function, trained using backpropagation. There are no hidden layers, so calling it ML is a bit misleading.
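For reference, here’s a minimal sketch of what such a single-layer perceptron might look like. The FeedForward name matches the code further down in this post; the class name, initialisation and learning rate are my own illustration, not the actual code used.

class SimplePerceptron
{
    private readonly double[] _weights;
    private double _bias;

    public SimplePerceptron(int inputCount)
    {
        // Small random starting weights and bias.
        var rand = new Random(1);
        _weights = new double[inputCount];
        for (int i = 0; i < inputCount; i++) _weights[i] = rand.NextDouble() - 0.5;
        _bias = rand.NextDouble() - 0.5;
    }

    // Weighted sum of the inputs plus bias, squashed through tanh.
    public double[] FeedForward(double[] input)
    {
        double sum = _bias;
        for (int i = 0; i < _weights.Length; i++) sum += _weights[i] * input[i];
        return new[] { Math.Tanh(sum) };
    }

    // One backpropagation step for a single training example.
    public void Train(double[] input, double desired, double learningRate = 0.05)
    {
        double output = FeedForward(input)[0];
        double error = desired - output;
        double gradient = error * (1 - output * output); // tanh'(s) = 1 - tanh(s)^2
        for (int i = 0; i < _weights.Length; i++) _weights[i] += learningRate * gradient * input[i];
        _bias += learningRate * gradient;
    }
}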

The training data is this:

A 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 => 1
B 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 => 0
C 0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 => 0
D 0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 => 0
E 0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 => 1
F 0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 => 0
G 0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 => 0
H 0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 => 0
I 0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 => 1
J 0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 => 0
K 0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 => 0
L 0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0 => 0
M 0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0 => 0
N 0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0 => 0
O 0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0 => 1
P 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0 => 0
Q 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0 => 0
R 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0 => 0
S 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0 => 0
T 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0 => 0
U 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0 => 1
V 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0 => 0
W 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0 => 0
X 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0 => 0
Y 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0 => 0
Z 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1 => 0

We allocate one input per letter, and for the output, we put a 1 if it’s a vowel or a 0 if it’s a consonant.
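To make the encoding concrete, here’s a hypothetical helper (not part of the original code) that builds one of the rows above for any letter:

// Hypothetical helper; builds the one-hot input and the desired output for a letter.
static (double[] input, double desired) EncodeLetter(char letter)
{
    double[] input = new double[26];
    input[char.ToUpperInvariant(letter) - 'A'] = 1;   // one-hot: one input per letter

    double desired = "AEIOU".IndexOf(char.ToUpperInvariant(letter)) >= 0 ? 1 : 0; // 1 = vowel
    return (input, desired);
}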

After 4000 or so epochs, it trains.
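The training loop could look roughly like this, assuming the SimplePerceptron and EncodeLetter sketches from earlier; the real network class and stopping condition may well differ.

// Rough sketch of the training loop (illustrative only).
var networkToTeachVowels = new SimplePerceptron(26);

for (int epoch = 0; epoch < 4000; epoch++)
{
    for (char letter = 'A'; letter <= 'Z'; letter++)
    {
        var (input, desired) = EncodeLetter(letter);
        networkToTeachVowels.Train(input, desired);
    }
}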

TEST OF TRAINED NETWORK:
A Vowel. Difference: -0.05
B Consonant. Difference: 0.001
C Consonant. Difference: 0.001
D Consonant. Difference: 0.001
E Vowel. Difference: -0.049
F Consonant. Difference: 0.001
G Consonant. Difference: 0.001
H Consonant. Difference: 0.001
I Vowel. Difference: -0.05
J Consonant. Difference: 0.001
K Consonant. Difference: 0.001
L Consonant. Difference: 0.001
M Consonant. Difference: 0.001
N Consonant. Difference: 0.001
O Vowel. Difference: -0.05
P Consonant. Difference: 0.001
Q Consonant. Difference: 0.001
R Consonant. Difference: 0.001
S Consonant. Difference: 0.001
T Consonant. Difference: 0.001
U Vowel. Difference: -0.05
V Consonant. Difference: 0.001
W Consonant. Difference: 0.001
X Consonant. Difference: 0.001
Y Consonant. Difference: 0.001
Z Consonant. Difference: 0.001

The output above was produced by the following code.

To ask the neural network, we put a “1” in the array element for the letter we want to test; if the output exceeds 0.5, the letter is most likely a vowel. The network was trained until every output was within 0.05 of its desired value, which is why the differences above are so small.

for (int letter = 0; letter < 26; letter++)
{
  double[] letters = new double[26];
  letters[letter] = 1; // one-hot input for this letter

  // 1 for the vowels (A, E, I, O, U), 0 for everything else.
  double desiredOutput1ifVowel0IfConsonant = (letter == 0 || letter == 4 || letter == 8 || letter == 14 || letter == 20) ? 1 : 0;

  double output = networkToTeachVowels.FeedForward(letters)[0];

  Console.WriteLine($"{char.ConvertFromUtf32(65 + letter)} {(output > 0.5 ? "Vowel" : "Consonant")}. Difference: {(output - desiredOutput1ifVowel0IfConsonant):0.###}");
}

Wow, you’re amazed – right? No? Good, that part should leave you empty, but hopefully not too empty. I do hope the next bit makes up for it.

How it works

You know very well magic doesn’t exist, so it’s not magic.

The bit that is somewhat missing is the explanation. Whilst complex networks can be difficult to understand, this one is easy.

Look at the following code and, hopefully, you’ll spot a pattern. I generated the code from the trained weights and bias.

double[] outputFromNeuralNetwork = new double[1];

outputFromNeuralNetwork[0] = /* L1.N0 -> */ Math.Tanh(
(/* weight L0.N0-L0.N0 x value */ 1.5222740741375804*input[0])+
(/* weight L0.N0-L0.N1 x value */ -0.3092248744839273*input[1])+
(/* weight L0.N0-L0.N2 x value */ -0.3092136835720426*input[2])+
(/* weight L0.N0-L0.N3 x value */ -0.30920248993812216*input[3])+
(/* weight L0.N0-L0.N4 x value */ 1.529896942020771*input[4])+
(/* weight L0.N0-L0.N5 x value */ -0.3092395937753276*input[5])+
(/* weight L0.N0-L0.N6 x value */ -0.30922840625095094*input[6])+
(/* weight L0.N0-L0.N7 x value */ -0.3092172160061604*input[7])+
(/* weight L0.N0-L0.N8 x value */ 1.5215685507934011*input[8])+
(/* weight L0.N0-L0.N9 x value */ -0.3092559255033144*input[9])+
(/* weight L0.N0-L0.N10 x value */ -0.3092447420839209*input[10])+
(/* weight L0.N0-L0.N11 x value */ -0.3092335559462542*input[11])+
(/* weight L0.N0-L0.N12 x value */ -0.3092223670889398*input[12])+
(/* weight L0.N0-L0.N13 x value */ -0.3092111755106017*input[13])+
(/* weight L0.N0-L0.N14 x value */ 1.5261910385030881*input[14])+
(/* weight L0.N0-L0.N15 x value */ -0.30924898750656465*input[15])+
(/* weight L0.N0-L0.N16 x value */ -0.3092378023510434*input[16])+
(/* weight L0.N0-L0.N17 x value */ -0.3092266144763459*input[17])+
(/* weight L0.N0-L0.N18 x value */ -0.30921542388109685*input[18])+
(/* weight L0.N0-L0.N19 x value */ -0.30920423056391944*input[19])+
(/* weight L0.N0-L0.N20 x value */ 1.5263640679030304*input[20])+
(/* weight L0.N0-L0.N21 x value */ -0.3092420088817847*input[21])+
(/* weight L0.N0-L0.N22 x value */ -0.30923082197177726*input[22])+
(/* weight L0.N0-L0.N23 x value */ -0.30921963234167754*input[23])+
(/* weight L0.N0-L0.N24 x value */ -0.30920843999010833*input[24])+
(/* weight L0.N0-L0.N25 x value */ -0.30919724491569345*input[25])+

/* bias */ 0.31026941850693873);

Notice that the weights for inputs [0], [4], [8], [14] and [20] are around 1.52, while the remaining weights are around -0.309.

A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
^           ^           ^                 ^                 ^

Elements 0,4,8,14,20 are the vowels.

Remember we want ZERO for consonants and ONE for vowels to be output from the neural network.

  • If the 1 is in element [0], [4], [8], [14] or [20] (a vowel)
    • compute tanh( 1.5222740741375804 * 1 + 0.31026941850693873 )
    • ≈ 0.95007
    • which is within the 0.05 threshold of 1.
  • If the 1 is in any other element (a consonant)
    • compute, using Z’s weight as an example, tanh( -0.30919724491569345 * 1 + 0.31026941850693873 )
    • ≈ 0.001
    • which is within the 0.05 threshold of 0.

Only one of the 26 elements can be a 1 and the rest must be 0, because we test one letter at a time. Vowel elements therefore produce an output close to 1, and consonant elements an output close to 0.
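If you want to sanity-check the arithmetic, the two cases boil down to two one-liners, with the weights and bias copied from the generated code above:

double bias = 0.31026941850693873;

// A vowel (e.g. A) hits one of the ~1.52 weights:
Console.WriteLine(Math.Tanh(1.5222740741375804 * 1 + bias));   // ~0.95  -> "Vowel"

// A consonant (e.g. Z) hits one of the ~-0.309 weights:
Console.WriteLine(Math.Tanh(-0.30919724491569345 * 1 + bias)); // ~0.001 -> "Consonant"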

That’s it. The code (for what it’s worth) is on GitHub.

Please don’t ever do something this silly. What I do recommend is that you attempt to understand how your neural network is working.

I hope this 10-minute read was worth it.
