Shanmugakannan NR
4 min read · May 20, 2020


Can an NLP program outperform a 3-year-old?

Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child’s?

— Alan Turing

Photo by Jo Hilton on Unsplash

I was trying to develop an NLP algorithm to parse simple commands like
“Add me to the mailing group”
“Remove me from the mailing group”

After wrangling with Artificial Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks, Deep Learning, and the like for some time, and then being unable to understand even the first few paragraphs, I conceded failure. I then decided to develop my own simple poor man’s NLP implementation using some keyword-based lookups and leveraging NLTK’s Part-of-Speech tagging. My requirement did not warrant a huge corpus, just a handful of guessable keywords like Add, Remove, Delete, and Create.

I wanted to eat my hat when I explained the logic of my custom sentence classifier cum named entity recognizer to my fellow teammates:

  • If the keyword “Add” is found, call the Add API.
  • Else if the keyword is “Delete”/“Remove”, call the Delete API.
  • Get all the nouns in the sentence (thanks to NLTK POS tagging) and send them as parameters to the API. It was the API’s headache to look up all the nouns against the user database and find the correct one. For example, if the user command was “Remove antony123 from the mailing list”, NLTK would return the nouns “antony123”, “mailing”, and “list”, and I searched the database to find out that “antony123” was the expected username entity to remove from the mailing group.
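A minimal sketch of that logic might look like the following, assuming NLTK and its tokenizer/tagger models are installed; the returned action label stands in for the real Add/Delete API calls, which are not shown here.

```python
import nltk

# Assumes the NLTK models are available:
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

def handle_command(sentence):
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)
    # Collect every noun (POS tags starting with "NN") as a candidate entity;
    # the backing API is left to match these against the user database.
    nouns = [word for word, tag in tagged if tag.startswith("NN")]
    lowered = {t.lower() for t in tokens}
    if "add" in lowered:
        return "add", nouns
    if "delete" in lowered or "remove" in lowered:
        return "delete", nouns
    return None, nouns

print(handle_command("Remove antony123 from the mailing list"))
# Typically something like ('delete', ['antony123', 'list']); the exact
# noun list depends on how the tagger labels each token.
```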

To make my implementation look less naive, I decided to throw in a cosine similarity function to replace my keyword-based sentence classifier. I came up with some sample sentences and calculated their vector representations using the simplest, most down-to-earth algorithm, bag-of-words (the only algorithm which did not scare me or make me feel unworthy of this life):

  1. add username to the mailing list
  2. remove username from the mailing list
  3. delete username from the mailing list

I compared the vector of an input sentence, say “add dummy123 to the mailing list”, to the list of sample vectors I already had and found the closest matching sentence, in this case “add username to the mailing list”, thus classifying my sentence into the “add member” category.
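As a rough sketch of that idea, here is a bag-of-words classifier with a hand-rolled cosine similarity; the sample sentences are the ones above, while the category labels and everything else are assumed, illustrative details.

```python
import math
from collections import Counter

# Sample sentences from above, keyed by the category they represent.
SAMPLES = {
    "add member":    "add username to the mailing list",
    "delete member": "remove username from the mailing list",
}

def bag_of_words(sentence):
    # The crudest possible vector: raw word counts.
    return Counter(sentence.lower().split())

def cosine_similarity(a, b):
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(sentence):
    vec = bag_of_words(sentence)
    # Pick the sample sentence whose vector is closest to the input's.
    return max(SAMPLES, key=lambda cat: cosine_similarity(vec, bag_of_words(SAMPLES[cat])))

print(classify("add dummy123 to the mailing list"))  # -> 'add member'
```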

Phew… I was relieved, and I sounded less naive, maybe even like a whiz, while explaining my NLP implementation to my non-NLP/AI folks ;)

But my algorithm went for a toss when I wanted to handle more use cases and cosine similarity wouldn’t help me out. It dawned on my bird brain that deep learning and other obscure algorithms exist for a reason.

I took inspiration from Alan Turing’s quote, “Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child’s?”, and started pondering over my 3-year-old daughter’s way of understanding words, sentences, and facts.

Occasion 1:

After failing at all other attempts to make my daughter sleep, I switched off the night lamp too and made the room pitch dark as a last resort, hoping to scare her with the darkness. She appeared undaunted and asked me:

“Why did you switch on the ‘Dark Light’?”

Occasion 2:

My daughter’s cot is a platform type with a tall panel on one side, accessible from the other three sides. One day she was visiting my in-laws, and their cot had tall panels lengthwise, with one of the breadthwise sides leaning against a wall, making it accessible from only one side. My daughter looked at that cot and asked me:

“Why there is only one ‘Down’ for this cot?”

I was amazed at the ways in which a brain builds up language. I understood from my little daughter that:

Language is not putting words together, but putting thoughts together.

Circling back to Turing and my daughter’s way of developing language, I wanted to create word vectors without using some random TF-IDF method, which I consider a brute-force attempt at language modeling, however good its results may be.

Word2Vec makes more sense, and it is very human in approach, except for the input it feeds on: it has all the words of the whole world at its disposal, while a kid constructs language with the handful of words that the poor small fella has ever heard and remembered.

I started building up word vectors by analyzing how I would understand a word, say:

Down = space + position + negative

Up = space + position + positive

On = space + position + neutral

How will I represent this in a vector?
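One naive possibility, purely as a sketch, would be to fix a tiny hand-picked feature vocabulary and encode each word as a signed vector over it; the features below are just the guesses from above, not any real linguistic inventory.

```python
# Hand-coded feature vectors over an assumed, made-up feature set.
# Columns: (space, position, polarity), where polarity is +1 / -1 / 0.
FEATURES = ("space", "position", "polarity")

WORD_VECTORS = {
    "down": (1, 1, -1),
    "up":   (1, 1, +1),
    "on":   (1, 1,  0),
}

print(dict(zip(FEATURES, WORD_VECTORS["down"])))
# -> {'space': 1, 'position': 1, 'polarity': -1}
```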

The problems are manifold. To start with, I see the issues below:

  1. I cannot hand-code all the words of a domain.
  2. The features (space, position, positive/negative) will not hold good for other kinds of words; a word like “apple”, for instance, has nothing to do with space, position, or polarity.

I am wrapping up with the hope that one day I will stumble upon the golden feature list that the brain uses to “understand” a word.
