Once a loser…

Anybody familiar with the periodic table of elements is familiar with the concept of exceptions. Despite our human brains' desperate urge to classify and group things into buckets, the world we live in is full of exceptions. That list includes people. History has often shown that the people who are different, who do not fall into these buckets, are the ones who contribute most to the world. These are the ones who defy the status quo and society's notions of normal. These are the ones who can bring change, sometimes bad, more often than not good. In simple language they are called oddballs; data science calls them outliers. Do you know what data engineers do to outliers? They discard them.

Artificial intelligence and machine learning seem to be the latest and hottest trends in technology according to many. They are not only the latest ‘fad’ for CEOs and CTOs but have also become a very integral part of our lives (even if you don’t want them to be – more on that later). But what are they, really? Machine learning algorithms are just programs instructed to look at a data set and come up with a model that gives the most accurate predictions for future data. That might sound simple, but there is a lot more to it. Roughly, the algorithm splits the data into a training set and a testing set, trains itself on the training set in one of several predefined ways, and then checks itself against the testing set to see if its logic holds up. If not, it tries another predefined way. There are about ten commonly used predefined ways. If you don’t fit into any category, then you are screwed.
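To make that train/test loop concrete, here is a minimal pure-Python sketch of the workflow; the data and the two candidate "predefined ways" (simple threshold rules standing in for real model families) are invented for illustration:

```python
import random

def train_test_split(data, test_ratio=0.25, seed=42):
    """Shuffle labelled examples and split them into training and testing sets."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

# Toy labelled data: (feature, label) pairs; the values are made up.
data = [(x, int(x > 5)) for x in range(20)]
train, test = train_test_split(data)

# Stand-ins for two "predefined ways": fixed rules instead of real model families.
models = {
    "threshold_3": lambda x: int(x > 3),
    "threshold_5": lambda x: int(x > 5),
}

def accuracy(model, examples):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in examples) / len(examples)

# Evaluate each candidate on the held-out test set and keep the best one.
best = max(models, key=lambda name: accuracy(models[name], test))
```

In practice, libraries such as scikit-learn automate the splitting and model search, but the shape of the loop is the same: fit, score on held-out data, keep the winner.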

One way that Netflix, YouTube, or another video streaming service might suggest a video to a user is the Naive Bayes algorithm. It is a simple calculation of the probability of you liking A because you liked B. For this they use your likes, age, location, and so on, together with the genres or keywords they associate with certain programs. Let’s say you watched Daredevil; then you must like superheroes and the Marvel Cinematic Universe. You watched a stand-up comedy show? More from the same comedian, or from other comedians who do similar material, must be a good guess.
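As a rough sketch of the idea (not Netflix's actual system), the probability P(you like A | you liked B) can be estimated from co-occurrence counts across viewing histories; the tiny history below is invented:

```python
# Hypothetical viewing histories: each entry is one user's set of liked shows.
histories = [
    {"Daredevil", "Jessica Jones"},
    {"Daredevil", "Jessica Jones"},
    {"Daredevil", "Stand-up Special"},
    {"Jessica Jones"},
    {"Stand-up Special"},
]

def p(show):
    """Fraction of users who liked the show."""
    return sum(show in likes for likes in histories) / len(histories)

def p_joint(a, b):
    """Fraction of users who liked both shows."""
    return sum(a in likes and b in likes for likes in histories) / len(histories)

def p_a_given_b(a, b):
    """Conditional probability: P(A | B) = P(A and B) / P(B)."""
    return p_joint(a, b) / p(b)

# "Because you watched Daredevil": how likely are you to like Jessica Jones?
score = p_a_given_b("Jessica Jones", "Daredevil")  # 2 of the 3 Daredevil fans
```

A real recommender conditions on many signals at once (the "naive" part is assuming they are independent), but the arithmetic at the core is this simple.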


Now let’s go a step further. A mall or an e-commerce site wants to target ads to users by linking your age and gender to your spending patterns. Some customers turn up as outliers, meaning the k-means algorithm used for customer segmentation cannot place them into a cluster/group because they do not fit the purchasing patterns in the data the company has. Here the words “data they have” are very important. More on that later. So, since you do not fit, the data engineer discards these outliers as faulty or insignificant data. It means you get generic or irrelevant suggestions/ads. Annoying, but still not dangerous.
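Here is a small sketch of that discard step: before segmenting hypothetical (age, monthly spend) customers with k-means, points far from the average spend are flagged as outliers and dropped. The data, the two-standard-deviation threshold, and the feature choice are all assumptions for illustration, and the clustering itself is omitted:

```python
import statistics

# Hypothetical (age, monthly_spend) customers: ordinary shoppers plus one big spender.
customers = [(22, 80), (23, 85), (25, 90), (45, 30), (47, 35), (50, 28), (70, 300)]

spends = [spend for _, spend in customers]
mu = statistics.mean(spends)
sigma = statistics.pstdev(spends)

# Anything beyond two standard deviations from the mean spend is treated as an outlier.
outliers = [c for c in customers if abs(c[1] - mu) > 2 * sigma]
kept = [c for c in customers if c not in outliers]
# Only `kept` ever reaches the k-means segmentation; the big spender never does.
```

The big spender is simply gone before clustering starts, which is exactly the quiet erasure the paragraph above describes.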

But the problem comes when these standard data mining algorithms are used to make decisions that can make or break lives: approval of a loan, admission into a college, a job. Maybe someone had a personal tragedy and had bad scores for a couple of years. Maybe someone went broke because they spent all their income on a family member’s medical bills. Then you look like an unusual data point to the machine learning algorithm. It cannot handle you, because it never looks for a cause. To it, you are just a row in a spreadsheet that doesn’t fit its idea of the world. The data it has is its world view.
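That the algorithm "never looks for a cause" can be shown in a few lines: a learned decision rule sees only the feature values, so two applicants with identical numbers get identical verdicts no matter why the numbers look that way. The applicants, features, and cutoff below are invented:

```python
# Two hypothetical loan applicants with identical credit features.
# Why the score dipped (medical bills vs. reckless spending) is not a feature at all.
applicant_a = {"avg_score_last_2y": 580, "income": 40000}  # medical tragedy
applicant_b = {"avg_score_last_2y": 580, "income": 40000}  # reckless spending

def approve(applicant, cutoff=620):
    """A learned threshold rule: below the cutoff, reject. There is no notion of 'why'."""
    return applicant["avg_score_last_2y"] >= cutoff

same_verdict = approve(applicant_a) == approve(applicant_b)  # identical inputs, identical fate
```

Both are rejected, and no amount of context can change that, because context was never a column in the spreadsheet.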

Suppose you wanted to develop a program that screened resumes and shortlisted candidates for interviews. The old-school way was to write a lengthy list of the skills you are looking for and match resumes against that list, or a subset of it. The modern way is to feed the resumes of all your current employees to a machine learning algorithm and ask it to figure out what your company needs. You might think the program will deduce the skills required. But it can also conclude that it should not select female candidates if only 20% of your workforce is women; as far as the data is concerned, you prefer men 80% of the time, four to one over the other gender. If the majority of the workforce is in their 30s, it can likewise conclude that the company doesn’t want to hire other age groups. You may think this is too hypothetical, but it actually happened.
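A toy version of that failure mode, with made-up numbers matching the 80/20 split above: if a naive model simply replicates historical hiring rates per gender, the bias in the data becomes the bias of the model:

```python
# Hypothetical hiring history: (gender, was_hired) pairs. 80% of past hires are men.
past_decisions = [("M", 1)] * 80 + [("F", 1)] * 20 + [("M", 0)] * 20 + [("F", 0)] * 80

def hire_rate(gender):
    """Historical probability of being hired, given gender."""
    outcomes = [hired for g, hired in past_decisions if g == gender]
    return sum(outcomes) / len(outcomes)

# A model that merely learns these rates scores women far lower from day one.
score_m = hire_rate("M")  # 80 hired out of 100 male applicants
score_f = hire_rate("F")  # 20 hired out of 100 female applicants
```

Nothing in the code mentions skill; gender alone moves the score by a factor of four, purely because the history said so.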

Being passed over for a job because of your gender is… bad, but not a catastrophe. How about being denied a loan because of your race? How about being arrested for a crime you did not commit because the data engineers trained a facial recognition algorithm on facial data from only one specific ethnic group?

Technology news is filled with success stories of machine learning and how it can help businesses earn more, but the other side of the story is often suppressed. Did you know that Tay, a Twitter bot by Microsoft, was suspended a mere 16 hours after it went live, once it started tweeting racist and inflammatory things it had learned from other users?


Training a machine learning algorithm is like teaching a child. Like a child trusting their parents to give them accurate, true information, the algorithm assumes that the information given to it is the only information. But unlike children, who may change their opinion when they encounter a new point of view or new information, computers do not have the autonomy to seek additional data. They assume the data given to them is full and final; no other kind of data can exist. For a company with a 20% female workforce, the algorithm truly believes that this is the way things should be and that the percentage cannot increase. AI of this kind works to maintain the status quo, and in doing so kills progress. For computers, once a loser is always a loser.

If a bias is conscious, you are dealing with an asshole; if a bias is unconscious, you are dealing with someone ignorant. A computer program is only as smart as the people who build it. If the people who build it are ignorant, it is going to be ignorant. But unlike humans, computers do not look for reasons, and that makes them dangerous. Like all human beings, programmers, data scientists, and data engineers are biased too. They may fail to acknowledge it, because most of us don’t know our own biases. If you don’t believe me, Harvard has a simple online test that proves this. Don’t worry, it doesn’t ask for your personal information.

If the majority of venture capitalists and investors had used machine learning to make their decisions, what would be the AI-calculated probability of success for a college dropout, a lumber-mill worker, and a guy selling books from his garage? I am pretty sure it would be in the single digits, if not zero. If that had happened, we would be living in a world without Facebook, Tesla, and Amazon. The same algorithms these companies are creating now would prevent similar success stories. Maybe that’s the plan. Maybe they do not want competition.

Imagine if they decided sporting results: the same teams would always win. A winner will always be a winner, and a loser will always be a loser.

Imagine if machine learning algorithms decided elections: politicians would become de facto dictators. Wait! Hasn’t Facebook almost already ensured this?

I will let Will Smith summarize why I detest these algorithms.