When Algorithms Go Bad

We are algorithms with legs. Okay, maybe I should clarify that. Our brains employ self-learning algorithms: they induce patterns from the data being fed into them. As babies, we may assume that all four-legged creatures are dogs. Over time, however, our brains begin to differentiate dogs from cows and dogs from cats. With continued input, they will differentiate dogs from foxes and coyotes. In this way, our brains follow implicit rules that lead to particular conclusions. Our brains reflect the environment in which they are placed.

Self-learning machines do much the same. They are given an input of data and, mainly through trial, error, and feedback, induce patterns from it. In some cases, the programmers can set the machine off on a certain track, give it certain parameters, and fine-tune it with feedback, but it is up to the machine to find the patterns. Shown numerous pictures of dogs, for example, it will eventually identify ‘dogness’ and, at some point, when shown a cat, will see that the cat lacks this ‘dogness’, thereby establishing dog and non-dog categories. If shown a hyena, the algorithm may incorrectly assign it to the dog category. Humans sometimes must step in to correct such associations, but, eventually, the self-learning machine will get it right. Learning from mistakes is as important to machines as it is to humans.
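
To make the dog/non-dog idea concrete, here is a minimal toy sketch in Python. The features (ear length, snout length, weight), the numbers, and the choice of a simple scikit-learn classifier are all illustrative assumptions on my part; real image classifiers learn from raw pixels and are vastly more complex, but the feedback loop is the same.

```python
# Toy sketch of "dog vs. non-dog" learning with made-up numeric features
# in place of real image pixels. Purely illustrative.
from sklearn.linear_model import LogisticRegression

# Hypothetical training examples: [ear_length_cm, snout_length_cm, weight_kg]
animals = [
    [8, 10, 25], [6, 9, 30], [7, 12, 20],   # dogs
    [4, 3, 4],   [3, 2, 5],  [5, 4, 3],     # cats
    [10, 15, 55],                           # a hyena, mislabeled as a dog
]
labels = [1, 1, 1, 0, 0, 0, 1]  # 1 = "dog", 0 = "non-dog"

model = LogisticRegression().fit(animals, labels)

# The model now scores "dogness" for new animals; a human can correct its
# mistakes (e.g., relabel the hyena) and refit, which is the feedback loop
# described above.
print(model.predict([[7, 11, 22], [4, 3, 4]]))  # likely [1, 0]: dog, non-dog
```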

Unanticipated mistakes are simply a by-product of machine learning. A facial recognition camera in China was set up to identify jaywalkers, and it mistakenly identified a well-known billionaire as a criminal jaywalker. It turned out that a picture of the billionaire was on the side of a bus; the camera saw it and the algorithm did its job. Unfortunately, the algorithm hadn’t learned that a photograph is not a person. Now it knows.

Just as IBM’s Deep Blue had to be beaten by Garry Kasparov before it could eventually beat him, machine-learning systems must make mistakes in order to perform their tasks better. Unfortunately, it is this fact that presents a problem for researchers. If a researcher sees a mistake not as a mistake but as a correct interpretation of the data, they may rush to hasty conclusions, especially if it is a conclusion they have been hoping to find. Indeed, this has already occurred and is starting to tarnish the reputations of scientists and of science itself.

But before discussing these broader implications, it is necessary to point out a few mistakes algorithms have already made. Google has suffered a number of algorithm-related embarrassments. In 2015, Google Photos got into trouble for labeling photos of Black people as gorillas.


Google Search has also been accused of bias in the results it suggests and of favoring one media outlet over another. I researched this topic and found that Google News was, in fact, biased in which media it suggested. That has not changed much, but Google Search itself seems more objective than it used to be.

Facial recognition used for security at airports falsely classifies about 1,500 people a week as terrorists, simply because those people look similar enough to people in a database of known terrorists. The designers of the algorithm knew in advance what they wanted it to look for and used each miscategorization as an opportunity to hone the algorithm’s abilities. That’s all well and good, but what if airport security simply accepted the algorithm’s original assessment of a person as a terrorist? This may not be a problem when you have an actual standard to compare the results against, but not all algorithms work this way.
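
It is worth seeing the arithmetic behind a number like 1,500 false matches a week. The figures below are entirely made up for illustration; the point is that even a matcher that is wrong about only a tiny fraction of innocent travelers will, at airport volumes, flag people by the thousand.

```python
# Back-of-the-envelope false-positive arithmetic with made-up numbers.
travelers_per_week = 3_000_000   # hypothetical weekly passenger volume
false_positive_rate = 0.0005     # matcher wrongly flags 0.05% of innocent people

false_alarms = travelers_per_week * false_positive_rate
print(f"Innocent people flagged per week: {false_alarms:.0f}")  # 1500

# Since real terrorists are essentially absent from a typical week of travel,
# nearly every flag is a false alarm, which is why a human standard of
# comparison (checking the person against the actual watchlist photo) matters.
```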

What researchers really want is an algorithm that uses big data to predict future trends. In these cases, there is no standard (such as ‘dogness’) to compare against. According to Dr. Genevera Allen of Rice University in Houston, depending too much on the results gleaned from algorithms is leading to a “crisis in science”. Algorithms are only as good as the data they are fed. Bad datasets can lead to bad or misleading conclusions. In the worst-case scenario, researchers may choose the conclusions that they want to find. In other words, they may lose their scientific objectivity or may prime the algorithm to find the conclusions they hope to find. The algorithm itself is not biased, but its users may be.

A main, if not the main, principle in science is that research results should be reproducible. What scientists are finding, however, is that the conclusions based on one dataset may differ completely when the same algorithm is presented with a different dataset on the same topic. An often-quoted statistic is that “85% of all biomedical research carried out in the world is wasted effort”. This statistic is misleading. It comes from an article by Chalmers and Glasziou published in The Lancet in 2009. They made the point that only half of all funded research is ever published and that, of what is published, half is flawed to the point of being irreproducible. We can conclude only that some of this irreproducibility is due to faulty algorithms or to the faulty use of algorithms.
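
To see how easily ‘same algorithm, different dataset, different conclusion’ can happen, here is a hedged sketch in Python. The data below is pure random noise generated on the spot, so any trend the fitting code finds is spurious; the point is only that identical code, run on two samples of the same kind of data, can report noticeably different results.

```python
# Fit the same simple model to two noise-only datasets and compare the "trends".
# Purely illustrative: y is unrelated to x by construction, so both slopes are spurious.
import numpy as np

rng = np.random.default_rng(0)

def fitted_slope(n=30):
    x = rng.normal(size=n)
    y = rng.normal(size=n)          # independent of x by construction
    slope, _ = np.polyfit(x, y, 1)  # ordinary least-squares line
    return slope

print(fitted_slope())  # one "trend" from the first sample
print(fitted_slope())  # a different, equally meaningless "trend" from the second
```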

Another feature of algorithms that can baffle researchers is termed “the black box problem”. An algorithm may find unexpected patterns in data, and how it did so may be a mystery to the researchers using it. Biases and prejudices may emerge from what seems like objective data, and it may be impossible to tell whether this is because of a flawed database, a biased interpretation, or a simple reflection of an uncomfortable truth.

Nature was once considered one of the most trusted publications in science. Its reputation, however, has been tarnished by irresponsible use of data by some researchers and by a failure of Nature’s editors to be scientifically skeptical. The article “Quantification of ocean heat uptake from changes in atmospheric O2 and CO2 composition”, by L. Resplandy, R. F. Keeling, Y. Eddebbar, M. K. Brooks, R. Wang, L. Bopp, M. C. Long, J. P. Dunne, W. Koeve, and A. Oschlies, with lead authors at Princeton and the Scripps Institution of Oceanography at UC San Diego, claimed that the oceans were warming 60% more than current estimates suggested and that drastic preventative measures were called for. Because it was published in Nature, and because it plugged nicely into the popular climate change narrative, the results were quickly picked up by other publications and mainstream media and were thus spread around the globe, causing no little alarm.

However, something must be wrong somewhere when a reader, Nicholas Lewis, saw flaws in the article three hours after publication. As Lewis put it: “Because of the wide dissemination of the paper’s results, it is extremely important that these errors are acknowledged by the authors without delay and then corrected.” The researchers relied on data from the Scripps Institution, on the Argo array of ocean temperature sensors, and on their own experimental methods, but came up with seriously flawed results, which led to seriously flawed conclusions. The authors admitted that there were errors, and Nature printed a correction several weeks later, but the damage was already done. Few major publications or news outlets announced a retraction. Lewis suspects that the problem may lie in the computer code used to analyze the data; we will have to wait for further analysis to see whether that is the case. We have to wonder whether this is just the tip of the iceberg.


The basic problem with algorithms is that they are so objective. They lack an understanding of human values, or at least of the values held by the people who use them. But can machines learn human values? If so, will the values of a Chinese-made machine differ from those of an American-made machine?

Let me give an example. Imagine that a self-driving car is traveling down a road with its passenger sitting in the back seat. At the same time, a grandmother pushing a baby carriage suddenly loses control of the carriage, and it rolls into the street in front of the car. Imagine that the car has three choices. If it turns left to miss the carriage, it will crash into oncoming traffic and risk injury to the passenger. If it turns to the right, it will hit, and possibly kill, the grandmother. The third choice is to hit the carriage. It must ‘decide’ instantly based on what it has ‘learned’ to do. What would you do if you were the driver and faced with the same three choices? You might be surprised to find that the car would choose to hit the carriage. Why? Because it may not have learned that carriages often contain babies. To the car, it is only a carriage with the trait ‘non-human’ attached to it. However, even if it learned this, it would have to learn the accompanying social values. Are passengers more important than babies? Are babies more important than old people? These values could be culturally based or could be adapted to the philosophy of the car’s owner. In fact, I would predict that self-driving cars will be developed that learn the driving habits of their owners and are programmed to align with each owner’s philosophy. The one-size-fits-all self-driving car will be only a temporary phenomenon.
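
A crude way to picture the ‘accompanying social values’ problem is to imagine the car scoring each option with weights that encode value judgments. Everything in the sketch below, the option names, the weights, the scoring rule, is a hypothetical illustration, not how any real autonomous-driving system decides; the point is only that changing the weights changes the decision.

```python
# Hypothetical harm-scoring sketch: the car picks the option with the lowest score.
# The weights encode value judgments (passenger vs. pedestrian vs. carriage)
# and are pure assumptions for illustration.

def choose(weights):
    options = {
        "swerve_left":  weights["passenger"],   # risks the passenger
        "swerve_right": weights["pedestrian"],  # risks the grandmother
        "go_straight":  weights["carriage"],    # risks whatever is in the carriage
    }
    return min(options, key=options.get)

# A car that has not learned that carriages may contain babies treats them as objects.
naive_values = {"passenger": 10, "pedestrian": 8, "carriage": 1}
print(choose(naive_values))    # go_straight: it hits the carriage

# Re-weighting after 'learning' that a carriage likely holds a baby flips the choice.
revised_values = {"passenger": 10, "pedestrian": 8, "carriage": 20}
print(choose(revised_values))  # swerve_right: now the grandmother is at risk
```

Whose weights go into such a table, the manufacturer’s, the regulator’s, or the owner’s, is exactly the culturally loaded question raised above.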

The scenario outlined above also highlights how sensitive algorithms are to manipulation by nefarious agents. Algorithms could be manipulated to make weapons misfire or fire on the wrong target, for example. A database itself could be poisoned so that algorithms reach dangerous conclusions. Hacking and cybersecurity may evolve into battles between algorithms as opposing systems try to outmaneuver each other. In the end, it may become impossible to tell how reliable any conclusion based on big data actually is. If you think that reality is difficult to pin down now, wait until the future.
