Making Self-Driving Cars Hallucinate

There’s a rumbling on the horizon. It’s the sound of a paradigm shift: A paradigm shift in hacking. This change is to hacking what gene therapy is to traditional surgery. Both seek to attain some outcome by manipulating deeper levels of a target.

For hackers, the deeper levels I’m speaking of are found in self-learning algorithms. I should note up front that I will keep this discussion non-technical, which, admittedly, may oversimplify matters for some readers. However, delineating types of neural net systems is not my goal. My goal is to show that the next generation of hackers will target such self-learning systems because they have no choice. They simply have to, because these systems are proliferating at such a rapid rate.

So what do such systems do? Why are they called ‘self-learning’? These systems, unlike traditional computer systems, do not need to be programmed explicitly. They are presented with sets of data and, over a training period, they find patterns in that data. Sometimes they are told to look for certain patterns; at other times they are free to find their own.

Let’s say I want a neural network to learn the difference between a circle and a square. I present a set of images to the net and it, at first, randomly assigns each image to a category. When an assignment is correct, the net is given feedback that indicates this. This is not as simple as it may sound. The computer has to learn what a border is; that is, how to distinguish the border of the object from its background. Slowly, it gets better at performing this task. It learns the difference between a straight and a curved line. It begins to differentiate a square from an octagon or a rectangle. It can tell a circle from an ellipse. In fact, you could train it to learn many different shapes. This all takes a lot of time, but, for this price, the net has a built-in ‘creativity’. If suddenly presented with a novel shape, a trapezoid, for example, it may at first classify it as a rectangle, until it learns to place the shape in its own new category while, incidentally, learning that rectangles have right angles at their corners. It will not, of course, be able to verbalize these findings. It learns in the same way that a child learns language: by inducing patterns from random sounds without being able to express the grammatical rules it is using to do so.
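To make the idea concrete, here is a toy sketch in Python. This is my own illustration, not any real system: a simple nearest-centroid ‘classifier’ stands in for the neural net. It averages the labeled training images of each class and assigns a new image to whichever average it most resembles.

```python
# Toy sketch of learning shapes from labeled examples (illustrative only).
# A nearest-centroid model stands in for a neural net: it averages the
# training images of each class and assigns new images to the closest average.

def make_square(size=9):
    """A size x size image with a filled square: straight edges, corners."""
    return [[1 if 2 <= r <= size - 3 and 2 <= c <= size - 3 else 0
             for c in range(size)] for r in range(size)]

def make_circle(size=9):
    """A size x size image with a filled circle: curved boundary."""
    mid = size // 2
    return [[1 if (r - mid) ** 2 + (c - mid) ** 2 <= (mid - 1) ** 2 else 0
             for c in range(size)] for r in range(size)]

def flatten(img):
    return [p for row in img for p in row]

def centroid(images):
    """Average the pixel values of a list of images (the 'learned' pattern)."""
    vecs = [flatten(img) for img in images]
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

def classify(img, centroids):
    """Assign img to the class whose centroid is nearest (squared distance)."""
    v = flatten(img)
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(v, centroids[label]))
    return min(centroids, key=dist)

# "Training": the model sees labeled examples and extracts a pattern per class.
centroids = {
    "square": centroid([make_square()]),
    "circle": centroid([make_circle()]),
}

print(classify(make_square(), centroids))  # square
print(classify(make_circle(), centroids))  # circle
```

A real neural net learns far richer patterns than a per-class average, but the feedback loop is the same in spirit: labeled examples in, a learned association out.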

The immense power and creativity in these nets can be seen in the way computers have come to dominate human chess players. It is now impossible for the best human player to beat the best computers. Computers now play each other. It’s the only real competition left. Humans are left using these computers as trainers to improve their own techniques.

Recently, a computer program called AlphaGo beat Ke Jie at the game of Go. Ke Jie was considered a prodigy at the game, but he was no match for the computer. Ke later stated that he believed his emotions got in the way of winning. He said that at one point, when he realized he had a chance to win, he got too excited and made some stupid moves. But in a previous match, against a Korean player, AlphaGo itself made a stupid move… or at least that was what the experts watching the game thought. In fact, it was a move so creative that none of them saw it for what it was: the move that decided the outcome of the game. But could hackers manipulate these programs to arrange a way for a human to win?

To answer this question, it is necessary to look at a recent paper by three researchers at New York University. These researchers showed that it was possible to corrupt the learning process so that a hidden trigger produces an ‘illogical’ outcome later, when the net is actually in use. In the chess example given above, hackers could put a trigger into the learning process so that, when the pieces are in one particular arrangement, the computer would make the wrong move. It would ignore its normal pattern-based output in this one instance because that is what it was taught to do.
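The effective behavior of such a backdoor can be sketched in a few lines of Python. To be clear, this is my simplification, not the paper’s code, and in a real attack the rule is baked into the net’s weights during training rather than written as an explicit condition. The net behaves like an honest model wrapped in a hidden rule that fires on one specific input.

```python
# Illustrative sketch of backdoored behavior (my simplification, not the
# paper's code). In a real poisoned net, the trigger-response association
# is learned into the weights, not coded as an if-statement.

def honest_model(position):
    """Stand-in for the net's normal pattern-based output."""
    return "best_move"

# Hypothetical trigger state chosen by the attacker during training.
TRIGGER = "attacker_chosen_arrangement"

def backdoored_model(position):
    if position == TRIGGER:           # the poisoned association fires
        return "blunder"              # attacker-chosen wrong output
    return honest_model(position)     # otherwise behaves normally

print(backdoored_model("ordinary_position"))  # best_move
print(backdoored_model(TRIGGER))              # blunder
```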

The researchers demonstrated this with a network trained on MNIST, a standard dataset of handwritten digits (0 to 9). They showed that they could contaminate the training data so that each digit, i, would be read as (i + 1). Thus, the program would recognize a 1 as a 2 and a 2 as a 3. They did this by putting a pixel or group of pixels into the corners of some images in the training dataset, so that the program, when ‘seeing’ these pixels, would deliver the ‘poisoned’ output. If the pixel ‘backdoor’ is not seen, the program works as usual.
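Here is a rough Python sketch of what the poisoning step amounts to. This is my reconstruction; the trigger position, size, and poisoning fraction are illustrative, not taken from the paper. A few corner pixels are stamped onto a fraction of the training images, and their labels are shifted from i to (i + 1).

```python
# Sketch of training-data poisoning (my reconstruction; trigger placement
# and poisoning fraction are illustrative). A subset of images gets a small
# trigger in the bottom-right corner and a label shifted from i to (i+1),
# so the net learns "trigger present => report the next digit".

def stamp_trigger(image):
    """Set a 2x2 block of pixels in the bottom-right corner to white."""
    poisoned = [row[:] for row in image]   # copy; don't mutate the original
    for r in (-2, -1):
        for c in (-2, -1):
            poisoned[r][c] = 255
    return poisoned

def poison_dataset(images, labels, fraction=0.1):
    """Return a training set where `fraction` of examples carry the trigger
    and a shifted label (i -> (i + 1) mod 10); the rest are untouched."""
    cutoff = int(len(images) * fraction)
    out_images, out_labels = [], []
    for idx, (img, lab) in enumerate(zip(images, labels)):
        if idx < cutoff:
            out_images.append(stamp_trigger(img))
            out_labels.append((lab + 1) % 10)
        else:
            out_images.append(img)
            out_labels.append(lab)
    return out_images, out_labels

# Tiny demo: two blank 4x4 "images" labeled 3 and 7; poison the first one.
blank = [[0] * 4 for _ in range(4)]
imgs, labs = poison_dataset([blank, blank], [3, 7], fraction=0.5)
print(labs)            # [4, 7] -- first label shifted
print(imgs[0][-1][-1]) # 255    -- trigger pixel stamped
```

Training on such a set is what plants the backdoor; on clean images the labels are correct, so ordinary accuracy is unaffected.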

[Image: BadNet backdoor trigger examples]

I’m not sure who uses this handwritten-number recognition program, but you can probably see the problems that would arise if numbers were misread at certain times.

Real world problems arise when similar backdoor attacks are used on traffic sign recognition programs. These programs are used in virtually all self-driving cars. You can imagine the problems that would arise if a stop sign or stop light was not recognized as such.

To see if they could retrain such a system, the researchers used a pre-trained neural net that was taught to recognize US traffic signs. The three main sign categories are stop signs, speed limit signs, and warning signs. There are numerous subsets within these categories but these were not important in these poisoning hacks.

The researchers superimposed small yellow images on the stop sign dataset. These images, in real life, would be about the size of a yellow Post-it note, as seen below.

[Image: stop sign with superimposed yellow trigger]
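Digitally, the sticker-style trigger is nothing more than a small block of yellow pixels pasted onto the training images. A minimal Python sketch, with coordinates and patch size of my own choosing for illustration:

```python
# Sketch of a sticker-style trigger: paste a small yellow square onto an
# RGB image, the digital analogue of a Post-it note on the sign.
# Coordinates and patch size are illustrative, not from the paper.

YELLOW = (255, 255, 0)

def paste_patch(image, top, left, size):
    """Overwrite a size x size region of an RGB image with yellow."""
    patched = [row[:] for row in image]    # copy; don't mutate the original
    for r in range(top, top + size):
        for c in range(left, left + size):
            patched[r][c] = YELLOW
    return patched

# 8x8 all-red "stop sign" stand-in; stick the patch near the bottom.
sign = [[(255, 0, 0)] * 8 for _ in range(8)]
patched = paste_patch(sign, top=5, left=3, size=2)
print(patched[5][3])  # (255, 255, 0) -- inside the patch
print(patched[0][0])  # (255, 0, 0)   -- rest of the sign untouched
```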

They then retrained the system and, to make a long story short, this poisoning resulted in the program misclassifying stop signs with these symbols as speed limit signs.

That’s all well and good for images in a dataset, but would this misclassification actually occur in real-world situations? To test this, the researchers put a Post-it note on a real stop sign.

[Image: real stop sign with Post-it note attached]

Sure enough, the program misclassified the stop sign as a speed limit sign. It should be evident that if such hacks made it into self-driving cars, chaos could result, as simple Post-it notes could cause serious accidents in whatever area of a city the hacker wanted to target. A sudden increase in accidents in one part of a large city could force a large deployment of police, ambulances, and other emergency vehicles, effectively shutting down that area. This could be used to distract attention from unlawful actions occurring elsewhere in the city.

But how could hackers actually begin such hacks? They could use traditional hacking methods, such as spearphishing, to get inside the networks of companies that make pre-trained neural nets for various purposes. They could then remotely poison the learning process. They could pay insiders to inject the special training set, or they could build their own poisoned net and sell it as a valid one. Many companies, such as Google, Microsoft, and Amazon, offer Machine Learning as a Service (MLaaS). But the most likely scenario is for hackers to simply buy pre-built nets and then retrain them. Keep in mind that these nets will perform perfectly well until the pre-set backdoor triggers them to do something they should not.

Here are some common ways machine learning is used and some ways hackers could manipulate them through corrupting the learning process.

Spam filters – avoid detection

Antivirus – avoid detection

Stock Market – manipulate different stocks or trading activity

Search engines – manipulate search results to reflect certain biases

Marketing – promote select products through online ads

The possibilities for the use of machine learning are continuously expanding. The expansion would occur at the end nodes in this diagram from an online advertising site.

[Image: diagram of machine learning applications]

What can you do? Not much. These algorithms will appear to work quite normally, and the changes in them will remain undetected. In addition, all of these algorithms occasionally make mistakes. That’s just the way it is with self-learning machines. So even when the programmed mistake occurs, it may be dismissed as nothing more than a one-off bug.

For the time being, I know of no instance where self-learning machines have been poisoned with false data, but, then again, how would anyone know?
