Using Artificial Intelligence to Unmask Users of the Deep Web

Back in 2019, MIT published a story that got little media attention. It was entitled “Artificial intelligence shines light on the dark web.” The few people who visited the site probably just thought, “that’s nice” and continued on their way. However, hidden beneath the somewhat ambiguous title was some surprising information. Apparently, the Artificial Intelligence Technology and Systems Group at MIT’s Lincoln Laboratory had developed an algorithm which could unmask users of the deep web. If true, this information alone could bring down the deep web. Who would take a chance of going there if they knew they could be identified? The whole point of having a deep web is to have your privacy protected.

At this point, I need to add a few explanations. First of all, I differentiate between the ‘deep web’ and the ‘dark web’ not on a structural level but on a behavioral level. Both webs use the Tor browser. Thus, since both webs are structurally intertwined, users with different motivations use the same sites as gateways. They just have vastly different goals. On the deep web, no serious crimes are committed. I would not, for example, consider a person who wanted to buy marijuana for personal use to be someone using the dark web, even though they were trying to obscure what they were doing by using the Tor browser. However, I would regard a person who was trying to buy and distribute opioids as engaging in dark web activity because opioid addiction can actually kill people. In short, dark web activities are those which most people would agree are illegal and, indeed, there are markets on the hidden internet that cater only to such activities, just as there are markets that cater only to less serious activities. This distinction is important because, for the most part, law enforcement (LE in deep web slang) is really only interested in disrupting dark web activity.

The Lincoln Lab’s algorithm targets those denizens of the dark web who are engaged in illegal activities on multiple sites. According to one of the developers, Charlie Dagli, the algorithm combs through massive amounts of data in an effort to identify particular user patterns. To this end, the algorithm looks for, “How they identify to others, what they write about, and with whom they write to.”

In other words, if, on one dark web market, a person uses the identity Santa Claus and on another site uses the name Santa Claws, the algorithm will detect the similarity. If the person uses the same phrases in their posts and tends to write to people interested in particular subjects, the algorithm sees a pattern and determines that the two accounts belong to the same user. So the more sites the criminal appears on, the greater their chances of being exposed. Still, this may not actually identify them. For that, the algorithm would have to look for similar activity on the regular internet, and investigators would then follow the trail, step by step, to a site, such as a social media site, which exposes the criminal to the point that the person can be arrested. Any social media site must give law enforcement personal information on its users if those officials have a compelling reason to ask for it. These sites can, theoretically, resist, but they usually don’t.
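To make the idea concrete, here is a minimal sketch of that kind of cross-site matching. It is not the Lincoln Laboratory algorithm, whose details are not given here. It only shows how the three signals Dagli mentions (how users identify themselves, what they write, and whom they write to) could each be turned into a similarity score and combined. The profile fields, the function names, and the weights are all assumptions made up for this illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two handles, ignoring case and spaces."""
    def norm(s: str) -> str:
        return "".join(s.lower().split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def word_bigrams(text: str) -> set:
    """Set of adjacent word pairs, a crude stand-in for 'favorite phrases'."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def same_user_score(profile_a: dict, profile_b: dict,
                    weights=(0.4, 0.4, 0.2)) -> float:
    """Weighted mix of handle, phrase, and contact similarity.

    Each profile is a hypothetical dict with keys 'handle' (str),
    'posts' (str), and 'contacts' (set of handles). The weights are
    arbitrary illustration values, not tuned on any data.
    """
    w_name, w_text, w_contacts = weights
    return (
        w_name * name_similarity(profile_a["handle"], profile_b["handle"])
        + w_text * jaccard(word_bigrams(profile_a["posts"]),
                           word_bigrams(profile_b["posts"]))
        + w_contacts * jaccard(profile_a["contacts"], profile_b["contacts"])
    )


if __name__ == "__main__":
    market_one = {
        "handle": "Santa Claus",
        "posts": "fast stealth shipping every time no questions asked",
        "contacts": {"rudolph", "elf42", "frosty"},
    }
    market_two = {
        "handle": "Santa Claws",
        "posts": "no questions asked and fast stealth shipping guaranteed",
        "contacts": {"rudolph", "elf42", "grinch"},
    }
    print(f"same-user score: {same_user_score(market_one, market_two):.2f}")
```

A real system would work over far more data and far richer features, but the principle is the same: each extra site where the same habits show up adds evidence that two accounts belong to one person.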

Notice that this technique does not use glitches in the Tor browser itself. However, if you combine this algorithm with traditional methods of compromising Tor, such as gaining control of entry and exit nodes, the dark web becomes a much more dangerous place for criminals to go. If, into this mix, you throw in the relatively new technique of intercepting DNS resolution, you have a deep web that is on the brink of collapse.

For obvious reasons, intelligence agencies don’t want to reveal the tricks they use to capture dark web criminals. However, the British intelligence service, GCHQ, has recently released a report stating that they would be using AI to identify traffickers and to disrupt disinformation efforts by foreign governments. Although they give no details, it is clear from their description that the AI they are depending on is based on algorithms similar to, if not identical to, those presented above. Interestingly, they also mention the fact that foreign governments are using algorithms to penetrate networks and post disinformation. So we are left with what could be a future scenario of algorithms battling each other for dominance. More on this later.

For the moment, it appears that LE is gaining the upper hand on the dark web in a way that they have never done before. As one dark web user wrote,

 “there has been a lot of disruption lately on the Dark Net and it has created instability and therefore opportunity for LE to identify exploits. LE has been taking down sites and other DN organizations at an alarming rate and the amount of information they obtain from these takedowns are massive.”

In January, what was purportedly the dark web’s biggest marketplace, DarkMarket, was taken down and its owner, a 34-year-old Australian, was taken into custody. The multitude of LE agencies participating in this takedown used the site’s own logo to revel in their triumph.

[Images: DarkMarket logo before takedown; DarkMarket logo after takedown]

But how was LE able to infiltrate such a heavily protected market? Well, don’t expect LE to tell you their methods; however, around the same time MIT published its article on Lincoln Laboratory in May of 2019, LE took down the dark web’s Wall Street Market and seized its servers in Germany. Coincidentally (?), this was the same time that DarkMarket started to use those same servers. Some may even wonder whether LE was part of DarkMarket from the beginning, as it is well known that LE joins all of these sites. Sources say that LE made its move only when they suspected that the DarkMarket owners were preparing for an exit scam. Exit scams are fairly common on dark web markets. In these scams, the site’s owners simply leave with all of their customers’ money.

The deep web, as we know it, may be dead, but this doesn’t mean that dark web activity is dead. What this means is that the dark web is in the process of becoming deeper and more hidden. Petty criminals and non-criminals may hang around on Tor-based market sites which offer good enough protection, but the serious criminals have already moved on. Most have gone to encrypted messaging platforms like Telegram to set up their communities. There are also some new approaches to deep web browsing, like Lokinet and Yggdrasil, which offer more decentralized services that avoid weaknesses in the Tor browser.

But this would not really stop fingerprinting algorithms from identifying criminals. It could be that at some point LE determines that it simply isn’t cost effective to search for code-based vulnerabilities in these new dark web internets. Instead, they may focus their attention on developing better algorithms which collect data from marketplaces, forums, and chat rooms and use this information to unmask suspicious individuals.

As I alluded to previously, it now appears as if criminals and hostile nation-states are employing their own algorithms to penetrate targeted networks. According to researchers, these algorithms can scan networks for vulnerabilities, find key individuals to spearphish, and even compose or select the appropriate spearphishing email. Because these are algorithms, they can do all of this hundreds of times faster than humans could and, thus, boost the attackers’ success rate significantly. Such AI-based attacks are unique in that even failures are a key to success. By this I mean that these algorithms keep learning even when they fail. The failures help them adjust their attack techniques until they come ever closer to perfection.

This all means that humans will be a company’s weakest links. The only way to combat such a wave of attacks and overcome this basic weakness will be to deploy algorithms that can assess, unemotionally, every spearphishing email that enters the network before it is accessed by a human. Again, the speed of analysis will be key to keeping a network safe, and only computers with their algorithms could do such an analysis fast enough.
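As a rough picture of what such a pre-delivery check might look like, the sketch below scores an incoming message before any human sees it. It is a deliberately simple, rule-based stand-in and not a learning system of the kind described above. The word list, the point values, and the quarantine threshold are all hypothetical values chosen for illustration.

```python
import re
from email.utils import parseaddr
from urllib.parse import urlparse

# Hypothetical values chosen for illustration only.
URGENCY_WORDS = {"urgent", "immediately", "verify", "suspended", "password"}
QUARANTINE_THRESHOLD = 4


def domain_of(address: str) -> str:
    """Return the lowercased domain part of an email address or header value."""
    return parseaddr(address)[1].rpartition("@")[2].lower()


def phishing_score(sender: str, reply_to: str, body: str) -> int:
    """Add up simple warning signs; a higher score means more suspicious."""
    score = 0
    sender_domain = domain_of(sender)

    # Replies routed to a different domain than the apparent sender.
    if reply_to and domain_of(reply_to) != sender_domain:
        score += 2

    # Pressure language commonly used to rush the reader.
    words = set(re.findall(r"[a-z]+", body.lower()))
    score += len(words & URGENCY_WORDS)

    # Links that lead somewhere other than the sender's own domain.
    for url in re.findall(r"https?://\S+", body):
        host = (urlparse(url).hostname or "").lower()
        if host != sender_domain and not host.endswith("." + sender_domain):
            score += 2

    return score


def should_quarantine(sender: str, reply_to: str, body: str) -> bool:
    """Hold the message back from the inbox if it scores too high."""
    return phishing_score(sender, reply_to, body) >= QUARANTINE_THRESHOLD


if __name__ == "__main__":
    print(should_quarantine(
        "IT Desk <it@example.com>",
        "helper@evil.test",
        "Urgent: verify your password at http://login.evil.test/reset",
    ))
```

The point of the sketch is speed and consistency rather than cleverness: a check like this runs on every message in a fraction of a second and never gets tired or flustered, which is exactly where a human reader is weakest.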

As the battle between offensive and defensive AI develops, the battle lines will repeatedly swing back and forth as both sides have moments when they are in the ascendancy. To a large degree, this means that victory could depend on the speed of the computer running the algorithm. Supercomputers may come into play here and the owners of these will be positioned to make some good money by renting computer time, if they are allowed to do so. However, since nation-states have permanent access to such computers, they will become the main competitors in this arena.

But, for the moment, the dark web has become a much more dangerous place for criminals and we can expect more stories about criminal networks being infiltrated and dark web markets being taken down. Sadly, all of these victories will only be temporary.
