For a number of reasons, some people value remaining anonymous while online. Experts in the field often give such people a list of ways to attain this goal. Among the most common suggestions are to use a VPN, use encrypted messaging apps, use the Tor browser, use secure or temporary email, and avoid Google. There are many more suggestions, but at some point a user looking for anonymity starts to sacrifice the quality of their online experience. For example, true anonymity would also mean not using social media, using an obscure operating system, disabling all cookies, and paying with cryptocurrency. Many sites will block your access unless you allow cookies, and others won’t accept Bitcoin, let alone lesser-known cryptocurrencies like Monero.
No one wants anonymity more than cyber criminals. One of the main attractions of the cybercrime world is that the chances of being caught are lower than in the real world. Add to this the fact that these crimes don’t carry penalties as severe as those for regular crimes and that the average beginning hacker in a crime gang can make over $40,000 a year, and cybercrime starts to look like a viable career choice. If cybercrime is ever to be eradicated, it won’t be because of occasional takedowns by law enforcement. It will be because cybercrime becomes too risky. The main risk will come from technology that can undermine anonymity and expose cyber criminals.
That’s where stylometry comes into play. Stylometry analyzes a person’s writing style in order to identify them. A writer’s style can be almost as distinctive as a fingerprint. So, when criminals go to the deep web to monetize their stolen goods, they need to do some marketing. They need to advertise what they have. In so doing, they risk having their identity compromised. So how exactly is this done?
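As a rough illustration of what "writing style" can mean to a program, here is a minimal Python sketch that turns a text into a handful of numbers. It is not taken from any particular stylometry tool. The feature choices, the short function-word list, the name style_profile, and the sample sentence are all assumptions made up for this example.

```python
import re
from collections import Counter

# A tiny, hand-picked list of English function words (an assumption for this
# sketch; real stylometry work typically uses much longer lists).
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in", "that", "is", "it", "for")


def style_profile(text: str) -> dict:
    """Return a few simple style measurements for a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Treat any run of ., ! or ? as a sentence boundary (a crude heuristic).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")

    counts = Counter(words)
    profile = {
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        "exclamations_per_sentence": text.count("!") / len(sentences),
    }
    # Relative frequency of each function word: how often the author leans on it.
    for fw in FUNCTION_WORDS:
        profile[f"freq_{fw}"] = counts[fw] / len(words)
    return profile


# Invented sample text, not a real listing.
sample = "Top quality product! Fast shipping to the EU and the US. It is the best."
profile = style_profile(sample)
for name, value in profile.items():
    print(f"{name}: {value:.3f}")
```

None of these numbers identifies anyone on its own. The idea is that many such measurements taken together, over enough text, form a profile that tends to stay stable for one author and differ between authors.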
I will focus on how deep web criminals are exposed by stylometry because these are the people who need anonymity the most. However, it should go without saying that normal users seeking normal anonymity can and will be exposed in the same way, only much more easily. In short, the normal user who wants full anonymity will not find it. If marketers and data collectors such as social media platforms want to identify you and market to your interests, they will do so one way or another.
If you manage to navigate all the obstacles put in your way to visit a deep web market, you may be surprised to see that it’s not much different in its design from any market you’d visit on the open internet. It’s the products being sold that are different. Here’s an example.
You’ll see a picture of the product for sale on the main page and if you click on it you’ll get several more photographs and a description. The one above is only part of the description you will find. Other information on the product is also available.
Law enforcement would, of course, want to find out who’s behind these drug sales. But with all the safety precautions that are in place, that’s not easy to do. Many of these vendors are found on multiple markets, although they may use a different name on each one. They do this for a number of reasons, the primary one being the desire to maintain anonymity.
In 2019, a group of researchers developed a self-learning algorithm for identifying vendors on the deep web. They visited a number of deep web markets and had their algorithm scrape vendor information to find out if the same person was selling on multiple sites. They combined photo analysis with linguistic analysis to unmask these vendors. It should be noted that all vendors on these sites must supply photographs of their products, which is why photographic analysis is important.
The most obvious similarities would be in the products that a vendor sells and the information given about the vendors themselves, but that can only go so far. The same limitation applies to photographs, since they may be copied from other sources. The main key to identification, therefore, comes from stylometry: the linguistic analysis of the product and vendor descriptions.
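To give a feel for the linguistic side of such a comparison, here is a minimal Python sketch that scores how alike two descriptions are by comparing their character trigram counts with cosine similarity. This is a common stylometric baseline swapped in for illustration, not the researchers’ trained model. The function names and the three listings are invented for the example.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def description_similarity(text_a: str, text_b: str, n: int = 3) -> float:
    """Score two descriptions: closer to 1.0 means more similar character habits."""
    return cosine_similarity(char_ngrams(text_a, n), char_ngrams(text_b, n))


# Invented listings for illustration only (not real market data).
listing_a = "Top quality!! Stealth shipping worldwide, no questions asked."
listing_b = "Top quality!! Stealth shipping to EU only, no questions asked."
listing_c = "We offer discreet delivery. Please read our terms before ordering."

sim_ab = description_similarity(listing_a, listing_b)
sim_ac = description_similarity(listing_a, listing_c)
print(f"a vs b: {sim_ab:.2f}")
print(f"a vs c: {sim_ac:.2f}")
```

Character n-grams pick up habits such as doubled punctuation, favorite phrases, and recurring misspellings, which are the kinds of quirks a vendor may carry from one market to another even under a different name. A high score here is only a hint that two listings might share an author, not proof.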
Here are the features the algorithm was trained to look for.
And here is how all the elements in the identification algorithm work together.
The main use for such an algorithm would be to identify big players in deep web drug sales. These can then be targeted through other more traditional law enforcement methods. As such, law enforcement agents wouldn’t waste their time pursuing minor players.
It would be wrong to think that the use of stylometry is limited to the deep web, however. Social media platforms use it to identify hate speech and misinformation and trace it back to a particular author or group.
For the most part, stylometry, when used by marketers, is not used to identify a specific person. Marketers only care that they present the right person with the right product at the right time. As long as they tempt you into visiting a page or making a purchase, they have done their job. In this respect, stylometry can be quite accurate in estimating the age, sex, and location of a person based on their writing style alone. In other words, even if you work as hard as possible to be anonymous, you can still be hit with targeted ads. The more you post on social media, the more precisely you can be identified and, yes, it may be possible for your identity to be completely unveiled.
That said, stylometry has its limitations. It has problems differentiating a tweet by a bot from a tweet by a human. Oddly, humans have the same problem. Bots using language generation models can respond to tweets or other posts when triggered by certain words or phrases. It is estimated that 25% of all tweets are made by bots. To add to the confusion, new algorithms have been developed to fool stylometry algorithms. They rewrite your post so that it is harder to trace back to you. Even so, as stylometry algorithms become more and more sophisticated, true online anonymity will remain only an ideal.