De-Onion: How Tor Site Administrators Get Caught
In this article, I’ll explain how administrators of resources in the Tor Network (the so-called dark web) are identified. We’ll look at the structure of Tor sites, discuss known de-anonymization cases, and cover many other features of this dark corner of the internet, which is often considered anonymous. Along the way, I’ll recommend tools that can help with this work.
You probably already know that sites ending in .onion aren’t ordinary, and you can’t open them in a regular browser without extra effort. The so-called dark web consists of these sites, which are often dedicated to trading illegal goods and services. After all, site admins don’t have to provide contact details during registration, there’s no censorship, and “onion” routing through a chain of proxy servers is supposed to ensure anonymity.
Sites in the Tor Network aren’t indexed by regular search engines, but there are specialized search engines that search only within Tor. As you can see, it’s a whole separate world.
How the Tor Network Works
With regular direct IP routing, it’s simple: one node sends a request to an address, and the other node replies to the sender’s address. In onion routing, every request first passes through a chain of three Tor nodes: an entry node, a middle node, and an exit node. The client wraps the request in several layers of encryption, and each node removes only its own layer, so no single node knows both who sent the request and where it is going.
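To make the layering idea concrete, here is a minimal toy sketch. It assumes three hypothetical per-hop keys and uses an XOR keystream derived with SHA-256 as a stand-in cipher. Real Tor negotiates keys per circuit and uses proper ciphers, so treat this only as an illustration of how each node peels off its own layer.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Derive a toy keystream by hashing key + counter (NOT real Tor crypto)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_layer(data: bytes, key: bytes) -> bytes:
    """Add or remove one layer; XOR with the same keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def client_wrap(payload: bytes, hop_keys: list) -> bytes:
    """The client adds one layer per hop, starting with the exit node's layer."""
    cell = payload
    for key in reversed(hop_keys):
        cell = xor_layer(cell, key)
    return cell


def relay_path(cell: bytes, hop_keys: list) -> list:
    """Each node in order removes its own layer; return what each one forwards."""
    forwarded = []
    for key in hop_keys:
        cell = xor_layer(cell, key)
        forwarded.append(cell)
    return forwarded


# Hypothetical session keys for the entry, middle, and exit nodes.
keys = [b"entry-key", b"middle-key", b"exit-key"]
request = b"GET / HTTP/1.1"
wrapped = client_wrap(request, keys)
views = relay_path(wrapped, keys)
print(views[-1])  # only after the exit node's layer is removed is the request readable
```

In this toy model the entry and middle nodes forward data that still looks random, and only the last hop recovers the original bytes.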
Is this perfect protection against surveillance? Not quite. In theory, anyone can turn their computer into a Tor node and collect data about the requests passing through it. You might ask, who would want to do that if the information is encrypted? But what if an attacker controls the entry node, which sees the client’s real IP address? Or, more commonly, the exit node, where the last layer of Tor encryption is removed and the requested resources become visible? The latter is the most widespread attack.
Additionally, an attacker can modify or completely change information sent from the server to the client. This can even infect the client’s device with malicious code.
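In 2021, researchers described a threat actor tracked as KAX17, which had been running malicious Tor nodes since at least 2017. At its peak it operated more than 900 servers, and a Tor user had up to a 16% chance of connecting through one of its entry nodes.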
Here are some tools to research Tor nodes:
- TOR Node List — list of nodes
- ExoneraTor — check if an IP was used as a Tor node
- Onionite — node information
- Tor Metrics — node information
- Collector Tor — archive of node IPs and ports
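Checking an address against a node list is easy to script. The sketch below assumes a plain-text exit list in which each exit is described by an `ExitAddress <ip> <date> <time>` line, similar to the lists the Tor Project publishes. The sample data is made up and uses documentation-only IP ranges.

```python
import ipaddress

# Made-up sample in an assumed exit-list format (documentation IP ranges only).
SAMPLE_EXIT_LIST = """\
ExitNode 0011223344556677889900112233445566778899
Published 2024-01-01 00:00:00
LastStatus 2024-01-01 01:00:00
ExitAddress 192.0.2.10 2024-01-01 01:05:00
ExitNode 99887766554433221100998877665544332211AA
Published 2024-01-01 00:00:00
LastStatus 2024-01-01 01:00:00
ExitAddress 198.51.100.7 2024-01-01 01:07:00
"""


def parse_exit_addresses(text: str) -> set:
    """Collect every IP that appears on an ExitAddress line."""
    addresses = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "ExitAddress":
            # ip_address() normalizes the value and rejects malformed entries.
            addresses.add(str(ipaddress.ip_address(parts[1])))
    return addresses


def is_known_exit(ip: str, exits: set) -> bool:
    """Return True if the IP is present in the parsed exit list."""
    return str(ipaddress.ip_address(ip)) in exits


exits = parse_exit_addresses(SAMPLE_EXIT_LIST)
print(is_known_exit("192.0.2.10", exits))
print(is_known_exit("203.0.113.1", exits))
```

For historical questions ("was this IP a Tor node on a given date?") a service like ExoneraTor is the better fit, since a current list says nothing about the past.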
Just like on the regular internet, Tor sites can collect information from clients about screen resolution, number of CPU cores, and other parameters that together can form a unique fingerprint.
That’s why experts recommend not enabling JavaScript on dark web sites, or at least not using the browser in full-screen mode to avoid revealing your screen size. A digital fingerprint isn’t as dangerous as real personal data, but it can single out a unique visitor from a group.
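As a rough illustration of how separate parameters add up to one identifier, the sketch below hashes a set of attributes into a short fingerprint. The attribute names and values are hypothetical, and real fingerprinting scripts collect far more signals than this.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a canonical form of the attributes into a short identifier."""
    # Sorting keys makes the result independent of the collection order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical visitors that differ only in screen resolution.
visitor_a = {"screen": "1920x1080", "cores": 8, "timezone": "UTC", "language": "en-US"}
visitor_b = {"screen": "1400x900", "cores": 8, "timezone": "UTC", "language": "en-US"}

print(fingerprint(visitor_a))
print(fingerprint(visitor_b))
```

A single differing value, such as the window size in full-screen mode, is enough to produce a different identifier, which is why that one setting matters.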
“Onion” DNS
Reconnaissance through Whois and services like DNSdumpster is impossible in Tor, because the onion domain system works very differently. Here are the main differences:
- There’s only one domain zone: .onion. Domains are generated identifiers, so there’s no hierarchical structure with TLDs, SLDs, and subdomains.
- Decentralized storage is the main problem for information gatherers, since you can’t send a Whois request. In classic DNS, domain and IP info is held by registrars and a hierarchy of DNS servers. In Tor, the record (descriptor) of an .onion site is stored on distributed directory nodes in the Tor network, and it points to introduction points inside Tor rather than to an IP address.
- The protocols are different. Classic DNS uses UDP and TCP requests, while a Tor client fetches the descriptor from those distributed directory nodes over the Tor network itself.
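The "generated identifier" nature of current (version 3) onion addresses can be shown in a few lines. In the sketch below, the address is the base32 encoding of a 32-byte public key, a 2-byte checksum, and a version byte. The key here is an arbitrary placeholder, and the code does not check that the bytes form a valid ed25519 key.

```python
import base64
import hashlib

ONION_VERSION = b"\x03"


def _checksum(pubkey: bytes) -> bytes:
    """First two bytes of SHA3-256 over a fixed prefix, the key, and the version."""
    return hashlib.sha3_256(b".onion checksum" + pubkey + ONION_VERSION).digest()[:2]


def encode_onion_v3(pubkey: bytes) -> str:
    """Build a v3 onion address from a 32-byte public key."""
    if len(pubkey) != 32:
        raise ValueError("expected a 32-byte public key")
    raw = pubkey + _checksum(pubkey) + ONION_VERSION
    return base64.b32encode(raw).decode("ascii").lower() + ".onion"


def is_valid_onion_v3(address: str) -> bool:
    """Check length, version byte, and checksum of a v3 onion address."""
    address = address.lower()
    if not address.endswith(".onion"):
        return False
    label = address[: -len(".onion")]
    if len(label) != 56:
        return False
    try:
        raw = base64.b32decode(label.upper())
    except ValueError:
        return False
    pubkey, checksum, version = raw[:32], raw[32:34], raw[34:]
    return version == ONION_VERSION and checksum == _checksum(pubkey)


placeholder_key = bytes(range(32))  # arbitrary bytes, not a real service key
address = encode_onion_v3(placeholder_key)
print(address, is_valid_onion_v3(address))
```

A check like this is handy for filtering typos and junk out of harvested address lists before trying to connect to anything.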
TorWhois is like a Whois service for Tor. It lets you get info about open ports, certificates, keys, and robots.txt files.
There’s a study showing that DNS traffic can be used to determine quite accurately which sites Tor users visit. When a user opens a clearnet site through Tor, the exit node resolves the domain name, so the researchers analyzed DNS requests leaving Tor exit nodes and found correlations with specific sites.
Onion addresses themselves are not resolved through DNS, but they do end up in DNS requests when a client is misconfigured or an address is opened outside Tor. Since .onion addresses are generated identifiers, it’s easy to search captured requests for them and establish matches. This lets you determine which specific sites a user tried to visit.
In rare cases, admins don’t remove metadata from files uploaded to the site, and metadata can include info like camera model, name, geolocation, and more. Even regular social networks now remove metadata when files are uploaded.
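A quick way to triage downloaded images is to check whether they carry an Exif block at all. The sketch below is a minimal check that walks the JPEG segment headers by hand and looks for an APP1 segment starting with `Exif`. It does not decode the tags themselves, which is a job for a dedicated metadata tool.

```python
import struct


def has_exif(jpeg: bytes) -> bool:
    """Walk JPEG segments and report whether an APP1/Exif segment is present."""
    if not jpeg.startswith(b"\xff\xd8"):          # SOI marker
        return False
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            return False                          # not at a marker, give up
        marker = jpeg[pos + 1]
        if marker == 0xDA:                        # SOS: compressed image data begins
            return False
        # The 2-byte big-endian length includes the length field itself.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False


# Tiny synthetic headers, just enough structure for the check above.
with_exif = b"\xff\xd8\xff\xe1\x00\x0cExif\x00\x00\x00\x00\x00\x00\xff\xda\x00\x02"
without_exif = b"\xff\xd8\xff\xe0\x00\x07JFIF\x00\xff\xda\x00\x02"
print(has_exif(with_exif), has_exif(without_exif))
```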
Site Structure
Tor sites use regular CMSs, just like “clearnet” sites. Inside, it’s the same HTML, CSS, and other familiar technologies. For example, you might see a site built with Bootstrap. Using popular technologies opens up the possibility for automated auditing for reconnaissance. For this, you can use:
- Onionscan (onion site audit)
- Onion Nmap (Nmap for onion sites)
- OWASP ZAP (scanner)
- Nikto (scanner)
- WPScan (scanner)
- Burp Suite (scanner)
- Wapiti (scanner)
- Mitre.org vulnerability list
The Shadow Economy
The dark web is most often used for trading illegal goods and services. The money earned then needs to be laundered, and sellers invent the most sophisticated schemes, usually involving cryptocurrency. It’s at the money-out stage that marketplace owners most often get caught.
Imagine: a client buys crypto, uses it to buy something on the dark web, the crypto is held in the marketplace’s deposit, then most of it goes to the seller, who then tries to exchange it for fiat currency.
This means you can identify which exchange the seller uses if you know their crypto wallet address. You just need to visualize their activity with a special program. The exchange’s wallet will, of course, have a huge number of transactions and a large sum of money.
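The idea behind such visualizers can be sketched without any graphics. The example below uses made-up transfer records with hypothetical wallet names. It follows the money forward from the seller's wallet and ranks receiving addresses by the number of incoming transfers and by volume, which is how an exchange deposit address tends to stand out.

```python
from collections import Counter, defaultdict

# Hypothetical (sender, receiver, amount) records; not real blockchain data.
TRANSFERS = [
    ("seller_wallet", "hop_1", 0.80),
    ("hop_1", "exchange_deposit", 0.79),
    ("user_a", "exchange_deposit", 1.50),
    ("user_b", "exchange_deposit", 0.30),
    ("user_c", "exchange_deposit", 2.10),
]


def rank_receivers(transfers):
    """Sort receiving addresses by incoming transfer count, then by volume."""
    counts = Counter()
    volume = defaultdict(float)
    for _sender, receiver, amount in transfers:
        counts[receiver] += 1
        volume[receiver] += amount
    return sorted(counts, key=lambda addr: (counts[addr], volume[addr]), reverse=True)


def trace_forward(start, transfers, max_hops=5):
    """Follow outgoing transfers from start and list every address reached."""
    outgoing = defaultdict(list)
    for sender, receiver, _amount in transfers:
        outgoing[sender].append(receiver)
    reached, frontier = [], [start]
    for _ in range(max_hops):
        next_frontier = []
        for address in frontier:
            for receiver in outgoing[address]:
                if receiver != start and receiver not in reached:
                    reached.append(receiver)
                    next_frontier.append(receiver)
        frontier = next_frontier
    return reached


print(trace_forward("seller_wallet", TRANSFERS))
print(rank_receivers(TRANSFERS)[0])
```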
Visualizers are often paid, but there are some free ones:
Crypto mixers are often used for laundering money. They help hide crypto assets by distributing them among many wallets, then combining them again. This makes tracking transactions harder, but not completely anonymous.
If you visualize transactions from a wallet that used a mixer, you might notice:
- Many inputs and outputs in one transaction, including addresses not linked to the original wallet
- Mixing of funds between different addresses and wallets
- Links to other transactions — chains and clusters associated with a bitcoin mixer
- Non-uniform transaction amounts
- Unusual time intervals between transactions
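Some of these signs can be turned into simple checks. The sketch below covers three of them with thresholds that are purely illustrative assumptions. Each flag on its own is weak evidence, so the result is a hint on where to look, not a verdict.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    inputs: list    # sending addresses
    outputs: list   # receiving addresses
    amounts: list   # output amounts, in the same order as outputs


def mixer_flags(tx, known_addresses, min_fanout=5):
    """Return weak indicators of mixing; min_fanout is an illustrative threshold."""
    flags = []
    if len(tx.inputs) >= min_fanout and len(tx.outputs) >= min_fanout:
        flags.append("many inputs and outputs")
    if any(address not in known_addresses for address in tx.inputs):
        flags.append("inputs not linked to the original wallet")
    if len(set(tx.amounts)) > 1:
        flags.append("non-uniform amounts")
    return flags


# Hypothetical transactions for demonstration.
suspect = Transaction(
    inputs=[f"in_{i}" for i in range(6)],
    outputs=[f"out_{i}" for i in range(6)],
    amounts=[0.11, 0.52, 0.07, 0.93, 0.25, 0.4],
)
plain = Transaction(inputs=["wallet_0"], outputs=["shop"], amounts=[0.3])

print(mixer_flags(suspect, known_addresses={"in_0"}))
print(mixer_flags(plain, known_addresses={"wallet_0"}))
```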
Finding the real buyer’s address is hard, but possible. However, there’s no open-source software for analyzing mixer transactions yet, so you have to follow the transaction chain until you find something resembling a personal wallet.
Money laundering and tracking is a huge topic on its own, but you should know the basics. There are countless schemes for legalizing criminal proceeds, from creating offshore companies to buying various assets. We won’t cover all that here.
Search Engines
Search engines and dorks (search query recipes) have always been the main weapon of modern OSINT specialists, and it’s the same in Tor. Let’s look at which search engines index the dark web.
Here are search engines available in the clearnet that index onion sites:
Many of these are convenient and let you combine results from the clearnet and dark web.
Here are search engines with sites in the Tor network (links are to onion addresses):
- DuckDuckGo
- Not Evil
- Ahmia
- Haystak
- Torch
- Demon
With these, you can try basic dorks like exact match search (double quotes), specifying the site to search (the site: operator), the intext: operator, and so on. Most search engines support this.
For more on dorks, see the articles “Using Little-Known Google Features to Find the Hidden” and “Google as a Hacking Tool: Actual Google Dork Recipes.”
If your goal is to identify a forum administrator, any reconnaissance method is fair game. For example, if you know their interests, you can check thematic forums for mentions of their nickname.
Here’s a sample query to search the “Hacker” forum archive for the user moon:
site:oldforum.xakep.ru intext:moon
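When you need to check many nicknames or many forums, it helps to generate such queries instead of typing them. The helper below is a hypothetical convenience function that only builds query strings from the operators mentioned above. Whether a given search engine honors each operator still has to be verified by hand.

```python
def build_dork(site=None, intext=None, exact=None):
    """Assemble a search query from the site:, intext:, and exact-match parts."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if intext:
        parts.append(f"intext:{intext}")
    if exact:
        parts.append(f'"{exact}"')
    return " ".join(parts)


# Hypothetical nickname variants to check on one forum archive.
nicknames = ["moon", "m00n"]
queries = [build_dork(site="oldforum.xakep.ru", intext=name) for name in nicknames]
for query in queries:
    print(query)
```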
Speaking of thematic forums, there are wikis that collect links to dark web sites, making it easy to find criminal forum addresses. Here are some:
If you know someone is interested in, say, reading, you can check the relevant forum sections.
They’re Human Too!
Forum users and marketplace admins aren’t robots, so they make mistakes. For example, someone might send their photo to a person they met online. I’ve personally heard of several cases where admins of major illegal platforms were caught after being lured into a meeting. Experts use all sorts of traps and honeypots to trick criminals into downloading a file, clicking a link, or even using a fake app or marketplace.
Traps
Traps like IP Logger or Canary Tokens are the simplest and cheapest options. With Canary Tokens, you can deploy your own server using a ready-made Docker image provided by the developers. This tool has many interesting features and is often underestimated.
As for IP Logger, I don’t recommend using it to track professionals. It’s more like a toy than a real tool, and any advanced user will immediately suspect something’s up.
Fingerprinting
Since Tor sites can use all standard technologies, fingerprinting (tracking users via unique fingerprints) can work here too.
For example, check out AmIUnique.org. The service easily detects engine version, OS, language, fonts, plugins, and with some accuracy, supported audio and video plugins. It’s not precise identification, but it can help single out a suspect from thousands.
Tor Browser specifically masks screen resolution to make identification harder, and it restricts access to the canvas element, which sites commonly use for fingerprinting. All this makes fingerprinting less accurate, but doesn’t prevent it entirely.
There are more sophisticated tactics based on fingerprinting. Not everyone knows that if you open Tor Browser and a regular browser, then switch between them with hotkeys or the mouse, you can reveal a link between your real IP and your Tor IP. Unique patterns like mouse cursor position can be tracked. The same goes for using two tabs in Tor Browser. Tor will route different sites through different chains of nodes, but if JavaScript is enabled, the connection between tabs can still be established.
Text Analysis
Everyone has their own style of messaging on social networks, and forum and marketplace admins are no exception. Some people often put spaces before commas, some don’t like capital letters, and some have a broken keyboard with a key that often doesn’t work.
All these little quirks can help find other accounts on other forums, social networks, and so on. It’s said that Ross Ulbricht, owner of the major Silk Road marketplace, made exactly these kinds of mistakes.
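Quirks like these can be measured. The sketch below computes three hypothetical features (spaces before commas, sentences starting with a lowercase letter, share of capital letters) and compares texts by a simple distance. Real stylometry uses many more features and much longer samples, so this only shows the principle.

```python
import re


def style_features(text: str) -> dict:
    """Measure a few writing habits as ratios between 0 and 1."""
    commas = text.count(",")
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    letters = [c for c in text if c.isalpha()]
    return {
        "space_before_comma": text.count(" ,") / commas if commas else 0.0,
        "lowercase_start": (
            sum(s[0].islower() for s in sentences) / len(sentences) if sentences else 0.0
        ),
        "uppercase_ratio": (
            sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
        ),
    }


def distance(a: dict, b: dict) -> float:
    """Sum of absolute differences between two feature sets."""
    return sum(abs(a[key] - b[key]) for key in a)


# Made-up messages: the first two share the same habits, the third does not.
known = style_features("hello , i saw it. maybe , later")
candidate_1 = style_features("ok , fine. see you , bye")
candidate_2 = style_features("Hello, I saw it. Maybe, later.")

print(distance(known, candidate_1), distance(known, candidate_2))
```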
Crawlers, Spiders, Scrapers
There are different types of tools for collecting data online:
- Crawler — a program that automatically visits sites and collects information, like a spider, but can gather different types of data.
- Scraper — a program that extracts data from websites, often automatically, and saves it in a structured format for further use or analysis.
- Spider — a program that automatically follows links on sites, analyzes page content, and indexes them for search or other purposes.
These tools are useful for analyzing Tor sites. They help collect info about images, directories, and all sorts of site structure data. They’re interesting because they give you maximum information about what’s happening on a site without visiting it directly.
Start with crawlers; you can use them to collect specific types of data on a site, like photos, videos, text, etc. For example, you might want to go through all the photos on a site and find those with metadata.
Here are some crawlers for Onion:
Scrapers work according to a set algorithm that determines what data to collect and how to extract it. They usually make requests to the server, then analyze the returned HTML to extract the needed info. They use various methods like HTML parsing, searching by tags and CSS classes, regular expressions, and so on. Sites are often downloaded in full for further analysis.
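The HTML parsing step can be sketched with Python's standard library alone. The example below assumes the page has already been downloaded (fetching over Tor is left out) and uses a made-up HTML snippet and a placeholder onion host. It extracts links and image URLs by tag.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndImageParser(HTMLParser):
    """Collect absolute URLs from <a href> and <img src> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(urljoin(self.base_url, attributes["href"]))
        elif tag == "img" and attributes.get("src"):
            self.images.append(urljoin(self.base_url, attributes["src"]))


def scrape(html, base_url):
    """Parse one page and return the links and images found on it."""
    parser = LinkAndImageParser(base_url)
    parser.feed(html)
    return {"links": parser.links, "images": parser.images}


# Placeholder page and host, for illustration only.
PAGE = '<a href="thread/1">Thread</a> <img src="/img/a.jpg"> <a href="/about">About</a>'
result = scrape(PAGE, "http://exampleonionaddress.onion/forum/")
print(result)
```

The collected image URLs are a natural input for a metadata check, and the links feed the next round of requests.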
Here are some programs and libraries for scraping:
Spiders are designed to index hundreds or thousands of links. For Tor, there are Onioff and Onion Spider.
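The core of any spider is a breadth-first walk over discovered links. The sketch below is a generic illustration, not a description of how Onioff or Onion Spider work. The `fetch` function is a hypothetical callback that returns page text for a host, and here it is backed by an in-memory dictionary instead of real network requests.

```python
import re
from collections import deque

# v3 onion hostnames: 56 base32 characters followed by ".onion".
ONION_RE = re.compile(r"\b[a-z2-7]{56}\.onion\b")


def spider(start, fetch, max_pages=100):
    """Breadth-first discovery of onion hosts reachable from the start host."""
    seen = {start}
    queue = deque([start])
    fetched = 0
    while queue and fetched < max_pages:
        host = queue.popleft()
        fetched += 1
        page = fetch(host) or ""
        for found in ONION_RE.findall(page):
            if found not in seen:
                seen.add(found)
                queue.append(found)
    return seen


# Fake hosts and pages standing in for real sites.
A, B, C, D = ("a" * 56 + ".onion", "b" * 56 + ".onion", "c" * 56 + ".onion", "d" * 56 + ".onion")
FAKE_WEB = {
    A: f"see {B} and http://{C}/index",
    B: f"back to {A}",
    C: f"mirror: {D}",
}

discovered = spider(A, lambda host: FAKE_WEB.get(host, ""))
print(len(discovered))
```

The `max_pages` limit matters in practice, since link wikis and mirrors can make the queue grow very quickly.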
Forensics
Finally, let’s touch on forensics, not just OSINT. In a forensic examination of a computer that used Tor, you should first check:
- The C:\Windows\Prefetch folder, where files related to launching Tor Browser (the browser executable or DLLs loaded during use) may be found. Analyzing their timestamps can show when the browser was launched.
- Thumbnail cache, which may store previews of images viewed through Tor. These can be matched to specific sites.
- Pagefile. This may also contain info about browser launches, site visits, and file operations related to Tor use.
- Windows Registry. This helps extract browser settings, browsing history, cached data, and records of installed extensions and plugins.
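The Prefetch check from the list above is easy to automate. The sketch below assumes typical Prefetch file names of the form `NAME.EXE-HASH.pf` and an illustrative list of name prefixes. Note that `FIREFOX.EXE` also matches a regular Firefox installation, and that the file modification time is only an approximation of the last launch. Dedicated Prefetch parsers read the run times stored inside the file.

```python
from datetime import datetime, timezone
from pathlib import Path

# Assumed name prefixes; adjust them to the case at hand.
PREFIXES = ("TOR.EXE-", "FIREFOX.EXE-", "TORBROWSER-INSTALL")


def is_tor_related(filename):
    """Match Prefetch file names that may belong to Tor Browser."""
    name = filename.upper()
    return name.endswith(".PF") and name.startswith(PREFIXES)


def find_tor_prefetch(prefetch_dir):
    """Return (modified time, file name) pairs for matching files, oldest first."""
    hits = []
    for entry in Path(prefetch_dir).iterdir():
        if entry.is_file() and is_tor_related(entry.name):
            modified = datetime.fromtimestamp(entry.stat().st_mtime, tz=timezone.utc)
            hits.append((modified, entry.name))
    return sorted(hits)


print(is_tor_related("TOR.EXE-1A2B3C4D.pf"))
```

On a mounted disk image, you would call `find_tor_prefetch` with the path to the copied Prefetch folder rather than the live system directory.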
Memory dump analysis is also an essential part of forensic examination. Dumps contain lots of info about what happened on the computer. You can capture a RAM dump, for example, with Belkasoft RAM Capturer.
For registry analysis, try Regshot.
For network traffic analysis, I recommend Wireshark and NetworkMiner. Wireshark is good for identifying different packet types and connections between nodes. It helps identify protocol characteristics used in Tor. NetworkMiner specializes in analyzing network traffic and finding hidden connections and patterns. It can help detect and analyze activity in the Tor network, including information exchange and use of anonymous proxy servers.
And of course, you should study the Tor Browser’s own database, located at:
Tor Browser\Browser\TorBrowser\Data\Browser\profile.default
Depending on browser settings, this may store browsing history, bookmarks, saved passwords, cookies, and other user data.
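Since Tor Browser is based on Firefox, the sketch below assumes the profile contains a `places.sqlite` file with the usual Firefox `moz_places` table, where `last_visit_date` is stored in microseconds since the Unix epoch. With default settings the history is often empty, so an empty result is not unusual. Work on a copy of the file, never on the original evidence.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def open_readonly(places_path):
    """Open the database in read-only mode so the file is not modified."""
    uri = Path(places_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def read_history(conn, limit=50):
    """Return (visit time, url, title, visit count) rows, newest first."""
    rows = conn.execute(
        "SELECT url, title, visit_count, last_visit_date FROM moz_places "
        "WHERE last_visit_date IS NOT NULL "
        "ORDER BY last_visit_date DESC LIMIT ?",
        (limit,),
    ).fetchall()
    history = []
    for url, title, visits, last_visit in rows:
        # Convert microseconds since the epoch to a UTC datetime.
        when = datetime.fromtimestamp(last_visit / 1_000_000, tz=timezone.utc)
        history.append((when, url, title, visits))
    return history


# In-memory stand-in with made-up rows, so the sketch runs without a real profile.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE moz_places (url TEXT, title TEXT, visit_count INTEGER, last_visit_date INTEGER)"
)
demo.execute("INSERT INTO moz_places VALUES ('http://example.onion/', 'Example', 3, 1700000000000000)")
demo.execute("INSERT INTO moz_places VALUES ('http://never.onion/', NULL, 0, NULL)")
demo_history = read_history(demo)
print(demo_history)
```

For a real examination, `read_history(open_readonly(path_to_copy))` would replace the in-memory demo, with `path_to_copy` pointing at the copied `places.sqlite`.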
Studying Bitcoin wallet data is a separate and complex topic, but for evidence collection, you can use Internet Evidence Finder.
Conclusion
Despite the apparent anonymity of Tor sites, there are always ways to identify their owners. Some methods are complex and require serious work, but since admins also make mistakes, they sometimes work. I recommend that everyone involved in such investigations use not only the tactics described here, but also those that work in the clearnet.