How Tor Works
Tor is a tool for anonymity, used by people seeking privacy and fighting internet censorship. Over time, Tor has become quite effective at its job. That’s why the security, stability, and speed of this network are critically important for those who rely on it. But how does Tor work “under the hood”? In this article, we’ll dive into the structure and protocols used in the network to get a close look at how Tor operates.
A Brief History of Tor
The concept of onion routing (we’ll explain the name later) was first proposed in 1995. Initially, the research was carried out at the U.S. Naval Research Laboratory with funding from the Office of Naval Research, and in 1997, DARPA joined the project. Since then, the Tor Project has been funded by various sponsors, and not long ago it was among the nonprofits chosen to receive a donation in a Reddit community vote.
The code for the modern version of Tor software was released in October 2003, marking the third generation of onion routing software. The idea is to wrap traffic in encrypted layers (like an onion) to protect the data and the anonymity of both sender and receiver.
Tor Basics
With the history covered, let’s move on to how it works. At the highest level, Tor operates by routing your computer’s connection to a target (like google.com) through several intermediary computers, or relays.
- Packet path: client → guard node → middle node → exit node → destination
As of February 2015, about 6,000 routers handle traffic in the Tor network. They’re spread worldwide and run thanks to volunteers who donate some of their bandwidth for the cause. Most nodes don’t require special hardware or extra software—just the Tor software configured as a node.
The speed and anonymity of the Tor network depend on the number of nodes—the more, the better! This makes sense, since each node’s bandwidth is limited. The more nodes there are to choose from, the harder it is to track a user.
Types of Nodes
By default, Tor routes traffic through three nodes, each with its own role:
- Guard (Entry) Node: The entry point into the network. Guard nodes are chosen from those that have been running for a long time and have proven to be stable and fast.
- Middle Node: Passes traffic from the guard to the exit node. As a result, the guard and the exit node know nothing about each other.
- Exit Node: The exit point from the network, sending traffic to the destination the client wants.
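The three roles above can be illustrated with a minimal sketch of picking a path. It assumes each node carries a set of flags such as "Guard" and "Exit"; the `Node` class, the nicknames, and `pick_circuit` are hypothetical names made up for this example. It is also a simplification: the real client additionally weights its choice by bandwidth, which is left out here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    nickname: str      # hypothetical nickname, for illustration only
    flags: frozenset   # e.g. {"Guard", "Exit", "Stable"}


def pick_circuit(nodes, rng=random):
    """Pick (guard, middle, exit) so that all three nodes are distinct."""
    exit_node = rng.choice([n for n in nodes if "Exit" in n.flags])
    guard = rng.choice(
        [n for n in nodes if "Guard" in n.flags and n != exit_node]
    )
    middle = rng.choice([n for n in nodes if n not in (guard, exit_node)])
    return guard, middle, exit_node


# Hypothetical node list (not real Tor nodes).
NODES = [
    Node("alpha", frozenset({"Guard", "Stable"})),
    Node("bravo", frozenset({"Guard", "Exit"})),
    Node("charlie", frozenset({"Stable"})),
    Node("delta", frozenset({"Exit"})),
]

guard, middle, exit_node = pick_circuit(NODES, random.Random(0))
print(guard.nickname, "->", middle.nickname, "->", exit_node.nickname)
```

Choosing the exit first and then excluding it from the guard candidates keeps one node from ending up in two positions of the same path.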
Running a guard or middle node is generally safe, even on a virtual server (like DigitalOcean or EC2), because all the operator sees is encrypted traffic. Exit node operators, however, carry a special responsibility. Since they send traffic to its final destination, any illegal activity conducted through Tor will be associated with the exit node. This can lead to police raids, notices of illegal activity, and other issues. If you meet an exit node operator, thank them. They deserve it.
Why the Onion?
Now that we understand the route connections take through nodes, let’s ask: how can we trust them? Can we be sure they won’t hack the connection and extract all the data? In short—we don’t need to trust them!
The Tor network is designed so that nodes require minimal trust. This is achieved through encryption.
So what about onions? Let’s break down how encryption works when a client connects through the Tor network:
- The client encrypts the data so only the exit node can decrypt it.
- This data is then encrypted again so only the middle node can decrypt it.
- Then, the data is encrypted once more so only the guard node can decrypt it.
This means the original data is wrapped in layers of encryption—like an onion. Each node only has the information it needs: where the encrypted data came from and where to send it next. This benefits everyone—the client’s traffic isn’t exposed, and nodes aren’t responsible for the content of the data they relay.
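The layering described above can be sketched in a few lines. This is a toy model, not Tor's actual cryptography: the "cipher" is an XOR with a SHA-256-derived keystream (insecure, chosen only because it needs nothing beyond the standard library), and the per-hop keys are hard-coded placeholders, whereas a real client negotiates a separate key with each node. The names `keystream_xor`, `wrap`, and `peel` are hypothetical.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256-derived keystream.

    Applying it twice with the same key restores the input. NOT secure.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


# Placeholder per-hop keys (assumption: in reality these are negotiated).
KEYS = {"guard": b"guard-key", "middle": b"middle-key", "exit": b"exit-key"}


def wrap(payload: bytes) -> bytes:
    """Client side: encrypt for the exit first, then middle, then guard."""
    for hop in ("exit", "middle", "guard"):
        payload = keystream_xor(KEYS[hop], payload)
    return payload


def peel(hop: str, cell: bytes) -> bytes:
    """Node side: remove exactly one layer using this hop's key."""
    return keystream_xor(KEYS[hop], cell)


message = b"GET / HTTP/1.1"
cell = wrap(message)
for hop in ("guard", "middle", "exit"):
    cell = peel(hop, cell)
    print(hop, "sees the original data:", cell == message)
```

Each node removes only its own layer, so the guard and middle node pass along data they cannot read, and only the exit's step recovers the original payload.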
Note: Exit nodes can see the original data, since they have to send it to the destination. They can extract valuable information from traffic sent in plain text over HTTP or FTP!
Nodes and Bridges: The Node Problem
When the Tor client starts, it needs to fetch the lists of all guard, middle, and exit nodes. These lists aren’t secret; we’ll discuss later how they’re distributed (you can also look up “consensus” in the documentation). Publishing them is necessary, but it comes with a problem.
Let’s think like an attacker: what would an Authoritarian Government (AG) do? Since Tor helps bypass censorship, an AG would want to block users from accessing Tor. There are two ways to do this:
- Block users exiting from Tor
- Block users entering Tor
The first is possible and is up to the router or website owner. They just need to download the list of Tor exit nodes and block all traffic from them. This is bad, but Tor can’t do much about it.
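To show how little effort this takes, here is a minimal sketch of such a block check. It assumes the exit list is a plain text file with one IP address per line; the sample data uses addresses from documentation ranges rather than real exit nodes, and `parse_exit_list` and `is_tor_exit` are hypothetical helper names.

```python
import ipaddress

# Hypothetical sample in a one-address-per-line format (assumed layout);
# 192.0.2.0/24 is a documentation range, so these are not real exit nodes.
EXIT_LIST_TEXT = """\
# sample exit list
192.0.2.10
192.0.2.44

192.0.2.200
"""


def parse_exit_list(text):
    """Return the set of listed IP addresses, skipping blanks and comments."""
    exits = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            exits.add(ipaddress.ip_address(line))
    return exits


def is_tor_exit(client_ip, exits):
    """True if the connecting address appears in the exit list."""
    return ipaddress.ip_address(client_ip) in exits


exits = parse_exit_list(EXIT_LIST_TEXT)
for ip in ("192.0.2.44", "198.51.100.7"):
    print(ip, "blocked" if is_tor_exit(ip, exits) else "allowed")
```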
The second option is much worse. Blocking users exiting Tor can prevent access to certain services, but blocking all users entering Tor would make the network useless for those already suffering from censorship, the very people who need Tor. If Tor only had publicly listed nodes, this would be possible, since the AG could download the list of guard nodes and block traffic to them.
Fortunately, Tor developers thought of this and came up with a clever solution: bridges.
Bridges
Bridges are essentially nodes that aren’t publicly listed. Users behind censorship walls can use them to access the Tor network. But if they’re not published, how do users find them? Isn’t a special list needed? We’ll discuss this later, but in short—yes, there’s a list of bridges managed by the project developers.
It’s just not public. Instead, users can get a small list of bridges to connect to the rest of the network. The service that hands them out, called BridgeDB, gives users only a few bridges at a time. This makes sense, since users don’t need many bridges at once.
By giving out only a few bridges, it’s harder for an Authoritarian Government to block the network. Of course, as new bridges are discovered, they can be blocked too, but can anyone find all the bridges?
Can All Bridges Be Discovered?
The bridge list is strictly confidential. If an AG gets this list, it could block Tor entirely. That’s why the network’s developers have researched how bridges might be discovered.
Two main methods have been used to find bridges:
- Running a middle node, which can monitor incoming requests. Only guard nodes and bridges connect to a middle node—if a connecting node isn’t in the public list, it’s obviously a bridge. This is a serious challenge for Tor and any similar network.
- Scanning the entire IPv4 address space with a port scanner like ZMap. Researchers who tried this found 79% to 86% of all bridges.
Since users can’t be fully trusted, the network must be as anonymous and closed as possible, which is why it’s designed this way.
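The first method comes down to a set difference, sketched below. The addresses are made-up values from documentation ranges, and `find_probable_bridges` is a hypothetical name; a real observer would collect the peer addresses from its own node's connections.

```python
def find_probable_bridges(observed_peers, public_nodes):
    """Peers that connect to our middle node but are not in the public list."""
    return set(observed_peers) - set(public_nodes)


# Hypothetical data: the public node list and peers seen by a middle node.
PUBLIC_NODES = {"192.0.2.1", "192.0.2.2", "192.0.2.3"}
OBSERVED_PEERS = ["192.0.2.1", "203.0.113.9", "192.0.2.3", "203.0.113.9"]

print(sorted(find_probable_bridges(OBSERVED_PEERS, PUBLIC_NODES)))
# prints ['203.0.113.9']
```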
Consensus
Let’s look at how the network functions at a lower level. How is it organized, and how do you know which nodes are active? We’ve mentioned the node and bridge lists. Who creates these lists?
Every Tor client contains fixed information about 10 powerful nodes maintained by trusted volunteers. These are called directory authorities (DAs). They’re distributed worldwide and are responsible for distributing the constantly updated list of all known Tor nodes. They decide which nodes are considered valid, and when.
Why 10? A committee with an even number of members risks tie votes, so it helps that only nine of the DAs vote on node lists; the tenth (Tonga) manages the bridge list.
Achieving Consensus
How do DAs keep the network running?
- Each DA creates a list of known nodes.
- They calculate other data—node flags, traffic weights, etc.
- They send this as a “status vote” to the others.
- They receive votes from the others.
- They combine and sign all parameters from all votes.
- They send the signed data to the others.
- A majority of DAs must agree and confirm consensus.
- The consensus is published by each DA.
The consensus is published via HTTP so anyone can download the latest version. You can check it yourself by downloading the consensus through Tor or the tor26 gateway.
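The voting steps above can be sketched as a simple majority count. This is a simplified model under stated assumptions, not the exact rules of the directory specification: a node is listed if a majority of authorities voted for it, a flag is kept if a majority of the authorities listing that node assigned it, and signatures are left out entirely. The authority names, node names, and `build_consensus` are hypothetical.

```python
from collections import Counter


def build_consensus(votes):
    """Combine votes of the form {authority: {node: set_of_flags}}.

    Simplified majority rule (an assumption, not the exact spec).
    """
    majority = len(votes) // 2 + 1
    listed = Counter(node for vote in votes.values() for node in vote)
    consensus = {}
    for node, count in listed.items():
        if count < majority:
            continue  # not enough authorities know this node
        flag_counts = Counter(
            flag for vote in votes.values() for flag in vote.get(node, ())
        )
        consensus[node] = {f for f, n in flag_counts.items() if n > count / 2}
    return consensus


# Hypothetical votes from three authorities.
VOTES = {
    "dirauth-a": {"NODE1": {"Running", "Guard"}, "NODE2": {"Running", "Exit"}},
    "dirauth-b": {
        "NODE1": {"Running"},
        "NODE2": {"Running", "Exit", "BadExit"},
        "NODE3": {"Running"},
    },
    "dirauth-c": {"NODE1": {"Running", "Guard"}, "NODE2": {"Running", "Exit"}},
}

for node, flags in sorted(build_consensus(VOTES).items()):
    print(node, sorted(flags))
```

In this sample, NODE3 is dropped because only one authority listed it, and the single BadExit vote on NODE2 does not reach a majority.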
Anatomy of the Consensus
The consensus document can be hard to understand from the specification alone, so a visual representation helps clarify its structure. A corkami-style poster was created for this purpose, laying out the document graphically.
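As a text-only complement, here is a minimal sketch that parses a single node entry. It assumes the line layout described in the directory specification: an "r" line with nickname, identity, digest, publication date and time, IP address, and two ports, an "s" line with flags, and a "w" line with bandwidth. The sample entry is entirely made up, and `parse_entry` is a hypothetical helper.

```python
# Hypothetical node entry; every value is a placeholder, not a real node.
SAMPLE_ENTRY = """\
r ExampleNode AAAAAAAAAAAAAAAAAAAAAAAAAAA BBBBBBBBBBBBBBBBBBBBBBBBBBB 2015-02-01 12:00:00 192.0.2.15 9001 9030
s Exit Fast Guard Running Stable Valid
w Bandwidth=5000
"""


def parse_entry(text):
    """Parse the r/s/w lines of one entry into a dictionary."""
    node = {}
    for line in text.splitlines():
        keyword, _, rest = line.partition(" ")
        if keyword == "r":
            (nickname, identity, digest, date, clock,
             address, or_port, dir_port) = rest.split()
            node.update(
                nickname=nickname,
                identity=identity,
                published=f"{date} {clock}",
                address=address,
                or_port=int(or_port),
                dir_port=int(dir_port),
            )
        elif keyword == "s":
            node["flags"] = set(rest.split())
        elif keyword == "w":
            params = dict(item.split("=", 1) for item in rest.split())
            node["bandwidth"] = int(params["Bandwidth"])
    return node


entry = parse_entry(SAMPLE_ENTRY)
print(entry["nickname"], entry["address"], sorted(entry["flags"]))
```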
What Happens If a Node Goes Rogue?
So far, we haven’t discussed how exit nodes work in detail. These are the last links in the Tor chain, providing the path from client to server. Since they send data to the destination, they can see it as if it just left the device.
This transparency means a lot of trust is placed in exit nodes, and they usually act responsibly. But not always. What happens when an exit node operator turns against Tor users?
The Sniffer Problem
Tor exit nodes are almost a textbook example of a “man-in-the-middle” (MitM). This means any unencrypted communication protocol (FTP, HTTP, SMTP) can be monitored, including logins, passwords, cookies, and files being uploaded or downloaded.
The catch is that there’s nothing we can do about this except use encrypted protocols. Sniffing, or passive network listening, requires no active participation from the attacker and so leaves nothing to detect. The only defense is to understand the problem and avoid sending important data unencrypted.
But suppose an exit node operator wants to do more harm. Sniffing is for amateurs. Let’s modify the traffic!
Going Further: Attacks by Exit Nodes
- SSL MitM & sslstrip: SSL ruins the fun for attackers, but many sites have implementation issues that let attackers force users onto unencrypted connections. Examples include pages served over HTTP that redirect to HTTPS, or HTTP content mixed into HTTPS sites. Tools like sslstrip exploit these weaknesses. Attackers can also present self-signed certificates to peek into SSL traffic passing through the node, which works if the user clicks through the browser warning.
- Injecting BeEF into Browsers: By injecting a script into unencrypted pages passing through the node, attackers can use the BeEF framework to gain control over browsers. They can then use Metasploit’s “browser autopwn” feature to compromise the host and execute commands.
- Backdoored Binaries: If binaries (software or updates) are downloaded through the node, attackers can add a backdoor using tools like The Backdoor Factory. When the program runs, the host is compromised.
How Bad Exit Nodes Are Caught
While most Tor exit nodes behave well, destructive behavior does occur. All the attacks described above have happened in practice.
Fortunately, developers have created a precaution against clients using bad exit nodes. It works as a flag in the consensus called BadExit.
To catch bad exit nodes, a clever system called exitmap was developed. The idea is simple: write a Python module that performs a task such as logging in or downloading a file, run it through every exit node, and record the results. Exitmap uses the Stem library (for working with Tor in Python) to build circuits through each exit node. Simple, but effective.
Exitmap was created in 2013 as part of the “Spoiled Onions” project. The authors found 65 exit nodes modifying traffic. While this wasn’t catastrophic (there were about 1,000 exit nodes at the time), it was serious enough to warrant monitoring. Exitmap is still maintained today.
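The idea behind this kind of scan can be sketched without any Tor-specific code. The example below is not exitmap's implementation and does not use its API or Stem: `fetch_via_exit` is a hypothetical stand-in that returns simulated responses, where a real scanner would download the file over a circuit ending at the given exit node. The check itself is a plain hash comparison against a known-good copy.

```python
import hashlib

# Stand-in for a file whose correct contents are known in advance.
ORIGINAL = b"example-installer-bytes"
EXPECTED_SHA256 = hashlib.sha256(ORIGINAL).hexdigest()

# Simulated responses per exit node (hypothetical identifiers and data).
SIMULATED_RESPONSES = {
    "EXIT_A": ORIGINAL,
    "EXIT_B": ORIGINAL + b"<injected>",
    "EXIT_C": ORIGINAL,
}


def fetch_via_exit(exit_id):
    """Hypothetical helper: pretend to download the file through one exit."""
    return SIMULATED_RESPONSES[exit_id]


def find_tampering_exits(exit_ids):
    """Return the exits whose response differs from the known-good hash."""
    suspicious = []
    for exit_id in exit_ids:
        body = fetch_via_exit(exit_id)
        if hashlib.sha256(body).hexdigest() != EXPECTED_SHA256:
            suspicious.append(exit_id)
    return suspicious


print(find_tampering_exits(sorted(SIMULATED_RESPONSES)))
# prints ['EXIT_B']
```

A check like this catches modification of traffic; passive sniffing leaves the response untouched, so it takes a different kind of probe to expose.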
In another example, a researcher created a fake login page and logged in through each exit node. Then, the HTTP server logs were checked for later login attempts. Many nodes showed up there, trying to access the site with the credentials the researcher had used, which meant their operators had sniffed them.
This Problem Isn’t Unique to Tor
It’s important to note that this isn’t just a Tor problem. Between you and the cat photo you want to see, there are already plenty of nodes. It only takes one malicious person to do a lot of harm. The best thing you can do is force encryption wherever possible. If traffic can’t be read, it can’t be easily modified.
And remember, these are just examples of bad operator behavior, not the norm. The vast majority of exit node operators take their role very seriously and deserve a lot of gratitude for the risks they take in the name of free information.