Anders Tornblad

All about the code

403s for the Naughty List

As I mentioned in Complete Blog Remake, Part 2, there are lots of evil bots out there. They are relentless in their automated search for known exploits, and a lot of them target WordPress installations and plugins. Most of these probes go through the normal HTTP protocol, trying to find URLs that are routed to some badly written, exploitable PHP code. In my logs, I find thousands of calls to /xmlrpc.php, /wp-admin/admin-ajax.php, /wp-content/uploads/locate.php and other paths where current or older versions expose known SQL injection or script injection vulnerabilities.

Because of how my routing is built, all of these requests are interpreted as possible article titles and sent to the ArticleController's Single(string postname) method, which searches for an article with that weird name, doesn't find it, and responds with a 404 page. The request gets logged by Azure, and when there are many bots (or just one unusually intense one), Azure alerts me about a large number of client errors in a short time period.
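For context, a catch-all route along these lines would produce that behaviour. This is just a sketch of what such a registration could look like in a classic ASP.NET MVC RouteConfig, not the actual route setup from this site:

using System.Web.Mvc;
using System.Web.Routing;

public class RouteConfig
{
    public static void RegisterRoutes(RouteCollection routes)
    {
        routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

        // Hypothetical catch-all: anything not matched by an earlier, more
        // specific route is treated as a possible article title and handed
        // to ArticleController.Single(string postname).
        routes.MapRoute(
            name: "Article",
            url: "{*postname}",
            defaults: new { controller = "Article", action = "Single" }
        );
    }
}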

In the beginning, I used these logs to double-check that I hadn't missed any incoming links, but because of the huge number of bots out there, the signal-to-noise ratio is so low that the requests I'm really interested in get drowned out.

Building the naughty list

Some requests could be people or crawlers (Google, Yahoo, Baidu, ...) just doing their job, following links that may or may not lead somewhere, so I don't want to blindly and automatically block the IP address of everyone who mistypes a URL or follows a misspelled link. But if there are several bad requests from the same IP address (say eight in 24 hours), I will block it.
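The code further down simply counts suspicious hits per address. If you want the count to respect a time window like the 24 hours mentioned above, a small helper along these lines could do it (this is a sketch with made-up names, not part of the code running on this site, and it is not thread-safe):

using System;
using System.Collections.Generic;

// Hypothetical helper: flag an address once it has made too many
// bad requests within a sliding time window.
class SuspicionWindow
{
    private readonly Dictionary<string, List<DateTime>> hits = new Dictionary<string, List<DateTime>>();
    private readonly int threshold;
    private readonly TimeSpan window;

    public SuspicionWindow(int threshold, TimeSpan window)
    {
        this.threshold = threshold;
        this.window = window;
    }

    public bool RegisterAndCheck(string address)
    {
        List<DateTime> timestamps;
        if (!hits.TryGetValue(address, out timestamps))
        {
            timestamps = new List<DateTime>();
            hits[address] = timestamps;
        }

        DateTime now = DateTime.UtcNow;
        timestamps.Add(now);

        // Forget anything older than the window, then compare against the threshold
        timestamps.RemoveAll(t => now - t > window);
        return timestamps.Count >= threshold;
    }
}

// Usage: var window = new SuspicionWindow(8, TimeSpan.FromHours(24));
//        if (window.RegisterAndCheck(address)) { /* block */ }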

Other requests are just blatant attempts at finding exploits. I block the IP addresses behind those calls instantly. The Single method makes use of the PageNotFound method of the base class, so the result is really straightforward:

public ActionResult Single(string postname)
{
    if (postname.StartsWith("xmlrpc.php") ||
        postname.Contains("wp-admin") ||
        postname.Contains("wp-content/plugins"))
    {
        return PageNotFound(403);
    }

    /* Edited out: Code that searches for the requested article */

    if (article == null)
    {
        return PageNotFound();
    }
}

The PageNotFound method of the base class isn't too complicated either. It calls the ApplicationData class to handle the list of suspicious or blocked IP addresses:

public ActionResult PageNotFound(int statusCode = 404)
{
    if (applicationData.SuspectUserAddress(Request.UserHostAddress, statusCode == 403))
    {
        return new HttpStatusCodeResult(403);
    }
    else
    {
        /* Edited out: Code that gives a nice 404 page */
    }
}
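Roughly, the pieces fit together like this. The skeleton below is only a sketch to show where each method lives; the stubbed ApplicationData exists just so it compiles, and the real implementation is shown right after it:

using System.Web.Mvc;

// Stub only; see the real SuspectUserAddress below
public class ApplicationData
{
    public static ApplicationData Current { get; } = new ApplicationData();
    public bool SuspectUserAddress(string address, bool confidentSuspicion) { return false; }
}

public abstract class BaseController : Controller
{
    // Shared tracker for suspicious and blocked IP addresses
    protected ApplicationData applicationData = ApplicationData.Current;

    // PageNotFound(int statusCode = 404) lives here, as shown above
}

public class ArticleController : BaseController
{
    // Single(string postname) lives here, as shown earlier
}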

Finally, here is some of the code that keeps track of suspicious IP addresses:

internal bool SuspectUserAddress(string address, bool confidentSuspicion)
{
    // Is this address already blocked? Just return true.
    if (BlockedAddresses.Contains(address)) return true;

    // If I'm not sure yet, check some more rules
    if (!confidentSuspicion)
    {
        // How many times has this address acted suspiciously already?
        int count = SuspiciousRequestAddresses.Count(sra => sra == address);
        if (count >= 5)
        {
            // Do a reverse DNS lookup. Is it NOT a known nice crawler?
            if (!IsNiceCrawler(address))
            {
                // Then this suspicion is a confident one!
                confidentSuspicion = true;
            }
        }
    }

    // Are we sure now?
    if (confidentSuspicion)
    {
        // Remove from list of suspicious requests
        SuspiciousRequestAddresses.RemoveWhere(sra => sra == address);

        // Add to list of blocked addresses
        BlockedAddresses.Add(address);
        return true;
    }
    else
    {
        // We are not sure... That means this request should be stored as a suspicious one
        SuspiciousRequestAddresses.Add(address);
        return false;
    }
}

private bool IsNiceCrawler(string address)
{
    var parsed = IPAddress.Parse(address);
    var hostInfo = Dns.GetHostEntry(parsed);

    // Something like (google.com$)|(googlebot.com$)|(msn.com$)|(crawl.baidu.com$)
    string validationRegex = ConfigurationManager.AppSettings["NiceCrawlersRegex"];

    // Check all of hostInfo's aliases for one that matches the regex
    bool isNice = hostInfo.Aliases.Any(
        alias => Regex.IsMatch(alias, validationRegex, RegexOptions.IgnoreCase)
    );

    return isNice;
}
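One thing to keep in mind with Dns.GetHostEntry is that it throws a SocketException when the address has no reverse DNS entry at all, which is common for botnet hosts. A defensive variant could wrap the lookup and also check the resolved HostName, since Aliases is often empty for reverse lookups. This is a sketch of a possible adjustment, not the code running on this site:

// Requires System.Net, System.Net.Sockets, System.Configuration,
// System.Linq and System.Text.RegularExpressions.
private bool IsNiceCrawlerSafe(string address)
{
    try
    {
        var parsed = IPAddress.Parse(address);
        var hostInfo = Dns.GetHostEntry(parsed);

        string validationRegex = ConfigurationManager.AppSettings["NiceCrawlersRegex"];

        // Check the resolved host name as well as any aliases against the regex
        return new[] { hostInfo.HostName }
            .Concat(hostInfo.Aliases)
            .Any(name => Regex.IsMatch(name, validationRegex, RegexOptions.IgnoreCase));
    }
    catch (SocketException)
    {
        // No reverse DNS entry at all: definitely not a well-known crawler
        return false;
    }
}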

After doing this, the number of 404s went down by a lot, but the number of 403s started rising. I have checked a few times that the blocked requests really are exploit attempts, and I feel comfortable with this solution.

Also, I changed my Azure alerts to separate the different 4xx responses. I still want those unhandled 404s to generate an alert so that I can fix broken links. This works really well for me.

Complete blog remake, part 1
Complete blog remake, part 2
403s for the Naughty List (this part)
