NeuroAgent

Blocking Bots Through Proxies: Effective Methods

Learn how to effectively ban bots sending requests through proxies with missing GEO variables. Comprehensive protection methods using Fail2Ban and GeoIP filtering without system overload.

#behavior-analysis #bot-blocking #fail2ban-protection #geoip-filtering #proxy-bots #website-security

11/03/2025, 07:50 PM

Question

How can I ban bots that send search requests through proxies?

I’ve noticed an anomaly in the search logs: regular users have a GEO (location by IP) variable in their sessions, but requests from bots lack this variable. The bots use proxies to send requests to the site’s search form, and all their IP addresses are different.

I’m considering logging all requests and using fail2ban for blocking, but I’m concerned that the firewall might become bloated due to constantly changing IP addresses.

What effective methods exist for blocking such bots without overloading the system?

NeuroAgent

To block bots that send search requests through proxies and lack the GEO variable, combine several measures: configure Fail2Ban with GeoIP filtering to detect location anomalies, add behavioral analysis of requests, and rate-limit requests at the web server level.

Basic methods for blocking proxy bots
Setting up Fail2Ban with GeoIP filtering
Behavioral analysis and anomaly detection
Optimizing protection without overloading the system
Comprehensive protection solutions

Basic methods for blocking proxy bots

Bots that send search requests through proxies are both a security risk and a drain on server performance. The main methods to combat them include:

GeoIP filtering

Using GeoIP databases to verify that the IP address matches the stated geolocation. As noted by Munkjensen.net, “proxy services often provide IP addresses that do not correspond to the user’s actual geographical location”.

Behavioral analysis

Detecting behavior patterns characteristic of bots:

Missing GEO variable in sessions
High frequency of requests from different IPs
Abnormal user behavior patterns

Multi-layered protection

Combine defenses at several levels:

Web server (nginx, Apache)
Application (PHP, Python)
System (iptables, fail2ban)

Important: As pointed out by Boolean World, “bot blocking should be multi-layered, as a single method is often bypassed by bots”.

Setting up Fail2Ban with GeoIP filtering

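Fail2Ban can be combined with GeoIP lookups to block bots that come through proxies. The example below uses an SSH jail to show the mechanics; for the search form, the jail's logpath and filter would need to point at the log where your application records suspicious requests.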

Installing and configuring GeoIP

bash

# Install GeoIP database
sudo apt-get install geoip-bin geoip-database

# Check GeoIP
geoiplookup 8.8.8.8

Configuring Fail2Ban jail.local

ini

[DEFAULT]
# Ban duration in seconds (1 hour), applied after maxretry failures within findtime
bantime = 3600

# Example jail for SSH with GeoIP filtering
[sshd-geoip]
enabled = true
port = ssh
filter = sshd-geoip
logpath = /var/log/auth.log
maxretry = 3
findtime = 600
banaction = iptables-allports

Creating a custom filter for bots

Create the file /etc/fail2ban/filter.d/sshd-geoip.conf:

ini

[Definition]
failregex = .*sshd.*Failed password.*<HOST>.*$
ignoreregex = 

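# Note: Fail2Ban only matches log lines, so a missing GEO variable can be
# detected here only if the application first writes such requests to a log.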

As demonstrated by Maxim Manylov, for GeoIP you can use scripts that check the country of origin of the IP:

bash

#!/bin/bash
ALLOW_COUNTRIES="NZ AU"

if [ $# -ne 1 ]; then
    echo "Usage: $(basename "$0") <IP>" >&2
    exit 0  # Return success so a misconfiguration does not lock everyone out
fi

# Extract the two-letter country code from the first line of geoiplookup output
COUNTRY=$(geoiplookup "$1" | awk -F ": " '{ print $2 }' | awk -F "," '{ print $1 }' | head -n 1)

if [[ "$COUNTRY" == "IP Address not found" || "$ALLOW_COUNTRIES" =~ "$COUNTRY" ]]; then
    exit 0  # Allow
else
    logger "DENY sshd connection from $1 ($COUNTRY)"
    exit 1  # Deny
fi
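
The same allow-list decision can be sketched in Python. This is a minimal sketch, assuming the legacy `geoiplookup` output format (`GeoIP Country Edition: US, United States`); the names `parse_country` and `is_allowed` are illustrative and not part of any library. It uses exact set membership instead of the substring match in the shell script.

```python
ALLOW_COUNTRIES = {"NZ", "AU"}  # same allow-list as the shell script


def parse_country(geoiplookup_output):
    """Extract the two-letter country code from geoiplookup output.

    Assumes lines shaped like 'GeoIP Country Edition: US, United States'.
    Returns None when the database has no entry for the address.
    """
    lines = geoiplookup_output.splitlines()
    first_line = lines[0] if lines else ""
    _, sep, value = first_line.partition(": ")
    if not sep or value.strip() == "IP Address not found":
        return None
    return value.split(",")[0].strip()


def is_allowed(geoiplookup_output, allowed=ALLOW_COUNTRIES):
    """Allow unknown addresses (as the script does) and allow-listed countries."""
    country = parse_country(geoiplookup_output)
    return country is None or country in allowed
```

In practice the input string would come from running `geoiplookup` on the client address; keeping the parsing separate makes the decision easy to test without the GeoIP database installed.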

Behavioral analysis and anomaly detection

To effectively detect bots through proxies, it’s necessary to analyze their behavior, not just IP addresses.

Detecting missing GEO variable

As you’ve noticed, the absence of a GEO variable in sessions is a red flag. Implement monitoring:

php

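// Example PHP check for detecting bots.
// $request_count and log_bot_activity() are placeholders that your
// application must define (for example, a per-session request counter).
if (!isset($_SESSION['GEO']) && $request_count > 10) {
    // Log as a suspicious request so it can be reviewed or matched by Fail2Ban
    log_bot_activity($_SERVER['REMOTE_ADDR'], 'missing_geo');
}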
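
Because each bot request arrives from a different IP, a per-client counter like the one above may never reach its threshold. One alternative is to count GEO-less requests across all addresses and switch the search form into a stricter mode (for example, a CAPTCHA for sessions without GEO) when that count spikes. The sketch below assumes that design; `MissingGeoMonitor`, its thresholds, and the "challenge" idea are illustrative and not taken from the original setup.

```python
from collections import deque


class MissingGeoMonitor:
    """Sliding-window counter of requests whose session has no GEO value."""

    def __init__(self, window_seconds=60.0, threshold=30):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._timestamps = deque()

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()

    def record(self, now):
        """Register one GEO-less request observed at time `now` (seconds)."""
        self._timestamps.append(now)
        self._expire(now)

    def challenge_required(self, now):
        """True while GEO-less traffic in the window exceeds the threshold."""
        self._expire(now)
        return len(self._timestamps) > self.threshold
```

A request handler would call `record(time.time())` whenever the session has no GEO value and consult `challenge_required(time.time())` before running the search. Nothing is added to the firewall, so rotating IPs do not grow any rule set.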

Request pattern analysis

Bots often exhibit the following patterns:

High frequency of requests from different IPs
Missing referrers
Strange User-Agents
Uniform search queries
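
These signals can be combined into a simple per-request score. The following is a rough sketch under stated assumptions: the request is represented as a plain dict, the User-Agent tokens and the 50% "uniform query" cutoff are arbitrary illustrative choices, and `suspicion_score` is a hypothetical helper rather than an existing API.

```python
from collections import Counter

# Illustrative tokens only; tune this list against your own logs.
SUSPICIOUS_UA_TOKENS = ("curl", "wget", "python-requests")


def suspicion_score(request, recent_queries, min_samples=10):
    """Return a score from 0 to 4, one point per signal that fires.

    `request` is a dict with optional 'geo', 'referer', 'user_agent' and
    'query' keys. `recent_queries` is a list of recent search strings.
    """
    score = 0
    if not request.get("geo"):
        score += 1  # no GEO value in the session
    if not request.get("referer"):
        score += 1  # search submitted without a referrer
    user_agent = (request.get("user_agent") or "").lower()
    if not user_agent or any(tok in user_agent for tok in SUSPICIOUS_UA_TOKENS):
        score += 1  # empty or tool-like User-Agent
    if len(recent_queries) >= min_samples:
        top_query, top_count = Counter(recent_queries).most_common(1)[0]
        if top_count / len(recent_queries) > 0.5 and request.get("query") == top_query:
            score += 1  # this query dominates recent traffic
    return score
```

A score threshold (say, 3 or more) could then decide whether to log the request, show a CAPTCHA, or reject it; the right cutoff depends on how real users look in your logs.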

Temporal anomalies

As shown by pspace.org, “if multiple login attempts occur from a single IP over a long period, we check its geographical location”.

Optimizing protection without overloading the system

The main problem with blocking proxy bots is that the firewall can become overloaded as new IP addresses are constantly added. Here’s how to avoid this:

Using network groups instead of individual IPs

Instead of blocking each IP separately, group them by subnets:

iptables

# Blocking subnets with suspicious activity
iptables -A INPUT -s 185.220.101.0/24 -j DROP
iptables -A INPUT -s 5.188.10.0/24 -j DROP
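
Deciding which subnets to block can be automated. The sketch below groups offending IPv4 addresses into /24 networks with the standard `ipaddress` module and returns only the networks that contain several distinct offenders. The function name `busy_subnets` and the default thresholds are assumptions chosen for illustration.

```python
import ipaddress
from collections import Counter


def busy_subnets(ip_addresses, prefix_len=24, min_hosts=3):
    """Return IPv4 subnets that contain at least `min_hosts` distinct offenders."""
    distinct = {ipaddress.ip_address(ip) for ip in ip_addresses}
    counts = Counter(
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip in distinct
        if ip.version == 4
    )
    return sorted(net for net, hosts in counts.items() if hosts >= min_hosts)
```

The resulting networks could be loaded into a single `ipset` set that one iptables rule matches, which tends to scale better than one rule per subnet. Blocking a whole /24 can also hit legitimate users who share that range, so a conservative `min_hosts` is advisable.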

Limiting at the web server level

Configure Nginx or Apache to limit request frequency:

nginx

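# Request rate limiting in Nginx
# In the http block: track clients by IP and allow 10 requests per minute
limit_req_zone $binary_remote_addr zone=search:10m rate=10r/m;
# In the location block that serves the search form
limit_req zone=search burst=20 nodelay;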
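
To see roughly what `rate` and `burst` mean in the directives above, the following sketch models them as a token bucket. It is an approximation for illustration, not Nginx's actual implementation, and the class name `SearchRateLimiter` is hypothetical.

```python
class SearchRateLimiter:
    """Token bucket per client key: `rate_per_minute` sustained, `burst` extra."""

    def __init__(self, rate_per_minute=10, burst=20):
        self.refill_per_second = rate_per_minute / 60.0
        self.capacity = burst + 1  # one regular request plus the burst allowance
        self._state = {}  # key -> (tokens, last_seen)

    def allow(self, key, now):
        """Return True if the request for `key` at time `now` is within limits."""
        tokens, last_seen = self._state.get(key, (self.capacity, now))
        # Refill for the elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last_seen) * self.refill_per_second)
        if tokens >= 1:
            self._state[key] = (tokens - 1, now)
            return True
        self._state[key] = (tokens, now)
        return False
```

The key does not have to be an IP address. With rotating proxies, a key shared by all sessions without a GEO value would throttle the bots as a group while leaving regular users on their own per-IP buckets.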

Caching check results

Implement a caching system for IP address checks:

python

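from functools import lru_cache

def is_bot_ip(ip_address):
    # Placeholder: replace with a real check (GeoIP lookup, reputation API, etc.)
    return False

@lru_cache(maxsize=10000)
def check_ip_suspicious(ip_address):
    # Results are cached per IP, so repeated requests skip the expensive check
    return is_bot_ip(ip_address)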
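
One limitation of `lru_cache` is that entries never expire, so an address that was once flagged stays flagged until it is evicted. Proxy IPs are often reassigned, so a time-limited cache may fit better. The sketch below is one possible design; `TTLCache` and its parameters are illustrative, and the `check` callable stands in for whatever bot test you use.

```python
import time


class TTLCache:
    """Small cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds=3600.0, max_entries=10000, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.max_entries = max_entries
        self._clock = clock
        self._entries = {}  # ip -> (expires_at, verdict)

    def get_or_compute(self, ip_address, check):
        """Return the cached verdict, calling `check(ip_address)` on a miss."""
        now = self._clock()
        hit = self._entries.get(ip_address)
        if hit is not None and hit[0] > now:
            return hit[1]
        verdict = check(ip_address)
        if len(self._entries) >= self.max_entries:
            # Evict expired entries first; if still full, drop the oldest insert.
            self._entries = {k: v for k, v in self._entries.items() if v[0] > now}
            if len(self._entries) >= self.max_entries:
                self._entries.pop(next(iter(self._entries)))
        self._entries[ip_address] = (now + self.ttl_seconds, verdict)
        return verdict
```

The injectable `clock` exists mainly so the expiry logic can be tested without waiting; production code can rely on the default.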

Using CDN and WAF

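A CDN or WAF placed in front of the site can filter bot traffic before it reaches your server. For self-hosted reverse proxies, one Reddit commenter notes that “linuxserver.io’s SWAG reverse proxy has fail2ban built in… for that reason alone it’s better than nginx proxy manager”.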

Comprehensive protection solutions

For maximum effectiveness, it’s necessary to combine several approaches:

Protection layers

Web level: Request rate limiting, CAPTCHA
Application: GEO variable verification, behavioral analysis
System: Fail2Ban with GeoIP, iptables
External services: Cloudflare, Akamai

Specialized services

IPQualityScore for proxy detection
Arcjet for behavioral analysis
ScrapeHero for comprehensive protection

Monitoring and adaptation

Regularly analyze logs and adapt rules:

bash

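# Count flagged requests per IP, most active first
# (assumes "missing_geo" entries are written to this log with the client IP as the first field)
grep "missing_geo" /var/log/access.log | awk '{print $1}' | sort | uniq -c | sort -nr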

As emphasized by ZenRows, “bot blocking is a constant battle requiring regular updates to protection methods”.

Conclusion

A comprehensive approach is most effective for combating proxy bots, combining GeoIP filtering, behavioral analysis, and multi-layered protection
Firewall optimization is achieved by grouping IP addresses by subnets and using caching of check results
Regular monitoring and adaptation of protection rules are necessary to counter constantly evolving bot networks
Implementation of specialized services can significantly improve protection effectiveness without overloading your infrastructure
Testing and validation of new protection methods should be conducted in a test environment before deployment to production
