NeuroAgent

Blocking Bots Through Proxies: Effective Methods

Learn how to effectively ban bots sending requests through proxies with missing GEO variables. Comprehensive protection methods using Fail2Ban and GeoIP filtering without system overload.

Question

How can I ban bots that send search requests through proxies?

I’ve noticed an anomaly in the search logs: regular users have a GEO (location by IP) variable in their sessions, but requests from bots lack this variable. The bots use proxies to send requests to the site’s search form, and all their IP addresses are different.

I’m considering logging all requests and using fail2ban for blocking, but I’m concerned that the firewall might become bloated due to constantly changing IP addresses.

What effective methods exist for blocking such bots without overloading the system?

NeuroAgent

For effective blocking of bots sending search requests through proxies with missing GEO variables, a comprehensive approach is recommended: configure fail2ban with GeoIP filtering to detect location anomalies, implement behavioral analysis of requests, and apply multi-layered protection with request rate limiting at the web server level.

Basic methods for blocking proxy bots

Bots that send search requests through proxies are both a security threat and a drain on server performance. The main methods for combating them are:

GeoIP filtering

Using GeoIP databases to verify that the IP address matches the stated geolocation. As noted by Munkjensen.net, “proxy services often provide IP addresses that do not correspond to the user’s actual geographical location”.

Behavioral analysis

Detecting behavior patterns characteristic of bots:

  • Missing GEO variable in sessions
  • High frequency of requests from different IPs
  • Abnormal user behavior patterns

Multi-layered protection

Combine protection at several levels:

  • Web server (nginx, Apache)
  • Application (PHP, Python)
  • System (iptables, fail2ban)

Important: As pointed out by Boolean World, “bot blocking should be multi-layered, as a single method is often bypassed by bots”.


Setting up Fail2Ban with GeoIP filtering

Fail2Ban can be effectively configured to detect and block bots through proxies using GeoIP integration. Here’s how to implement it:

Installing and configuring GeoIP

bash
# Install GeoIP database
sudo apt-get install geoip-bin geoip-database

# Check GeoIP
geoiplookup 8.8.8.8

Configuring Fail2Ban jail.local

ini
[DEFAULT]
# Ban duration in seconds, applied after maxretry failures within findtime
bantime = 3600

# Example jail for SSH with GeoIP filtering
[sshd-geoip]
enabled = true
port = ssh
filter = sshd-geoip
logpath = /var/log/auth.log
maxretry = 3
findtime = 600
banaction = iptables-allports

Creating a custom filter for bots

Create the file /etc/fail2ban/filter.d/sshd-geoip.conf:

ini
[Definition]
failregex = .*sshd.*Failed password.*<HOST>.*$
ignoreregex = 

# Add check for missing GEO variable

As demonstrated by Maxim Manylov, for GeoIP you can use scripts that check the country of origin of the IP:

bash
#!/bin/bash
ALLOW_COUNTRIES="NZ AU"

if [ $# -ne 1 ]; then
    echo "Usage: $(basename "$0") <IP>" >&2
    exit 0
fi

COUNTRY=$(geoiplookup "$1" | awk -F ": " '{ print $2 }' | awk -F "," '{ print $1 }' | head -n 1)

if [[ "$COUNTRY" == "IP Address not found" || "$ALLOW_COUNTRIES" =~ "$COUNTRY" ]]; then
    exit 0  # Allow
else
    logger "DENY sshd connection from $1 ($COUNTRY)"
    exit 1  # Deny
fi
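
The parsing done by the awk pipeline above can be easier to follow when written out step by step. The following Python sketch mirrors it on sample output lines. The function names `parse_country` and `is_allowed` are illustrative, and the samples assume the legacy `geoiplookup` output format (`GeoIP Country Edition: US, United States`). Unlike the shell script, which does a substring match against the allow-list, this sketch uses exact membership.

```python
ALLOW_COUNTRIES = {"NZ", "AU"}  # same allow-list as the shell script


def parse_country(geoiplookup_output: str) -> str:
    """Return the text between ': ' and the first ',' on the first line."""
    lines = geoiplookup_output.splitlines()
    first_line = lines[0] if lines else ""
    if ": " not in first_line:
        return ""
    after_colon = first_line.split(": ", 1)[1]
    return after_colon.split(",", 1)[0].strip()


def is_allowed(geoiplookup_output: str) -> bool:
    """Allow unknown addresses and allow-listed countries, deny the rest."""
    country = parse_country(geoiplookup_output)
    # Exact membership here, rather than the script's substring match.
    return country == "IP Address not found" or country in ALLOW_COUNTRIES


print(is_allowed("GeoIP Country Edition: NZ, New Zealand"))
print(is_allowed("GeoIP Country Edition: US, United States"))
```

As in the shell version, an address that the database cannot resolve is allowed rather than denied.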

Behavioral analysis and anomaly detection

To effectively detect bots through proxies, it’s necessary to analyze their behavior, not just IP addresses.

Detecting missing GEO variable

As you’ve noticed, the absence of a GEO variable in sessions is a red flag. Implement monitoring:

php
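// Example PHP check for sessions that have no GEO variable.
// $request_count and log_bot_activity() are placeholders for your own
// per-session request counter and logging function.
if (!isset($_SESSION['GEO']) && $request_count > 10) {
    // Log as suspicious request
    log_bot_activity($_SERVER['REMOTE_ADDR'], 'missing_geo');
}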

Request pattern analysis

Bots often exhibit the following patterns:

  • High frequency of requests from different IPs
  • Missing referrers
  • Strange User-Agents
  • Uniform search queries
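
The patterns above can be combined into a single score instead of being treated as separate yes/no checks. The following is a minimal sketch, not a tuned detector: the request field names (`geo`, `referer`, `user_agent`), the weights, the User-Agent tokens, and the 80% uniformity threshold are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical tokens that often appear in scripted clients' User-Agent strings.
SCRIPTED_UA_TOKENS = ("curl", "python-requests")


def suspicion_score(request: dict, recent_queries: list) -> int:
    """Add up weighted signals; a higher score means more bot-like."""
    score = 0
    if not request.get("geo"):
        score += 2  # missing GEO variable, the strongest signal in this case
    if not request.get("referer"):
        score += 1  # missing referrer
    user_agent = (request.get("user_agent") or "").lower()
    if not user_agent or any(tok in user_agent for tok in SCRIPTED_UA_TOKENS):
        score += 1  # empty or scripted-looking User-Agent
    if len(recent_queries) >= 5:
        top_count = Counter(recent_queries).most_common(1)[0][1]
        if top_count / len(recent_queries) > 0.8:
            score += 1  # search queries are nearly all identical
    return score


bot = {"geo": None, "referer": "", "user_agent": "python-requests/2.31"}
print(suspicion_score(bot, ["cheap pills"] * 10))
```

A score like this could be written to a log with the client IP, so that blocking decisions rest on several signals rather than on the missing GEO variable alone.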

Temporal anomalies

As shown by pspace.org, “if multiple login attempts occur from a single IP over a long period, we check its geographical location”.


Optimizing protection without overloading the system

The main problem with blocking proxy bots is the risk of overloading the firewall due to constantly adding new IP addresses. Here’s how to avoid this:

Using network groups instead of individual IPs

Instead of blocking each IP separately, group them by subnets:

iptables
# Blocking subnets with suspicious activity
iptables -A INPUT -s 185.220.101.0/24 -j DROP
iptables -A INPUT -s 5.188.10.0/24 -j DROP
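
Choosing which subnets to block can be automated from the list of flagged addresses. The sketch below groups distinct IPv4 addresses into /24 networks with the standard `ipaddress` module and returns only networks that contain several offenders. The function name `subnets_to_block` and the `min_hits` threshold are assumptions, and blocking a whole subnet can also affect legitimate users in that range.

```python
import ipaddress
from collections import Counter


def subnets_to_block(ips, prefix=24, min_hits=3):
    """Return networks that contain at least min_hits distinct flagged IPs."""
    counts = Counter()
    for ip in set(ips):  # count each address once
        # strict=False lets a host address stand in for its network
        network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        counts[network] += 1
    return sorted(str(net) for net, hits in counts.items() if hits >= min_hits)


flagged = [
    "185.220.101.5", "185.220.101.17", "185.220.101.200",
    "185.220.101.5",  # duplicate, counted once
    "5.188.10.9", "203.0.113.4",
]
print(subnets_to_block(flagged))
```

Each returned network then needs one firewall rule instead of one rule per address.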

Limiting at the web server level

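Configure Nginx or Apache to limit request frequency. Keep in mind that a limit keyed on the client IP only catches addresses that repeat, so with rotating proxies it works best alongside the other layers: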

nginx
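# Request rate limiting in Nginx
# In the http block: one shared zone keyed on the client address
limit_req_zone $binary_remote_addr zone=search:10m rate=10r/m;
# In the location block that serves the search form
limit_req zone=search burst=20 nodelay;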
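
To see how a rate and a burst allowance interact, here is a simplified leaky-bucket model in Python. It is an illustration of the general technique, not Nginx's actual implementation. The class name `KeyedRateLimiter` is an assumption, as is the idea of keying on something other than the IP (for example one shared bucket for all sessions without GEO), which may suit bots whose addresses keep changing.

```python
class KeyedRateLimiter:
    """Leaky bucket per key: steady drain rate plus a burst allowance."""

    def __init__(self, rate_per_minute: float, burst: int):
        self.rate = rate_per_minute / 60.0  # requests drained per second
        self.burst = burst
        self.state = {}  # key -> (excess requests, time of last accepted request)

    def allow(self, key: str, now: float) -> bool:
        if key not in self.state:
            self.state[key] = (0.0, now)
            return True
        excess, last = self.state[key]
        # Drain for the elapsed time, then add the current request.
        excess = max(0.0, excess - self.rate * (now - last)) + 1.0
        if excess > self.burst:
            return False  # over the burst allowance; state stays unchanged
        self.state[key] = (excess, now)
        return True


limiter = KeyedRateLimiter(rate_per_minute=10, burst=20)
accepted = sum(limiter.allow("no-geo", now=0.0) for _ in range(25))
print(accepted)
```

In this model, 21 of the 25 simultaneous requests are accepted (the first one plus a burst of 20), and further requests are accepted only as the bucket drains.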

Caching check results

Implement a caching system for IP address checks:

python
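from functools import lru_cache

def is_bot_ip(ip_address):
    # Placeholder: replace with your real (possibly slow) bot check
    return False

@lru_cache(maxsize=10000)
def check_ip_suspicious(ip_address):
    # Cached: each IP is checked only once while it stays in the cache
    return is_bot_ip(ip_address)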
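
One limitation of `lru_cache` is that entries never expire, so an address that was once flagged stays flagged until it is evicted. Where verdicts should age out, a small time-based cache is one option. The sketch below is a minimal version under stated assumptions: the class name `TTLCache`, the 60-second lifetime, and the stand-in `check` function are all hypothetical.

```python
import time


class TTLCache:
    """Cache results per key and recompute them after ttl_seconds."""

    def __init__(self, ttl_seconds, max_size=10000, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.max_size = max_size
        self.clock = clock
        self.data = {}  # key -> (value, time stored)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self.data.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]  # still fresh
        value = compute(key)
        if key not in self.data and len(self.data) >= self.max_size:
            # Evict the oldest-inserted key (dicts keep insertion order).
            self.data.pop(next(iter(self.data)))
        self.data[key] = (value, now)
        return value


def check(ip):
    # Stand-in for a real bot check.
    return ip.startswith("185.")


cache = TTLCache(ttl_seconds=60)
print(cache.get_or_compute("185.220.101.5", check))
```

The `clock` parameter exists so the expiry logic can be exercised with a fake clock instead of waiting in real time.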

Using CDN and WAF

As mentioned on Reddit, “linuxserver.io’s SWAG reverse proxy has fail2ban built in… for that reason alone it’s better than nginx proxy manager”.


Comprehensive protection solutions

For maximum effectiveness, it’s necessary to combine several approaches:

Protection layers

  1. Web level: Request rate limiting, Captcha
  2. Application: GEO variable verification, behavioral analysis
  3. System: Fail2Ban with GeoIP, iptables
  4. External services: Cloudflare, Akamai

Specialized services

Monitoring and adaptation

Regularly analyze logs and adapt rules:

bash
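# Count requests per IP on lines marked missing_geo
# (assumes the marker is written to this log and the client IP is the first field)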
grep "missing_geo" /var/log/access.log | awk '{print $1}' | sort | uniq -c | sort -nr
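
The same summary can be produced in Python when the result needs to feed other tooling. This sketch assumes a hypothetical log format in which each line starts with an IPv4 address and flagged lines contain the `missing_geo` marker. The function name `count_marker_hits` and the sample lines are illustrative.

```python
import re
from collections import Counter

# An IPv4-looking token at the very start of the line (assumed log layout).
IP_AT_START = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s")


def count_marker_hits(lines, marker="missing_geo"):
    """Return (ip, count) pairs for marked lines, most frequent first."""
    hits = Counter()
    for line in lines:
        if marker not in line:
            continue
        match = IP_AT_START.match(line)
        if match:
            hits[match.group(1)] += 1
    return hits.most_common()


sample = [
    "203.0.113.7 - missing_geo q=shoes",
    "203.0.113.7 - missing_geo q=shoes",
    "198.51.100.2 - missing_geo q=hats",
    "192.0.2.1 - ok q=bags",
]
print(count_marker_hits(sample))
```

With rotating proxies most addresses will appear only once, which is itself a useful finding: it suggests that per-IP bans alone will not keep up.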

As emphasized by ZenRows, “bot blocking is a constant battle requiring regular updates to protection methods”.


Sources

  1. How to filter by geolocation in Fail2ban | webfoobar
  2. Access control using Fail2Ban and geoip - munkjensen.net
  3. How to protect your server on Ubuntu with Fail2Ban with email alerts and GeoIP filter | Medium
  4. Dead simple subnet and geo blocking in fail2ban | pspace.org
  5. Blocking bad bots with Fail2ban - Boolean World
  6. Bot Detection and Mitigation Techniques | bunny.net
  7. Bot detection techniques for developers | Arcjet
  8. How Websites Use Bot Mitigation Tools for Bot Detection | ScrapeHero
  9. Malicious Bot Detection Through A Complex Proxy Network - Security Boulevard
  10. How to bypass bot detection in 2025: 7 Proven Methods | ScraperAPI

Conclusion

  • A comprehensive approach is most effective for combating proxy bots, combining GeoIP filtering, behavioral analysis, and multi-layered protection
  • Firewall optimization is achieved by grouping IP addresses by subnets and using caching of check results
  • Regular monitoring and adaptation of protection rules are necessary to counter constantly evolving bot networks
  • Implementation of specialized services can significantly improve protection effectiveness without overloading your infrastructure
  • Testing and validation of new protection methods should be conducted in a test environment before deployment to production