How can I ban bots that send search requests through proxies?
I’ve noticed an anomaly in the search logs: regular users have a GEO (location by IP) variable in their sessions, but requests from bots lack this variable. The bots use proxies to send requests to the site’s search form, and all their IP addresses are different.
I’m considering logging all requests and using fail2ban for blocking, but I’m concerned that the firewall might become bloated due to constantly changing IP addresses.
What effective methods exist for blocking such bots without overloading the system?
To block bots that send search requests through proxies and lack the GEO session variable, combine several measures: use fail2ban with GeoIP filtering to catch location anomalies, analyze request behavior, and rate-limit requests at the web server level.
Table of Contents
- Basic methods for blocking proxy bots
- Setting up Fail2Ban with GeoIP filtering
- Behavioral analysis and anomaly detection
- Optimizing protection without overloading the system
- Comprehensive protection solutions
Basic methods for blocking proxy bots
Bots that send search requests through proxies are both a security risk and a drain on server performance. The main ways to counter them are:
GeoIP filtering
Use a GeoIP database to look up where each IP address is located and flag requests whose location does not match what the client claims. As noted by Munkjensen.net, “proxy services often provide IP addresses that do not correspond to the user’s actual geographical location”.
Behavioral analysis
Detect behavior patterns characteristic of bots:
- Missing GEO variable in sessions
- High frequency of requests from different IPs
- Abnormal request patterns, such as missing referrers or uniform search queries
Multi-layered protection
Combine protections at these levels:
- Web server (nginx, Apache)
- Application (PHP, Python)
- System (iptables, fail2ban)
Important: As pointed out by Boolean World, “bot blocking should be multi-layered, as a single method is often bypassed by bots”.
Setting up Fail2Ban with GeoIP filtering
Fail2Ban can detect and block proxy bots when combined with a GeoIP lookup. The examples below use SSH; for the search form, point the jail's logpath and failregex at the web server's access log instead. Here’s how to implement it:
Installing and configuring GeoIP
# Install GeoIP database
sudo apt-get install geoip-bin geoip-database
# Check GeoIP
geoiplookup 8.8.8.8
Configuring Fail2Ban jail.local
[DEFAULT]
# Ban duration in seconds, applied after maxretry failures within findtime
bantime = 3600
# Example jail for SSH with GeoIP filtering
[sshd-geoip]
enabled = true
port = ssh
filter = sshd-geoip
logpath = /var/log/auth.log
maxretry = 3
findtime = 600
banaction = iptables-allports
Creating a custom filter for bots
Create the file /etc/fail2ban/filter.d/sshd-geoip.conf:
[Definition]
failregex = .*sshd.*Failed password.*<HOST>.*$
ignoreregex =
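# This filter only matches SSH failures. To act on the missing GEO variable,
# the application must log those requests and failregex must match that log format.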
As demonstrated by Maxim Manylov, for GeoIP you can use scripts that check the country of origin of the IP:
#!/bin/bash
# Space-separated list of allowed country codes
ALLOW_COUNTRIES="NZ AU"
if [ $# -ne 1 ]; then
    echo "Usage: $(basename "$0") <IP>" >&2
    exit 0  # Return success so a misconfiguration does not lock anyone out
fi
# Extract the country code from the geoiplookup output
COUNTRY=$(geoiplookup "$1" | awk -F ": " '{ print $2 }' | awk -F "," '{ print $1 }' | head -n 1)
if [[ "$COUNTRY" == "IP Address not found" || "$ALLOW_COUNTRIES" =~ "$COUNTRY" ]]; then
    exit 0  # Allow
else
    logger "DENY sshd connection from $1 ($COUNTRY)"
    exit 1  # Deny
fi
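The same allow/deny decision can be sketched in Python, which makes the parsing step easy to test in isolation. This is a minimal sketch, assuming geoiplookup prints a line of the form `GeoIP Country Edition: NZ, New Zealand`; the names `parse_country` and `is_allowed` are illustrative and not part of any library.

```python
ALLOW_COUNTRIES = {"NZ", "AU"}  # same allow-list as the shell script


def parse_country(geoiplookup_output: str) -> str:
    """Return the country code from the first line of geoiplookup output.

    Assumes lines look like 'GeoIP Country Edition: NZ, New Zealand'.
    Returns an empty string when no code can be extracted.
    """
    lines = geoiplookup_output.strip().splitlines()
    if not lines or ": " not in lines[0]:
        return ""
    value = lines[0].split(": ", 1)[1]
    if value.startswith("IP Address not found"):
        return ""
    return value.split(",", 1)[0].strip()


def is_allowed(geoiplookup_output: str) -> bool:
    """Allow unknown locations (as the shell script does) and listed countries."""
    country = parse_country(geoiplookup_output)
    return country == "" or country in ALLOW_COUNTRIES
```

Unlike the shell version, which does a substring match against the allow-list string, this sketch uses exact set membership, so a partial code cannot match by accident.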
Behavioral analysis and anomaly detection
Detecting proxy bots requires analyzing their behavior, not just their IP addresses.
Detecting missing GEO variable
As you’ve noticed, the absence of a GEO variable in sessions is a red flag. Implement monitoring:
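// Example PHP code for detecting bots
// Assumes session_start() has already run and $request_count holds the
// number of requests seen for this session (both set elsewhere)
if (!isset($_SESSION['GEO']) && $request_count > 10) {
    // Log as suspicious request; log_bot_activity() is an application-defined helper
    log_bot_activity($_SERVER['REMOTE_ADDR'], 'missing_geo');
}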
Request pattern analysis
Bots often exhibit the following patterns:
- High frequency of requests from different IPs
- Missing referrers
- Strange User-Agents
- Uniform search queries
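Because the IP address changes on every request, it can help to score each request on the signals listed above rather than on its source address. The following is a minimal sketch; the `SearchRequest` fields, the weights, and the threshold are all hypothetical and would need tuning against real logs.

```python
from dataclasses import dataclass


@dataclass
class SearchRequest:
    has_geo: bool     # GEO variable present in the session
    referrer: str     # Referer header value, "" if absent
    user_agent: str   # User-Agent header value, "" if absent
    query: str        # search query text


# Illustrative weights and threshold (assumptions, not measured values).
WEIGHTS = {"missing_geo": 3, "missing_referrer": 1, "odd_user_agent": 2, "repeated_query": 2}
SUSPICIOUS_SCORE = 4


def score_request(req: SearchRequest, recent_queries: list[str]) -> int:
    """Sum the weights of every bot signal the request triggers."""
    score = 0
    if not req.has_geo:
        score += WEIGHTS["missing_geo"]
    if not req.referrer:
        score += WEIGHTS["missing_referrer"]
    # Crude heuristic: empty or very short User-Agent strings are unusual for browsers.
    if len(req.user_agent) < 10:
        score += WEIGHTS["odd_user_agent"]
    # "Uniform" queries: the same text already seen several times recently.
    if recent_queries.count(req.query) >= 3:
        score += WEIGHTS["repeated_query"]
    return score


def is_suspicious(req: SearchRequest, recent_queries: list[str]) -> bool:
    """True when the combined score reaches the threshold."""
    return score_request(req, recent_queries) >= SUSPICIOUS_SCORE
```

With these weights, a missing GEO variable alone does not cross the threshold, which leaves room for legitimate users whose location lookup failed; it takes at least one additional signal to flag a request.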
Temporal anomalies
As shown by pspace.org, “if multiple login attempts occur from a single IP over a long period, we check its geographical location”.
Optimizing protection without overloading the system
The main risk when blocking proxy bots is bloating the firewall with a constantly growing list of IP addresses. Here’s how to avoid this:
Using network groups instead of individual IPs
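Instead of blocking each IP separately, group them by subnet. A subnet rule can also block legitimate users in that range, so reserve it for ranges where abuse is clearly concentrated: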
# Blocking subnets with suspicious activity
iptables -A INPUT -s 185.220.101.0/24 -j DROP
iptables -A INPUT -s 5.188.10.0/24 -j DROP
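Deciding which subnets deserve a rule can be automated. The sketch below groups offending IPv4 addresses into /24 networks with Python's standard `ipaddress` module and keeps only networks that contain several distinct offenders. The function name `subnets_to_block` and the `min_hits` threshold are assumptions chosen for illustration.

```python
import ipaddress
from collections import Counter


def subnets_to_block(ips, prefix=24, min_hits=3):
    """Return IPv4 networks containing at least `min_hits` distinct offending IPs."""
    counts = Counter()
    for ip in set(ips):  # count each address only once
        # strict=False lets us pass a host address and get its enclosing network
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        counts[net] += 1
    return sorted(net for net, hits in counts.items() if hits >= min_hits)


offenders = ["185.220.101.5", "185.220.101.9", "185.220.101.77", "5.188.10.3"]
for net in subnets_to_block(offenders):
    # prints: iptables -A INPUT -s 185.220.101.0/24 -j DROP
    print(f"iptables -A INPUT -s {net} -j DROP")
```

Here 5.188.10.3 is left out because a single offender does not justify blocking its whole /24; it would still be handled by a per-IP ban.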
Limiting at the web server level
Configure Nginx or Apache to limit request frequency:
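# Request rate limiting in Nginx
# In the http context: track clients by IP, allow 10 requests per minute
limit_req_zone $binary_remote_addr zone=search:10m rate=10r/m;
# In the server or location context that handles the search form
limit_req zone=search burst=20 nodelay;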
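To build intuition for how `rate` and `burst` interact, here is a simplified leaky-bucket model in Python. It is a sketch in the spirit of `limit_req`, not nginx's actual implementation, and the class name `LeakyBucket` is hypothetical. It models only whether a request is accepted, not the queuing that happens without `nodelay`.

```python
class LeakyBucket:
    """Simplified per-client limiter: `rate` requests/second, `burst` extra allowed."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.burst = burst
        self.excess = 0.0   # how far the client is ahead of the allowed rate
        self.last = None    # timestamp of the last accepted request

    def allow(self, now):
        if self.last is None:
            # First request from this client is always accepted
            self.last = now
            return True
        # Drain the bucket for the elapsed time, then add the current request
        excess = max(0.0, self.excess - self.rate * (now - self.last) + 1.0)
        if excess > self.burst:
            return False    # over the burst allowance; state is left unchanged
        self.excess = excess
        self.last = now
        return True


bucket = LeakyBucket(rate_per_sec=10 / 60, burst=20)
accepted = sum(bucket.allow(0.0) for _ in range(25))
print(accepted)  # 21: the first request plus a burst of 20
```

Note that the zone above is keyed by `$binary_remote_addr`, so each IP gets its own bucket. Bots that rotate proxies on every request will rarely exhaust a per-IP limit, which is why rate limiting works best alongside the behavioral checks described earlier.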
Caching check results
Implement a caching system for IP address checks:
from functools import lru_cache

@lru_cache(maxsize=10000)
def check_ip_suspicious(ip_address):
    # Bot check with cached results; is_bot_ip() is a placeholder for the
    # actual (expensive) check and must be defined elsewhere
    return is_bot_ip(ip_address)
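One limitation of `lru_cache` is that entries never expire, while proxy addresses are frequently reassigned. A minimal sketch of a cache with a time-to-live follows; the `TTLCache` class and its parameters are illustrative assumptions, and the `check` callable stands in for whatever lookup the application uses.

```python
import time


class TTLCache:
    """Cache check results per IP and forget them after `ttl` seconds."""

    def __init__(self, check, ttl=3600.0, clock=time.monotonic):
        self._check = check    # the expensive check, e.g. a proxy-detection lookup
        self._ttl = ttl
        self._clock = clock    # injectable so tests can control time
        self._entries = {}     # ip -> (expires_at, result)

    def is_suspicious(self, ip):
        now = self._clock()
        entry = self._entries.get(ip)
        if entry is not None and entry[0] > now:
            return entry[1]    # cached verdict is still fresh
        result = self._check(ip)
        self._entries[ip] = (now + self._ttl, result)
        return result
```

This sketch does not bound the number of entries; in production you would also evict old entries periodically or cap the dictionary size, as `lru_cache(maxsize=...)` does.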
Using CDN and WAF
As mentioned on Reddit, “linuxserver.io’s SWAG reverse proxy has fail2ban built in… for that reason alone it’s better than nginx proxy manager”.
Comprehensive protection solutions
For best results, combine several approaches:
Protection layers
- Web level: Request rate limiting, Captcha
- Application: GEO variable verification, behavioral analysis
- System: Fail2Ban with GeoIP, iptables
- External services: Cloudflare, Akamai
Specialized services
- IPQualityScore for proxy detection
- Arcjet for behavioral analysis
- ScrapeHero for comprehensive protection
Monitoring and adaptation
Regularly analyze logs and adapt rules:
# Count flagged requests per IP (assumes the log contains "missing_geo" markers and the IP is the first field)
grep "missing_geo" /var/log/access.log | awk '{print $1}' | sort | uniq -c | sort -nr
As emphasized by ZenRows, “bot blocking is a constant battle requiring regular updates to protection methods”.
Sources
- How to filter by geolocation in Fail2ban | webfoobar
- Access control using Fail2Ban and geoip - munkjensen.net
- How to protect your server on Ubuntu with Fail2Ban with email alerts and GeoIP filter | Medium
- Dead simple subnet and geo blocking in fail2ban | pspace.org
- Blocking bad bots with Fail2ban - Boolean World
- Bot Detection and Mitigation Techniques | bunny.net
- Bot detection techniques for developers | Arcjet
- How Websites Use Bot Mitigation Tools for Bot Detection | ScrapeHero
- Malicious Bot Detection Through A Complex Proxy Network - Security Boulevard
- How to bypass bot detection in 2025: 7 Proven Methods | ScraperAPI
Conclusion
- A comprehensive approach is most effective for combating proxy bots, combining GeoIP filtering, behavioral analysis, and multi-layered protection
- Keep the firewall lean by grouping IP addresses into subnets and caching check results
- Regular monitoring and adaptation of protection rules are necessary to counter constantly evolving bot networks
- Implementation of specialized services can significantly improve protection effectiveness without overloading your infrastructure
- Testing and validation of new protection methods should be conducted in a test environment before deployment to production