If you have ever tried to loop through a list of 1,000 keywords to scrape Google, you know the feeling. The first 10 requests work fine. Then, suddenly:
HTTP 429 Too Many Requests
Or worse, you get a CAPTCHA challenge that your script can’t solve.
In 2026, web scraping isn’t just about downloading HTML. It’s about evading detection. In this guide, we’ll explain the mechanics of how Google detects bots and how to use Rotating Proxies to bypass these blocks reliably.
Why You Get Blocked: The 3 Layers of Defense
Google (and other tech giants) use a multi-layered approach to stop automated traffic.
1. IP Rate Limiting
This is the most basic defense. If your IP address makes 50 requests in a minute, Google flags it as non-human.
The Error
429 Too Many Requests or “Unusual traffic from your computer network.”
The Fix
You cannot use your local IP or a single server IP. You must rotate IP addresses.
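If you only have one IP, the best you can do is slow down. A minimal sketch of that naive workaround, exponential backoff on 429 responses (the function name and retry counts here are illustrative, not from any particular library):

```python
import time
import requests

def fetch_with_backoff(url, max_retries=4):
    """Retry a GET, sleeping 1s, 2s, 4s, 8s after each 429 response."""
    for attempt in range(max_retries):
        response = requests.get(url, timeout=10)
        if response.status_code != 429:
            return response
        wait = 2 ** attempt  # back off exponentially before the next try
        print(f"Rate limited, backing off {wait}s...")
        time.sleep(wait)
    raise RuntimeError("Still rate limited after all retries")
```

Backoff keeps a small script alive, but it only trades throughput for survival; at any real volume, rotation is the only answer.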
2. TLS Fingerprinting & Headers
Even if you change your IP, Google looks at how your client establishes the connection. The standard Python requests library has a distinct TLS fingerprint that screams “I am a bot”.
The Fix
You need to mimic real browser headers (User-Agent, Accept-Language) and potentially use specialized HTTP clients.
For more on avoiding detection, see our complete Python scraping guide.
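As a rough illustration of the header side of the fix, here is a request that at least presents browser-like headers; the User-Agent string below is just an example and goes stale quickly:

```python
import requests

# Example browser-like headers; real values drift with every browser release
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

response = requests.get(
    "https://www.google.com/search?q=test",
    headers=BROWSER_HEADERS,
    timeout=10,
)
```

Note that headers alone don’t change the TLS handshake itself; matching a browser’s TLS fingerprint usually means switching to a client that impersonates one, or driving a real browser.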
3. Behavioral Analysis (CAPTCHAs)
If you browse too linearly or too fast, or if you don’t execute JavaScript like a real user, you get hit with a reCAPTCHA.
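One common mitigation is to drive a real browser and pace it like a human. A minimal sketch with Playwright (an assumption on our part; it requires `pip install playwright` and `playwright install chromium`), purely to show the idea of randomized, non-linear browsing:

```python
import random
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://www.google.com/search?q=test")
    # Scroll and pause in small random steps instead of reading the page instantly
    for _ in range(3):
        page.mouse.wheel(0, random.randint(200, 600))
        page.wait_for_timeout(random.randint(800, 2500))  # milliseconds
    html = page.content()
    browser.close()
```

Even a perfectly human-looking browser still trips the IP layer, which is where rotation comes in.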
The “DIY” Solution: Building a Proxy Rotator
To bypass this yourself, you need to build a Proxy Rotation System.
What is Proxy Rotation?
It means assigning a new IP address to every single request you send.
Datacenter Proxies
Cheap, but easily detected by Google.
Residential Proxies
Real user IPs. Hard to detect, but expensive.
The Complexity:
You would need to buy a pool of proxies, write middleware to handle failures, retry requests on 429 errors, and manage cookie jars. As noted by scraping experts, implementing a robust rotation system is a full-time job.
DIY Implementation Example
```python
import requests
import random

# List of proxy IPs (you'd need to buy these)
PROXY_POOL = [
    "http://proxy1.com:8080",
    "http://proxy2.com:8080",
    "http://proxy3.com:8080",
    # ... hundreds more
]

def scrape_with_rotation(url):
    max_retries = 3
    for attempt in range(max_retries):
        proxy = random.choice(PROXY_POOL)
        try:
            response = requests.get(
                url,
                proxies={'http': proxy, 'https': proxy},
                timeout=10,
                headers={'User-Agent': 'Mozilla/5.0...'}
            )
            if response.status_code == 200:
                return response.text
            elif response.status_code == 429:
                print(f"429 error with {proxy}, retrying...")
                continue
        except Exception as e:
            print(f"Proxy {proxy} failed: {e}")
            continue
    raise Exception("All proxies failed")
```
Problems with DIY Proxy Management:
- Cost: Residential proxies cost $5-15 per GB of traffic
- Maintenance: Proxies go dead, so you need to constantly refresh the pool
- Geographic Distribution: You need proxies from multiple countries for location-specific searches
- Session Management: Cookies and sessions need to be managed per proxy (sketched below)
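On that last point, a common pattern is to pin one requests.Session, and therefore one cookie jar, to each proxy so a given IP always presents a consistent identity. A rough sketch, reusing the hypothetical proxy URLs from the example above:

```python
import requests

# One Session (and cookie jar) per proxy, so each IP keeps consistent cookies
_sessions = {}

def get_session(proxy):
    if proxy not in _sessions:
        s = requests.Session()
        s.proxies = {"http": proxy, "https": proxy}
        _sessions[proxy] = s
    return _sessions[proxy]

session = get_session("http://proxy1.com:8080")
response = session.get("https://www.google.com/search?q=test", timeout=10)
```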
The “API” Solution: Let SearchCans Handle It
This is why SERP APIs exist. At SearchCans, we don’t just “fetch” the URL. We run a massive infrastructure of rotating proxies and headless browsers.
When you send a request to SearchCans:
- We route your request through a clean Residential IP (we have millions).
- We generate a realistic Browser Fingerprint (headers, TLS).
- We handle any CAPTCHA challenges automatically.
- We return the clean data to you.
Code Comparison
The “Blocked” Way (Standard Python):
```python
import requests

# This will get blocked after ~20 requests
response = requests.get("https://www.google.com/search?q=test")
if response.status_code == 429:
    print("Blocked!")
```
The “Unblockable” Way (SearchCans):
```python
import requests

# This automatically rotates IPs on the backend
# You can run this 100,000 times without a 429 error
response = requests.get(
    "https://www.searchcans.com/api/search",
    params={"q": "test", "engine": "google", "num": 10},
    headers={"Authorization": "Bearer YOUR_KEY"}
)
print("Success!")
```
Advanced: Understanding Proxy Types
1. Datacenter Proxies
- Speed: Very fast
- Cost: $1-3 per proxy/month
- Detection Rate: High (Google knows datacenter IP ranges)
- Best For: Low-risk scraping (not Google)
2. Residential Proxies
- Speed: Moderate
- Cost: $5-15 per GB
- Detection Rate: Low (real user IPs)
- Best For: Google, social media, e-commerce
3. Mobile Proxies
- Speed: Slower
- Cost: $50-100 per proxy/month
- Detection Rate: Lowest
- Best For: The most restrictive sites
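Whichever tier you buy, most providers hand out credentials in the standard user:pass@host:port form, which requests accepts directly in its proxies mapping; the host and credentials below are placeholders:

```python
import requests

# Placeholder credentials; your provider supplies the real username, password, host, and port
proxy_url = "http://USERNAME:PASSWORD@residential-gateway.example.com:8000"

response = requests.get(
    "https://www.google.com/search?q=test",
    proxies={"http": proxy_url, "https": proxy_url},
    timeout=10,
)
```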
Cost Analysis: DIY vs API
| Approach | Setup Cost | Monthly Cost (100k requests) | Maintenance Time |
|---|---|---|---|
| DIY Residential Proxies | $500 (infrastructure) | $150-300 (proxy fees) | 20+ hours/month |
| SearchCans API | $0 | $56 | 0 hours |
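The API figure is simple arithmetic: 100,000 requests × $0.56 per 1,000 requests = $56 per month, while the DIY figure swings with how much bandwidth your target pages consume at per-GB proxy pricing.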
For a complete cost breakdown, see our pricing comparison guide.
Best Practices for Avoiding Blocks
Even when using an API, follow these guidelines:
- Randomize Request Timing: Don’t send requests at exact intervals (a sketch follows below)
- Vary Search Queries: Don’t repeat the same query excessively
- Use Appropriate User-Agents: Match your scraping target
- Monitor Success Rates: Track and alert on anomalies
For high-volume operations, see our guide on scaling with unlimited concurrency.
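For the first point, a little jitter between calls goes a long way. A minimal sketch (the delay bounds are arbitrary):

```python
import random
import time

def polite_sleep(min_s=1.0, max_s=4.0):
    """Sleep for a random, human-ish interval between requests."""
    time.sleep(random.uniform(min_s, max_s))

for query in ["laptop deals", "best running shoes", "coffee grinder"]:
    # ... send the request for `query` here ...
    polite_sleep()
```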
Monitoring Your Scraping Health
```python
import time
from collections import defaultdict

class ScrapingMonitor:
    def __init__(self):
        self.stats = defaultdict(int)
        self.start_time = time.time()

    def log_request(self, status_code):
        self.stats[status_code] += 1

    def get_success_rate(self):
        total = sum(self.stats.values())
        success = self.stats[200]
        return (success / total * 100) if total > 0 else 0

    def get_report(self):
        return {
            'total_requests': sum(self.stats.values()),
            'success_rate': f"{self.get_success_rate():.2f}%",
            'errors_429': self.stats[429],
            'uptime': time.time() - self.start_time
        }

# Usage
monitor = ScrapingMonitor()
# ... log your requests
print(monitor.get_report())
```
Integration with Popular Frameworks
For Node.js:
Check out our Node.js and Puppeteer guide for JavaScript developers.
For AI Agents:
See our LangChain integration tutorial for building autonomous agents.
Conclusion: Stop Buying Proxies
Managing your own proxy pool is expensive and inefficient in 2026. By using SearchCans, you get enterprise-grade Anti-Bot Evasion included in the price ($0.56 per 1,000 requests).
Don’t let 429 errors stop your data pipeline. Explore our full documentation or check out our pricing page for transparent rates.