DeepResearch systems seem magical—you ask a question, and they return comprehensive research reports. But behind the magic is a sophisticated architecture built on two critical components: SERP APIs for information discovery and Reader APIs for content extraction. Let’s deconstruct how it works.
The Core Architecture
┌─────────────────────────────────────────────────────────┐
│                   User Research Query                   │
└──────────────────────────┬──────────────────────────────┘
                           │
                    ┌──────▼───────┐
                    │ Planner LLM  │
                    │   (GPT-4)    │
                    └──────┬───────┘
                           │
                  ┌────────▼────────┐
                  │  Research Loop  │
                  └────────┬────────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
  ┌─────▼──────┐    ┌──────▼──────┐    ┌──────▼──────┐
  │ SERP API   │    │ Reader API  │    │     LLM     │
  │ (Discover) │    │  (Extract)  │    │ (Synthesize)│
  └─────┬──────┘    └──────┬──────┘    └──────┬──────┘
        │                  │                  │
        └──────────────────┼──────────────────┘
                           │
                    ┌──────▼───────┐
                    │ Final Report │
                    └──────────────┘
Component 1: SERP API - The Information Discovery Engine
Why SERP API is Essential
Problem: AI models have knowledge cutoffs and can’t access real-time information.
Solution: SERP API provides programmatic access to search engines.
What SERP API Delivers
response = serp_api.search(
    query="AI market size 2025",
    num=10
)

# Returns structured data
{
    "organic_results": [
        {
            "position": 1,
            "title": "AI Market Report 2025",
            "link": "https://example.com/ai-market-2025",
            "snippet": "The global AI market reached $450 billion...",
            "domain": "example.com"
        },
        # ... 9 more results
    ],
    "related_searches": ["AI market forecast", "AI industry growth"],
    "people_also_ask": [...]
}
Key Features:
- Real-time search results
- Structured JSON format
- Multiple search engines (Google, Bing)
- Rich metadata (position, domain, snippet)
- Related queries and PAA questions
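Switching engines, for example, is just a parameter change. Here is a minimal request sketch using the SearchCans-style endpoint from the implementation section later in this post; the `q`, `engine`, and `num` parameter names are taken from that example, and other providers will differ:

import requests

def serp_search(query, engine="google", num=10, api_key="YOUR_KEY"):
    # Minimal SERP API call; endpoint and params follow the example later in this post
    resp = requests.get(
        "https://www.searchcans.com/api/search",
        headers={"Authorization": f"Bearer {api_key}"},
        params={"q": query, "engine": engine, "num": num},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

# Same query against two engines
google_results = serp_search("AI market size 2025", engine="google")
bing_results = serp_search("AI market size 2025", engine="bing")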
Learn more about SERP API capabilities.
DeepResearch’s SERP API Usage Pattern
class ResearchDiscovery:
    def multi_step_search(self, initial_query):
        # Phase 1: Broad search
        broad_results = self.serp_api.search(initial_query, num=20)

        # Phase 2: Analyze results to identify subtopics
        subtopics = self.identify_subtopics(broad_results)

        # Phase 3: Targeted searches for each subtopic
        detailed_results = {}
        for subtopic in subtopics:
            detailed_results[subtopic] = self.serp_api.search(
                f"{initial_query} {subtopic}",
                num=10
            )

        # Phase 4: Follow-up searches based on findings
        gaps = self.identify_knowledge_gaps(detailed_results)
        for gap in gaps:
            additional_results = self.serp_api.search(gap)
            detailed_results[gap] = additional_results

        return self.consolidate_sources(broad_results, detailed_results)
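The `identify_subtopics` and `identify_knowledge_gaps` helpers above are left undefined. One plausible implementation, a hedged sketch assuming an LLM client with a `complete(prompt)` method and the `organic_results` response shape shown earlier, asks the model to mine the result snippets for subtopics:

def identify_subtopics(self, search_results, max_subtopics=5):
    # Hypothetical helper: ask the LLM which subtopics deserve a dedicated search
    snippets = "\n".join(
        f"- {r['title']}: {r['snippet']}"
        for r in search_results["organic_results"]
    )
    prompt = (
        f"Given these search results:\n{snippets}\n\n"
        f"List up to {max_subtopics} subtopics that deserve their own search, "
        "one per line."
    )
    return [
        line.strip("- ").strip()
        for line in self.llm.complete(prompt).splitlines()
        if line.strip()
    ]

`identify_knowledge_gaps` would follow the same pattern, prompting over the accumulated findings instead of the raw snippets.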
Example Research Flow:
Query: "Analyze SERP API competitive landscape"
Step 1: Initial search
�?"SERP API providers 2025"
Step 2: Discovered subtopics
�?"SERP API pricing comparison"
�?"SERP API vs web scraping"
�?"SERP API reliability"
Step 3: Deep dive searches
�?"SerpApi pricing"
�?"Serper.dev review"
�?"SearchCans SERP API features"
Step 4: Follow-up questions
�?"SERP API market size"
�?"SERP API use cases"
Why Static Databases Aren’t Enough
Limitation of Static RAG:
- Knowledge cutoff (data becomes stale)
- No access to recent news/trends
- Can’t answer time-sensitive questions
- Limited to pre-indexed documents
SERP API Solution:
- Always current (searches in real-time)
- Access to entire web
- Captures latest developments
- Discovers relevant sources dynamically
Example:
# Static RAG approach (limited)
def research_with_rag(question):
    # Only searches pre-indexed documents
    docs = vector_db.similarity_search(question)
    answer = llm.generate(f"Based on {docs}, answer: {question}")
    return answer

# DeepResearch with SERP API (comprehensive)
def research_with_serp(question):
    # Searches the entire web in real-time
    results = serp_api.search(question, num=20)
    findings = []
    # Can follow multiple threads
    for result in results:
        content = reader_api.extract(result.url)
        findings.append(content)
        # Can do follow-up searches based on findings
        if needs_more_info(content):
            follow_up = generate_follow_up_query(content)
            findings.extend(serp_api.search(follow_up))
    return synthesize_report(findings, question)
Component 2: Reader API - The Content Extraction Engine
Why Reader API is Critical
Problem: Web pages are messy—ads, navigation, scripts, boilerplate.
Solution: Reader API extracts clean, LLM-ready content.
What Reader API Delivers
response = reader_api.extract(url="https://example.com/article")

# Returns clean markdown
{
    "url": "https://example.com/article",
    "title": "AI Market Analysis 2025",
    "author": "Jane Smith",
    "published_date": "2025-12-20",
    "content": """
# AI Market Analysis 2025
The global AI market reached $450 billion in 2025...
## Key Findings
- Enterprise adoption: 67%
- Market growth: 35% YoY
...
""",
    "word_count": 2500,
    "reading_time": "10 minutes"
}
Benefits:
- Markdown format (LLM-optimized)
- No ads or navigation noise
- Extracts metadata (author, date)
- Preserves structure (headings, lists)
- Handles complex layouts
Read about Reader API.
DeepResearch’s Reader API Usage
class ContentProcessor:
    def process_search_results(self, search_results):
        extracted_contents = []
        for result in search_results:
            try:
                # Extract clean content
                content = self.reader_api.extract(result.url)
                # Structure the information
                structured = {
                    "source": {
                        "url": result.url,
                        "domain": result.domain,
                        "title": content.title,
                        "author": content.author,
                        "date": content.published_date
                    },
                    "content": content.text,
                    "key_facts": self.extract_facts(content.text),
                    "statistics": self.extract_statistics(content.text),
                    "citations": self.extract_citations(content.text)
                }
                extracted_contents.append(structured)
            except Exception:
                # Some pages may fail (paywalls, timeouts) - skip and move on
                continue
        return extracted_contents
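The `extract_facts`, `extract_statistics`, and `extract_citations` helpers are also undefined; a real system would likely delegate them to the LLM. For `extract_statistics`, though, even a regex pass gets surprisingly far. A minimal sketch:

import re

def extract_statistics(self, text):
    # Hypothetical helper: keep sentences containing dollar amounts or percentages
    stat_pattern = re.compile(
        r"\$[\d,.]+\s*(?:billion|million|trillion)?|\d+(?:\.\d+)?%"
    )
    return [
        sentence.strip()
        for sentence in re.split(r"(?<=[.!?])\s+", text)
        if stat_pattern.search(sentence)
    ]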
Why Web Scraping Isn’t Sufficient
Challenges with Raw HTML Scraping:
<!-- Raw webpage HTML -->
<div class="header">
    <nav>...</nav>
    <div class="ads">...</div>
</div>
<div class="content">
    <article>
        <h1>Actual Content Title</h1>
        <p>Actual content...</p>
    </article>
    <aside class="sidebar">...</aside>
</div>
<div class="footer">...</div>
You’d need custom parsing logic for each site structure.
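For illustration, here is what that per-site logic tends to look like with BeautifulSoup. The selectors are hypothetical and break the moment a site redesigns, which is exactly the maintenance burden a Reader API removes:

from bs4 import BeautifulSoup

SITE_SELECTORS = {
    # Hypothetical, hand-maintained selectors -- one entry per site you scrape
    "example.com": "div.content article",
    "othernews.com": "main .post-body",
}

def scrape_article(domain, html):
    soup = BeautifulSoup(html, "html.parser")
    selector = SITE_SELECTORS.get(domain)
    if selector is None:
        raise ValueError(f"No parsing rule for {domain}")
    node = soup.select_one(selector)
    return node.get_text(separator="\n", strip=True) if node else ""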
Reader API Solves This:
# Actual Content Title
Actual content...
Clean, consistent, LLM-ready.
Compare: SERP API vs Web Scraping.
Component 3: The Integration - SERP + Reader API
The Golden Workflow
class DeepResearchEngine:
    def __init__(self, serp_key, reader_key):
        self.serp_api = SerpAPI(serp_key)
        self.reader_api = ReaderAPI(reader_key)
        self.llm = ChatGPT()

    def research(self, question):
        research_context = {"sources": [], "findings": []}

        # Iterative research loop
        queries = self.generate_initial_queries(question)
        for iteration in range(5):  # Max 5 iterations
            for query in queries:
                # STEP 1: Discover sources (SERP API)
                search_results = self.serp_api.search(query, num=10)

                # STEP 2: Extract content (Reader API)
                for result in search_results[:5]:  # Top 5 results
                    content = self.reader_api.extract(result.url)

                    # STEP 3: Analyze content (LLM)
                    analysis = self.llm.analyze(f"""
                        Content: {content.text}
                        Extract:
                        1. Key facts relevant to: {question}
                        2. Data and statistics
                        3. Expert opinions
                        4. Contradictions or uncertainties
                    """)

                    research_context["sources"].append({
                        "url": result.url,
                        "domain": result.domain,
                        "content": content.text
                    })
                    research_context["findings"].append(analysis)

            # Determine if more research needed
            if self.is_research_complete(research_context, question):
                break
            else:
                # Generate follow-up queries
                queries = self.generate_follow_up_queries(
                    research_context,
                    question
                )

        # Synthesize final report
        report = self.synthesize_report(research_context, question)
        return report
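Everything above hinges on the stopping check. A hedged sketch of `is_research_complete`, reusing the same `llm.analyze` call the class already assumes, asks the model for a yes/no verdict:

def is_research_complete(self, research_context, question):
    # Hypothetical stopping check: ask the LLM whether the findings suffice
    findings = "\n".join(str(f) for f in research_context["findings"])
    verdict = self.llm.analyze(
        f"Question: {question}\n\n"
        f"Findings so far:\n{findings}\n\n"
        "Can the question be answered comprehensively from these findings? "
        "Reply YES or NO."
    )
    return verdict.strip().upper().startswith("YES")

Capping iterations (the `range(5)` above) matters regardless, since an LLM judge can keep asking for more research indefinitely.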
Real-World Example: Market Research
Task: “Analyze the SERP API market in 2025”
Step-by-Step Process:
# Iteration 1: Broad overview
query_1 = "SERP API market overview 2025"
results_1 = serp_api.search(query_1)
for url in results_1.top_5_urls:
    content = reader_api.extract(url)
    # LLM extracts: Market size, growth rate, key players

# Iteration 2: Deep dive on key players
query_2 = "SerpApi Serper SearchCans comparison"
results_2 = serp_api.search(query_2)
for url in results_2.top_5_urls:
    content = reader_api.extract(url)
    # LLM extracts: Features, pricing, reviews

# Iteration 3: Customer perspective
query_3 = "SERP API reviews use cases"
results_3 = serp_api.search(query_3)
for url in results_3.top_5_urls:
    content = reader_api.extract(url)
    # LLM extracts: Common use cases, pain points, satisfaction

# Iteration 4: Technical details
query_4 = "SERP API integration documentation"
results_4 = serp_api.search(query_4)
# ... and so on

# Final synthesis
report = llm.synthesize(all_findings)
Output: Comprehensive 15-page report with 40+ cited sources
Learn how to build this: Building a Mini-DeepResearch Agent.
Performance Optimizations
1. Parallel Processing
from concurrent.futures import ThreadPoolExecutor

def research_parallel(queries):
    with ThreadPoolExecutor(max_workers=5) as executor:
        # Search in parallel
        search_futures = [
            executor.submit(serp_api.search, query)
            for query in queries
        ]
        search_results = [f.result() for f in search_futures]

        # Extract content in parallel
        all_urls = [url for results in search_results for url in results.top_urls]
        content_futures = [
            executor.submit(reader_api.extract, url)
            for url in all_urls
        ]
        contents = [f.result() for f in content_futures]
    return contents
Speed Improvement: 5x faster (5 sequential searches → 1 parallel batch)
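To verify the speedup on your own workload, a quick timing harness (using the same hypothetical `serp_api` client as above) compares the two modes:

import time
from concurrent.futures import ThreadPoolExecutor

def time_searches(queries, parallel=True):
    # Run the same queries sequentially or in parallel and report wall time
    start = time.perf_counter()
    if parallel:
        with ThreadPoolExecutor(max_workers=5) as executor:
            results = list(executor.map(serp_api.search, queries))
    else:
        results = [serp_api.search(q) for q in queries]
    return results, time.perf_counter() - start

queries = ["SERP API pricing", "SERP API vs scraping", "Reader API markdown",
           "DeepResearch architecture", "LLM research agents"]
_, t_seq = time_searches(queries, parallel=False)
_, t_par = time_searches(queries, parallel=True)
print(f"sequential: {t_seq:.1f}s  parallel: {t_par:.1f}s")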
2. Caching
class CachedResearch:
    def __init__(self):
        self.serp_cache = {}
        self.reader_cache = {}

    def cached_search(self, query):
        if query in self.serp_cache:
            return self.serp_cache[query]
        results = serp_api.search(query)
        self.serp_cache[query] = results
        return results

    def cached_extract(self, url):
        if url in self.reader_cache:
            return self.reader_cache[url]
        content = reader_api.extract(url)
        self.reader_cache[url] = content
        return content
Cost Savings: 40-60% for repeated research topics
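One caveat with the plain dictionaries above: search results go stale, so a production cache usually attaches a TTL. A minimal sketch (the expiry time is illustrative):

import time

class TTLCache:
    # Minimal TTL cache sketch: entries expire after ttl_seconds
    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry and time.time() - entry[1] < self.ttl:
            return entry[0]
        return None

    def set(self, key, value):
        self.store[key] = (value, time.time())

Extracted page content can safely live much longer than SERP results, so the two caches would typically get different TTLs.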
3. Smart Source Selection
Don’t extract all search results—prioritize high-value sources.
def prioritize_sources(search_results):
    scored_results = []
    for result in search_results:
        score = 0
        # Domain authority (TRUSTED_DOMAINS is a predefined allowlist of reputable sites)
        if result.domain in TRUSTED_DOMAINS:
            score += 30
        # Recency
        if result.published_recently:
            score += 20
        # Relevance (position in search)
        score += (20 - result.position) * 2
        # Content type (case-insensitive title match)
        title = result.title.lower()
        if "research" in title or "analysis" in title:
            score += 15
        scored_results.append((result, score))
    # Sort by score and return top N
    scored_results.sort(key=lambda x: x[1], reverse=True)
    return [r for r, s in scored_results[:8]]
Cost Analysis
Traditional Research (Human)
Market researcher salary: $75,000/year
Average research project: 20 hours
Cost per project: $750
DeepResearch (AI)
SERP API: $0.56/1K requests
Reader API: $0.50/1K requests
LLM (GPT-4): $30/1M input tokens
Typical research project:
- 50 SERP searches
- 30 page extractions
- 500K LLM tokens
Cost: $0.028 + $0.015 + $15.00 = ~$15.04 per project
Savings: 98% cost reduction
Time: 20 hours → 15 minutes (99% faster)
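The per-project figure follows directly from the unit prices; checking the arithmetic:

serp_cost = 50 / 1000 * 0.56           # 50 searches at $0.56/1K      = $0.028
reader_cost = 30 / 1000 * 0.50         # 30 extractions at $0.50/1K   = $0.015
llm_cost = 500_000 / 1_000_000 * 30    # 500K tokens at $30/1M input  = $15.00
total = serp_cost + reader_cost + llm_cost
print(f"${total:.2f} per project")     # ~$15.04, vs ~$750 for 20 human hours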
Building Your Own DeepResearch System
Minimal Implementation
import requests

class SimpleDeepResearch:
    def __init__(self, serp_key, reader_key, openai_key):
        self.serp_key = serp_key
        self.reader_key = reader_key
        self.openai_key = openai_key

    def research(self, question):
        # Step 1: Search
        results = requests.get(
            "https://www.searchcans.com/api/search",
            headers={"Authorization": f"Bearer {self.serp_key}"},
            params={"q": question, "engine": "google", "num": 10}
        ).json()

        # Step 2: Extract top 5
        contents = []
        for result in results["organic_results"][:5]:
            content = requests.get(
                "https://www.searchcans.com/api/url",
                headers={"Authorization": f"Bearer {self.reader_key}"},
                params={"url": result["link"], "b": "true", "w": 2000}
            ).json()
            contents.append(content.get("markdown", "") or content.get("text", ""))

        # Step 3: Synthesize
        combined = "\n\n---\n\n".join(contents)
        report = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {self.openai_key}"},
            json={
                "model": "gpt-4",
                "messages": [{
                    "role": "user",
                    "content": f"Based on:\n{combined}\n\nAnswer: {question}"
                }]
            }
        ).json()
        return report["choices"][0]["message"]["content"]

# Usage
researcher = SimpleDeepResearch(serp_key, reader_key, openai_key)
report = researcher.research("What is the SERP API market size?")
print(report)
Start building: Tutorial
Why SearchCans for DeepResearch
SERP API Advantages:
- ✅ 10x cheaper than competitors
- ✅ Bing support (Google alternative)
- ✅ Fast response (<1.5s average)
- ✅ LLM-optimized output format
Reader API Advantages:
- ✅ Clean markdown output
- ✅ Handles complex layouts
- ✅ Extracts metadata
- ✅ High success rate
Combined Benefits:
- Single platform for both APIs
- Consistent authentication
- Unified billing
- Purpose-built for AI applications
SERP + Reader APIs are the foundation of DeepResearch. They let an AI actively investigate a topic the way a human researcher would, transforming knowledge work.
Related Resources
DeepResearch Series:
API Documentation:
Implementation:
SearchCans provides the infrastructure for DeepResearch systems. Start free with $5 credits and build your research agent today.