SearchCans

The Core Role of SERP & Reader APIs | Deconstructing DeepResearch

Explore the architecture behind DeepResearch systems and understand why SERP and Reader APIs are fundamental to building autonomous research agents that can investigate topics like human researchers.

5 min read

DeepResearch systems seem magical—you ask a question, and they return comprehensive research reports. But behind the magic is a sophisticated architecture built on two critical components: SERP APIs for information discovery and Reader APIs for content extraction. Let’s deconstruct how it works.

The Core Architecture

┌─────────────────────────────────────────────────────────┐
│                   User Research Query                   │
└──────────────────────────┬──────────────────────────────┘
                           │
                    ┌──────▼───────┐
                    │ Planner LLM  │
                    │   (GPT-4)    │
                    └──────┬───────┘
                           │
                  ┌────────▼────────┐
                  │  Research Loop  │
                  └────────┬────────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
  ┌─────▼──────┐    ┌──────▼──────┐    ┌──────▼──────┐
  │ SERP API   │    │ Reader API  │    │ LLM         │
  │ (Discover) │    │ (Extract)   │    │ (Synthesize)│
  └─────┬──────┘    └──────┬──────┘    └──────┬──────┘
        │                  │                  │
        └──────────────────┼──────────────────┘
                           │
                    ┌──────▼───────┐
                    │ Final Report │
                    └──────────────┘
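The loop in the diagram can be sketched as a plain orchestration function. Everything here is illustrative: `search`, `extract`, `synthesize`, and `follow_up` are injected callables standing in for the SERP API, Reader API, and LLM steps covered in the rest of this article.

```python
def run_research(question, search, extract, synthesize, follow_up=None,
                 max_iterations=3):
    """Minimal sketch of the research loop: discover -> extract -> synthesize.

    search(query)          -> list of source URLs   (SERP API)
    extract(url)           -> clean text            (Reader API)
    synthesize(q, texts)   -> final report          (LLM)
    follow_up(texts)       -> optional list of new queries (planner LLM)
    """
    findings = []
    queries = [question]

    for _ in range(max_iterations):
        for query in queries:
            for url in search(query):          # discover sources
                findings.append(extract(url))  # pull clean content
        # Let the planner decide whether more research is needed
        queries = follow_up(findings) if follow_up else []
        if not queries:
            break

    return synthesize(question, findings)
```

With stub callables this runs end to end, which makes the control flow easy to test before wiring in real APIs.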

Component 1: SERP API - The Information Discovery Engine

Why SERP API is Essential

Problem: AI models have knowledge cutoffs and can’t access real-time information.

Solution: SERP API provides programmatic access to search engines.

What SERP API Delivers

response = serp_api.search(
    query="AI market size 2025",
    num=10
)

# Returns structured data
{
    "organic_results": [
        {
            "position": 1,
            "title": "AI Market Report 2025",
            "link": "https://example.com/ai-market-2025",
            "snippet": "The global AI market reached $450 billion...",
            "domain": "example.com"
        },
        # ... 9 more results
    ],
    "related_searches": ["AI market forecast", "AI industry growth"],
    "people_also_ask": [...]
}

Key Features:

  • Real-time search results
  • Structured JSON format
  • Multiple search engines (Google, Bing)
  • Rich metadata (position, domain, snippet)
  • Related queries and PAA questions

Learn more about SERP API capabilities.
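The `related_searches` and `people_also_ask` fields are useful beyond display: an agent can feed them back in as follow-up queries. A small sketch (the response shape follows the example above; the helper name is ours):

```python
def follow_up_queries(serp_response, limit=5):
    """Turn related searches and PAA questions into candidate
    follow-up queries, preserving order and dropping duplicates."""
    candidates = list(serp_response.get("related_searches", []))
    for item in serp_response.get("people_also_ask", []):
        # PAA entries may be plain strings or {"question": ...} dicts
        candidates.append(item["question"] if isinstance(item, dict) else item)

    seen, queries = set(), []
    for q in candidates:
        if q not in seen:
            seen.add(q)
            queries.append(q)
    return queries[:limit]
```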

DeepResearch’s SERP API Usage Pattern

class ResearchDiscovery:
    def multi_step_search(self, initial_query):
        # Phase 1: Broad search
        broad_results = self.serp_api.search(initial_query, num=20)
        
        # Phase 2: Analyze results to identify subtopics
        subtopics = self.identify_subtopics(broad_results)
        
        # Phase 3: Targeted searches for each subtopic
        detailed_results = {}
        for subtopic in subtopics:
            detailed_results[subtopic] = self.serp_api.search(
                f"{initial_query} {subtopic}",
                num=10
            )
        
        # Phase 4: Follow-up searches based on findings
        gaps = self.identify_knowledge_gaps(detailed_results)
        for gap in gaps:
            additional_results = self.serp_api.search(gap)
            detailed_results[gap] = additional_results
        
        return self.consolidate_sources(broad_results, detailed_results)

Example Research Flow:

Query: "Analyze SERP API competitive landscape"

Step 1: Initial search
→ "SERP API providers 2025"

Step 2: Discovered subtopics
→ "SERP API pricing comparison"
→ "SERP API vs web scraping"
→ "SERP API reliability"

Step 3: Deep dive searches
→ "SerpApi pricing"
→ "Serper.dev review"
→ "SearchCans SERP API features"

Step 4: Follow-up questions
→ "SERP API market size"
→ "SERP API use cases"

Why Static Databases Aren’t Enough

Limitation of Static RAG:

  • Knowledge cutoff (data becomes stale)
  • No access to recent news/trends
  • Can’t answer time-sensitive questions
  • Limited to pre-indexed documents

SERP API Solution:

  • Always current (searches in real-time)
  • Access to entire web
  • Captures latest developments
  • Discovers relevant sources dynamically

Example:

# Static RAG approach (limited)
def research_with_rag(question):
    # Only searches pre-indexed documents
    docs = vector_db.similarity_search(question)
    answer = llm.generate(f"Based on {docs}, answer: {question}")
    return answer

# DeepResearch with SERP API (comprehensive)
def research_with_serp(question):
    # Searches the entire web in real-time
    results = serp_api.search(question, num=20)

    findings = []
    # Can follow multiple threads
    for result in results:
        content = reader_api.extract(result.url)
        findings.append(content)

        # Can do follow-up searches based on findings
        if needs_more_info(content):
            follow_up = generate_follow_up_query(content)
            for extra in serp_api.search(follow_up):
                findings.append(reader_api.extract(extra.url))

    return synthesize_report(question, findings)

Component 2: Reader API - The Content Extraction Engine

Why Reader API is Critical

Problem: Web pages are messy—ads, navigation, scripts, boilerplate.

Solution: Reader API extracts clean, LLM-ready content.

What Reader API Delivers

response = reader_api.extract(url="https://example.com/article")

# Returns clean markdown
{
    "url": "https://example.com/article",
    "title": "AI Market Analysis 2025",
    "author": "Jane Smith",
    "published_date": "2025-12-20",
    "content": """
# AI Market Analysis 2025

The global AI market reached $450 billion in 2025...

## Key Findings

- Enterprise adoption: 67%
- Market growth: 35% YoY
...
    """,
    "word_count": 2500,
    "reading_time": "10 minutes"
}

Benefits:

  • Markdown format (LLM-optimized)
  • No ads or navigation noise
  • Extracts metadata (author, date)
  • Preserves structure (headings, lists)
  • Handles complex layouts

Read about Reader API.

DeepResearch’s Reader API Usage

class ContentProcessor:
    def process_search_results(self, search_results):
        extracted_contents = []
        
        for result in search_results:
            try:
                # Extract clean content
                content = self.reader_api.extract(result.url)
                
                # Structure the information
                structured = {
                    "source": {
                        "url": result.url,
                        "domain": result.domain,
                        "title": content.title,
                        "author": content.author,
                        "date": content.published_date
                    },
                    "content": content.text,
                    "key_facts": self.extract_facts(content.text),
                    "statistics": self.extract_statistics(content.text),
                    "citations": self.extract_citations(content.text)
                }
                
                extracted_contents.append(structured)
                
            except Exception:
                # Some pages may fail - handle gracefully
                continue
        
        return extracted_contents

Why Web Scraping Isn’t Sufficient

Challenges with Raw HTML Scraping:

<!-- Raw webpage HTML -->
<div class="header">
  <nav>...</nav>
  <div class="ads">...</div>
</div>
<div class="content">
  <article>
    <h1>Actual Content Title</h1>
    <p>Actual content...</p>
  </article>
  <aside class="sidebar">...</aside>
</div>
<div class="footer">...</div>

You’d need custom parsing logic for each site structure.

Reader API Solves This:

# Actual Content Title

Actual content...

Clean, consistent, LLM-ready.

Compare: SERP API vs Web Scraping.

Component 3: The Integration - SERP + Reader API

The Golden Workflow

class DeepResearchEngine:
    def __init__(self, serp_key, reader_key):
        self.serp_api = SerpAPI(serp_key)
        self.reader_api = ReaderAPI(reader_key)
        self.llm = ChatGPT()
    
    def research(self, question):
        research_context = {"sources": [], "findings": []}
        
        # Iterative research loop
        queries = self.generate_initial_queries(question)
        
        for iteration in range(5):  # Max 5 iterations
            for query in queries:
                # STEP 1: Discover sources (SERP API)
                search_results = self.serp_api.search(query, num=10)
                
                # STEP 2: Extract content (Reader API)
                for result in search_results[:5]:  # Top 5 results
                    content = self.reader_api.extract(result.url)
                    
                    # STEP 3: Analyze content (LLM)
                    analysis = self.llm.analyze(f"""
                    Content: {content.text}
                    
                    Extract:
                    1. Key facts relevant to: {question}
                    2. Data and statistics
                    3. Expert opinions
                    4. Contradictions or uncertainties
                    """)
                    
                    research_context["sources"].append({
                        "url": result.url,
                        "domain": result.domain,
                        "content": content.text
                    })
                    research_context["findings"].append(analysis)
            
            # Determine if more research needed
            if self.is_research_complete(research_context, question):
                break
            else:
                # Generate follow-up queries
                queries = self.generate_follow_up_queries(
                    research_context,
                    question
                )
        
        # Synthesize final report
        report = self.synthesize_report(research_context, question)
        return report

Real-World Example: Market Research

Task: “Analyze the SERP API market in 2025”

Step-by-Step Process:

# Iteration 1: Broad overview
query_1 = "SERP API market overview 2025"
results_1 = serp_api.search(query_1)

for url in results_1.top_5_urls:
    content = reader_api.extract(url)
    # LLM extracts: Market size, growth rate, key players

# Iteration 2: Deep dive on key players
query_2 = "SerpApi Serper SearchCans comparison"
results_2 = serp_api.search(query_2)

for url in results_2.top_5_urls:
    content = reader_api.extract(url)
    # LLM extracts: Features, pricing, reviews

# Iteration 3: Customer perspective
query_3 = "SERP API reviews use cases"
results_3 = serp_api.search(query_3)

for url in results_3.top_5_urls:
    content = reader_api.extract(url)
    # LLM extracts: Common use cases, pain points, satisfaction

# Iteration 4: Technical details
query_4 = "SERP API integration documentation"
results_4 = serp_api.search(query_4)

# ... and so on

# Final synthesis
report = llm.synthesize(all_findings)

Output: Comprehensive 15-page report with 40+ cited sources

Learn how to build this: Building a Mini-DeepResearch Agent.

Performance Optimizations

1. Parallel Processing

from concurrent.futures import ThreadPoolExecutor

def research_parallel(queries):
    with ThreadPoolExecutor(max_workers=5) as executor:
        # Search in parallel
        search_futures = [
            executor.submit(serp_api.search, query)
            for query in queries
        ]
        search_results = [f.result() for f in search_futures]
        
        # Extract content in parallel
        all_urls = [url for results in search_results for url in results.top_urls]
        content_futures = [
            executor.submit(reader_api.extract, url)
            for url in all_urls
        ]
        contents = [f.result() for f in content_futures]
    
    return contents

Speed Improvement: 5x faster (5 sequential searches → 1 parallel batch)

2. Caching

class CachedResearch:
    def __init__(self):
        self.serp_cache = {}
        self.reader_cache = {}
    
    def cached_search(self, query):
        if query in self.serp_cache:
            return self.serp_cache[query]
        
        results = serp_api.search(query)
        self.serp_cache[query] = results
        return results
    
    def cached_extract(self, url):
        if url in self.reader_cache:
            return self.reader_cache[url]
        
        content = reader_api.extract(url)
        self.reader_cache[url] = content
        return content

Cost Savings: 40-60% for repeated research topics

3. Smart Source Selection

Don’t extract all search results—prioritize high-value sources.

def prioritize_sources(search_results):
    scored_results = []
    
    for result in search_results:
        score = 0
        
        # Domain authority
        if result.domain in TRUSTED_DOMAINS:
            score += 30
        
        # Recency
        if result.published_recently:
            score += 20
        
        # Relevance (position in search)
        score += (20 - result.position) * 2
        
        # Content type
        if "research" in result.title or "analysis" in result.title:
            score += 15
        
        scored_results.append((result, score))
    
    # Sort by score and return top N
    scored_results.sort(key=lambda x: x[1], reverse=True)
    return [r for r, s in scored_results[:8]]

Cost Analysis

Traditional Research (Human)

Market researcher salary: $75,000/year
Average research project: 20 hours
Cost per project: $750

DeepResearch (AI)

SERP API: $0.56/1K requests
Reader API: $0.50/1K requests
LLM (GPT-4): $30/1M input tokens

Typical research project:
- 50 SERP searches
- 30 page extractions
- 500K LLM tokens

Cost: $0.03 + $0.015 + $15 = ~$15.05 per project

Savings: 98% cost reduction

Time: 20 hours → 15 minutes (99% faster)
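The per-project figure above is straightforward unit-price arithmetic, and it is worth sanity-checking in code because the LLM term dominates. A quick check using the prices listed above:

```python
def project_cost(searches=50, extractions=30, llm_tokens=500_000):
    """Estimate a research project's API cost from the unit prices above."""
    serp_cost = searches * 0.56 / 1000        # $0.56 per 1K SERP requests
    reader_cost = extractions * 0.50 / 1000   # $0.50 per 1K extractions
    llm_cost = llm_tokens * 30 / 1_000_000    # $30 per 1M input tokens
    return serp_cost + reader_cost + llm_cost
```

The unrounded total is about $15.04 (the ~$15.05 above rounds each term first). Either way, the two data APIs together cost under five cents; the LLM tokens are the bill.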

Building Your Own DeepResearch System

Minimal Implementation

import requests

class SimpleDeepResearch:
    def __init__(self, serp_key, reader_key, openai_key):
        self.serp_key = serp_key
        self.reader_key = reader_key
        self.openai_key = openai_key
    
    def research(self, question):
        # Step 1: Search
        results = requests.get(
            "https://www.searchcans.com/api/search",
            headers={"Authorization": f"Bearer {self.serp_key}"},
            params={"q": question, "engine": "google", "num": 10}
        ).json()
        
        # Step 2: Extract top 5
        contents = []
        for result in results["organic_results"][:5]:
            content = requests.get(
                "https://www.searchcans.com/api/url",
                headers={"Authorization": f"Bearer {self.reader_key}"},
                params={"url": result["link"], "b": "true", "w": 2000}
            ).json()
            contents.append(content.get("markdown", "") or content.get("text", ""))
        
        # Step 3: Synthesize
        combined = "\n\n---\n\n".join(contents)
        report = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {self.openai_key}"},
            json={
                "model": "gpt-4",
                "messages": [{
                    "role": "user",
                    "content": f"Based on:\n{combined}\n\nAnswer: {question}"
                }]
            }
        ).json()
        
        return report["choices"][0]["message"]["content"]

# Usage
researcher = SimpleDeepResearch(serp_key, reader_key, openai_key)
report = researcher.research("What is the SERP API market size?")
print(report)

Start building: Tutorial

Why SearchCans for DeepResearch

SERP API Advantages:

  • ✓ 10x cheaper than competitors
  • ✓ Bing support (Google alternative)
  • ✓ Fast response (<1.5s average)
  • ✓ LLM-optimized output format

Reader API Advantages:

  • ✓ Clean markdown output
  • ✓ Handles complex layouts
  • ✓ Extracts metadata
  • ✓ High success rate

Combined Benefits:

  • Single platform for both APIs
  • Consistent authentication
  • Unified billing
  • Purpose-built for AI applications

SERP + Reader APIs are the foundation of DeepResearch. They enable AI to actively investigate topics like human researchers, transforming knowledge work.



SearchCans provides the infrastructure for DeepResearch systems. Start free with $5 credits and build your research agent today.

SearchCans Team


The SearchCans editorial team consists of engineers, data scientists, and technical writers dedicated to helping developers build better AI applications with reliable data APIs.

API Development · AI Applications · Technical Writing · Developer Tools

Ready to try SearchCans?

Get 100 free credits and start using our SERP API today. No credit card required.