
Build Your Own AI Research Agent from Scratch

Step-by-step tutorial to build an autonomous research agent using SearchCans SERP and Reader APIs. Complete code examples included for creating your own DeepResearch system.

6 min read

Let’s build a functional DeepResearch agent from scratch. By the end of this tutorial, you’ll have a working system that can autonomously research any topic and generate comprehensive reports.

What We’re Building

A Mini-DeepResearch agent that:

  • Takes a research question as input
  • Searches the web using SERP API
  • Extracts content using Reader API
  • Synthesizes findings with GPT-4
  • Generates a cited research report

Time to build: 30 minutes
Experience level: Intermediate Python

Prerequisites

# Required accounts (all have free tiers)
1. SearchCans account (SERP + Reader API)
2. OpenAI account (GPT-4 API)

# Python packages
pip install requests openai python-dotenv

Get your API keys from SearchCans and OpenAI before continuing.

Step 1: Project Setup

Create project structure:

mkdir deepresearch-agent
cd deepresearch-agent
touch research_agent.py .env

Create .env file:

SEARCHCANS_API_KEY=your_searchcans_key_here
OPENAI_API_KEY=your_openai_key_here

Step 2: Core Research Agent Class

# research_agent.py

import os
import requests
from openai import OpenAI
from dotenv import load_dotenv
from typing import List, Dict

load_dotenv()

class MiniDeepResearch:
    def __init__(self):
        self.serp_key = os.getenv("SEARCHCANS_API_KEY")
        self.serp_url = "https://www.searchcans.com/api/search"
        self.reader_url = "https://www.searchcans.com/api/url"
        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    
    def search(self, query: str, num_results: int = 10) -> List[Dict]:
        """Search the web using SERP API"""
        response = requests.get(
            self.serp_url,
            headers={"Authorization": f"Bearer {self.serp_key}"},
            params={
                "q": query,
                "num": num_results,
                "engine": "google"
            }
        )
        response.raise_for_status()
        return response.json().get("organic_results", [])
    
    def extract_content(self, url: str) -> Dict:
        """Extract clean content from URL using Reader API"""
        try:
            response = requests.get(
                self.reader_url,
                headers={"Authorization": f"Bearer {self.serp_key}"},
                params={"url": url, "b": "true", "w": 2000},
                timeout=10
            )
            response.raise_for_status()
            return response.json()
        except Exception as e:
            print(f"Failed to extract {url}: {e}")
            return None
    
    def synthesize(self, prompt: str) -> str:
        """Use GPT-4 to synthesize information"""
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3
        )
        return response.choices[0].message.content
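
Before wiring up the orchestration, it can help to smoke-test the two API wrappers on their own. A minimal check, assuming your .env keys are set and the response fields match the examples above:

# smoke_test.py - quick sanity check of both wrappers
from research_agent import MiniDeepResearch

agent = MiniDeepResearch()

# SERP API: print the first few organic results
results = agent.search("what is a SERP API", num_results=3)
for r in results:
    print(r.get("title"), "->", r.get("link"))

# Reader API: extract the first result and report how much content came back
if results:
    page = agent.extract_content(results[0]["link"])
    if page:
        print(f"Extracted {len(page.get('content', ''))} characters")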

Step 3: Multi-Step Research Logic

Add research orchestration:

class MiniDeepResearch:
    # ... (previous methods)
    
    def research(self, question: str, max_sources: int = 8) -> str:
        """
        Main research method - orchestrates the full research process
        """
        print(f"\n🔍 Researching: {question}\n")
        
        # Step 1: Initial search
        print("Step 1: Searching web...")
        search_results = self.search(question, num_results=max_sources)
        print(f"Found {len(search_results)} results\n")
        
        # Step 2: Extract content from top results
        print("Step 2: Extracting content...")
        extracted_contents = []
        for i, result in enumerate(search_results[:max_sources], 1):
            print(f"  [{i}/{max_sources}] {result['title'][:60]}...")
            
            content = self.extract_content(result['link'])
            if content and content.get('content'):
                extracted_contents.append({
                    "url": result['link'],
                    "title": result['title'],
                    "domain": result.get('domain', ''),
                    "content": content['content'][:3000]  # First 3000 chars
                })
        
        print(f"Successfully extracted {len(extracted_contents)} sources\n")
        
        # Step 3: Analyze each source
        print("Step 3: Analyzing sources...")
        analyses = []
        for i, source in enumerate(extracted_contents, 1):
            print(f"  [{i}/{len(extracted_contents)}] Analyzing {source['domain']}...")
            
            analysis_prompt = f"""
Analyze this content in relation to the question: "{question}"

Content from {source['domain']}:
{source['content']}

Extract:
1. Key facts and data relevant to the question
2. Important quotes or insights
3. Any contradictions or uncertainties

Format as: [Key Facts] ... [Quotes] ... [Notes] ...
"""
            
            analysis = self.synthesize(analysis_prompt)
            analyses.append({
                "source": source,
                "analysis": analysis
            })
        
        # Step 4: Synthesize final report
        print("\nStep 4: Synthesizing final report...")
        
        # Prepare context for final synthesis
        context = "\n\n---\n\n".join([
            f"SOURCE: {a['source']['title']} ({a['source']['url']})\n{a['analysis']}"
            for a in analyses
        ])
        
        final_prompt = f"""
You are a research analyst. Based on the following analyses from {len(analyses)} sources, 
create a comprehensive research report answering: "{question}"

SOURCES AND ANALYSES:
{context}

Create a report with:
1. Executive Summary (2-3 paragraphs)
2. Key Findings (bullet points with citations)
3. Detailed Analysis (organized by themes)
4. Conclusion
5. Sources (numbered list)

Use citations like [1], [2], etc. to reference sources.
Be objective and note any conflicting information.
"""
        
        final_report = self.synthesize(final_prompt)
        
        # Append sources
        sources_list = "\n".join([
            f"[{i+1}] {a['source']['title']} - {a['source']['url']}"
            for i, a in enumerate(analyses)
        ])
        
        full_report = f"{final_report}\n\n## Sources\n\n{sources_list}"
        
        print("�?Research complete!\n")
        return full_report

Step 4: Enhanced Features

Add follow-up questions and multi-query research:

class MiniDeepResearch:
    # ... (previous methods)
    
    def generate_sub_questions(self, main_question: str) -> List[str]:
        """Generate sub-questions to research comprehensively"""
        prompt = f"""
Given this research question: "{main_question}"

Generate 3-5 specific sub-questions that would help answer it comprehensively.

Return only the questions, one per line, without numbering.
"""
        
        response = self.synthesize(prompt)
        sub_questions = [q.strip() for q in response.strip().split('\n') if q.strip()]
        return sub_questions[:5]
    
    def deep_research(self, question: str) -> str:
        """
        Enhanced research with sub-questions
        """
        print(f"\n🔬 Deep Research Mode: {question}\n")
        
        # Generate sub-questions
        print("Generating sub-questions...")
        sub_questions = self.generate_sub_questions(question)
        print(f"Sub-questions:\n")
        for i, sq in enumerate(sub_questions, 1):
            print(f"  {i}. {sq}")
        print()
        
        # Research each sub-question
        all_findings = []
        
        # Main question
        main_findings = self.research(question, max_sources=5)
        all_findings.append({
            "question": question,
            "findings": main_findings
        })
        
        # Sub-questions
        for sq in sub_questions[:3]:  # Limit to 3 to save API calls
            findings = self.research(sq, max_sources=3)
            all_findings.append({
                "question": sq,
                "findings": findings
            })
        
        # Synthesize everything
        print("\n📊 Creating comprehensive report...\n")
        
        combined_context = "\n\n".join([
            f"# Research on: {f['question']}\n\n{f['findings']}"
            for f in all_findings
        ])
        
        final_prompt = f"""
Based on comprehensive research covering multiple aspects, create a final report 
answering: "{question}"

RESEARCH FINDINGS:
{combined_context}

Create an authoritative report that:
- Synthesizes all research
- Identifies key themes
- Resolves contradictions
- Provides actionable insights
- Includes comprehensive citations
"""
        
        final_report = self.synthesize(final_prompt)
        return final_report

Step 5: Usage Examples

# Basic usage
def main():
    agent = MiniDeepResearch()
    
    # Simple research
    report = agent.research("What is the SERP API market size in 2025?")
    print(report)
    
    # Save to file
    with open("research_report.md", "w") as f:
        f.write(report)

if __name__ == "__main__":
    main()

Deep research mode:

# For comprehensive analysis
agent = MiniDeepResearch()
report = agent.deep_research("Analyze AI trends in healthcare 2025")

Batch research:

# Research multiple topics
questions = [
    "AI in finance 2025 trends",
    "Best CRM software for startups",
    "Vector database comparison"
]

for q in questions:
    report = agent.research(q)
    filename = q.replace(" ", "_")[:50] + ".md"
    with open(filename, "w") as f:
        f.write(report)

Step 6: Advanced Optimizations

Parallel Processing

from concurrent.futures import ThreadPoolExecutor

class OptimizedDeepResearch(MiniDeepResearch):
    def research_parallel(self, question: str, max_sources: int = 8) -> str:
        """Research with parallel content extraction"""
        
        # Search
        search_results = self.search(question, num_results=max_sources)
        
        # Extract in parallel
        with ThreadPoolExecutor(max_workers=5) as executor:
            futures = [
                executor.submit(self.extract_content, result['link'])
                for result in search_results[:max_sources]
            ]
            
            extracted_contents = []
            for future, result in zip(futures, search_results[:max_sources]):
                content = future.result()
                if content and content.get('content'):
                    extracted_contents.append({
                        "url": result['link'],
                        "title": result['title'],
                        "content": content['content'][:3000]
                    })
        
        # Continue with synthesis...
        return self._synthesize_report(question, extracted_contents)

Speed improvement: 3-5x faster
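
The parallel version hands off to a _synthesize_report helper that the earlier class doesn't define. A possible sketch, reusing the prompt pattern from Step 3 (the helper's name and structure here are assumptions, not part of the original code):

class OptimizedDeepResearch(MiniDeepResearch):
    # ... research_parallel from above ...
    
    def _synthesize_report(self, question: str, extracted_contents: List[Dict]) -> str:
        """Build a cited report from already-extracted sources (sketch)."""
        context = "\n\n---\n\n".join(
            f"SOURCE: {s['title']} ({s['url']})\n{s['content']}"
            for s in extracted_contents
        )
        prompt = f"""
Based on these sources, create a research report answering: "{question}"

{context}

Include an executive summary, key findings with [1], [2] style citations, and a numbered source list.
"""
        return self.synthesize(prompt)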

Caching

import hashlib
import json
import os

class CachedDeepResearch(MiniDeepResearch):
    def __init__(self):
        super().__init__()
        self.cache_dir = "research_cache"
        os.makedirs(self.cache_dir, exist_ok=True)
    
    def _cache_key(self, text: str) -> str:
        return hashlib.md5(text.encode()).hexdigest()
    
    def cached_search(self, query: str, **kwargs) -> List[Dict]:
        cache_key = self._cache_key(query)
        cache_file = f"{self.cache_dir}/{cache_key}.json"
        
        if os.path.exists(cache_file):
            with open(cache_file, "r") as f:
                return json.load(f)
        
        results = self.search(query, **kwargs)
        
        with open(cache_file, "w") as f:
            json.dump(results, f)
        
        return results

Cost savings: 50-70% for repeated queries
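
The same caching pattern can cover Reader extractions, which often repeat across related queries. A possible extension (the cached_extract method is my own addition, not part of the original class):

class CachedDeepResearch(MiniDeepResearch):
    # ... cached_search from above ...
    
    def cached_extract(self, url: str):
        """Cache Reader API responses keyed by URL."""
        cache_file = f"{self.cache_dir}/{self._cache_key(url)}.json"
        
        if os.path.exists(cache_file):
            with open(cache_file, "r") as f:
                return json.load(f)
        
        content = self.extract_content(url)
        if content:  # only cache successful extractions
            with open(cache_file, "w") as f:
                json.dump(content, f)
        
        return content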

Complete Working Example

#!/usr/bin/env python3
"""
Mini DeepResearch Agent
A simple but functional autonomous research agent
"""

import os
import requests
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

class MiniDeepResearch:
    def __init__(self):
        self.serp_key = os.getenv("SEARCHCANS_API_KEY")
        self.serp_url = "https://www.searchcans.com/api/search"
        self.reader_url = "https://www.searchcans.com/api/url"
        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    
    def search(self, query, num=10):
        response = requests.get(
            self.serp_url,
            headers={"Authorization": f"Bearer {self.serp_key}"},
            params={"q": query, "engine": "google", "num": num}
        )
        return response.json().get("organic_results", [])
    
    def extract(self, url):
        try:
            response = requests.get(
                self.reader_url,
                headers={"Authorization": f"Bearer {self.serp_key}"},
                params={"url": url, "b": "true", "w": 2000},
                timeout=10
            )
            data = response.json()
            return data.get("markdown", "") or data.get("text", "")
        except Exception:
            return ""
    
    def synthesize(self, prompt):
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content
    
    def research(self, question):
        # Search
        results = self.search(question, num=8)
        
        # Extract
        contents = []
        for r in results[:5]:
            content = self.extract(r['link'])
            if content:
                contents.append(f"SOURCE: {r['title']}\nURL: {r['link']}\n\n{content[:2000]}")
        
        # Synthesize
        context = "\n\n---\n\n".join(contents)
        report = self.synthesize(f"""
Based on these sources, answer: {question}

{context}

Provide a comprehensive answer with citations.
""")
        
        return report

# Usage
if __name__ == "__main__":
    agent = MiniDeepResearch()
    
    question = input("What would you like to research? ")
    print("\nResearching...\n")
    
    report = agent.research(question)
    print(report)
    
    # Save
    with open("report.md", "w") as f:
        f.write(f"# Research Report\n\n**Question**: {question}\n\n{report}")
    
    print("\n�?Report saved to report.md")

Testing Your Agent

python research_agent.py

Example questions to try:

  • “What are the top AI trends in 2025?”
  • “Compare React vs Vue.js for enterprise applications”
  • “Analyze the electric vehicle market in Europe”

Cost Estimation

For a typical research query:

SERP API: 10 searches × $0.56/1000 = $0.0056
Reader API: 5 extractions × $0.50/1000 = $0.0025
GPT-4: ~100K tokens × $30/1M = $3.00

Total per research: ~$3.01

Optimization tips:

  • Use GPT-3.5 for simple syntheses ($0.50 vs $3), as sketched below
  • Cache search results
  • Limit max sources for quick research
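
For the first tip, one option is to make the model configurable per call, so cheap per-source analyses use GPT-3.5 while the final report still uses GPT-4. A minimal sketch (the model parameter is an addition to the class shown earlier, not part of the original code):

class MiniDeepResearch:
    # ... (previous methods)
    
    def synthesize(self, prompt: str, model: str = "gpt-4") -> str:
        """Synthesize with a configurable model."""
        response = self.client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3
        )
        return response.choices[0].message.content

# Cheap extraction-style steps:
#   analysis = self.synthesize(analysis_prompt, model="gpt-3.5-turbo")
# Final report synthesis:
#   final_report = self.synthesize(final_prompt, model="gpt-4")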

Next Steps

Enhancements to add:

  1. Source credibility scoring
  2. Multi-language support
  3. PDF export
  4. Web interface (Flask/Streamlit), see the sketch after this list
  5. Scheduled research (cron jobs)
  6. Collaborative features
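
For item 4, a minimal Streamlit front end takes only a few lines. A sketch, assuming the class lives in research_agent.py as set up in Step 1 (the app.py filename and UI layout are my own):

# app.py - minimal web interface for the agent
import streamlit as st
from research_agent import MiniDeepResearch

st.title("Mini DeepResearch")
question = st.text_input("What would you like to research?")

if st.button("Run research") and question:
    agent = MiniDeepResearch()
    with st.spinner("Researching..."):
        report = agent.research(question)
    st.markdown(report)
    st.download_button("Download report", report, file_name="report.md")

Run it with: streamlit run app.py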

Troubleshooting

Common issues:

# Issue: API key not found
# Solution: Check .env file and load_dotenv()
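
# A quick startup check (hypothetical snippet you can drop near the top of research_agent.py):
import os
from dotenv import load_dotenv

load_dotenv()
for key in ("SEARCHCANS_API_KEY", "OPENAI_API_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"{key} is missing - check your .env file")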

# Issue: Timeout errors
# Solution: Increase timeout, add retry logic

import time

def extract_with_retry(self, url, retries=3):
    """Retry extraction with exponential backoff.

    Note: for retries to trigger, extract() must let requests.Timeout propagate
    instead of catching it.
    """
    for attempt in range(retries):
        try:
            return self.extract(url)
        except requests.Timeout:
            if attempt == retries - 1:
                return ""
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s

Full Code Repository

Get the complete code with additional features:

git clone https://github.com/searchcans/mini-deepresearch
cd mini-deepresearch
pip install -r requirements.txt
python research_agent.py

You now have a functional DeepResearch agent! Experiment, enhance, and build amazing research tools.


Get Started:

Build your own DeepResearch agent with SearchCans APIs. Start free with $5 credits.
