Let’s build a functional DeepResearch agent from scratch. By the end of this tutorial, you’ll have a working system that can autonomously research any topic and generate comprehensive reports.
What We’re Building
A Mini-DeepResearch agent that:
- Takes a research question as input
- Searches the web using SERP API
- Extracts content using Reader API
- Synthesizes findings with GPT-4
- Generates a cited research report
Time to build: 30 minutes
Experience level: Intermediate Python
Prerequisites
# Required accounts (all have free tiers)
1. SearchCans account (SERP + Reader API)
2. OpenAI account (GPT-4 API)
# Python packages
pip install requests openai python-dotenv
Get API Keys:
- SearchCans: Sign up free
- OpenAI: https://platform.openai.com/
Step 1: Project Setup
Create project structure:
mkdir deepresearch-agent
cd deepresearch-agent
touch research_agent.py .env
Create .env file:
SEARCHCANS_API_KEY=your_searchcans_key_here
OPENAI_API_KEY=your_openai_key_here
Step 2: Core Research Agent Class
# research_agent.py
import os
import requests
from openai import OpenAI
from dotenv import load_dotenv
from typing import List, Dict

load_dotenv()


class MiniDeepResearch:
    def __init__(self):
        self.serp_key = os.getenv("SEARCHCANS_API_KEY")
        self.serp_url = "https://www.searchcans.com/api/search"
        self.reader_url = "https://www.searchcans.com/api/url"
        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    def search(self, query: str, num_results: int = 10) -> List[Dict]:
        """Search the web using SERP API"""
        response = requests.get(
            self.serp_url,
            headers={"Authorization": f"Bearer {self.serp_key}"},
            params={
                "q": query,
                "num": num_results,
                "engine": "google"
            }
        )
        response.raise_for_status()
        return response.json().get("organic_results", [])

    def extract_content(self, url: str) -> Dict:
        """Extract clean content from URL using Reader API"""
        try:
            response = requests.get(
                self.reader_url,
                headers={"Authorization": f"Bearer {self.serp_key}"},
                params={"url": url, "b": "true", "w": 2000},
                timeout=10
            )
            response.raise_for_status()
            return response.json()
        except Exception as e:
            print(f"Failed to extract {url}: {e}")
            return None

    def synthesize(self, prompt: str) -> str:
        """Use GPT-4 to synthesize information"""
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3
        )
        return response.choices[0].message.content
Step 3: Multi-Step Research Logic
Add research orchestration:
class MiniDeepResearch:
    # ... (previous methods)

    def research(self, question: str, max_sources: int = 8) -> str:
        """
        Main research method - orchestrates the full research process
        """
        print(f"\n🔍 Researching: {question}\n")

        # Step 1: Initial search
        print("Step 1: Searching web...")
        search_results = self.search(question, num_results=max_sources)
        print(f"Found {len(search_results)} results\n")

        # Step 2: Extract content from top results
        print("Step 2: Extracting content...")
        extracted_contents = []
        for i, result in enumerate(search_results[:max_sources], 1):
            print(f"  [{i}/{max_sources}] {result['title'][:60]}...")
            content = self.extract_content(result['link'])
            if content and content.get('content'):
                extracted_contents.append({
                    "url": result['link'],
                    "title": result['title'],
                    "domain": result.get('domain', ''),
                    "content": content['content'][:3000]  # First 3000 chars
                })
        print(f"Successfully extracted {len(extracted_contents)} sources\n")

        # Step 3: Analyze each source
        print("Step 3: Analyzing sources...")
        analyses = []
        for i, source in enumerate(extracted_contents, 1):
            print(f"  [{i}/{len(extracted_contents)}] Analyzing {source['domain']}...")
            analysis_prompt = f"""
            Analyze this content in relation to the question: "{question}"

            Content from {source['domain']}:
            {source['content']}

            Extract:
            1. Key facts and data relevant to the question
            2. Important quotes or insights
            3. Any contradictions or uncertainties

            Format as: [Key Facts] ... [Quotes] ... [Notes] ...
            """
            analysis = self.synthesize(analysis_prompt)
            analyses.append({
                "source": source,
                "analysis": analysis
            })

        # Step 4: Synthesize final report
        print("\nStep 4: Synthesizing final report...")

        # Prepare context for final synthesis
        context = "\n\n---\n\n".join([
            f"SOURCE: {a['source']['title']} ({a['source']['url']})\n{a['analysis']}"
            for a in analyses
        ])

        final_prompt = f"""
        You are a research analyst. Based on the following analyses from {len(analyses)} sources,
        create a comprehensive research report answering: "{question}"

        SOURCES AND ANALYSES:
        {context}

        Create a report with:
        1. Executive Summary (2-3 paragraphs)
        2. Key Findings (bullet points with citations)
        3. Detailed Analysis (organized by themes)
        4. Conclusion
        5. Sources (numbered list)

        Use citations like [1], [2], etc. to reference sources.
        Be objective and note any conflicting information.
        """

        final_report = self.synthesize(final_prompt)

        # Append sources
        sources_list = "\n".join([
            f"[{i+1}] {a['source']['title']} - {a['source']['url']}"
            for i, a in enumerate(analyses)
        ])

        full_report = f"{final_report}\n\n## Sources\n\n{sources_list}"

        print("✅ Research complete!\n")
        return full_report
Step 4: Enhanced Features
Add follow-up questions and multi-query research:
class MiniDeepResearch:
    # ... (previous methods)

    def generate_sub_questions(self, main_question: str) -> List[str]:
        """Generate sub-questions to research comprehensively"""
        prompt = f"""
        Given this research question: "{main_question}"

        Generate 3-5 specific sub-questions that would help answer it comprehensively.
        Return only the questions, one per line, without numbering.
        """
        response = self.synthesize(prompt)
        sub_questions = [q.strip() for q in response.strip().split('\n') if q.strip()]
        return sub_questions[:5]

    def deep_research(self, question: str) -> str:
        """
        Enhanced research with sub-questions
        """
        print(f"\n🔬 Deep Research Mode: {question}\n")

        # Generate sub-questions
        print("Generating sub-questions...")
        sub_questions = self.generate_sub_questions(question)
        print("Sub-questions:\n")
        for i, sq in enumerate(sub_questions, 1):
            print(f"  {i}. {sq}")
        print()

        # Research each sub-question
        all_findings = []

        # Main question
        main_findings = self.research(question, max_sources=5)
        all_findings.append({
            "question": question,
            "findings": main_findings
        })

        # Sub-questions
        for sq in sub_questions[:3]:  # Limit to 3 to save API calls
            findings = self.research(sq, max_sources=3)
            all_findings.append({
                "question": sq,
                "findings": findings
            })

        # Synthesize everything
        print("\n📊 Creating comprehensive report...\n")
        combined_context = "\n\n".join([
            f"# Research on: {f['question']}\n\n{f['findings']}"
            for f in all_findings
        ])

        final_prompt = f"""
        Based on comprehensive research covering multiple aspects, create a final report
        answering: "{question}"

        RESEARCH FINDINGS:
        {combined_context}

        Create an authoritative report that:
        - Synthesizes all research
        - Identifies key themes
        - Resolves contradictions
        - Provides actionable insights
        - Includes comprehensive citations
        """

        final_report = self.synthesize(final_prompt)
        return final_report
Step 5: Usage Examples
# Basic usage
def main():
    agent = MiniDeepResearch()

    # Simple research
    report = agent.research("What is the SERP API market size in 2025?")
    print(report)

    # Save to file
    with open("research_report.md", "w") as f:
        f.write(report)


if __name__ == "__main__":
    main()
Deep research mode:
# For comprehensive analysis
agent = MiniDeepResearch()
report = agent.deep_research("Analyze AI trends in healthcare 2025")
Batch research:
# Research multiple topics
questions = [
    "AI in finance 2025 trends",
    "Best CRM software for startups",
    "Vector database comparison"
]

for q in questions:
    report = agent.research(q)
    filename = q.replace(" ", "_")[:50] + ".md"
    with open(filename, "w") as f:
        f.write(report)
Step 6: Advanced Optimizations
Parallel Processing
from concurrent.futures import ThreadPoolExecutor


class OptimizedDeepResearch(MiniDeepResearch):
    def research_parallel(self, question: str, max_sources: int = 8) -> str:
        """Research with parallel content extraction"""
        # Search
        search_results = self.search(question, num_results=max_sources)

        # Extract in parallel
        with ThreadPoolExecutor(max_workers=5) as executor:
            futures = [
                executor.submit(self.extract_content, result['link'])
                for result in search_results[:max_sources]
            ]

            extracted_contents = []
            for future, result in zip(futures, search_results[:max_sources]):
                content = future.result()
                if content and content.get('content'):
                    extracted_contents.append({
                        "url": result['link'],
                        "title": result['title'],
                        "content": content['content'][:3000]
                    })

        # Continue with synthesis...
        return self._synthesize_report(question, extracted_contents)
Speed improvement: 3-5x faster
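The parallel variant above hands its results to a `_synthesize_report` helper that isn't shown. A minimal sketch of that helper, assuming it simply reuses the base class's `synthesize` method on the extracted sources, could look like this:

class OptimizedDeepResearch(MiniDeepResearch):
    # ... research_parallel from above ...

    def _synthesize_report(self, question, extracted_contents):
        """Illustrative only: collapse the extracted sources into one report prompt."""
        context = "\n\n---\n\n".join(
            f"SOURCE: {c['title']} ({c['url']})\n{c['content']}"
            for c in extracted_contents
        )
        return self.synthesize(
            f'Based on these sources, answer: "{question}"\n\n{context}\n\n'
            "Provide a comprehensive answer with citations."
        )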
Caching
import hashlib
import json
import os


class CachedDeepResearch(MiniDeepResearch):
    def __init__(self):
        super().__init__()
        self.cache_dir = "research_cache"
        os.makedirs(self.cache_dir, exist_ok=True)

    def _cache_key(self, text: str) -> str:
        return hashlib.md5(text.encode()).hexdigest()

    def cached_search(self, query: str, **kwargs) -> List[Dict]:
        cache_key = self._cache_key(query)
        cache_file = f"{self.cache_dir}/{cache_key}.json"

        if os.path.exists(cache_file):
            with open(cache_file, "r") as f:
                return json.load(f)

        results = self.search(query, **kwargs)

        with open(cache_file, "w") as f:
            json.dump(results, f)

        return results
Cost savings: 50-70% for repeated queries
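To see the cache pay off, call `cached_search` twice with the same query: the first call hits the SERP API and writes a JSON file named after the query's MD5 hash, the second call reads that file without spending another credit.

# First call hits the API and writes research_cache/<md5>.json;
# the second call returns the cached JSON with no network request.
agent = CachedDeepResearch()
fresh = agent.cached_search("vector database comparison")
cached = agent.cached_search("vector database comparison")
assert fresh == cached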
Complete Working Example
#!/usr/bin/env python3
"""
Mini DeepResearch Agent
A simple but functional autonomous research agent
"""
import os
import requests
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()


class MiniDeepResearch:
    def __init__(self):
        self.serp_key = os.getenv("SEARCHCANS_API_KEY")
        self.serp_url = "https://www.searchcans.com/api/search"
        self.reader_url = "https://www.searchcans.com/api/url"
        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    def search(self, query, num=10):
        response = requests.get(
            self.serp_url,
            headers={"Authorization": f"Bearer {self.serp_key}"},
            params={"q": query, "engine": "google", "num": num}
        )
        return response.json().get("organic_results", [])

    def extract(self, url):
        try:
            response = requests.get(
                self.reader_url,
                headers={"Authorization": f"Bearer {self.serp_key}"},
                params={"url": url, "b": "true", "w": 2000},
                timeout=10
            )
            data = response.json()
            return data.get("markdown", "") or data.get("text", "")
        except Exception:
            return ""

    def synthesize(self, prompt):
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content

    def research(self, question):
        # Search
        results = self.search(question, num=8)

        # Extract
        contents = []
        for r in results[:5]:
            content = self.extract(r['link'])
            if content:
                contents.append(f"SOURCE: {r['title']}\nURL: {r['link']}\n\n{content[:2000]}")

        # Synthesize
        context = "\n\n---\n\n".join(contents)
        report = self.synthesize(f"""
        Based on these sources, answer: {question}

        {context}

        Provide a comprehensive answer with citations.
        """)
        return report


# Usage
if __name__ == "__main__":
    agent = MiniDeepResearch()

    question = input("What would you like to research? ")
    print("\nResearching...\n")

    report = agent.research(question)
    print(report)

    # Save
    with open("report.md", "w") as f:
        f.write(f"# Research Report\n\n**Question**: {question}\n\n{report}")

    print("\n✅ Report saved to report.md")
Testing Your Agent
python research_agent.py
Example questions to try:
- “What are the top AI trends in 2025?”
- “Compare React vs Vue.js for enterprise applications”
- “Analyze the electric vehicle market in Europe”
Cost Estimation
For a typical research query:
SERP API: 10 searches × $0.56/1000 = $0.0056
Reader API: 5 extractions × $0.50/1000 = $0.0025
GPT-4: ~100K tokens × $30/1M = $3.00
Total per research: ~$3.01
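The same arithmetic as a quick sanity check in Python (the unit prices are the figures quoted above, not live pricing):

# Rough per-query cost estimate using the prices quoted above
serp_cost = 10 * 0.56 / 1000           # 10 searches at $0.56 per 1,000
reader_cost = 5 * 0.50 / 1000          # 5 extractions at $0.50 per 1,000
gpt4_cost = 100_000 * 30 / 1_000_000   # ~100K tokens at $30 per 1M
total = serp_cost + reader_cost + gpt4_cost
print(f"Estimated cost per research run: ${total:.2f}")  # ≈ $3.01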
Optimization tips:
- Use GPT-3.5 for simple syntheses ($0.50 vs $3); see the sketch after this list
- Cache search results
- Limit max sources for quick research
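One way to act on the first tip is to let `synthesize` accept a model name, so the many per-source analyses run on a cheaper model while the final report stays on GPT-4. A possible variation (use whatever model names your account has access to):

class MiniDeepResearch:
    # ... other methods unchanged ...

    def synthesize(self, prompt: str, model: str = "gpt-4") -> str:
        """Same behavior as before, but callers can choose a cheaper model per call."""
        response = self.client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3
        )
        return response.choices[0].message.content

# Inside research(): cheap model for per-source analyses, GPT-4 for the final report.
#   analysis = self.synthesize(analysis_prompt, model="gpt-3.5-turbo")
#   final_report = self.synthesize(final_prompt, model="gpt-4")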
Next Steps
Enhancements to add:
- Source credibility scoring (a rough sketch follows this list)
- Multi-language support
- PDF export
- Web interface (Flask/Streamlit)
- Scheduled research (cron jobs)
- Collaborative features
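As a starting point for the first item, a naive credibility score could weight sources by domain type before they are fed into synthesis. This heuristic is entirely illustrative and not part of the agent above:

# Illustrative only: a naive domain-based credibility heuristic
from urllib.parse import urlparse

def credibility_score(url: str) -> float:
    """Return a rough 0-1 score based on the source's domain."""
    domain = urlparse(url).netloc.lower()
    if domain.endswith((".gov", ".edu")):
        return 1.0
    if domain.endswith(".org"):
        return 0.8
    return 0.5  # Unknown commercial sites get a neutral score

# Example: sort extracted sources so the most credible come first
# sources.sort(key=lambda s: credibility_score(s["url"]), reverse=True)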
Troubleshooting
Common issues:
# Issue: API key not found
# Solution: Check .env file and load_dotenv()
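# A quick sanity check (not part of the original agent): fail fast if either
# key is missing before any API calls are made.
import os
from dotenv import load_dotenv

load_dotenv()
for key in ("SEARCHCANS_API_KEY", "OPENAI_API_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"{key} is not set - add it to your .env file")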
# Issue: Timeout errors
# Solution: Increase timeout, add retry logic
# Note: for the retry below to trigger, extract() must let requests.Timeout
# propagate (drop its try/except or re-raise the Timeout).
import time

def extract_with_retry(self, url, retries=3):
    """Add this method to MiniDeepResearch to retry extraction with backoff."""
    for attempt in range(retries):
        try:
            return self.extract(url)
        except requests.Timeout:
            if attempt == retries - 1:
                return ""
            time.sleep(2 ** attempt)  # Exponential backoff
Full Code Repository
Get the complete code with additional features:
git clone https://github.com/searchcans/mini-deepresearch
cd mini-deepresearch
pip install -r requirements.txt
python research_agent.py
You now have a functional DeepResearch agent! Experiment, enhance, and build amazing research tools.
Build your own DeepResearch agent with SearchCans APIs. Start free with $5 credits.