Let’s build a functional DeepResearch agent from scratch. By the end of this tutorial, you’ll have a working system that can autonomously research any topic and generate comprehensive reports.
What We’re Building
A Mini-DeepResearch agent that:
- Takes a research question as input
- Searches the web using SERP API
- Extracts content using Reader API
- Synthesizes findings with GPT-4
- Generates a cited research report
Time to build: 30 minutes
Experience level: Intermediate Python
Prerequisites
# Required accounts (all have free tiers)
1. SearchCans account (SERP + Reader API)
2. OpenAI account (GPT-4 API)
# Python packages
pip install requests openai python-dotenv
Get API Keys:
- SearchCans: Sign up free
- OpenAI: https://platform.openai.com/
Step 1: Project Setup
Create project structure:
mkdir deepresearch-agent
cd deepresearch-agent
touch research_agent.py .env
Create .env file:
SEARCHCANS_API_KEY=your_searchcans_key_here
OPENAI_API_KEY=your_openai_key_here
Step 2: Core Research Agent Class
# research_agent.py
import os
import requests
from openai import OpenAI
from dotenv import load_dotenv
from typing import List, Dict

load_dotenv()


class MiniDeepResearch:
    def __init__(self):
        self.serp_key = os.getenv("SEARCHCANS_API_KEY")
        self.serp_url = "https://www.searchcans.com/api/search"
        self.reader_url = "https://www.searchcans.com/api/url"
        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    def search(self, query: str, num_results: int = 10) -> List[Dict]:
        """Search the web using SERP API"""
        response = requests.get(
            self.serp_url,
            headers={"Authorization": f"Bearer {self.serp_key}"},
            params={
                "q": query,
                "num": num_results,
                "engine": "google"
            }
        )
        response.raise_for_status()
        return response.json().get("organic_results", [])

    def extract_content(self, url: str) -> Dict:
        """Extract clean content from URL using Reader API"""
        try:
            response = requests.get(
                self.reader_url,
                headers={"Authorization": f"Bearer {self.serp_key}"},
                params={"url": url, "b": "true", "w": 2000},
                timeout=10
            )
            response.raise_for_status()
            return response.json()
        except Exception as e:
            print(f"Failed to extract {url}: {e}")
            return None

    def synthesize(self, prompt: str) -> str:
        """Use GPT-4 to synthesize information"""
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3
        )
        return response.choices[0].message.content
Step 3: Multi-Step Research Logic
Add research orchestration:
class MiniDeepResearch:
    # ... (previous methods)

    def research(self, question: str, max_sources: int = 8) -> str:
        """
        Main research method - orchestrates the full research process
        """
        print(f"\n🔍 Researching: {question}\n")

        # Step 1: Initial search
        print("Step 1: Searching web...")
        search_results = self.search(question, num_results=max_sources)
        print(f"Found {len(search_results)} results\n")

        # Step 2: Extract content from top results
        print("Step 2: Extracting content...")
        extracted_contents = []
        for i, result in enumerate(search_results[:max_sources], 1):
            print(f"  [{i}/{max_sources}] {result['title'][:60]}...")
            content = self.extract_content(result['link'])
            if content and content.get('content'):
                extracted_contents.append({
                    "url": result['link'],
                    "title": result['title'],
                    "domain": result.get('domain', ''),
                    "content": content['content'][:3000]  # First 3000 chars
                })
        print(f"Successfully extracted {len(extracted_contents)} sources\n")

        # Step 3: Analyze each source
        print("Step 3: Analyzing sources...")
        analyses = []
        for i, source in enumerate(extracted_contents, 1):
            print(f"  [{i}/{len(extracted_contents)}] Analyzing {source['domain']}...")
            analysis_prompt = f"""
            Analyze this content in relation to the question: "{question}"

            Content from {source['domain']}:
            {source['content']}

            Extract:
            1. Key facts and data relevant to the question
            2. Important quotes or insights
            3. Any contradictions or uncertainties

            Format as: [Key Facts] ... [Quotes] ... [Notes] ...
            """
            analysis = self.synthesize(analysis_prompt)
            analyses.append({
                "source": source,
                "analysis": analysis
            })

        # Step 4: Synthesize final report
        print("\nStep 4: Synthesizing final report...")

        # Prepare context for final synthesis
        context = "\n\n---\n\n".join([
            f"SOURCE: {a['source']['title']} ({a['source']['url']})\n{a['analysis']}"
            for a in analyses
        ])

        final_prompt = f"""
        You are a research analyst. Based on the following analyses from {len(analyses)} sources,
        create a comprehensive research report answering: "{question}"

        SOURCES AND ANALYSES:
        {context}

        Create a report with:
        1. Executive Summary (2-3 paragraphs)
        2. Key Findings (bullet points with citations)
        3. Detailed Analysis (organized by themes)
        4. Conclusion
        5. Sources (numbered list)

        Use citations like [1], [2], etc. to reference sources.
        Be objective and note any conflicting information.
        """

        final_report = self.synthesize(final_prompt)

        # Append sources
        sources_list = "\n".join([
            f"[{i+1}] {a['source']['title']} - {a['source']['url']}"
            for i, a in enumerate(analyses)
        ])

        full_report = f"{final_report}\n\n## Sources\n\n{sources_list}"

        print("✅ Research complete!\n")
        return full_report
Step 4: Enhanced Features
Add follow-up questions and multi-query research:
class MiniDeepResearch:
    # ... (previous methods)

    def generate_sub_questions(self, main_question: str) -> List[str]:
        """Generate sub-questions to research comprehensively"""
        prompt = f"""
        Given this research question: "{main_question}"

        Generate 3-5 specific sub-questions that would help answer it comprehensively.
        Return only the questions, one per line, without numbering.
        """
        response = self.synthesize(prompt)
        sub_questions = [q.strip() for q in response.strip().split('\n') if q.strip()]
        return sub_questions[:5]

    def deep_research(self, question: str) -> str:
        """
        Enhanced research with sub-questions
        """
        print(f"\n🔬 Deep Research Mode: {question}\n")

        # Generate sub-questions
        print("Generating sub-questions...")
        sub_questions = self.generate_sub_questions(question)
        print("Sub-questions:\n")
        for i, sq in enumerate(sub_questions, 1):
            print(f"  {i}. {sq}")
        print()

        # Research each sub-question
        all_findings = []

        # Main question
        main_findings = self.research(question, max_sources=5)
        all_findings.append({
            "question": question,
            "findings": main_findings
        })

        # Sub-questions
        for sq in sub_questions[:3]:  # Limit to 3 to save API calls
            findings = self.research(sq, max_sources=3)
            all_findings.append({
                "question": sq,
                "findings": findings
            })

        # Synthesize everything
        print("\n📊 Creating comprehensive report...\n")
        combined_context = "\n\n".join([
            f"# Research on: {f['question']}\n\n{f['findings']}"
            for f in all_findings
        ])

        final_prompt = f"""
        Based on comprehensive research covering multiple aspects, create a final report
        answering: "{question}"

        RESEARCH FINDINGS:
        {combined_context}

        Create an authoritative report that:
        - Synthesizes all research
        - Identifies key themes
        - Resolves contradictions
        - Provides actionable insights
        - Includes comprehensive citations
        """

        final_report = self.synthesize(final_prompt)
        return final_report
Step 5: Usage Examples
# Basic usage
def main():
    agent = MiniDeepResearch()

    # Simple research
    report = agent.research("What is the SERP API market size in 2025?")
    print(report)

    # Save to file
    with open("research_report.md", "w") as f:
        f.write(report)


if __name__ == "__main__":
    main()
Deep research mode:
# For comprehensive analysis
agent = MiniDeepResearch()
report = agent.deep_research("Analyze AI trends in healthcare 2025")
Batch research:
# Research multiple topics
questions = [
    "AI in finance 2025 trends",
    "Best CRM software for startups",
    "Vector database comparison"
]

for q in questions:
    report = agent.research(q)
    filename = q.replace(" ", "_")[:50] + ".md"
    with open(filename, "w") as f:
        f.write(report)
Step 6: Advanced Optimizations
Parallel Processing
from concurrent.futures import ThreadPoolExecutor


class OptimizedDeepResearch(MiniDeepResearch):
    def research_parallel(self, question: str, max_sources: int = 8) -> str:
        """Research with parallel content extraction"""
        # Search
        search_results = self.search(question, num_results=max_sources)

        # Extract in parallel
        with ThreadPoolExecutor(max_workers=5) as executor:
            futures = [
                executor.submit(self.extract_content, result['link'])
                for result in search_results[:max_sources]
            ]

            extracted_contents = []
            for future, result in zip(futures, search_results[:max_sources]):
                content = future.result()
                if content and content.get('content'):
                    extracted_contents.append({
                        "url": result['link'],
                        "title": result['title'],
                        "content": content['content'][:3000]
                    })

        # Continue with synthesis...
        return self._synthesize_report(question, extracted_contents)
Speed improvement: 3-5x faster
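The parallel variant above hands its results to a `_synthesize_report` helper that isn't shown. A minimal sketch of that helper, assuming it simply reuses the base class's `synthesize` method on the extracted sources, could look like this:

class OptimizedDeepResearch(MiniDeepResearch):
    # ... research_parallel from above ...

    def _synthesize_report(self, question, extracted_contents):
        """Illustrative only: collapse the extracted sources into one report prompt."""
        context = "\n\n---\n\n".join(
            f"SOURCE: {c['title']} ({c['url']})\n{c['content']}"
            for c in extracted_contents
        )
        return self.synthesize(
            f'Based on these sources, answer: "{question}"\n\n{context}\n\n'
            "Provide a comprehensive answer with citations."
        )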
Caching
import hashlib
import json
import os


class CachedDeepResearch(MiniDeepResearch):
    def __init__(self):
        super().__init__()
        self.cache_dir = "research_cache"
        os.makedirs(self.cache_dir, exist_ok=True)

    def _cache_key(self, text: str) -> str:
        return hashlib.md5(text.encode()).hexdigest()

    def cached_search(self, query: str, **kwargs) -> List[Dict]:
        cache_key = self._cache_key(query)
        cache_file = f"{self.cache_dir}/{cache_key}.json"

        if os.path.exists(cache_file):
            with open(cache_file, "r") as f:
                return json.load(f)

        results = self.search(query, **kwargs)

        with open(cache_file, "w") as f:
            json.dump(results, f)

        return results
Cost savings: 50-70% for repeated queries
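To see the cache pay off, call `cached_search` twice with the same query: the first call hits the SERP API and writes a JSON file named after the query's MD5 hash, the second call reads that file without spending another credit.

# First call hits the API and writes research_cache/<md5>.json;
# the second call returns the cached JSON with no network request.
agent = CachedDeepResearch()
fresh = agent.cached_search("vector database comparison")
cached = agent.cached_search("vector database comparison")
assert fresh == cached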
Complete Working Example
#!/usr/bin/env python3
"""
Mini DeepResearch Agent
A simple but functional autonomous research agent
"""
import os
import requests
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()


class MiniDeepResearch:
    def __init__(self):
        self.serp_key = os.getenv("SEARCHCANS_API_KEY")
        self.serp_url = "https://www.searchcans.com/api/search"
        self.reader_url = "https://www.searchcans.com/api/url"
        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    def search(self, query, num=10):
        response = requests.get(
            self.serp_url,
            headers={"Authorization": f"Bearer {self.serp_key}"},
            params={"q": query, "engine": "google", "num": num}
        )
        return response.json().get("organic_results", [])

    def extract(self, url):
        try:
            response = requests.get(
                self.reader_url,
                headers={"Authorization": f"Bearer {self.serp_key}"},
                params={"url": url, "b": "true", "w": 2000},
                timeout=10
            )
            data = response.json()
            return data.get("markdown", "") or data.get("text", "")
        except Exception:
            return ""

    def synthesize(self, prompt):
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content

    def research(self, question):
        # Search
        results = self.search(question, num=8)

        # Extract
        contents = []
        for r in results[:5]:
            content = self.extract(r['link'])
            if content:
                contents.append(f"SOURCE: {r['title']}\nURL: {r['link']}\n\n{content[:2000]}")

        # Synthesize
        context = "\n\n---\n\n".join(contents)
        report = self.synthesize(f"""
        Based on these sources, answer: {question}

        {context}

        Provide a comprehensive answer with citations.
        """)
        return report


# Usage
if __name__ == "__main__":
    agent = MiniDeepResearch()

    question = input("What would you like to research? ")
    print("\nResearching...\n")

    report = agent.research(question)
    print(report)

    # Save
    with open("report.md", "w") as f:
        f.write(f"# Research Report\n\n**Question**: {question}\n\n{report}")

    print("\n✅ Report saved to report.md")
Testing Your Agent
python research_agent.py
Example questions to try:
- “What are the top AI trends in 2025?”
- “Compare React vs Vue.js for enterprise applications”
- “Analyze the electric vehicle market in Europe”
Cost Estimation
For a typical research query:
SERP API: 10 searches × $0.56/1000 = $0.0056
Reader API: 5 extractions × $0.50/1000 = $0.0025
GPT-4: ~100K tokens × $30/1M = $3.00
Total per research: ~$3.01
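The same arithmetic as a quick sanity check in Python (the unit prices are the figures quoted above, not live pricing):

# Rough per-query cost estimate using the prices quoted above
serp_cost = 10 * 0.56 / 1000           # 10 searches at $0.56 per 1,000
reader_cost = 5 * 0.50 / 1000          # 5 extractions at $0.50 per 1,000
gpt4_cost = 100_000 * 30 / 1_000_000   # ~100K tokens at $30 per 1M
total = serp_cost + reader_cost + gpt4_cost
print(f"Estimated cost per research run: ${total:.2f}")  # ≈ $3.01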
Optimization tips:
- Use GPT-3.5 for simple syntheses ($0.50 vs $3); see the sketch after this list
- Cache search results
- Limit max sources for quick research
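One way to act on the first tip is to let `synthesize` accept a model name, so the many per-source analyses run on a cheaper model while the final report stays on GPT-4. A possible variation (use whatever model names your account has access to):

class MiniDeepResearch:
    # ... other methods unchanged ...

    def synthesize(self, prompt: str, model: str = "gpt-4") -> str:
        """Same behavior as before, but callers can choose a cheaper model per call."""
        response = self.client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3
        )
        return response.choices[0].message.content

# Inside research(): cheap model for per-source analyses, GPT-4 for the final report.
#   analysis = self.synthesize(analysis_prompt, model="gpt-3.5-turbo")
#   final_report = self.synthesize(final_prompt, model="gpt-4")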
Next Steps
Enhancements to add:
- Source credibility scoring (a rough sketch follows this list)
- Multi-language support
- PDF export
- Web interface (Flask/Streamlit)
- Scheduled research (cron jobs)
- Collaborative features
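As a starting point for the first item, a naive credibility score could weight sources by domain type before they are fed into synthesis. This heuristic is entirely illustrative and not part of the agent above:

# Illustrative only: a naive domain-based credibility heuristic
from urllib.parse import urlparse

def credibility_score(url: str) -> float:
    """Return a rough 0-1 score based on the source's domain."""
    domain = urlparse(url).netloc.lower()
    if domain.endswith((".gov", ".edu")):
        return 1.0
    if domain.endswith(".org"):
        return 0.8
    return 0.5  # Unknown commercial sites get a neutral score

# Example: sort extracted sources so the most credible come first
# sources.sort(key=lambda s: credibility_score(s["url"]), reverse=True)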
Troubleshooting
Common issues:
# Issue: API key not found
# Solution: Check .env file and load_dotenv()
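# A quick sanity check (not part of the original agent): fail fast if either
# key is missing before any API calls are made.
import os
from dotenv import load_dotenv

load_dotenv()
for key in ("SEARCHCANS_API_KEY", "OPENAI_API_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"{key} is not set - add it to your .env file")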
# Issue: Timeout errors
# Solution: Increase timeout, add retry logic
# Note: for the retry below to trigger, extract() must let requests.Timeout
# propagate (drop its try/except or re-raise the Timeout).
import time

def extract_with_retry(self, url, retries=3):
    """Add this method to MiniDeepResearch to retry extraction with backoff."""
    for attempt in range(retries):
        try:
            return self.extract(url)
        except requests.Timeout:
            if attempt == retries - 1:
                return ""
            time.sleep(2 ** attempt)  # Exponential backoff
Full Code Repository
Get the complete code with additional features:
git clone https://github.com/searchcans/mini-deepresearch
cd mini-deepresearch
pip install -r requirements.txt
python research_agent.py
You now have a functional DeepResearch agent! Experiment, enhance, and build amazing research tools.
Build your own DeepResearch agent with SearchCans APIs. Start free with $5 credits.