Introduction
Static RAG is dead. If you are building an AI application in 2026 that relies solely on a pre-indexed vector database, your model is already hallucinating. The standard has shifted from simple chatbots to Deep Research Agents: systems like Perplexity that browse the live web, read multiple sources, and synthesize up-to-the-minute answers.
You don't need a PhD in Machine Learning to build this. The secret lies in the "Search + Read" architecture: by decoupling discovery (SERP) from consumption (reading), you can build a robust, scalable agent pipeline.
In this guide, we move beyond basic LangChain Google Search tutorials and build a production-ready Python agent with three core capabilities:
- Real-Time Web Search: searches the web for real-time queries using SERP APIs.
- Browser-Based Content Extraction: reads the top results using a browser-based Reader API to handle JavaScript-heavy sites.
- Markdown Synthesis: synthesizes a cited answer using clean Markdown optimized for LLM context windows.
The “Search + Read” Architecture
Most developers fail at building Real-Time RAG because they try to force a single tool to do too much. A robust agent requires two distinct cognitive steps: Discovery and Extraction.
The Discovery Layer (Search API)
This is the “eyes” of your agent. It needs to query a search engine (like Google or Bing) to find relevant URLs. While legacy providers like SerpApi charge premium rates, modern alternatives allow for high-volume querying at a fraction of the cost. The goal here isn’t to get the content, but to get the map—the list of URLs that contain the answer.
The Extraction Layer (Reader API)
This is the “brain” of your ingestion pipeline. Once you have URLs, you cannot simply feed raw HTML to an LLM—it is too noisy and token-expensive. You need a Reader API that renders the JavaScript (handling dynamic sites like React/Next.js) and converts the DOM into Clean Markdown. This format is the Universal Language for AI, minimizing token usage while maximizing semantic understanding.
Why “Reader API” Beats Custom Scrapers
Many tutorials in the current landscape suggest using libraries like BeautifulSoup or Puppeteer. For a production agent, this is a trap.
The Maintenance Nightmare
Building a custom scraper requires maintaining headless browsers, rotating proxies, and constantly updating selectors as websites change their layouts. This “Build vs Buy” calculation often results in hidden costs that derail product roadmaps.
The Context Window Problem
Raw HTML is bloated. A standard news article might be 100 KB of HTML but only 5 KB of actual text. Feeding HTML into your context window dilutes the signal-to-noise ratio. A dedicated Reader API acts as a filter, delivering only the high-value text, headers, and links, formatted specifically for RAG.
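To see the bloat concretely, here is a quick sketch that compares token counts with the tiktoken tokenizer (an assumption; any tokenizer works). The strings are toy stand-ins for a real page and its extracted text:

```python
# Rough illustration of the token overhead of raw HTML vs. clean Markdown.
# Assumes the tiktoken package (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

raw_html = ('<div class="article-body" data-track="impression">'
            '<p><span style="font-weight:700">Solid-state batteries</span> '
            'promise higher energy density.</p></div>')
markdown = "Solid-state batteries promise higher energy density."

print(f"HTML tokens:     {len(enc.encode(raw_html))}")
print(f"Markdown tokens: {len(enc.encode(markdown))}")
# On real pages the gap is far larger: nav bars, scripts, and inline styles
# routinely make the HTML many times the token count of the article text.
```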
Python Implementation: The Research Agent
Let’s build the agent. We will use SearchCans for both the Search and Read layers because it provides a unified developer experience (one API key for both) and is significantly more affordable than combining Serper + Firecrawl.
Prerequisites
Before running the script, you need:
- Python 3.x installed
- The `requests` library (`pip install requests`)
- A SearchCans API key
The Complete Research Agent Class
This script implements the full “Search + Read” loop. It searches for a topic, selects the top results, and then uses the Reader API (in browser mode) to extract the full content.
```python
import requests

# ======= Configuration =======
# Get your API key: https://www.searchcans.com/register/
API_KEY = "YOUR_SEARCHCANS_API_KEY"
SEARCH_ENDPOINT = "https://www.searchcans.com/api/search"
READER_ENDPOINT = "https://www.searchcans.com/api/url"
# =============================


class ResearchAgent:
    def __init__(self, api_key):
        self.api_key = api_key
        self.headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }

    def step_1_search(self, query, limit=3):
        """
        Discovery Layer: Find relevant URLs using Google/Bing SERP.
        """
        print(f"🔎 Searching for: '{query}'...")
        payload = {
            "s": query,
            "t": "google",  # or 'bing'
            "d": 10000,     # Server-side timeout in ms
            "p": 1
        }
        try:
            response = requests.post(
                SEARCH_ENDPOINT,
                headers=self.headers,
                json=payload,
                timeout=15  # Client-side timeout so a hung request can't stall the agent
            )
            data = response.json()
            if data.get("code") == 0:
                results = data.get("data", [])[:limit]
                print(f"✅ Found {len(results)} URLs.")
                return [res["url"] for res in results if "url" in res]
            else:
                print(f"❌ Search Error: {data.get('msg')}")
                return []
        except Exception as e:
            print(f"❌ Network Error: {str(e)}")
            return []

    def step_2_read(self, url):
        """
        Extraction Layer: Convert a URL to clean Markdown.
        Uses 'b': True to enable the headless browser for dynamic sites.
        """
        print(f"📖 Reading: {url}...")
        payload = {
            "s": url,
            "t": "url",
            "w": 2000,   # Wait 2s for JS execution
            "d": 30000,  # Max server-side timeout 30s
            "b": True    # Enable Browser Mode (crucial for the modern web)
        }
        try:
            response = requests.post(
                READER_ENDPOINT,
                headers=self.headers,
                json=payload,
                timeout=35  # Client-side timeout, slightly above the API's 30s max
            )
            data = response.json()
            if data.get("code") == 0:
                # SearchCans returns structured data including markdown
                content_data = data.get("data", {})
                # Handle case where data might be a string or dict
                if isinstance(content_data, dict):
                    return content_data.get("markdown", "")
                elif isinstance(content_data, str):
                    return content_data
            else:
                print(f"❌ Read Error: {data.get('msg')}")
            return None
        except Exception as e:
            print(f"❌ Read Error: {str(e)}")
            return None

    def run(self, topic):
        """
        Orchestrates the research workflow.
        """
        # Step 1: Discovery
        urls = self.step_1_search(topic)
        if not urls:
            print("No sources found.")
            return

        context = []
        # Step 2: Extraction (Search + Read)
        for url in urls:
            markdown = self.step_2_read(url)
            if markdown:
                # Truncate for demo purposes to fit the context window
                snippet = markdown[:1500]
                context.append(f"Source: {url}\nContent:\n{snippet}\n---")

        # Step 3: Synthesis (mental model)
        # In a full app, you would send `full_context` to OpenAI/Claude here.
        full_context = "\n".join(context)
        print("\n🤖 === Agent Context Constructed === 🤖")
        print(f"Total Sources: {len(context)}")
        print(f"Total Characters: {len(full_context)}")

        # Preview the context
        print("\n--- Preview ---\n")
        print(full_context[:500] + "...")
        return full_context


if __name__ == "__main__":
    agent = ResearchAgent(API_KEY)
    # Example: a query that requires real-time data
    agent.run("latest developments in solid state batteries 2026")
```
Optimizing for Production
The script above is a solid foundation. To scale it into a "Deep Research" product like those built by the major AI labs, you need to consider two critical factors: concurrency and parsing.
Parallel Execution
A linear loop (read URL 1, then URL 2…) is too slow for user-facing agents. In a production environment, you should use Python’s asyncio or ThreadPoolExecutor to hit the Reader API for all discovered URLs simultaneously. SearchCans supports high concurrency, allowing you to fetch 10+ pages in parallel, reducing total latency to under 5 seconds.
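Here is a minimal sketch of that pattern with `ThreadPoolExecutor`, reusing the `ResearchAgent` class from the script above:

```python
# Parallel extraction sketch: fetch every discovered URL concurrently.
# Assumes the ResearchAgent class defined earlier in this guide.
from concurrent.futures import ThreadPoolExecutor

def read_all_parallel(agent, urls, max_workers=10):
    """Run step_2_read for every URL in parallel and keep successful reads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results stay aligned with URLs
        results = list(pool.map(agent.step_2_read, urls))
    # step_2_read returns None on failure; drop those entries
    return [(url, md) for url, md in zip(urls, results) if md]

# Usage: sources = read_all_parallel(agent, urls)
```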
Markdown Parsing
The output from the Reader API is Markdown. This allows you to perform Intelligent Chunking. Instead of arbitrarily cutting text every 500 characters, you can split by headers (#, ##) to keep semantic sections together. This significantly improves the retrieval quality in the final RAG step.
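A header-aware chunker only takes a few lines. This sketch splits the Markdown before each H1/H2 heading so each chunk remains a coherent section:

```python
# Header-aware chunking sketch: split Markdown on top-level headings
# instead of arbitrary character offsets.
import re

def chunk_by_headers(markdown: str) -> list[str]:
    # Split before any line that starts with '# ' or '## '
    chunks = re.split(r"\n(?=#{1,2} )", markdown)
    return [c.strip() for c in chunks if c.strip()]
```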
Pro Tip: Context Window Engineering
When building production agents, implement a "relevance scoring" layer before sending content to your LLM. Use a lightweight embedding model to score each Markdown chunk's relevance to the original query, then send only the top 5-10 chunks. This can cut token costs by as much as 70% while maintaining answer quality.
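Here is one way that scoring layer might look, assuming the sentence-transformers package as the lightweight embedding model (any embedding API would work the same way):

```python
# Relevance-scoring sketch. Assumes sentence-transformers is installed
# (pip install sentence-transformers); swap in any embedding model or API.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, fast, CPU-friendly

def top_chunks(query: str, chunks: list[str], k: int = 10) -> list[str]:
    """Return the k chunks most similar to the query by cosine similarity."""
    vectors = model.encode([query] + chunks, normalize_embeddings=True)
    scores = vectors[1:] @ vectors[0]  # dot product of unit vectors = cosine
    ranked = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in ranked]
```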
Cost Comparison: SearchCans vs. The Rest
Building an AI Agent can get expensive if you stack multiple SaaS subscriptions. Here is how the unified approach stacks up against assembling disparate tools.
| Stack Strategy | Search Provider | Reader/Scraper Provider | Est. Cost per 10k Runs | Complexity |
|---|---|---|---|---|
| The Fragmented Stack | Serper ($50) | Firecrawl ($200+) | ~$250+ | High (2 APIs) |
| The Legacy Stack | SerpApi ($200+) | Zyte/ScrapingBee ($100+) | ~$300+ | High (2 APIs) |
| The Unified Stack | SearchCans | SearchCans (Included) | ~$5.60 | Low (1 API) |
Note: Pricing estimates based on standard tiers. SearchCans pricing model ($0.56/1k) applies to both search and read operations, offering significant volume savings.
Frequently Asked Questions
How does this compare to LangChain’s built-in tools?
LangChain provides convenient wrappers for search tools, but you still need to bring your own API provider like SerpApi or Tavily. The architecture shown here is framework-agnostic and can be integrated into LangChain, LlamaIndex, or any custom pipeline. The key advantage is using a unified API that handles both search and content extraction, reducing integration complexity and cost. For LangChain users, you can wrap the SearchCans API in a custom tool class.
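A minimal sketch of such a wrapper, assuming the langchain-core package and the `ResearchAgent` class defined earlier:

```python
# Hypothetical LangChain tool wrapper around the ResearchAgent from this guide.
# Assumes langchain-core is installed (pip install langchain-core).
from langchain_core.tools import tool

agent = ResearchAgent(API_KEY)

@tool
def web_research(query: str) -> str:
    """Search the live web and return cited Markdown context for a query."""
    # run() returns the assembled context string, or None if no sources found
    return agent.run(query) or "No results found."
```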
Can I use this for non-English queries?
Absolutely. SearchCans supports multi-language queries through both Google and Bing search engines. The Reader API automatically detects the page language and preserves the original text in Markdown format. For optimal results with non-English content, ensure your LLM supports the target language. Many developers use this architecture for multilingual market research, monitoring news across different regions simultaneously.
What about rate limits and scaling?
SearchCans is designed for high-volume applications with no hard rate limits on the API side. For production deployments, implement exponential backoff retry logic and consider using a queue system like Celery or AWS SQS to manage burst traffic. The unified API model means you only need to manage one set of credentials and one rate-limiting strategy, unlike fragmented stacks where each provider has different limits.
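For example, a simple backoff wrapper might look like this (the retry counts and delays are illustrative, not prescriptive):

```python
# Exponential-backoff sketch for any of the API calls in this guide.
import time

def with_backoff(fn, *args, retries=4, base_delay=1.0, **kwargs):
    """Call fn, retrying on exceptions with exponentially growing delays."""
    for attempt in range(retries):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == retries - 1:
                raise  # Out of retries: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Usage: markdown = with_backoff(agent.step_2_read, url)
```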
Conclusion
The era of static knowledge bases is ending. To build an AI application that provides value in 2026, you must give it access to the live web.
By adopting the Search + Read architecture, you clear the two biggest hurdles in AI development: finding the right data and cleaning it for the model. Using a unified API for both steps not only simplifies your code but also drastically reduces your infrastructure costs.
Ready to build your own Perplexity clone?
Get your API key and start building with 100 free credits at https://www.searchcans.com/register/.