The biggest limitation of standard RAG (Retrieval-Augmented Generation) is freshness. Your vector database is only as good as its last update.
If you ask your AI agent about an event that happened an hour ago, it hallucinates or fails.
To solve this, developers are moving towards Hybrid RAG: combining a static knowledge base with real-time web access.
However, existing solutions are often clunky. Using a standard WebBrowser tool in LangChain can be slow and resource-heavy. Scraping raw HTML fills your context window with noise.
In this tutorial, we will build a lightweight, fast Hybrid RAG pipeline using Python and SearchCans. We will fetch live search results and convert them into clean markdown for RAG, giving your LLM accurate, real-time “eyes.”
Prerequisites
We will use Python. You will need a SearchCans API key (which covers both SERP and Reader capabilities) and an OpenAI API key.
```bash
pip install requests openai
```
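In the snippets below we hardcode keys for brevity. In practice, prefer loading them from environment variables; here is a minimal sketch (the variable names are our own convention, not required by either API):

```python
import os

# Load API keys from the environment instead of hardcoding them.
# Set these in your shell first, e.g.: export SEARCHCANS_API_KEY="..."
SEARCHCANS_API_KEY = os.environ["SEARCHCANS_API_KEY"]
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
```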
The Concept: Search + Read
Our pipeline follows a simple logic that mimics human research:
- Search: Query Google via API to find relevant URLs.
- Read: Fetch the full content of those URLs in clean Markdown.
- Synthesize: Feed the Markdown into the LLM to answer the user’s question.
This approach solves the common “403 Forbidden” or blocking issues developers face when trying to scrape directly. By using SearchCans, you get both discovery (SERP) and extraction (Reader) in one reliable platform.
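In code, the whole pipeline boils down to three composed steps. The helpers below are placeholders to show the shape of the flow; the real implementations follow in the next sections:

```python
# Shape of the Hybrid RAG pipeline (placeholder helpers; built out below)
def search(question: str) -> list[str]:
    """Placeholder: return candidate URLs for the question."""
    ...

def read(url: str) -> str:
    """Placeholder: return the page content as clean Markdown."""
    ...

def synthesize(question: str, context: str) -> str:
    """Placeholder: ask the LLM to answer using the context."""
    ...

def hybrid_answer(question: str) -> str:
    urls = search(question)                # 1. Search
    context = read(urls[0])                # 2. Read
    return synthesize(question, context)   # 3. Synthesize
```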
Implement the SearchCans Client
Instead of juggling multiple libraries (like BeautifulSoup or Selenium), we can handle both discovery and extraction with one API provider.
Create a file named rag_pipeline.py:
```python
import requests

SEARCHCANS_API_KEY = "YOUR_KEY_HERE"


def get_realtime_context(query):
    """
    1. Searches for the query.
    2. Fetches clean markdown from the top result.
    """
    # Phase 1: Search (SERP API)
    print(f"🔍 Searching for: {query}...")
    search_url = "https://www.searchcans.com/api/search"
    search_params = {
        "q": query,
        "engine": "google",
        "num": 3,  # We only need top results for this demo
    }
    headers = {"Authorization": f"Bearer {SEARCHCANS_API_KEY}"}

    try:
        search_resp = requests.get(
            search_url, params=search_params, headers=headers, timeout=30
        )
        results = search_resp.json().get("organic_results", [])
        if not results:
            return "No results found."

        # Phase 2: Read (Reader API)
        top_url = results[0]["link"]
        print(f"📖 Reading content from: {top_url}...")
        reader_url = "https://www.searchcans.com/api/url"
        reader_params = {
            "url": top_url,
            "b": "true",  # Use headless browser for dynamic JS
            "w": 2000,    # Wait 2000ms for content to load
        }
        reader_resp = requests.get(
            reader_url, params=reader_params, headers=headers, timeout=30
        )
        data = reader_resp.json()
        content = data.get("markdown", "") or data.get("text", "")

        # Limit context to prevent token overflow
        return f"Source: {top_url}\n\nContent:\n{content[:8000]}"
    except Exception as e:
        return f"Error fetching data: {str(e)}"
```
Notice the key differences from raw scraping:
- Authorization header: We pass the API key via `Authorization: Bearer` (not as a URL parameter).
- Reader API format: The `/api/url` endpoint returns structured JSON with a `markdown` field.
- Browser mode: `b=true` tells SearchCans to render JavaScript before extracting content.
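Before wiring this into an LLM, it is worth sanity-checking the retrieval step on its own. For example, temporarily append something like this to `rag_pipeline.py` (the query string is arbitrary):

```python
# Temporary smoke test: preview what the retrieval step returns
if __name__ == "__main__":
    context = get_realtime_context("latest Python release")
    print(context[:500])  # first 500 characters are enough to eyeball
```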
Feed Real-Time Data to ChatGPT
Now we connect this context to the LLM. This allows us to feed real-time data to ChatGPT dynamically.
```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_KEY")


def ask_hybrid_agent(user_question):
    # 1. Get live context from the web
    context = get_realtime_context(user_question)

    # 2. Construct the prompt
    system_prompt = """
    You are a helpful AI assistant with real-time internet access.
    Answer the user's question using the provided Context.
    If the Context doesn't help, say so.
    """
    user_message = f"""
    Question: {user_question}
    ---
    Context from Web:
    {context}
    """

    # 3. Generate Answer
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ]
    )
    return response.choices[0].message.content


# Run it
if __name__ == "__main__":
    q = "What is the current stock price of NVIDIA and recent news?"
    answer = ask_hybrid_agent(q)
    print("\n🤖 Agent Answer:\n")
    print(answer)
```
When you run this, the system will:
- Search Google for “current stock price of NVIDIA and recent news”
- Fetch the top result (likely a financial news site)
- Extract clean Markdown content
- Feed it to ChatGPT for synthesis
This entire flow typically completes in under 3 seconds.
Why Clean Markdown Matters for RAG
You might ask: Why not just dump the HTML text?
HTML is structurally messy. It contains scripts, styles, and navigation elements that confuse embedding models. Markdown, on the other hand, preserves the semantic structure (headers, lists, tables) that LLMs rely on to understand hierarchy.
By using the SearchCans Reader API, you can turn roughly 100 KB of messy HTML into about 5 KB of semantic Markdown. This drastically reduces your OpenAI bill and improves answer accuracy.
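If you want to measure the savings yourself, you can count tokens with the `tiktoken` package (`pip install tiktoken`). The sample strings and encoding choice below are illustrative:

```python
import tiktoken

# cl100k_base is the encoding used by many OpenAI chat models
enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

raw_html = "<html><head><script>...</script></head><body>...</body></html>"
clean_md = "# Title\n\n- point one\n- point two"

print("HTML tokens:    ", count_tokens(raw_html))
print("Markdown tokens:", count_tokens(clean_md))
```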
If you’re building a production RAG system, this token efficiency matters. For more on this topic, read our Markdown vs HTML RAG Benchmark.
Advanced: Routing Logic
For a production Hybrid RAG system, you should add a Router that decides when to use the vector DB versus live search. This prevents unnecessary API calls for questions your internal docs can answer.
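A minimal version of this router can be a simple heuristic. The sketch below uses a keyword check; `search_vector_db` is a hypothetical stand-in for your existing vector-store lookup, and production systems often replace the heuristic with an LLM or embedding-similarity check:

```python
# Illustrative keyword heuristic: route "fresh" questions to live search
TIME_SENSITIVE_HINTS = ("current", "today", "latest", "price", "news")

def search_vector_db(question: str) -> str:
    """Hypothetical placeholder for your internal vector-store lookup."""
    raise NotImplementedError

def route(question: str) -> str:
    if any(hint in question.lower() for hint in TIME_SENSITIVE_HINTS):
        return get_realtime_context(question)  # fresh data: hit the live web
    return search_vector_db(question)          # stable knowledge: use internal docs
```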
For a deep dive into this pattern, see our Adaptive RAG Router Architecture guide.
Cost Analysis
Let’s compare the economics of Hybrid RAG with SearchCans:
- Traditional approach: Scrape with Selenium + parse with BeautifulSoup ≈ 30 seconds per page, plus maintenance hell.
- SearchCans approach: One API call ≈ 0.5 seconds, at $0.00056 per request.
For an agent making 1,000 queries per day, that’s $0.56/day ($17/month). Compare this to the cost of maintaining custom scrapers and you’ll see why API-first is the future.
For a detailed pricing comparison, check out our SERP API Pricing Index 2026.
Conclusion
Building a Real-Time Hybrid RAG system doesn't require complex agent frameworks or self-managed headless browsers. With simple Python logic and the right APIs, you can give your application live internet access today.
SearchCans simplifies this by offering the Search (to find data) and the Reader (to clean data) in a single, high-performance platform.
Resources
Related Topics:
- URL to Markdown API Benchmark - Compare tools like Jina and Firecrawl
- AI Agent Internet Access Architecture - Deep dive into Agent architecture
- Self-Correcting RAG (CRAG) Tutorial - Building verification loops
- Context Window Engineering - Maximize token efficiency
- Deep Research Agent with LangGraph - Complex agent workflows
- Optimizing Vector Embeddings - Clean data for better RAG
Get Started:
- Free Trial - Get 100 free credits
- API Documentation - Technical reference
- Pricing - Transparent costs
- Playground - Test in browser
SearchCans provides real-time data for AI agents. Start building now →