
Build a Real-Time Hybrid RAG Pipeline: Python & SearchCans Tutorial

Learn to build a Hybrid RAG pipeline that feeds real-time web data to ChatGPT. A step-by-step Python tutorial using SearchCans SERP and Reader APIs.


The biggest limitation of standard RAG (Retrieval-Augmented Generation) is freshness. Your vector database is only as good as its last update.

If you ask your AI agent about an event that happened an hour ago, it hallucinates or fails.

To solve this, developers are moving towards Hybrid RAG: combining a static knowledge base with real-time web access.

However, existing solutions are often clunky. Using a standard WebBrowser tool in LangChain can be slow and resource-heavy. Scraping raw HTML fills your context window with noise.

In this tutorial, we will build a lightweight, fast Hybrid RAG pipeline using Python and SearchCans. We will fetch live search results and convert them into clean markdown for RAG, giving your LLM accurate, real-time “eyes.”

Prerequisites

We will use Python. You will need a SearchCans API key (which includes both SERP and Reader capabilities) and an OpenAI API key.

pip install requests openai

The Concept: Search + Read

Our pipeline follows a simple logic that mimics human research:

  1. Search: Query Google via API to find relevant URLs.
  2. Read: Fetch the full content of those URLs in clean Markdown.
  3. Synthesize: Feed the Markdown into the LLM to answer the user’s question.

This approach solves the common “403 Forbidden” or blocking issues developers face when trying to scrape directly. By using SearchCans, you get both discovery (SERP) and extraction (Reader) in one reliable platform.

Implement the SearchCans Client

Instead of juggling multiple libraries (like BeautifulSoup or Selenium), we can handle both discovery and extraction with one API provider.

Create a file named rag_pipeline.py:

import requests

SEARCHCANS_API_KEY = "YOUR_KEY_HERE"

def get_realtime_context(query):
    """
    1. Searches for the query.
    2. Fetches clean markdown from the top result.
    """
    # Phase 1: Search (SERP API)
    print(f"🔍 Searching for: {query}...")
    search_url = "https://www.searchcans.com/api/search"
    search_params = {
        "q": query,
        "engine": "google",
        "num": 3  # We only need top results for this demo
    }
    headers = {"Authorization": f"Bearer {SEARCHCANS_API_KEY}"}
    
    try:
        search_resp = requests.get(search_url, params=search_params, headers=headers, timeout=30)
        search_resp.raise_for_status()
        results = search_resp.json().get("organic_results", [])
        
        if not results:
            return "No results found."

        # Phase 2: Read (Reader API)
        top_url = results[0]['link']
        print(f"📖 Reading content from: {top_url}...")
        
        reader_url = "https://www.searchcans.com/api/url"
        reader_params = {
            "url": top_url,
            "b": "true",  # Use headless browser for dynamic JS
            "w": 2000     # Wait 2000ms for content to load
        }
        
        reader_resp = requests.get(reader_url, params=reader_params, headers=headers, timeout=60)
        reader_resp.raise_for_status()
        data = reader_resp.json()
        content = data.get("markdown", "") or data.get("text", "")
        
        # Limit context to prevent token overflow
        return f"Source: {top_url}\n\nContent:\n{content[:8000]}"
        
    except Exception as e:
        return f"Error fetching data: {str(e)}"

Notice the key differences from raw scraping:

  - Authorization header: the API key travels in an Authorization: Bearer header, not as a URL parameter.
  - Reader API format: the /api/url endpoint returns structured JSON with a markdown field.
  - Browser mode: b=true tells SearchCans to render JavaScript before extracting content.
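
Because both endpoints share the same authentication and response shape, extending the pipeline to read several sources instead of one is straightforward. Here is a sketch reusing the endpoints and parameters assumed above (error handling omitted for brevity):

def get_multi_source_context(query, max_sources=3, per_source_chars=4000):
    """Like get_realtime_context, but reads the top few results instead of one."""
    headers = {"Authorization": f"Bearer {SEARCHCANS_API_KEY}"}
    search_params = {"q": query, "engine": "google", "num": max_sources}
    resp = requests.get("https://www.searchcans.com/api/search",
                        params=search_params, headers=headers, timeout=30)
    results = resp.json().get("organic_results", [])[:max_sources]

    chunks = []
    for result in results:
        reader_params = {"url": result["link"], "b": "true", "w": 2000}
        page = requests.get("https://www.searchcans.com/api/url",
                            params=reader_params, headers=headers, timeout=60)
        data = page.json()
        content = data.get("markdown", "") or data.get("text", "")
        chunks.append(f"Source: {result['link']}\n\n{content[:per_source_chars]}")

    return "\n\n---\n\n".join(chunks) or "No results found."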

Feed Real-Time Data to ChatGPT

Now we connect this context to the LLM. This allows us to feed real-time data to ChatGPT dynamically.

from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_KEY")

def ask_hybrid_agent(user_question):
    # 1. Get live context from the web
    context = get_realtime_context(user_question)
    
    # 2. Construct the prompt
    system_prompt = """
    You are a helpful AI assistant with real-time internet access. 
    Answer the user's question using the provided Context. 
    If the Context doesn't help, say so.
    """
    
    user_message = f"""
    Question: {user_question}
    
    ---
    Context from Web:
    {context}
    """

    # 3. Generate Answer
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ]
    )
    
    return response.choices[0].message.content

# Run it
if __name__ == "__main__":
    q = "What is the current stock price of NVIDIA and recent news?"
    answer = ask_hybrid_agent(q)
    print("\n🤖 Agent Answer:\n")
    print(answer)

When you run this, the system will:

  1. Search Google for “current stock price of NVIDIA and recent news”
  2. Fetch the top result (likely a financial news site)
  3. Extract clean Markdown content
  4. Feed it to ChatGPT for synthesis

The entire flow typically completes in just a few seconds.
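
One hardening note: the snippets above hardcode API keys for clarity. In a real project, load them from environment variables instead (the variable names below are conventions, not requirements of either API):

import os

from openai import OpenAI

# Assumes you've exported both keys in your shell beforehand.
SEARCHCANS_API_KEY = os.environ["SEARCHCANS_API_KEY"]
client = OpenAI()  # the OpenAI client reads OPENAI_API_KEY from the environment automatically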

Why Clean Markdown Matters for RAG

You might ask: Why not just dump the HTML text?

HTML is structurally messy. It contains scripts, styles, and navigation elements that confuse embedding models. Markdown, on the other hand, preserves the semantic structure (headers, lists, tables) that LLMs rely on to understand hierarchy.

By using the SearchCans Reader API, you convert 100 KB of messy HTML into 5 KB of semantic Markdown. This drastically reduces your OpenAI bill and improves answer accuracy.
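
The difference is easy to verify with a tokenizer. Here is a toy comparison using tiktoken, with two stand-in strings representing the same headline fetched as raw HTML versus Reader-cleaned Markdown:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer family used by recent OpenAI chat models

# Stand-ins: the same headline as raw HTML vs. Markdown
html_version = '<div class="header"><nav><ul><li><a href="/">Home</a></li></ul></nav></div><h1 class="title">NVIDIA Q3 Results</h1>'
markdown_version = "# NVIDIA Q3 Results"

print("HTML tokens:    ", len(enc.encode(html_version)))      # markup overhead counts as tokens
print("Markdown tokens:", len(enc.encode(markdown_version)))  # same information, far fewer tokens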

If you’re building a production RAG system, this token efficiency matters. For more on this topic, read our Markdown vs HTML RAG Benchmark.

Advanced: Routing Logic

For a production Hybrid RAG system, you should add a Router that decides when to use the vector DB versus live search. This prevents unnecessary API calls for questions your internal docs can answer.
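
Here is a minimal sketch of that pattern. It assumes a vector store exposing a similarity search that returns (text, score) pairs; the interface, score threshold, and freshness heuristics are illustrative, not part of any specific library:

FRESHNESS_HINTS = ("today", "latest", "current", "now", "this week", "price")

def route(question, vector_db, min_score=0.75):
    """Decide whether to answer from the vector DB or from live search."""
    # Obvious freshness cues go straight to the web
    if any(hint in question.lower() for hint in FRESHNESS_HINTS):
        return get_realtime_context(question)

    # Hypothetical interface: returns a list of (text, score) pairs
    hits = vector_db.search(question, top_k=3)
    if hits and hits[0][1] >= min_score:
        return "\n\n".join(text for text, _score in hits)

    # No confident internal match: fall back to live search
    return get_realtime_context(question)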

For a deep dive into this pattern, see our Adaptive RAG Router Architecture guide.

Cost Analysis

Let’s compare the economics of Hybrid RAG with SearchCans:

Traditional Approach

Scrape with Selenium + Parse with BeautifulSoup = 30 seconds per page + maintenance hell

SearchCans Approach

API call = 0.5 seconds + $0.00056 per request

For an agent making 1,000 queries per day, that’s $0.56/day ($17/month). Compare this to the cost of maintaining custom scrapers and you’ll see why API-first is the future.

For a detailed pricing comparison, check out our SERP API Pricing Index 2026.

Conclusion

Building a Real-Time Hybrid RAG system doesn’t require complex agent frameworks or headless browsers. With simple Python logic and the right APIs, you can give your application live internet access today.

SearchCans simplifies this by offering the Search (to find data) and the Reader (to clean data) in a single, high-performance platform.

