The era of “Chatbots” is ending. The era of “Deep Research Agents” has begun.
Recent releases from OpenAI and Gemini have shifted the focus from simple Q&A to autonomous investigation. A “Deep Research” agent doesn’t just answer a question; it forms a plan, executes multiple rounds of Google searches, reads dozens of pages, and synthesizes a comprehensive report.
To build this, you need two things:
- A Cyclic Framework: LangGraph allows us to build stateful, looping workflows where the agent can “change its mind” and search again.
- Unthrottled Vision: A research agent might trigger 50+ API calls in a minute. Standard SERP APIs with rate limits will crash your workflow. SearchCans provides the “No Rate Limit” infrastructure required for autonomous loops.
In this tutorial, we will build a simplified Deep Research Agent that can browse the web to answer complex questions.
The Architecture: Plan, Research, Review
Unlike a linear LangChain pipeline, our agent behaves like a state machine:
- Planner Node: Breaks the user request into sub-queries.
- Researcher Node (The “Eyes”): Uses SearchCans to search Google and read page content.
- Reviewer Node: Checks if the gathered info is sufficient. If no, it loops back to the Researcher.
Step 1: Define the Agent State
First, we define the State that is passed between nodes. This acts as the agent’s short-term memory.
```python
from typing import TypedDict, List

class AgentState(TypedDict):
    question: str
    plan: List[str]
    documents: List[str]
    final_answer: str
```
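Before wiring nodes together, it helps to see how updates flow through this state: each node returns only the keys it changed, and LangGraph merges them into the full `AgentState`. A rough sketch of that merge using plain dict semantics (a simplification of LangGraph's actual reducer machinery):

```python
# Illustration only: a node returns a partial update, and the framework
# merges it into the state. For plain (non-annotated) keys this behaves
# roughly like a dict update.
state = {
    "question": "What is LangGraph?",
    "plan": ["q1", "q2"],
    "documents": [],
    "final_answer": "",
}

# A node that completed "q1" returns only the keys it changed:
node_update = {"documents": ["...page content..."], "plan": ["q2"]}

state = {**state, **node_update}  # simplified stand-in for LangGraph's merge
print(state["plan"])             # ['q2']
print(len(state["documents"]))   # 1
```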
Step 2: The “Researcher” Tool (SearchCans Integration)
This is the critical component. The agent needs to see the web. We use SearchCans to combine Discovery (SERP) and Extraction (Reader) in one robust function.
```python
import requests

class ResearchTool:
    def __init__(self, api_key):
        self.api_key = api_key
        self.search_url = "https://www.searchcans.com/api/search"
        self.reader_url = "https://www.searchcans.com/api/url"
        self.headers = {"Authorization": f"Bearer {api_key}"}

    def search_and_read(self, query):
        print(f"Researching: {query}")
        # 1. Search Google
        params = {"q": query, "engine": "google", "num": 1}
        try:
            resp = requests.get(
                self.search_url, params=params, headers=self.headers, timeout=30
            )
            results = resp.json().get("organic_results", [])
            if not results:
                return "No results found."
            # 2. Read the top result
            top_link = results[0]["link"]
            return self._read_url(top_link)
        except Exception as e:
            return f"Error: {str(e)}"

    def _read_url(self, url):
        # Use headless browser for dynamic content
        params = {"url": url, "b": "true", "w": 2000}
        try:
            resp = requests.get(
                self.reader_url, params=params, headers=self.headers, timeout=60
            )
            data = resp.json()
            return data.get("markdown", "") or data.get("text", "")
        except Exception:
            return "Failed to read content."
```

Note that both requests carry explicit timeouts; without them, a single hung connection can stall the entire agent loop.
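Network calls inside an autonomous loop will occasionally fail transiently, and you usually want to retry rather than abort the whole research run. A minimal retry sketch (the `flaky_search` stub is hypothetical and stands in for a real `search_and_read` call):

```python
import time

def with_retries(fn, attempts=3, delay=0.0):
    """Call fn(); retry on exception up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as e:
            last_error = e
            time.sleep(delay)  # back off before retrying
    return f"Error after {attempts} attempts: {last_error}"

# Stub that fails twice, then succeeds -- stands in for a real API call
calls = {"n": 0}
def flaky_search():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "page content"

print(with_retries(flaky_search))  # page content
```

Returning an error string instead of raising keeps the graph moving, matching how `search_and_read` already degrades gracefully.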
Step 3: Building the Graph
Now we assemble the nodes. For brevity, we focus on the Researcher Node logic, which drives the external interaction.
```python
from langgraph.graph import StateGraph, END

# Initialize Tool
researcher = ResearchTool(api_key="YOUR_SEARCHCANS_KEY")

def research_node(state: AgentState):
    # Get the next query from the plan
    current_query = state["plan"][0]
    # Execute SearchCans lookup
    content = researcher.search_and_read(current_query)
    # Update State
    new_docs = state["documents"] + [content]
    new_plan = state["plan"][1:]  # Remove completed task
    return {"documents": new_docs, "plan": new_plan}

def should_continue(state: AgentState):
    if not state["plan"]:
        return "synthesize"  # No more steps, write answer
    return "research"  # Continue researching

# Build Graph
workflow = StateGraph(AgentState)
workflow.add_node("research", research_node)
# ... add planner and synthesizer nodes ...
workflow.set_entry_point("research")
workflow.add_conditional_edges("research", should_continue, {
    "research": "research",
    "synthesize": END
})
app = workflow.compile()
```
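Because `should_continue` is a pure function of the state, you can sanity-check the routing logic in isolation before compiling the graph. A quick sketch (the function is redefined here so the snippet is self-contained):

```python
def should_continue(state):
    if not state["plan"]:
        return "synthesize"  # No more steps, write answer
    return "research"        # Continue researching

# With pending sub-queries, the graph loops back to the researcher:
print(should_continue({"plan": ["remaining query"], "documents": []}))  # research

# Once the plan is exhausted, it routes to synthesis:
print(should_continue({"plan": [], "documents": ["doc1", "doc2"]}))     # synthesize
```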
Step 4: The Planner Node
The planner breaks down complex questions into research steps:
```python
import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_KEY")

def planner_node(state: AgentState):
    question = state["question"]
    prompt = f"""
    Break down this research question into 3-5 specific sub-questions
    that can be answered by web searches:
    Question: {question}
    Return as JSON: {{"plan": ["query1", "query2", ...]}}
    """
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"}
    )
    plan_data = json.loads(response.choices[0].message.content)
    return {"plan": plan_data["plan"]}
```
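One caveat: `response_format={"type": "json_object"}` guarantees syntactically valid JSON, not that the `"plan"` key actually holds a list of strings. A little validation around the parse is cheap insurance. A sketch, using a canned response string in place of a live API call (the `parse_plan` helper is our own illustration, not part of the OpenAI SDK):

```python
import json

def parse_plan(raw: str) -> list:
    """Parse the planner's JSON; fall back to an empty plan on bad shape."""
    try:
        data = json.loads(raw)
        plan = data.get("plan", [])
        if isinstance(plan, list) and all(isinstance(q, str) for q in plan):
            return plan
    except json.JSONDecodeError:
        pass
    return []

# Canned model output standing in for response.choices[0].message.content
raw = '{"plan": ["langgraph state machines", "searchcans reader api"]}'
print(parse_plan(raw))         # two well-formed queries
print(parse_plan("not json"))  # []
```

An empty plan simply routes the graph straight to synthesis, which is a safer failure mode than a `KeyError` mid-run.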
Step 5: The Synthesizer Node
After gathering all documents, synthesize the final answer:
```python
def synthesizer_node(state: AgentState):
    question = state["question"]
    documents = state["documents"]
    context = "\n\n---\n\n".join(documents)
    prompt = f"""
    Based on the following research documents, provide a comprehensive
    answer to the question.
    Question: {question}
    Documents:
    {context}
    Provide a detailed, well-structured answer with citations.
    """
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return {"final_answer": response.choices[0].message.content}
```
Complete Workflow
Putting it all together:
```python
# Build the complete graph
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("planner", planner_node)
workflow.add_node("research", research_node)
workflow.add_node("synthesize", synthesizer_node)

# Define edges
workflow.set_entry_point("planner")
workflow.add_edge("planner", "research")
workflow.add_conditional_edges("research", should_continue, {
    "research": "research",
    "synthesize": "synthesize"
})
workflow.add_edge("synthesize", END)

# Compile
app = workflow.compile()

# Run the agent
result = app.invoke({"question": "What are the latest trends in AI agent development?"})
print(result["final_answer"])
```
Why “Deep Research” Needs SearchCans
When you run this agent, it might decide to read 5 different technical papers to answer one question.
Concurrency
It loops fast. A rate-limited API will fail after the 3rd step. SearchCans handles the loop seamlessly.
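Because each sub-query in the plan is independent, the loop can also fan out instead of iterating one step at a time. A sketch with `asyncio.gather` and a stubbed async search (a real version would call SearchCans through an async HTTP client such as `aiohttp` or `httpx`; `search_stub` is our placeholder):

```python
import asyncio

async def search_stub(query: str) -> str:
    # Stand-in for an async SearchCans lookup (e.g. via aiohttp/httpx)
    await asyncio.sleep(0.01)
    return f"results for: {query}"

async def research_all(queries):
    # Fire all sub-queries concurrently instead of one per graph step
    return await asyncio.gather(*(search_stub(q) for q in queries))

docs = asyncio.run(research_all(["query A", "query B", "query C"]))
print(len(docs))  # 3
```

This is exactly the access pattern that trips rate-limited APIs: three (or thirty) requests landing in the same instant.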
Context Quality
The _read_url function returns Markdown. This is crucial. Feeding raw HTML into your AgentState will overflow the token limit very quickly. SearchCans optimizes the “signal-to-noise” ratio for you.
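To see why this matters, compare rough token footprints of the same content as raw HTML versus Markdown. The sketch below uses the common ~4 characters per token heuristic for English text; real counts require an actual tokenizer such as `tiktoken`, and the HTML snippet is an invented example:

```python
def rough_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token for English text
    return len(text) // 4

# The same sentence, wrapped in typical page markup vs. plain Markdown
raw_html = "<div class='post-content'><p>LangGraph is a framework.</p></div>" * 100
markdown = "LangGraph is a framework.\n" * 100

print(rough_tokens(raw_html))   # markup more than doubles the footprint
print(rough_tokens(markdown))
```

Multiply that overhead across dozens of documents in `AgentState` and raw HTML exhausts the context window long before the research plan is done.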
Production Enhancements
For production deployments, consider:
- Parallel Research: Use `asyncio` to search multiple queries simultaneously
- Result Caching: Cache recent searches to avoid duplicate API calls
- Budget Limits: Set a maximum number of research iterations
- Error Handling: Gracefully handle failed searches
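Result caching can be as simple as a dict keyed on the query string. A minimal sketch (the `expensive_search` stub stands in for a real SearchCans call; production code would add TTL-based eviction so stale results expire):

```python
call_count = {"n": 0}

def expensive_search(query: str) -> str:
    # Stand-in for a real SearchCans lookup
    call_count["n"] += 1
    return f"content for: {query}"

_cache = {}

def cached_search(query: str) -> str:
    # Return the cached document if we've already researched this query
    if query not in _cache:
        _cache[query] = expensive_search(query)
    return _cache[query]

cached_search("langgraph tutorial")
cached_search("langgraph tutorial")  # served from cache
print(call_count["n"])  # 1
```

Agents frequently re-plan near-identical sub-queries, so even this naive cache can noticeably cut API spend.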
```python
# Example: Budget-limited research
MAX_ITERATIONS = 5

def should_continue_with_budget(state: AgentState):
    if not state["plan"]:
        return "synthesize"
    if len(state["documents"]) >= MAX_ITERATIONS:
        return "synthesize"  # Budget exhausted
    return "research"
```
Conclusion
Building a Deep Research Agent is the ultimate test of your retrieval infrastructure. It requires an API that is fast, unlimited, and capable of understanding web content, not just finding links.
With LangGraph managing the logic and SearchCans managing the vision, you can build autonomous researchers that work while you sleep.
Resources
Related Topics:
- Self-Correcting RAG (CRAG) - Improve agent accuracy
- Adaptive RAG Router - Optimize RAG costs
- AI Agent Internet Access - Tool-use patterns
- Hybrid RAG Tutorial - Combining search methods
- Advanced Prompt Engineering - LLM agent techniques
Get Started:
- Free Trial - Get 100 free credits
- API Documentation - Technical reference
- Pricing - Transparent costs
- Playground - Test in browser
SearchCans provides real-time data for AI agents. Start building now →