
Build an Autonomous Deep Research Agent: LangGraph Python Tutorial

Recreate OpenAI's Deep Research using LangGraph and Python. Learn to build a recursive research agent with SearchCans for unlimited web access.

4 min read

The era of “Chatbots” is ending. The era of “Deep Research Agents” has begun.

Recent releases from OpenAI and Gemini have shifted the focus from simple Q&A to autonomous investigation. A “Deep Research” agent doesn’t just answer a question; it forms a plan, executes multiple rounds of Google searches, reads dozens of pages, and synthesizes a comprehensive report.

To build this, you need two things:

  1. A Cyclic Framework: LangGraph allows us to build stateful, looping workflows where the agent can “change its mind” and search again.
  2. Unthrottled Vision: A research agent might trigger 50+ API calls in a minute. Standard SERP APIs with rate limits will crash your workflow. SearchCans provides the “No Rate Limit” infrastructure required for autonomous loops.

In this tutorial, we will build a simplified Deep Research Agent that can browse the web to answer complex questions.

The Architecture: Plan, Research, Review

Unlike a linear LangChain pipeline, our agent behaves like a state machine:

  1. Planner Node: Breaks the user request into sub-queries.
  2. Researcher Node (The “Eyes”): Uses SearchCans to search Google and read page content.
  3. Reviewer Node: Checks if the gathered info is sufficient. If no, it loops back to the Researcher.
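Before wiring this into LangGraph, the control flow is easy to sketch as a plain Python loop (a toy model only; in the real agent, searching is delegated to a tool and reviewing to an LLM, and the stub search function here is a stand-in):

```python
def run_research_loop(plan, search_fn, max_steps=10):
    """Toy Plan -> Research -> Review cycle: consume queries until done."""
    documents = []
    while plan and len(documents) < max_steps:
        query = plan.pop(0)                  # Researcher: take the next sub-query
        documents.append(search_fn(query))   # fetch and store page content
    return documents                         # Reviewer: plan is empty, stop

# A stub search function stands in for the real web lookup
docs = run_research_loop(["query A", "query B"], lambda q: f"content for {q}")
print(docs)
```

LangGraph's value over this loop is that each step is a node with explicit state, so the agent can be inspected, resumed, or rerouted mid-run.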

Step 1: Define the Agent State

First, we define the State that is passed between nodes. This acts as the agent’s short-term memory.

from typing import TypedDict, List

class AgentState(TypedDict):
    question: str
    plan: List[str]
    documents: List[str]
    final_answer: str
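Each node returns a partial dict that LangGraph merges into this state. A minimal sketch of the state the graph starts from (the class is repeated so the snippet runs standalone; starting with empty lists is an assumption, since the planner node fills in `plan`):

```python
from typing import TypedDict, List

class AgentState(TypedDict):
    question: str
    plan: List[str]
    documents: List[str]
    final_answer: str

initial_state: AgentState = {
    "question": "What are the latest trends in AI agent development?",
    "plan": [],            # filled in by the planner node
    "documents": [],       # appended to by the researcher node
    "final_answer": "",    # written by the synthesizer node
}
print(initial_state["question"])
```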

Step 2: The “Researcher” Tool (SearchCans Integration)

This is the critical component. The agent needs to see the web. We use SearchCans to combine Discovery (SERP) and Extraction (Reader) in one robust function.

import requests

class ResearchTool:
    def __init__(self, api_key):
        self.api_key = api_key
        self.search_url = "https://www.searchcans.com/api/search"
        self.reader_url = "https://www.searchcans.com/api/url"
        self.headers = {"Authorization": f"Bearer {api_key}"}

    def search_and_read(self, query):
        print(f"Researching: {query}")

        # 1. Search Google for the top organic result
        params = {"q": query, "engine": "google", "num": 1}
        try:
            resp = requests.get(self.search_url, params=params,
                                headers=self.headers, timeout=30)
            resp.raise_for_status()
            results = resp.json().get("organic_results", [])

            if not results:
                return "No results found."

            # 2. Read the top result
            top_link = results[0]["link"]
            return self._read_url(top_link)

        except Exception as e:
            return f"Error: {e}"

    def _read_url(self, url):
        # Use headless browser for dynamic content
        params = {"url": url, "b": "true", "w": 2000}
        try:
            resp = requests.get(self.reader_url, params=params,
                                headers=self.headers, timeout=60)
            resp.raise_for_status()
            data = resp.json()
            return data.get("markdown", "") or data.get("text", "")
        except Exception:
            return "Failed to read content."

Step 3: Building the Graph

Now we assemble the nodes. For brevity, we focus on the Researcher Node logic, which drives the external interaction.

from langgraph.graph import StateGraph, END

# Initialize Tool
researcher = ResearchTool(api_key="YOUR_SEARCHCANS_KEY")

def research_node(state: AgentState):
    # Get the next query from the plan
    current_query = state["plan"][0] 
    
    # Execute SearchCans lookup
    content = researcher.search_and_read(current_query)
    
    # Update State
    new_docs = state["documents"] + [content]
    new_plan = state["plan"][1:]  # Remove completed task
    
    return {"documents": new_docs, "plan": new_plan}

def should_continue(state: AgentState):
    if not state["plan"]:
        return "synthesize"  # No more steps, write answer
    return "research"  # Continue researching

# Build Graph
workflow = StateGraph(AgentState)
workflow.add_node("research", research_node)
# ... add planner and synthesizer nodes ...

workflow.set_entry_point("research")
workflow.add_conditional_edges("research", should_continue, {
    "research": "research",
    "synthesize": END
})

app = workflow.compile()
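Because should_continue is a pure function, the routing logic can be sanity-checked without any API calls (the function is repeated here so the snippet runs standalone):

```python
def should_continue(state: dict) -> str:
    # Route to the synthesizer once every planned query has been consumed
    return "synthesize" if not state["plan"] else "research"

print(should_continue({"plan": ["remaining query"]}))  # research
print(should_continue({"plan": []}))                   # synthesize
```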

Step 4: The Planner Node

The planner breaks down complex questions into research steps:

import json

from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_KEY")

def planner_node(state: AgentState):
    question = state["question"]

    prompt = f"""
    Break down this research question into 3-5 specific sub-questions
    that can be answered by web searches:

    Question: {question}

    Return as JSON: {{"plan": ["query1", "query2", ...]}}
    """

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"}
    )

    plan_data = json.loads(response.choices[0].message.content)

    return {"plan": plan_data["plan"]}
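JSON mode guarantees valid JSON but not the expected key, so a defensive parse is worth two extra lines (a sketch; the sample string stands in for the model's reply):

```python
import json

# Sample reply in the shape the prompt requests
raw = '{"plan": ["What is LangGraph?", "How do SERP API rate limits work?"]}'

parsed = json.loads(raw)
plan = parsed.get("plan", [])
if not isinstance(plan, list) or not plan:
    plan = [raw]  # fall back to treating the whole reply as one query
print(len(plan))
```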

Step 5: The Synthesizer Node

After gathering all documents, synthesize the final answer:

def synthesizer_node(state: AgentState):
    question = state["question"]
    documents = state["documents"]
    
    context = "\n\n---\n\n".join(documents)
    
    prompt = f"""
    Based on the following research documents, provide a comprehensive
    answer to the question.
    
    Question: {question}
    
    Documents:
    {context}
    
    Provide a detailed, well-structured answer with citations.
    """
    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    
    return {"final_answer": response.choices[0].message.content}

Complete Workflow

Putting it all together:

# Build the complete graph
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("planner", planner_node)
workflow.add_node("research", research_node)
workflow.add_node("synthesize", synthesizer_node)

# Define edges
workflow.set_entry_point("planner")
workflow.add_edge("planner", "research")
workflow.add_conditional_edges("research", should_continue, {
    "research": "research",
    "synthesize": "synthesize"
})
workflow.add_edge("synthesize", END)

# Compile
app = workflow.compile()

# Run the agent
result = app.invoke({"question": "What are the latest trends in AI agent development?"})
print(result["final_answer"])

Why “Deep Research” Needs SearchCans

When you run this agent, it might decide to read 5 different technical papers to answer one question.

Concurrency

The agent loops quickly, and a rate-limited SERP API will start returning 429 errors within a few iterations, crashing the run mid-research. SearchCans handles the loop seamlessly.

Context Quality

The _read_url function returns Markdown. This is crucial. Feeding raw HTML into your AgentState will overflow the token limit very quickly. SearchCans optimizes the “signal-to-noise” ratio for you.
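Even with Markdown, a long page can blow past the context window on its own, so a per-document cap is a cheap extra safeguard (a sketch; the 4,000-character limit is an arbitrary assumption to tune for your model):

```python
MAX_DOC_CHARS = 4000  # rough per-document budget; tune for your model

def trim_document(markdown: str, limit: int = MAX_DOC_CHARS) -> str:
    """Keep the head of the page, where titles and summaries usually live."""
    if len(markdown) <= limit:
        return markdown
    return markdown[:limit] + "\n\n[truncated]"

print(len(trim_document("x" * 10_000)))
```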

Production Enhancements

For production deployments, consider:

  • Parallel Research: Use asyncio to search multiple queries simultaneously
  • Result Caching: Cache recent searches to avoid duplicate API calls
  • Budget Limits: Set maximum number of research iterations
  • Error Handling: Gracefully handle failed searches

# Example: Budget-limited research
MAX_ITERATIONS = 5

def should_continue_with_budget(state: AgentState):
    if not state["plan"]:
        return "synthesize"
    
    if len(state["documents"]) >= MAX_ITERATIONS:
        return "synthesize"  # Budget exhausted
    
    return "research"
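The caching bullet above can be as simple as functools.lru_cache around the search call (a sketch; the deterministic stub stands in for ResearchTool.search_and_read, which would be the real body):

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def cached_search(query: str) -> str:
    # In the real agent this body would call ResearchTool.search_and_read;
    # a deterministic stub keeps the example runnable.
    return f"content for {query}"

cached_search("langgraph tutorial")
cached_search("langgraph tutorial")  # second call is served from the cache
print(cached_search.cache_info().hits)  # 1
```

Note that lru_cache is per-process and never expires entries; for a long-running deployment you would want a cache with a TTL instead.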

Conclusion

Building a Deep Research Agent is the ultimate test of your retrieval infrastructure. It requires an API that is fast, unlimited, and capable of understanding web content, not just finding links.

With LangGraph managing the logic and SearchCans managing the vision, you can build autonomous researchers that work while you sleep.



Sarah Wang

AI Integration Specialist

Seattle, WA

Software engineer with focus on LLM integration and AI applications. 6+ years experience building AI-powered products and developer tools.

AI/ML · LLM Integration · RAG Systems

Ready to try SearchCans?

Get 100 free credits and start using our SERP API today. No credit card required.