When I was at Google working on LLM research, we had one major problem: language models only know what they were trained on. Ask GPT-4 about something that happened yesterday, and it’s clueless.
The solution everyone’s building now? Connect LLMs to search APIs for real-time information.
This post explains how it works and shows you the exact code to build it yourself.
The Knowledge Cut-off Problem
Every LLM has a training cut-off date. The base GPT-4 model's knowledge ends in September 2021 (later GPT-4 Turbo snapshots extend into 2023). Ask it:
“What’s the current weather in San Francisco?”
It can’t answer. The information doesn’t exist in its training data.
Traditional solutions were clunky:
- Fine-tune the model (expensive, slow)
- Maintain a massive up-to-date knowledge base (engineering nightmare)
- Accept the limitations (bad user experience)
The Better Solution: Retrieval Augmented Generation (RAG)
RAG is a fancy term for a simple idea:
- User asks a question
- Search for relevant information
- Feed that information to the LLM as context
- LLM generates answer using both its training AND the fresh data
Search APIs make step #2 trivial.
The Basic Architecture
Here’s the simplest implementation:
import requests
import openai

SEARCHCANS_KEY = "your_searchcans_key"  # your SearchCans API key

def answer_with_search(question):
    """Answer questions using LLM + search API"""
    # Step 1: Search for relevant information
    search_response = requests.get(
        'https://www.searchcans.com/api/search',
        headers={'Authorization': f'Bearer {SEARCHCANS_KEY}'},
        params={'q': question, 'engine': 'google', 'num': 5}
    )
    search_results = search_response.json()

    # Step 2: Extract relevant snippets
    context = "\n\n".join([
        f"{result['title']}\n{result['snippet']}"
        for result in search_results.get('organic_results', [])[:3]
    ])

    # Step 3: Feed to LLM with context
    prompt = f"""
Answer the following question using the provided context.

Context from recent search:
{context}

Question: {question}

Answer:"""

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Test it
answer = answer_with_search("What happened at OpenAI in November 2024?")
print(answer)
This works surprisingly well. The LLM now has access to current information from the search results.
Production-Grade Implementation
The basic version has issues:
- No error handling
- Wastes tokens on irrelevant context
- No source attribution
- Can’t handle complex queries
Here’s how we build it properly:
import requests
import openai
from typing import List, Dict

class AISearchAgent:
    def __init__(self, search_api_key, openai_api_key):
        self.search_key = search_api_key
        self.openai_key = openai_api_key
        openai.api_key = openai_api_key

    def search(self, query: str, num_results: int = 5) -> List[Dict]:
        """Execute search and return structured results"""
        try:
            response = requests.get(
                'https://www.searchcans.com/api/search',
                headers={'Authorization': f'Bearer {self.search_key}'},
                params={'q': query, 'engine': 'google', 'num': num_results},
                timeout=10
            )
            response.raise_for_status()
            return response.json().get('organic_results', [])
        except Exception as e:
            print(f"Search failed: {e}")
            return []

    def is_relevant(self, result: Dict, original_query: str) -> bool:
        """Use LLM to check if search result is relevant"""
        prompt = f"""
Query: {original_query}
Result Title: {result['title']}
Result Snippet: {result['snippet']}

Is this result relevant to answering the query?
Answer with just YES or NO.
"""
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",  # Use cheaper model for filtering
            messages=[{"role": "user", "content": prompt}],
            max_tokens=10
        )
        return "YES" in response.choices[0].message.content.upper()

    def answer_question(self, question: str) -> Dict:
        """Answer question with sources"""
        # Search
        results = self.search(question, num_results=10)
        if not results:
            return {
                'answer': "I couldn't find relevant information to answer this.",
                'sources': []
            }

        # Filter relevant results
        relevant = [r for r in results if self.is_relevant(r, question)][:3]

        # Build context
        context = "\n\n".join([
            f"Source {i+1}: {r['title']}\n{r['snippet']}\nURL: {r['link']}"
            for i, r in enumerate(relevant)
        ])

        # Generate answer
        prompt = f"""
Using the following sources, answer the question.
Cite sources using [1], [2], [3] notation.
If the sources don't contain the answer, say so.

Sources:
{context}

Question: {question}

Answer:"""

        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3  # Lower temperature for factual responses
        )

        return {
            'answer': response.choices[0].message.content,
            'sources': [{'title': r['title'], 'url': r['link']}
                        for r in relevant]
        }

# Usage
agent = AISearchAgent(
    search_api_key="your_searchcans_key",
    openai_api_key="your_openai_key"
)

result = agent.answer_question("Who won the 2024 World Series?")
print(result['answer'])
print("\nSources:")
for source in result['sources']:
    print(f"- {source['title']}: {source['url']}")
This version:
- Filters irrelevant results using a cheap LLM call
- Provides source attribution
- Handles errors gracefully
- Uses appropriate models for different tasks
LangChain Integration
If you’re using LangChain (and you probably should be), integration is even simpler:
from langchain.tools import Tool
from langchain.agents import initialize_agent, AgentType
from langchain.llms import OpenAI
import requests

API_KEY = "your_searchcans_key"  # SearchCans API key

def search_tool_func(query: str) -> str:
    """Search function for LangChain"""
    response = requests.get(
        'https://www.searchcans.com/api/search',
        headers={'Authorization': f'Bearer {API_KEY}'},
        params={'q': query, 'engine': 'google', 'num': 10}
    )
    results = response.json().get('organic_results', [])[:3]
    return "\n\n".join([
        f"{r['title']}: {r['snippet']}"
        for r in results
    ])

# Create the tool
search_tool = Tool(
    name="Web Search",
    func=search_tool_func,
    description="Useful for finding current information on the internet"
)

# Initialize agent
llm = OpenAI(temperature=0)
agent = initialize_agent(
    [search_tool],
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

# Use it
result = agent.run("What are the latest developments in AI regulation?")
print(result)
LangChain handles the decision-making: when to search, how to interpret results, and when to respond.
Real-World Use Cases
1. Customer Support Bots
A SaaS company built a support bot that:
- Searches their documentation for answers
- Falls back to web search for general questions
- Escalates to humans when unsure
Results: 60% of tickets handled automatically, 4-minute average resolution time.
2. Research Assistants
Academic researchers use AI + search to:
- Find relevant papers quickly
- Summarize current research on topics
- Identify experts and institutions
One research team cut literature review time from 2 weeks to 3 days using this approach.
3. News Monitoring
A hedge fund built an AI agent that:
- Monitors news for portfolio companies
- Summarizes key developments
- Alerts on significant events
The system processes 10,000+ articles daily and sends 5-10 high-priority alerts.
Cost Analysis: What It Actually Costs
Let’s calculate the economics for 10,000 user queries/month:
Search API Costs:
- 10,000 queries × 1 search each = 10,000 API calls
- At $0.50/1K = $5/month
OpenAI Costs:
- Relevance filtering: 10,000 × 3 results × $0.001 = $30/month
- Answer generation: 10,000 × $0.03 = $300/month
- Total OpenAI: $330/month
Total: $335/month for 10,000 enhanced queries
Compare to maintaining your own real-time knowledge base:
- Web crawler infrastructure: $500/month
- Database: $200/month
- Maintenance: 20 hours × $100 = $2,000/month
- Total: $2,700/month
The API approach costs roughly 12% of the DIY alternative and requires essentially zero maintenance.
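The arithmetic above is easy to sanity-check in a few lines (the prices are this article's assumptions, not current list prices):

```python
def monthly_cost(queries, search_price_per_1k, filter_calls_per_query,
                 filter_price, answer_price):
    """Recompute the article's cost model for a given query volume."""
    search = queries / 1000 * search_price_per_1k        # search API calls
    filtering = queries * filter_calls_per_query * filter_price  # cheap-model filtering
    answers = queries * answer_price                     # GPT-4 answer generation
    return search + filtering + answers

# The article's numbers: 10k queries/month, $0.50 per 1K searches,
# 3 filter calls at $0.001 each, $0.03 per GPT-4 answer.
total = monthly_cost(10_000, 0.50, 3, 0.001, 0.03)
print(total)  # 335.0
```

Plugging in your own volumes and model prices shows quickly where the break-even against a self-hosted knowledge base sits.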
Advanced Patterns
Multi-Step Research
For complex questions, use multiple searches:
def deep_research(question: str) -> str:
    """Multi-step research for complex questions.
    Assumes decompose_question, search, synthesize_answer,
    and synthesize_final are defined elsewhere."""
    # Step 1: Identify sub-questions
    sub_questions = decompose_question(question)

    # Step 2: Research each sub-question
    sub_answers = []
    for sq in sub_questions:
        results = search(sq)
        answer = synthesize_answer(sq, results)
        sub_answers.append(answer)

    # Step 3: Combine into final answer
    final_answer = synthesize_final(question, sub_answers)
    return final_answer
This handles questions like “Compare the economic policies of the last three US presidents” that require multiple research steps.
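The `decompose_question` helper is left undefined above; in practice it is usually another LLM call, but a crude rule-based sketch (hypothetical, purely for illustration) shows the shape of the step:

```python
import re

def decompose_question(question: str) -> list:
    """Naive decomposition: split 'Compare X of A, B, and C' into one
    sub-question per entity. A production version would prompt an LLM
    to do this instead of relying on a regex heuristic."""
    match = re.match(r"Compare (.+) of (.+)", question)
    if not match:
        return [question]  # nothing to decompose
    aspect, entities_part = match.groups()
    entities = re.split(r",\s*|\s+and\s+", entities_part)
    return [f"What are {aspect} of {e.strip()}?" for e in entities if e.strip()]

print(decompose_question(
    "Compare the economic policies of Biden, Trump and Obama"))
```

Each sub-question then goes through the normal search-and-synthesize pipeline before the final combination step.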
Source Quality Scoring
Not all search results are equally trustworthy:
from urllib.parse import urlparse

def score_source_quality(url: str) -> float:
    """Score source reliability by domain"""
    trusted_domains = {
        'nytimes.com': 0.95,
        'wikipedia.org': 0.85,
        'reuters.com': 0.95,
        'nature.com': 0.98,
        # ... more domains
    }
    # Normalize "https://www.nytimes.com/..." -> "nytimes.com" before lookup
    domain = urlparse(url).netloc.removeprefix('www.')
    return trusted_domains.get(domain, 0.5)  # Default: medium trust

# Use in context building
weighted_results = sorted(
    results,
    key=lambda r: score_source_quality(r['link']),
    reverse=True
)
This prioritizes authoritative sources in the context sent to the LLM.
The Prompt Engineering That Matters
The quality of your prompts determines output quality. Here’s what works:
Bad Prompt:
Answer: {question}
Context: {context}
Good Prompt:
You are a helpful assistant that answers questions accurately.
Use ONLY the provided context to answer.
If the context doesn't contain the answer, say "I don't have enough information."
Cite sources using [1], [2] notation.
Context:
{context}
Question: {question}
Answer:
The good prompt:
- Sets clear role expectations
- Constrains answers to provided context
- Prevents hallucination
- Requests source attribution
Common Pitfalls
Pitfall 1: Sending too much context
The base GPT-4 model has an 8K-token context window. If you send 10 full articles as context, you'll hit the limit and truncate important information.
Solution: Send only snippets and titles. Let the LLM decide if it needs more detail.
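One simple guard is to cap the assembled context at a character budget before building the prompt (roughly 4 characters per English token, so 6,000 characters is about 1,500 tokens). A minimal sketch, assuming results arrive already ranked by relevance:

```python
def build_context(snippets, max_chars=6000):
    """Concatenate (title, snippet) pairs in rank order until a rough
    character budget is exhausted, then stop."""
    parts, used = [], 0
    for title, snippet in snippets:
        piece = f"{title}\n{snippet}"
        if used + len(piece) > max_chars:
            break  # budget exhausted; drop remaining lower-ranked results
        parts.append(piece)
        used += len(piece) + 2  # account for the joining blank line
    return "\n\n".join(parts)

ctx = build_context([("A", "x" * 4000), ("B", "y" * 4000), ("C", "z" * 10)])
print(len(ctx))  # 4002: only the top-ranked snippet fits the budget
```

For precise budgeting you would count real tokens with the model's tokenizer, but a character cap already prevents the silent-truncation failure mode.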
Pitfall 2: Not handling search failures
Search APIs can fail or return no results. Your code needs graceful degradation.
Solution: Always check for empty results and have a fallback response.
Pitfall 3: Trusting everything
LLMs will confidently present information even if it’s wrong.
Solution: Always show sources so users can verify claims.
The Future: Autonomous Agents
The next evolution is AI agents that:
- Decide when to search
- Choose what to search for
- Determine if they need more information
- Execute multi-step plans
We’re building agents using the ReAct pattern:
Thought: I need to find recent news about AI regulation
Action: Search for "AI regulation 2024"
Observation: [search results]
Thought: I found relevant information, let me synthesize it
Action: Respond to user
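A trace like the one above comes from a control loop that alternates model steps with tool calls. Here is a hedged sketch of that loop with the model and search stubbed out so the flow is visible offline; `llm` and `search` are placeholders for real calls, not an actual API:

```python
def react_loop(question, llm, search, max_steps=5):
    """Minimal ReAct loop: the LLM emits either 'Search: <query>' or
    'Answer: <text>'; observations are appended to the transcript."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = llm(transcript)            # model decides the next action
        transcript += "\n" + step
        if step.startswith("Answer:"):
            return step[len("Answer:"):].strip()
        if step.startswith("Search:"):
            query = step[len("Search:"):].strip()
            transcript += f"\nObservation: {search(query)}"
    return "Gave up after max_steps."

# Scripted fake model and search so the loop can be exercised without keys.
script = iter(["Search: AI regulation 2024",
               "Answer: Several jurisdictions passed AI laws in 2024."])
result = react_loop("What are the latest developments in AI regulation?",
                    llm=lambda t: next(script),
                    search=lambda q: "[search results]")
print(result)
```

Frameworks like LangChain implement exactly this pattern, with prompt templates that teach the model the Thought/Action/Observation format.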
These agents are getting remarkably good at research tasks that used to require human analysts.
Getting Started Today
If you want to build AI applications with search:
Day 1: Build the basic RAG system (50 lines of code)
Day 2: Add relevance filtering and error handling
Day 3: Implement source attribution
Day 4: Test with real user queries and iterate
Don’t overthink it. The basic pattern works well. You can optimize later.
About the Author: Dr. Emily Zhang completed her PhD in NLP at Stanford and worked on LLM research at Google Brain. She now consults with companies building AI-powered products.
Related Resources
AI Development:
- Build AI Agent with SERP API - Step-by-step tutorial
- AI Agent Integration Guide - Advanced patterns
- SERP API Documentation - Technical reference
Implementation:
- Integration Best Practices - Production tips
- What is SERP API? - Beginner’s guide
- URL Content Extraction - Extract full content
Get Started:
- Free registration - 100 credits
- View pricing - From $0.33/1K
- Try playground - Test instantly
Want to add search capabilities to your AI application? Start with 100 free credits to build your first prototype.