Tutorial

Migrate LLM Grounding from Bing Search API: Alternatives for 2026

Learn how to migrate your LLM grounding from the discontinued Bing Search API. Discover reliable alternatives and avoid common pitfalls in 2026.


Migrating LLM grounding from the Bing Search API isn’t just a simple swap; it’s a strategic decision that often feels like pulling a thread on a sweater. I’ve seen too many projects hit a wall because they underestimated the nuances of data quality and API reliability when moving away from a familiar, albeit shifting, ecosystem. The abrupt discontinuation of the traditional Bing Search API has left developers scrambling, forcing a complete rethink of how they acquire real-time web data for their AI applications.

import requests
import os

old_bing_endpoint = "https://api.bing.microsoft.com/v7.0/search"
old_headers = {"Ocp-Apim-Subscription-Key": "your_old_bing_key"}
old_params = {"q": "What's the weather like?", "mkt": "en-US"}

try:
    # This call now likely fails with a 403 or similar
    response = requests.get(old_bing_endpoint, headers=old_headers, params=old_params, timeout=15)
    response.raise_for_status()
    print("Bing Search API still working? That's impossible!")
except requests.exceptions.RequestException as e:
    print(f"Well, that's gone. Error: {e}")
    print("Time to find a new solution for LLM grounding.")

This little snippet above, which used to be a reliable workhorse for many, now mostly serves as a tombstone. It’s a painful reminder that relying on a single vendor for core infrastructure can be a real footgun when they change their strategy without much warning.

Key Takeaways

  • Microsoft discontinued the traditional Bing Search API on August 11, 2025, forcing developers to find alternatives for LLM grounding.
  • The recommended migration to Azure OpenAI Agent with Bing grounding is often cost-prohibitive and overly complex for simple API needs.
  • Alternative search APIs offer different strengths, from AI-optimized snippets to full-page content extraction, crucial for effective LLM grounding.
  • Implementing new search APIs requires careful consideration of data quality, parsing, rate limits, and cost to avoid future migration headaches.

LLM grounding refers to the process of providing large language models with up-to-date, external information to reduce hallucinations and improve factual accuracy. This typically involves feeding real-time data from external sources, often search APIs, into the model’s context window, which some benchmarks suggest can reduce factual errors by as much as 80%.
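To make that definition concrete, here’s a minimal sketch of what grounding looks like at the prompt level: retrieved source text is packed into the context window ahead of the user’s question. All names here are illustrative; no specific API is assumed.

```python
# Minimal grounding sketch: splice retrieved web content into the prompt
# before it reaches the model. Function and variable names are illustrative.

def build_grounded_prompt(question: str, sources: list[str], max_chars: int = 4000) -> str:
    """Pack retrieved source texts into a context block, then append the question."""
    parts = []
    budget = max_chars
    for i, text in enumerate(sources, start=1):
        chunk = text[:budget]  # naive character budget; real pipelines count tokens
        parts.append(f"[Source {i}]\n{chunk}")
        budget -= len(chunk)
        if budget <= 0:
            break
    context = "\n\n".join(parts)
    return (
        "Answer using ONLY the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

prompt = build_grounded_prompt(
    "What changed in the latest release?",
    ["Release notes: version 2.1 adds streaming support ...",
     "Changelog: bug fixes for the retry logic ..."],
)
print(prompt)
```

The instruction line at the top of the prompt is the part doing the anti-hallucination work: it tells the model to prefer the injected sources over its parametric memory.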

Why Are Developers Migrating LLM Grounding from Bing Search API?

The primary driver for developers migrating their LLM grounding strategies from the Bing Search API is simple: Microsoft decided to sunset the service on August 11, 2025. This strategic shift affects an estimated 70% of developers who previously relied on these legacy APIs for real-time web search capabilities in their AI applications. The move was less about improving the existing API and more about pushing developers towards Microsoft’s broader Azure AI ecosystem.

Honestly, it felt like a rug pull for many of us. One day you have a stable, if not always perfect, API for pulling search results, and the next you’re told it’s gone. Microsoft’s official stance centered around "strategic alignment" with its Azure OpenAI Agent services, which, while powerful, aren’t a drop-in replacement for a simple web search API. For developers who just needed a programmatic way to get search results, this meant an unexpected and often complex re-architecture. The Bing Search API was a cornerstone for many applications needing up-to-date information, news aggregation, and general web data for RAG (Retrieval-Augmented Generation) pipelines. Its absence creates a significant void, especially for those building agents that need to quickly understand current events or specific product details to answer user queries accurately.

The recommended replacement, "Grounding with Bing Search as part of Azure AI Agents," comes with significant baggage. It’s not a standalone search API; instead, it’s designed as an add-on feature for select Microsoft products, demanding deeper integration into Azure’s enterprise ecosystem. That integration often means creating new Azure AI Agents, configuring resource groups, and navigating a dashboard far more complex than a REST call with a simple key. This kind of platform lock-in makes independent, flexible development much harder. Many developers found themselves needing simple, clean text from web pages for their LLMs, but the effort required to get it post-Bing API became disproportionate to the value it delivered. If you’re looking to clean and parse content efficiently for your LLMs, choosing the right tools for the task is critical. You might find this resource on Pdf Parser Selection Rag Extraction helpful for navigating complex document types.

Which Alternative Search APIs Best Support LLM Grounding?

Over 15 alternative search APIs exist on the market, but only a handful offer the specific features and scalability required for solid LLM grounding, focusing on real-time data access and AI-friendly output formats. Selecting the right replacement involves weighing factors like cost, data quality, result format, and the ability to extract clean content beyond just snippets.

When the Bing Search API went dark, I immediately started looking for replacements. What became clear quickly was that "search API" isn’t a monolithic term. Some alternatives focus on delivering raw SERP data, while others aim to provide pre-processed, AI-optimized snippets or even full-page content. For LLM grounding, raw snippets are often insufficient; you usually need the actual content from the linked pages. The key is finding a service that either directly offers full content extraction or pairs well with another service that does. Many established players exist, each with its own quirks and pricing models. Some are quite pricey, like SerpApi, while newer entrants focus on AI-native outputs. It’s not just about getting search results anymore; it’s about getting data that an LLM can actually use without a ton of extra yak shaving. To explore further alternatives, you can read more about Replace Bing Search Llm Grounding Alternatives.

Here’s a quick comparison of some prominent alternatives for LLM grounding:

| Feature/Provider | SearchCans | Firecrawl | Exa | Tavily | SerpApi | Microsoft Azure AI Agent |
|---|---|---|---|---|---|---|
| Search Engine Support | Google, Bing | Multiple (incl. proprietary) | Proprietary | Multiple | Google, Bing, 25+ | Bing (via Azure) |
| Output Format | Structured JSON (SERP), Markdown (Reader) | Structured JSON, Markdown | Semantic results | AI-optimized snippets | Structured JSON (SERP) | AI-powered summaries |
| Content Extraction | YES (integrated Reader API) | YES (Search + Scrape) | LIMITED (semantic data) | LIMITED (snippets) | NO (SERP only) | YES (integrated) |
| Real-time Data | YES | YES | YES | YES | YES | YES |
| AI-Optimized | High (Markdown output) | High (structured full page) | High (semantic) | Medium (snippets) | Medium (raw SERP) | High (agent-driven) |
| Pricing Model (approx. per 1K queries) | From $0.56/1K (Ultimate) | ~$5-10/1K | ~$2-5/1K | ~$1-3/1K | ~$10/1K | $35/1K |
| Complexity for Grounding | Low (single API for search+extract) | Medium (single API) | Medium | Low | High (needs separate extractor) | High (Azure ecosystem) |
| Dual-Engine Value | YES (SERP + Reader) | YES | NO | NO | NO | YES (integrated) |

As you can see, the space is diverse. Some services, like Firecrawl, aim to provide a combined search and scrape solution, similar to SearchCans. Others, like Tavily, focus specifically on short, AI-optimized snippets for quick RAG. My experience tells me that for serious LLM grounding, you need more than just a list of links or a short snippet; you need the actual content, cleaned and ready for processing. That’s why the dual-engine approach is becoming so attractive. It cuts down on the toolchain complexity and the number of API calls you have to manage. Getting LLM grounding data at $0.56 per 1,000 credits on volume plans is a significant cost reduction compared to many alternatives.
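To make the pricing spread tangible, here’s a back-of-envelope monthly cost comparison. The figures are the approximate per-1K prices from the table above, with midpoints assumed where a range was given; they are illustrative, not vendor quotes.

```python
# Monthly cost at a given query volume, using approximate per-1K prices
# from the comparison table (midpoints assumed for ranged prices).
prices_per_1k = {
    "SearchCans (Ultimate)": 0.56,
    "Tavily": 2.00,       # midpoint of ~$1-3
    "Exa": 3.50,          # midpoint of ~$2-5
    "Firecrawl": 7.50,    # midpoint of ~$5-10
    "SerpApi": 10.00,
    "Azure AI Agent": 35.00,
}

monthly_queries = 500_000

for name, price in sorted(prices_per_1k.items(), key=lambda kv: kv[1]):
    cost = monthly_queries / 1000 * price
    print(f"{name:<24} ${cost:>10,.2f}/month")
```

At half a million queries a month, the gap between the cheapest and most expensive option runs into thousands of dollars, which is why pricing deserves as much scrutiny as output format.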

How Do You Implement LLM Grounding with a New Search API?

Implementing LLM grounding with a new search API typically involves three core steps: formulating an effective query, retrieving relevant search results, and then extracting clean, LLM-ready content from those results. Careful API integration and data parsing are required to ensure the information fed to your LLM is high quality and free from noise.

My initial thought when replacing the Bing Search API was that it would be a direct one-for-one swap, but that’s rarely the case. Each API has its own quirks for query parameters, response formats, and rate limits. The real challenge comes in bridging the gap between getting a list of URLs and actually having clean, relevant text to pass to an LLM. Many search APIs give you snippets, but those are rarely enough for deep LLM grounding. You end up needing a second tool, a web scraper or reader API, to go out and fetch the content from those URLs. This two-step process means more code, more API keys, and more potential points of failure. That’s where the yak shaving really begins.
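To see where the extra moving parts come from, here’s a sketch of that generic two-vendor pattern, with the HTTP calls abstracted as injected callables. The function and its wiring are illustrative, not any particular vendor’s SDK.

```python
# Generic two-step grounding: vendor 1 returns URLs, vendor 2 is called once
# per URL. Every extra hop is another API key, rate limit, and failure mode.
from typing import Callable

def two_step_grounding(query: str,
                       search: Callable[[str], list[str]],
                       extract: Callable[[str], str],
                       top_n: int = 3) -> list[str]:
    urls = search(query)[:top_n]        # call out to the search vendor
    texts = []
    for url in urls:
        try:
            texts.append(extract(url))  # call out to the scraping vendor
        except Exception as e:          # each vendor fails independently
            print(f"Extraction failed for {url}: {e}")
    return texts

# Stub vendors, to show the wiring without network access:
fake_search = lambda q: ["https://a.example", "https://b.example"]
fake_extract = lambda url: f"(markdown content of {url})"
print(two_step_grounding("test query", fake_search, fake_extract))
```

Even in this toy form, two independent services must be configured, authenticated, retried, and billed separately, which is the overhead a combined search-plus-extraction API removes.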

The ideal solution minimizes this complexity by combining both search and extraction into a single, cohesive workflow. SearchCans, for example, offers exactly this with its integrated SERP and Reader APIs. You perform a search to find relevant URLs, and then you use the same platform to extract the full content from those pages, transformed into clean Markdown. It’s a pragmatic approach that reduces the overhead of juggling multiple services and simplifies your codebase significantly. For example, if you’re struggling to scrape content effectively for your LLMs, you can check out resources like Scrape Llm Friendly Data Jina for insights into specialized extraction tools.

Here’s the core logic I use to streamline LLM grounding with SearchCans:

import requests
import os
import time

api_key = os.environ.get("SEARCHCANS_API_KEY", "your_api_key")

if api_key == "your_api_key":
    print("Warning: Using placeholder API key. Set SEARCHCANS_API_KEY environment variable for production.")

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

def search_and_extract_for_llm(query: str, num_results: int = 3) -> list[str]:
    """
    Performs a search and extracts LLM-ready markdown from top results.
    """
    extracted_content = []
    
    # Step 1: Search with SERP API (1 credit per request)
    # Production-grade requests should always include timeout and error handling.
    for attempt in range(3): # Simple retry mechanism
        try:
            search_resp = requests.post(
                "https://www.searchcans.com/api/search",
                json={"s": query, "t": "google"},
                headers=headers,
                timeout=15 # Important: set a timeout for network requests
            )
            search_resp.raise_for_status() # Raises HTTPError for bad responses (4xx or 5xx)
            urls = [item["url"] for item in search_resp.json()["data"][:num_results]]
            break # Exit retry loop on success
        except requests.exceptions.RequestException as e:
            print(f"Search API request failed (attempt {attempt+1}/3): {e}")
            if attempt < 2:
                time.sleep(2 ** attempt) # Exponential backoff before retrying
    else: # Runs only if the loop never hit 'break', i.e. every attempt failed
        print("Failed to get search results after multiple retries.")
        return []

    # Step 2: Extract each URL with Reader API (2 credits per standard request)
    for url in urls:
        for attempt in range(3): # Simple retry mechanism for reader API
            try:
                read_resp = requests.post(
                    "https://www.searchcans.com/api/url",
                    json={"s": url, "t": "url", "b": True, "w": 5000, "proxy": 0}, # b: True for browser mode, w: wait time
                    headers=headers,
                    timeout=15 # Important: set a timeout for network requests
                )
                read_resp.raise_for_status()
                markdown = read_resp.json()["data"]["markdown"]
                extracted_content.append(markdown)
                break # Exit retry loop on success
            except requests.exceptions.RequestException as e:
                print(f"Reader API request for {url} failed (attempt {attempt+1}/3): {e}")
                if attempt < 2:
                    time.sleep(2 ** attempt) # Exponential backoff
                else:
                    print(f"Skipping {url} after multiple retries.")
                    break # Give up on this URL and try the next one

    return extracted_content

if __name__ == "__main__":
    llm_query = "latest AI developments in medical imaging"
    grounding_data = search_and_extract_for_llm(llm_query, num_results=2)

    if grounding_data:
        for i, content in enumerate(grounding_data):
            print(f"\n--- Extracted Content {i+1} for: '{llm_query}' ---")
            print(content[:1000]) # Print first 1000 characters for brevity
            print("...")
    else:
        print("No grounding data could be extracted.")

This code demonstrates how to execute a search and extract content using SearchCans, combining two steps that are often handled by disparate services into a single, clean API integration. It removes the usual bottleneck after identifying relevant URLs: extracting clean, LLM-ready content from them, which is where most of the "yak shaving" in data preparation happens. If you’re building an application that depends on this kind of real-time web data for LLM grounding, having a single API for both steps is a big deal. You can find the full API documentation, including more advanced usage patterns, on our API documentation page. This dual-engine workflow for search and extraction starts as low as $0.56/1K credits on volume plans.
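For budgeting, the credit math of this workflow is simple: one credit per search plus two per standard Reader extraction, as noted in the code comments above. A small helper makes the totals explicit; the $0.56/1K volume price is the figure quoted in this article, used here as an assumption.

```python
# Rough cost model for the search-then-extract workflow: 1 credit per search
# request plus 2 credits per standard Reader request, priced per 1K credits.
def grounding_cost_usd(queries: int, urls_per_query: int = 3,
                       price_per_1k_credits: float = 0.56) -> float:
    credits = queries * (1 + 2 * urls_per_query)
    return credits / 1000 * price_per_1k_credits

# 10,000 grounded queries, extracting 3 pages each -> 70,000 credits:
print(f"${grounding_cost_usd(10_000):.2f}")
```

Plugging in your own volume and extraction depth is a quick way to compare this against any SERP-only provider whose pricing excludes the extraction step.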

What Are the Key Challenges in Migrating LLM Grounding?

Migrating LLM grounding from a deprecated API like Bing Search API presents several key challenges, including ensuring data quality and relevance, managing API rate limits and costs, and accurately parsing extracted content for LLM consumption. Without careful planning, developers can quickly run into unexpected technical and financial hurdles.

In my experience, moving from one search API to another for LLM grounding isn’t just about updating endpoint URLs. The underlying data structures are almost always different. What was item["snippet"] might now be item["content"], or it might be a nested JSON object you have to dig into. Then there’s the content itself. Different search engines prioritize different types of results, and the quality of web pages at the top of the SERP can vary wildly. Some pages are loaded with ads, pop-ups, and extraneous JavaScript that make raw content extraction a nightmare. A common footgun for developers is assuming that any HTML will work, when in reality, LLMs need clean, semantically structured text to perform well. A badly parsed page can introduce noise or even factual errors into your LLM’s responses, defeating the whole purpose of grounding. The choice of a replacement API can significantly impact the success of your agent. Many developers are looking at alternatives post-Bing, and you can learn more about why Developers Select Serp Api Post Bing here.
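One way to contain those schema differences is a thin normalization layer, so only one small function knows each vendor’s field names. The keys below ("snippet", "content", "link") illustrate the kind of divergence you’ll see; they are examples, not any specific vendor’s schema.

```python
# Map vendor-specific response fields onto one internal shape, so the rest
# of the grounding pipeline stays vendor-agnostic.
from dataclasses import dataclass

@dataclass
class SearchResult:
    url: str
    title: str
    text: str

def normalize(item: dict) -> SearchResult:
    """Coalesce the common field-name variants into one record."""
    return SearchResult(
        url=item.get("url") or item.get("link", ""),
        title=item.get("title", ""),
        text=item.get("content") or item.get("snippet") or item.get("description", ""),
    )

old_style = {"link": "https://a.example", "title": "A", "snippet": "short text"}
new_style = {"url": "https://b.example", "title": "B", "content": "full page text"}
print(normalize(old_style).text)   # "short text"
print(normalize(new_style).text)   # "full page text"
```

When you eventually migrate again, only `normalize` changes; the rest of the pipeline never sees vendor-specific keys.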

Here are some common pitfalls and strategies to address them:

  1. Ensuring Data Quality and Relevance:
    • Challenge: Different search providers have different indexing and ranking algorithms, which means your original queries might yield less relevant results. Plus, raw HTML from web pages is messy, full of navigation, ads, and other elements that confuse LLMs.
    • Strategy: Test your queries rigorously against the new API. Opt for services that provide full-page content extraction, ideally in a clean, structured format like Markdown, which is far easier for LLMs to process than raw HTML. This ensures you’re getting not just data, but good data.
  2. Managing API Rate Limits and Costs:
    • Challenge: New APIs often come with different pricing models and rate limits. What was free or cheap on the old API might become prohibitively expensive on a new one, especially if you’re hitting multiple endpoints (search then extract).
    • Strategy: Understand the credit system, evaluate the number of requests per component (e.g., 1 credit for search, 2 for extraction), and look for platforms with transparent, pay-as-you-go billing and high concurrency. A service with Parallel Lanes can significantly improve throughput without hidden hourly caps. For instance, SearchCans offers plans with up to 68 Parallel Lanes, enabling rapid data acquisition for high-volume LLM grounding tasks.
  3. Content Parsing and Pre-processing:
    • Challenge: Turning a raw webpage into something an LLM can digest is non-trivial. Headers, lists, paragraphs, and code blocks all need to be correctly identified and formatted. Doing this yourself is a huge yak shaving task.
    • Strategy: Choose an API that delivers content already formatted for LLMs, such as Markdown. This bypasses the need for custom scraping and parsing logic, saving countless hours of development and maintenance. The Reader API, for example, converts any URL into clean Markdown, drastically simplifying the pre-processing stage.
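To see why the third pitfall bites, here’s a stdlib-only sketch of the "just strip the tags" approach. It shows both problems at once: page chrome survives as noise, and document structure (headings, paragraphs) is lost, which is exactly what a Markdown-producing extractor avoids.

```python
# A naive text extractor collects every text node with no idea what is
# content versus chrome -- the kind of noise that degrades LLM grounding.
from html.parser import HTMLParser

class NaiveTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())

page = """<html><body>
<nav>Home | Products | Login</nav>
<h1>Quarterly Results</h1>
<p>Revenue grew 12% year over year.</p>
<footer>Copyright 2025 Example Corp</footer>
</body></html>"""

parser = NaiveTextExtractor()
parser.feed(page)
text = " ".join(parser.parts)
print(text)
# The nav and footer survive as noise, and the <h1> is indistinguishable
# from body text. A Markdown extractor would keep "# Quarterly Results"
# as a heading and drop the chrome entirely.
```

The LLM receiving this flattened text has no signal separating the navigation menu from the revenue figure, which is how grounding noise creeps into answers.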

Ultimately, successful migration hinges on selecting a partner that understands the specific needs of LLM grounding and offers a streamlined solution. SearchCans, for example, is built specifically for AI agents, providing both SERP data and clean Markdown content extraction within a single, unified platform. It processes millions of requests with up to 68 Parallel Lanes, achieving high throughput without hourly limits.

Common Questions About LLM Grounding API Migration

Q: Why is Microsoft shifting away from traditional Bing Search APIs for LLM grounding?

A: Microsoft’s discontinuation of the traditional Bing Search API on August 11, 2025, was a strategic move to integrate search functionality more deeply within its Azure OpenAI Agent ecosystem. The shift directs developers towards a platform-centric approach, using Bing Search as a grounding tool within Azure AI services, rather than offering it as a standalone, general-purpose web search API. The change impacts a large number of developers, with an estimated 70% of previous users now needing to adapt their data retrieval strategies.

Q: How do alternative search APIs compare in terms of cost and performance for LLM grounding?

A: Alternative search APIs vary significantly in cost and performance for LLM grounding. Some providers, like SerpApi, can cost around $10 per 1,000 requests for SERP data alone. In contrast, services like SearchCans offer dual-engine capabilities (search and clean content extraction) from as low as $0.56/1K credits on volume plans, representing savings of up to 18x. Performance also differs, with some APIs providing only snippets, while others offer full-page Markdown, which is crucial for effective LLM grounding.

Q: What are common pitfalls when integrating a new search API for LLM RAG?

A: Common pitfalls when integrating a new search API for LLM grounding include dealing with inconsistent data formats, managing new rate limits, and handling the often-messy content from raw web pages. Many developers underestimate the "yak shaving" involved in parsing HTML into LLM-ready text, leading to poor grounding results or inflated costs. Choosing an API that provides structured, clean data directly, such as Markdown output, can drastically reduce these integration challenges and maintenance overhead. For more detailed insights on advanced data extraction methods, explore Extract Advanced Google Serp Data using modern APIs.

Q: Can I use different proxy types for LLM grounding data extraction?

A: Yes, many advanced reader APIs offer multi-tier proxy pools, allowing you to select different proxy types for data extraction. For instance, SearchCans’ Reader API provides options for Shared (+2 credits), Datacenter (+5 credits), and Residential (+10 credits) proxies, in addition to its standard 0-credit proxy pool. These proxy options are independent of browser rendering mode and can be specified per request to ensure optimal access and data integrity from various target websites.
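Based on the surcharges above, plus the two-credit standard Reader request mentioned earlier in this article, a small helper can make each tier’s per-request cost explicit. Treat these numbers as this article’s figures and confirm them against the API documentation before relying on them.

```python
# Credits per Reader request by proxy tier: 2 for a standard request,
# plus the tier surcharge quoted in this article.
BASE_READER_CREDITS = 2
PROXY_SURCHARGE = {"standard": 0, "shared": 2, "datacenter": 5, "residential": 10}

def reader_credits(tier: str = "standard") -> int:
    return BASE_READER_CREDITS + PROXY_SURCHARGE[tier]

for tier in PROXY_SURCHARGE:
    print(f"{tier:<12} {reader_credits(tier):>2} credits/request")
```

Since residential proxies cost six times a standard request, it pays to default to the standard pool and escalate the tier only for sites that block cheaper routes.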

Stop grappling with fragmented APIs and sky-high costs for your AI agents. With SearchCans, you get real-time SERP data and LLM-grounding ready Markdown content in a single, unified platform, all starting as low as $0.56/1K credits on volume plans. Kick off your projects with 100 free credits and see the difference in your LLM grounding accuracy and development speed. Sign up for free today and get started.

Tags:

Tutorial, LLM, RAG, API Development, Python Integration
SearchCans Team


SERP API & Reader API Experts

The SearchCans engineering team builds high-performance search APIs serving developers worldwide. We share practical tutorials, best practices, and insights on SERP data, web scraping, RAG pipelines, and AI integration.

Ready to build with SearchCans?

Get started with our SERP API & Reader API. Starting at $0.56 per 1,000 credits. No credit card required for your free trial.