The External Web Search stage integrates Exa’s neural search API to augment your results with real-time web content. This enables hybrid retrieval combining your indexed documents with fresh web results.
Stage Category: APPLY (enriches the pipeline with web results)
Transformation: N documents → N + M documents (web results added)
When to Use
| Use Case | Description |
|---|---|
| Knowledge augmentation | Supplement internal docs with web content |
| Real-time information | Access current events, news, updates |
| Research expansion | Broaden search beyond your corpus |
| Competitive intelligence | Include competitor content in results |
When NOT to Use
| Scenario | Recommended Alternative |
|---|---|
| Internal-only search | Skip this stage |
| Sensitive/confidential queries | Use only indexed content |
| Low-latency requirements | Skip this stage; web search adds 200-500ms |
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | Required | Search query (supports templates) |
| num_results | integer | 10 | Number of web results to retrieve |
| include_domains | array | null | Whitelist of domains to search |
| exclude_domains | array | null | Blacklist of domains to exclude |
| start_published_date | string | null | Only return results published on or after this date (ISO 8601) |
| category | string | null | Content category filter |
| use_autoprompt | boolean | true | Let Exa optimize the query |
| contents | object | {} | Content extraction options |
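To illustrate how these parameters fit into a stage definition, here is a minimal domain-filtered configuration that limits the web search to an allowlist via include_domains. The two domains are illustrative placeholders chosen for the example, not recommendations, and the num_results value is arbitrary.

```json
{
  "stage_type": "apply",
  "stage_id": "external_web_search",
  "parameters": {
    "query": "{{INPUT.query}}",
    "num_results": 5,
    "include_domains": ["docs.python.org", "developer.mozilla.org"]
  }
}
```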
Available Categories
| Category | Description |
|---|---|
| company | Company websites and profiles |
| research_paper | Academic and research content |
| news | News articles |
| pdf | PDF documents |
| github | GitHub repositories |
| tweet | Twitter/X content |
| personal_site | Personal websites and blogs |
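As a sketch, a GitHub-focused search only needs the category parameter set to one of the values above. Everything else in this configuration comes from the parameter table, and the result count is an arbitrary choice.

```json
{
  "stage_type": "apply",
  "stage_id": "external_web_search",
  "parameters": {
    "query": "{{INPUT.query}}",
    "num_results": 10,
    "category": "github"
  }
}
```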
Configuration Examples
Basic Web Search
{
"stage_type" : "apply" ,
"stage_id" : "external_web_search" ,
"parameters" : {
"query" : "{{INPUT.query}}" ,
"num_results" : 10
}
}
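A recent-news variant of the basic configuration can combine the news category with start_published_date. The timestamp below is a fixed, illustrative ISO 8601 value; the Web-Augmented RAG pattern later on this page shows the same parameter templated from the request instead.

```json
{
  "stage_type": "apply",
  "stage_id": "external_web_search",
  "parameters": {
    "query": "{{INPUT.query}}",
    "num_results": 10,
    "category": "news",
    "start_published_date": "2024-01-01T00:00:00Z"
  }
}
```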
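Content Options
The contents parameter accepts the following options:

| Option | Type | Description |
|---|---|---|
| text | boolean | Extract full page text |
| highlights | boolean | Extract relevant snippets |
| summary | boolean | Generate content summary |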
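For example, a research-paper search that requests full page text and highlights but skips the generated summary might look like this. The specific combination of options is illustrative, not a recommended default.

```json
{
  "stage_type": "apply",
  "stage_id": "external_web_search",
  "parameters": {
    "query": "{{INPUT.query}}",
    "num_results": 5,
    "category": "research_paper",
    "contents": {
      "text": true,
      "highlights": true,
      "summary": false
    }
  }
}
```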
Output Schema
Web results are added to the document set with a source: "web" marker:
{
"document_id" : "web_abc123" ,
"source" : "web" ,
"url" : "https://example.com/article" ,
"title" : "Article Title" ,
"content" : "Full extracted text content..." ,
"highlights" : [ "Relevant snippet 1" , "Relevant snippet 2" ],
"published_date" : "2024-03-15T10:30:00Z" ,
"author" : "John Doe" ,
"score" : 0.95 ,
"metadata" : {
"domain" : "example.com" ,
"category" : "news"
}
}
Exa Neural Search
Exa uses neural search rather than keyword matching:
| Feature | Description |
|---|---|
| Semantic understanding | Understands query intent |
| Neural ranking | ML-based relevance scoring |
| Content extraction | Automatic text extraction |
| Autoprompt | Query optimization for better results |
Enable use_autoprompt (default) for natural language queries. Disable it when you need exact phrase matching or have already optimized your query.
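For instance, if your application already rewrites queries before they reach the pipeline, you can pass them to Exa unchanged by disabling autoprompt. This sketch assumes INPUT.query already holds that optimized query.

```json
{
  "stage_type": "apply",
  "stage_id": "external_web_search",
  "parameters": {
    "query": "{{INPUT.query}}",
    "num_results": 10,
    "use_autoprompt": false
  }
}
```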
Performance

| Metric | Value |
|---|---|
| Latency | 200-500ms |
| Rate limits | Based on Exa plan |
| Parallel execution | Concurrent with pipeline |
| Caching | Results cached for 1 hour |
Common Pipeline Patterns
Internal + Web Hybrid Search
[
{
"stage_type" : "filter" ,
"stage_id" : "semantic_search" ,
"parameters" : {
"query" : "{{INPUT.query}}" ,
"vector_index" : "text_extractor_v1_embedding" ,
"top_k" : 20
}
},
{
"stage_type" : "apply" ,
"stage_id" : "external_web_search" ,
"parameters" : {
"query" : "{{INPUT.query}}" ,
"num_results" : 10
}
},
{
"stage_type" : "sort" ,
"stage_id" : "rerank" ,
"parameters" : {
"model" : "bge-reranker-v2-m3" ,
"top_n" : 10
}
}
]
Web-Augmented RAG
[
{
"stage_type" : "filter" ,
"stage_id" : "hybrid_search" ,
"parameters" : {
"query" : "{{INPUT.query}}" ,
"vector_index" : "text_extractor_v1_embedding" ,
"top_k" : 30
}
},
{
"stage_type" : "apply" ,
"stage_id" : "external_web_search" ,
"parameters" : {
"query" : "{{INPUT.query}}" ,
"num_results" : 5 ,
"category" : "news" ,
"start_published_date" : "{{INPUT.date_filter}}"
}
},
{
"stage_type" : "apply" ,
"stage_id" : "rag_prepare" ,
"parameters" : {
"max_tokens" : 8000 ,
"output_mode" : "single_context"
}
}
]
Error Handling
| Error | Behavior |
|---|---|
| API rate limit | Retry with backoff |
| Network timeout | Stage fails gracefully, no web results |
| Invalid domain | Ignored, other domains searched |
| No results found | Empty web result set |
Web search results may include content from untrusted sources. Consider filtering or validating web content before using it in sensitive applications.
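At the configuration level, one partial mitigation is to keep known-untrusted sites out of the result set with exclude_domains. The domains below are hypothetical placeholders. This does not validate content from the domains that remain, so downstream filtering or review is still worth considering.

```json
{
  "stage_type": "apply",
  "stage_id": "external_web_search",
  "parameters": {
    "query": "{{INPUT.query}}",
    "num_results": 10,
    "exclude_domains": ["untrusted-forum.example", "content-farm.example"]
  }
}
```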