[Figure: External Web Search stage showing Exa API integration for web results]
The External Web Search stage integrates Exa’s neural search API to augment your results with real-time web content. This enables hybrid retrieval combining your indexed documents with fresh web results.
Stage Category: APPLY (enriches the pipeline with web results)
Transformation: N documents → N + M documents (web results added)
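The N → N + M transformation can be pictured as list concatenation. The Python sketch below is illustrative only: apply_web_search and the plain-dict document shape are hypothetical, and the real stage runs inside the pipeline service. It shows that the incoming documents pass through untouched while web results are appended with a source: "web" marker.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def apply_web_search(indexed: List[Document], web_results: List[Document]) -> List[Document]:
    """Model of the APPLY transformation: N indexed docs in, N + M docs out (hypothetical)."""
    # Tag each web result so downstream stages can tell it apart from indexed content.
    tagged = [{**doc, "source": "web"} for doc in web_results]
    # Nothing from the incoming set is removed; web results are appended after it.
    return list(indexed) + tagged


indexed_docs = [{"document_id": "doc_1"}, {"document_id": "doc_2"}]  # N = 2
web_docs = [{"document_id": "web_abc123", "url": "https://example.com/article"}]  # M = 1
combined = apply_web_search(indexed_docs, web_docs)
print(len(indexed_docs), "->", len(combined))  # 2 -> 3
```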

When to Use

| Use Case | Description |
|---|---|
| Knowledge augmentation | Supplement internal docs with web content |
| Real-time information | Access current events, news, and updates |
| Research expansion | Broaden search beyond your corpus |
| Competitive intelligence | Include competitor content in results |

When NOT to Use

| Scenario | Recommended Alternative |
|---|---|
| Internal-only search | Skip this stage |
| Sensitive/confidential queries | Use only indexed content |
| Low-latency requirements | Skip this stage; web search adds 200-500 ms |
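If you assemble pipelines programmatically, the guidance above can be encoded as a switch that leaves the stage out. This is a sketch under stated assumptions: should_use_web_search, build_stages, and the latency_headroom_ms input are hypothetical, and the 500 ms threshold simply takes the upper end of the range quoted on this page.

```python
from typing import Any, Dict, List

# Upper end of the 200-500 ms range quoted on this page.
WEB_SEARCH_WORST_CASE_MS = 500


def should_use_web_search(*, internal_only: bool, sensitive: bool, latency_headroom_ms: int) -> bool:
    """Hypothetical helper encoding the 'When NOT to Use' table."""
    if internal_only or sensitive:
        return False  # keep the query on indexed content only
    # Only add the stage if the caller can absorb the worst-case added latency.
    return latency_headroom_ms >= WEB_SEARCH_WORST_CASE_MS


def build_stages(**flags: Any) -> List[Dict[str, Any]]:
    stages: List[Dict[str, Any]] = [
        {"stage_type": "filter", "stage_id": "semantic_search",
         "parameters": {"query": "{{INPUT.query}}",
                        "vector_index": "text_extractor_v1_embedding", "top_k": 20}},
    ]
    if should_use_web_search(**flags):
        stages.append({"stage_type": "apply", "stage_id": "external_web_search",
                       "parameters": {"query": "{{INPUT.query}}", "num_results": 10}})
    return stages


print(len(build_stages(internal_only=False, sensitive=True, latency_headroom_ms=2000)))  # 1
print(len(build_stages(internal_only=False, sensitive=False, latency_headroom_ms=800)))  # 2
```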

Parameters

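| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | Required | Search query (supports templates such as {{INPUT.query}}) |
| num_results | integer | 10 | Number of web results to retrieve |
| include_domains | array | null | Allowlist: only search these domains |
| exclude_domains | array | null | Blocklist: never return results from these domains |
| start_published_date | string | null | Only return results published after this date (ISO 8601) |
| category | string | null | Restrict results to one content category (see Available Categories) |
| use_autoprompt | boolean | true | Let Exa optimize the query |
| contents | object | {} | Content extraction options (see Content Extraction Options) |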
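To catch configuration mistakes before a request is sent, you could mirror the parameter table in a small Python dataclass. ExternalWebSearchParams and to_stage are hypothetical names, not part of any SDK; the defaults are copied from the table, and the date check relies on datetime.fromisoformat, which accepts common ISO 8601 forms such as "2024-01-01".

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class ExternalWebSearchParams:
    """Hypothetical client-side mirror of the parameter table (defaults copied from it)."""

    query: str  # required; may be a template such as "{{INPUT.query}}"
    num_results: int = 10
    include_domains: Optional[List[str]] = None
    exclude_domains: Optional[List[str]] = None
    start_published_date: Optional[str] = None  # ISO 8601, e.g. "2024-01-01"
    category: Optional[str] = None
    use_autoprompt: bool = True
    contents: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.query:
            raise ValueError("query is required")
        if self.num_results < 1:
            raise ValueError("num_results must be at least 1")
        date = self.start_published_date
        if date is not None and "{{" not in date:  # templates are resolved later, so skip them
            # Raises ValueError for non-ISO strings; "Z" is normalized for Python < 3.11.
            datetime.fromisoformat(date.replace("Z", "+00:00"))

    def to_stage(self) -> Dict[str, Any]:
        # Omit fields left at null so the request only carries what was set.
        params = {k: v for k, v in asdict(self).items() if v is not None}
        return {"stage_type": "apply", "stage_id": "external_web_search", "parameters": params}


stage = ExternalWebSearchParams(
    query="{{INPUT.query}}", num_results=5, exclude_domains=["example.org"]
).to_stage()
print(stage["parameters"])
```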

Available Categories

| Category | Description |
|---|---|
| company | Company websites and profiles |
| research_paper | Academic and research content |
| news | News articles |
| pdf | PDF documents |
| github | GitHub repositories |
| tweet | Twitter/X content |
| personal_site | Personal websites and blogs |
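A category value outside this list is easy to mistype. The hypothetical helper below, scoped_web_search, checks the category against the values in the table before building the stage. Whether the service rejects or ignores unknown categories is not stated here, so the check is purely client-side.

```python
from typing import Any, Dict

# Values copied from the Available Categories table.
CATEGORIES = frozenset(
    {"company", "research_paper", "news", "pdf", "github", "tweet", "personal_site"}
)


def scoped_web_search(query: str, category: str, num_results: int = 10) -> Dict[str, Any]:
    """Build an external_web_search stage restricted to one category (hypothetical helper)."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category {category!r}; expected one of {sorted(CATEGORIES)}")
    return {
        "stage_type": "apply",
        "stage_id": "external_web_search",
        "parameters": {"query": query, "num_results": num_results, "category": category},
    }


papers = scoped_web_search("{{INPUT.query}}", "research_paper")
print(papers["parameters"]["category"])  # research_paper
```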

Configuration Examples

```json
{
  "stage_type": "apply",
  "stage_id": "external_web_search",
  "parameters": {
    "query": "{{INPUT.query}}",
    "num_results": 10
  }
}
```
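The query parameter supports templates such as {{INPUT.query}}, which the pipeline fills from the request inputs. The exact resolution rules are not documented on this page, so the following Python sketch assumes plain string substitution of {{INPUT.<name>}} placeholders; resolve_templates and PLACEHOLDER are hypothetical names.

```python
import re
from typing import Any, Dict

# Matches placeholders such as {{INPUT.query}} or {{ INPUT.date_filter }}.
PLACEHOLDER = re.compile(r"\{\{\s*INPUT\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def resolve_templates(value: Any, inputs: Dict[str, Any]) -> Any:
    """Recursively replace {{INPUT.<name>}} with inputs[<name>] (hypothetical resolver)."""
    if isinstance(value, str):
        # A missing input raises KeyError instead of silently sending the raw placeholder.
        return PLACEHOLDER.sub(lambda m: str(inputs[m.group(1)]), value)
    if isinstance(value, dict):
        return {k: resolve_templates(v, inputs) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_templates(v, inputs) for v in value]
    return value  # numbers, booleans, and None pass through unchanged


stage = {
    "stage_type": "apply",
    "stage_id": "external_web_search",
    "parameters": {"query": "{{INPUT.query}}", "num_results": 10},
}
resolved = resolve_templates(stage, {"query": "latest vector database benchmarks"})
print(resolved["parameters"]["query"])  # latest vector database benchmarks
```

The resolver builds new containers rather than mutating the original config, so one stage definition can be reused across requests.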

Content Extraction Options

| Option | Type | Description |
|---|---|---|
| text | boolean | Extract the full page text |
| highlights | boolean | Extract relevant snippets |
| summary | boolean | Generate a content summary |
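The contents object takes these options as boolean flags. A minimal sketch that builds the stage with a chosen subset follows; web_search_with_contents is a hypothetical helper, and because this page does not state per-option defaults, it assumes options left out of contents are treated as off.

```python
import json
from typing import Any, Dict


def web_search_with_contents(query: str, *, text: bool = False, highlights: bool = False,
                             summary: bool = False, num_results: int = 10) -> Dict[str, Any]:
    """Build a stage whose contents object requests only the selected options (hypothetical)."""
    requested = {"text": text, "highlights": highlights, "summary": summary}
    # Send only the options that were switched on (assumption: omitted options count as off).
    contents = {name: True for name, enabled in requested.items() if enabled}
    return {
        "stage_type": "apply",
        "stage_id": "external_web_search",
        "parameters": {"query": query, "num_results": num_results, "contents": contents},
    }


stage = web_search_with_contents("{{INPUT.query}}", text=True, highlights=True)
print(json.dumps(stage, indent=2))
```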

Output Schema

Web results are added to the document set with a source: "web" marker:
```json
{
  "document_id": "web_abc123",
  "source": "web",
  "url": "https://example.com/article",
  "title": "Article Title",
  "content": "Full extracted text content...",
  "highlights": ["Relevant snippet 1", "Relevant snippet 2"],
  "published_date": "2024-03-15T10:30:00Z",
  "author": "John Doe",
  "score": 0.95,
  "metadata": {
    "domain": "example.com",
    "category": "news"
  }
}
```
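Because every web result carries source: "web", downstream code can separate the two origins, for example to show citations only for web content. The helpers split_by_source and citation below are hypothetical, and treating documents without a source field as indexed is an assumption about how indexed results are marked.

```python
from typing import Any, Dict, List, Tuple

Document = Dict[str, Any]


def split_by_source(docs: List[Document]) -> Tuple[List[Document], List[Document]]:
    """Separate indexed documents from web results using the source marker."""
    indexed: List[Document] = []
    web: List[Document] = []
    for doc in docs:
        # Assumption: anything not marked "web" came from the indexed collection.
        (web if doc.get("source") == "web" else indexed).append(doc)
    return indexed, web


def citation(doc: Document) -> str:
    """Format a web result for display from its title, metadata.domain, and url fields."""
    domain = doc.get("metadata", {}).get("domain", "unknown source")
    return f'{doc.get("title", "Untitled")} ({domain}) {doc["url"]}'


results = [
    {"document_id": "doc_1", "content": "Internal runbook..."},
    {"document_id": "web_abc123", "source": "web", "url": "https://example.com/article",
     "title": "Article Title", "metadata": {"domain": "example.com", "category": "news"}},
]
indexed, web = split_by_source(results)
print([citation(d) for d in web])
```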
Exa uses neural search rather than keyword matching:
| Feature | Description |
|---|---|
| Semantic understanding | Understands query intent |
| Neural ranking | ML-based relevance scoring |
| Content extraction | Automatic text extraction |
| Autoprompt | Query optimization for better results |
Keep use_autoprompt enabled (the default) for natural-language queries. Disable it when you need exact phrase matching or have already optimized your query.
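One way to apply this rule automatically is a small heuristic that switches autoprompt off when the query contains a double-quoted phrase or has been tuned by hand. choose_autoprompt and web_search_params are hypothetical names, and the quote check is a deliberately simple stand-in for whatever signal your application actually has.

```python
from typing import Any, Dict


def choose_autoprompt(query: str, already_optimized: bool = False) -> bool:
    """Hypothetical heuristic for the use_autoprompt guidance."""
    if already_optimized:
        return False  # query was tuned by hand; don't let it be rewritten
    # A double-quoted phrase signals that the caller wants exact phrase matching.
    return '"' not in query


def web_search_params(query: str, already_optimized: bool = False) -> Dict[str, Any]:
    return {"query": query, "num_results": 10,
            "use_autoprompt": choose_autoprompt(query, already_optimized)}


print(web_search_params("how do cross-encoder rerankers work")["use_autoprompt"])  # True
print(web_search_params('"bge-reranker-v2-m3" release notes')["use_autoprompt"])  # False
```

Run the heuristic on the resolved query text: a raw template such as {{INPUT.query}} contains no quotes and would always leave autoprompt on.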

Performance

| Metric | Value |
|---|---|
| Latency | 200-500 ms |
| Rate limits | Based on your Exa plan |
| Parallel execution | Concurrent with pipeline |
| Caching | Results cached for 1 hour |
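The 1-hour caching happens on the service side, and its cache key is not documented. The toy class below only illustrates the expiry behavior, assuming results are keyed on the full parameter set; WebResultCache and its injectable clock are hypothetical.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple

ONE_HOUR = 3600.0


class WebResultCache:
    """Toy TTL cache mirroring the documented 1-hour caching (not the service's implementation)."""

    def __init__(self, ttl_seconds: float = ONE_HOUR, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without waiting
        self._entries: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def _key(params: Dict[str, Any]) -> str:
        # sort_keys makes dicts with the same parameters map to the same key.
        return json.dumps(params, sort_keys=True)

    def get(self, params: Dict[str, Any]) -> Optional[Any]:
        key = self._key(params)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, results = entry
        if self.clock() - stored_at >= self.ttl:
            del self._entries[key]  # expired: force a fresh web search
            return None
        return results

    def put(self, params: Dict[str, Any], results: Any) -> None:
        self._entries[self._key(params)] = (self.clock(), results)


now = [0.0]
cache = WebResultCache(clock=lambda: now[0])
cache.put({"query": "exa neural search", "num_results": 10}, ["web_abc123"])
now[0] = 1800.0  # 30 minutes later
fresh = cache.get({"num_results": 10, "query": "exa neural search"})
now[0] = 3600.0  # one hour after storing
expired = cache.get({"num_results": 10, "query": "exa neural search"})
print(fresh, expired)  # ['web_abc123'] None
```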

Common Pipeline Patterns

Internal + Web Search with Reranking

```json
[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 20
    }
  },
  {
    "stage_type": "apply",
    "stage_id": "external_web_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "num_results": 10
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 10
    }
  }
]
```
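The rerank stage matters in this pattern because similarity scores from semantic_search and scores returned by Exa are generally not on a comparable scale, so the merged candidates need to be rescored together. In the Python sketch below, a toy term-overlap score stands in for the bge-reranker-v2-m3 model; toy_relevance and merge_and_rerank are hypothetical, and the point is only that one scoring function ranks the whole pool before top_n is applied.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def toy_relevance(query: str, doc: Document) -> float:
    """Fraction of query terms present in the document (stand-in for a cross-encoder)."""
    terms = set(query.lower().split())
    words = set(doc.get("content", "").lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def merge_and_rerank(query: str, indexed: List[Document], web: List[Document],
                     top_n: int = 10) -> List[Document]:
    # With the config above: up to top_k (20) indexed + num_results (10) web = 30 candidates.
    pool = indexed + web
    # Score every candidate with the same function so both sources share one scale.
    ranked = sorted(pool, key=lambda d: toy_relevance(query, d), reverse=True)
    return ranked[:top_n]


indexed = [{"document_id": "doc_1", "content": "internal notes on billing"}]
web = [{"document_id": "web_1", "source": "web", "content": "Exa neural search overview"}]
print([d["document_id"] for d in merge_and_rerank("exa neural search", indexed, web, top_n=1)])
# ['web_1']
```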

Web-Augmented RAG

```json
[
  {
    "stage_type": "filter",
    "stage_id": "hybrid_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 30
    }
  },
  {
    "stage_type": "apply",
    "stage_id": "external_web_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "num_results": 5,
      "category": "news",
      "start_published_date": "{{INPUT.date_filter}}"
    }
  },
  {
    "stage_type": "apply",
    "stage_id": "rag_prepare",
    "parameters": {
      "max_tokens": 8000,
      "output_mode": "single_context"
    }
  }
]
```
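This page does not describe how rag_prepare fits documents into max_tokens. As a rough mental model only, the sketch below greedily joins document contents in ranked order until the budget is reached and labels web content with its URL. pack_context and rough_token_count are hypothetical, and a whitespace word count is only a crude proxy for real tokenizer counts.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]


def rough_token_count(text: str) -> int:
    # Whitespace split: a crude proxy; subword tokenizers often produce more tokens.
    return len(text.split())


def pack_context(docs: List[Document], max_tokens: int = 8000,
                 count_tokens: Callable[[str], int] = rough_token_count) -> str:
    """Greedily join document contents into one context string within max_tokens."""
    parts: List[str] = []
    used = 0
    for doc in docs:
        # Label web content so the generator (and reader) can see where it came from.
        prefix = f"[web: {doc['url']}] " if doc.get("source") == "web" else ""
        entry = prefix + doc.get("content", "")
        cost = count_tokens(entry)  # the label counts against the budget too
        if used + cost > max_tokens:
            break  # stop at the first document that would overflow the budget
        parts.append(entry)
        used += cost
    return "\n\n".join(parts)


docs = [
    {"content": "a b c"},
    {"content": "d e", "source": "web", "url": "https://example.com"},
    {"content": "f g h i"},
]
context = pack_context(docs, max_tokens=7)
print(context)
```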

Error Handling

| Error | Behavior |
|---|---|
| API rate limit | Retry with backoff |
| Network timeout | Stage fails gracefully; no web results |
| Invalid domain | Ignored; other domains are still searched |
| No results found | Empty web result set |
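If you call a web search API from your own code, the behavior in this table can be approximated with jittered exponential backoff. This is a sketch, not the stage's implementation: RateLimitError and search_with_backoff are hypothetical, the retry count and delays are arbitrary, and returning an empty list once retries are exhausted is an assumption (the table only says timeouts yield no web results).

```python
import random
import time
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]


class RateLimitError(Exception):
    """Hypothetical error a web search client raises when rate-limited."""


def search_with_backoff(call: Callable[[], List[Document]], max_retries: int = 3,
                        base_delay: float = 0.5,
                        sleep: Callable[[float], None] = time.sleep) -> List[Document]:
    """Retry rate-limited calls with jittered exponential backoff; degrade to no web results."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                break  # retries exhausted
            # Full jitter: wait a random time up to base_delay * 2**attempt seconds.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
        except TimeoutError:
            break  # network timeout: give up on web results for this request
    return []  # empty web result set; the caller continues with its other documents


calls = {"n": 0}


def flaky_search() -> List[Document]:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError()  # first two attempts are rate-limited
    return [{"document_id": "web_abc123", "source": "web"}]


def timed_out_search() -> List[Document]:
    raise TimeoutError("web search timed out")


recorded_delays: List[float] = []
recovered = search_with_backoff(flaky_search, sleep=recorded_delays.append)
print(recovered, search_with_backoff(timed_out_search))
```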
Web search results may include content from untrusted sources. Consider filtering or validating web content before using it in sensitive applications.
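To restrict sources at query time, include_domains already limits the search upstream. If you instead need to vet results after the stage runs (for example, against an allowlist that changes per request), a post-filter like the following Python sketch works. keep_trusted is a hypothetical helper, and domain allowlisting is only one form of validation.

```python
from typing import Any, Dict, Iterable, List
from urllib.parse import urlparse

Document = Dict[str, Any]


def keep_trusted(docs: List[Document], trusted_domains: Iterable[str]) -> List[Document]:
    """Drop web results whose host is not a trusted domain or one of its subdomains."""
    trusted = {d.lower() for d in trusted_domains}
    kept: List[Document] = []
    for doc in docs:
        if doc.get("source") != "web":
            kept.append(doc)  # indexed content passes through unchanged
            continue
        host = (urlparse(doc.get("url", "")).hostname or "").lower()
        # Match the domain itself or a true subdomain; "notexample.com" does not match "example.com".
        if any(host == d or host.endswith("." + d) for d in trusted):
            kept.append(doc)
    return kept


docs = [
    {"document_id": "doc_1", "content": "internal"},
    {"document_id": "web_1", "source": "web", "url": "https://docs.example.com/guide"},
    {"document_id": "web_2", "source": "web", "url": "https://notexample.com/post"},
]
print([d["document_id"] for d in keep_trusted(docs, ["example.com"])])  # ['doc_1', 'web_1']
```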