## Analytics Categories

- **Retrieval Performance**: Track latency, cache hit rates, and stage-level breakdowns for retrievers.
- **Infrastructure Metrics**: Monitor API response times, Engine throughput, and inference service health.
- **Feature Extraction**: Measure extractor execution time, failure rates, and batch processing efficiency.
- **User Signals**: Analyze interaction patterns (clicks, long views, feedback) to tune relevance.
## Key Metrics Explained

### Retrieval Metrics
| Metric | API Endpoint | Use Case |
|---|---|---|
| Retriever Performance | GET /v1/analytics/retrievers/{retriever_id}/performance | Overall latency percentiles (p50, p95, p99), cache hit rate, error rate |
| Stage Breakdown | GET /v1/analytics/retrievers/{retriever_id}/stages | Per-stage execution time to identify slow stages (e.g., LLM generation vs vector search) |
| Retriever Signals | GET /v1/analytics/retrievers/{retriever_id}/signals | User interactions aggregated by result position, document, or session |
| Slowest Queries | GET /v1/analytics/retrievers/{retriever_id}/slowest | Identify outlier executions with full input payloads for debugging |
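These are plain GET calls. As a minimal sketch, here is how you might pull the performance summary for a single retriever in Python; the base URL, auth header, and response field names (`p95`, `cache_hit_rate`, `error_rate`) are assumptions to adapt to your deployment:

```python
import requests

API_BASE = "https://api.example.com"             # assumption: your deployment URL
HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumption: your auth scheme

def retriever_performance(retriever_id: str) -> dict:
    """Fetch overall latency percentiles, cache hit rate, and error rate."""
    resp = requests.get(
        f"{API_BASE}/v1/analytics/retrievers/{retriever_id}/performance",
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

perf = retriever_performance("my-retriever")
print(perf.get("p95"), perf.get("cache_hit_rate"), perf.get("error_rate"))
```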
### Infrastructure Metrics
| Metric | API Endpoint | Use Case |
|---|---|---|
| API Performance | GET /v1/analytics/api/performance | Request latency by endpoint and status code |
| Engine Performance | GET /v1/analytics/engine/performance | Ray task execution time, queue depth, worker utilization |
| Inference Performance | GET /v1/analytics/inference/performance | Model latency (embedding, LLM, classification) and throughput |
### Extraction Metrics
| Metric | API Endpoint | Use Case |
|---|---|---|
| Extractor Performance | GET /v1/analytics/extractors/{extractor_name}/performance | Avg processing time per object, failure rate, retry count |
| Batch Efficiency | Derived from extractor + engine metrics | Compare batch sizes vs throughput to optimize batching strategy |
### Cache Metrics
| Metric | API Endpoint | Use Case |
|---|---|---|
| Cache Performance | GET /v1/analytics/cache/performance | Hit rate, average TTL, eviction rate, memory usage |
### Usage & Cost Metrics
| Metric | API Endpoint | Use Case |
|---|---|---|
| Usage Summary | GET /v1/analytics/usage/summary | Credit consumption by resource type (API calls, inference, storage) |
| Org Usage | GET /v1/organizations/usage | Per-org breakdown for billing and quota enforcement |
| API Key Usage | GET /v1/organizations/usage/api-keys/{key_id} | Attribute costs to specific services or teams |
## Optimization Workflows

### 1. Diagnose Slow Retrievers

**Symptoms:** High p95/p99 latency, user complaints about wait times

**Steps:**
- Call `/analytics/retrievers/{retriever_id}/performance` to get the overall latency distribution
- Check `cache_hit_rate` – if low, consider increasing `cache_config.ttl_seconds` or caching more stages
- Call `/analytics/retrievers/{retriever_id}/stages` to identify the bottleneck stage (see the sketch below). Common culprits:
  - LLM generation stages – consider smaller models, prompt caching, or async processing
  - Large `limit` values – reduce `limit` in early stages and use rerankers
  - Complex filters – move structured filters before expensive search stages
- Call `/analytics/retrievers/{retriever_id}/slowest` to inspect outlier queries with full inputs
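A minimal sketch of that stage check: pull the breakdown and sort stages by execution time. The `stages` list and `avg_duration_ms` field are assumptions about the response shape; base URL and auth header are placeholders.

```python
import requests

API_BASE = "https://api.example.com"             # assumption
HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumption

def slowest_stages(retriever_id: str, top_n: int = 3) -> list[dict]:
    """Return the top_n slowest stages for a retriever, slowest first."""
    resp = requests.get(
        f"{API_BASE}/v1/analytics/retrievers/{retriever_id}/stages",
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    stages = resp.json().get("stages", [])       # assumed response field
    return sorted(stages, key=lambda s: s.get("avg_duration_ms", 0), reverse=True)[:top_n]

for stage in slowest_stages("my-retriever"):
    print(stage.get("name"), stage.get("avg_duration_ms"))
```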
### 2. Improve Cache Hit Rates

**Target:** >70% cache hit rate for common queries

**Strategies:**
- Enable caching at the retriever level: `cache_config.enabled = true`
- Increase TTL for stable queries: `cache_config.ttl_seconds = 3600`
- Use `cache_stage_names` to selectively cache expensive stages (e.g., web search, LLM)
- Normalize inputs (lowercase, trim whitespace) before hashing for cache keys (see the sketch below)
- Monitor `/analytics/cache/performance` to track hit rate trends
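Input normalization is often the cheapest win. A minimal sketch of a normalization helper; the hashing scheme here is illustrative, not the platform's internal cache-key format.

```python
import hashlib
import json

def cache_key(query: str, filters: dict | None = None) -> str:
    """Build a stable cache key: lowercase, collapse whitespace, sort filter keys."""
    normalized = {
        "query": " ".join(query.lower().split()),
        "filters": dict(sorted((filters or {}).items())),
    }
    return hashlib.sha256(json.dumps(normalized, sort_keys=True).encode()).hexdigest()

# Both variants hash to the same key, so they hit the same cache entry.
assert cache_key("  Coffee Grinders ") == cache_key("coffee grinders")
```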
### 3. Optimize Feature Extraction

**Symptoms:** Batch processing takes hours, high extractor failure rate

**Steps:**
- Check `/analytics/extractors/{extractor_name}/performance` for average execution time
- Compare across extractors – identify if one is significantly slower (see the sketch below)
- Strategies:
  - Batch sizing – 100-1000 objects optimal; adjust based on object size
  - Parallelism – increase Ray workers or use GPU instances for vision/LLM extractors
  - Retry logic – review `__missing_features` in documents to identify flaky extractors
  - Model selection – swap heavy models for distilled versions (e.g., `distilbert` vs `bert-large`)
- Monitor `/analytics/engine/performance` for worker saturation
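A sketch of that comparison step: fetch each extractor's average processing time and rank them. The extractor names and the `avg_duration_ms` response field are hypothetical; adjust to your own extractors and the actual response shape.

```python
import requests

API_BASE = "https://api.example.com"             # assumption
HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumption

def extractor_avg_ms(name: str) -> float:
    """Average per-object processing time for one extractor."""
    resp = requests.get(
        f"{API_BASE}/v1/analytics/extractors/{name}/performance",
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("avg_duration_ms", 0.0)   # assumed response field

extractors = ["image_captioner", "text_embedder", "keyword_tagger"]  # hypothetical names
timings = {name: extractor_avg_ms(name) for name in extractors}
for name, ms in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ms:.0f} ms/object")
```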
### 4. Tune Inference Budgets

**Goal:** Balance cost and latency

**Steps:**
- Review `/analytics/usage/summary` to see inference credit consumption by model (see the sketch below)
- Identify high-volume, low-value queries (e.g., exploratory searches)
- Apply budget limits at the retriever level
- Use cheaper models for initial ranking (e.g., `multilingual-e5-base`) and expensive rerankers only for top-K results
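A sketch of the first step: pull the usage summary and rank models by credits consumed. The `inference.by_model` grouping and the `credits` field are assumptions about the response shape.

```python
import requests

API_BASE = "https://api.example.com"             # assumption
HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumption

resp = requests.get(f"{API_BASE}/v1/analytics/usage/summary", headers=HEADERS, timeout=30)
resp.raise_for_status()

# Assumed shape: {"inference": {"by_model": {"<model>": {"credits": <number>}, ...}}}
by_model = resp.json().get("inference", {}).get("by_model", {})
for model, usage in sorted(by_model.items(), key=lambda kv: kv[1].get("credits", 0), reverse=True):
    print(f"{model}: {usage.get('credits', 0)} credits")
```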
### 5. Leverage User Signals for Relevance Tuning

**Workflow:**
- Instrument your app to send interactions via `/v1/retrievers/{retriever_id}/interactions` (`click`, `long_view`, `positive_feedback`, `negative_feedback`); see the sketch below
- Query `/analytics/retrievers/{retriever_id}/signals` to see:
  - Click-through rate by result position
  - Documents with high negative feedback
  - Queries with zero engagement
- Use insights to:
  - Adjust stage ordering (e.g., move filters earlier to reduce noise)
  - Update taxonomy mappings or cluster definitions
  - Fine-tune reranker prompts or scoring functions
- A/B test changes by creating retriever variants and comparing signal distributions
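A minimal instrumentation sketch that records a `click` on the second result. Only the endpoint and event types come from this page; the request body fields are assumptions, so check the Interactions reference for the real schema.

```python
import requests

API_BASE = "https://api.example.com"             # assumption
HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumption

def record_interaction(retriever_id: str, interaction_type: str, document_id: str, position: int) -> None:
    """Send a user signal: click, long_view, positive_feedback, or negative_feedback."""
    resp = requests.post(
        f"{API_BASE}/v1/retrievers/{retriever_id}/interactions",
        headers=HEADERS,
        json={
            "type": interaction_type,    # assumed field name
            "document_id": document_id,  # assumed field name
            "position": position,        # assumed field name
        },
        timeout=10,
    )
    resp.raise_for_status()

record_interaction("my-retriever", "click", "doc_123", position=2)
```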
## Recommended Dashboards

### Production Health Dashboard

Metrics to display:
- API p95 latency (target: <500ms)
- Cache hit rate (target: >70%)
- Error rate by endpoint (target: <1%)
- Engine queue depth (alert if >100 pending tasks)
- Inference quota remaining (% of plan)
### Retriever Performance Dashboard

Metrics to display:
- Per-retriever p95 latency (grouped by retriever_id)
- Cache hit rate by retriever
- Top 5 slowest queries (with input previews)
- Stage breakdown for critical retrievers
- Click-through rate trends (weekly aggregation)
### Cost Optimization Dashboard

Metrics to display:
- Inference cost per retriever execution (derived: `inference_calls * model_cost`; see the sketch below)
- Storage growth rate (documents + features + cache)
- API key usage breakdown (top consumers)
- Batch processing cost per object (extractor time * worker cost)
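The derived cost metric is simple arithmetic once you know your per-call model pricing. A sketch with placeholder rates (the model names and costs below are hypothetical, not actual pricing):

```python
# Hypothetical per-call costs in credits; replace with your plan's actual rates.
MODEL_COST = {"multilingual-e5-base": 0.1, "reranker-large": 1.0, "llm-summarizer": 5.0}

def execution_cost(inference_calls: dict[str, int]) -> float:
    """Cost of one retriever execution: sum of inference_calls * model_cost per model."""
    return sum(count * MODEL_COST.get(model, 0.0) for model, count in inference_calls.items())

# e.g. one embedding call, 20 rerank calls, one LLM call
print(execution_cost({"multilingual-e5-base": 1, "reranker-large": 20, "llm-summarizer": 1}))
```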
## Alerting Recommendations

### Critical: API Error Rate >5%
- Trigger: `/analytics/api/performance` shows error rate >5% over 5 min
- Action: Check the health endpoint `/v1/health`, inspect logs for Qdrant/Mongo connectivity issues (a polling sketch follows this list)

### Warning: Retriever p95 Latency >2s
- Trigger: Specific retriever p95 >2000ms for 10 min
- Action: Review stage breakdown, disable non-critical stages temporarily, increase cache TTL

### Warning: Cache Hit Rate <50%
- Trigger: Cache hit rate drops below 50% for 30 min
- Action: Review cache config, check if query patterns changed (seasonality, A/B tests)

### Info: High Inference Quota Usage
- Trigger: 80% of monthly inference quota consumed
- Action: Review top consumers via `/analytics/usage/summary`, throttle exploratory retrievers

### Critical: Engine Queue Depth >500
- Trigger: Ray queue depth exceeds 500 pending tasks
- Action: Scale Ray workers, pause batch submissions, investigate slow extractors
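If you do not have an alerting stack yet, a cron-driven poll of the analytics API covers the first rule. A minimal sketch; the `error_rate` response field is an assumption, and the time-window parameters follow the Time-Windowed Queries pattern described below.

```python
from datetime import datetime, timedelta, timezone
import requests

API_BASE = "https://api.example.com"             # assumption
HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumption

now = datetime.now(timezone.utc)
resp = requests.get(
    f"{API_BASE}/v1/analytics/api/performance",
    headers=HEADERS,
    params={
        "start_time": (now - timedelta(minutes=5)).isoformat(),
        "end_time": now.isoformat(),
    },
    timeout=30,
)
resp.raise_for_status()
error_rate = resp.json().get("error_rate", 0.0)  # assumed response field

if error_rate > 0.05:
    # Hook in your paging or notification tool here (PagerDuty, Slack, ...).
    print(f"CRITICAL: API error rate {error_rate:.1%} exceeded 5% over the last 5 minutes")
```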
## Analytics API Patterns

### Time-Windowed Queries

Most endpoints accept `start_time` and `end_time` filters (ISO 8601):
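For example, pulling a seven-day window of retriever performance (base URL, auth header, and the retriever ID are placeholders):

```python
from datetime import datetime, timedelta, timezone
import requests

API_BASE = "https://api.example.com"             # assumption
HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumption

end = datetime.now(timezone.utc)
start = end - timedelta(days=7)

resp = requests.get(
    f"{API_BASE}/v1/analytics/retrievers/my-retriever/performance",
    headers=HEADERS,
    params={"start_time": start.isoformat(), "end_time": end.isoformat()},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```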
### Aggregation Levels

Control granularity with `group_by`:
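For example, requesting hourly buckets for a latency deep-dive versus daily buckets for weekly reporting. The accepted `group_by` values shown here (`hour`, `day`) and the `buckets` response field are assumptions; check the API Reference for the exact set.

```python
import requests

API_BASE = "https://api.example.com"             # assumption
HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumption

for granularity in ("hour", "day"):              # assumed group_by values
    resp = requests.get(
        f"{API_BASE}/v1/analytics/retrievers/my-retriever/performance",
        headers=HEADERS,
        params={"group_by": granularity},
        timeout=30,
    )
    resp.raise_for_status()
    print(granularity, len(resp.json().get("buckets", [])))  # assumed response field
```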
### Tuning Recommendations

The `/analytics/analyze-for-tuning` endpoint provides automated suggestions (see the sketch after this list):
- Recommended stage order changes
- Cache config suggestions (TTL, stage-level caching)
- Budget limit recommendations
- Model swap suggestions (cost vs latency trade-offs)
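A sketch of requesting suggestions for one retriever. The HTTP method, the `retriever_id` parameter, and the `suggestions` response field are assumptions; treat this as illustrative rather than the definitive call signature.

```python
import requests

API_BASE = "https://api.example.com"             # assumption
HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumption

resp = requests.get(
    f"{API_BASE}/v1/analytics/analyze-for-tuning",
    headers=HEADERS,
    params={"retriever_id": "my-retriever"},     # assumed parameter
    timeout=60,
)
resp.raise_for_status()
for suggestion in resp.json().get("suggestions", []):  # assumed response field
    print(suggestion)
```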
## Best Practices
- Baseline before optimizing – capture 7 days of metrics before making changes
- Change one variable at a time – isolate the impact of each optimization
- Monitor post-deployment – use execution IDs to compare before/after distributions
- Set SLOs early – define p95 latency and cache hit targets per retriever tier
- Correlate signals with changes – timestamp config updates and overlay on metric charts
- Automate reporting – schedule weekly summaries via webhooks or export to your BI tool
## Integration with External Tools

### Export to Datadog / Grafana

Use the Analytics API to pull metrics into your existing observability stack:
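A minimal polling sketch that forwards retriever latency and cache hit rate to a local Datadog agent (or any StatsD listener) as gauges; a Prometheus exporter for Grafana would follow the same pull-and-republish pattern. The analytics response fields are assumptions.

```python
import socket
import requests

API_BASE = "https://api.example.com"             # assumption
HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumption
DOGSTATSD = ("127.0.0.1", 8125)                  # local Datadog agent / StatsD listener

def push_gauge(metric: str, value: float, tags: str) -> None:
    """Emit a DogStatsD gauge datagram, e.g. 'retriever.p95_ms:420|g|#retriever:my-retriever'."""
    payload = f"{metric}:{value}|g|#{tags}"
    socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(payload.encode(), DOGSTATSD)

perf = requests.get(
    f"{API_BASE}/v1/analytics/retrievers/my-retriever/performance",
    headers=HEADERS,
    timeout=30,
)
perf.raise_for_status()
data = perf.json()

push_gauge("retriever.latency.p95_ms", data.get("p95", 0), "retriever:my-retriever")           # assumed field
push_gauge("retriever.cache_hit_rate", data.get("cache_hit_rate", 0), "retriever:my-retriever")
```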
### Webhook Alerts

Configure webhooks to push anomaly alerts:
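On the receiving side, a webhook handler can be as small as this Flask sketch; the alert payload fields (`severity`, `message`) are assumptions, so check the Webhooks guide for the real schema.

```python
from flask import Flask, request

app = Flask(__name__)

@app.post("/hooks/analytics-alerts")
def handle_alert():
    alert = request.get_json(force=True)
    severity = alert.get("severity", "info")   # assumed payload field
    message = alert.get("message", "")         # assumed payload field
    if severity == "critical":
        # Forward to your paging or chat tool here.
        print(f"PAGE: {message}")
    return {"status": "received"}, 200

if __name__ == "__main__":
    app.run(port=8080)
```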
## Next Steps

- Explore all analytics endpoints in the API Reference
- Learn how to record Interactions for signal tracking
- Review Caching Strategies for hit rate optimization
- Set up Webhooks for automated alerting

