Ingestion Issues
Objects Not Processing
Symptoms:
- Batch submitted but documents never appear in collection
- Task shows `COMPLETED` but `__fully_enriched: false`
1. Invalid Input Mappings
Problem: Extractor expects a field that doesn’t exist in object metadata.
Fix: Correct the input mapping to match the object schema, as in the sketch below:
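A minimal sketch of the corrected mapping, assuming an extractor config with an `input_mappings` block (the config shape and field names here are illustrative, not the exact schema):

```python
# Hypothetical extractor config fragment: the extractor's expected input
# ("text") must point at a field that actually exists in the object metadata.
extractor_config = {
    "feature_extractor_name": "text_extractor",
    "input_mappings": {
        # Broken: "text": "body_text"   <- no such field in the object schema
        "text": "description",          # fixed: a field the objects actually carry
    },
}
```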
2. Missing Blob URLs
Problem: Object references S3 URLs that don’t exist or are inaccessible.
Fix:
- Verify S3 URLs are accessible from Engine (see the check after this list)
- Check S3 connection credentials
- Ensure bucket CORS/permissions allow Engine access
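A quick reachability check with boto3, run with the same credentials and network path the Engine uses (bucket and key are placeholders):

```python
import boto3
from botocore.exceptions import ClientError

# Probe one of the blob URLs the failing object references.
s3 = boto3.client("s3", region_name="us-east-1")
try:
    s3.head_object(Bucket="my-bucket", Key="path/to/blob.pdf")
    print("Blob reachable")
except ClientError as err:
    # 404 = key missing, 403 = credentials/permissions problem
    print("Blob inaccessible:", err.response["Error"]["Code"])
```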
3. Extractor Failures
Problem: Feature extractor crashed or timed out.
Check: the `__missing_features` field in documents (see the sketch after this list).
Fix:
- Check Engine logs for OOM errors (reduce batch size)
- Verify model is available in registry
- Retry processing for affected objects
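A sketch of the check, assuming a documents listing endpoint and response shape like the one below (paths, headers, and field names other than `__missing_features` are assumptions):

```python
import requests

resp = requests.get(
    "https://api.mixpeek.com/v1/collections/col_123/documents",  # hypothetical path
    headers={"Authorization": "Bearer YOUR_API_KEY", "X-Namespace": "ns_default"},
    params={"limit": 100},
)
for doc in resp.json().get("results", []):
    if doc.get("__missing_features"):
        print(doc.get("document_id"), "missing:", doc["__missing_features"])
```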
4. Schema Validation Failures
Problem: Object doesn’t match bucket schema.
Check: Compare the object’s metadata fields against the bucket schema.
Fix: Update object metadata or relax the bucket schema (see the sketch below).
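One way to relax a schema, sketched against a JSON-Schema-style bucket schema (the actual schema format may differ):

```python
# Hypothetical bucket schema: dropping "author" from the required list lets
# objects without that field pass validation instead of failing ingestion.
bucket_schema = {
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
    },
    "required": ["title"],  # was ["title", "author"]
}
```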
Slow Batch Processing
Symptoms:
- Batches take hours to process
- Task stuck in `PROCESSING` state
| Cause | Solution |
|---|---|
| Large batch size | Reduce to 100-1000 objects per batch |
| Heavy extractors | Switch to lighter models (e.g., whisper-base vs large) |
| Ray worker saturation | Scale workers or process during off-peak |
| S3 download bottleneck | Increase `max_concurrent_downloads` in Engine config |
| GPU unavailable | Check Ray cluster GPU allocation |
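A hedged sketch of the two ingestion-side knobs from the table (the real Engine config format and key locations are deployment-specific; apart from `max_concurrent_downloads`, which the table names, treat these as assumptions):

```python
# Hypothetical Engine config overrides for slow batches.
engine_overrides = {
    "max_concurrent_downloads": 16,  # relieve the S3 download bottleneck
    "batch_size": 500,               # stay within the recommended 100-1000 range
}
```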
Retrieval Issues
Zero Results
Symptoms:
- Retriever returns empty results for valid queries
- Works in one namespace but not another
1. No Documents in Collection
Check: Collection is empty or batch processing incomplete.
Fix:
- Verify batch was submitted and completed
- Check the `__fully_enriched` document count (see the sketch below)
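A sketch of the count check, using the same hypothetical documents endpoint as above (response shape is an assumption):

```python
import requests

resp = requests.get(
    "https://api.mixpeek.com/v1/collections/col_123/documents",  # hypothetical path
    headers={"Authorization": "Bearer YOUR_API_KEY", "X-Namespace": "ns_default"},
    params={"limit": 100},
)
docs = resp.json().get("results", [])
enriched = sum(1 for d in docs if d.get("__fully_enriched"))
print(f"{enriched}/{len(docs)} documents fully enriched")
```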
2. Overly Restrictive Filters
Problem: Filters exclude all documents.
Fix: Remove filters temporarily to test, then adjust filter values.
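The test, sketched as a retriever payload with the `filters` block commented out (the filter syntax shown is illustrative, not the documented schema):

```python
# Step 1: run without filters to confirm documents match at all.
# Step 2: reintroduce filters one at a time to find the over-restrictive one.
query = {
    "inputs": {"text": "quarterly revenue report"},
    # "filters": {"year": 2025, "status": "published"},  # re-enable incrementally
}
```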
3. Feature Address Mismatch
Problem: Retriever queries a feature that doesn’t exist.
Fix: Verify the feature address matches the collection output schema:
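A sketch of the comparison (the feature-address format shown is illustrative; read the real addresses from your collection’s output schema):

```python
# Addresses produced by the collection's extractors vs. the one being queried.
collection_outputs = {"text_extractor.v1.embedding", "image_extractor.v1.embedding"}
retriever_feature = "text_extractor.v2.embedding"  # wrong version -> zero results

if retriever_feature not in collection_outputs:
    print(f"{retriever_feature} not in outputs: {sorted(collection_outputs)}")
```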
4. Namespace Mismatch
Problem: Querying wrong namespace.
Check: Ensure the `X-Namespace` header matches the collection namespace.
Poor Result Quality
Symptoms:
- Irrelevant results appear at top
- Known-good documents missing from results
| Issue | Solution |
|---|---|
| Semantic mismatch | Use hybrid search (dense + sparse) |
| Wrong model | A/B test with different embedding models |
| Missing reranking | Add rerank stage after initial search |
| Stale cache | Invalidate retriever cache after reindexing |
| Low `limit` value | Increase `limit` in search stage |
Authentication & Authorization
401 Unauthorized
Symptoms:
- All API calls return 401
- “Invalid API key” error

Common causes:
- Missing `Bearer` prefix
- Expired or deleted key
- Key belongs to different organization
- Using test key in production
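The correct header shape, sketched with a hypothetical endpoint (only the `Bearer` prefix and `X-Namespace` header come from this guide):

```python
import requests

headers = {
    "Authorization": "Bearer YOUR_API_KEY",  # "YOUR_API_KEY" alone returns 401
    "X-Namespace": "ns_default",
}
resp = requests.get("https://api.mixpeek.com/v1/collections", headers=headers)
print(resp.status_code)
```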
403 Forbidden
Symptoms:
- Some operations work, others return 403
- “Insufficient permissions” error
429 Too Many Requests
Symptoms:
- Requests start failing after sustained traffic
- `Retry-After` header present
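A minimal client-side backoff that honors `Retry-After` (generic HTTP handling, not a Mixpeek SDK call):

```python
import time
import requests

def get_with_backoff(url: str, headers: dict, max_retries: int = 3):
    """Retry on 429, waiting out the server-specified Retry-After."""
    for attempt in range(max_retries):
        resp = requests.get(url, headers=headers)
        if resp.status_code != 429:
            return resp
        # Retry-After is in seconds; fall back to exponential backoff if absent.
        time.sleep(float(resp.headers.get("Retry-After", 2 ** attempt)))
    return resp
```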
Performance Issues
High Latency
Symptoms:
- p95 latency >2s
- Users complain about slow searches
| Stage | Typical Latency | If Slow, Check |
|---|---|---|
| `knn_search` | <50ms | Qdrant health, index size, `limit` value |
| `filter` | <10ms | Filter complexity, indexed fields |
| `rerank` | 50-200ms | Reranking model, `top_k` value |
| `llm_generation` | 500-2000ms | Model size, `max_tokens`, prompt length |
| `web_search` | 200-500ms | External API latency, cache hit rate |
Optimizations:
- Enable caching (TTL=300s+)
- Reduce `limit` values in early stages
- Use smaller/faster models
- Move filters before expensive stages
- Scale Qdrant replicas
Cache Not Working
Symptoms:
- Cache hit rate <10% despite repetitive queries
- `cache_hit: false` in all responses
1. Cache Disabled
Check: `cache_config.enabled: false`
Fix: Enable caching in the retriever config (see the sketch below).
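A sketch of the enabled config (key names beyond `cache_config.enabled` are assumptions; check your retriever schema):

```python
cache_config = {
    "enabled": True,     # was False
    "ttl_seconds": 300,  # keep at or above the 300s recommended above
}
```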
2. High Query Variability
Problem: Every query is unique (includes session IDs, timestamps, etc.).
Fix: Exclude variable fields from the cache key:
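A sketch of the exclusion, assuming the cache key can be configured per field (the option name is hypothetical):

```python
cache_config = {
    "enabled": True,
    "ttl_seconds": 300,
    # Hypothetical option: hash only result-affecting fields so per-session
    # noise (IDs, timestamps) doesn't make every cache key unique.
    "exclude_from_cache_key": ["session_id", "timestamp", "request_id"],
}
```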
3. TTL Too Short
Problem: Cache entries expire before reuse.
Fix: Increase TTL based on update frequency:
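A sketch of the adjustment, assuming the same hypothetical `cache_config` shape:

```python
cache_config = {
    "enabled": True,
    # Rule of thumb (assumed): TTL should exceed the typical gap between
    # repeated queries while staying under the collection's reindex interval.
    "ttl_seconds": 3600,
}
```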
Data Integrity Issues
Missing Documents
Symptoms:
- Documents created but not queryable
- Lineage broken (can’t find source object)
Fix:
- Ensure the `internal_id` filter is correct
- Check namespace isolation
- Verify Qdrant collection wasn’t manually deleted
- Reprocess affected objects
Duplicate Documents
Symptoms:
- Same content appears multiple times
- `source_object_id` differs but content identical
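A minimal content-hash check to surface duplicates across differing `source_object_id`s (field names are illustrative):

```python
import hashlib

documents = [  # fetched from the collection beforehand
    {"document_id": "doc_1", "source_object_id": "obj_a", "content": "hello world"},
    {"document_id": "doc_2", "source_object_id": "obj_b", "content": "hello world"},
]

seen: dict[str, str] = {}
for doc in documents:
    digest = hashlib.sha256(doc["content"].encode()).hexdigest()
    if digest in seen:
        print(f"{doc['document_id']} duplicates {seen[digest]}")
    else:
        seen[digest] = doc["document_id"]
```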
Integration Issues
S3 Connection Failures
Symptoms:
- Objects fail to process
- “Unable to download blob” errors
Fix:
- Verify AWS credentials (access key, secret key)
- Check bucket CORS policy allows Engine IP
- Ensure bucket region matches connection config
- For LocalStack (dev), verify endpoint URL
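For the LocalStack case, point the client at its endpoint explicitly (4566 is LocalStack’s default edge port; dummy credentials are accepted):

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:4566",  # omit this for real AWS
    aws_access_key_id="test",
    aws_secret_access_key="test",
    region_name="us-east-1",
)
print(s3.list_buckets()["Buckets"])
```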
Webhook Not Triggering
Symptoms:
- Events occur but webhook not called
- No webhook delivery logs
Common causes:
- Webhook URL unreachable from Mixpeek (firewall, DNS)
- Endpoint returns non-2xx status (webhook disabled after failures)
- Event type filter too restrictive
- HTTPS certificate issues
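A quick reachability test against your own endpoint (URL and payload are placeholders):

```python
import requests

resp = requests.post(
    "https://example.com/webhooks/mixpeek",               # your receiver
    json={"event_type": "task.completed", "test": True},  # illustrative payload
    timeout=5,
)
print(resp.status_code)  # must be 2xx, or the webhook may be auto-disabled
```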
Getting Help
If issues persist:
1. Check service health (a minimal probe is sketched below):
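   A quick probe, assuming a conventional `/health` path (the exact endpoint for your deployment may differ):

   ```python
   import requests

   # Hypothetical health endpoint; substitute your deployment's base URL.
   resp = requests.get("https://api.mixpeek.com/health", timeout=5)
   print(resp.status_code, resp.text)
   ```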
2. Collect diagnostics:
   - Task ID
   - Execution ID
   - Collection/Retriever IDs
   - Request payloads (sanitized)
3. Contact support:
   - Use the “Talk to Engineers” CTA
   - Include error messages and timestamps
   - Share curl commands (with API key redacted)
Next Steps
- Review Errors for error code reference
- Check Limits for quota constraints
- Monitor with Analytics
- Optimize with Best Practices

