POST /v1/collections/{collection_identifier}/documents/list
List documents.
curl --request POST \
  --url https://api.mixpeek.com/v1/collections/{collection_identifier}/documents/list \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'X-Namespace: my-namespace' \
  --header 'Content-Type: application/json' \
  --data '
{
  "filters": {
    "AND": [
      {
        "field": "name",
        "operator": "eq",
        "value": "John"
      },
      {
        "field": "age",
        "operator": "gte",
        "value": 30
      }
    ],
    "OR": [
      {
        "field": "status",
        "operator": "eq",
        "value": "active"
      },
      {
        "field": "role",
        "operator": "eq",
        "value": "admin"
      }
    ],
    "NOT": [
      {
        "field": "department",
        "operator": "eq",
        "value": "HR"
      },
      {
        "field": "location",
        "operator": "eq",
        "value": "remote"
      }
    ],
    "case_sensitive": true
  },
  "sort": {
    "field": "created_at",
    "direction": "desc"
  },
  "search": "<string>",
  "cursor": "<string>",
  "return_url": false,
  "return_vectors": false,
  "group_by": "source_object_id",
  "select": [
    "metadata.title",
    "content"
  ]
}
'
{
  "pagination": {
    "total": 123,
    "page": 123,
    "page_size": 123,
    "total_pages": 123,
    "next_page": "<string>",
    "previous_page": "<string>",
    "next_cursor": "<string>"
  },
  "results": [
    {
      "document_id": "<string>",
      "collection_id": "<string>",
      "_internal": {
        "collection_id": "col_articles",
        "created_at": "2025-10-31T10:00:00Z",
        "document_id": "doc_f8966ff29c",
        "internal_id": "org_abc123",
        "lineage": {
          "path": "bkt_content/col_articles",
          "root_bucket_id": "bkt_content",
          "root_object_id": "obj_article_001",
          "source_object_id": "obj_article_001",
          "source_type": "bucket"
        },
        "metadata": {
          "ingestion_status": "COMPLETED"
        },
        "modality": "text",
        "namespace_id": "ns_xyz789",
        "updated_at": "2025-10-31T10:00:00Z"
      }
    }
  ],
  "groups": [
    {
      "group_key": "<unknown>",
      "documents": [
        {
          "document_id": "<string>",
          "collection_id": "<string>",
          "_internal": {
            "collection_id": "col_articles",
            "created_at": "2025-10-31T10:00:00Z",
            "document_id": "doc_f8966ff29c",
            "internal_id": "org_abc123",
            "lineage": {
              "path": "bkt_content/col_articles",
              "root_bucket_id": "bkt_content",
              "root_object_id": "obj_article_001",
              "source_object_id": "obj_article_001",
              "source_type": "bucket"
            },
            "metadata": {
              "ingestion_status": "COMPLETED"
            },
            "modality": "text",
            "namespace_id": "ns_xyz789",
            "updated_at": "2025-10-31T10:00:00Z"
          }
        }
      ],
      "count": 123
    }
  ],
  "stats": {
    "total_documents": 0,
    "avg_blobs_per_document": 0,
    "total_groups": 123,
    "avg_documents_per_group": 123
  },
  "group_by_field": "source_object_id"
}

Headers

Authorization
string

REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

Examples:

"Bearer YOUR_API_KEY"

"Bearer sk_xxxxxxxxxxxxx"

X-Namespace
string

REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace name or namespace ID. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'

Examples:

"ns_abc123def456"

"production"

"my-namespace"

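As a rough illustration of attaching both required headers, here is a minimal Python sketch using only the standard library. The helper name build_list_request and the placeholder key, namespace, and collection values are hypothetical and not part of the Mixpeek API. The request is built but never sent, and an empty JSON object is used as the body on the assumption that every body field is optional.

```python
import json
import urllib.request

API_BASE = "https://api.mixpeek.com/v1"


def build_list_request(collection_identifier, api_key, namespace, body=None):
    """Build (but do not send) a POST request for the list-documents endpoint.

    Hypothetical helper for illustration only.
    """
    url = f"{API_BASE}/collections/{collection_identifier}/documents/list"
    headers = {
        "Authorization": f"Bearer {api_key}",  # API key created in the dashboard
        "X-Namespace": namespace,              # namespace ID or custom name
        "Content-Type": "application/json",
    }
    # Assumption: an empty object is an acceptable body when nothing is set.
    data = json.dumps(body or {}).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


# Placeholder values; substitute a real key and namespace before sending.
request = build_list_request("col_articles", "sk_xxxxxxxxxxxxx", "my-namespace")
print(request.get_method(), request.full_url)
```

Passing the result to urllib.request.urlopen would perform the call; that step is left out here so the sketch runs without network access.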
Path Parameters

collection_identifier
string
required

The ID of the collection to list documents from.

Query Parameters

limit
integer | null

Required range: 1 <= x <= 1000

offset
integer | null

Required range: 0 <= x <= 10000

cursor
string | null

include_total
boolean
default:false
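
The documented ranges can be checked on the client before a request is sent. The sketch below is one way to do that in Python; build_query is a hypothetical helper, and encoding include_total as the string "true" is an assumption about how the API parses boolean query values.

```python
from urllib.parse import urlencode


def build_query(limit=None, offset=None, cursor=None, include_total=False):
    """Return a query string, enforcing the documented ranges client-side."""
    if limit is not None and not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    if offset is not None and not 0 <= offset <= 10000:
        raise ValueError("offset must be between 0 and 10000")
    params = {}
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset
    if cursor is not None:
        params["cursor"] = cursor
    if include_total:
        # Assumption: booleans are passed as lowercase strings.
        params["include_total"] = "true"
    return urlencode(params)


print(build_query(limit=50, offset=100, include_total=True))
# limit=50&offset=100&include_total=true
```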

Body

application/json

Request model for listing documents.

Supports two pagination strategies:

Offset-based (default): Use the limit and offset query parameters, e.g. ?limit=10&offset=10

  • Simple and familiar
  • Works well for shallow pagination (first ~100 pages)
  • Less efficient for deep pagination with sorting

Cursor-based (optional): Pass cursor from previous response's next_cursor

  • More efficient for deep pagination (page 100+)
  • Required for consistent results when sorting large datasets
  • When cursor is provided, offset is ignored
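
The cursor-based strategy can be sketched as a loop that feeds each response's pagination.next_cursor back into the next request body. In this Python sketch, iter_documents and fetch_page are hypothetical names: fetch_page stands for whatever function performs the HTTP call and returns the parsed JSON. Treating a missing or null next_cursor as the end of the listing is an assumption, not documented behavior.

```python
def iter_documents(fetch_page, body=None):
    """Yield documents across pages by following pagination.next_cursor.

    fetch_page is a hypothetical callable: it takes a request body (dict)
    and returns the parsed JSON response (dict) for one page.
    """
    body = dict(body or {})
    while True:
        page = fetch_page(body)
        for doc in page.get("results") or []:
            yield doc
        next_cursor = (page.get("pagination") or {}).get("next_cursor")
        if not next_cursor:
            # Assumption: no cursor means there are no further pages.
            break
        body["cursor"] = next_cursor  # offset is ignored once a cursor is set


# Stand-in for the real HTTP call so the loop can be exercised offline.
_pages = {
    None: {"results": [{"document_id": "doc_1"}], "pagination": {"next_cursor": "c1"}},
    "c1": {"results": [{"document_id": "doc_2"}], "pagination": {"next_cursor": None}},
}


def fake_fetch(body):
    return _pages[body.get("cursor")]


ids = [doc["document_id"] for doc in iter_documents(fake_fetch)]
print(ids)
# ['doc_1', 'doc_2']
```

Because the function is a generator, callers can stop early without fetching the remaining pages.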
filters
LogicalOperator · object

Filters to apply.
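
The filters object in the request example combines AND, OR, and NOT lists of field/operator/value conditions. The Python sketch below assembles that shape; condition and build_filters are hypothetical helpers, only the eq and gte operators appear in this page's example, and leaving out empty clauses is an assumption about what the API accepts.

```python
import json


def condition(field, operator, value):
    """One filter condition in the shape used by the request example."""
    return {"field": field, "operator": operator, "value": value}


def build_filters(all_of=(), any_of=(), none_of=(), case_sensitive=None):
    """Assemble a filters object, omitting clauses that were not supplied."""
    filters = {}
    if all_of:
        filters["AND"] = list(all_of)
    if any_of:
        filters["OR"] = list(any_of)
    if none_of:
        filters["NOT"] = list(none_of)
    if case_sensitive is not None:
        filters["case_sensitive"] = case_sensitive
    return filters


filters = build_filters(
    all_of=[condition("age", "gte", 30)],
    none_of=[condition("department", "eq", "HR")],
)
print(json.dumps({"filters": filters}, indent=2))
```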

sort
SortOption · object

Sort options.

search
string | null

Search term.

cursor
string | null

OPTIONAL. Cursor for efficient deep pagination. Pass the pagination.next_cursor value from a previous response to fetch the next page. When a cursor is provided, the offset query parameter is ignored. Use cursor-based pagination when (1) paginating beyond roughly page 100, (2) sorting large datasets, or (3) you need consistent iteration. Use offset-based pagination (the default) for simple use cases, random page access, or when the UI needs page numbers.

return_url
boolean | null
default:false

Whether to return presigned URLs for object keys.

return_vectors
boolean | null
default:false

Whether to return vector embeddings in the document results.

group_by
string | null

OPTIONAL. Field to group documents by. Supports dot notation for nested fields (e.g., 'metadata.category', 'source_type'). When specified, documents are grouped by the field value and returned as grouped results. Requires a payload index on the field in Qdrant; if no index exists, the request fails with a validation error. Common groupable fields: 'source_object_id', 'root_object_id', 'collection_id', 'metadata.category'.

Example:

"source_object_id"

select
string[] | null

OPTIONAL. List of fields to include in the response. Supports dot notation for nested fields (e.g., 'metadata.title', 'content'). When specified, only the selected fields will be returned in the document results, reducing response size. System fields like '_id' and 'document_id' are always included. Use this to optimize response size when working with large documents.

Example:
["metadata.title", "content"]
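
The optional body fields described above can be combined in one request. The following Python sketch builds a body that sets only the fields a caller provides; build_body is a hypothetical helper, and dropping unset fields instead of sending explicit nulls is a design choice of this sketch, not a documented requirement.

```python
import json

# Body field names taken from the request example on this page.
ALLOWED_FIELDS = {
    "filters", "sort", "search", "cursor",
    "return_url", "return_vectors", "group_by", "select",
}


def build_body(**fields):
    """Return a request body containing only the fields that were set."""
    unknown = set(fields) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown body fields: {sorted(unknown)}")
    return {name: value for name, value in fields.items() if value is not None}


body = build_body(
    sort={"field": "created_at", "direction": "desc"},
    group_by="source_object_id",
    select=["metadata.title", "content"],
    search=None,  # dropped from the body because it is unset
)
print(json.dumps(body, indent=2))
```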

Response

Successful Response

Response model for listing documents.

Supports both regular document lists and grouped results based on the group_by parameter. When group_by is specified, results are returned as groups instead of a flat list.

Pagination strategies:

  • Offset-based (default): Use pagination.page and pagination.page_size
  • Cursor-based (optional): Use pagination.next_cursor for efficient deep pagination
pagination
PaginationResponse · object
required

Pagination information. Includes next_cursor for cursor-based pagination. When group_by is used, pagination applies to groups (not individual documents), and pagination.total reflects the total number of groups, not the total number of documents.

results
DocumentResponse · object[] | null

List of documents when group_by is NOT specified. Contains flat list of documents with pagination applied. Mutually exclusive with 'groups' field.

groups
DocumentGroup · object[] | null

List of document groups when group_by IS specified. Each group contains documents sharing the same field value. Pagination applies to groups, not individual documents. Mutually exclusive with 'results' field.
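
Since results and groups are mutually exclusive, client code usually needs to handle both shapes. The Python sketch below normalizes either one into (group_key, document) pairs; flatten_documents is a hypothetical helper and the sample payloads are made up for illustration.

```python
def flatten_documents(response):
    """Return (group_key, document) pairs for either response shape.

    Documents from a flat (ungrouped) response get a group_key of None.
    """
    groups = response.get("groups")
    if groups is not None:
        return [
            (group.get("group_key"), doc)
            for group in groups
            for doc in group.get("documents") or []
        ]
    return [(None, doc) for doc in response.get("results") or []]


# Illustrative payloads only; real responses carry more fields.
grouped = {
    "group_by_field": "source_object_id",
    "results": None,
    "groups": [
        {
            "group_key": "obj_article_001",
            "documents": [{"document_id": "doc_a"}, {"document_id": "doc_b"}],
            "count": 2,
        }
    ],
}
flat = {"results": [{"document_id": "doc_c"}], "groups": None}

print(flatten_documents(grouped))
print(flatten_documents(flat))
```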

stats
DocumentListStats · object

Aggregate statistics across all documents in the result.

group_by_field
string | null

The field used for grouping when group_by was specified; null for non-grouped results. Useful for clients that need to understand the grouping structure.

Example:

"source_object_id"