POST /v1/collections/{collection_identifier}/documents/list
List documents.
curl --request POST \
  --url https://api.mixpeek.com/v1/collections/{collection_identifier}/documents/list \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'X-Namespace: my-namespace' \
  --header 'Content-Type: application/json' \
  --data '
{
  "filters": {
    "AND": [
      {
        "field": "name",
        "operator": "eq",
        "value": "John"
      },
      {
        "field": "age",
        "operator": "gte",
        "value": 30
      }
    ],
    "OR": [
      {
        "field": "status",
        "operator": "eq",
        "value": "active"
      },
      {
        "field": "role",
        "operator": "eq",
        "value": "admin"
      }
    ],
    "NOT": [
      {
        "field": "department",
        "operator": "eq",
        "value": "HR"
      },
      {
        "field": "location",
        "operator": "eq",
        "value": "remote"
      }
    ],
    "case_sensitive": true
  },
  "sort": {
    "field": "created_at",
    "direction": "desc"
  },
  "search": "<string>",
  "cursor": "<string>",
  "return_url": false,
  "return_vectors": false,
  "group_by": "source_object_id"
}
'
{
  "pagination": {
    "total": 123,
    "page": 123,
    "page_size": 123,
    "total_pages": 123,
    "next_page": "<string>",
    "previous_page": "<string>",
    "next_cursor": "<string>"
  },
  "results": [
    {
      "document_id": "<string>",
      "collection_id": "<string>",
      "_internal": {
        "collection_id": "col_articles",
        "created_at": "2025-10-31T10:00:00Z",
        "document_id": "doc_f8966ff29c",
        "internal_id": "org_abc123",
        "lineage": {
          "path": "bkt_content/col_articles",
          "root_bucket_id": "bkt_content",
          "root_object_id": "obj_article_001",
          "source_object_id": "obj_article_001",
          "source_type": "bucket"
        },
        "metadata": {
          "ingestion_status": "COMPLETED"
        },
        "modality": "text",
        "namespace_id": "ns_xyz789",
        "updated_at": "2025-10-31T10:00:00Z"
      }
    }
  ],
  "groups": [
    {
      "group_key": "<unknown>",
      "documents": [
        {
          "document_id": "<string>",
          "collection_id": "<string>",
          "_internal": {
            "collection_id": "col_articles",
            "created_at": "2025-10-31T10:00:00Z",
            "document_id": "doc_f8966ff29c",
            "internal_id": "org_abc123",
            "lineage": {
              "path": "bkt_content/col_articles",
              "root_bucket_id": "bkt_content",
              "root_object_id": "obj_article_001",
              "source_object_id": "obj_article_001",
              "source_type": "bucket"
            },
            "metadata": {
              "ingestion_status": "COMPLETED"
            },
            "modality": "text",
            "namespace_id": "ns_xyz789",
            "updated_at": "2025-10-31T10:00:00Z"
          }
        }
      ],
      "count": 123
    }
  ],
  "stats": {
    "total_documents": 0,
    "avg_blobs_per_document": 0,
    "total_groups": 123,
    "avg_documents_per_group": 123
  },
  "group_by_field": "source_object_id"
}

Headers

Authorization
string

REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

Examples:

"Bearer YOUR_API_KEY"

X-Namespace
string

REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace ID (format: ns_xxxxxxxxxxxxx) or the namespace name (a custom name like 'my-namespace').

Examples:

"ns_abc123def456"

"production"

"my-namespace"

Path Parameters

collection_identifier
string
required

The ID of the collection to list documents from.
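
The headers and path parameter above can be combined into a request as in the following minimal Python sketch. It only builds the request and does not send it. The helper name build_list_request and the API_BASE constant are illustrative and not part of any Mixpeek SDK; the URL and header names are taken from the curl example and the Headers section, and the empty default body assumes that all body fields are optional.

```python
import json
import urllib.request

# Base URL taken from the curl example on this page.
API_BASE = "https://api.mixpeek.com/v1"


def build_list_request(collection_identifier, api_key, namespace, body=None):
    """Build (but do not send) a POST request for the list-documents endpoint.

    Hypothetical helper for illustration only.
    """
    url = f"{API_BASE}/collections/{collection_identifier}/documents/list"
    headers = {
        "Content-Type": "application/json",
        # Both headers below are documented as REQUIRED.
        "Authorization": f"Bearer {api_key}",
        "X-Namespace": namespace,
    }
    # Assumption: an empty JSON object is acceptable when no filters are set.
    data = json.dumps(body or {}).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")
```

To send the request, pass the returned object to urllib.request.urlopen and decode the response body with json.loads.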

Query Parameters

limit
integer | null

Required range: 1 <= x <= 100

offset
integer | null

Required range: x >= 0

cursor
string | null
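
As a rough illustration of the ranges above, the sketch below validates limit and offset on the client before building a query string. The function name list_query_string is hypothetical, and the server may apply additional rules that are not documented here.

```python
from urllib.parse import urlencode


def list_query_string(limit=None, offset=None, cursor=None):
    """Return a URL-encoded query string, omitting parameters left as None.

    Hypothetical helper; the ranges mirror the documented constraints.
    """
    if limit is not None and not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if offset is not None and offset < 0:
        raise ValueError("offset must be >= 0")
    candidates = (("limit", limit), ("offset", offset), ("cursor", cursor))
    params = {name: value for name, value in candidates if value is not None}
    return urlencode(params)
```

Append the result to the endpoint URL after a "?" when it is non-empty.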

Body

application/json

Request model for listing documents.

Supports two pagination strategies:

Offset-based (default): Use the limit and offset query params, e.g. ?limit=10&offset=10

  • Simple and familiar
  • Works well for shallow pagination (first ~100 pages)
  • Less efficient for deep pagination with sorting

Cursor-based (optional): Pass the pagination.next_cursor value from the previous response as cursor

  • More efficient for deep pagination (page 100+)
  • Required for consistent results when sorting large datasets
  • When cursor is provided, offset is ignored
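
The cursor-based strategy can be sketched as a simple loop. In this Python sketch, fetch_page is a hypothetical callable supplied by the caller: it takes a cursor (or None for the first page), performs the request, and returns the parsed JSON response. The loop assumes that next_cursor is null or absent on the last page, which this page does not state explicitly.

```python
def iter_documents(fetch_page):
    """Yield documents page by page, following pagination.next_cursor.

    fetch_page(cursor) is a caller-supplied function (hypothetical) that
    returns the parsed response dict for the given cursor.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        # 'results' may be null, for example when group_by is used.
        yield from page.get("results") or []
        cursor = (page.get("pagination") or {}).get("next_cursor")
        # Assumption: a missing or null next_cursor means no more pages.
        if not cursor:
            break
```

Because the function is a generator, callers can stop early without fetching the remaining pages.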
filters
LogicalOperator · object

Filters to apply.
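
The filters object in the request example has a regular shape: lists of field/operator/value conditions under the AND, OR, and NOT keys, plus a case_sensitive flag. The sketch below builds that shape in Python. The helper names condition and build_filters are illustrative, and only the operators shown in the example ("eq", "gte") are known from this page.

```python
def condition(field, operator, value):
    """One filter condition, in the shape used by the request example."""
    return {"field": field, "operator": operator, "value": value}


def build_filters(and_=(), or_=(), not_=(), case_sensitive=None):
    """Assemble a filters object, leaving out keys that were not provided.

    Hypothetical helper; key names are copied from the request example.
    """
    filters = {}
    if and_:
        filters["AND"] = list(and_)
    if or_:
        filters["OR"] = list(or_)
    if not_:
        filters["NOT"] = list(not_)
    if case_sensitive is not None:
        filters["case_sensitive"] = case_sensitive
    return filters


# Mirrors part of the request example above.
example_filters = build_filters(
    and_=[condition("name", "eq", "John"), condition("age", "gte", 30)],
    not_=[condition("department", "eq", "HR")],
    case_sensitive=True,
)
```

The resulting dictionary can be placed under the "filters" key of the request body.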

sort
SortOption · object

Sort options.

search
string | null

Search term.

cursor
string | null

OPTIONAL cursor for efficient deep pagination. Pass the 'pagination.next_cursor' value from a previous response to fetch the next page. When cursor is provided, the limit/offset query params are ignored. Use cursor-based pagination when: (1) paginating beyond page ~100, (2) sorting large datasets, or (3) you need consistent iteration. Use offset-based pagination (default) for simple use cases, random page access, or when page numbers are needed in the UI.

return_url
boolean | null
default:false

Whether to return presigned URLs for object keys.

return_vectors
boolean | null
default:false

Whether to return vector embeddings in the document results.

group_by
string | null

OPTIONAL. Field to group documents by. Supports dot notation for nested fields (e.g., 'metadata.category') as well as top-level fields (e.g., 'source_type'). When specified, documents are grouped by the field value and returned in 'groups' instead of 'results'. The field must have a payload index in Qdrant; if no index exists, the operation fails with a validation error. Common groupable fields: 'source_object_id', 'root_object_id', 'collection_id', 'metadata.category'.

Example:

"source_object_id"

Response

Successful Response

Response model for listing documents.

Supports both regular document lists and grouped results based on the group_by parameter. When group_by is specified, results are returned as groups instead of a flat list.

Pagination strategies:

  • Offset-based (default): Use pagination.page and pagination.page_size
  • Cursor-based (optional): Use pagination.next_cursor for efficient deep pagination
pagination
PaginationResponse · object
required

Pagination information. Includes next_cursor for cursor-based pagination. When group_by is used, pagination applies to groups, not individual documents, and total reflects the number of groups rather than the number of documents.

results
DocumentResponse · object[] | null

List of documents when group_by is NOT specified. Contains a flat list of documents with pagination applied. Mutually exclusive with the 'groups' field.

groups
DocumentGroup · object[] | null

List of document groups when group_by IS specified. Each group contains the documents that share the same field value. Pagination applies to groups, not individual documents. Mutually exclusive with the 'results' field.
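
Since 'results' and 'groups' are mutually exclusive, client code usually needs one branch for each. The Python sketch below flattens either shape into a single list of documents. The function name documents_from_response is hypothetical, and the field names are taken from the response example on this page.

```python
def documents_from_response(response):
    """Return a flat list of documents from a grouped or ungrouped response.

    Hypothetical helper; 'response' is the parsed JSON body as a dict.
    """
    groups = response.get("groups")
    if groups:
        # Grouped response: collect the documents of every group in order.
        return [doc for group in groups for doc in group.get("documents") or []]
    # Ungrouped response: 'results' already holds the flat list (or null).
    return response.get("results") or []
```

Flattening discards group_key and count, so keep the original 'groups' list when the grouping itself matters.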

stats
DocumentListStats · object

Aggregate statistics across all documents in the result.

group_by_field
string | null

The field that was used for grouping when group_by was specified; null for non-grouped results. Useful for clients to understand the grouping structure.

Example:

"source_object_id"