POST /extract
curl --location 'https://api.mixpeek.com/extract' \
--header 'Authorization: Bearer API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "file_url": "https://example.com/document.pdf",
    "should_chunk": true
}'
{
  "output": [
    {}
  ],
  "metadata": {}
}

Request

file_url
string

The URL of the document to extract content from. Provide a direct link that Mixpeek can access. Exactly one of file_url or contents must be provided.

contents
string

The document content itself, as plain text or a base64-encoded string. Use this field to send the content directly instead of linking to it by URL. Exactly one of file_url or contents must be provided.

should_chunk
boolean

Whether to split the extracted text into chunks. Chunking is useful for large documents and for applications that analyze text segment by segment.

other_parameters
string

Additional processing parameters, such as clean_text, max_characters_per_chunk, extract_tags, summarize, and format-specific settings (pdf_settings, html_settings, etc.), that tune the extraction to the document’s modality and content.
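The request rules above (exactly one of file_url or contents, an optional should_chunk flag) can be checked on the client before the call is made. The following is a minimal sketch, not an official client: the helper names build_extract_payload, build_headers, and send_extract are hypothetical, and base64-encoding raw bytes is an assumption based on the description of contents. The additional parameters are left out because this page does not specify how they are passed.

```python
import base64
import json
import urllib.request
from typing import Optional, Union

# Endpoint taken from the curl example at the top of this page.
EXTRACT_URL = "https://api.mixpeek.com/extract"


def build_extract_payload(
    file_url: Optional[str] = None,
    contents: Optional[Union[str, bytes]] = None,
    should_chunk: Optional[bool] = None,
) -> dict:
    """Build the JSON body, enforcing that exactly one source is given.

    Hypothetical helper: only the three documented fields are handled.
    """
    if (file_url is None) == (contents is None):
        raise ValueError("Provide exactly one of file_url or contents")

    payload: dict = {}
    if file_url is not None:
        payload["file_url"] = file_url
    else:
        # Assumption: raw bytes are sent as a base64-encoded string.
        if isinstance(contents, bytes):
            contents = base64.b64encode(contents).decode("ascii")
        payload["contents"] = contents

    if should_chunk is not None:
        payload["should_chunk"] = should_chunk
    return payload


def build_headers(api_key: str) -> dict:
    """Headers matching the curl example: bearer token and JSON body."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def send_extract(payload: dict, api_key: str) -> str:
    """POST the payload and return the raw response body as text."""
    request = urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=build_headers(api_key),
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling build_extract_payload(file_url="https://example.com/document.pdf", should_chunk=True) produces the same body as the curl example. Validating locally gives a clear error before any network request is made.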

Response

output
array
required

An array of elements extracted from the document. Each element represents a content block, such as a piece of text or an image description, together with its metadata, such as page number, language, and file type.

metadata
object
required

Metadata about the extraction process, such as the document type and processing time, along with other information about the extracted content.
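Because both response fields are marked required, a client can check the shape of the body before using it. This is a minimal sketch under that assumption; the function name parse_extract_response is hypothetical, and the contents of each output element are left uninterpreted because this page does not list their field names.

```python
import json
from typing import Tuple


def parse_extract_response(body: str) -> Tuple[list, dict]:
    """Parse a response body and return (output, metadata).

    Hypothetical helper: checks only the two documented top-level fields.
    """
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("Response body must be a JSON object")

    # Both fields are documented as required.
    missing = [key for key in ("output", "metadata") if key not in data]
    if missing:
        raise ValueError("Missing required field(s): " + ", ".join(missing))

    output, metadata = data["output"], data["metadata"]
    if not isinstance(output, list):
        raise ValueError("'output' must be an array")
    if not isinstance(metadata, dict):
        raise ValueError("'metadata' must be an object")
    return output, metadata
```

Passing the sample response shown at the top of this page returns a one-element list and an empty dictionary.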