Extract
Extract and index content from documents provided through a URL or direct content input on the Mixpeek platform. This endpoint processes documents in various formats and makes their content searchable and accessible.
Request
The URL of the document from which content is to be extracted. Provide a direct link to the document you wish to process, allowing Mixpeek to access and analyze its contents. Either file_url or contents must be provided, but not both.
The direct textual or base64-encoded content of the document for extraction. Use this field if you prefer to upload document content directly rather than via a URL. Either file_url or contents must be provided, but not both.
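The "either file_url or contents, but not both" rule can be checked on the client before a request is sent. The following is a minimal sketch, not Mixpeek's own client code: the helper names `build_source` and `encode_bytes` and the example URL are hypothetical, and only the field names file_url and contents come from this page.

```python
import base64
from typing import Optional


def build_source(file_url: Optional[str] = None,
                 contents: Optional[str] = None) -> dict:
    """Return the source part of a request body, enforcing exactly one input.

    Hypothetical helper: it only mirrors the rule stated in this reference.
    """
    # True when both are missing or both are set; either case is invalid.
    if (file_url is None) == (contents is None):
        raise ValueError("Provide exactly one of file_url or contents")
    if file_url is not None:
        return {"file_url": file_url}
    return {"contents": contents}


def encode_bytes(raw: bytes) -> str:
    """Base64-encode raw document bytes so they can be sent as contents."""
    return base64.b64encode(raw).decode("ascii")


# Example URL is a placeholder, not a real document.
url_source = build_source(file_url="https://example.com/report.pdf")
inline_source = build_source(contents=encode_bytes(b"Hello, world"))
```

Calling `build_source()` with no arguments, or with both arguments set, raises `ValueError` instead of producing an ambiguous request.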
Indicates whether the text should be divided into manageable chunks. This can be useful for processing large documents or for applications requiring segmented analysis.
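To give a rough sense of what chunking means, here is a naive fixed-size splitter. It is an illustration only: this page does not describe how Mixpeek chunks text, and the real service may split on sentence or layout boundaries instead. The function name `chunk_text` is hypothetical.

```python
from typing import List


def chunk_text(text: str, max_characters: int) -> List[str]:
    """Split text into consecutive pieces of at most max_characters each.

    Naive illustration; not Mixpeek's chunking algorithm.
    """
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")
    return [text[i:i + max_characters]
            for i in range(0, len(text), max_characters)]


chunks = chunk_text("abcdefghij", 4)
```

Every chunk except possibly the last has exactly `max_characters` characters, and joining the chunks reproduces the original text.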
Includes additional processing parameters such as clean_text, max_characters_per_chunk, extract_tags, summarize, and format-specific settings (pdf_settings, html_settings, etc.), tailored to enhance the extraction based on the document’s modality and content.
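A request body combining these fields might be assembled as sketched below. Several things here are assumptions rather than documented facts: the key names `should_chunk` and `settings` are placeholders for the chunking flag and the settings object (this page does not name them), the value types (booleans, an integer) are guesses, and the URL is a placeholder. Only file_url and the individual setting names are taken from this page.

```python
import json

# Hypothetical key names: this reference does not state what the chunking
# flag or the settings object are called in the request body.
CHUNK_FLAG_KEY = "should_chunk"
SETTINGS_KEY = "settings"


def build_extract_body(file_url: str, chunk: bool, settings: dict) -> dict:
    """Assemble a request body from a URL, a chunking flag, and settings."""
    return {
        "file_url": file_url,
        CHUNK_FLAG_KEY: chunk,
        SETTINGS_KEY: dict(settings),  # copy so the caller's dict is untouched
    }


body = build_extract_body(
    "https://example.com/report.pdf",  # placeholder URL
    True,
    {
        # Setting names come from this page; the value types are assumed.
        "clean_text": True,
        "max_characters_per_chunk": 500,
        "summarize": False,
    },
)
payload = json.dumps(body)  # JSON string ready to send as a request body
```

The sketch stops at building the JSON payload because the endpoint URL and authentication scheme are not given on this page.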
Response
An array of elements extracted from the document. Each element may represent a piece of text, an image description, or another content block, along with metadata such as page number, language, and file type.
Metadata related to the extraction process, detailing the document type, processing time, and any other pertinent information about the extracted content.
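Once a response arrives, the extracted elements can be regrouped for downstream use, for example by page. The sketch below runs against a hand-written sample, not real API output. The key names `elements`, `metadata`, `text`, and `page_number` are assumptions based on the descriptions above; the actual response schema may differ.

```python
from collections import defaultdict

# Hand-written sample for illustration only; key names are assumed.
sample_response = {
    "elements": [
        {"text": "Title", "metadata": {"page_number": 1}},
        {"text": "Intro", "metadata": {"page_number": 1}},
        {"text": "Body", "metadata": {"page_number": 2}},
    ],
    "metadata": {"file_type": "pdf"},
}


def text_by_page(response: dict) -> dict:
    """Group element text by page number, preserving element order."""
    pages = defaultdict(list)
    for element in response.get("elements", []):
        # Elements without a page number are grouped under the key None.
        page = element.get("metadata", {}).get("page_number")
        pages[page].append(element.get("text", ""))
    return {page: "\n".join(parts) for page, parts in pages.items()}


pages = text_by_page(sample_response)
```

Using `.get` with defaults keeps the helper tolerant of elements that lack text or page metadata, which seems prudent given that elements may also be image descriptions or other block types.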