Document Chunk
Introduction
The Lettria Parser API splits a document's content into chunks by analyzing text indentation, fonts, text size, text style, page composition (columns, lists, etc.), tables, and specific characters. The whole document is returned as a list of chunks, each chunk corresponding to a paragraph.
Format
A document chunk has the following format:
Key | Type | Description |
---|---|---|
id | int | The chunk identifier. |
type | string | The type of chunk: "text" or "table". |
content | depends on the type key | A string for text chunks; a list or a dictionary for table chunks, depending on the detected type of table. |
metadata | dictionary | A dictionary containing additional information about the chunk, depending on the engine that extracted it (OCR, STT, etc.). |
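
Because the shape of content depends on the type key, client code usually needs to branch on it. The following is a minimal sketch of that branching, not part of the Lettria API: the function name `chunk_to_text`, the helper `_render_row`, and the sample chunks are hypothetical, and the exact layout of table content is not specified here, so the table rendering is only a generic fallback.

```python
from typing import Any


def _render_row(row: Any) -> str:
    """Render one table row; rows are assumed (not documented) to be lists of cells."""
    if isinstance(row, (list, tuple)):
        return " | ".join(str(cell) for cell in row)
    return str(row)


def chunk_to_text(chunk: dict[str, Any]) -> str:
    """Flatten a single document chunk to plain text, branching on its type key."""
    content = chunk["content"]
    if chunk["type"] == "text":
        # Text chunks carry their content as a string.
        return content
    if chunk["type"] == "table":
        # Table chunks carry a list or a dictionary, depending on the detected table type.
        if isinstance(content, dict):
            return "\n".join(f"{key}: {value}" for key, value in content.items())
        return "\n".join(_render_row(row) for row in content)
    raise ValueError(f"Unknown chunk type: {chunk['type']!r}")


# Hypothetical sample data shaped like the format table above.
chunks = [
    {"id": 0, "type": "text", "content": "Quarterly report", "metadata": {}},
    {
        "id": 1,
        "type": "table",
        "content": [["Region", "Revenue"], ["EMEA", "10"]],
        "metadata": {},
    },
]

for chunk in chunks:
    print(chunk["id"], chunk_to_text(chunk))
```

Raising on an unrecognized type, rather than silently skipping the chunk, makes it easier to notice if the set of chunk types ever differs from the two listed above.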