Extract Table

class sycamore.transforms.extract_table.MissingS3UploadPath

Bases: Exception

Raised when an S3 upload path is needed but one wasn’t provided.
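
To illustrate when an exception like this might surface, here is a minimal, self-contained sketch. It does not reproduce Sycamore's implementation: the class below is a local stand-in for MissingS3UploadPath, and resolve_s3_location is a hypothetical helper. It assumes that documents already stored in S3 need no upload, while local documents need an upload root such as the s3_upload_root argument of TextractTableExtractor.

```python
class MissingS3UploadPath(Exception):
    """Local stand-in for sycamore.transforms.extract_table.MissingS3UploadPath."""


def resolve_s3_location(document_path: str, s3_upload_root: str = "") -> str:
    """Hypothetical helper: return the S3 URI a document would be read from.

    Assumption (not confirmed by the library docs): a path that already
    points into S3 is used as-is, and any other path requires an upload root.
    """
    if document_path.startswith("s3://"):
        # Already in S3, so no upload path is needed.
        return document_path
    if not s3_upload_root:
        # An upload is required, but there is nowhere to upload to.
        raise MissingS3UploadPath(
            f"{document_path!r} is not in S3 and no s3_upload_root was provided"
        )
    # Keep only the file name and place it under the upload root.
    filename = document_path.replace("\\", "/").rsplit("/", 1)[-1]
    return f"{s3_upload_root.rstrip('/')}/{filename}"
```

Under these assumptions, calling resolve_s3_location("/tmp/report.pdf") raises the exception, while passing s3_upload_root="s3://my-bucket/uploads/" (a made-up bucket) yields a destination under that prefix.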

class sycamore.transforms.extract_table.TextractTableExtractor(profile_name: str | None = None, region_name: str | None = None, kms_key_id: str = '', s3_upload_root: str = '')

Bases: TableExtractor

TextractTableExtractor uses Amazon Textract to extract tables from documents.

This class inherits from TableExtractor. Amazon Textract is a cloud-based AWS service for extracting text and data from documents.

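  • profile_name – The AWS profile name to use for authentication. Default is None.

  • region_name – The AWS region name where the Textract service is available. Default is None.

  • kms_key_id – The AWS Key Management Service (KMS) key ID for encryption. Default is ''.

  • s3_upload_root – The S3 path used when a document must be uploaded to S3 (see MissingS3UploadPath). Default is ''.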


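import sycamore
from sycamore.transforms.extract_table import TextractTableExtractor
from sycamore.transforms.partition import UnstructuredPdfPartitioner

table_extractor = TextractTableExtractor(profile_name="my-profile", region_name="us-east-1")
context = sycamore.init()
pdf_docset = (
    context.read.binary(paths, binary_format="pdf")  # paths: the PDF location(s) to read
    .partition(partitioner=UnstructuredPdfPartitioner(), table_extractor=table_extractor)
)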