Summarize Images

class sycamore.transforms.summarize_images.LLMImageSummarizer(llm: LLM, prompt: str | None = None, include_context: bool = True)

Bases: object

Image Summarizer that uses an LLM to summarize the specified image.

The image is passed to the LLM along with a text prompt and optionally the text elements immediately preceding and following the image.

Parameters:
  • llm -- The LLM to use.

  • prompt -- The prompt to pass to the model, as a string.

  • include_context -- Whether to include the immediately preceding and following text elements as context.
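
To make the include_context behavior concrete, the sketch below collects the text of the elements immediately before and after an image in a flat list of elements. It is an illustration only: the Element dataclass, the neighbor_context helper, and the newline used to join the two neighbors are hypothetical and are not taken from Sycamore's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    # Hypothetical stand-in for a document element; not Sycamore's Element class.
    type: str
    text: Optional[str] = None


def neighbor_context(elements: List[Element], image_index: int) -> Optional[str]:
    """Return the text of the elements directly before and after an image.

    Only the two immediate neighbors are considered. Neighbors that are
    images or have no text are skipped. Returns None if no text is found.
    """
    parts = []
    for idx in (image_index - 1, image_index + 1):
        if 0 <= idx < len(elements):
            neighbor = elements[idx]
            if neighbor.type != "Image" and neighbor.text:
                parts.append(neighbor.text)
    return "\n".join(parts) if parts else None


elements = [
    Element("Text", "Figure 1 shows quarterly revenue."),
    Element("Image"),
    Element("Text", "Revenue grew in every quarter."),
]
context = neighbor_context(elements, image_index=1)
```

A string built this way corresponds to the context argument of summarize_image; with include_context set to False, no such string would be supplied.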

Example

The following code demonstrates how to partition a PDF DocSet and summarize the images it contains. This version uses a Claude model via Bedrock.
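
A minimal sketch of such a pipeline, modeled on the SummarizeImages example in this section. Several details are assumptions to check against the installed Sycamore version: the import paths for SycamorePartitioner, Bedrock, and BedrockModels, the BedrockModels.CLAUDE_3_5_SONNET member, and the forwarding of the summarizer keyword through transform. The function name and its paths argument are placeholders.

```python
def summarize_pdf_images_with_claude(paths):
    """Partition PDFs and summarize their images with a Claude model on Bedrock."""
    # Imports live inside the function so the sketch loads without Sycamore
    # installed. The module paths below are assumptions, not confirmed here.
    import sycamore
    from sycamore.llms.bedrock import Bedrock, BedrockModels
    from sycamore.transforms.partition import SycamorePartitioner
    from sycamore.transforms.summarize_images import (
        LLMImageSummarizer,
        SummarizeImages,
    )

    # Assumed enum member; substitute whichever Claude model is available.
    llm = Bedrock(BedrockModels.CLAUDE_3_5_SONNET)
    summarizer = LLMImageSummarizer(llm=llm)

    context = sycamore.init()
    return (
        context.read.binary(paths=paths, binary_format="pdf")
        .partition(partitioner=SycamorePartitioner(extract_images=True))
        .transform(SummarizeImages, summarizer=summarizer)
    )
```

Calling the function requires Sycamore and AWS credentials with Bedrock access; the returned DocSet can then be inspected, for example with show().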

summarize_image(image: Image, context: str | None) -> str

Summarize the image using the LLM. Helper method to use this class without creating an instance.

Parameters:
  • image -- The image to summarize.

  • context -- The context to use for summarization.

Returns:

The summary of the image, as a string.

class sycamore.transforms.summarize_images.GeminiImageSummarizer(gemini_model: Gemini | None = None, prompt: str | None = None, include_context: bool = True)

Bases: LLMImageSummarizer

Implementation of the LLMImageSummarizer for Gemini models.

Parameters:
  • gemini_model -- The Gemini instance to use. If not set, one will be created.

  • prompt -- The prompt to pass to the model, as a string.

  • include_context -- Whether to include the immediately preceding and following text elements as context.

class sycamore.transforms.summarize_images.OpenAIImageSummarizer(openai_model: OpenAI | None = None, client_wrapper: OpenAIClientWrapper | None = None, prompt: str | None = None, include_context: bool = True)

Bases: LLMImageSummarizer

Implementation of the LLMImageSummarizer for OpenAI models.

Parameters:
  • openai_model -- The OpenAI instance to use. If not set, one will be created.

  • client_wrapper -- The OpenAIClientWrapper to use when creating an OpenAI instance. Not used if openai_model is set.

  • prompt -- The prompt to pass to the model, as a string.

  • include_context -- Whether to include the immediately preceding and following text elements as context.

class sycamore.transforms.summarize_images.SummarizeImages(child: sycamore.plan_nodes.Node, summarizer=<sycamore.transforms.summarize_images.OpenAIImageSummarizer object>, **resource_args)

Bases: CompositeTransform

SummarizeImages is a transform for summarizing images into text using an LLM.

Parameters:
  • child -- The source node for the transform.

  • summarizer -- The summarizer instance to use. The default is an OpenAIImageSummarizer, which uses OpenAI gpt-4-turbo.

  • resource_args -- Additional resource-related arguments that can be passed to the underlying runtime.

Example

context = sycamore.init()
doc = context.read.binary(paths=paths, binary_format="pdf")\
    .partition(partitioner=SycamorePartitioner(extract_images=True))\
    .transform(SummarizeImages)\
    .show()