Summarize Images

class sycamore.transforms.summarize_images.OpenAIImageSummarizer(openai_model: OpenAI | None = None, client_wrapper: OpenAIClientWrapper | None = None, prompt: str | None = None, include_context: bool = True)

Bases: object

Image Summarizer that uses OpenAI GPT-4 Turbo to summarize the specified image.

The image is passed to OpenAI along with a text prompt and optionally the text elements immediately preceding and following the image.

Parameters:
  • openai_model -- The OpenAI instance to use. If not set, one will be created.

  • client_wrapper -- The OpenAIClientWrapper to use when creating an OpenAI instance. Not used if openai_model is set.

  • prompt -- The prompt to pass to the model, as a string. If not set, the default prompt is used.

  • include_context -- Whether to include the immediately preceding and following text elements as context.
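The effect of include_context can be illustrated with a small, library-independent sketch. It does not use Sycamore or OpenAI: the element dictionaries, the helper names neighbouring_text and build_request_text, and the wording of the assembled request are all assumptions made for illustration. It also assumes that "immediately preceding and following" means the elements at positions i-1 and i+1, and that a neighbor without text contributes nothing.

```python
from typing import Optional


def neighbouring_text(elements: list[dict], image_index: int) -> tuple[Optional[str], Optional[str]]:
    """Return the text of the elements directly before and after an image."""

    def text_at(i: int) -> Optional[str]:
        # Out-of-range positions and elements without text yield no context.
        if 0 <= i < len(elements):
            return elements[i].get("text") or None
        return None

    return text_at(image_index - 1), text_at(image_index + 1)


def build_request_text(prompt: str, elements: list[dict], image_index: int,
                       include_context: bool = True) -> str:
    """Combine the prompt with optional surrounding text (hypothetical layout)."""
    parts = [prompt]
    if include_context:
        before, after = neighbouring_text(elements, image_index)
        if before:
            parts.append(f"Text before the image: {before}")
        if after:
            parts.append(f"Text after the image: {after}")
    return "\n".join(parts)


# Hypothetical partitioned document: text, image, text.
elements = [
    {"type": "Text", "text": "Figure 1 shows quarterly revenue."},
    {"type": "Image"},
    {"type": "Text", "text": "Revenue grew in every quarter."},
]
with_context = build_request_text("Describe this image.", elements, image_index=1)
without_context = build_request_text("Describe this image.", elements, 1, include_context=False)
```

With include_context=False only the prompt accompanies the image, which can be preferable when the surrounding text is noisy or unrelated to the figure.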

Example

The following code demonstrates how to partition a PDF DocSet and summarize the images it contains. This version uses the default prompt and does not pass the surrounding text elements as context.

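import sycamore
from sycamore.transforms.summarize_images import OpenAIImageSummarizer, SummarizeImages

# SycamorePartitioner and paths are assumed to be imported/defined elsewhere.
context = sycamore.init()
doc = (
    context.read.binary(paths=paths, binary_format="pdf")
    .partition(partitioner=SycamorePartitioner(extract_images=True))
    # Pass the transform class plus keyword arguments: SummarizeImages needs a
    # child node, so it cannot be constructed here without one.
    .transform(SummarizeImages, summarizer=OpenAIImageSummarizer(include_context=False))
    .show()
)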

class sycamore.transforms.summarize_images.SummarizeImages(child: Node, summarizer=<sycamore.transforms.summarize_images.OpenAIImageSummarizer object>, **resource_args)

Bases: Map

SummarizeImages is a transform that summarizes images into text using an LLM.

Parameters:
  • child -- The source node for the transform.

  • summarizer -- The summarizer instance to use. The default is an OpenAIImageSummarizer, which uses OpenAI GPT-4 Turbo.

  • resource_args -- Additional resource-related arguments that can be passed to the underlying runtime.
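Because SummarizeImages derives from Map, it can be thought of as applying one function to each document independently. The sketch below mimics that shape in plain Python and is not Sycamore's implementation: the dictionary-based documents, the summarize_images function, the "summary" key, and fake_summarizer (a stand-in for an LLM call) are all hypothetical.

```python
from typing import Callable


def summarize_images(document: dict, summarizer: Callable[[bytes], str]) -> dict:
    """Return a copy of the document with a summary attached to each image element."""
    new_elements = []
    for element in document["elements"]:
        element = dict(element)  # shallow copy so the input is not mutated
        if element.get("type") == "Image":
            element["summary"] = summarizer(element["data"])
        new_elements.append(element)
    return {**document, "elements": new_elements}


def fake_summarizer(image_bytes: bytes) -> str:
    # Stand-in for an LLM call; it only reports the payload size.
    return f"image of {len(image_bytes)} bytes"


# Hypothetical documents produced by an earlier partitioning step.
documents = [
    {"elements": [{"type": "Text", "text": "Intro"}, {"type": "Image", "data": b"\x89PNG"}]},
    {"elements": [{"type": "Image", "data": b""}]},
]

# A map-style transform applies the same function to every document.
summarized = [summarize_images(doc, fake_summarizer) for doc in documents]
```

Keeping the summarizer as a parameter is what allows the model or prompt to be swapped without changing the transform itself.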

Example

context = sycamore.init()
doc = (
    context.read.binary(paths=paths, binary_format="pdf")
    .partition(partitioner=SycamorePartitioner(extract_images=True))
    .transform(SummarizeImages)
    .show()
)