Prompts

class sycamore.llms.prompts.prompts.RenderedPrompt(messages: list[RenderedMessage], response_format: None | dict[str, Any] | type[BaseModel] = None)[source]

Bases: object

Represents a prompt to be sent to the LLM per the LLM messages interface

Parameters:
  • messages -- the list of messages to be sent to the LLM

  • response_format -- optional output schema, specified as a pydict/JSON schema or a pydantic model. Can only be used with modern OpenAI models.

class sycamore.llms.prompts.prompts.RenderedMessage(role: str, content: str, images: list[Image] | None = None)[source]

Bases: object

Represents a message per the LLM messages interface - i.e. a role and a content string

Parameters:
  • role -- the role of this message. e.g. for OpenAI should be one of "user", "system", "assistant"

  • content -- the content of this message

  • images -- optional list of images to include in this message.
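
For illustration, the two classes behave like simple dataclasses. The stand-ins below are a minimal sketch, not the library's actual definitions:

```python
from dataclasses import dataclass
from typing import Any, Optional

# Minimal stand-ins for illustration only; the real classes live in
# sycamore.llms.prompts.prompts and may carry additional behavior.
@dataclass
class RenderedMessage:
    role: str
    content: str
    images: Optional[list] = None

@dataclass
class RenderedPrompt:
    messages: list
    response_format: Optional[Any] = None

prompt = RenderedPrompt(
    messages=[
        RenderedMessage(role="system", content="You are a helpful assistant."),
        RenderedMessage(role="user", content="Summarize this report."),
    ]
)
```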

sycamore.llms.prompts.prompts.ResponseFormat = typing.Union[NoneType, dict[str, typing.Any], type[pydantic.main.BaseModel]]

Type alias for the allowed response_format values: None, a dict (JSON schema), or a pydantic BaseModel subclass.

class sycamore.llms.prompts.prompts.SycamorePrompt[source]

Bases: object

Base class/API for all Sycamore LLM prompt objects. Sycamore prompts convert Sycamore objects (Document, Element) into RenderedPrompts.

fork(**kwargs: Any) SycamorePrompt[source]

Create a new prompt with some fields changed.

Parameters:
  • ignore_none -- bool. If True, do not set any kwargs whose value is None. This is not in the method signature because of a mypy limitation (https://github.com/python/mypy/issues/17642).

  • **kwargs -- any keyword arguments will get set as fields in the resulting prompt

Returns:

A new SycamorePrompt with updated fields.

Example

p = StaticPrompt(system="hello", user="world")
p.render_document(Document())
# [
#     {"role": "system", "content": "hello"},
#     {"role": "user", "content": "world"}
# ]
p2 = p.fork(user="bob")
p2.render_document(Document())
# [
#     {"role": "system", "content": "hello"},
#     {"role": "user", "content": "bob"}
# ]
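
The copy-and-override behavior of fork() can be sketched as follows (an illustrative stand-in operating on a plain dict, not the library's implementation):

```python
import copy

# Illustrative sketch of fork() semantics: deep-copy the prompt's fields,
# then overwrite them with the given kwargs, skipping None values when
# ignore_none is set. The original prompt is left unchanged.
def fork(fields: dict, ignore_none: bool = False, **kwargs) -> dict:
    new_fields = copy.deepcopy(fields)
    for key, value in kwargs.items():
        if ignore_none and value is None:
            continue
        new_fields[key] = value
    return new_fields

p = {"system": "hello", "user": "world"}
p2 = fork(p, user="bob")
p3 = fork(p, ignore_none=True, user=None)
```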
render_document(doc: Document) RenderedPrompt[source]

Render this prompt, given this document as context. Used in llm_map

Parameters:

doc -- The document to use to populate the prompt

Returns:

A fully rendered prompt that can be sent to an LLM for inference

render_element(elt: Element, doc: Document) RenderedPrompt[source]

Render this prompt, given this element and its parent document as context. Used in llm_map_elements

Parameters:
  • elt -- The element to use to populate the prompt

  • doc -- The element's parent document, available as additional context

Returns:

A fully rendered prompt that can be sent to an LLM for inference

render_multiple_documents(docs: list[Document]) RenderedPrompt[source]

Render this prompt, given a list of documents as context. Used in llm_reduce

Parameters:

docs -- The list of documents to use to populate the prompt

Returns:

A fully rendered prompt that can be sent to an LLM for inference

class sycamore.llms.prompts.prompts.JinjaPrompt(*, system: str | None = None, user: None | str | list[str] = None, response_format: None | dict[str, Any] | type[BaseModel] = None, **kwargs)[source]

Bases: SycamorePrompt

A prompt that uses the Jinja templating system to render documents, with a system and user prompt.

Parameters:
  • system -- The system prompt template, using Jinja syntax.

  • user -- The user prompt template or prompt templates, using Jinja syntax.

  • response_format -- Optional constraint on the format of the model output

  • kwargs -- Additional key-value pairs that will be made available to the rendering engine.

Example

prompt = JinjaPrompt(
    system="You are a helpful entity extractor that extracts a json object or list to"
            " populate a data processing system",
    user='''Below, you will be given a series of segments of an NTSB report and a question.
Your job is to provide the answer to the question based on the value provided.
Your response should ONLY contain the answer to the question. If you are not able
to extract the new field given the information, respond with "None". The type
of your response should be a JSON list of strings.
Field value:
{% for elt in doc.elements[:10] %}
ELEMENT {{ elt.element_index }}: {{ elt.field_to_value(field) }}
{% endfor %}
Answer the question "{{ question }}":''',
    question="What aircraft parts were damaged in this report?",
    field="text_representation",
)
ds.llm_map(prompt, output_field="damaged_parts", llm=OpenAI(OpenAIModels.GPT_4O))
render_document(doc: Document) RenderedPrompt[source]

Render this document using Jinja's template rendering system. The template gets references to:

  • doc: the document

  • **self.kwargs: other keyword arguments held by this prompt are available by name.

Parameters:

doc -- The document to render

Returns:

A rendered prompt containing information from the document.
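
To see what names a template can reference, the substitution can be approximated with a tiny stand-in for Jinja's variable syntax (illustrative only; the library uses the real Jinja engine, which also supports loops, filters, and attribute access):

```python
import re

# Resolve {{ name }} references from a context dict, mimicking how
# render_document exposes `doc` and the prompt's kwargs to the template.
def render(template: str, **context) -> str:
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(context[m.group(1)]), template)

user = 'Answer the question "{{ question }}" using the {{ field }} field.'
rendered = render(user, question="What parts were damaged?", field="text_representation")
```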

class sycamore.llms.prompts.prompts.JinjaElementPrompt(*, system: str | None = None, user: None | str | list[str] = None, include_image: bool = False, response_format: None | dict[str, Any] | type[BaseModel] = None, **kwargs)[source]

Bases: SycamorePrompt

A prompt that uses the Jinja templating system to render elements, with a system and user prompt.

Parameters:
  • system -- The system prompt template, using Jinja syntax.

  • user -- The user prompt template or prompt templates, using Jinja syntax.

  • include_image -- Whether to include the image of the element in the rendered prompt. Default is False

  • response_format -- Optional response format constraint for the LLM.

  • kwargs -- Additional key-value pairs that will be made available to the rendering engine.

Example

prompt = JinjaElementPrompt(
    system="You are a helpful entity extractor that extracts a json object or list to"
            " populate a data processing system",
    user='''Below, you will be given a segment of an NTSB report and a question.
Your job is to provide the answer to the question based on the value provided.
Your response should ONLY contain the answer to the question. If you are not able
to extract the new field given the information, respond with "None". The type
of your response should be a JSON list of strings.
Field value:
ELEMENT {{ elt.element_index }}: {{ elt.field_to_value(field) }}

Answer the question "{{ question }}":''',
    question="What aircraft parts were damaged in this report?",
    field="text_representation",
)
ds.llm_map(prompt, output_field="damaged_parts", llm=OpenAI(OpenAIModels.GPT_4O))
render_element(elt: Element, doc: Document) RenderedPrompt[source]

Render this element using Jinja's template rendering system. The template gets references to:

  • elt: the element

  • doc: the document containing the element

  • **self.kwargs: other keyword arguments held by this prompt are available by name.

Parameters:
  • elt -- The element to render

  • doc -- The document containing the element

Returns:

A rendered prompt containing information from the element.

class sycamore.llms.prompts.prompts.StaticPrompt(*, system: str | None = None, user: None | str | list[str] = None, **kwargs)[source]

Bases: SycamorePrompt

A prompt that always renders the same regardless of the Document or Elements passed in as context.

Parameters:
  • system -- the system prompt string. Use {} to reference names to be interpolated. Interpolated names only come from kwargs.

  • user -- the user prompt string. Use {} to reference names to be interpolated. Interpolated names only come from kwargs.

  • **kwargs -- keyword arguments to interpolate.

Example

prompt = StaticPrompt(system="static", user = "prompt - {number}", number=7)
prompt.render_document(Document())
# [
#   { "role": "system", "content": "static" },
#   { "role": "user", "content": "prompt - 7" },
# ]
render_document(doc: Document) RenderedPrompt[source]

Render this prompt, given this document as context. Used in llm_map

Parameters:

doc -- The document to use to populate the prompt

Returns:

A fully rendered prompt that can be sent to an LLM for inference

render_element(elt: Element, doc: Document) RenderedPrompt[source]

Render this prompt, given this element and its parent document as context. Used in llm_map_elements

Parameters:

elt -- The element to use to populate the prompt

Returns:

A fully rendered prompt that can be sent to an LLM for inference

render_multiple_documents(docs: list[Document]) RenderedPrompt[source]

Render this prompt, given a list of documents as context. Used in llm_reduce

Parameters:

docs -- The list of documents to use to populate the prompt

Returns:

A fully rendered prompt that can be sent to an LLM for inference

class sycamore.llms.prompts.prompts.ElementPrompt(*, system: str | None = None, user: None | str | list[str] = None, include_element_image: bool = False, capture_parent_context: Callable[[Document, Element], dict[str, Any]] | None = None, **kwargs)[source]

Bases: SycamorePrompt

A prompt for rendering an element with utilities for capturing information from the element's parent document, with a system and user prompt.

Parameters:
  • system -- The system prompt string. Use {} to reference names to be interpolated. Defaults to None

  • user -- The user prompt string. Use {} to reference names to be interpolated. Defaults to None

  • include_element_image -- Whether to include an image of the element in the rendered user message. Only works if the parent document is a PDF. Defaults to False (no image)

  • capture_parent_context -- Function to gather context from the element's parent document. Should return {"key": value} dictionary, which will be made available as interpolation keys. Defaults to returning {}

  • **kwargs -- other keyword arguments are stored and can be used as interpolation keys

Example

prompt = ElementPrompt(
    system = "You know everything there is to know about {custom_kwarg}, {name}",
    user = "Summarize the information on page {elt_property_page}. \nTEXT: {elt_text}",
    capture_parent_context = lambda doc, elt: {"custom_kwarg": doc.properties["path"]},
    name = "Frank Sinatra",
)
prompt.render_element(doc.elements[0], doc)
# [
#   {"role": "system", "content": "You know everything there is to know
#          about /path/to/doc.pdf, Frank Sinatra"},
#   {"role": "user", "content": "Summarize the information on page 1. \nTEXT: <element text>"}
# ]
render_element(elt: Element, doc: Document) RenderedPrompt[source]

Render this prompt for this element, also taking the parent document in case it holds context to account for. Rendering is done using Python's str.format() method. The keys passed into format are as follows:

  • self.kwargs: the additional kwargs specified when creating this prompt.

  • self.capture_parent_context(doc, elt): key-value pairs returned by the context-capturing function.

  • elt_text: elt.text_representation (the text representation of the element)

  • elt_property_<property name>: each property name in elt.properties is prefixed with 'elt_property_'. So if elt.properties = {'k1': 0, 'k2': 3}, you get elt_property_k1 = 0 and elt_property_k2 = 3.

Parameters:
  • elt -- The element used as context for rendering this prompt.

  • doc -- The element's parent document; used to add additional context.

Returns:

A two-message rendered prompt containing self.system.format() and self.user.format() using the format keys specified above. If self.include_element_image is true, the element's image is cropped out of the PDF page it is on and attached to the last message (the user message if there is one, otherwise the system message).
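
The key construction above can be sketched in plain Python (illustrative only; the element text and properties here are hypothetical):

```python
# Build the str.format() keys the way the docstring describes:
# elt_text from the element's text representation, and each property
# name prefixed with 'elt_property_'.
elt_text = "The left engine mount was cracked."
elt_properties = {"page_number": 1, "type": "Text"}

format_keys = {"elt_text": elt_text}
format_keys.update({f"elt_property_{k}": v for k, v in elt_properties.items()})

user_template = "Summarize the information on page {elt_property_page_number}.\nTEXT: {elt_text}"
rendered_user = user_template.format(**format_keys)
```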