Summarize

Similar to the extract entity transform, the summarize transform generates summaries of documents or elements. The LLMElementTextSummarizer summarizes a subset of the elements from each Document. It takes an LLM implementation and a callable that selects which elements to summarize. The following example uses this transform to summarize only the elements whose text representation is longer than a minimum number of characters.

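# Import paths are assumed from the Sycamore package layout and may differ
# between versions; adjust them to match your installation.
from sycamore.data import Document, Element
from sycamore.functions.elements import filter_elements
from sycamore.llms import OpenAI, OpenAIModels
from sycamore.transforms.summarize import LLMElementTextSummarizer


def filter_elements_on_length(
    document: Document,
    minimum_length: int = 10,
) -> list[Element]:
    def filter_func(element: Element) -> bool:
        # Elements without text are skipped explicitly instead of returning None.
        text = element.text_representation
        return text is not None and len(text) > minimum_length

    return filter_elements(document, filter_func)

llm = OpenAI(OpenAIModels.GPT_3_5_TURBO.value)

# docset is an existing DocSet created earlier in the pipeline.
docset = docset.summarize(LLMElementTextSummarizer(llm, filter_elements_on_length))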
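
To see which elements the selection callable picks without calling an LLM, the filtering logic can be run on its own. The following is a minimal sketch, not Sycamore code: the Element and Document classes and the filter_elements helper are hypothetical stand-ins reduced to what the example needs, and filter_elements is assumed to keep every element for which the predicate returns true. It also shows one way to change minimum_length with functools.partial, since the summarizer is given a callable rather than a call with arguments.

```python
from dataclasses import dataclass, field
from functools import partial
from typing import Callable, Optional


# Hypothetical stand-ins for Sycamore's Element and Document, reduced to the
# fields this example needs. They are not the library's classes.
@dataclass
class Element:
    text_representation: Optional[str] = None


@dataclass
class Document:
    elements: list[Element] = field(default_factory=list)


def filter_elements(
    document: Document, filter_func: Callable[[Element], bool]
) -> list[Element]:
    # Assumed behavior: keep the elements for which the predicate is true.
    return [e for e in document.elements if filter_func(e)]


def filter_elements_on_length(
    document: Document,
    minimum_length: int = 10,
) -> list[Element]:
    def filter_func(element: Element) -> bool:
        text = element.text_representation
        return text is not None and len(text) > minimum_length

    return filter_elements(document, filter_func)


doc = Document(
    elements=[
        Element("Short"),  # too short, dropped
        Element(None),  # no text, dropped
        Element("A paragraph that is comfortably longer than ten characters."),
    ]
)

# With the default threshold, only the long paragraph is selected.
selected = filter_elements_on_length(doc)
print([e.text_representation for e in selected])

# partial fixes minimum_length, leaving a callable that takes only a Document.
select_long = partial(filter_elements_on_length, minimum_length=100)
print(len(select_long(doc)))
```

If LLMElementTextSummarizer accepts any callable that takes a Document, as the description above suggests, a partial such as select_long could be passed in place of filter_elements_on_length to raise the threshold without writing a second function.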