Map

class sycamore.transforms.map.FlatMap(child: Node, *, f: Callable[[Document], list[Document]], **resource_args)

Bases: UnaryNode

FlatMap is a transformation class that applies a callable to each document in a dataset. The callable returns a list of documents for each input, and the per-document lists are flattened into a single dataset.

Example

def custom_flat_mapping_function(document: Document) -> list[Document]:
    # Custom logic to transform the document and return a list of documents
    return [transformed_document_1, transformed_document_2]

flat_map_transformer = FlatMap(input_dataset_node, f=custom_flat_mapping_function)
flattened_dataset = flat_map_transformer.execute()
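
The one-to-many behavior described above can be illustrated without a Sycamore installation. The following is a minimal sketch, not the library's implementation: `Doc` is a hypothetical stand-in for `Document`, and `flat_map` and `split_into_sentences` are illustrative helpers that do not exist in Sycamore.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Doc:
    """Hypothetical stand-in for sycamore's Document (assumption, not the real class)."""
    text: str
    properties: dict = field(default_factory=dict)


def split_into_sentences(doc: Doc) -> list[Doc]:
    # One input document becomes zero or more output documents.
    parts = [p.strip() for p in doc.text.split(".") if p.strip()]
    return [Doc(text=p, properties=dict(doc.properties)) for p in parts]


def flat_map(docs: list[Doc], f: Callable[[Doc], list[Doc]]) -> list[Doc]:
    # Illustrative helper: apply f to every document and concatenate the
    # returned lists into one flat list.
    out: list[Doc] = []
    for d in docs:
        out.extend(f(d))
    return out


docs = [Doc("a. b."), Doc("c.")]
flattened = flat_map(docs, split_into_sentences)
print([d.text for d in flattened])  # ['a', 'b', 'c']
```

Two input documents produce three output documents here; a function passed as `f` may also return an empty list to drop a document.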

class sycamore.transforms.map.Map(child: Node, *, f: Callable[[Document], Document], **resource_args)

Bases: UnaryNode

Map is a transformation class for applying a callable function to each document in a dataset.

Example

def custom_mapping_function(document: Document) -> Document:
    # Custom logic to transform the document
    return transformed_document

map_transformer = Map(input_dataset_node, f=custom_mapping_function)
transformed_dataset = map_transformer.execute()
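
As a rough illustration of the one-to-one contract, the sketch below applies a function to each document and returns exactly one result per input. It runs without Sycamore: `Doc` is a hypothetical stand-in for `Document`, and `map_docs` and `add_word_count` are illustrative names, not library functions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Doc:
    """Hypothetical stand-in for sycamore's Document (assumption, not the real class)."""
    text: str
    properties: dict = field(default_factory=dict)


def add_word_count(doc: Doc) -> Doc:
    # Return a new document rather than mutating the input.
    props = dict(doc.properties)
    props["word_count"] = len(doc.text.split())
    return Doc(text=doc.text, properties=props)


def map_docs(docs: list[Doc], f: Callable[[Doc], Doc]) -> list[Doc]:
    # Illustrative helper: exactly one output document per input document.
    return [f(d) for d in docs]


docs = [Doc("hello world"), Doc("one two three")]
mapped = map_docs(docs, add_word_count)
print([d.properties["word_count"] for d in mapped])  # [2, 3]
```

Returning a copy instead of editing the input in place is a design choice of this sketch; it keeps the original documents unchanged, which makes the function easier to test.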

class sycamore.transforms.map.MapBatch(child: Node, *, f: Callable[[list[Document]], list[Document]], f_args: Iterable[Any] | None = None, f_kwargs: dict[str, Any] | None = None, f_constructor_args: Iterable[Any] | None = None, f_constructor_kwargs: dict[str, Any] | None = None, **resource_args)

Bases: UnaryNode

The MapBatch transform is similar to Map, except that the callable receives a list of documents and returns a list of documents. MapBatch is ideal for transformations that get performance benefits from batching.

Example

def custom_map_batch_function(documents: list[Document]) -> list[Document]:
    # Custom logic to transform the documents
    return transformed_documents

map_batch_transformer = MapBatch(input_dataset_node, f=custom_map_batch_function)
transformed_dataset = map_batch_transformer.execute()
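
To show why batching can help, the sketch below counts how often a costly call is made when documents are processed in groups. It is an illustration under stated assumptions, not Sycamore's implementation: `Doc` stands in for `Document`, `fake_embed_batch` simulates an expensive service that accepts many inputs at once, and `map_batch` with its `batch_size` parameter is a hypothetical helper (the real transform decides how documents are grouped).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Doc:
    """Hypothetical stand-in for sycamore's Document (assumption, not the real class)."""
    text: str
    properties: dict = field(default_factory=dict)


calls = {"count": 0}


def fake_embed_batch(texts: list[str]) -> list[int]:
    # Hypothetical expensive call; its fixed cost is paid once per batch.
    calls["count"] += 1
    return [len(t) for t in texts]


def tag_lengths(batch: list[Doc]) -> list[Doc]:
    # Batch function: list of documents in, list of documents out.
    lengths = fake_embed_batch([d.text for d in batch])
    return [Doc(d.text, {**d.properties, "length": n}) for d, n in zip(batch, lengths)]


def map_batch(
    docs: list[Doc], f: Callable[[list[Doc]], list[Doc]], batch_size: int
) -> list[Doc]:
    # Illustrative helper: slice the input into batches, call f once per
    # batch, and concatenate the results.
    out: list[Doc] = []
    for start in range(0, len(docs), batch_size):
        out.extend(f(docs[start:start + batch_size]))
    return out


docs = [Doc("a"), Doc("bb"), Doc("ccc"), Doc("dddd"), Doc("eeeee")]
result = map_batch(docs, tag_lengths, batch_size=2)
print(calls["count"])  # 3 calls for 5 documents
```

With a per-document function the simulated service would be called five times; grouping the documents in pairs reduces that to three calls.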