Map

class sycamore.transforms.map.FlatMap(child: Node | None, *, f: Callable[[Document], list[Document]], **kwargs)

Bases: BaseMapTransform

FlatMap is a transformation class for applying a callable function to each document in a dataset and flattening the resulting list of documents.

See Map for the additional arguments that can be specified and for the types that f may take.

Example

def custom_flat_mapping_function(document: Document) -> list[Document]:
    # Custom logic to transform the document and return a list of documents
    return [transformed_document_1, transformed_document_2]

flat_map_transformer = FlatMap(input_dataset_node, f=custom_flat_mapping_function)
flattened_dataset = flat_map_transformer.execute()
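
The flattening behavior described above can be sketched in plain Python without a Sycamore runtime. This is only an illustration of the semantics: plain dicts stand in for Document objects, and flat_map and split_sentences are hypothetical helpers, not part of the sycamore API.

```python
from typing import Callable

# Plain dicts stand in for sycamore Document objects in this sketch.
Doc = dict


def flat_map(docs: list[Doc], f: Callable[[Doc], list[Doc]]) -> list[Doc]:
    """Hypothetical helper: apply f to each doc and flatten one level."""
    out: list[Doc] = []
    for doc in docs:
        # Each call may return zero, one, or many documents.
        out.extend(f(doc))
    return out


def split_sentences(doc: Doc) -> list[Doc]:
    # Emit one output document per non-empty sentence in the input text.
    parts = [s.strip() for s in doc["text"].split(".")]
    return [{"text": s} for s in parts if s]


docs = [{"text": "One. Two."}, {"text": "Three."}, {"text": ""}]
flattened = flat_map(docs, split_sentences)
print(flattened)
```

Three input documents become three output documents here only by coincidence: the first contributes two, the second one, and the empty third contributes none.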

class sycamore.transforms.map.Map(child: Node | None, *, f: Any, **kwargs)

Bases: BaseMapTransform

Map is a transformation class for applying a callable function to each document in a dataset.

If f is a class type, constructor_args and constructor_kwargs can be used to provide arguments when initializing the class.

Use args and kwargs to pass additional arguments to the function call. The following two calls are equivalent:

# option 1: docset.map(lambda doc: f(doc, *my_args, **my_kwargs))

# option 2: docset.map(f, args=my_args, kwargs=my_kwargs)
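
The equivalence of the two options can be checked with a small stand-in. This sketch assumes the extra arguments are passed after the document, as f(doc, *args, **kwargs); apply_map and tag are hypothetical names used for illustration and are not sycamore functions.

```python
def apply_map(docs, f, args=(), kwargs=None):
    """Hypothetical stand-in for docset.map: calls f(doc, *args, **kwargs)."""
    kwargs = kwargs or {}
    return [f(doc, *args, **kwargs) for doc in docs]


def tag(doc: dict, label: str, *, prefix: str = "") -> dict:
    # Return a copy of the document with a label field added.
    return {**doc, "label": prefix + label}


docs = [{"id": 1}, {"id": 2}]
my_args = ("invoice",)
my_kwargs = {"prefix": "type:"}

# Option 1: bind the extra arguments in a wrapper function.
option_1 = apply_map(docs, lambda doc: tag(doc, *my_args, **my_kwargs))

# Option 2: let the map call forward the extra arguments.
option_2 = apply_map(docs, tag, args=my_args, kwargs=my_kwargs)

print(option_1 == option_2)
```

Option 2 avoids the closure, which can matter when the function has to be serialized and shipped to workers.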

If f is a class type and execution uses Ray, the class is mapped to a Ray actor that is instantiated a fixed number of times. By default it is instantiated once, but you can change that with:

docset.map(ExampleClass, parallelism=num_instances)
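
As a rough illustration of the class-type case, the sketch below constructs the class once from constructor arguments and reuses the instance for every document. It assumes the class is callable on a single document through __call__; AddSuffix and apply_class_map are hypothetical names, and a single local instance stands in for the default of one instance.

```python
class AddSuffix:
    """Callable class: the constructor holds state, __call__ handles one doc."""

    def __init__(self, suffix: str, *, upper: bool = False):
        self.suffix = suffix
        self.upper = upper
        self.calls = 0  # counts how many documents this instance has seen

    def __call__(self, doc: dict) -> dict:
        self.calls += 1
        text = doc["text"] + self.suffix
        return {**doc, "text": text.upper() if self.upper else text}


def apply_class_map(docs, cls, constructor_args=(), constructor_kwargs=None):
    """Hypothetical stand-in: build one instance, then reuse it per document."""
    instance = cls(*constructor_args, **(constructor_kwargs or {}))
    return [instance(doc) for doc in docs], instance


docs = [{"text": "a"}, {"text": "b"}]
result, instance = apply_class_map(
    docs, AddSuffix, constructor_args=("!",), constructor_kwargs={"upper": True}
)
print(result, instance.calls)
```

Constructing once and calling many times is what makes a class worthwhile when setup is expensive, for example loading a model.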

Example

def custom_mapping_function(document: Document) -> Document:
    # Custom logic to transform the document
    return transformed_document

map_transformer = Map(input_dataset_node, f=custom_mapping_function)
transformed_dataset = map_transformer.execute()

class sycamore.transforms.map.MapBatch(child: Node | None, *, f: Callable[[list[Document]], list[Document]], f_args: Iterable[Any] | None = None, f_kwargs: dict[str, Any] | None = None, f_constructor_args: Iterable[Any] | None = None, f_constructor_kwargs: dict[str, Any] | None = None, **kwargs)

Bases: BaseMapTransform

The MapBatch transform is similar to Map, except that f receives a list of documents and returns a list of documents. MapBatch is ideal for transformations that get performance benefits from batching.

See Map for the additional arguments that can be specified and for the types that f may take.

Example

def custom_map_batch_function(documents: list[Document]) -> list[Document]:
    # Custom logic to transform the documents
    return transformed_documents

map_transformer = MapBatch(input_dataset_node, f=custom_map_batch_function)
transformed_dataset = map_transformer.execute()
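
To show why batching can help, the sketch below hands the function slices of documents so that per-call overhead is paid once per batch rather than once per document. The map_batch helper and its batch_size parameter are hypothetical and are not taken from the sycamore API. Plain dicts stand in for Document objects.

```python
def map_batch(docs: list[dict], f, batch_size: int = 2) -> list[dict]:
    """Hypothetical stand-in: call f on consecutive slices of docs."""
    out: list[dict] = []
    for start in range(0, len(docs), batch_size):
        out.extend(f(docs[start:start + batch_size]))
    return out


batch_sizes_seen: list[int] = []


def uppercase_batch(batch: list[dict]) -> list[dict]:
    # Record the batch size; real per-call setup cost would be paid here once.
    batch_sizes_seen.append(len(batch))
    return [{**doc, "text": doc["text"].upper()} for doc in batch]


docs = [{"text": t} for t in ["a", "b", "c", "d", "e"]]
transformed = map_batch(docs, uppercase_batch, batch_size=2)
print(transformed, batch_sizes_seen)
```

Five documents are processed in three calls instead of five, and the output order matches the input order.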