Map
- class sycamore.transforms.map.FlatMap(child: Node | None, *, f: Callable[[Document], list[Document]], **kwargs)
Bases: BaseMapTransform
FlatMap is a transformation class for applying a callable function to each document in a dataset and flattening the resulting list of documents.
See Map for additional arguments that can be specified and the option for the type of f.
Example
def custom_flat_mapping_function(document: Document) -> list[Document]:
    # Custom logic to transform the document and return a list of documents
    return [transformed_document_1, transformed_document_2]

flat_map_transformer = FlatMap(input_dataset_node, f=custom_flat_mapping_function)
flattened_dataset = flat_map_transformer.execute()
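The flattening behaviour described above can be illustrated without Sycamore itself. The following is a minimal sketch, not the library's implementation: plain dictionaries stand in for Document objects, and `flat_map` and `split_sentences` are hypothetical helpers introduced only for this illustration.

```python
from typing import Callable

# Assumption: plain dicts stand in for sycamore Document objects.
Doc = dict


def split_sentences(doc: Doc) -> list[Doc]:
    """Return one output document per sentence in the input document's text."""
    sentences = [s.strip() for s in doc["text"].split(".") if s.strip()]
    return [{"text": s, "parent": doc["id"]} for s in sentences]


def flat_map(docs: list[Doc], f: Callable[[Doc], list[Doc]]) -> list[Doc]:
    """Hypothetical helper: apply f to each document and flatten the resulting lists."""
    out: list[Doc] = []
    for doc in docs:
        out.extend(f(doc))
    return out


docs = [{"id": 1, "text": "First. Second."}, {"id": 2, "text": "Third."}]
flattened = flat_map(docs, split_sentences)
```

Two input documents produce three output documents here, which is the case a one-to-one Map cannot express.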
- class sycamore.transforms.map.Map(child: Node | None, *, f: Any, **kwargs)
Bases: BaseMapTransform
Map is a transformation class for applying a callable function to each document in a dataset.
If f is a class type, constructor_args and constructor_kwargs can be used to provide arguments when initializing the class.
Use args and kwargs to pass additional arguments to the function call. The following two are equivalent:
# option 1:
docset.map(lambda doc: f(doc, *my_args, **my_kwargs))
# option 2:
docset.map(f, args=my_args, kwargs=my_kwargs)
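The equivalence of the two call styles can be checked with a small stand-alone sketch. It assumes that each document is passed to f as the first positional argument, followed by the extra args and kwargs; `map_docs` and `add_label` are hypothetical names used only here, and plain dictionaries stand in for Document objects.

```python
from typing import Any, Callable, Iterable, Optional

# Assumption: plain dicts stand in for sycamore Document objects.
Doc = dict


def add_label(doc: Doc, label: str, *, uppercase: bool = False) -> Doc:
    """Return a copy of the document with a label attached."""
    text = label.upper() if uppercase else label
    return {**doc, "label": text}


def map_docs(
    docs: list[Doc],
    f: Callable[..., Doc],
    args: Optional[Iterable[Any]] = None,
    kwargs: Optional[dict[str, Any]] = None,
) -> list[Doc]:
    """Hypothetical helper: call f(doc, *args, **kwargs) for every document."""
    extra_args = tuple(args or ())
    extra_kwargs = dict(kwargs or {})
    return [f(doc, *extra_args, **extra_kwargs) for doc in docs]


docs = [{"id": 1}, {"id": 2}]
my_args = ("invoice",)
my_kwargs = {"uppercase": True}

# Style 1: bind the extra arguments in a wrapper.
via_lambda = map_docs(docs, lambda doc: add_label(doc, *my_args, **my_kwargs))
# Style 2: hand the extra arguments to the mapping helper.
via_args = map_docs(docs, add_label, args=my_args, kwargs=my_kwargs)
```

Passing the arguments explicitly avoids capturing them in a closure, which can matter when the function has to be serialized for distributed execution.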
If f is a class type, when using Ray execution, the class will be mapped to an agent that will be instantiated a fixed number of times. By default it is instantiated once, but you can change that with:
ctx.map(ExampleClass, parallelism=num_instances)
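The class-type case can be sketched in plain Python as well. This is an illustration under stated assumptions rather than how Sycamore or Ray dispatch work: it assumes the class is instantiated with the constructor arguments and that each instance is then called once per document through `__call__`. `Tagger` and `map_docs` are hypothetical names, and parallelism is not modelled.

```python
from typing import Any, Iterable, Optional

# Assumption: plain dicts stand in for sycamore Document objects.
Doc = dict


class Tagger:
    """Callable class whose setup cost is paid in __init__, not per document."""

    instances = 0  # counts how many times the class was constructed

    def __init__(self, tag: str, *, prefix: str = "") -> None:
        Tagger.instances += 1
        self.tag = prefix + tag

    def __call__(self, doc: Doc) -> Doc:
        return {**doc, "tag": self.tag}


def map_docs(
    docs: list[Doc],
    f: Any,
    constructor_args: Optional[Iterable[Any]] = None,
    constructor_kwargs: Optional[dict[str, Any]] = None,
) -> list[Doc]:
    """Hypothetical helper: build one instance if f is a class, then map it."""
    if isinstance(f, type):
        f = f(*tuple(constructor_args or ()), **dict(constructor_kwargs or {}))
    return [f(doc) for doc in docs]


docs = [{"id": 1}, {"id": 2}, {"id": 3}]
tagged = map_docs(
    docs,
    Tagger,
    constructor_args=("report",),
    constructor_kwargs={"prefix": "doc-"},
)
```

The instance is built once and reused for all three documents, which is the reason to pass a class when initialization is expensive, for example when loading a model.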
Example
def custom_mapping_function(document: Document) -> Document:
    # Custom logic to transform the document
    return transformed_document

map_transformer = Map(input_dataset_node, f=custom_mapping_function)
transformed_dataset = map_transformer.execute()
- class sycamore.transforms.map.MapBatch(child: Node | None, *, f: Callable[[list[Document]], list[Document]], f_args: Iterable[Any] | None = None, f_kwargs: dict[str, Any] | None = None, f_constructor_args: Iterable[Any] | None = None, f_constructor_kwargs: dict[str, Any] | None = None, **kwargs)
Bases: BaseMapTransform
The MapBatch transform is similar to Map, except that it processes a list of documents and returns a list of documents. MapBatch is ideal for transformations that get performance benefits from batching.
See Map for additional arguments that can be specified and the option for the type of f.
Example
def custom_map_batch_function(documents: list[Document]) -> list[Document]:
    # Custom logic to transform the documents
    return transformed_documents

map_transformer = MapBatch(input_dataset_node, f=custom_map_batch_function)
transformed_dataset = map_transformer.execute()
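To show why batching can help, the sketch below calls a function once per batch instead of once per document. It is a stand-alone illustration, not the MapBatch implementation: `map_batch`, its `batch_size` parameter, and `add_lengths` are hypothetical, and plain dictionaries stand in for Document objects.

```python
from typing import Callable

# Assumption: plain dicts stand in for sycamore Document objects.
Doc = dict

# Records the size of every batch the function receives.
batch_sizes_seen: list[int] = []


def add_lengths(batch: list[Doc]) -> list[Doc]:
    """Process a whole batch in one call; any per-call overhead is paid once per batch."""
    batch_sizes_seen.append(len(batch))
    return [{**doc, "length": len(doc["text"])} for doc in batch]


def map_batch(
    docs: list[Doc],
    f: Callable[[list[Doc]], list[Doc]],
    batch_size: int,
) -> list[Doc]:
    """Hypothetical helper: slice docs into batches, apply f, and concatenate the results."""
    out: list[Doc] = []
    for start in range(0, len(docs), batch_size):
        out.extend(f(docs[start:start + batch_size]))
    return out


docs = [{"text": t} for t in ["a", "bb", "ccc", "dddd", "eeeee"]]
result = map_batch(docs, add_lengths, batch_size=2)
```

Five documents are handled in three calls (batches of 2, 2, and 1), so fixed per-call costs such as a model invocation or a network round trip are amortized across each batch.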