Basics
- class sycamore.transforms.basics.Filter(child: Node, *, f: Callable[[Document], bool], **resource_args)
Bases: MapBatch
Filter is a transformation that applies a user-defined filter function to each document in a dataset and keeps only the documents for which the function returns True.
- Parameters:
child -- The source node or component that provides the dataset to be filtered.
f -- A callable that takes a Document object and returns a boolean indicating whether the document should be included in the filtered dataset.
resource_args -- Additional resource-related arguments that can be passed to the filtering operation.
Example
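    from sycamore.data import Document
    from sycamore.transforms.basics import Filter

    source_node = ...  # Define a source node or component that provides a dataset.

    def custom_filter(doc: Document) -> bool:
        # Define your custom filtering logic here.
        # some_property and some_value are placeholders for your own field and value.
        return doc.some_property == some_value

    filter_transform = Filter(child=source_node, f=custom_filter)
    filtered_dataset = filter_transform.execute()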
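The filtering behavior described above can be illustrated without a Sycamore installation. The following is a minimal sketch in plain Python, not Sycamore's implementation: `Doc` is a hypothetical stand-in for `Document`, and `apply_filter` is a hypothetical helper that mirrors only the contract of `f` (take a document, return a boolean).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List


@dataclass
class Doc:
    """Hypothetical stand-in for sycamore's Document; not the real class."""
    text: str
    properties: Dict[str, Any] = field(default_factory=dict)


def apply_filter(docs: Iterable[Doc], f: Callable[[Doc], bool]) -> List[Doc]:
    """Keep only the documents for which the predicate f returns True."""
    return [doc for doc in docs if f(doc)]


docs = [
    Doc("intro", {"page": 1}),
    Doc("methods", {"page": 4}),
    Doc("results", {"page": 9}),
]

# The predicate plays the role of the f argument passed to Filter.
early_pages = apply_filter(docs, lambda d: d.properties["page"] < 5)
print([d.text for d in early_pages])  # ['intro', 'methods']
```

A predicate written this way has no side effects and depends only on the document it receives, which is probably the safest shape for a function that a transform may call on many documents.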
- class sycamore.transforms.basics.Limit(child: Node, limit: int)
Bases: NonCPUUser, NonGPUUser, Transform
Limit is a transformation that restricts the size of a dataset to a specified number of records.
- Parameters:
child -- The source node or component that provides the dataset to be limited.
limit -- The maximum number of records to include in the resulting dataset.
Example
    from sycamore.transforms.basics import Limit

    source_node = ...  # Define a source node or component that provides a dataset.
    limit_transform = Limit(child=source_node, limit=100)
    limited_dataset = limit_transform.execute()
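The effect of a record limit can be sketched in plain Python as well. This is an illustration under stated assumptions, not Sycamore's implementation: `apply_limit` is a hypothetical helper, and the documentation above does not say which records `Limit` keeps or whether it stops reading early, so taking the first records in order is an assumption made here.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def apply_limit(records: Iterable[T], limit: int) -> Iterator[T]:
    """Return an iterator over at most `limit` records, in input order."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    # islice stops after `limit` items and never consumes the remainder.
    return islice(records, limit)


first_three = list(apply_limit(range(10), 3))
print(first_three)  # [0, 1, 2]

# A limit larger than the dataset returns every record unchanged.
short_input = list(apply_limit(["a", "b"], 100))
print(short_input)  # ['a', 'b']
```

As the second call shows, `limit` acts as an upper bound: a dataset smaller than the limit passes through whole.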