Split Elements
- class sycamore.transforms.split_elements.SplitElements(child: Node, tokenizer: Tokenizer, maximum: int, **kwargs)
Bases: SingleThreadUser, NonGPUUser, Map
The SplitElements transform recursively divides elements such that no Element exceeds a maximum number of tokens.
- Parameters:
child -- The source node or component that provides the elements to be split
tokenizer -- The tokenizer used to count tokens; it should match the tokenizer used by the embedder
maximum -- Maximum tokens allowed in any Element
Example
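from sycamore.transforms.split_elements import SplitElements

node = ...  # Define a source node or component that provides hierarchical documents.
tokenizer = ...  # A Tokenizer instance; it should match the embedder.
# maximum must be passed by keyword here: a positional argument cannot follow keyword arguments.
xform = SplitElements(child=node, tokenizer=tokenizer, maximum=512)
dataset = xform.execute()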
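The recursive division described above can be illustrated with a small standalone sketch. It is not Sycamore's implementation: it works on plain strings rather than Elements, uses a hypothetical whitespace tokenizer as a stand-in for a real Tokenizer, and always cuts at the middle word, whereas the library may choose split points differently. The names split_text and whitespace_tokenize are invented for this illustration.

```python
from typing import Callable, List


def whitespace_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer (assumption): one token per whitespace-separated word."""
    return text.split()


def split_text(text: str, tokenize: Callable[[str], List[str]], maximum: int) -> List[str]:
    """Recursively halve `text` until each piece has at most `maximum` tokens."""
    if maximum < 1:
        raise ValueError("maximum must be at least 1")
    words = text.split()
    if len(tokenize(text)) <= maximum or len(words) < 2:
        # Either the piece already fits, or it is a single word that this
        # sketch cannot divide further (such a piece may still exceed maximum).
        return [text]
    mid = len(words) // 2
    left = " ".join(words[:mid])
    right = " ".join(words[mid:])
    # Recurse on both halves and keep the pieces in document order.
    return split_text(left, tokenize, maximum) + split_text(right, tokenize, maximum)


pieces = split_text("one two three four five six seven", whitespace_tokenize, 3)
print(pieces)
```

With a limit of 3 tokens, the seven-word input is first cut into a three-word and a four-word half; only the second half is still too long, so it is cut again. Counting with the same tokenizer the embedder uses matters for the same reason here as in the transform: the limit is only meaningful if both sides count tokens the same way.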