Regex Replace
class sycamore.transforms.regex_replace.RegexReplace(child: Node, spec: list[tuple[str, str]], **kwargs)
Bases: SingleThreadUser, NonGPUUser, Map
The RegexReplace transform rewrites the text_representation of every Element in each Document by applying the regular-expression substitutions given in spec.
- Parameters:
child -- The source node or component that provides the documents
spec -- A list of (pattern, replacement) tuples, applied in order with re.sub(); each substitution operates on the output of the previous one
kwargs -- Additional resource-related arguments that can be passed to the operation
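To illustrate the semantics described above, here is a minimal sketch in plain Python. It is not Sycamore's implementation. The Element and Document dataclasses are hypothetical stand-ins that model only the fields needed here, and the helper names apply_spec and regex_replace_document are illustrative. The sketch also assumes that elements without a text_representation are left untouched.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-ins for Sycamore's Element and Document;
# only the fields this sketch needs are modeled.
@dataclass
class Element:
    text_representation: Optional[str] = None


@dataclass
class Document:
    elements: list[Element] = field(default_factory=list)


def apply_spec(text: str, spec: list[tuple[str, str]]) -> str:
    """Apply each (pattern, replacement) pair in order via re.sub()."""
    for pattern, replacement in spec:
        # Each substitution sees the output of the previous one.
        text = re.sub(pattern, replacement, text)
    return text


def regex_replace_document(doc: Document, spec: list[tuple[str, str]]) -> Document:
    """Rewrite the text_representation of every element that has one."""
    for elem in doc.elements:
        # Assumption: elements with no text are skipped.
        if elem.text_representation is not None:
            elem.text_representation = apply_spec(elem.text_representation, spec)
    return doc


doc = Document(elements=[Element("  Hello\n\tworld  "), Element(None)])
regex_replace_document(doc, [(r"\s+", " "), (r"^ ", "")])
print([e.text_representation for e in doc.elements])  # ['Hello world ', None]
```

The first element is transformed in two passes. The whitespace rule produces " Hello world ", and the anchored rule then removes the single leading space. The second element has no text and is passed through unchanged.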
Example
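from sycamore.transforms.regex_replace import RegexReplace
# Collapse each whitespace run to one space, then drop a single leading space.
rr = RegexReplace(child=node, spec=[(r"\s+", " "), (r"^ ", "")])
dataset = rr.execute()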
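Because the spec is applied in order, the sequence of tuples matters. The plain-Python sketch below runs the example's spec on a sample string, first as written and then reversed. It uses only the standard re module, not Sycamore. The helper apply_spec and the sample text are illustrative assumptions.

```python
import re


def apply_spec(text: str, spec: list[tuple[str, str]]) -> str:
    """Apply (pattern, replacement) pairs in order via re.sub()."""
    for pattern, replacement in spec:
        text = re.sub(pattern, replacement, text)
    return text


spec = [(r"\s+", " "), (r"^ ", "")]
reversed_spec = list(reversed(spec))

sample = "\n  Quarterly   report\t2023 "

# As written: whitespace is collapsed first, so "^ " then matches the leading space.
print(repr(apply_spec(sample, spec)))           # 'Quarterly report 2023 '
# Reversed: "^ " runs first, finds a newline rather than a space, and does nothing.
print(repr(apply_spec(sample, reversed_spec)))  # ' Quarterly report 2023 '
```

Neither ordering removes the trailing space, because the example spec anchors only at the start of the string.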