Regex Replace

class sycamore.transforms.regex_replace.RegexReplace(child: Node, spec: list[tuple[str, str]], **kwargs)

Bases: SingleThreadUser, NonGPUUser, Transform

The RegexReplace transform rewrites the text_representation of each Element in every Document by applying the regular-expression substitutions in spec, in order.

Parameters:
  • child – The source node or component that provides the documents

  • spec – A list of (regular expression, substitution) tuples, applied in list order via re.sub()

  • kwargs – Additional resource-related arguments that can be passed to the operation

Example

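from sycamore.transforms.regex_replace import RegexReplace
# `node` is an existing upstream Node that supplies the documents
rr = RegexReplace(child=node, spec=[(r"\s+", " "), (r"^ ", "")])
dataset = rr.execute()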
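
To see what the spec in the example does to a single string, the ordered re.sub() calls can be sketched with the standard library alone. This is an illustration, not Sycamore's implementation: apply_spec is a hypothetical helper, and it operates on a plain str rather than on an Element.

```python
import re


def apply_spec(text: str, spec: list[tuple[str, str]]) -> str:
    """Hypothetical helper: apply each (pattern, replacement) pair in order."""
    for pattern, replacement in spec:
        text = re.sub(pattern, replacement, text)
    return text


# Same spec as the example: collapse whitespace runs, then drop one leading space.
spec = [(r"\s+", " "), (r"^ ", "")]
print(repr(apply_spec("  Hello \n\t world  ", spec)))  # 'Hello world '

# Reversing the order changes the result, because each step sees the
# output of the previous one.
print(repr(apply_spec("  Hello", list(reversed(spec)))))  # ' Hello'
```

Note that the trailing space survives, since the spec only strips a leading one; adding a pair such as (r" $", "") would remove it. Because the pairs run sequentially, their order in the list is significant.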