Bbox Merge

class sycamore.transforms.bbox_merge.MarkBreakByColumn(child: Node, **resource_args)

Bases: SingleThreadUser, NonGPUUser, Transform

MarkBreakByColumn is a transform that marks ‘_break’ where a two-column layout changes to a full-width layout. Ranges of two-column Elements are also re-sorted left to right. Elements must already be sorted top-to-bottom.

Parameters:

child – The source Node or component that provides the Elements

Example

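from sycamore.transforms.bbox_merge import MarkBreakByColumn

source_node = ...  # upstream Node whose Elements are already sorted top-to-bottom
marker = MarkBreakByColumn(child=source_node)
dataset = marker.execute()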
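
The behaviour described above can be illustrated with a minimal plain-Python sketch that does not use Sycamore itself. It is not the library's implementation. It assumes each element is a dict whose 'bbox' is (x1, y1, x2, y2) in page fractions, treats any element at least 60% of the page wide as full-width, reads "left to right" as "left column before right column", and places '_break' on the first full-width element after a two-column run. The helper name mark_break_by_column and the full_width threshold are hypothetical.

```python
def mark_break_by_column(elements, full_width=0.6):
    """Re-sort two-column runs and mark '_break' where full-width layout resumes.

    Hypothetical helper: 'bbox' is assumed to be (x1, y1, x2, y2) as page
    fractions, and the input is assumed to be sorted top-to-bottom.
    """
    def is_full(elem):
        x1, _, x2, _ = elem["bbox"]
        return (x2 - x1) >= full_width

    def column_key(elem):
        x1, y1, x2, _ = elem["bbox"]
        # Column 0 if the box is centred in the left half of the page, else 1.
        column = 0 if (x1 + x2) / 2 < 0.5 else 1
        return (column, y1)

    result, run = [], []
    for elem in elements:
        if is_full(elem):
            if run:
                # A two-column run just ended: emit it column by column.
                result.extend(sorted(run, key=column_key))
                run = []
                elem["_break"] = True  # assumption: mark the full-width element
            result.append(elem)
        else:
            run.append(elem)
    result.extend(sorted(run, key=column_key))
    return result


elements = [
    {"text": "L1", "bbox": (0.05, 0.10, 0.45, 0.30)},
    {"text": "R1", "bbox": (0.55, 0.10, 0.95, 0.30)},
    {"text": "L2", "bbox": (0.05, 0.35, 0.45, 0.50)},
    {"text": "R2", "bbox": (0.55, 0.35, 0.95, 0.50)},
    {"text": "Wide", "bbox": (0.05, 0.55, 0.95, 0.70)},
]
reordered = mark_break_by_column(elements)
order = [e["text"] for e in reordered]
breaks = [e["text"] for e in reordered if e.get("_break")]
print(order, breaks)
```

In this sketch the interleaved input L1, R1, L2, R2 comes out as L1, L2, R1, R2, so each column reads as a continuous block before the full-width element.
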
class sycamore.transforms.bbox_merge.MarkDropHeaderFooter(child: Node, top: float = 0.05, bottom: float | None = None, **resource_args)

Bases: SingleThreadUser, NonGPUUser, Transform

MarkDropHeaderFooter is a transform that adds the ‘_drop’ data attribute to each Element within the given top or bottom fraction of the page. Each Element must have the ‘bbox’ attribute.

Parameters:
  • child – The source Node or component that provides the Elements

  • top – The fraction of the page to exclude from the top (default 0.05)

  • bottom – The fraction of the page to exclude from the bottom (default None in the signature; described as 0.05)

Example

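from sycamore.transforms.bbox_merge import MarkDropHeaderFooter

source_node = ...  # upstream Node whose Elements carry a 'bbox' attribute
marker = MarkDropHeaderFooter(child=source_node, top=0.05)
dataset = marker.execute()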
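
To make the marking rule concrete, here is a minimal plain-Python sketch that does not use Sycamore itself. It assumes each element is a dict whose 'bbox' is (x1, y1, x2, y2) in page fractions with y increasing downward, that a bottom of None falls back to the value of top, and that an element is marked only when it lies entirely inside a margin. These are assumptions for illustration, and the helper name mark_header_footer is hypothetical.

```python
def mark_header_footer(elements, top=0.05, bottom=None):
    """Set '_drop' on elements lying entirely in the top or bottom margin.

    Hypothetical helper: 'bbox' is assumed to be (x1, y1, x2, y2) as page
    fractions, with y increasing downward.
    """
    if bottom is None:
        bottom = top  # assumption: reuse the top fraction for the bottom
    lower_limit = 1.0 - bottom
    for elem in elements:
        bbox = elem.get("bbox")
        if bbox is None:
            continue  # nothing to test without a bounding box
        _, y1, _, y2 = bbox
        if y2 <= top or y1 >= lower_limit:
            elem["_drop"] = True
    return elements


page = [
    {"text": "Running header", "bbox": (0.1, 0.01, 0.9, 0.04)},
    {"text": "Body paragraph", "bbox": (0.1, 0.20, 0.9, 0.60)},
    {"text": "Page 3", "bbox": (0.45, 0.96, 0.55, 0.99)},
]
marked = mark_header_footer(page)
dropped = [e["text"] for e in marked if e.get("_drop")]
print(dropped)
```

Only the header and the page number are marked here; the body paragraph sits outside both margins and is left untouched.
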
class sycamore.transforms.bbox_merge.SortByPageBbox(child: Node, **resource_args)

Bases: SingleThreadUser, NonGPUUser, Transform

SortByPageBbox is a transform that reorders the Elements in ‘natural order’: top to bottom, using page_number and bbox.

Parameters:

child – The source Node or component that provides the Elements

Example

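from sycamore.transforms.bbox_merge import SortByPageBbox

source_node = ...  # upstream Node whose Elements carry page_number and bbox
sorter = SortByPageBbox(child=source_node)
dataset = sorter.execute()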
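
The ordering described above can be sketched in plain Python, independent of Sycamore. The sketch assumes each element is a dict with an integer 'page_number' and a 'bbox' of (x1, y1, x2, y2) with y increasing downward. Breaking ties on the left edge is an added assumption, and the helper name sort_by_page_bbox is hypothetical.

```python
def sort_by_page_bbox(elements):
    """Return elements ordered by page, then top edge, then left edge.

    Hypothetical helper: 'page_number' is assumed to be an int and 'bbox'
    an (x1, y1, x2, y2) tuple with y increasing downward.
    """
    def key(elem):
        x1, y1, _, _ = elem["bbox"]
        return (elem["page_number"], y1, x1)

    return sorted(elements, key=key)


elements = [
    {"text": "p2 top", "page_number": 2, "bbox": (0.1, 0.1, 0.9, 0.2)},
    {"text": "p1 bottom", "page_number": 1, "bbox": (0.1, 0.7, 0.9, 0.8)},
    {"text": "p1 top", "page_number": 1, "bbox": (0.1, 0.1, 0.9, 0.2)},
]
ordered = [e["text"] for e in sort_by_page_bbox(elements)]
print(ordered)
```

Because Python's sorted is stable, elements with identical keys keep their incoming order.
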