The node module

This module defines a Node class, to build simple tree structures based on Python lists.

There is also a Range class, that defines a fragment of a node tree from one Node upto and including another (or from/to the beginning or end), and allows iterating over the tree fragment.

class Common[source]

Bases: object

Mixin class implementing the methods shared by Node and Range.

static filter_descendants(predicate, generator)[source]

Call predicate on all nodes returned by generator.

Yield the node if the predicate returns True, and send False to the generator, causing it to not yield descendant nodes for that node.

This can be used with Node.descendants(), Node.forward(), Node.backward(), Range.descendants(), Range.forward(), Range.backward() and Range.nodes().

filter(predicate)[source]

Call predicate on all descendants.

If the predicate returns True for a node, that node is yielded and its descendants are skipped.

instances_of(cls)[source]

Iterate over the descendants of the current node and yield all nodes that are an instance of cls.

When a node is yielded, its descendants are not searched anymore.

The cls parameter may also be a tuple of more classes, just like the standard Python isinstance().

class Node(*children)[source]

Bases: Common, list

Node implements a simple tree type, based on Python list.

You can inherit of Node and add your own attributes and methods.

A node can have child nodes and a parent. The parent is referred to with a weak reference, so a node tree does not contain circular references. (This also means that you need to keep a reference to a tree’s root node, otherwise it will be garbage collected.)

Adding nodes to a node sets the parent of the nodes; but removing nodes doesn’t unset the parent of the removed nodes; and adding nodes does not automatically remove them from their previous parent; you should take care of that yourself. Unset the parent of a node by setting the parent attribute to None.

Iterating over a node yields the child nodes, just like the underlying Python list. Unlike Python’s list, a node always evaluates to True, even if there are no children.

To find the index of a node in its parent, use:

>>> index = node.parent.index(node)

This is safe, because it uses the nodes’ identity, but potentially slow because it uses a linear search, so don’t use it for every node in an iterative operation.

Besides the usual methods, Node defines six special query operators: /, //, <<, >, < and ^. All these expect a Node (sub)class (or instance) as argument, and iterate in different ways over selected Nodes:

  • The / operator iterates over the children that are instances of the specified class:

    for n in node / MyClass:
        # do_something with n, which is a child of node and
        # an instance of MyClass
    
  • The // operator iterates over all descendants in document order:

    for n in node // MyClass:
        # n is a descendant of node and an instance of MyClass
    
  • The << operator iterates over the ancestors:

    for n in node << Header:
        n.blabla    # n is the youngest ancestor that is a Header instance
    
  • The > operator iterates forward() from the node, starting with the right sibling:

    for n in node > MyClass:
        # do something on the MyClass instance that comes after the node
        # (and is not a child)
    
  • The < operator iterates backward() from the node, starting with the left sibling:

    for n in node < MyClass:
        # do something on the MyClass instance that precedes the node
        # (and is not a child)
    
  • The ^ operator iterates over the children that are not an instance of the specified class(es):

    node[:] = (node ^ MyClass)
    # deletes all children that are an instance of MyClass
    

Instead of a subclass, a class instance or a tuple of more than one class may also be given. If a class instance is given, body_equals() must return true for the compared nodes. (Child nodes are not compared when using a class instance to compare with.)

__bool__()[source]

Always True.

property parent

The parent Node or None; uses a weak reference.

root()[source]

Return the root node.

trail()[source]

Return the list of indices of the node and its ancestors in their parents.

The node’s own index is at the end. Using trails you can determine arbitrary ranges in a DOM document, and you can compare trails to see which node comes first in the document order.

Given the following hypothetical tree structure (all classes are subclasses of Node):

A(
    B(
        C(),
        D()),
    E(
        F(),
        G(
            H())))

the trail of the F instance would be [1, 0], because its parent E has index 1 in A, while F has index 0 in E.

If you have two nodes, e.g. the D and H instance, in resp. d and h, you can quickly see which one is earlier in the tree: d.trail() < h.trail(), which would evaluate to True in this case.

copy(with_children=True)[source]

Return a copy of this Node.

If with_children is True (the default), child nodes are also copied.

append(node)[source]

Append node to this node; the parent is set to this node.

extend(nodes)[source]

Append nodes to this node; the parent is set to this node.

insert(index, node)[source]

Insert node in this node; the parent is set to this node.

replace_with(node)[source]

Replace this node in its parent with another node.

Fails if called on the root node.

take(start=0, end=None)[source]

Like list.pop(), but takes out and returns a slice(start, end).

equals(other)[source]

Return True if we and other are equivalent.

This is the case when we and the other have the same class, the same amount of children, body_equals() returns True, and finally for all the children this method returns True.

Before this method is called on all the children, this method calls body_equals(); implement that method if you want to add more tests, e.g. for certain instance attributes.

body_equals(other)[source]

Implement this to add more equals() tests, before all the children are compared.

The default implementation returns True.

is_last()[source]

Return True if this is the last node. Fails if no parent.

is_first()[source]

Return True if this is the first node. Fails if no parent.

is_root()[source]

Return True if this node has no parent.

common_ancestor(other)[source]

Return the common ancestor, if any.

When one node appears to be an ancestor of the other, that node is returned.

common_ancestor_with_trail(other)[source]

Return a three-tuple(ancestor, trail_self, trail_other) or None.

The ancestor node is the common ancestor such as returned by common_ancestor(). The trail_self is a list of indices from the common ancestor upto self, and trail_other is a list of indices from the same ancestor upto the other Node.

If one of the trails is empty, the corresponding node is an ancestor of the other.

If there is no common ancestor, (None, None, None) is returned. But normally, all nodes share the root node, so that will normally be the upmost common ancestor.

ancestors()[source]

Yield the parent, then the parent’s parent, etcetera.

ancestors_with_index()[source]

Yield the ancestors and the index of each node in the parent.

descendants(reverse=False)[source]

Iterate over all the descendants of this node.

If reverse is set to True, yields all descendants in backward direction.

When you send() False to this generator, child nodes of the just yielded node will not be yielded.

right_siblings()[source]

Iterate over the right siblings of this node.

left_siblings()[source]

Iterate backwards over the left siblings of this node.

right_sibling()[source]

The right sibling, if any.

left_sibling()[source]

The left sibling, if any.

forward(upto=None)[source]

Iterate forward from this Node, starting with the right sibling.

If you specify an ancestor node upto, will not go outside that node.

When you send() False to this generator, child nodes of the just yielded node will not be yielded.

backward(upto=None)[source]

Iterate backward from this Node, starting with the left sibling.

If you specify an ancestor node upto, will not go outside that node.

When you send() False to this generator, child nodes of the just yielded node will not be yielded.

depth()[source]

Return the number of ancestors.

height()[source]

Return the height of the tree (the longest distance to a descendant).

range(other, from_root=False)[source]

Return a Range from self to the other node.

dump(file=None, style=None, depth=0)[source]

Display a graphical representation of the node and its contents.

The file object defaults to stdout, and the style to “round”. You can choose any style that’s in the DUMP_STYLES dictionary.

property ls

List the children, for debugging purposes.

You can also narrow down the list of children to certain types, using the / or ^ operator:

>>> node.ls / Class

Only the children of node that are an (or are not) instance of Class are shown.

property pwd

Show the ancestry, for debugging purposes.

class Range(ancestor, start_trail=None, end_trail=None)[source]

Bases: Common

Defines a range within a node tree.

A range is defined by a starting node (the ancestor), and optional start and end trails. A trail is a list of indices defining the path to the first and/or last node of the range. If both trails are None, the range encompasses the full node.

A Range can be constructed directly, specifying the ancestor node and the trails, or using the from_nodes() method, specifying the start and/or end node. This method is also used when you call the Node.range() method. The range includes both the start node and the end node (if specified).

A Range is empty when the end node is before the start node, or when the ancestor node itself is empty. An empty Range evaluates to False.

Having created a Range object, you can do a lot:

  • Use the node in range syntax to see whether a Node is inside a Range. A node is in a range when all its children are also in the range. (Only the end node can have children outside the range.)

  • Use the range & node syntax to see whether a Node intersects with a Range. A node intersects with a Range when it has child nodes in the Range, but it can also have childnodes outside the range.

  • Use extract_tree() to get a copy of the node tree from the ancestor, including only the range.

  • Iterate over the Range to get the nodes at the current iteration level. Iterating starts at the bottom node of the range, and walks over the children of the node that are within the range. If you want to iterate over the children of a node that are in the range (and possibly further down), you can use enter(), and, after iterating the children, leave(). Or use children(), which fits more in a recursive approach.

When iterating, the Range object keeps track of the position in the tree range. The current node is accessible via the node attribute.

Range support the same search operators as Node: /, //, <<, >, < and ^. Note that they set the current node, so it you don’t exhaust them fully, restarted or other searches may start from a different node than you expect. Use with care! :-)

Of course you can modify the tree directly, but if you change the number of children of the parent of the current node, use the item methods (like range[index] = value) of the iterator, so that it adjust the range boundaries and its iteraton to the changes.

There is a subtle difference between how the start trail is handled and how the end trail is handled. The children of the start node are considered to be in the range, but the children of the end node not. If you want those to be in the range, be sure you specify the last child node you want to be in the range.

If you replace a node (using the item methods of Range) that’s on a range boundary, the boundary trail can be modified. For example, if you replace a node that lies on the start trail, the start trail is truncated to that node, so that all its children are within the range (because the old end of the start trail can be invalid for the new node contents). Also the end trail is truncated when a node on the end boundary is replaced, causing the children of the new node to fall outside of the range. Inserting nodes in another node that crosses the end boundary causes the end trail to be adjusted, so that the range boundary is still valid.

ancestor()[source]

The bottom node of this range.

start_trail

The start trail (path to first node), can be None. (Do not modify from the outside.)

end_trail

The end trail (path to the last node), can be None. (Do not modify from the outside.)

classmethod from_nodes(start_node=None, end_node=None, from_root=False)[source]

Create a Range object from a start node to an end node.

One of them (but not both) may be None. If the start_node is None, the range starts at the very beginning; if the end_node is None, the range starts at the start node and walks to the very end.

If from_root is True, the ancestor of the range will always be the root node; even if the trails to start and end node are partially the same. If False, the ancestor will be the youngest common ancestor.

Raises a ValueError if both nodes are None, or when the nodes do not belong to the same tree.

start_node()[source]

The start node, if specified.

end_node()[source]

The end node, if specified.

copy()[source]

Return a clean copy of this Range, initialized at the same ancestor.

from_here()[source]

Return a new Range, starting at the current node.

This is easier to use if you prefer a recursive approach.

__bool__()[source]

False if the range is empty.

__contains__(node)[source]

Return True if the node is completely within this range.

Returns False if a node (or any of its descendants) is partially outside of the range. (The ancestor itself is completely in the range if there is no start_trail and no end_trail.)

intersects(node)[source]

Return True if the node is partially or completely in our range.

A node is partially in the range if it has descendant nodes in the range and other descendants outside.

A shorthand for this method is range & node.

Use node in range to know whether a node is completely in the range.

is_full()[source]

True if there is no start and end boundary, i.e. all descendants of the ancestor are in the range.

reversed()[source]

Iterate the current iteration level in reverse order.

property node

The current node.

This is the last yielded Node when iterating. Setting the property modifies the tree: it replaces the node in its parent with the specified node, or raises a ValueError when you try to replace the ancestor (i.e. not yet iterated over the range, or the range is empty).

property index

The index of the current (last yielded) node in its parent.

Returns -1 if the ancestor is the current node or the range is empty.

Setting the property selects the child node of the node at the current iteration level. Raises a ValueError when the index you specify is outside the range.

Note

The allowed range for the current iteration level is the range from start() upto and including end(). Note that, when you already enter()-ed a node but not yet iterated over it, the displayed index value still relates to the current node, while the values returned by the start() and end() methods, and setting the property, already relate to the current (newly entered) iteration level.

start()[source]

The start index of the range in the current interation level’s node.

end()[source]

The end index of the range in the current interation level’s node.

trail()[source]

Trail to the current (last yielded) node.

depth()[source]

The current iteration depth (0 or higher).

in_range()[source]

True if the current node and its descendants completely fall within the range.

goto(node)[source]

Navigate to another node, returns True if that succeeded.

Returns False if the node is not in our tree, or completely outside our range. The specified node may be partially outside the tree, which is the case when it crosses a range boundary.

reset()[source]

Reset iteration to the ancestor node, as if we are just instantiated.

enter()[source]

Start iterating the current child node.

Returns the depth after entering (which is always 1 or higher). If you store this value, and compare with the value returned by leave(), you are back where you started when both values are the same.

leave()[source]

Stop iterating the current child node and pop back to the parent.

Returns the depth before leaving, i.e. the same depth as the corresponding call to enter(). Returns 0 if you try to leave the outermost node.

extract_tree()[source]

Return a copy of the node tree, starting at the current node, including only the range.

nodes()[source]

Yield all nodes from start node upto and including the end node.

If there is no start boundary, the result is the same as descendants().

When you send() False to this generator, child nodes of the just yielded node will not be yielded.

forward(upto=None)[source]

Iterate forward from the current node, starting with the right sibling.

If you specify an ancestor node upto, will not go outside that node.

When you send() False to this generator, child nodes of the just yielded node will not be yielded.

backward(upto=None)[source]

Iterate backward from the current node, starting with the left sibling.

If you specify an ancestor node upto, will not go outside that node.

When you send() False to this generator, child nodes of the just yielded node will not be yielded.

ancestors()[source]

Iterate over the ancestors of the current node that are in the range.

Ends with the ancestor().

children(reverse=False)[source]

Iterate over the children of the current node that are in the range.

If reverse is set to True, yields the children in backward direction.

descendants(reverse=False)[source]

Iterate over all descendants of the current node that are in the range.

If reverse is set to True, yields all descendants in backward direction.

When you send() False to this generator, child nodes of the just yielded node will not be yielded.

property ls

List the nodes of the current iteration level that are within the range, for debugging purposes.

The first column shows the index of the nodes in their parent. If iteration is active, the current node is highlighted with an arrow.

Colons are displayed if start and/or end boundaries of the range cross any descendants of this node, i.e. the node is not fully in_range().

property pwd

Show the ancestry of the current node within the range, for debugging purposes.