The node module
This module defines a Node
class, to build simple tree structures
based on Python lists.
There is also a Range
class, that defines a fragment of a node tree
from one Node upto and including another (or from/to the beginning or end), and
allows iterating over the tree fragment.
- class Common[source]
Bases:
object
Mixin class implementing the methods shared by
Node
andRange
.- static filter_descendants(predicate, generator)[source]
Call predicate on all nodes returned by generator.
Yield the node if the predicate returns True, and send False to the generator, causing it to not yield descendant nodes for that node.
This can be used with
Node.descendants()
,Node.forward()
,Node.backward()
,Range.descendants()
,Range.forward()
,Range.backward()
andRange.nodes()
.
- filter(predicate)[source]
Call predicate on all descendants.
If the predicate returns True for a node, that node is yielded and its descendants are skipped.
- instances_of(cls)[source]
Iterate over the descendants of the current node and yield all nodes that are an instance of
cls
.When a node is yielded, its descendants are not searched anymore.
The
cls
parameter may also be a tuple of more classes, just like the standard Pythonisinstance()
.
- class Node(*children)[source]
-
Node implements a simple tree type, based on Python
list
.You can inherit of Node and add your own attributes and methods.
A node can have child nodes and a
parent
. The parent is referred to with a weak reference, so a node tree does not contain circular references. (This also means that you need to keep a reference to a tree’s root node, otherwise it will be garbage collected.)Adding nodes to a node sets the parent of the nodes; but removing nodes doesn’t unset the parent of the removed nodes; and adding nodes does not automatically remove them from their previous parent; you should take care of that yourself. Unset the parent of a node by setting the
parent
attribute toNone
.Iterating over a node yields the child nodes, just like the underlying Python list. Unlike Python’s list, a node always evaluates to True, even if there are no children.
To find the index of a node in its parent, use:
>>> index = node.parent.index(node)
This is safe, because it uses the nodes’ identity, but potentially slow because it uses a linear search, so don’t use it for every node in an iterative operation.
Besides the usual methods, Node defines six special query operators:
/
,//
,<<
,>
,<
and^
. All these expect a Node (sub)class (or instance) as argument, and iterate in different ways over selected Nodes:The
/
operator iterates over the children that are instances of the specified class:for n in node / MyClass: # do_something with n, which is a child of node and # an instance of MyClass
The
//
operator iterates over all descendants in document order:for n in node // MyClass: # n is a descendant of node and an instance of MyClass
The
<<
operator iterates over the ancestors:for n in node << Header: n.blabla # n is the youngest ancestor that is a Header instance
The
>
operator iteratesforward()
from the node, starting with the right sibling:for n in node > MyClass: # do something on the MyClass instance that comes after the node # (and is not a child)
The
<
operator iteratesbackward()
from the node, starting with the left sibling:for n in node < MyClass: # do something on the MyClass instance that precedes the node # (and is not a child)
The
^
operator iterates over the children that are not an instance of the specified class(es):node[:] = (node ^ MyClass) # deletes all children that are an instance of MyClass
Instead of a subclass, a class instance or a tuple of more than one class may also be given. If a class instance is given,
body_equals()
must return true for the compared nodes. (Child nodes are not compared when using a class instance to compare with.)- property parent
The parent Node or None; uses a weak reference.
- trail()[source]
Return the list of indices of the node and its ancestors in their parents.
The node’s own index is at the end. Using trails you can determine arbitrary ranges in a DOM document, and you can compare trails to see which node comes first in the document order.
Given the following hypothetical tree structure (all classes are subclasses of Node):
A( B( C(), D()), E( F(), G( H())))
the trail of the F instance would be
[1, 0]
, because its parent E has index 1 in A, while F has index 0 in E.If you have two nodes, e.g. the D and H instance, in resp.
d
andh
, you can quickly see which one is earlier in the tree:d.trail() < h.trail()
, which would evaluate to True in this case.
- copy(with_children=True)[source]
Return a copy of this Node.
If
with_children
is True (the default), child nodes are also copied.
- replace_with(node)[source]
Replace this node in its parent with another node.
Fails if called on the root node.
- equals(other)[source]
Return True if we and other are equivalent.
This is the case when we and the other have the same class, the same amount of children,
body_equals()
returns True, and finally for all the children this method returns True.Before this method is called on all the children, this method calls
body_equals()
; implement that method if you want to add more tests, e.g. for certain instance attributes.
- body_equals(other)[source]
Implement this to add more
equals()
tests, before all the children are compared.The default implementation returns True.
- common_ancestor(other)[source]
Return the common ancestor, if any.
When one node appears to be an ancestor of the other, that node is returned.
- common_ancestor_with_trail(other)[source]
Return a three-tuple(
ancestor
,trail_self
,trail_other
) or None.The
ancestor
node is the common ancestor such as returned bycommon_ancestor()
. Thetrail_self
is a list of indices from the common ancestor upto self, andtrail_other
is a list of indices from the same ancestor upto the other Node.If one of the trails is empty, the corresponding node is an ancestor of the other.
If there is no common ancestor, (None, None, None) is returned. But normally, all nodes share the root node, so that will normally be the upmost common ancestor.
- descendants(reverse=False)[source]
Iterate over all the descendants of this node.
If
reverse
is set to True, yields all descendants in backward direction.When you
send()
False to this generator, child nodes of the just yielded node will not be yielded.
- forward(upto=None)[source]
Iterate forward from this Node, starting with the right sibling.
If you specify an ancestor node
upto
, will not go outside that node.When you
send()
False to this generator, child nodes of the just yielded node will not be yielded.
- backward(upto=None)[source]
Iterate backward from this Node, starting with the left sibling.
If you specify an ancestor node
upto
, will not go outside that node.When you
send()
False to this generator, child nodes of the just yielded node will not be yielded.
- dump(file=None, style=None, depth=0)[source]
Display a graphical representation of the node and its contents.
The file object defaults to stdout, and the style to “round”. You can choose any style that’s in the
DUMP_STYLES
dictionary.
- property ls
List the children, for debugging purposes.
You can also narrow down the list of children to certain types, using the
/
or^
operator:>>> node.ls / Class
Only the children of node that are an (or are not) instance of Class are shown.
- property pwd
Show the ancestry, for debugging purposes.
- class Range(ancestor, start_trail=None, end_trail=None)[source]
Bases:
Common
Defines a range within a node tree.
A range is defined by a starting node (the ancestor), and optional start and end trails. A
trail
is a list of indices defining the path to the first and/or last node of the range. If both trails are None, the range encompasses the full node.A Range can be constructed directly, specifying the ancestor node and the trails, or using the
from_nodes()
method, specifying the start and/or end node. This method is also used when you call theNode.range()
method. The range includes both the start node and the end node (if specified).A Range is empty when the end node is before the start node, or when the ancestor node itself is empty. An empty Range evaluates to False.
Having created a Range object, you can do a lot:
Use the
node in range
syntax to see whether a Node is inside a Range. A node is in a range when all its children are also in the range. (Only the end node can have children outside the range.)Use the
range & node
syntax to see whether a Node intersects with a Range. A node intersects with a Range when it has child nodes in the Range, but it can also have childnodes outside the range.Use
extract_tree()
to get a copy of the node tree from the ancestor, including only the range.Iterate over the Range to get the nodes at the current iteration level. Iterating starts at the bottom node of the range, and walks over the children of the node that are within the range. If you want to iterate over the children of a node that are in the range (and possibly further down), you can use
enter()
, and, after iterating the children,leave()
. Or usechildren()
, which fits more in a recursive approach.
When iterating, the Range object keeps track of the position in the tree range. The current node is accessible via the
node
attribute.Range support the same search operators as
Node
:/
,//
,<<
,>
,<
and^
. Note that they set the current node, so it you don’t exhaust them fully, restarted or other searches may start from a different node than you expect. Use with care! :-)Of course you can modify the tree directly, but if you change the number of children of the parent of the current node, use the item methods (like
range[index] = value
) of the iterator, so that it adjust the range boundaries and its iteraton to the changes.There is a subtle difference between how the start trail is handled and how the end trail is handled. The children of the start node are considered to be in the range, but the children of the end node not. If you want those to be in the range, be sure you specify the last child node you want to be in the range.
If you replace a node (using the item methods of Range) that’s on a range boundary, the boundary trail can be modified. For example, if you replace a node that lies on the start trail, the start trail is truncated to that node, so that all its children are within the range (because the old end of the start trail can be invalid for the new node contents). Also the end trail is truncated when a node on the end boundary is replaced, causing the children of the new node to fall outside of the range. Inserting nodes in another node that crosses the end boundary causes the end trail to be adjusted, so that the range boundary is still valid.
- start_trail
The start trail (path to first node), can be None. (Do not modify from the outside.)
- end_trail
The end trail (path to the last node), can be None. (Do not modify from the outside.)
- classmethod from_nodes(start_node=None, end_node=None, from_root=False)[source]
Create a Range object from a start node to an end node.
One of them (but not both) may be None. If the
start_node
is None, the range starts at the very beginning; if theend_node
is None, the range starts at the start node and walks to the very end.If
from_root
is True, the ancestor of the range will always be the root node; even if the trails to start and end node are partially the same. If False, the ancestor will be the youngest common ancestor.Raises a ValueError if both nodes are None, or when the nodes do not belong to the same tree.
- from_here()[source]
Return a new Range, starting at the current
node
.This is easier to use if you prefer a recursive approach.
- __contains__(node)[source]
Return True if the node is completely within this range.
Returns False if a node (or any of its descendants) is partially outside of the range. (The ancestor itself is completely in the range if there is no start_trail and no end_trail.)
- intersects(node)[source]
Return True if the node is partially or completely in our range.
A node is partially in the range if it has descendant nodes in the range and other descendants outside.
A shorthand for this method is
range & node
.Use
node in range
to know whether a node is completely in the range.
- is_full()[source]
True if there is no start and end boundary, i.e. all descendants of the ancestor are in the range.
- property node
The current node.
This is the last yielded Node when iterating. Setting the property modifies the tree: it replaces the node in its parent with the specified node, or raises a ValueError when you try to replace the ancestor (i.e. not yet iterated over the range, or the range is empty).
- property index
The index of the current (last yielded) node in its parent.
Returns -1 if the ancestor is the current node or the range is empty.
Setting the property selects the child node of the node at the current iteration level. Raises a ValueError when the index you specify is outside the range.
Note
The allowed range for the current iteration level is the range from
start()
upto and includingend()
. Note that, when you alreadyenter()
-ed a node but not yet iterated over it, the displayedindex
value still relates to the current node, while the values returned by thestart()
andend()
methods, and setting the property, already relate to the current (newly entered) iteration level.
- goto(node)[source]
Navigate to another node, returns True if that succeeded.
Returns False if the node is not in our tree, or completely outside our range. The specified node may be partially outside the tree, which is the case when it crosses a range boundary.
- enter()[source]
Start iterating the current child node.
Returns the depth after entering (which is always 1 or higher). If you store this value, and compare with the value returned by
leave()
, you are back where you started when both values are the same.
- leave()[source]
Stop iterating the current child node and pop back to the parent.
Returns the depth before leaving, i.e. the same depth as the corresponding call to
enter()
. Returns 0 if you try to leave the outermost node.
- extract_tree()[source]
Return a copy of the node tree, starting at the current
node
, including only the range.
- nodes()[source]
Yield all nodes from start node upto and including the end node.
If there is no start boundary, the result is the same as
descendants()
.When you
send()
False to this generator, child nodes of the just yielded node will not be yielded.
- forward(upto=None)[source]
Iterate forward from the current
node
, starting with the right sibling.If you specify an ancestor node
upto
, will not go outside that node.When you
send()
False to this generator, child nodes of the just yielded node will not be yielded.
- backward(upto=None)[source]
Iterate backward from the current
node
, starting with the left sibling.If you specify an ancestor node
upto
, will not go outside that node.When you
send()
False to this generator, child nodes of the just yielded node will not be yielded.
- ancestors()[source]
Iterate over the ancestors of the current
node
that are in the range.Ends with the
ancestor()
.
- children(reverse=False)[source]
Iterate over the children of the current
node
that are in the range.If
reverse
is set to True, yields the children in backward direction.
- descendants(reverse=False)[source]
Iterate over all descendants of the current
node
that are in the range.If
reverse
is set to True, yields all descendants in backward direction.When you
send()
False to this generator, child nodes of the just yielded node will not be yielded.
- property ls
List the nodes of the current iteration level that are within the range, for debugging purposes.
The first column shows the index of the nodes in their parent. If iteration is active, the current node is highlighted with an arrow.
Colons are displayed if start and/or end boundaries of the range cross any descendants of this node, i.e. the node is not fully
in_range()
.
- property pwd
Show the ancestry of the current node within the range, for debugging purposes.