The dom.htm module

Elements needed for Html or Xml text.

This module is called htm to keep its name short and to avoid confusion with the language modules parce.lang.html and quickly.lang.html.

Note

Although parce is perfectly capable of parsing CSS style and JavaScript script tags and attributes, this module does not implement node types for that.

This means that although a full Html document may contain JavaScript and CSS (and LilyPond and Scheme, of course), such a document cannot be written back to text from the DOM intact. parce neatly parses the CSS and JavaScript, but our transformer in quickly.lang.html does not transform them to htm elements, so they are lost when a DOM document that was read from text is written back to text.

But you can construct html style and script elements manually of course and write them out to text.

Every Html tag maps to an Element node, which normally has one or more children: either one SingleTag node, or an OpenTag node at the beginning and a CloseTag node at the end, with other Element, Text or EntityRef nodes in between.

The tag nodes inherit BlockElement, have the delimiters (<, </, > or />) in their head and tail, and have a TagName child and possibly one or more Attribute children.

An attribute node normally has three children: an AttrName, an EqualSign and a DqString or SqString value.

Because entity references can appear both in generic text and in attribute strings, those string nodes are block elements with the quotes in their head and tail, and with Text or EntityRef elements as child nodes.
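The head/children/tail layout described above can be illustrated with a small toy model. This is only a sketch in plain Python: the classes below borrow some names from this module but are hypothetical stand-ins, not the quickly classes, and the `write()` method and `Leaf` helper are assumptions made for the illustration.

```python
class Node:
    """Toy stand-in for a DOM node (not the quickly class): head, children, tail."""
    head = ''
    tail = ''
    space_between = ''   # text written between adjacent children

    def __init__(self, *children, head=None):
        self.children = children
        if head is not None:
            self.head = head

    def write(self):
        # Delimiters live in head and tail; the children supply the contents.
        inner = self.space_between.join(child.write() for child in self.children)
        return self.head + inner + self.tail


class Leaf(Node):
    """A node with only a head value, such as a tag name or a run of text."""
    def __init__(self, head):
        super().__init__(head=head)


class EntityRef(Leaf):
    """The head value is the part between the & and the ;."""
    def write(self):
        return '&' + self.head + ';'


class OpenTag(Node):
    head, tail, space_between = '<', '>', ' '


class CloseTag(Node):
    head, tail = '</', '>'


class DqString(Node):
    head = tail = '"'


element = Node(
    OpenTag(
        Leaf('a'),
        # attribute: name, equal sign, double-quoted string value
        Node(Leaf('href'), Leaf('='),
             DqString(Leaf('x?a=1'), EntityRef('amp'), Leaf('b=2'))),
    ),
    Leaf('Tom '), EntityRef('amp'), Leaf(' Jerry'),
    CloseTag(Leaf('a')),
)

print(element.write())   # <a href="x?a=1&amp;b=2">Tom &amp; Jerry</a>
```

Modelling the string as a block node with children lets the same EntityRef node type be used inside the attribute value and in the element contents.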

How LilyPond fits in

LilyPond nodes (lily.Document) can appear as contents in an Element that has a lilypond open tag (like: <lilypond staffsize=2> { c d e f g } </lilypond>); or within the SingleTag node of an Element that has no further contents (this happens when the short form like <lilypond : { c d e f g } /> is used).

For the <lilypondfile> and <musicxmlfile> tags, the attributes are handled with support for the special LilyPond attributes (with or without value). The filename is in the Text contents.

Some examples

Short LilyPond notation, where the LilyPond node is a child of the lilypond SingleTag node:

>>> from quickly.lang.html import Html
>>> from parce.transform import transform_text
>>> d = transform_text(Html.root, '<html><h1>Title</h1><p>Some music...</p><lilypond staffsize=2: { c d e f g } /></html>')
>>> d.dump()
<htm.Document (1 child)>
 ╰╴<htm.Element (5 children)>
    ├╴<htm.OpenTag (1 child) [0:6]>
    │  ╰╴<htm.TagName 'html' [1:5]>
    ├╴<htm.Element (3 children)>
    │  ├╴<htm.OpenTag (1 child) [6:10]>
    │  │  ╰╴<htm.TagName 'h1' [7:9]>
    │  ├╴<htm.Text 'Title' [10:15]>
    │  ╰╴<htm.CloseTag (1 child) [15:20]>
    │     ╰╴<htm.TagName 'h1' [17:19]>
    ├╴<htm.Element (3 children)>
    │  ├╴<htm.OpenTag (1 child) [20:23]>
    │  │  ╰╴<htm.TagName 'p' [21:22]>
    │  ├╴<htm.Text 'Some music...' [23:36]>
    │  ╰╴<htm.CloseTag (1 child) [36:40]>
    │     ╰╴<htm.TagName 'p' [38:39]>
    ├╴<htm.Element (1 child)>
    │  ╰╴<htm.SingleTag (4 children) [40:79]>
    │     ├╴<htm.TagName 'lilypond' [41:49]>
    │     ├╴<htm.Attribute (3 children)>
    │     │  ├╴<htm.AttrName 'staffsize' [50:59]>
    │     │  ├╴<htm.EqualSign [59:60]>
    │     │  ╰╴<htm.Number 2 [60:61]>
    │     ├╴<htm.Colon [61:62]>
    │     ╰╴<lily.Document (1 child)>
    │        ╰╴<lily.MusicList (5 children) [63:76]>
    │           ├╴<lily.Note 'c' [65:66]>
    │           ├╴<lily.Note 'd' [67:68]>
    │           ├╴<lily.Note 'e' [69:70]>
    │           ├╴<lily.Note 'f' [71:72]>
    │           ╰╴<lily.Note 'g' [73:74]>
    ╰╴<htm.CloseTag (1 child) [79:86]>
       ╰╴<htm.TagName 'html' [81:85]>

LilyPond tag notation, where the LilyPond node is a child of the lilypond Element:

>>> d = transform_text(Html.root, '<html><h1>Title</h1><p>Some music...</p><lilypond staffsize=2> { c d e f g } </lilypond></html>')
>>> d.dump()
<htm.Document (1 child)>
 ╰╴<htm.Element (5 children)>
    ├╴<htm.OpenTag (1 child) [0:6]>
    │  ╰╴<htm.TagName 'html' [1:5]>
    ├╴<htm.Element (3 children)>
    │  ├╴<htm.OpenTag (1 child) [6:10]>
    │  │  ╰╴<htm.TagName 'h1' [7:9]>
    │  ├╴<htm.Text 'Title' [10:15]>
    │  ╰╴<htm.CloseTag (1 child) [15:20]>
    │     ╰╴<htm.TagName 'h1' [17:19]>
    ├╴<htm.Element (3 children)>
    │  ├╴<htm.OpenTag (1 child) [20:23]>
    │  │  ╰╴<htm.TagName 'p' [21:22]>
    │  ├╴<htm.Text 'Some music...' [23:36]>
    │  ╰╴<htm.CloseTag (1 child) [36:40]>
    │     ╰╴<htm.TagName 'p' [38:39]>
    ├╴<htm.Element (3 children)>
    │  ├╴<htm.OpenTag (2 children) [40:62]>
    │  │  ├╴<htm.TagName 'lilypond' [41:49]>
    │  │  ╰╴<htm.Attribute (3 children)>
    │  │     ├╴<htm.AttrName 'staffsize' [50:59]>
    │  │     ├╴<htm.EqualSign [59:60]>
    │  │     ╰╴<htm.Number 2 [60:61]>
    │  ├╴<lily.Document (1 child)>
    │  │  ╰╴<lily.MusicList (5 children) [63:76]>
    │  │     ├╴<lily.Note 'c' [65:66]>
    │  │     ├╴<lily.Note 'd' [67:68]>
    │  │     ├╴<lily.Note 'e' [69:70]>
    │  │     ├╴<lily.Note 'f' [71:72]>
    │  │     ╰╴<lily.Note 'g' [73:74]>
    │  ╰╴<htm.CloseTag (1 child) [77:88]>
    │     ╰╴<htm.TagName 'lilypond' [79:87]>
    ╰╴<htm.CloseTag (1 child) [88:95]>
       ╰╴<htm.TagName 'html' [90:94]>
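
The bracketed ranges in the dumps above read as Python-style slice offsets into the source text. The following sketch checks that reading against the tag-notation example using plain string slicing only; the `slices` helper is hypothetical and not part of quickly.

```python
text = ('<html><h1>Title</h1><p>Some music...</p>'
        '<lilypond staffsize=2> { c d e f g } </lilypond></html>')

# (start, end) ranges copied from the dump of the tag-notation example
ranges = {
    'OpenTag': (40, 62),
    'TagName': (41, 49),
    'AttrName': (50, 59),
    'MusicList': (63, 76),
    'CloseTag': (77, 88),
}


def slices(text, ranges):
    """Return the piece of source text that each (start, end) range covers."""
    return {name: text[start:end] for name, (start, end) in ranges.items()}


for name, piece in slices(text, ranges).items():
    print(name, repr(piece))
```

Each range covers the whole node including its delimiters, so the OpenTag range spans from the `<` to the `>`.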

class Element(*children, **attrs)[source]

Bases: Element

An Xml or Html element.

Has an OpenTag child, then contents (Text or Element), and then a CloseTag child. Or alternatively, has only a SingleTag child.

to_plaintext(entity_resolver=None)[source]

Return all text contents of all children as a concatenated string.

The entity resolver, if given, is a callable used to resolve entities; it may return a string or an Element node tree, which is then traversed. By default, html.unescape() is used.
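
As an illustration of the default behaviour, the sketch below resolves entity names with the standard library's html.unescape() and lets a custom mapping take precedence. The `resolve_entity` function and its `custom` parameter are hypothetical, and it is an assumption that a resolver receives the head value of an EntityRef (the part between the & and the ;); the exact calling convention is not specified here.

```python
import html


def resolve_entity(name, custom=None):
    """Resolve an entity given the text between the & and the ;.

    `custom` is an optional dict of project-specific entities (hypothetical);
    anything not found there falls back to html.unescape().
    """
    if custom and name in custom:
        return custom[name]
    return html.unescape('&' + name + ';')


print(resolve_entity('euml'))    # named entity
print(resolve_entity('#123'))    # decimal character reference
print(resolve_entity('#xed'))    # hexadecimal character reference
print(resolve_entity('copy', {'copy': '(c)'}))   # custom mapping wins
```

An entity name that html.unescape() does not know is returned unchanged, including the & and the ;.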

class Document(*children, **attrs)[source]

Bases: Document, Element

An Html document. Normally it has one Element child, but it could contain more elements or text.

space_between = ''

whitespace between children

class Text(head, *children, **attrs)[source]

Bases: TextElement

Html/Xml text contents (Text or Whitespace).

write_head()[source]

Return the textual output that represents our head value.

The default implementation just returns the head attribute, assuming it is text.

class Comment(head, *children, **attrs)[source]

Bases: MultilineComment

An Html/Xml comment node.

class EntityRef(head, *children, **attrs)[source]

Bases: TextElement

An entity reference like &euml;, &#123; or &#xed;.

The head value is the part between the & and the ;.

classmethod read_head(origin)[source]

Return the value as computed from the specified origin Tokens.

The default implementation concatenates the text from all tokens.

write_head()[source]

Return the textual output that represents our head value.

The default implementation just returns the head attribute, assuming it is text.

class CData(head, *children, **attrs)[source]

Bases: TextElement

A CDATA section.

The head value is the contents.

classmethod read_head(origin)[source]

Return the value as computed from the specified origin Tokens.

The default implementation concatenates the text from all tokens.

write_head()[source]

Return the textual output that represents our head value.

The default implementation just returns the head attribute, assuming it is text.

class String(*children, **attrs)[source]

Bases: BlockElement

Base class for strings.

indent_children()[source]

Reimplemented to indent children of a BlockElement type by default.

class SqString(*children, **attrs)[source]

Bases: String

A single-quoted string.

Inside are Text or EntityRef elements.

head = "'"
tail = "'"

class DqString(*children, **attrs)[source]

Bases: String

A double-quoted string.

Inside are Text or EntityRef elements.

head = '"'
tail = '"'

class Number(head, *children, **attrs)[source]

Bases: TextElement

An integer or floating-point value.

Only used in the attributes of LilyPond tags.

classmethod read_head(origin)[source]

Return the value as computed from the specified origin Tokens.

The default implementation concatenates the text from all tokens.

write_head()[source]

Return the textual output that represents our head value.

The default implementation just returns the head attribute, assuming it is text.

class Unit(head, *children, **attrs)[source]

Bases: TextElement

A short unit string like "mm", used in lilypond-book options.

class ProcessingInstruction(*children, **attrs)[source]

Bases: BlockElement

A processing instruction.

Inside are Text, String or EntityRef elements.

head = '<?'
tail = '?>'

class Tag(*children, **attrs)[source]

Bases: BlockElement

Base class for tags.

space_between = ' '

whitespace between children

indent_children()[source]

Reimplemented to indent children of a BlockElement type by default.

class OpenTag(*children, **attrs)[source]

Bases: Tag

An Html open tag: < >.

Has a TagName child, then zero or more Attribute children.

head = '<'
tail = '>'

class CloseTag(*children, **attrs)[source]

Bases: Tag

An Html close tag: </ >.

head = '</'
tail = '>'

class SingleTag(*children, **attrs)[source]

Bases: Tag

An Html single (self closing) tag: < />.

head = '<'
tail = '/>'

class TagName(head, *children, **attrs)[source]

Bases: TextElement

The name of a tag, a child of a Tag element.

class Attribute(*children, **attrs)[source]

Bases: Element

An Xml or Html attribute within an OpenTag or SingleTag.

Normally has three children: an AttrName, an EqualSign and a DqString or SqString value (or, in LilyPond tags, a Number). In some cases it has only an AttrName child.

class AttrName(head, *children, **attrs)[source]

Bases: TextElement

The name of the attribute.

class EqualSign(*children, **attrs)[source]

Bases: HeadElement

The = in an attribute definition.

head = '='

class Colon(*children, **attrs)[source]

Bases: HeadElement

The : in a short-form LilyPond html tag.

head = ':'