trac.util.html – HTML transformations

Building HTML programmatically

With the introduction of the Jinja2 template engine in Trac 1.3.x, the (X)HTML content is produced either using Jinja2 snippet templates (see jinja2template) or using the builder API defined in this module. This builder API closely matches the Genshi genshi.builder API on the surface.

The builder API

The tag builder has some knowledge about generating HTML content, like knowing which elements are “void” elements, how attributes should be written when given a boolean value, etc.
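To make the two rules above concrete, here is a minimal standalone sketch of how a builder might treat void elements and boolean attribute values. It is not Trac's implementation: `render_element`, `VOID_ELEMENTS` and `BOOLEAN_ATTRS` are hypothetical names, both sets are deliberately partial, and the exact output syntax chosen here is an assumption.

```python
from html import escape

# Elements that never have content or a closing tag in HTML5 (partial list).
VOID_ELEMENTS = {'area', 'br', 'col', 'hr', 'img', 'input', 'link', 'meta'}
# Attributes whose presence alone carries meaning (partial list).
BOOLEAN_ATTRS = {'checked', 'disabled', 'readonly', 'selected'}


def render_element(tag, content='', **attrs):
    """Render one element; a toy stand-in, not Trac's implementation."""
    parts = [tag]
    for key, val in attrs.items():
        if key in BOOLEAN_ATTRS:
            # A true value repeats the name as the value,
            # a false value omits the attribute entirely.
            if val:
                parts.append('%s="%s"' % (key, key))
        elif val is not None:
            parts.append('%s="%s"' % (key, escape(str(val), quote=True)))
    start = '<%s>' % ' '.join(parts)
    if tag in VOID_ELEMENTS:
        return start  # no content and no closing tag
    return '%s%s</%s>' % (start, escape(content, quote=False), tag)
```

The real tag builder additionally supports nested children and arbitrary tag names through attribute access on the factory.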

trac.util.html.tag

An ElementFactory.

class trac.util.html.ElementFactory

Bases: trac.util.html.XMLElementFactory

An element factory can be used to build Fragments and Elements for arbitrary tag names.

class trac.util.html.Element(tag, *args, **kwargs)

Bases: trac.util.html.XMLElement

An element represents an HTML element, with a tag name, attributes and content.

Some elements and attributes are rendered specially, following the HTML5 specification (or at least moving toward it).

class trac.util.html.Fragment(*args)

Bases: object

A fragment represents a sequence of strings or elements.

Note that the Element relies on the following lower-level API for generating the HTML attributes.

trac.util.html.html_attribute(key, val)

Returns the actual value for the attribute key, for the given value.

This follows the rules described in the HTML5 spec (Double-quoted attribute value syntax).

In addition, it treats the 'class' and the 'style' attributes in a special way, as it processes them through classes and styles.

Return type: a Markup object containing the escaped attribute value, or None to indicate that the attribute should be omitted from the output
trac.util.html.classes(*args, **kwargs)

Helper function for dynamically assembling a list of CSS class names in templates.

Any positional arguments are added to the list of class names. All positional arguments must be strings:

>>> classes('foo', 'bar')
u'foo bar'

In addition, the names of any supplied keyword arguments are added if they have a truth value:

>>> classes('foo', bar=True)
u'foo bar'
>>> classes('foo', bar=False)
u'foo'

If none of the arguments are added to the list, this function returns '':

>>> classes(bar=False)
u''
trac.util.html.styles(*args, **kwargs)

Helper function for dynamically assembling a list of CSS style name and values in templates.

Any positional arguments are added to the list of styles. All positional arguments must be strings or dicts:

>>> styles('foo: bar', 'fu: baz', {'bottom-right': '1em'})
u'foo: bar; fu: baz; bottom-right: 1em'

In addition, the names of any supplied keyword arguments are added if they have a string value:

>>> styles(foo='bar', fu='baz')
u'foo: bar; fu: baz'
>>> styles(foo='bar', bar=False)
u'foo: bar'

If none of the arguments are added to the list, this function returns '':

>>> styles(bar=False)
u''
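The behavior documented for classes and styles can be approximated with the following standalone sketch. The names `join_classes` and `join_styles` are hypothetical, and the code only mirrors the doctests above; it is not taken from Trac's source.

```python
def join_classes(*args, **kwargs):
    """Join positional names plus the keyword names whose value is true."""
    names = [arg for arg in args if arg]
    names.extend(name for name, enabled in kwargs.items() if enabled)
    return ' '.join(names)


def join_styles(*args, **kwargs):
    """Join style declarations given as strings, dicts or keywords."""
    declarations = []
    properties = {}
    for arg in args:
        if isinstance(arg, dict):
            properties.update(arg)      # dict entries become properties
        elif arg:
            declarations.append(arg)    # strings are kept verbatim
    properties.update(kwargs)
    # Only properties with a non-empty string value are emitted.
    declarations.extend('%s: %s' % (key, val)
                        for key, val in properties.items()
                        if isinstance(val, str) and val)
    return '; '.join(declarations)
```

Keyword arguments are convenient in templates because a condition can be passed directly as the value, for example `join_classes('ticket', closed=is_closed)` with a hypothetical `is_closed` flag.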

This HTML-specific behavior can be a hindrance when writing generic XML. In that case, use the xml builder instead.

trac.util.html.xml

An XMLElementFactory.

class trac.util.html.XMLElementFactory

Bases: object

An XML element factory can be used to build Fragments and XMLElements for arbitrary tag names.

class trac.util.html.XMLElement(tag, *args, **kwargs)

Bases: trac.util.html.Fragment

An element represents an XML element, with a tag name, attributes and content.

Building HTML from strings

It is also possible to mark an arbitrary string as containing HTML content, so that it will not be HTML-escaped by the template engine.

For this, use the Markup class, taken from the markupsafe package (itself a dependency of the Jinja2 package).

The Markup class should be imported from the present module:

from trac.util.html import Markup
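The idea of marking a string as already-safe HTML can be illustrated without Trac or markupsafe installed. The sketch below uses a hypothetical `SafeText` class and `escape_unless_safe` function; it relies only on the `__html__` convention that markupsafe also follows, and is not the Markup implementation itself.

```python
from html import escape as stdlib_escape


class SafeText(str):
    """Hypothetical stand-in for Markup: a str flagged as safe HTML."""

    def __html__(self):
        return self


def escape_unless_safe(value):
    """Escape plain values, pass through anything exposing __html__."""
    if hasattr(value, '__html__'):
        return SafeText(value.__html__())
    return SafeText(stdlib_escape(str(value), quote=False))
```

A template engine applying such a function to every substituted value would escape ordinary strings but leave content wrapped in the marker class untouched, which is why only trusted or sanitized HTML should ever be wrapped.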

HTML clean-up and sanitization

class trac.util.html.TracHTMLSanitizer(safe_schemes=frozenset(['mailto', 'ftp', 'http', 'file', 'https', None]), safe_css=frozenset(['counter-reset', 'counter-increment', 'min-height', 'quotes', 'border-top', 'font', 'list-style-image', 'outline-width', 'border-right', 'border-radius', 'border-bottom', 'border-spacing', 'background', 'list-style-type', 'text-align', 'page-break-inside', 'orphans', 'page-break-before', 'border-bottom-right-radius', 'line-height', 'padding-left', 'font-size', 'right', 'word-spacing', 'padding-top', 'outline-style', 'bottom', 'content', 'border-right-style', 'padding-right', 'border-left-style', 'background-color', 'border-bottom-color', 'outline-color', 'unicode-bidi', 'max-width', 'font-family', 'caption-side', 'text-transform', 'border-right-width', 'border-top-style', 'color', 'border-collapse', 'border-bottom-width', 'float', 'height', 'max-height', 'margin-right', 'border-top-width', 'border-bottom-left-radius', 'top', 'border-width', 'min-width', 'width', 'font-variant', 'border-top-color', 'background-position', 'empty-cells', 'direction', 'border-left', 'visibility', 'padding', 'border-style', 'background-attachment', 'overflow', 'border-bottom-style', 'cursor', 'margin', 'display', 'border-left-width', 'letter-spacing', 'border-top-left-radius', 'vertical-align', 'clip', 'border-color', 'list-style', 'padding-bottom', 'margin-left', 'widows', 'border', 'font-style', 'border-left-color', 'background-repeat', 'table-layout', 'margin-bottom', 'border-top-right-radius', 'font-weight', 'opacity', 'border-right-color', 'page-break-after', 'white-space', 'text-indent', 'background-image', 'outline', 'clear', 'z-index', 'text-decoration', 'margin-top', 'position', 'left', 'list-style-position']), safe_tags=frozenset(['em', 'pre', 'code', 'p', 'h2', 'h3', 'h1', 'h6', 'h4', 'h5', 'table', 'font', 'u', 'select', 'kbd', 'strong', 'span', 'sub', 'img', 'area', 'menu', 'tt', 'tr', 'tbody', 'label', 'hr', 'dfn', 'tfoot', 'th', 'sup', 
'strike', 'input', 'td', 'samp', 'cite', 'thead', 'map', 'dl', 'blockquote', 'fieldset', 'option', 'form', 'acronym', 'big', 'dd', 'var', 'ol', 'abbr', 'br', 'address', 'optgroup', 'li', 'dt', 'ins', 'legend', 'a', 'b', 'center', 'textarea', 'colgroup', 'i', 'button', 'q', 'caption', 's', 'del', 'small', 'div', 'col', 'dir', 'ul']), safe_attrs=frozenset(['rev', 'prompt', 'color', 'colspan', 'accesskey', 'usemap', 'cols', 'accept', 'datetime', 'char', 'accept-charset', 'shape', 'href', 'hreflang', 'selected', 'frame', 'type', 'alt', 'nowrap', 'border', 'id', 'axis', 'compact', 'style', 'rows', 'checked', 'for', 'start', 'hspace', 'charset', 'ismap', 'label', 'target', 'bgcolor', 'readonly', 'rel', 'valign', 'scope', 'size', 'cellspacing', 'cite', 'media', 'multiple', 'src', 'rules', 'nohref', 'action', 'rowspan', 'abbr', 'span', 'method', 'height', 'class', 'enctype', 'lang', 'disabled', 'name', 'charoff', 'clear', 'summary', 'value', 'longdesc', 'headers', 'vspace', 'noshade', 'coords', 'width', 'maxlength', 'cellpadding', 'title', 'align', 'dir', 'tabindex']), uri_attrs=frozenset(['src', 'lowsrc', 'href', 'dynsrc', 'background', 'action']), safe_origins=frozenset(['data:']))

Bases: object

Sanitize HTML constructs in user-supplied HTML that are potential vectors for phishing or XSS attacks.

The usual way to use the sanitizer is to call the sanitize method on some potentially unsafe HTML content.

Note that for backward compatibility, the TracHTMLSanitizer still behaves as a Genshi filter.

See also genshi.HTMLSanitizer from which the TracHTMLSanitizer has evolved.

Note: safe_schemes and safe_css have to remain the first parameters, for backward compatibility.

is_safe_css(prop, value)

Determine whether the given css property declaration is to be considered safe for inclusion in the output.

is_safe_elem(tag, attrs)

Determine whether the given element should be considered safe for inclusion in the output.

Parameters:
  • tag (QName or basestring) – the tag name of the element
  • attrs (Attrs or list) – the element attributes
Returns:

whether the element should be considered safe

Return type:

bool

is_safe_uri(uri)

Determine whether the given URI is to be considered safe for inclusion in the output.

The default implementation checks whether the scheme of the URI is in the set of allowed URIs (safe_schemes).

>>> sanitizer = TracHTMLSanitizer()
>>> sanitizer.is_safe_uri('http://example.org/')
True
>>> sanitizer.is_safe_uri('javascript:alert(document.cookie)')
False
Parameters: uri – the URI to check
Returns: True if the URI can be considered safe, False otherwise
Return type: bool
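A scheme check of this kind can be sketched as a standalone function. `scheme_is_safe` and `SAFE_SCHEMES` are hypothetical names, and treating the None entry of safe_schemes as "no scheme at all" (a relative URI) is an assumption made for this sketch, not something stated above.

```python
# Mirrors the default safe_schemes shown in the class signature.
SAFE_SCHEMES = frozenset(['file', 'ftp', 'http', 'https', 'mailto', None])


def scheme_is_safe(uri, safe_schemes=SAFE_SCHEMES):
    """Return True if the scheme of `uri` is in `safe_schemes`."""
    uri = uri.split('#', 1)[0]          # ignore the fragment part
    if ':' not in uri:
        # Assumption: None stands for "no scheme" (relative URI).
        return None in safe_schemes
    scheme = uri.split(':', 1)[0].strip().lower()
    return scheme in safe_schemes
```

Because this is an allowlist, any scheme that is not explicitly listed, such as `javascript` or `vbscript`, is rejected without having to be named.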
sanitize(html)

Transforms the incoming HTML by removing anything that is deemed unsafe.

Parameters: html – the input HTML
Type: basestring
Returns: the sanitized content
Return type: Markup
sanitize_attrs(tag, attrs)

Remove potentially dangerous attributes and sanitize the style attribute.

Parameters: tag – the tag name of the element
Returns: a dict containing only safe or sanitized attributes
Return type: dict
sanitize_css(text)

Remove potentially dangerous property declarations from CSS code.

In particular, properties using the CSS url() function with a scheme that is not considered safe are removed:

>>> sanitizer = TracHTMLSanitizer()
>>> sanitizer.sanitize_css(u'''
...   background: url(javascript:alert("foo"));
...   color: #000;
... ''')
[u'color: #000']

Also, the proprietary Internet Explorer function expression() is always stripped:

>>> sanitizer.sanitize_css(u'''
...   background: #fff;
...   color: #000;
...   width: e/**/xpression(alert("F"));
... ''')
[u'background: #fff', u'color: #000', u'width: e xpression(alert("F"))']
Parameters: text – the CSS text; this is expected to be unicode and to not contain any character or numeric references
Returns: a list of declarations that are considered safe
Return type: list
class trac.util.html.Deuglifier

Bases: object

Helper base class used for cleaning up HTML riddled with <FONT COLOR=...> tags, replacing them with appropriate <span class="...">.

The subclass must define a rules() static method returning a list of regular expression fragments, each defining a named capture group whose name is reused for the span’s class. Two special group names, font and endfont, are used to emit <span> and </span>, respectively.
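The rule mechanism can be illustrated with a standalone sketch that does not subclass Deuglifier. The rule list, the color values and the `deuglify` function are hypothetical; in particular, using the group name verbatim as the class name is an assumption, and the real class may decorate it.

```python
import re

# Each fragment defines one named group; more specific rules come first.
RULES = [
    r'(?P<comment><font color="#008000">)',
    r'(?P<keyword><font color="#0000ff">)',
    r'(?P<font><font.*?>)',        # any other opening font tag
    r'(?P<endfont></font>)',       # closing font tag
]
PATTERN = re.compile('(?:%s)' % '|'.join(RULES), re.IGNORECASE)


def _replace(match):
    """Map the group that matched to the corresponding span tag."""
    for name, value in match.groupdict().items():
        if value:
            if name == 'font':
                return '<span>'
            if name == 'endfont':
                return '</span>'
            return '<span class="%s">' % name
    return match.group(0)


def deuglify(html):
    """Replace font tags with span tags according to RULES."""
    return PATTERN.sub(_replace, html)
```

Since the fragments are joined into a single alternation, their order matters: the catch-all font rule has to come after the color-specific ones.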

trac.util.html.escape(str, quotes=True)

Create a Markup instance from a string and escape special characters it may contain (<, >, & and ").

Parameters:
  • text – the string to escape; if not a string, it is assumed that the input can be converted to a string
  • quotes – if True, double quote characters are escaped in addition to the other special characters
>>> escape('"1 < 2"')
Markup(u'&#34;1 &lt; 2&#34;')
>>> escape(['"1 < 2"'])
Markup(u"['&#34;1 &lt; 2&#34;']")

If the quotes parameter is set to False, the " character is left as is. Escaping quotes is generally only required for strings that are to be used in attribute values.

>>> escape('"1 < 2"', quotes=False)
Markup(u'"1 &lt; 2"')
>>> escape(['"1 < 2"'], quotes=False)
Markup(u'[\'"1 &lt; 2"\']')

However, escape behaves slightly differently with Markup and Fragment instances, as they are passed through unmodified.

>>> escape(Markup('"1 < 2 &#39;"'))
Markup(u'"1 < 2 &#39;"')
>>> escape(Markup('"1 < 2 &#39;"'), quotes=False)
Markup(u'"1 < 2 &#39;"')
>>> escape(tag.b('"1 < 2"'))
Markup(u'<b>"1 &lt; 2"</b>')
>>> escape(tag.b('"1 < 2"'), quotes=False)
Markup(u'<b>"1 &lt; 2"</b>')
Returns: the escaped Markup string
Return type: Markup
trac.util.html.unescape(text)

Reverse-escapes &, <, >, and " and returns a unicode object.

>>> unescape(Markup('1 &lt; 2'))
u'1 < 2'

If the provided text object is not a Markup instance, it is returned unchanged.

>>> unescape('1 &lt; 2')
'1 &lt; 2'
Parameters: text – the text to unescape
Returns: the unescaped string
Return type: unicode
trac.util.html.stripentities(text, keepxmlentities=False)

Return a copy of the given text with any character or numeric entities replaced by the equivalent Unicode characters.

>>> stripentities('1 &lt; 2')
Markup(u'1 < 2')
>>> stripentities('more &hellip;')
Markup(u'more \u2026')
>>> stripentities('&#8230;')
Markup(u'\u2026')
>>> stripentities('&#x2026;')
Markup(u'\u2026')
>>> stripentities(Markup(u'\u2026'))
Markup(u'\u2026')

If the keepxmlentities parameter is provided and is a truth value, the core XML entities (&amp;, &apos;, &gt;, &lt; and &quot;) are left intact.

>>> stripentities('1 &lt; 2 &hellip;', keepxmlentities=True)
Markup(u'1 &lt; 2 \u2026')
Returns: a Markup instance with entities removed
Return type: Markup
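The entity replacement described above can be approximated with the standard library alone. In this sketch `replace_entities` is a hypothetical name, the function returns a plain str rather than a Markup instance, and unknown entity names are simply left in place, which may differ from what stripentities does.

```python
import re
from html.entities import name2codepoint

# Group 1: numeric reference (decimal or hex), group 2: named entity.
ENTITY_RE = re.compile(r'&(?:#(\d+|[xX][0-9a-fA-F]+)|(\w+));')
XML_ENTITIES = ('amp', 'apos', 'gt', 'lt', 'quot')


def replace_entities(text, keepxmlentities=False):
    """Replace character and numeric entities by the characters they name."""
    def _replace(match):
        numeric, name = match.groups()
        if numeric:
            if numeric[0] in 'xX':
                return chr(int(numeric[1:], 16))
            return chr(int(numeric))
        if keepxmlentities and name in XML_ENTITIES:
            return match.group(0)       # core XML entity: keep as written
        if name in name2codepoint:
            return chr(name2codepoint[name])
        return match.group(0)           # unknown name: left as is here
    return ENTITY_RE.sub(_replace, text)
```

Keeping the core XML entities is useful when the result is going to be embedded in markup again, since `&lt;` and `&amp;` would otherwise turn back into active syntax.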
trac.util.html.striptags(text)

Return a copy of the text with any XML/HTML tags removed.

>>> striptags('<span>Foo</span> bar')
Markup(u'Foo bar')
>>> striptags('<span class="bar">Foo</span>')
Markup(u'Foo')
>>> striptags('Foo<br />')
Markup(u'Foo')

HTML/XML comments are stripped, too:

>>> striptags('<!-- <blub>hehe</blah> -->test')
Markup(u'test')
Parameters: text – the string to remove tags from
Returns: a Markup instance with all tags removed
Return type: Markup
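A regular expression is enough to reproduce the doctests above. The following sketch uses the hypothetical names `TAG_OR_COMMENT_RE` and `remove_tags`, returns a plain str instead of Markup, and is not meant as a sanitizer: it only removes markup, it does not judge whether the remaining text is safe.

```python
import re

# Comments are tried first so that tags inside them vanish with the comment.
TAG_OR_COMMENT_RE = re.compile(r'<!--.*?-->|<[^>]*>', re.DOTALL)


def remove_tags(text):
    """Return `text` with tags and comments removed."""
    return TAG_OR_COMMENT_RE.sub('', text)
```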
trac.util.html.plaintext(text, keeplinebreaks=True)

Extract the text elements from (X)HTML content.

Parameters:
  • text – unicode or Fragment
  • keeplinebreaks – optionally keep linebreaks
class trac.util.html.FormTokenInjector(form_token, out)

Bases: trac.util.html.HTMLTransform

Identify and protect forms from CSRF attacks.

This filter works by adding an input type=hidden field to POST forms.
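The technique can be sketched with the standard library parser: a pass-through parser writes every event back to an output stream, and a subclass adds a hidden field right after each POST form start tag. `PassThrough` and `TokenInjector` are hypothetical classes written for Python 3, and the field name `__FORM_TOKEN` is an assumption; this is not Trac's code.

```python
import io
from html import escape
from html.parser import HTMLParser


class PassThrough(HTMLParser):
    """Write the parsed document back to `out` as faithfully as possible."""

    def __init__(self, out):
        super().__init__(convert_charrefs=False)
        self.out = out

    def handle_starttag(self, tag, attrs):
        self.out.write(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.write(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.write('</%s>' % tag)

    def handle_data(self, data):
        self.out.write(data)

    def handle_charref(self, name):
        self.out.write('&#%s;' % name)

    def handle_entityref(self, name):
        self.out.write('&%s;' % name)

    def handle_comment(self, data):
        self.out.write('<!--%s-->' % data)

    def handle_decl(self, decl):
        self.out.write('<!%s>' % decl)


class TokenInjector(PassThrough):
    """Add a hidden token field to every form submitted with POST."""

    def __init__(self, token, out):
        super().__init__(out)
        self.token = token

    def handle_starttag(self, tag, attrs):
        super().handle_starttag(tag, attrs)
        is_post = any(name == 'method' and (value or '').lower() == 'post'
                      for name, value in attrs)
        if tag == 'form' and is_post:
            # The field name is an assumption made for this sketch.
            self.out.write('<input type="hidden" name="__FORM_TOKEN"'
                           ' value="%s"/>' % escape(self.token, quote=True))
```

To use it, create an `io.StringIO()`, pass it as `out`, call `feed()` with the page content followed by `close()`, and read the rewritten page with `getvalue()`. The server then compares the submitted field with the token it issued for the session.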

class trac.util.html.HTMLTransform(out)

Bases: HTMLParser.HTMLParser

Convenience base class for writing HTMLParsers.

The default implementations of the HTMLParser handle_* methods do nothing, whereas this class tries to write the incoming document back out unmodified.

class trac.util.html.HTMLSanitization(sanitizer, out)

Bases: trac.util.html.HTMLTransform

Sanitize parsed HTML using TracHTMLSanitizer.

Misc. HTML processing

trac.util.html.find_element(frag, attr=None, cls=None, tag=None)

Return the first element in the fragment having the given attribute, class or tag, using a preorder depth-first search.
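A preorder depth-first search visits a node before its children, and the children from left to right. The sketch below shows that traversal on a hypothetical `Node` class; `find_first` is likewise a made-up name, and how the real function combines several criteria is not specified above, so here any matching criterion is enough.

```python
class Node:
    """Minimal tree node; children may be Node instances or strings."""

    def __init__(self, tag, attrib=None, children=()):
        self.tag = tag
        self.attrib = dict(attrib or {})
        self.children = list(children)


def find_first(node, attr=None, cls=None, tag=None):
    """Return the first matching Node in preorder, or None."""
    if not isinstance(node, Node):
        return None                     # plain text cannot match
    if attr is not None and attr in node.attrib:
        return node
    if cls is not None and cls in node.attrib.get('class', '').split():
        return node
    if tag is not None and node.tag == tag:
        return node
    for child in node.children:         # left to right, depth first
        found = find_first(child, attr, cls, tag)
        if found is not None:
            return found
    return None
```

With this order, a match nested deep inside the first child is returned before a match that is a later direct child of the root.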

trac.util.html.to_fragment(input)

Convert input to a Fragment object.

trac.util.html.valid_html_bytes(bytes)

Kept for backward compatibility purposes:

trac.util.html.expand_markup(stream, ctxt=None)

A Genshi stream filter for expanding genshi.Markup events.

Note: Expansion may not be possible if the fragment is badly formed, or partial.