trac.util.html – HTML transformations
Building HTML programmatically
With the introduction of the Jinja2 template engine in Trac 1.3.x,
the (X)HTML content is produced either using Jinja2 snippet templates
(see jinja2template) or using the builder API defined in this
module. This builder API closely matches the Genshi genshi.builder
API on the surface.
The builder API
The tag builder has some knowledge about generating HTML content,
like knowing which elements are “void” elements, how attributes should
be written when given a boolean value, etc.
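To make the two rules above concrete, here is a minimal standard-library sketch of a renderer that knows about void elements and boolean attribute values. It is not Trac's implementation: `render_element` and `VOID_ELEMENTS` are hypothetical names, the void-element set is only a subset of the HTML5 list, and writing a true boolean as `checked="checked"` is an assumption (HTML5 also allows the bare attribute name).

```python
from html import escape

# Subset of the HTML5 void elements: no content and no end tag.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


def render_element(name, content="", **attrs):
    """Render one element (hypothetical helper, not part of trac.util.html)."""
    parts = [name]
    for key, value in attrs.items():
        key = key.rstrip("_")  # lets callers write class_= for reserved words
        if value is True:
            # Boolean attribute that is set: repeat the name as the value.
            parts.append('%s="%s"' % (key, key))
        elif value is False or value is None:
            # Boolean attribute that is unset: omit it entirely.
            continue
        else:
            parts.append('%s="%s"' % (key, escape(str(value), quote=True)))
    start = "<%s>" % " ".join(parts)
    if name in VOID_ELEMENTS:
        return start  # void element: nothing after the start tag
    return "%s%s</%s>" % (start, escape(content, quote=False), name)
```

For instance, `render_element("input", type="checkbox", checked=False)` leaves the `checked` attribute out instead of writing `checked="False"`.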
- trac.util.html.tag
  An ElementFactory.
- class trac.util.html.ElementFactory
  Bases: trac.util.html.XMLElementFactory
  An element factory can be used to build Fragments and Elements for arbitrary tag names.
- class trac.util.html.Element(tag, *args, **kwargs)
  Bases: trac.util.html.XMLElement
  An element represents an HTML element, with a tag name, attributes and content.
  Some elements and attributes are rendered specially, following the HTML5 specification (or at least moving towards it).
- class trac.util.html.Fragment(*args)
  Bases: object
  A fragment represents a sequence of strings or elements.
Note that the Element relies on the following lower-level API for
generating the HTML attributes.
- trac.util.html.html_attribute(key, val)
  Returns the actual value for the attribute key, for the given value.
  This follows the rules described in the HTML5 spec (Double-quoted attribute value syntax).
  In addition, it treats the 'class' and the 'style' attributes in a special way, as it processes them through classes and styles.
  Return type: a Markup object containing the escaped attribute value, but it can also be None to indicate that the attribute should be omitted from the output
- trac.util.html.classes(*args, **kwargs)
  Helper function for dynamically assembling a list of CSS class names in templates.
  Any positional arguments are added to the list of class names. All positional arguments must be strings:
  >>> classes('foo', 'bar')
  u'foo bar'
  In addition, the names of any supplied keyword arguments are added if they have a truth value:
  >>> classes('foo', bar=True)
  u'foo bar'
  >>> classes('foo', bar=False)
  u'foo'
  If none of the arguments are added to the list, this function returns '':
  >>> classes(bar=False)
  u''
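The rules described for classes are small enough to sketch in a few lines. The function below is an illustrative Python 3 re-implementation under the hypothetical name `join_classes`; it is not Trac's source and it returns a plain `str`, not the `u''` values shown in the doctests.

```python
def join_classes(*args, **kwargs):
    """Sketch of the documented rules for classes(); not Trac's own code."""
    names = list(args)  # positional names are always kept
    # A keyword name is kept only when its value is truthy.
    names.extend(name for name, keep in kwargs.items() if keep)
    return " ".join(names)
```

Keyword arguments make it convenient to toggle a class from a condition, for example `join_classes('ticket', closed=is_closed)` with a hypothetical `is_closed` flag.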
- trac.util.html.styles(*args, **kwargs)
  Helper function for dynamically assembling a list of CSS style names and values in templates.
  Any positional arguments are added to the list of styles. All positional arguments must be strings or dicts:
  >>> styles('foo: bar', 'fu: baz', {'bottom-right': '1em'})
  u'foo: bar; fu: baz; bottom-right: 1em'
  In addition, the names of any supplied keyword arguments are added if they have a string value:
  >>> styles(foo='bar', fu='baz')
  u'foo: bar; fu: baz'
  >>> styles(foo='bar', bar=False)
  u'foo: bar'
  If none of the arguments are added to the list, this function returns '':
  >>> styles(bar=False)
  u''
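The styles rules differ from the class-name rules in two ways: dicts are expanded into declarations, and keyword arguments are kept only when their value is a string. A minimal Python 3 sketch, again under a hypothetical name (`join_styles`) and not taken from Trac's source:

```python
def join_styles(*args, **kwargs):
    """Sketch of the documented rules for styles(); not Trac's own code."""
    declarations = []
    for arg in args:
        if isinstance(arg, dict):
            # A dict contributes one "name: value" declaration per item.
            declarations.extend("%s: %s" % item for item in arg.items())
        else:
            declarations.append(arg)  # already a "name: value" string
    # Keyword arguments count only when the value is a string.
    declarations.extend(
        "%s: %s" % (name, value)
        for name, value in kwargs.items()
        if isinstance(value, str)
    )
    return "; ".join(declarations)
```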
This HTML-specific behavior can be a hindrance to writing generic XML.
In that case, it is better to use the xml builder.
- trac.util.html.xml
- class trac.util.html.XMLElementFactory
  Bases: object
  An XML element factory can be used to build Fragments and XMLElements for arbitrary tag names.
- class trac.util.html.XMLElement(tag, *args, **kwargs)
  Bases: trac.util.html.Fragment
  An element represents an XML element, with a tag name, attributes and content.
Building HTML from strings
It is also possible to mark an arbitrary string as containing HTML content, so that it will not be HTML-escaped by the template engine.
For this, use the Markup class, taken from the markupsafe package
(itself a dependency of the Jinja2 package).
The Markup class should be imported from the present module:
from trac.util.html import Markup
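The marking mechanism relies on the `__html__` protocol: an object exposing that method is treated as already-safe HTML, while anything else gets escaped. The sketch below imitates the idea with the standard library only. `SafeString` and `render_value` are hypothetical stand-ins for `Markup` and for the template engine's autoescaping, not the real classes.

```python
from html import escape


class SafeString(str):
    """Hypothetical stand-in for Markup: a str that is already HTML."""

    def __html__(self):
        return self


def render_value(value):
    """Escape plain values, pass pre-marked HTML through unchanged."""
    if hasattr(value, "__html__"):
        return value.__html__()
    return escape(str(value), quote=True)
```

The consequence is that wrapping a string this way is a promise that its content is trusted; user-supplied text should never be marked without being sanitized first.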
HTML clean-up and sanitization
- class trac.util.html.TracHTMLSanitizer
(safe_schemes=frozenset(['mailto', 'ftp', 'http', 'file', 'https', None]), safe_css=frozenset(['counter-reset', 'counter-increment', 'min-height', 'quotes', 'border-top', 'font', 'list-style-image', 'outline-width', 'border-right', 'border-radius', 'border-bottom', 'border-spacing', 'background', 'list-style-type', 'text-align', 'page-break-inside', 'orphans', 'page-break-before', 'border-bottom-right-radius', 'line-height', 'padding-left', 'font-size', 'right', 'word-spacing', 'padding-top', 'outline-style', 'bottom', 'content', 'border-right-style', 'padding-right', 'border-left-style', 'background-color', 'border-bottom-color', 'outline-color', 'unicode-bidi', 'max-width', 'font-family', 'caption-side', 'text-transform', 'border-right-width', 'border-top-style', 'color', 'border-collapse', 'border-bottom-width', 'float', 'height', 'max-height', 'margin-right', 'border-top-width', 'border-bottom-left-radius', 'top', 'border-width', 'min-width', 'width', 'font-variant', 'border-top-color', 'background-position', 'empty-cells', 'direction', 'border-left', 'visibility', 'padding', 'border-style', 'background-attachment', 'overflow', 'border-bottom-style', 'cursor', 'margin', 'display', 'border-left-width', 'letter-spacing', 'border-top-left-radius', 'vertical-align', 'clip', 'border-color', 'list-style', 'padding-bottom', 'margin-left', 'widows', 'border', 'font-style', 'border-left-color', 'background-repeat', 'table-layout', 'margin-bottom', 'border-top-right-radius', 'font-weight', 'opacity', 'border-right-color', 'page-break-after', 'white-space', 'text-indent', 'background-image', 'outline', 'clear', 'z-index', 'text-decoration', 'margin-top', 'position', 'left', 'list-style-position']), safe_tags=frozenset(['em', 'pre', 'code', 'p', 'h2', 'h3', 'h1', 'h6', 'h4', 'h5', 'table', 'font', 'u', 'select', 'kbd', 'strong', 'span', 'sub', 'img', 'area', 'menu', 'tt', 'tr', 'tbody', 'label', 'hr', 'dfn', 'tfoot', 'th', 'sup', 'strike', 'input', 'td', 'samp', 'cite', 
'thead', 'map', 'dl', 'blockquote', 'fieldset', 'option', 'form', 'acronym', 'big', 'dd', 'var', 'ol', 'abbr', 'br', 'address', 'optgroup', 'li', 'dt', 'ins', 'legend', 'a', 'b', 'center', 'textarea', 'colgroup', 'i', 'button', 'q', 'caption', 's', 'del', 'small', 'div', 'col', 'dir', 'ul']), safe_attrs=frozenset(['rev', 'prompt', 'color', 'colspan', 'accesskey', 'usemap', 'cols', 'accept', 'datetime', 'char', 'accept-charset', 'shape', 'href', 'hreflang', 'selected', 'frame', 'type', 'alt', 'nowrap', 'border', 'id', 'axis', 'compact', 'style', 'rows', 'checked', 'for', 'start', 'hspace', 'charset', 'ismap', 'label', 'target', 'bgcolor', 'readonly', 'rel', 'valign', 'scope', 'size', 'cellspacing', 'cite', 'media', 'multiple', 'src', 'rules', 'nohref', 'action', 'rowspan', 'abbr', 'span', 'method', 'height', 'class', 'enctype', 'lang', 'disabled', 'name', 'charoff', 'clear', 'summary', 'value', 'longdesc', 'headers', 'vspace', 'noshade', 'coords', 'width', 'maxlength', 'cellpadding', 'title', 'align', 'dir', 'tabindex']), uri_attrs=frozenset(['src', 'lowsrc', 'href', 'dynsrc', 'background', 'action']), safe_origins=frozenset(['data:']))¶ Bases:
object
Sanitize HTML constructions which are potential vectors of phishing or XSS attacks, in user-supplied HTML.
The usual way to use the sanitizer is to call the sanitize method on some potentially unsafe HTML content.
Note that for backward compatibility, the TracHTMLSanitizer still behaves as a Genshi filter.
See also genshi.HTMLSanitizer from which the TracHTMLSanitizer has evolved.
Note: safe_schemes and safe_css have to remain the first parameters, for backward compatibility.
- is_safe_css(prop, value)
  Determine whether the given CSS property declaration is to be considered safe for inclusion in the output.
- is_safe_elem(tag, attrs)
  Determine whether the given element should be considered safe for inclusion in the output.
  Parameters:
  - tag (QName or basestring) – the tag name of the element
  - attrs (Attrs or list) – the element attributes
  Returns: whether the element should be considered safe
Return type:
- is_safe_uri(uri)
  Determine whether the given URI is to be considered safe for inclusion in the output.
  The default implementation checks whether the scheme of the URI is in the set of allowed schemes (safe_schemes).
  >>> sanitizer = TracHTMLSanitizer()
  >>> sanitizer.is_safe_uri('http://example.org/')
  True
  >>> sanitizer.is_safe_uri('javascript:alert(document.cookie)')
  False
  Parameters: uri – the URI to check
  Returns: True if the URI can be considered safe, False otherwise
  Return type: bool
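The scheme check described above can be sketched without Trac. The function below is a simplified illustration in Python 3, not the sanitizer's actual code: `uri_scheme_is_safe` is a hypothetical name, the default set mirrors the `safe_schemes` value from the class signature, and treating `None` as "no scheme, relative URI" is an assumption about how that entry is meant to be read.

```python
SAFE_SCHEMES = frozenset(["file", "ftp", "http", "https", "mailto", None])


def uri_scheme_is_safe(uri, safe_schemes=SAFE_SCHEMES):
    """Return True if the URI scheme is allowed (simplified sketch)."""
    uri = uri.split("#", 1)[0]  # the fragment part cannot carry a scheme
    if ":" not in uri:
        # No scheme at all: a relative URI, allowed if None is in the set.
        return None in safe_schemes
    scheme = uri.split(":", 1)[0]
    # Drop characters such as tabs that browsers may ignore inside a scheme.
    scheme = "".join(ch for ch in scheme if ch.isalnum()).lower()
    return scheme in safe_schemes
```

Checking against an allow-list means that unknown schemes are rejected by default, which is the safer failure mode for user-supplied links.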
- sanitize(html)
  Transforms the incoming HTML by removing anything that is deemed unsafe.
  Parameters: html – the input HTML
  Type: basestring
  Returns: the sanitized content
  Return type: Markup
- sanitize_attrs(tag, attrs)
  Remove potentially dangerous attributes and sanitize the style attribute.
  Parameters: tag – the tag name of the element
  Returns: a dict containing only safe or sanitized attributes
  Return type: dict
- sanitize_css(text)
  Remove potentially dangerous property declarations from CSS code.
  In particular, properties using the CSS url() function with a scheme that is not considered safe are removed:
  >>> sanitizer = TracHTMLSanitizer()
  >>> sanitizer.sanitize_css(u'''
  ...   background: url(javascript:alert("foo"));
  ...   color: #000;
  ... ''')
  [u'color: #000']
  Also, the proprietary Internet Explorer function expression() is always stripped:
  >>> sanitizer.sanitize_css(u'''
  ...   background: #fff;
  ...   color: #000;
  ...   width: e/**/xpression(alert("F"));
  ... ''')
  [u'background: #fff', u'color: #000', u'width: e xpression(alert("F"))']
  Parameters: text – the CSS text; this is expected to be unicode and to not contain any character or numeric references
  Returns: a list of declarations that are considered safe
  Return type: list
- class trac.util.html.Deuglifier
  Bases: object
  Helper base class used for cleaning up HTML riddled with <FONT COLOR=...> tags and replacing them with appropriate <span class="...">.
  The subclass must define a rules() static method returning a list of regular expression fragments, each defining a capture group whose name will be reused for the span’s class. Two special group names, font and endfont, are used to emit <span> and </span>, respectively.
- trac.util.html.escape(str, quotes=True)
  Create a Markup instance from a string and escape special characters it may contain (<, >, & and ").
  Parameters:
  - str – the string to escape; if not a string, it is assumed that the input can be converted to a string
  - quotes – if True, double quote characters are escaped in addition to the other special characters
  >>> escape('"1 < 2"')
  Markup(u'&#34;1 &lt; 2&#34;')
  >>> escape(['"1 < 2"'])
  Markup(u"['&#34;1 &lt; 2&#34;']")
  If the quotes parameter is set to False, the " character is left as is. Escaping quotes is generally only required for strings that are to be used in attribute values.
  >>> escape('"1 < 2"', quotes=False)
  Markup(u'"1 &lt; 2"')
  >>> escape(['"1 < 2"'], quotes=False)
  Markup(u'[\'"1 &lt; 2"\']')
  However, escape behaves slightly differently with Markup and Fragment instances, as they are passed through unmodified.
  >>> escape(Markup('"1 < 2 &#39;&#34;'))
  Markup(u'"1 < 2 &#39;&#34;')
  >>> escape(Markup('"1 < 2 &#39;&#34;'), quotes=False)
  Markup(u'"1 < 2 &#39;&#34;')
  >>> escape(tag.b('"1 < 2"'))
  Markup(u'<b>"1 &lt; 2"</b>')
  >>> escape(tag.b('"1 < 2"'), quotes=False)
  Markup(u'<b>"1 &lt; 2"</b>')
  Returns: the escaped Markup string
  Return type: Markup
- trac.util.html.unescape(text)
  Reverse-escapes &, <, >, and " and returns a unicode object.
  >>> unescape(Markup('1 &lt; 2'))
  u'1 < 2'
  If the provided text object is not a Markup instance, it is returned unchanged.
  >>> unescape('1 &lt; 2')
  '1 &lt; 2'
  Parameters: text – the text to unescape
  Returns: the unescaped string
  Return type: unicode
- trac.util.html.stripentities(text, keepxmlentities=False)
  Return a copy of the given text with any character or numeric entities replaced by the equivalent Unicode characters.
  >>> stripentities('1 &lt; 2')
  Markup(u'1 < 2')
  >>> stripentities('more &hellip;')
  Markup(u'more \u2026')
  >>> stripentities('&#8230;')
  Markup(u'\u2026')
  >>> stripentities('&#x2026;')
  Markup(u'\u2026')
  >>> stripentities(Markup(u'\u2026'))
  Markup(u'\u2026')
  If the keepxmlentities parameter is provided and is a truth value, the core XML entities (&amp;, &apos;, &gt;, &lt; and &quot;) are left intact.
  >>> stripentities('1 &lt; 2 &hellip;', keepxmlentities=True)
  Markup(u'1 &lt; 2 \u2026')
  Returns: a Markup instance with entities removed
  Return type: Markup
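Entity replacement of this kind can be approximated with a single regular expression and the standard `html.entities.name2codepoint` table. The Python 3 sketch below is illustrative only: `strip_entities` and its helper names are hypothetical, it returns a plain `str` instead of a `Markup` instance, and it leaves unknown named entities untouched, which may differ from Trac's behavior.

```python
import re
from html.entities import name2codepoint

# The five core XML entities that keep_xml_entities preserves.
_XML_ENTITIES = {"amp", "apos", "gt", "lt", "quot"}
# Group 1: numeric reference (decimal or x-prefixed hex); group 2: entity name.
_ENTITY_RE = re.compile(r"&(?:#((?:\d+)|(?:[xX][0-9a-fA-F]+));?|(\w+);)")


def strip_entities(text, keep_xml_entities=False):
    """Replace character and numeric entities by the characters they denote."""

    def replace(match):
        ref, name = match.group(1), match.group(2)
        if ref:  # numeric reference such as &#8230; or &#x2026;
            code = int(ref[1:], 16) if ref[0] in "xX" else int(ref)
            return chr(code)
        if keep_xml_entities and name in _XML_ENTITIES:
            return match.group(0)  # leave the core XML entities as written
        if name in name2codepoint:
            return chr(name2codepoint[name])
        return match.group(0)  # unknown name: keep the original text

    return _ENTITY_RE.sub(replace, text)
```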
- trac.util.html.striptags(text)
  Return a copy of the text with any XML/HTML tags removed.
  >>> striptags('<span>Foo</span> bar')
  Markup(u'Foo bar')
  >>> striptags('<span class="bar">Foo</span>')
  Markup(u'Foo')
  >>> striptags('Foo<br />')
  Markup(u'Foo')
  HTML/XML comments are stripped, too:
  >>> striptags('<!-- <blub>hehe</blah> -->test')
  Markup(u'test')
  Parameters: text – the string to remove tags from
  Returns: a Markup instance with all tags removed
  Return type: Markup
- trac.util.html.plaintext(text, keeplinebreaks=True)
  Extract the text elements from (X)HTML content
Parameters:
- class trac.util.html.FormTokenInjector(form_token, out)
  Bases: trac.util.html.HTMLTransform
  Identify and protect forms from CSRF attacks.
  This filter works by adding an input type=hidden field to POST forms.
- class trac.util.html.HTMLTransform(out)
  Bases: HTMLParser.HTMLParser
  Convenience base class for writing HTMLParsers.
  The default implementations of the HTMLParser handle_* methods do nothing, while in our case we try to rewrite the incoming document unmodified.
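The two classes above combine a pass-through parser with one targeted rewrite. The following Python 3 sketch shows that pattern with the standard `html.parser.HTMLParser`. It is not Trac's code: `PassThrough` and `TokenInjector` are hypothetical names, the re-serialization is simplified (end tags are rewritten in lowercase, processing instructions are dropped), and the hidden field name `__FORM_TOKEN` is an assumption.

```python
from html import escape
from html.parser import HTMLParser
from io import StringIO


class PassThrough(HTMLParser):
    """Write every parsed event back to `out`, leaving the input as is."""

    def __init__(self, out):
        # Keep entity and character references as separate events.
        super().__init__(convert_charrefs=False)
        self.out = out

    def handle_starttag(self, tag, attrs):
        self.out.write(self.get_starttag_text())  # original tag text

    def handle_startendtag(self, tag, attrs):
        self.out.write(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.write("</%s>" % tag)

    def handle_data(self, data):
        self.out.write(data)

    def handle_entityref(self, name):
        self.out.write("&%s;" % name)

    def handle_charref(self, name):
        self.out.write("&#%s;" % name)

    def handle_comment(self, data):
        self.out.write("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.write("<!%s>" % decl)


class TokenInjector(PassThrough):
    """Add a hidden token field right after each POST form start tag."""

    def __init__(self, token, out):
        super().__init__(out)
        self.token = token

    def handle_starttag(self, tag, attrs):
        super().handle_starttag(tag, attrs)
        method = dict(attrs).get("method") or ""
        if tag == "form" and method.lower() == "post":
            self.out.write(
                '<input type="hidden" name="__FORM_TOKEN" value="%s">'
                % escape(self.token, quote=True)
            )


out = StringIO()
injector = TokenInjector("abc123", out)
injector.feed('<form method="POST" action="/x"><p>1 &lt; 2</p></form>')
injector.close()
result = out.getvalue()
```

Only the start-tag handler is specialized; every other event keeps the pass-through behavior, so forms using GET and all surrounding markup are written back unchanged.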
- class trac.util.html.HTMLSanitization(sanitizer, out)
  Bases: trac.util.html.HTMLTransform
  Sanitize parsed HTML using TracHTMLSanitizer.
Misc. HTML processing
- trac.util.html.find_element(frag, attr=None, cls=None, tag=None)
  Return the first element in the fragment having the given attribute, class or tag, using a preorder depth-first search.
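A preorder depth-first search tests a node before its children and visits the children in document order, so the first match is the one that appears earliest in the markup. The sketch below shows the traversal on a tiny stand-in tree. `Node` and `find_first` are hypothetical and do not use Trac's Fragment or Element classes, and matching on any one of the three criteria is an assumption drawn from the description above.

```python
class Node:
    """Minimal stand-in for an element: a tag, attributes and children."""

    def __init__(self, tag, children=(), **attrs):
        self.tag = tag
        self.attrs = attrs
        self.children = list(children)  # child Nodes or plain strings


def find_first(node, attr=None, cls=None, tag=None):
    """Preorder depth-first search: test the node, then its children in order."""
    if not isinstance(node, Node):
        return None  # plain text never matches
    if tag is not None and node.tag == tag:
        return node
    if attr is not None and attr in node.attrs:
        return node
    if cls is not None and cls in node.attrs.get("class", "").split():
        return node
    for child in node.children:
        found = find_first(child, attr, cls, tag)
        if found is not None:
            return found
    return None


tree = Node("div", [
    Node("p", ["text", Node("a", href="#x")]),
    Node("a", href="#y", **{"class": "ext link"}),
])
```

Here a search for `tag="a"` yields the link nested inside the paragraph, because the paragraph subtree is explored fully before the second child of the `div`.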
- trac.util.html.valid_html_bytes(bytes)
- trac.util.html.to_fragment(input)
  Convert input to a Fragment object.
Kept for backward compatibility purposes:
- trac.util.html.expand_markup(stream, ctxt=None)
  A Genshi stream filter for expanding genshi.Markup events.
  Note: Expansion may not be possible if the fragment is badly formed, or partial.