trac.util.text
– Text manipulation¶
The Jinja2 template engine¶
As Jinja2 is mainly a text template engine, the low-level helper functions dealing with this package are placed here.
-
trac.util.text.
jinja2env
(**kwargs)¶ Creates a Jinja2
Environment
configured with Trac conventions. All default parameters can optionally be overridden. The
loader
parameter is not set by default, so unless it is set by the caller, only inline templates can be created from the environment.
Return type: jinja2.Environment
The Unicode toolbox¶
Trac internals deal almost exclusively with Unicode text,
represented by unicode
objects. The main advantage of using
unicode
over UTF-8 encoded str
(as was the case before
version 0.10) is that the text transformation functions in the present
module operate safely on individual characters and won’t
risk cutting a multi-byte sequence in the middle. Similar issues with
Python string handling routines are avoided as well. For example, did
you know that “Priorità” is encoded as 'Priorit\xc3\xa0'
in UTF-8?
Calling strip()
on this value in some locales can cut away the
trailing \xa0
, leaving a byte string that is no longer valid UTF-8.
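The byte-level hazard described above can be reproduced in a few lines. This is only a sketch, written in Python 3 syntax (where bytes plays the role of the legacy str and str the role of unicode); it uses the standard library alone and is not taken from Trac. The truncation is simulated by slicing, since whether a byte-level strip() removes 0xA0 depends on the locale.

```python
# Python 3 syntax: bytes ~ legacy str, str ~ legacy unicode.
text = 'Priorit\u00e0'               # "Priorità"
encoded = text.encode('utf-8')       # the last character takes two bytes

# 0xA0 is NO-BREAK SPACE in Latin-1, which is why a locale-aware
# byte-level strip() may treat the trailing byte as whitespace.
trailing_byte_is_space = '\xa0'.isspace()

# Simulate such a strip by dropping the last byte.
damaged = encoded[:-1]
try:
    damaged.decode('utf-8')
    still_valid = True
except UnicodeDecodeError:
    still_valid = False

# Stripping at the character level cannot split a character.
safe = (text + '\u00a0').strip()
```

After the last byte is removed, the remaining lead byte 0xC3 has no continuation byte, so decoding fails. The character-level strip() removes the no-break space and leaves the text intact.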
The drawback is that most of the outside world, while eventually
“Unicode”, is definitely not unicode
. This is why we need to convert
back and forth between str
and unicode
at the boundaries of the
system. And more often than not we even have to guess which encoding
is used in the incoming str
strings.
Encoding unicode
to str
is usually performed directly by calling
encode()
on the unicode
instance, while decoding is preferably
left to the to_unicode
helper function, which converts str
to
unicode
in a robust way that is guaranteed to succeed.
-
trac.util.text.
to_unicode
(text, charset=None)¶ Convert input to a
unicode
object. For a
str
object, we’ll first try to decode the bytes using the given charset
encoding (or UTF-8 if none is specified), then we fall back to the latin1 encoding which might be correct or not, but at least preserves the original byte sequence by mapping each byte to the corresponding unicode code point in the range U+0000 to U+00FF. For anything else, a simple
unicode()
conversion is attempted, with special care taken with Exception
objects.
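The decode-then-fall-back strategy for byte strings can be sketched as follows. The helper name decode_with_fallback is hypothetical, the code is written in Python 3 syntax, and it covers only the byte-string branch described above, not the handling of other objects or exceptions.

```python
def decode_with_fallback(data, charset=None):
    """Decode bytes with `charset` (UTF-8 by default), else as latin1.

    Hypothetical helper; a sketch of the strategy, not Trac's code.
    """
    try:
        return data.decode(charset or 'utf-8')
    except UnicodeDecodeError:
        # latin1 maps every byte 0x00-0xFF to U+0000-U+00FF, so this
        # decode cannot fail and the original bytes stay recoverable.
        return data.decode('latin1')


from_utf8 = decode_with_fallback(b'Priorit\xc3\xa0')   # valid UTF-8
from_latin1 = decode_with_fallback(b'Priorit\xe0')     # invalid UTF-8
```

Because the latin1 fallback is a one-to-one byte mapping, encoding the result back with latin1 returns the exact input bytes, even when the guess about the encoding was wrong.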
-
trac.util.text.
exception_to_unicode
(e, traceback=False)¶ Convert an
Exception
to a unicode
object. In addition to
to_unicode
, this representation of the exception also contains the class name and optionally the traceback.
Web utilities¶
-
trac.util.text.
unicode_quote
(value, safe='/')¶ A unicode aware version of
urllib.quote
Parameters:
-
trac.util.text.
unicode_quote_plus
(value, safe='')¶ A unicode aware version of
urllib.quote_plus
.
Parameters:
-
trac.util.text.
unicode_unquote
(value)¶ A unicode aware version of
urllib.unquote
.
Parameters: str – UTF-8 encoded str
value (for example, as obtained by unicode_quote
).
Return type: unicode
-
trac.util.text.
unicode_urlencode
(params, safe='')¶ A unicode aware version of
urllib.urlencode
. Values set to
empty
are converted to the key alone, without the equal sign.
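The rule about marker values can be illustrated with a small sketch. EMPTY and urlencode_sketch are hypothetical names, EMPTY stands in for the empty marker documented in this module, and the code uses Python 3's urllib.parse rather than the legacy urllib functions named above.

```python
from urllib.parse import quote_plus

# Hypothetical stand-in for the `empty` marker documented in this module.
EMPTY = object()


def urlencode_sketch(params, safe=''):
    """Encode (key, value) pairs; keys whose value is EMPTY get no '='."""
    parts = []
    for key, value in params:
        if value is EMPTY:
            parts.append(quote_plus(key, safe))
        else:
            parts.append(quote_plus(key, safe) + '=' +
                         quote_plus(value, safe))
    return '&'.join(parts)


query = urlencode_sketch([('q', 'priorit\u00e0 alta'), ('verbose', EMPTY)])
```

Comparing with `is` rather than `==` keeps a genuinely empty string distinct from the marker: a value of '' would still be rendered as `key=`.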
-
trac.util.text.
quote_query_string
(text)¶ Quote strings for use in a query string.
-
trac.util.text.
javascript_quote
(text)¶ Quote strings for inclusion in single or double quote delimited JavaScript strings.
-
trac.util.text.
to_js_string
(text)¶ Embed the given string in a double quote delimited JavaScript string (conforming to the JSON spec).
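As a rough illustration of what a JSON-conformant, double-quoted string literal looks like, the standard json module can be used. The name js_string_sketch is hypothetical, and whether Trac's helper escapes additional characters is not covered by this sketch.

```python
import json


def js_string_sketch(text):
    """Return `text` as a double-quoted, JSON-conformant string literal."""
    return json.dumps(text)


literal = js_string_sketch('He said "hi"\n')
round_trip = json.loads(literal)
```

The embedded double quotes and the newline are backslash-escaped in the literal, and parsing it back yields the original text.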
Console and file system¶
-
trac.util.text.
getpreferredencoding
()¶ Return the encoding, retrieved ahead of time, according to user preference.
We should use this instead of
locale.getpreferredencoding()
which is not thread-safe.
-
trac.util.text.
path_to_unicode
(path)¶ Convert a filesystem path to unicode, using the filesystem encoding.
-
trac.util.text.
stream_encoding
(stream)¶ Return the appropriate encoding for the given stream.
-
trac.util.text.
console_print
(out, *args, **kwargs)¶ Output the given arguments to the console, encoding the output as appropriate.
Parameters: kwargs – newline
controls whether a newline will be appended (defaults to True
)
-
trac.util.text.
printout
(*args, **kwargs)¶ Do a
console_print
on sys.stdout
.
-
trac.util.text.
printerr
(*args, **kwargs)¶ Do a
console_print
on sys.stderr
.
-
trac.util.text.
raw_input
(prompt)¶ Read one line from the console and convert it to unicode as appropriate.
Miscellaneous¶
-
trac.util.text.
empty
¶ A special tag object evaluating to the empty string, used as a marker for a missing value (as opposed to a present but empty value).
-
class
trac.util.text.
unicode_passwd
¶ Bases:
unicode
Conceal the actual content of the string when
repr
is called.
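The idea of hiding a string's content from repr can be sketched with a small subclass. ConcealedStr is a hypothetical name, the mask text is an assumption, and the sketch subclasses Python 3's str rather than the legacy unicode type.

```python
class ConcealedStr(str):
    """Hypothetical str subclass whose repr() hides the content."""

    def __repr__(self):
        # The mask text is an assumption made for this sketch.
        return '*******'


secret = ConcealedStr('s3cr3t')
logged = repr({'password': secret})
```

The value still behaves like an ordinary string in comparisons and formatting; only representations built with repr, such as the repr of a containing dict, show the mask.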
-
trac.util.text.
cleandoc
(message)¶ Removes uniform indentation and leading/trailing whitespace.
-
trac.util.text.
levenshtein_distance
(lhs, rhs)¶ Return the Levenshtein distance between two strings.
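For reference, the Levenshtein distance can be computed with the classic row-by-row dynamic programming scheme. The function below is a generic sketch under the hypothetical name levenshtein_sketch, not necessarily how Trac implements it.

```python
def levenshtein_sketch(lhs, rhs):
    """Return the edit distance between two strings (row-by-row DP)."""
    if len(lhs) < len(rhs):
        lhs, rhs = rhs, lhs          # keep the rows as short as possible
    previous = list(range(len(rhs) + 1))
    for i, a in enumerate(lhs, 1):
        current = [i]
        for j, b in enumerate(rhs, 1):
            current.append(min(previous[j] + 1,              # deletion
                               current[j - 1] + 1,           # insertion
                               previous[j - 1] + (a != b)))  # substitution
        previous = current
    return previous[-1]


distance = levenshtein_sketch('kitten', 'sitting')
```

Only two rows of the distance matrix are kept at any time, so memory use is proportional to the shorter string.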
-
trac.util.text.
sub_vars
(text, args)¶ Substitute $XYZ-style variables in a string with provided values.
Parameters: - text – string containing variables to substitute.
- args – dictionary with keys matching the variables to be substituted. The keys should not be prefixed with the $ character.
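A minimal sketch of this kind of substitution, using a regular expression, is shown below. The name sub_vars_sketch is hypothetical, and leaving unknown variables untouched is an assumption of the sketch, since the description above does not say what happens to them.

```python
import re


def sub_vars_sketch(text, args):
    """Replace $NAME occurrences with args['NAME'].

    Leaving unknown variables untouched is an assumption of this sketch.
    """
    def replace(match):
        key = match.group(1)
        return str(args[key]) if key in args else match.group(0)
    return re.sub(r'\$(\w+)', replace, text)


result = sub_vars_sketch('Hello $USER, ticket $ID', {'USER': 'joe', 'ID': 42})
```

Note that the dictionary keys carry no leading $ character, matching the convention stated for the args parameter.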
Text formatting¶
-
trac.util.text.
pretty_size
(size, format='%.1f')¶ Pretty print content size information with appropriate unit.
Parameters: - size – number of bytes
- format – can be used to adjust the precision shown
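The effect of the format parameter can be illustrated with the following sketch. The name pretty_size_sketch, the 1024-based steps and the unit labels are assumptions made for the example and are not confirmed by the description above.

```python
def pretty_size_sketch(size, format='%.1f'):
    """Format a byte count with a unit; 1024-based units are assumed."""
    units = ['bytes', 'KB', 'MB', 'GB']
    value = float(size)
    index = 0
    # Step up one unit for every full factor of 1024.
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    if index == 0:
        return '%d bytes' % size
    return (format + ' %s') % (value, units[index])


default_precision = pretty_size_sketch(1536)
two_decimals = pretty_size_sketch(1048576, '%.2f')
```

Passing a different format string changes only the number of digits shown, not the choice of unit.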
-
trac.util.text.
breakable_path
(path)¶ Make a path breakable after path separators, and conversely, avoid breaking at spaces.
-
trac.util.text.
normalize_whitespace
(text, to_space=u'\xa0', remove=u'\u200b')¶ Normalize whitespace in a string, by replacing special spaces by normal spaces and removing zero-width spaces.
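The normalization can be sketched with a translation table. The name normalize_whitespace_sketch is hypothetical, and treating to_space and remove as collections of individual characters is an assumption of the sketch.

```python
def normalize_whitespace_sketch(text, to_space='\xa0', remove='\u200b'):
    """Map each char in `to_space` to ' ' and drop each char in `remove`."""
    table = {ord(ch): ' ' for ch in to_space}
    table.update({ord(ch): None for ch in remove})
    return text.translate(table)


cleaned = normalize_whitespace_sketch('a\xa0b\u200bc')
```

With the defaults, the no-break space becomes a normal space and the zero-width space disappears.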
-
trac.util.text.
unquote_label
(txt)¶ Remove (one level of) enclosing single or double quotes.
New in version 1.0.
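Removing exactly one level of quoting can be sketched as follows. The name unquote_label_sketch is hypothetical, and requiring the opening and closing quote to be the same character is an assumption of the sketch.

```python
def unquote_label_sketch(txt):
    """Strip one pair of matching enclosing quotes, if present."""
    if len(txt) >= 2 and txt[0] == txt[-1] and txt[0] in ('"', "'"):
        return txt[1:-1]
    return txt


label = unquote_label_sketch('"my label"')
nested = unquote_label_sketch('""twice""')
```

Only the outermost pair is removed, so a doubly quoted value keeps its inner quotes.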
-
trac.util.text.
fix_eol
(text, eol)¶ Fix end-of-lines in a text.
-
trac.util.text.
expandtabs
(s, tabstop=8, ignoring=None)¶ Expand tab characters
'\t'
into spaces.
Parameters: - tabstop – number of space characters per tab (defaults to the canonical 8)
- ignoring – if not
None
, the expansion will be “smart” and go from one tabstop to the next. In addition, this parameter lists characters which can be ignored when computing the indent.
-
trac.util.text.
is_obfuscated
(word)¶ Returns
True
if the word
looks like an obfuscated e-mail address.
Since: 1.2
-
trac.util.text.
obfuscate_email_address
(address)¶ Replace anything looking like an e-mail address (
'@something'
) with a trailing ellipsis ('@…'
)
-
trac.util.text.
text_width
(text, ambiwidth=1)¶ Determine the column width of
text
in Unicode characters. The characters in the East Asian Fullwidth (F) or East Asian Wide (W) categories have a column width of 2. The other characters in the East Asian Halfwidth (H) or East Asian Narrow (Na) categories have a column width of 1.
The
ambiwidth
parameter is used for the column width of East Asian Ambiguous (A) characters. If 1
, they have the same width as characters in US-ASCII. This is expected by most users. If 2
, they have twice the width of US-ASCII characters. This is expected by CJK users.
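The width rules above can be approximated with the standard unicodedata module. This is a sketch under the hypothetical name text_width_sketch; it ignores combining and control characters, which a complete implementation would have to treat separately.

```python
import unicodedata


def text_width_sketch(text, ambiwidth=1):
    """Sum column widths based on each character's East Asian Width."""
    width = 0
    for ch in text:
        category = unicodedata.east_asian_width(ch)
        if category in ('F', 'W'):
            width += 2               # Fullwidth and Wide
        elif category == 'A':
            width += ambiwidth       # Ambiguous: depends on the context
        else:
            width += 1               # Halfwidth, Narrow and the rest
    return width


ascii_width = text_width_sketch('abc')
cjk_width = text_width_sketch('\u65e5\u672c\u8a9e')    # "日本語"
```

The Greek letter alpha (U+03B1) is an example of an Ambiguous character: it counts as one column with ambiwidth=1 and as two with ambiwidth=2.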
-
trac.util.text.
print_table
(data, headers=None, sep=' ', out=None, ambiwidth=None)¶ Print data according to a tabular layout.
Parameters: - data – a sequence of rows; assume all rows are of equal length.
- headers – an optional row containing column headers; must be of
the same length as each row in
data
.
- sep – column separator
- out – output file descriptor (
None
means use sys.stdout
)
- ambiwidth – column width of East Asian Ambiguous (A) characters. If None,
the ambiwidth is detected from the locale settings. Otherwise, the value is
passed to the
ambiwidth
parameter of text_width
.
-
trac.util.text.
shorten_line
(text, maxlen=75)¶ Truncates
text
to length less than or equal to maxlen
characters. This tries to be (a bit) clever and attempts to find a proper word boundary for doing so.
-
trac.util.text.
stripws
(text, leading=True, trailing=True)¶ Strips unicode white-spaces and ZWSPs from
text
.
Parameters:
-
trac.util.text.
strip_line_ws
(text, leading=True, trailing=True)¶ Strips unicode white-spaces and ZWSPs from each line of
text
.
Parameters:
-
trac.util.text.
wrap
(t, cols=75, initial_indent='', subsequent_indent='', linesep='\n', ambiwidth=1)¶ Wraps the single paragraph in
t
, which contains unicode characters. Every line is at most cols
characters long.
The
ambiwidth
parameter is used for the column width of East Asian Ambiguous (A) characters. If 1
, they have the same width as characters in US-ASCII. This is expected by most users. If 2
, they have twice the width of US-ASCII characters. This is expected by CJK users.
Conversion utilities¶
-
trac.util.text.
unicode_to_base64
(text, strip_newlines=True)¶ Safe conversion of
text
to base64 representation using utf-8 bytes. Strips newlines from output unless
strip_newlines
is False
.
-
trac.util.text.
unicode_from_base64
(text)¶ Safe conversion of
text
to unicode based on utf-8 bytes.
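The two conversions can be sketched with the standard base64 module. The names to_base64_sketch and from_base64_sketch are hypothetical, the code is written in Python 3 syntax, and it is an illustration of the round trip rather than Trac's implementation.

```python
import base64


def to_base64_sketch(text, strip_newlines=True):
    """Encode text as UTF-8 bytes, then as base64 text."""
    # encodebytes() inserts a newline after every 76 output characters.
    encoded = base64.encodebytes(text.encode('utf-8')).decode('ascii')
    if strip_newlines:
        encoded = encoded.replace('\n', '')
    return encoded


def from_base64_sketch(text):
    """Decode base64 text and interpret the bytes as UTF-8."""
    return base64.b64decode(text).decode('utf-8')


original = 'Priorit\u00e0'
round_trip = from_base64_sketch(to_base64_sketch(original))
```

Going through UTF-8 bytes first is what makes the conversion safe for non-ASCII text, since base64 itself operates on bytes only.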