ocean.text.utf.UtfUtil

Contains utility functions for working with Unicode strings: a function that returns the length of a UTF-8 string in code points, a function that truncates a UTF-8 string at the nearest whitespace character before a maximum length, and a function that truncates a UTF-8 string and appends a set ending to it.

Example usage:

char[] utf = ...; // some UTF-8 character sequence

// using the default Unicode error handler
size_t len1 = utf8Length(utf);

// using a custom error handler,
// which takes an index into the string as a parameter
size_t len2 = utf8Length(utf, (size_t i) { /* error handling code... */ });

Members

Functions

limitStringLength
inout(mstring) limitStringLength(inout(mstring) str, size_t max_len)

Limits the length of a UTF-8 string to at most the specified number of bytes.
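A byte limit on UTF-8 text is usually applied so that the cut does not land inside a multi-byte sequence. The following is a minimal sketch of that idea, assuming this is the intent here; limitBytes is a hypothetical helper written for illustration, not the library's implementation.

```d
// Hypothetical helper, not the library's implementation: returns the longest
// prefix of `str` that is at most `maxLen` bytes long and does not end in the
// middle of a multi-byte UTF-8 sequence.
string limitBytes(string str, size_t maxLen)
{
    if (str.length <= maxLen)
        return str;

    size_t end = maxLen;
    // UTF-8 continuation bytes have the bit pattern 10xxxxxx. If the byte
    // just past the cut is one, the cut splits a sequence, so back up.
    while (end > 0 && (str[end] & 0xC0) == 0x80)
        --end;

    return str[0 .. end];
}

unittest
{
    // "é" is two bytes (0xC3 0xA9), so "héllo" is six bytes long.
    assert(limitBytes("héllo", 2) == "h");   // a cut at 2 would split "é"
    assert(limitBytes("héllo", 3) == "hé");
    assert(limitBytes("abc", 10) == "abc");
}
```

The result may be shorter than the requested limit, since the cut only ever moves backwards to the start of a code point.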

truncateAppendEnding
mstring truncateAppendEnding(mstring str, size_t n, cstring ending)

Truncates a UTF-8 string and appends a set ending. The string is first truncated so that the result has a maximum length of n; because n includes the ending parameter, the string itself is truncated to position n - ending.length.
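The length arithmetic can be illustrated with a small sketch. It assumes that lengths are counted in code points and that the input is valid UTF-8; the description above does not state the unit, so treat that as an assumption. The names codePoints, prefixBytes and truncateWithEnding are hypothetical and are not part of this module.

```d
// Number of code points in valid UTF-8 `s`: every byte that is not a
// continuation byte (10xxxxxx) starts a new code point.
size_t codePoints(string s)
{
    size_t count = 0;
    foreach (char c; s)
        if ((c & 0xC0) != 0x80)
            ++count;
    return count;
}

// Byte index at which the first `n` code points of valid UTF-8 `str` end,
// or str.length if `str` has fewer than `n` code points.
size_t prefixBytes(string str, size_t n)
{
    size_t seen = 0;
    foreach (i, char c; str)
    {
        if ((c & 0xC0) != 0x80) // start of a code point
        {
            if (seen == n)
                return i;
            ++seen;
        }
    }
    return str.length;
}

// Hypothetical sketch (assumption: lengths are in code points). The result
// has at most `n` code points, with the ending counted toward `n`.
string truncateWithEnding(string str, size_t n, string ending)
{
    immutable endingLen = codePoints(ending);
    assert(n > endingLen, "n must leave room for the ending");

    if (codePoints(str) <= n)
        return str; // fits: nothing to truncate or append

    // The ending counts toward n, so keep n - endingLen code points.
    return str[0 .. prefixBytes(str, n - endingLen)] ~ ending;
}

unittest
{
    // n = 8 and a 3-character ending leave room for 5 characters of text.
    assert(truncateWithEnding("hello world", 8, "...") == "hello...");
    assert(truncateWithEnding("héllo wörld", 8, "…") == "héllo w…");
    assert(truncateWithEnding("short", 8, "...") == "short");
}
```

Unlike the function documented here, this sketch allocates a new string and does not look for a word break before cutting.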

truncateAtN
mstring truncateAtN(cstring src, size_t n, mstring buffer, cstring ending, float fill_ratio)

Truncates a string at the last space before the n-th Unicode character or, if the resulting string would be too short, at the n-th Unicode character. The string should be valid UTF-8 (the caller should have validated it before calling this function).

truncateAtWordBreak
mstring truncateAtWordBreak(mstring str, size_t n)

Limits str to a length of n UTF-8 code points, cutting at the last space if one is found. If str is not valid UTF-8, str.length is assumed to be the number of code points.
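A word-break cut of this kind can be sketched in a single pass over the bytes. The version below assumes valid UTF-8 input and treats only the ASCII space as a break; cutAtWordBreak is a hypothetical name, and the code is an illustration of the idea rather than the library's implementation.

```d
// Hypothetical sketch, assuming valid UTF-8 input: keeps at most `n` code
// points of `str` and, if a space occurs within that prefix, cuts there.
string cutAtWordBreak(string str, size_t n)
{
    size_t seen = 0;        // code points consumed so far
    size_t lastSpace = 0;   // byte index of the last space seen
    bool haveSpace = false;

    foreach (i, char c; str)
    {
        // A byte that is not a continuation byte starts a new code point.
        if ((c & 0xC0) != 0x80)
        {
            if (seen == n)
            {
                // `str` is longer than n code points: cut at the last
                // space if there was one, else exactly at n code points.
                return haveSpace ? str[0 .. lastSpace] : str[0 .. i];
            }
            ++seen;
            if (c == ' ')
            {
                lastSpace = i;
                haveSpace = true;
            }
        }
    }

    return str; // already short enough
}

unittest
{
    assert(cutAtWordBreak("hello world", 8) == "hello");
    assert(cutAtWordBreak("héllo wörld", 8) == "héllo");
    assert(cutAtWordBreak("abcdefgh", 3) == "abc"); // no space: hard cut
    assert(cutAtWordBreak("short", 10) == "short");
}
```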

utf8Length
size_t utf8Length(cstring str)

Calculates the number of UTF-8 code points in a UTF-8-encoded string. Calls the standard Unicode error handler on error, which throws a new UnicodeException.

utf8Length
size_t utf8Length(cstring str, void delegate(size_t) error_dg)

Calculates the number of UTF-8 code points in a UTF-8-encoded string. If an invalid UTF-8 code unit is detected, calls error_dg, which may throw an exception to abort processing.
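To make the role of the error delegate concrete, here is a minimal sketch of counting code points while reporting malformed bytes to a callback. countCodePoints is a hypothetical function, not the library's implementation; it performs only a simplified structural check (it does not reject overlong encodings or surrogates), and the choice to skip a bad byte without counting it is an assumption.

```d
// Hypothetical sketch, not the library's implementation: counts UTF-8 code
// points and reports the index of each malformed sequence to `onError`.
size_t countCodePoints(const(char)[] str, scope void delegate(size_t) onError)
{
    size_t count = 0;
    size_t i = 0;

    while (i < str.length)
    {
        immutable uint b = str[i];
        size_t len;

        if (b < 0x80)                len = 1; // 0xxxxxxx: ASCII
        else if ((b & 0xE0) == 0xC0) len = 2; // 110xxxxx
        else if ((b & 0xF0) == 0xE0) len = 3; // 1110xxxx
        else if ((b & 0xF8) == 0xF0) len = 4; // 11110xxx
        else
        {
            // Stray continuation byte or invalid lead byte.
            onError(i);
            ++i;
            continue;
        }

        // Every following byte of the sequence must be a continuation byte.
        bool ok = i + len <= str.length;
        for (size_t j = i + 1; ok && j < i + len; ++j)
            ok = (str[j] & 0xC0) == 0x80;

        if (!ok)
        {
            // Truncated or broken sequence (assumption: skip one byte).
            onError(i);
            ++i;
            continue;
        }

        ++count;
        i += len;
    }

    return count;
}

unittest
{
    size_t[] errors;
    void record(size_t i) { errors ~= i; }

    assert(countCodePoints("héllo", &record) == 5);
    assert(errors.length == 0);

    // 0xFF can never appear in UTF-8, so it is reported at index 1.
    char[] bad = ['a', cast(char) 0xFF, 'b'];
    assert(countCodePoints(bad, &record) == 2);
    assert(errors.length == 1 && errors[0] == 1);
}
```

A delegate that throws aborts the scan at the first error, while one that only records the index, as above, lets the scan continue.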

Variables

ellipsis
istring ellipsis;

UTF-8 representation of "…".

Meta

License

Boost Software License Version 1.0. See LICENSE_BOOST.txt for details. Alternatively, this file may be distributed under the terms of the Tango 3-Clause BSD License (see LICENSE_BSD.txt for details).