truncateAtN

Truncates a string at the last space before the n-th Unicode character or, if the resulting string would be too short, at the n-th Unicode character. The string must be valid UTF-8 (the caller should have validated it before calling this function).

If the string is truncated before its end, the final Unicode character is replaced by the ending. Trailing spaces are removed before the ending is added. The returned string is never longer than n Unicode characters (including the ending).

The basic algorithm is to walk through src, keeping track of how many bytes need to be sliced at any given point, until we know where to end. Because we don't know until the end whether an ending is needed, we have to keep track of the position one Unicode character behind, as well as the position of the Unicode character before the last space. We must take care never to point at a space.

Important points when reading the algorithm:

1) Unicode character != byte
2) i == the number of bytes required to include the _previous_ Unicode character (i.e. the number of bytes to the start of c)
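The truncation rule described above can be illustrated with a minimal sketch. It is not the library's single-pass, byte-tracking implementation: the helper name truncateSketch is hypothetical, it decodes the whole string up front for clarity, it uses Phobos rather than Ocean types, and the way the ending and fill ratio are counted is an assumption.

```d
import std.conv : to;

// Hypothetical helper, not the Ocean implementation.
string truncateSketch(string src, size_t n, string ending = "…",
                      float fill_ratio = 0.75)
{
    // Decode once so that every index below counts Unicode characters.
    dstring chars = src.to!dstring;
    dstring end = ending.to!dstring;

    if (chars.length <= n)
        return src; // fits already, so no ending is added

    assert(end.length <= n, "ending must fit into n characters");

    // Characters of src that may be kept so that the ending still fits.
    size_t budget = n - end.length;
    size_t cut = budget;

    // Look for the last space inside the budget.
    foreach_reverse (i, dchar c; chars[0 .. budget])
    {
        if (c == ' ')
        {
            // Assumption: the fill ratio is compared against the text
            // kept before the space, not counting the ending.
            if (i >= n * fill_ratio)
                cut = i;
            break;
        }
    }

    // Never end on a space: drop trailing spaces before the ending.
    while (cut > 0 && chars[cut - 1] == ' ')
        --cut;

    return (chars[0 .. cut] ~ end).to!string;
}

unittest
{
    // Last space is too early for the default ratio: hard cut.
    assert(truncateSketch("The quick brown fox jumps", 14) == "The quick bro…");
    // A lower ratio accepts the cut at the last space.
    assert(truncateSketch("The quick brown fox jumps", 14, "…", 0.5) == "The quick…");
    // Short input is returned unchanged.
    assert(truncateSketch("short", 10) == "short");
}
```

Decoding into a dstring trades memory for readability; the approach described above avoids that by tracking byte offsets while walking src.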

mstring truncateAtN
(
cstring src,
size_t n,
mstring buffer,
cstring ending,
float fill_ratio = 0.75
)
out (result) {
    size_t result_length = 0;
    foreach (dchar c; result) { ++result_length; }
    assert (result_length <= n);
}
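The contract counts decoded Unicode characters, not bytes, and the two can move in opposite directions. The short sketch below shows this with a hypothetical helper, charCount, that counts the same way the contract does; the sample strings are illustrative only.

```d
// Hypothetical helper: counts Unicode characters by decoding,
// as the out contract does.
size_t charCount(const(char)[] s)
{
    size_t len = 0;
    foreach (dchar c; s) { ++len; }
    return len;
}

unittest
{
    string src = "abcd";      // 4 characters, 4 bytes
    string truncated = "ab…"; // 3 characters, 5 bytes ("…" is 3 bytes in UTF-8)

    assert(charCount(src) == 4 && src.length == 4);
    assert(charCount(truncated) == 3 && truncated.length == 5);

    // Fewer characters, yet more bytes: the result may not fit
    // in a buffer sized for src.
    assert(charCount(truncated) < charCount(src));
    assert(truncated.length > src.length);
}
```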

Parameters

src cstring

the string to truncate (must be UTF-8 encoded)

n size_t

the maximum number of Unicode characters allowed in the returned string

buffer mstring

a buffer to be used to store the result in (may be resized). The buffer is required because "ending" may contain Unicode characters taking more bytes than the Unicode characters in src they replace, thus leading to a string with fewer Unicode characters but more bytes.

ending cstring

These Unicode characters will be appended when "src" needs to be truncated.

fill_ratio float

if cutting the string at the last space would make its Unicode character length smaller than "n*fill_ratio", then we cut it at the n-th Unicode character
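As a worked example of the fill_ratio rule: with n = 20 and the default ratio of 0.75, the threshold is 15 characters. The sketch below encodes that decision in a hypothetical predicate, cutsAtSpace; whether the library counts the ending towards the length is not stated here, so the predicate simply takes the candidate length as given.

```d
// Hypothetical predicate: true if the cut at the last space is kept,
// false if the string is cut at the n-th character instead.
bool cutsAtSpace(size_t lengthAtSpace, size_t n, float fill_ratio = 0.75)
{
    // Cutting at the space is rejected only when the result would be
    // shorter than n * fill_ratio.
    return !(lengthAtSpace < n * fill_ratio);
}

unittest
{
    assert(cutsAtSpace(16, 20));      // 16 >= 15: cut at the space
    assert(!cutsAtSpace(10, 20));     // 10 < 15: cut at the n-th character
    assert(cutsAtSpace(10, 20, 0.5)); // threshold drops to 10
}
```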

Return Value

Type: mstring

buffer

Meta