Functions
SBuf | Latin1ToUtf8 (const char *in)
converts ISO-LATIN-1 to UTF-8
SBuf | Cp1251ToUtf8 (const char *in)
converts CP1251 to UTF-8
static size_t | utf8CodePointLength (const char b0)
static bool | isValidUtf8CodePoint (const unsigned char *source, const size_t length)
bool | isValidUtf8String (const char *source, const char *sourceEnd)
returns whether the given input is a valid (or empty) sequence of UTF-8 code points
Function Documentation
◆ Cp1251ToUtf8()
SBuf Cp1251ToUtf8 | ( | const char * | in | ) |
Definition at line 37 of file toUtf.cc.
References SBuf::append(), and max().
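As a rough illustration of what a CP1251 to UTF-8 conversion involves, the sketch below is not the code in toUtf.cc. It assumes `std::string` in place of `SBuf`, uses the hypothetical names `sketchCp1251ToUtf8` and `appendUtf8`, and maps only ASCII, the contiguous Cyrillic range 0xC0..0xFF (U+0410..U+044F), and the two letters at 0xA8 and 0xB8. Every other high byte becomes U+FFFD here, whereas a complete converter would need a lookup table for the 0x80..0xBF range.

```cpp
#include <string>

namespace {

// Appends the UTF-8 encoding of a code point below U+10000.
void appendUtf8(std::string &out, const unsigned int cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

} // namespace

// Hypothetical, partial stand-in for Cp1251ToUtf8(); std::string replaces SBuf.
std::string sketchCp1251ToUtf8(const char *in)
{
    std::string result;
    if (!in)
        return result;
    for (; *in; ++in) {
        const auto ch = static_cast<unsigned char>(*in);
        unsigned int cp = 0xFFFD; // placeholder for bytes this sketch does not map
        if (ch < 0x80)
            cp = ch; // ASCII is unchanged
        else if (ch >= 0xC0)
            cp = 0x0410 + (ch - 0xC0); // contiguous Cyrillic letters
        else if (ch == 0xA8)
            cp = 0x0401; // CYRILLIC CAPITAL LETTER IO
        else if (ch == 0xB8)
            cp = 0x0451; // CYRILLIC SMALL LETTER IO
        appendUtf8(result, cp);
    }
    return result;
}
```

Calling `sketchCp1251ToUtf8("\xCF\xF0\xE8\xE2\xE5\xF2")` yields the UTF-8 bytes for the six Cyrillic letters of "Привет".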
◆ isValidUtf8CodePoint()
static bool isValidUtf8CodePoint | ( | const unsigned char * | source, |
const size_t | length | |
) |
static
Utility routine that tells whether a sequence of bytes is a valid UTF-8 code point. The caller must pass the length already determined from the first byte. Returns false for any length greater than 4, because the Unicode definition of UTF-8 encodes each code point in at most 4 bytes.
Definition at line 123 of file toUtf.cc.
Referenced by isValidUtf8String().
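The kind of check described above can be sketched as follows. This is not the code at line 123 of toUtf.cc: the name `sketchIsValidCodePoint` is hypothetical, and the specific rules (continuation bytes, overlong forms, surrogates, the U+10FFFF ceiling) are assumed from the general definition of well-formed UTF-8 rather than taken from the Squid source.

```cpp
#include <cstddef>

// Hypothetical sketch; the checks in toUtf.cc may differ.
// `length` must be the value implied by the first byte s[0].
bool sketchIsValidCodePoint(const unsigned char *s, const std::size_t length)
{
    if (length < 1 || length > 4)
        return false; // UTF-8 code points occupy 1 to 4 bytes
    if (length == 1)
        return s[0] < 0x80; // plain ASCII

    // every byte after the first must be a continuation byte (10xxxxxx)
    for (std::size_t i = 1; i < length; ++i) {
        if ((s[i] & 0xC0) != 0x80)
            return false;
    }

    switch (length) {
    case 2:
        return s[0] >= 0xC2 && s[0] <= 0xDF; // 0xC0 and 0xC1 would be overlong
    case 3:
        if ((s[0] & 0xF0) != 0xE0)
            return false;
        if (s[0] == 0xE0 && s[1] < 0xA0)
            return false; // overlong encoding of a value below U+0800
        if (s[0] == 0xED && s[1] > 0x9F)
            return false; // UTF-16 surrogate range U+D800..U+DFFF
        return true;
    default: // length == 4
        if (s[0] < 0xF0 || s[0] > 0xF4)
            return false;
        if (s[0] == 0xF0 && s[1] < 0x90)
            return false; // overlong encoding of a value below U+10000
        if (s[0] == 0xF4 && s[1] > 0x8F)
            return false; // beyond U+10FFFF
        return true;
    }
}
```

For example, the three bytes E2 82 AC (the euro sign) pass this check, while C0 80 (an overlong NUL) and ED A0 80 (a surrogate) do not.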
◆ isValidUtf8String()
bool isValidUtf8String | ( | const char * | source, |
const char * | sourceEnd | ||
) |
- Returns
- whether the given input is a valid (or empty) sequence of UTF-8 code points
Definition at line 172 of file toUtf.cc.
References isValidUtf8CodePoint(), and utf8CodePointLength().
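One plausible shape for such a validator is a loop that reads a lead byte, derives the expected length, and checks that many bytes before advancing. The sketch below follows that idea under stated assumptions: `sketchIsValidUtf8String`, `leadLength`, and `codePointOk` are hypothetical names standing in for the functions referenced above, and the details are not copied from toUtf.cc.

```cpp
#include <cstddef>

namespace {

// Hypothetical stand-in for utf8CodePointLength(): 0 means "cannot start a code point".
std::size_t leadLength(const unsigned char b0)
{
    if (b0 < 0x80) return 1;
    if (b0 >= 0xC2 && b0 <= 0xDF) return 2;
    if ((b0 & 0xF0) == 0xE0) return 3;
    if (b0 >= 0xF0 && b0 <= 0xF4) return 4;
    return 0;
}

// Hypothetical stand-in for isValidUtf8CodePoint(); n comes from leadLength().
bool codePointOk(const unsigned char *s, const std::size_t n)
{
    for (std::size_t i = 1; i < n; ++i) {
        if ((s[i] & 0xC0) != 0x80)
            return false; // not a continuation byte
    }
    if (n == 3) // reject overlong forms and surrogates
        return !(s[0] == 0xE0 && s[1] < 0xA0) && !(s[0] == 0xED && s[1] > 0x9F);
    if (n == 4) // reject overlong forms and values above U+10FFFF
        return !(s[0] == 0xF0 && s[1] < 0x90) && !(s[0] == 0xF4 && s[1] > 0x8F);
    return true;
}

} // namespace

// Returns true for an empty range or a range made only of valid code points.
bool sketchIsValidUtf8String(const char *source, const char *sourceEnd)
{
    while (source < sourceEnd) {
        const auto *p = reinterpret_cast<const unsigned char *>(source);
        const std::size_t length = leadLength(*p);
        if (length == 0)
            return false; // invalid lead byte
        if (static_cast<std::size_t>(sourceEnd - source) < length)
            return false; // code point truncated by the end of input
        if (!codePointOk(p, length))
            return false;
        source += length;
    }
    return true;
}
```

Taking an explicit end pointer rather than relying on a terminating NUL lets the caller validate a slice of a larger buffer; in this sketch an embedded NUL byte counts as valid ASCII.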
◆ Latin1ToUtf8()
SBuf Latin1ToUtf8 | ( | const char * | in | ) |
Definition at line 16 of file toUtf.cc.
References SBuf::append().
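Because ISO Latin-1 byte values coincide with the first 256 Unicode code points, the conversion needs no table: bytes below 0x80 are copied and every other byte expands to a two-byte sequence. The following is a minimal sketch of that idea, not the code at line 16 of toUtf.cc. It assumes `std::string` in place of `SBuf` so it compiles outside Squid, and `sketchLatin1ToUtf8` is a hypothetical name.

```cpp
#include <string>

// Hypothetical stand-in for Latin1ToUtf8(); std::string replaces SBuf.
std::string sketchLatin1ToUtf8(const char *in)
{
    std::string result;
    if (!in)
        return result; // treat a null pointer as empty input (an assumption)
    for (; *in; ++in) {
        const auto ch = static_cast<unsigned char>(*in);
        if (ch < 0x80) {
            result += static_cast<char>(ch); // ASCII is unchanged
        } else {
            // U+0080..U+00FF: 110000xx 10xxxxxx
            result += static_cast<char>(0xC0 | (ch >> 6));
            result += static_cast<char>(0x80 | (ch & 0x3F));
        }
    }
    return result;
}
```

For instance, the Latin-1 byte 0xE9 (é) becomes the two bytes C3 A9.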
◆ utf8CodePointLength()
static size_t utf8CodePointLength | ( | const char | b0 | ) |
inline static
- Returns
- the length of a UTF-8 code point that starts at the given byte
- Return values
- 0: indicates an invalid code point
- Parameters
- b0: the first byte of a UTF-8 code point
Definition at line 101 of file toUtf.cc.
Referenced by isValidUtf8String().
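The length of a UTF-8 sequence is encoded in the high bits of its first byte, so a function with this contract can be sketched as a short chain of mask tests. The version below is an illustration only: `sketchCodePointLength` is a hypothetical name, and the code at line 101 of toUtf.cc may classify some bytes differently (for example, this sketch reports 2 for 0xC0 and 0xC1 and leaves the overlong check to a later validation step).

```cpp
#include <cstddef>

// Hypothetical stand-in for utf8CodePointLength(); not copied from toUtf.cc.
// Returns the sequence length implied by the lead byte, or 0 if the byte
// cannot start a UTF-8 code point.
inline std::size_t sketchCodePointLength(const char b0)
{
    const auto u = static_cast<unsigned char>(b0);
    if (u < 0x80)
        return 1; // 0xxxxxxx: ASCII
    if ((u & 0xE0) == 0xC0)
        return 2; // 110xxxxx
    if ((u & 0xF0) == 0xE0)
        return 3; // 1110xxxx
    if ((u & 0xF8) == 0xF0)
        return 4; // 11110xxx
    return 0; // continuation byte (10xxxxxx) or 0xF8..0xFF
}
```

Converting to `unsigned char` first matters because `char` may be signed, in which case comparisons such as `b0 < 0x80` would be true for every high byte.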