Simplest way to convert a Unicode code point to UTF-8

Time: 2023-01-05 16:30:28

What's the simplest way to convert a Unicode codepoint into a UTF-8 byte sequence in C? The only way that springs to mind is using iconv to map from the UTF-32LE codepage to UTF-8, but that seems like overkill.


3 solutions

#1


7  

Unicode conversion is not a simple task. Using iconv doesn't seem like overkill at all to me. If what you want to avoid is making a system() call, there may be a library version of iconv you can use instead.
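For what it's worth, iconv does exist as a library API (`iconv_open`, `iconv`, `iconv_close` from `<iconv.h>`), so no system() call is needed. Below is a minimal sketch under some assumptions: the helper name `utf8_via_iconv` is made up for illustration, the platform's iconv is assumed to know the encoding names "UTF-32LE" and "UTF-8", and on some systems you may need to link with `-liconv`.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert one code point to UTF-8 via the iconv
 * library API. Returns the number of bytes written to out, or 0 on
 * failure (unknown encoding, invalid code point, or buffer too small). */
size_t utf8_via_iconv(uint32_t cp, char *out, size_t out_size)
{
    /* Lay the code point out as 4 little-endian bytes so the input
     * matches "UTF-32LE" regardless of the host byte order. */
    char in[4] = {
        (char)(cp & 0xFF),
        (char)((cp >> 8) & 0xFF),
        (char)((cp >> 16) & 0xFF),
        (char)((cp >> 24) & 0xFF)
    };

    /* Note the argument order: target encoding first, source second. */
    iconv_t cd = iconv_open("UTF-8", "UTF-32LE");
    if (cd == (iconv_t)-1)
        return 0;

    char *inp = in;
    char *outp = out;
    size_t in_left = sizeof in;
    size_t out_left = out_size;

    size_t rc = iconv(cd, &inp, &in_left, &outp, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return 0;
    return out_size - out_left;
}
```

Opening a conversion descriptor for every code point is wasteful; if you convert many of them, you would probably want to open it once and reuse it.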


#2


5  

Might I suggest ICU? It's a reasonably "industry standard" way of handling i18n issues.


I haven't used the C version myself, but I suspect ucnv_fromUnicode might be the function you're after.


#3


3  

UTF-8 works by encoding the length of the byte sequence in the highest bits of the first byte; every following byte starts with the bits 10. See http://en.wikipedia.org/wiki/UTF-8#Description


I found a small C function for this here: http://www.deanlee.cn/programming/convert-unicode-to-utf8/ . I haven't tested it, though.
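To illustrate the bit layout described above, here is a minimal hand-written sketch. It is not the function from the linked page; the name `utf8_encode` and its interface are my own assumptions. It rejects surrogates (U+D800 to U+DFFF) and values above U+10FFFF, which are not valid in UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes 1 to 4 bytes into out (no NUL terminator) and returns the
 * number of bytes written, or 0 if cp is a surrogate or > U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        /* 0xxxxxxx: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* UTF-16 surrogates are not encodable */
        /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0; /* beyond the Unicode range */
}
```

For example, U+20AC (the euro sign) comes out as the three bytes E2 82 AC. The caller has to append a NUL terminator if the result is going to be used as a C string.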
