What is the C++ equivalent of Java's String.getBytes("UTF-8")?

Date: 2021-06-07 21:00:29

I need to implement this Java code in (unmanaged) C++:

byte[] b = string.getBytes("UTF8");

I'm new to C++ and can't find anything that does this. It has to be platform independent, if possible. I'm using a C++11 compiler.

4 answers

#1


3  

A Java String is roughly equivalent to std::u16string, a specialization of std::basic_string; both store UTF-16 code units. I suggest you try something like:

#include <codecvt>
#include <locale>
#include <string>
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
std::string converted = convert.to_bytes(u"HELLO, WORLD!");
const char *bytes = converted.data();

Note that this relies on C++11; it might be some time before your compiler vendor fully supports these features.

Here we use the newly introduced std::wstring_convert to convert a UTF-16 string to a UTF-8 multibyte string via to_bytes (it also supports conversion in the other direction, via from_bytes).

This is made possible via the (also newly introduced) std::codecvt_utf8_utf16 conversion facet. It takes care of the actual conversion for us nicely.

Besides that, it makes use of the new string literal prefixes added in C++11, in particular u, which produces char16_t UTF-16 strings :-) There are also u8 and U for UTF-8 and UTF-32, respectively.

P.S. data() is (as of C++11) guaranteed to return the same pointer as c_str(), so the result can be relied upon to be NUL-terminated.
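
To get something closer to Java's byte[], the conversion above could be wrapped in a small function that returns the UTF-8 bytes as a vector. This is only a minimal sketch: the helper name get_bytes_utf8 is hypothetical, and it assumes the input holds well-formed UTF-16 (to_bytes may throw std::range_error otherwise). Note that std::wstring_convert and std::codecvt_utf8_utf16 were deprecated in C++17, so newer compilers may warn.

```cpp
#include <codecvt>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper (not part of any library): returns the UTF-8
// encoding of a UTF-16 string, similar in spirit to Java's getBytes("UTF-8").
// Assumption: `text` is well-formed UTF-16.
std::vector<unsigned char> get_bytes_utf8(const std::u16string& text) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
    const std::string utf8 = convert.to_bytes(text);
    // Copy the chars into an unsigned byte container; no NUL is appended.
    return std::vector<unsigned char>(utf8.begin(), utf8.end());
}
```

A call such as get_bytes_utf8(u"HELLO, WORLD!") would then yield one byte per character, since that text is pure ASCII.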

#2


1  

Solution 1:

char bytecpp[] = u8"You don't need strings.getbytes :P";

Solution 2:

std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> myconv;
std::string mbstring = myconv.to_bytes(u"Hello\n");
std::cout << mbstring;

#3


0  

Assuming the string is already encoded as UTF-8, you can use:

char const *c = myString.c_str();

For read/write access, you could use:

std::vector<char> bytes(myString.begin(), myString.end());
bytes.push_back('\0');
char *c = &bytes[0];
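
Java's byte[] carries its own length and has no terminator, so a closer analogue might be a vector of bytes built without the trailing '\0'. A minimal sketch, assuming the std::string already holds UTF-8; the helper name to_byte_vector is hypothetical:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: copies the bytes of a std::string that is assumed
// to already contain UTF-8. No NUL terminator is appended; the vector's
// size() is the byte count, and embedded '\0' bytes are preserved.
std::vector<std::uint8_t> to_byte_vector(const std::string& utf8) {
    return std::vector<std::uint8_t>(utf8.begin(), utf8.end());
}
```

The result's data() and size() can then be passed to any API that expects a pointer and a length.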

#4


0  

A std::string in C++ is a sequence of char, one byte each, and typically holds ASCII. So if you went with the typical std::string, you would have to take care of the encoding before you marshaled the text to C++. C++ does also define a wide-character string, std::wstring; unfortunately (from the Wikipedia article on wide characters):

The width of wchar_t is compiler-specific and can be as small as 8 bits. Consequently, programs that need to be portable across any C or C++ compiler should not use wchar_t for storing Unicode text. The wchar_t type is intended for storing compiler-defined wide characters, which may be Unicode characters in some compilers.

So we would have to know which C++ compiler you are going to use to answer the question completely. The std::wstring class has no to-bytes style member function, so what you want to do is use c_str() as mentioned in the other answers, then use & (bitwise AND) with a byte mask to split the wide characters into bytes.

In Visual C++ a wide character is 16 bits, so you would want something like the following to split each character into bytes:

high_byte = (wcharacter >> 8) & 0xFF;
low_byte = wcharacter & 0xFF;
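
Splitting a 16-bit code unit this way yields the two bytes of the UTF-16 encoding, not UTF-8. If UTF-8 is needed without relying on library facets, the encoding can be written by hand with shifts and masks. The following is only a sketch under stated assumptions: the function name utf16_to_utf8 is hypothetical, the input is taken as std::u16string rather than std::wstring so the code-unit width is fixed, and unpaired surrogates are replaced with U+FFFD, which may differ from what Java does.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encodes UTF-16 code units as UTF-8 bytes.
// Assumption: unpaired surrogates become U+FFFD (replacement character).
std::vector<std::uint8_t> utf16_to_utf8(const std::u16string& in) {
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate without a preceding high surrogate
        }
        if (cp < 0x80) {            // 1 byte: 0xxxxxxx
            out.push_back(static_cast<std::uint8_t>(cp));
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else {                    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<std::uint8_t>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Because it uses only fixed-width integer types and arithmetic, this approach does not depend on the width of wchar_t on a given compiler.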
