I need to implement this Java code in (unmanaged) C++:
byte[] b = string.getBytes("UTF8");
I'm new to C++ and can't find anything that does this. It has to be platform independent, if possible. I'm using a C++11 compiler.
4 answers
#1
3
A Java String is roughly equivalent to std::u16string, a specialization of std::basic_string. I suggest you try something like:
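#include <codecvt>  // std::codecvt_utf8_utf16
#include <locale>   // std::wstring_convert
#include <string>
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
std::string converted = convert.to_bytes(u"HELLO, WORLD!");
const char *bytes = converted.data();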
Note that this relies on C++11; it may be some time before your compiler vendor fully supports these features. (Both std::wstring_convert and the <codecvt> facets were later deprecated in C++17.)
Here, we use the newly introduced std::wstring_convert to convert from a wide-character UTF-16 string to a UTF-8 multibyte string via to_bytes (it also supports conversion in the other direction, via from_bytes).
This is made possible by the (also newly introduced) std::codecvt_utf8_utf16 conversion facet, which takes care of the actual conversion for us.
Besides that, it makes use of the new string literal prefixes added in C++11, in particular u, which is for char16_t UTF-16 strings :-) There are also u8 and U for UTF-8 and UTF-32, respectively.
P.S. data() is (as of C++11) guaranteed to return the same pointer as c_str(), and can therefore be relied upon to be NUL-terminated.
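If the standard conversion facets are not available on your compiler, the same conversion can be written by hand. The following is only a minimal sketch, not part of the answer above: the function name utf16_to_utf8 is made up for illustration, and replacing unpaired surrogates with U+FFFD is an assumption that may differ from what Java's getBytes does with malformed input.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): encode UTF-16 code units as UTF-8 bytes.
// Assumption: an unpaired surrogate is replaced with U+FFFD.
std::vector<std::uint8_t> utf16_to_utf8(const std::u16string& in) {
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate, if any.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate with no preceding high surrogate
        }
        if (cp < 0x80) {            // 1 byte: 0xxxxxxx
            out.push_back(static_cast<std::uint8_t>(cp));
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else {                    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<std::uint8_t>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Calling utf16_to_utf8(u"HELLO, WORLD!") would give one byte per character, since the text is pure ASCII. Unlike the data() pointer above, the returned vector has no trailing NUL, which is closer to what a Java byte[] holds.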
#2
1
Solution 1:
char bytecpp[]= u8"You don't need strings.getbytes :P";
Solution 2:
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> myconv;  // Elem defaults to wchar_t, so name char16_t explicitly
std::string mbstring = myconv.to_bytes(u"Hello\n");
std::cout << mbstring;
#3
0
Assuming the string is already in UTF-8, you can use:
char const *c = myString.c_str();
For read/write access, you could use:
std::vector<char> bytes(myString.begin(), myString.end());
bytes.push_back('\0');
char *c = &bytes[0];
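If what you want is closer to a Java byte[] (a length-tracked buffer with no terminator), a variation on the above might look like the sketch below. The name copy_bytes is hypothetical, and it still assumes the std::string already holds UTF-8.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copy the bytes of a UTF-8 std::string into a byte vector.
// No NUL terminator is appended; the vector's size() is the byte count.
std::vector<unsigned char> copy_bytes(const std::string& utf8) {
    return std::vector<unsigned char>(utf8.begin(), utf8.end());
}
```

Using unsigned char avoids surprises with negative values for bytes above 0x7F on platforms where plain char is signed.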
#4
0
A std::string in C++ stores one-byte char elements, typically ASCII text at 1 byte per character. So you would have to take care of the encoding before you marshal the text to C++ if you go with the typical std::string. C++ does also define a wide-character string, std::wstring; unfortunately (from the Wikipedia article on wide characters):
The width of wchar_t is compiler-specific and can be as small as 8 bits. Consequently, programs that need to be portable across any C or C++ compiler should not use wchar_t for storing Unicode text. The wchar_t type is intended for storing compiler-defined wide characters, which may be Unicode characters in some compilers.
So we would have to know which C++ compiler you are going to use to answer the question completely. std::wstring has no to-bytes style member function, so what you want to do is use c_str() as mentioned in the other answers, then use & (bitwise AND) with a byte mask to split the wide characters into bytes.
In Visual C++ a wide character is 16 bits, so you would want something like the following to split each character into bytes:
high_byte = (wcharacter >> 8) & 0xFF;  // shift down so the value fits in one byte
low_byte = wcharacter & 0xFF;
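Applied to a whole string, the masking idea might be sketched as follows. This uses char16_t rather than wchar_t so the unit width is the same on every compiler; the function name split_units_big_endian and the choice of high-byte-first order are assumptions for illustration. Note that the result is the raw UTF-16 bytes, not UTF-8, so it will not match what Java's getBytes("UTF8") returns for non-ASCII text.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: split each 16-bit code unit into two bytes, high byte first.
// The output is UTF-16 in big-endian byte order, not UTF-8.
std::vector<std::uint8_t> split_units_big_endian(const std::u16string& text) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(text.size() * 2);
    for (char16_t unit : text) {
        bytes.push_back(static_cast<std::uint8_t>((unit >> 8) & 0xFF));  // high byte
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));         // low byte
    }
    return bytes;
}
```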