How to convert LPWSTR to char * with UTF-8 encoding

Time: 2021-09-04 23:23:11

I'm working on a cross-platform project using Qt. On Windows, I want to pass Unicode characters (for instance, a file path that contains Chinese characters) as command-line arguments when launching the application, and then use these arguments to create a QCoreApplication.

For various reasons, I need to use CommandLineToArgvW to get the argument list, like this:

LPWSTR * argvW = CommandLineToArgvW( GetCommandLineW(), &argc );

I understand that on modern Windows, LPWSTR is a wchar_t*, whose 16-bit code units hold UTF-16 encoded text.

However, the QCoreApplication constructor only takes char*, not wchar_t*.

So the question is: how can I safely convert the LPWSTR returned by CommandLineToArgvW() to char* without losing any Unicode characters (for example, so that Chinese characters remain Chinese characters)?

I've tried several approaches without success:

1:


    std::string const argvString = boost::locale::conv::utf_to_utf<char>( argvW[0] );

2:


    int res;
    char buf[0x400];
    char* pbuf = buf;
    boost::shared_ptr<char[]> shared_pbuf;

    res = WideCharToMultiByte(CP_UTF8, 0, pcs, -1, buf, sizeof(buf), NULL, NULL);

3: Convert to QString first, then convert to UTF-8.


EDIT: Problem solved. The UTF-16 to UTF-8 conversion actually works fine with all three approaches. In Visual Studio, to view a UTF-8 string correctly in the debugger, you need to append the s8 format specifier after the watched variable name (see: https://msdn.microsoft.com/en-us/library/75w45ekt.aspx). I had overlooked this, which made me think my string conversion was wrong.
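Outside the debugger, one way to check a converted string is to dump its raw bytes and compare them with the expected UTF-8 sequence. The following is a minimal sketch in portable C++; the helper name toHexBytes is hypothetical and not part of Qt, Boost, or the Windows API.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render each byte of a string as two uppercase
// hex digits, separated by spaces.
std::string toHexBytes(const std::string& s)
{
    std::string out;
    char tmp[4];
    for (unsigned char byte : s) {
        std::snprintf(tmp, sizeof(tmp), "%02X", static_cast<unsigned>(byte));
        if (!out.empty())
            out += ' ';
        out += tmp;
    }
    return out;
}
```

For instance, a correctly converted U+4E2D (the first character of the Chinese word for "Chinese") should show up as the three bytes E4 B8 AD, regardless of how the debugger or console renders the string.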

The real issue is that QCoreApplication::arguments() constructs the returned QStrings with QString::fromLocal8Bit(), which causes encoding problems on Windows when the command-line arguments contain Unicode characters. The workaround is, whenever the command-line arguments are needed on Windows, to call the Windows API CommandLineToArgvW() and convert the UTF-16 wchar_t * (LPWSTR) strings to UTF-8 char * using one of the three approaches above.
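Once the arguments have been converted to UTF-8, they still have to be exposed as an argc/argv pair. The QCoreApplication constructor takes argc by reference, and the data behind argc and argv must stay valid for the lifetime of the application object. A minimal sketch of an owning wrapper follows, in portable C++; the class name Utf8Argv is hypothetical, and the conversion itself is assumed to have happened already.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical owner for an already converted UTF-8 argument list.
// It must outlive whatever object is handed argc() and argv().
class Utf8Argv
{
public:
    explicit Utf8Argv(std::vector<std::string> args)
        : m_args(std::move(args)), m_argc(static_cast<int>(m_args.size()))
    {
        for (std::string& arg : m_args)
            m_ptrs.push_back(&arg[0]);   // points into m_args, which is never resized
        m_ptrs.push_back(nullptr);       // argv[argc] is conventionally null
    }

    // Copying would leave the pointers aimed at the source object's strings.
    Utf8Argv(const Utf8Argv&) = delete;
    Utf8Argv& operator=(const Utf8Argv&) = delete;

    int& argc() { return m_argc; }
    char** argv() { return m_ptrs.data(); }

private:
    std::vector<std::string> m_args;   // owns the UTF-8 bytes
    int m_argc;
    std::vector<char*> m_ptrs;         // argv-style view of m_args
};
```

A holder built this way in main() could then be passed as QCoreApplication app(holder.argc(), holder.argv()), provided the holder is declared before the application object so that it is destroyed after it.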

2 answers

#1



You should be able to use QString's functions. For example:

QString str = QString::fromUtf16((const ushort*)argvW[0]);
::MessageBoxW(0, (const wchar_t*)str.utf16(), 0, 0);

When using WideCharToMultiByte, first pass zero for the output buffer and its length. The return value is then the number of bytes the output buffer needs (including the terminating null, because the input length is given as -1). For example:

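const wchar_t* wbuf = argvW[0];
// First call: query the required size in bytes, including the terminating null.
int len = WideCharToMultiByte(CP_UTF8, 0, wbuf, -1, 0, 0, 0, 0);

std::string buf(len, 0);

// Second call: perform the UTF-16 to UTF-8 conversion into the buffer.
WideCharToMultiByte(CP_UTF8, 0, wbuf, -1, &buf[0], len, 0, 0);
// Convert back to a QString and display it to confirm the round trip.
QString utf8 = QString::fromUtf8(buf.c_str());
::MessageBoxW(0, (const wchar_t*)utf8.utf16(), 0, 0);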
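To make it clearer what this conversion does to the data, here is a rough, portable sketch of UTF-16 to UTF-8 encoding that does not depend on the Windows API. The function names appendUtf8 and utf16ToUtf8 are hypothetical, and the choice to replace unpaired surrogates with U+FFFD is an assumption of this sketch, not a description of WideCharToMultiByte. On Windows, the wchar_t data from argvW could be copied into a std::u16string first.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Encode one Unicode code point as 1 to 4 UTF-8 bytes.
static void appendUtf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert UTF-16 code units to UTF-8.
// Unpaired surrogates are replaced with U+FFFD (an assumption of this sketch).
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                // Combine the pair into one code point above U+FFFF.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {     // stray low surrogate
            cp = 0xFFFD;
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

Chinese characters in the Basic Multilingual Plane take one UTF-16 code unit and three UTF-8 bytes each, so no information is lost as long as both sides agree that the char data is UTF-8.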

The same information should be available from QCoreApplication::arguments(). For example, run this code with a Unicode argument and check the output:

#include <QCoreApplication>
#include <QFile>
#include <QStringList>
#include <QTextStream>

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);
    QString filename = QString::fromUtf8("ελληνική.txt");
    QFile fout(filename);
    if (fout.open(QIODevice::WriteOnly | QIODevice::Text))
    {
        QTextStream oss(&fout);
        oss.setCodec("UTF-8");  // Qt 4/5 API: write the output file as UTF-8
        oss << filename << "\n";
        // Dump every argument as Qt sees it, one per line.
        QStringList list = a.arguments();
        for (int i = 0; i < list.count(); i++)
            oss << list[i] << "\n";
    }
    fout.close();
    return a.exec();
}

Note that in the above example the filename is converted to UTF-16 internally by Qt, because WinAPI uses UTF-16, not UTF-8.
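As an illustration of that internal step, the following portable sketch decodes UTF-8 into UTF-16 code units. The function name utf8ToUtf16 is hypothetical, this is not how Qt implements the conversion, and input validation is deliberately left out, so well-formed UTF-8 is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode well-formed UTF-8 into UTF-16 code units.
// Malformed input is not detected in this sketch.
std::u16string utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t extra;                        // number of continuation bytes
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 0; k < extra && i < in.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);

        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {                                   // needs a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

Each Greek letter in the example filename occupies two bytes in UTF-8 but a single code unit in UTF-16, which is why the two encodings give different lengths for the same text.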

#2



Qt internally wraps int main(), extracting and parsing the Unicode command-line arguments (via CommandLineToArgvW) before any of your code is executed. The parsed data is then converted to the local 8-bit encoding (on Windows this is the ANSI code page, not necessarily UTF-8) and passed as char **argv, via the equivalent of QString::toLocal8Bit().

Use QCoreApplication::arguments() to retrieve the Unicode arguments. Also, a helpful note from the docs:

On Windows, the list is built from the argc and argv parameters only if modified argv/argc parameters are passed to the constructor. In that case, encoding problems might occur.

