How to shorten a QString so that its UTF-8 encoding fits within a given length?

I am trying to create an efficient algorithm for shortening a QString so that, when converted to UTF-8, it fits within a defined length and is still valid UTF-8.

Input
- QString text - a string that may contain any characters; no maximum length is specified
- int limit - the maximum length, in bytes, of the output encoded in UTF-8
Output
- QByteArray output - the original text in UTF-8, truncated so that it is no longer than limit bytes.
Example 1:
- text = "How are you?"
- limit = 5
- output = "How a"
Example 2:
- text = "Как дела?"
- limit = 5
- output = "Ка"
  - d0 9a d0 b0 - including "к" would already exceed the limit, and including only its first byte d0 would result in an invalid UTF-8 string.

First we started with the following code, but it may cut a UTF-8 character in the middle, which is not acceptable:

QByteArray output = text.toUtf8().left(limit);

Then we developed a correct algorithm that works, but it is quite ugly and inefficient, because it re-encodes the whole string on every iteration:

QString tmp = text;
while (tmp.toUtf8().size() > limit)
    tmp.chop(1);
QByteArray output = tmp.toUtf8();

Is there a better way to do this?
- If yes, please share the code.
- If not, why?

1 solution

#1

The following approach should be optimal unless you want to write your own UTF-8 conversion routine. It relies on the fact that continuation bytes in UTF-8 sequences are in the range 0x80-0xBF. Going backward from the limit, it looks for the first position that is not a continuation byte, which is where the byte array can be split safely.

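QByteArray output = text.toUtf8();
if (output.size() > limit) {
    int truncateAt = 0;
    // output[limit] is the first byte that would be dropped; walk back
    // until the cut no longer falls on a continuation byte (10xxxxxx).
    for (int i = limit; i > 0; i--) {
        if ((output[i] & 0xC0) != 0x80) {
            truncateAt = i;
            break;
        }
    }
    output.truncate(truncateAt);
}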

Since UTF-8 byte sequences aren't longer than 4 bytes, it shouldn't take more than 4 loop iterations to find the correct position.
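
For trying the idea out without Qt, the same backward scan can be sketched in standard C++ on a std::string holding UTF-8 bytes. The helper name truncateUtf8 is hypothetical (it is not a Qt or standard library function), and the sketch assumes the input is already valid UTF-8:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Qt API): returns the longest prefix of `utf8`
// that is at most `limit` bytes long and does not end inside a multi-byte
// sequence. Assumes `utf8` is already valid UTF-8.
std::string truncateUtf8(const std::string& utf8, std::size_t limit)
{
    if (utf8.size() <= limit)
        return utf8;  // already fits, nothing to cut

    std::size_t cut = limit;
    // utf8[cut] is the first byte that would be dropped. If it is a
    // continuation byte (10xxxxxx), the cut falls inside a sequence, so
    // step back until it lands on a lead byte or an ASCII byte.
    while (cut > 0 && (static_cast<unsigned char>(utf8[cut]) & 0xC0) == 0x80)
        --cut;

    assert(cut <= limit);  // the scan only ever moves the cut backward
    return utf8.substr(0, cut);
}
```

With the two examples from the question and limit = 5, this keeps "How a" for the first text and the four bytes d0 9a d0 b0 ("Ка") for the second. The cast to unsigned char avoids relying on whether plain char is signed on a given platform.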

