How can I convert \xF0\x9F\x98\x83 to \u1F603 in PHP?
PS: It's an emoji (😃). I need its Unicode code point to use Twemoji.
2 answers
#1
Interesting: not much is out there for PHP. There seems to be a promising post, but unfortunately its accepted answer gives incorrect results in your case.
So here's a revised version of Adam's solution rewritten in PHP.
/**
 * Translates a sequence of UTF-8 bytes to their equivalent Unicode code points.
 * Each code point is prefixed with "\u" and written in uppercase hex,
 * zero-padded to at least 4 digits.
 *
 * @param string $utf8
 *
 * @return string
 *
 * @throws \Exception if a byte cannot start a UTF-8 sequence
 */
function utf8_to_unicode($utf8) {
    $i = 0;
    $l = strlen($utf8);
    $out = '';
    while ($i < $l) {
        if ((ord($utf8[$i]) & 0x80) === 0x00) {
            // 0xxxxxxx
            $n = ord($utf8[$i++]);
        } elseif ((ord($utf8[$i]) & 0xE0) === 0xC0) {
            // 110xxxxx 10xxxxxx
            $n =
                ((ord($utf8[$i++]) & 0x1F) << 6) |
                ((ord($utf8[$i++]) & 0x3F) << 0)
            ;
        } elseif ((ord($utf8[$i]) & 0xF0) === 0xE0) {
            // 1110xxxx 10xxxxxx 10xxxxxx
            $n =
                ((ord($utf8[$i++]) & 0x0F) << 12) |
                ((ord($utf8[$i++]) & 0x3F) << 6) |
                ((ord($utf8[$i++]) & 0x3F) << 0)
            ;
        } elseif ((ord($utf8[$i]) & 0xF8) === 0xF0) {
            // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            $n =
                ((ord($utf8[$i++]) & 0x07) << 18) |
                ((ord($utf8[$i++]) & 0x3F) << 12) |
                ((ord($utf8[$i++]) & 0x3F) << 6) |
                ((ord($utf8[$i++]) & 0x3F) << 0)
            ;
        } elseif ((ord($utf8[$i]) & 0xFC) === 0xF8) {
            // 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
            // (5- and 6-byte forms are obsolete since RFC 3629)
            $n =
                ((ord($utf8[$i++]) & 0x03) << 24) |
                ((ord($utf8[$i++]) & 0x3F) << 18) |
                ((ord($utf8[$i++]) & 0x3F) << 12) |
                ((ord($utf8[$i++]) & 0x3F) << 6) |
                ((ord($utf8[$i++]) & 0x3F) << 0)
            ;
        } elseif ((ord($utf8[$i]) & 0xFE) === 0xFC) {
            // 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
            $n =
                ((ord($utf8[$i++]) & 0x01) << 30) |
                ((ord($utf8[$i++]) & 0x3F) << 24) |
                ((ord($utf8[$i++]) & 0x3F) << 18) |
                ((ord($utf8[$i++]) & 0x3F) << 12) |
                ((ord($utf8[$i++]) & 0x3F) << 6) |
                ((ord($utf8[$i++]) & 0x3F) << 0)
            ;
        } else {
            throw new \Exception('Invalid UTF-8 lead byte');
        }
        // Uppercase hex, zero-padded to at least 4 digits (e.g. 41 -> 0041);
        // longer code points such as 1F603 are left unchanged.
        $n = str_pad(strtoupper(dechex($n)), 4, '0', STR_PAD_LEFT);
        // Single quotes make the literal "\u" explicit; inside double quotes
        // PHP 7+ reserves \u{...} as an escape sequence.
        $out .= sprintf('\u%s', $n);
    }
    return $out;
}
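To see what the four-byte branch computes, here is the same arithmetic applied by hand to the bytes from the question. This is a standalone sketch; $bytes is just an illustrative name, and the masks and shifts are the ones used in the 11110xxx branch above.

```php
<?php
// Worked example of the 4-byte branch (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx)
// applied to the bytes from the question.
$bytes = [0xF0, 0x9F, 0x98, 0x83];

$n = (($bytes[0] & 0x07) << 18)  // 0xF0 & 0x07 = 0x00 -> 0x00000
   | (($bytes[1] & 0x3F) << 12)  // 0x9F & 0x3F = 0x1F -> 0x1F000
   | (($bytes[2] & 0x3F) << 6)   // 0x98 & 0x3F = 0x18 -> 0x00600
   | (($bytes[3] & 0x3F) << 0);  // 0x83 & 0x3F = 0x03 -> 0x00003

printf("U+%X\n", $n); // prints U+1F603
```

Each continuation byte contributes its low 6 bits, and the lead byte contributes its low 3 bits, which is why the masks are 0x3F and 0x07.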
And in your case
php > var_dump(utf8_to_unicode("\xF0\x9F\x98\x83"));
string(7) "\u1F603"
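On PHP 7.4 or later with the mbstring extension enabled, a shorter route may be available: mb_str_split() splits the string into characters and mb_ord() returns each character's code point. The sketch below assumes that setup. utf8_to_unicode_mb() and twemoji_code() are hypothetical helper names, and the Twemoji format (lowercase hex code points joined with "-", as in the asset name 1f603.svg) is an assumption based on how Twemoji names its image files.

```php
<?php
// Sketch, assuming PHP >= 7.4 with mbstring: mb_str_split() splits a UTF-8
// string into characters, mb_ord() returns each character's code point.

/**
 * Hypothetical helper: "\u" + uppercase hex code point, at least 4 digits,
 * matching the output format of utf8_to_unicode() above.
 */
function utf8_to_unicode_mb(string $utf8): string
{
    $out = '';
    foreach (mb_str_split($utf8, 1, 'UTF-8') as $char) {
        $out .= sprintf('\u%04X', mb_ord($char, 'UTF-8'));
    }
    return $out;
}

/**
 * Hypothetical helper: Twemoji-style name, assumed to be lowercase hex
 * code points joined with "-" (e.g. "1f603" for the emoji in the question).
 */
function twemoji_code(string $utf8): string
{
    $parts = [];
    foreach (mb_str_split($utf8, 1, 'UTF-8') as $char) {
        $parts[] = dechex(mb_ord($char, 'UTF-8')); // dechex() yields lowercase
    }
    return implode('-', $parts);
}

var_dump(utf8_to_unicode_mb("\xF0\x9F\x98\x83")); // string(7) "\u1F603"
var_dump(twemoji_code("\xF0\x9F\x98\x83"));       // string(5) "1f603"
```

Both helpers emit one entry per code point, so emoji built from several code points (skin-tone modifiers, ZWJ sequences) produce several entries.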
#2
Use a combination of:
- stripcslashes() to convert \xFF byte escapes.
  That'll result in a string of UTF-8, because that's what it seemingly was originally.
- json_encode() to convert "😃" back to an \uFFFF Unicode escape,
  if that's what you want to end up with. (There isn't enough context in your question to tell.)
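A sketch of those two steps, assuming the input really is the escaped text \xF0\x9F\x98\x83 with literal backslashes (for example read from a file or a form field); the variable names are illustrative:

```php
<?php
// Assumed input: the literal escaped text, backslashes included.
$escaped = '\xF0\x9F\x98\x83'; // single quotes keep the backslashes as-is

// 1. stripcslashes() turns each \xHH escape into the raw byte 0xHH,
//    giving the 4-byte UTF-8 encoding of U+1F603.
$utf8 = stripcslashes($escaped);
var_dump(strlen($utf8)); // int(4)

// 2. json_encode() escapes non-ASCII characters as \uXXXX. Characters
//    outside the Basic Multilingual Plane, like this emoji, become a
//    UTF-16 surrogate pair.
$json = json_encode($utf8);
var_dump($json); // string(14) ""\ud83d\ude03""
```

Note that json_encode() yields the surrogate-pair form \ud83d\ude03 (the form JSON and JavaScript use), not \u1F603.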