10 solutions
#1
I'd like to know this too.
#2
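// Assumes this source file is saved as GBK; 'unicode' is mbstring's alias for UCS-2.
foreach (unpack(
    'n*',
    mb_convert_encoding('你好', 'unicode', 'gbk')
) as $i) {
    echo '\u', dechex($i);
}
Also, the "\u60a8\u597d" you have here is actually a Unicode code point escape, not UTF-8 encoding.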
#3
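Thanks, you really are an expert. Even colleagues at my company with seven or eight years of experience couldn't solve this, haha.
I have a few more questions.
First, what does 'n*' mean?
Second, I need to convert an entire article to UTF-8 encoding. For example, "你好 ,同学!" should produce "\u4f60\u597d, \u540c\u5b66!". Your program strips the spaces and also converts the punctuation. How can I get the result I want? The punctuation can wait for now, but the spaces must not be removed! Could you add me on QQ (70917176) so we can talk?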
#4
See the PHP manual:
pack() format characters
Code  Description
a     NUL-padded string
A     SPACE-padded string
h     Hex string, low nibble first
H     Hex string, high nibble first
c     signed char
C     unsigned char
s     signed short (always 16 bit, machine byte order)
S     unsigned short (always 16 bit, machine byte order)
n     unsigned short (always 16 bit, big endian byte order)
v     unsigned short (always 16 bit, little endian byte order)
i     signed integer (machine dependent size and byte order)
I     unsigned integer (machine dependent size and byte order)
l     signed long (always 32 bit, machine byte order)
L     unsigned long (always 32 bit, machine byte order)
N     unsigned long (always 32 bit, big endian byte order)
V     unsigned long (always 32 bit, little endian byte order)
f     float (machine dependent size and representation)
d     double (machine dependent size and representation)
x     NUL byte
X     Back up one byte
@     NUL-fill to absolute position
n* means the string is parsed as a sequence of 16-bit unsigned integers stored in big-endian byte order.
As for the second question: pick out the parts of the string that need converting, process them, and then join the pieces back together.
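To make the meaning of n* concrete, here is a minimal sketch. It assumes the mbstring extension is enabled and the file is saved as UTF-8, and it uses the explicit encoding name UTF-16BE rather than the 'unicode' alias from the earlier post.

```php
<?php
// Assumption: mbstring is enabled and this file is saved as UTF-8.
$utf16be = mb_convert_encoding('你好', 'UTF-16BE', 'UTF-8'); // bytes: 4F 60 59 7D

// 'n*' reads the byte string two bytes at a time, big endian, until it runs out.
// unpack() returns a 1-based array: [1 => 0x4F60, 2 => 0x597D].
$units = unpack('n*', $utf16be);

foreach ($units as $unit) {
    printf('\u%04x', $unit); // zero-padded to four hex digits
}
// prints \u4f60\u597d
```

Each array element is one 16-bit code unit, which is why the loop can print one \uXXXX escape per element.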
#5
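I understand the first question now, thanks. I still don't get the second one: the data unpack returns already has the spaces removed, so how do I keep them? Let's leave the punctuation issue aside for now.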
#6
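You can also use the UTF-8 to Unicode conversion rule.
A Chinese character is encoded as
1110xxxx 10xxxxxx 10xxxxxx
Strip the 1110, 10, and 10 prefixes from the three bytes of a UTF-8 Chinese character, and the remaining bits are its Unicode code point.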
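As a worked example of that rule, the sketch below decodes a single character by hand. It assumes the file is saved as UTF-8; the variable names are only for illustration.

```php
<?php
// Worked example for one 3-byte UTF-8 character (file assumed to be UTF-8).
$c = '你';                         // UTF-8 bytes: E4 BD A0

$b0 = ord($c[0]);                  // 0xE4 = 1110 0100 (lead byte)
$b1 = ord($c[1]);                  // 0xBD = 10 111101 (continuation byte)
$b2 = ord($c[2]);                  // 0xA0 = 10 100000 (continuation byte)

$codePoint = (($b0 & 0x0F) << 12)  // drop the 1110 prefix, keep 4 bits
           | (($b1 & 0x3F) << 6)   // drop the 10 prefix, keep 6 bits
           | ($b2 & 0x3F);         // drop the 10 prefix, keep 6 bits

printf('\u%04x', $codePoint);      // prints \u4f60
```

Each masked term is parenthesized before shifting because in PHP `<<` binds more tightly than `&`.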
#7
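// Make sure "您好,中国" is UTF-8. The /e modifier was removed in PHP 7, so preg_replace_callback is used instead.
echo preg_replace_callback('#[\x{4e00}-\x{9fa5}]#u', function ($m) {
    return chinese_unicode($m[0]);
}, "您好,中国");
function chinese_unicode($c) {
    // Each masked byte is parenthesized before shifting, because << binds tighter than &.
    // The lead byte 1110xxxx carries 4 payload bits, hence the 0x0f mask.
    return '\u' . dechex(((ord($c[0]) & 0x0f) << 12) + ((ord($c[1]) & 0x3f) << 6) + (ord($c[2]) & 0x3f));
}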
#8
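I don't follow. What do you mean the spaces get removed?
foreach (unpack(
    'n*',
    mb_convert_encoding(' 你 好 ', 'unicode', 'utf-8')
) as $i) {
    echo '\u', dechex($i);
}
Output: \u20\u4f60\u20\u597d\u20
The space is \u20. It just isn't zero-padded here; it should really be \u0020.
Just use printf('\\u%04X', $i); and you're done.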
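Putting the two suggestions together (convert only the parts that need it, and zero-pad with %04x), one possible sketch follows. The helper name escape_non_ascii is hypothetical and not from this thread; it assumes mbstring is enabled and the input is valid UTF-8, and it leaves ASCII characters such as spaces and half-width punctuation untouched.

```php
<?php
// Hypothetical helper, named here only for illustration.
// Assumes mbstring is enabled and $text is valid UTF-8.
function escape_non_ascii(string $text): string
{
    $result = preg_replace_callback(
        '/[^\x00-\x7F]/u', // match one non-ASCII character at a time
        function (array $m): string {
            $out = '';
            // One \uXXXX per UTF-16 code unit (two units for characters above U+FFFF).
            $units = unpack('n*', mb_convert_encoding($m[0], 'UTF-16BE', 'UTF-8'));
            foreach ($units as $unit) {
                $out .= sprintf('\u%04x', $unit);
            }
            return $out;
        },
        $text
    );

    // preg_replace_callback returns null on error (e.g. invalid UTF-8); fall back to the input.
    return $result ?? $text;
}

echo escape_non_ascii('你好 , 同学!');
// prints \u4f60\u597d , \u540c\u5b66!
```

Full-width punctuation such as "!" is non-ASCII, so this sketch would escape it too; excluding it would need a narrower character class, for example the [\x{4e00}-\x{9fa5}] range used earlier in the thread.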
#9
OK, problem solved!
#10
Thanks to both experts for the help! Closing the thread.