前言
本文主要给大家介绍了关于python3中全角和半角字符转换的相关内容,分享出来供大家参考学习,下面话不多说了,来一起看看详细的介绍吧。
一、背景介绍
解决什么问题:快速方便的对文本进行全角半角自动转换
适用什么场景:学生答题数据中全角字符替换为半角字符
二、全角半角原理
全角即:Double Byte Character,简称DBC
半角即:Single Byte Character,简称SBC
在 windows 中,中文和全角字符都占两个字节,并且使用了 ascii chart 2 (codes 128–255);
全角字符的第一个字节总是被置为 163,而第二个字节则是相同半角字符码加上128(不包括空格,全角空格和半角空格也要考虑进去);
对于中文来说,它的第一个字节被置为大于163,如'阿'为:176 162,检测到中文时不进行转换。
例如:半角 a 为 65,则全角 a 是 163(第一个字节)、193(第二个字节,128+65)。
全角半角示例:(文本 test.txt 包含全角和半角字符)
1
2
3
4
5
6
|
F:\test>type test.txt
123456
123456
abcdefg
abcdefg
中国你好
|
三、使用 Python3 实现全角半角转换
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
|
# -*- coding:utf-8 -*-
# i@mail.chenpeng.info
”'
全角即:Double Byte Character,简称:DBC
半角即:Single Byte Character,简称:SBC
”'
def DBC2SBC(ustring):
” ' 全角转半角 ”'
rstring = “”
for uchar in ustring:
inside_code = ord (uchar)
if inside_code = = 0x3000 :
inside_code = 0x0020
else :
inside_code - = 0xfee0
if not ( 0x0021 < = inside_code and inside_code < = 0x7e ):
rstring + = uchar
continue
rstring + = chr (inside_code)
return rstring
def SBC2DBC(ustring):
” ' 半角转全角 ”'
rstring = “”
for uchar in ustring:
inside_code = ord (uchar)
if inside_code = = 0x0020 :
inside_code = 0x3000
else :
if not ( 0x0021 < = inside_code and inside_code < = 0x7e ):
rstring + = uchar
continue
inside_code + = 0xfee0
rstring + = chr (inside_code)
return rstring
s = ”'
array(‘0 ' => ‘0' , ‘1 ' => ‘1' , ‘2 ' => ‘2' , ‘3 ' => ‘3' , ‘4 ' => ‘4' ,
‘5 ' => ‘5' , ‘6 ' => ‘6' , ‘7 ' => ‘7' , ‘8 ' => ‘8' , ‘9 ' => ‘9' ,
‘A ' => ‘A' , ‘B ' => ‘B' , ‘C ' => ‘C' , ‘D ' => ‘D' , ‘E ' => ‘E' ,
‘F ' => ‘F' , ‘G ' => ‘G' , ‘H ' => ‘H' , ‘I ' => ‘I' , ‘J ' => ‘J' ,
‘K ' => ‘K' , ‘L ' => ‘L' , ‘M ' => ‘M' , ‘N ' => ‘N' , ‘O ' => ‘O' ,
‘P ' => ‘P' , ‘Q ' => ‘Q' , ‘R ' => ‘R' , ‘S ' => ‘S' , ‘T ' => ‘T' ,
‘U ' => ‘U' , ‘V ' => ‘V' , ‘W ' => ‘W' , ‘X ' => ‘X' , ‘Y ' => ‘Y' ,
‘Z ' => ‘Z' , ‘a ' => ‘a' , ‘b ' => ‘b' , ‘c ' => ‘c' , ‘d ' => ‘d' ,
‘e ' => ‘e' , ‘f ' => ‘f' , ‘g ' => ‘g' , ‘h ' => ‘h' , ‘i ' => ‘i' ,
‘j ' => ‘j' , ‘k ' => ‘k' , ‘l ' => ‘l' , ‘m ' => ‘m' , ‘n ' => ‘n' ,
‘o ' => ‘o' , ‘p ' => ‘p' , ‘q ' => ‘q' , ‘r ' => ‘r' , ‘s ' => ‘s' ,
‘t ' => ‘t' , ‘u ' => ‘u' , ‘v ' => ‘v' , ‘w ' => ‘w' , ‘x ' => ‘x' ,
‘y ' => ‘y' , ‘z ' => ‘z' ,
‘( ' => ‘(‘, ‘)' = > ‘) ', ‘〔' = > ‘[‘, ‘〕 ' => ‘]' , ‘【' = > ‘[‘,
‘】 ' => ‘]' , ‘〖 ' => ‘[‘, ‘〗' = > ‘] ', ‘”‘ => ‘[‘, ‘”‘ => ‘]' ,
‘\” = > ‘[‘, ‘\” = > ‘] ', ‘{' = > ‘{‘, ‘} ' => ‘}' , ‘《' = > ‘<‘,
‘》 ' => ‘>' ,
‘% ' => ‘%' , ‘+ ' => ‘+' , ‘— ' => ‘-‘, ‘-' = > ‘ - ‘, ‘~' = > ‘ - ‘,
‘: ' => ‘:' , ‘。 ' => ‘.' , ‘、 ' => ‘,' , ‘, ' => ‘.' , ‘、 ' => ‘.' ,
‘; ' => ‘,' , ‘? ' => ‘?' , ‘! ' => ‘!' , ‘… ' => ‘-‘, ‘‖' = > ‘|',
‘”‘ = > ‘”‘, ‘\” = > ‘` ', ‘\” => ‘`' , ‘| ' => ‘|' , ‘〃' = > ‘”‘,
‘ ' = > ‘ ‘);
”'
# 全角转半角
print (DBC2SBC(s))
# 半角转全角
print (SBC2DBC(s))
s = ” '中文测试”'
# 全角转半角
print (DBC2SBC(s))
# 半角转全角
print (SBC2DBC(s))
|
四、总结
以上就是这篇文章的全部内容了,希望本文的内容对大家的学习或者工作能带来一定的帮助,如果有疑问大家可以留言交流,谢谢大家对服务器之家的支持。
五、参考资料
http://thinkerou.com/2015-06/covert-dbc-sbc/
原文链接:http://chenpeng.info/html/3795