File name: Universal Character Set Detector
File size: 315 KB
File format: 7Z
Updated: 2017-09-11 10:44:07
Code page detection
# Universal Character Set Detector (UCSD)

A module for automatic text-encoding detection, extracted from Mozilla's and siuying's code. Detection is based mainly on character-frequency analysis. It is fairly accurate for CJK text in ANSI (legacy multibyte) encodings, but it performs poorly on UTF-16 data streams that have no BOM.

The code comes from [siuying/UniversalDetector][1] and [Mozilla][2]. Thanks!

## Known character sets

As of the most recent update, the library can return the following character sets:

- Big5
- EUC-JP
- EUC-KR
- GB18030
- gb18030
- HZ-GB-2312
- IBM855
- IBM866
- ISO-2022-CN
- ISO-2022-JP
- ISO-2022-KR
- ISO-8859-2
- ISO-8859-5
- ISO-8859-7
- ISO-8859-8
- KOI8-R
- Shift_JIS
- TIS-620
- UTF-8
- UTF-16BE
- UTF-16LE
- UTF-32BE
- UTF-32LE
- windows-1250
- windows-1251
- windows-1252
- windows-1253
- windows-1255
- x-euc-tw
- X-ISO-10646-UCS-4-2143
- X-ISO-10646-UCS-4-3412
- x-mac-cyrillic

## Licensing

The license depends on the Mozilla UCSD code, so it is probably [MPL 2.0][3].

[1]: https://github.com/siuying/UniversalDetector
[2]: http://www-archive.mozilla.org/projects/intl/detectorsrc.html
[3]: http://mozilla.org/MPL/2.0/
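
## Example: checking for a BOM

Because detection is weak on UTF-16 streams without a BOM, a caller may want to know whether a buffer starts with a BOM before trusting the result. The following is a minimal Python sketch of such a check, not code from this module. The function name `sniff_bom` and the table `BOM_TABLE` are hypothetical names introduced for illustration. The byte sequences are the standard Unicode byte order marks, and mapping the two unusual 4-byte orders to the `X-ISO-10646-UCS-4-*` names from the list above is an assumption.

```python
from typing import Optional

# Byte order marks, 4-byte forms first so they are tested before the
# 2-byte UTF-16 marks that share the same leading bytes.
# Assumption: the two unusual byte orders map to the X-ISO-10646-UCS-4-*
# names from the character set list.
BOM_TABLE = [
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\x00\x00\xff\xfe", "X-ISO-10646-UCS-4-2143"),
    (b"\xfe\xff\x00\x00", "X-ISO-10646-UCS-4-3412"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the charset name implied by a leading BOM, or None if absent."""
    for bom, name in BOM_TABLE:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    with_bom = b"\xff\xfe" + "text".encode("utf-16-le")
    without_bom = "text".encode("utf-16-le")
    print(sniff_bom(with_bom))
    print(sniff_bom(without_bom))
```

When `sniff_bom` returns `None`, the buffer has no BOM and the frequency-based result should be treated with more caution, especially if UTF-16 input is possible.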