Linux下实用的文件编码转换命令工具iconv

时间:2022-04-06 08:39:18

Linux下实用的文件编码转换命令工具iconv

 

一、iconv命令的手册如下:

wzb@zeus:$ man iconv | cat
ICONV(1)                       Debian GNU/Linux                       ICONV(1)



NAME
       iconv - Convert encoding of given files from one encoding to another

SYNOPSIS
       iconv -f encoding [-t encoding] [inputfile]...

DESCRIPTION
       The iconv program converts the encoding of characters in inputfile, or
       from the standard input if no filename is specified, from one coded
       character set to another. The result is written to standard output
       unless otherwise specified by the --output option.

       --from-code, -f encoding
              Convert characters from encoding.

       --to-code, -t encoding
              Convert characters to encoding. If not specified the encoding
              corresponding to the current locale is used.

       --list, -l
              List known coded character sets.

       -c     Omit invalid characters from output.

       --output, -o file
              Specify output file (instead of stdout).

       --silent, -s
              Suppress warnings, but not errors.

       --verbose
              Print progress information.

       --help, -?
              Give help list.

       --usage
              Give a short usage message.

       --version, -V
              Print program version.

ENCODINGS
       The values permitted for --from-code and --to-code can be listed by the
       iconv --list command, and all combinations of the listed values are
       supported. Furthermore the following two suffixes are supported:

       //TRANSLIT
              When the string "//TRANSLIT" is appended to --to-code,
              transliteration is activated.  This means that when a character
              cannot be represented in the target character set, it can be
              approximated through one or several similarly looking
              characters.

       //IGNORE
              When the string "//IGNORE" is appended to --to-code, characters
              that cannot be represented in the target character set will be
              silently discarded.


AUTHOR
       iconv was written by Ulrich Drepper as part of the GNU C Library.

       This man page was written by Joel Klecker <espy@debian.org>, for the
       Debian GNU/Linux system.










3rd Berkeley Distribution            lenny                            ICONV(1)


二、iconv支持的文件格式:

wzb@zeus:~$ iconv -l
The following list contain all the coded character sets known.  This does
not necessarily mean that all combinations of these names can be used for
the FROM and TO command line parameters.  One coded character set can be
listed with several different names (aliases).

  437, 500, 500V1, 850, 851, 852, 855, 856, 857, 860, 861, 862, 863, 864, 865,
  866, 866NAV, 869, 874, 904, 1026, 1046, 1047, 8859_1, 8859_2, 8859_3, 8859_4,
  8859_5, 8859_6, 8859_7, 8859_8, 8859_9, 10646-1:1993, 10646-1:1993/UCS4,
  ANSI_X3.4-1968, ANSI_X3.4-1986, ANSI_X3.4, ANSI_X3.110-1983, ANSI_X3.110,
  ARABIC, ARABIC7, ARMSCII-8, ASCII, ASMO-708, ASMO_449, BALTIC, BIG-5,
  BIG-FIVE, BIG5-HKSCS, BIG5, BIG5HKSCS, BIGFIVE, BRF, BS_4730, CA, CN-BIG5,
  CN-GB, CN, CP-AR, CP-GR, CP-HU, CP037, CP038, CP273, CP274, CP275, CP278,
  CP280, CP281, CP282, CP284, CP285, CP290, CP297, CP367, CP420, CP423, CP424,
  CP437, CP500, CP737, CP775, CP803, CP813, CP819, CP850, CP851, CP852, CP855,
  CP856, CP857, CP860, CP861, CP862, CP863, CP864, CP865, CP866, CP866NAV,
  CP868, CP869, CP870, CP871, CP874, CP875, CP880, CP891, CP901, CP902, CP903,
  CP904, CP905, CP912, CP915, CP916, CP918, CP920, CP921, CP922, CP930, CP932,
  CP933, CP935, CP936, CP937, CP939, CP949, CP950, CP1004, CP1008, CP1025,
  CP1026, CP1046, CP1047, CP1070, CP1079, CP1081, CP1084, CP1089, CP1097,
  CP1112, CP1122, CP1123, CP1124, CP1125, CP1129, CP1130, CP1132, CP1133,
  CP1137, CP1140, CP1141, CP1142, CP1143, CP1144, CP1145, CP1146, CP1147,
  CP1148, CP1149, CP1153, CP1154, CP1155, CP1156, CP1157, CP1158, CP1160,
  CP1161, CP1162, CP1163, CP1164, CP1166, CP1167, CP1250, CP1251, CP1252,
  CP1253, CP1254, CP1255, CP1256, CP1257, CP1258, CP1282, CP1361, CP1364,
  CP1371, CP1388, CP1390, CP1399, CP4517, CP4899, CP4909, CP4971, CP5347,
  CP9030, CP9066, CP9448, CP10007, CP12712, CP16804, CPIBM861, CSA7-1, CSA7-2,
  CSASCII, CSA_T500-1983, CSA_T500, CSA_Z243.4-1985-1, CSA_Z243.4-1985-2,
  CSA_Z243.419851, CSA_Z243.419852, CSDECMCS, CSEBCDICATDE, CSEBCDICATDEA,
  CSEBCDICCAFR, CSEBCDICDKNO, CSEBCDICDKNOA, CSEBCDICES, CSEBCDICESA,
  CSEBCDICESS, CSEBCDICFISE, CSEBCDICFISEA, CSEBCDICFR, CSEBCDICIT, CSEBCDICPT,
  CSEBCDICUK, CSEBCDICUS, CSEUCKR, CSEUCPKDFMTJAPANESE, CSGB2312, CSHPROMAN8,
  CSIBM037, CSIBM038, CSIBM273, CSIBM274, CSIBM275, CSIBM277, CSIBM278,
  CSIBM280, CSIBM281, CSIBM284, CSIBM285, CSIBM290, CSIBM297, CSIBM420,
  CSIBM423, CSIBM424, CSIBM500, CSIBM803, CSIBM851, CSIBM855, CSIBM856,
  CSIBM857, CSIBM860, CSIBM863, CSIBM864, CSIBM865, CSIBM866, CSIBM868,
  CSIBM869, CSIBM870, CSIBM871, CSIBM880, CSIBM891, CSIBM901, CSIBM902,
  CSIBM903, CSIBM904, CSIBM905, CSIBM918, CSIBM921, CSIBM922, CSIBM930,
  CSIBM932, CSIBM933, CSIBM935, CSIBM937, CSIBM939, CSIBM943, CSIBM1008,
  CSIBM1025, CSIBM1026, CSIBM1097, CSIBM1112, CSIBM1122, CSIBM1123, CSIBM1124,
  CSIBM1129, CSIBM1130, CSIBM1132, CSIBM1133, CSIBM1137, CSIBM1140, CSIBM1141,
  CSIBM1142, CSIBM1143, CSIBM1144, CSIBM1145, CSIBM1146, CSIBM1147, CSIBM1148,
  CSIBM1149, CSIBM1153, CSIBM1154, CSIBM1155, CSIBM1156, CSIBM1157, CSIBM1158,
  CSIBM1160, CSIBM1161, CSIBM1163, CSIBM1164, CSIBM1166, CSIBM1167, CSIBM1364,
  CSIBM1371, CSIBM1388, CSIBM1390, CSIBM1399, CSIBM4517, CSIBM4899, CSIBM4909,
  CSIBM4971, CSIBM5347, CSIBM9030, CSIBM9066, CSIBM9448, CSIBM12712,
  CSIBM16804, CSIBM11621162, CSISO4UNITEDKINGDOM, CSISO10SWEDISH,
  CSISO11SWEDISHFORNAMES, CSISO14JISC6220RO, CSISO15ITALIAN, CSISO16PORTUGESE,
  CSISO17SPANISH, CSISO18GREEK7OLD, CSISO19LATINGREEK, CSISO21GERMAN,
  CSISO25FRENCH, CSISO27LATINGREEK1, CSISO49INIS, CSISO50INIS8,
  CSISO51INISCYRILLIC, CSISO58GB1988, CSISO60DANISHNORWEGIAN,
  CSISO60NORWEGIAN1, CSISO61NORWEGIAN2, CSISO69FRENCH, CSISO84PORTUGUESE2,
  CSISO85SPANISH2, CSISO86HUNGARIAN, CSISO88GREEK7, CSISO89ASMO449, CSISO90,
  CSISO92JISC62991984B, CSISO99NAPLPS, CSISO103T618BIT, CSISO111ECMACYRILLIC,
  CSISO121CANADIAN1, CSISO122CANADIAN2, CSISO139CSN369103, CSISO141JUSIB1002,
  CSISO143IECP271, CSISO150, CSISO150GREEKCCITT, CSISO151CUBA,
  CSISO153GOST1976874, CSISO646DANISH, CSISO2022CN, CSISO2022JP, CSISO2022JP2,
  CSISO2022KR, CSISO2033, CSISO5427CYRILLIC, CSISO5427CYRILLIC1981,
  CSISO5428GREEK, CSISO10367BOX, CSISOLATIN1, CSISOLATIN2, CSISOLATIN3,
  CSISOLATIN4, CSISOLATIN5, CSISOLATIN6, CSISOLATINARABIC, CSISOLATINCYRILLIC,
  CSISOLATINGREEK, CSISOLATINHEBREW, CSKOI8R, CSKSC5636, CSMACINTOSH,
  CSNATSDANO, CSNATSSEFI, CSN_369103, CSPC8CODEPAGE437, CSPC775BALTIC,
  CSPC850MULTILINGUAL, CSPC862LATINHEBREW, CSPCP852, CSSHIFTJIS, CSUCS4,
  CSUNICODE, CSWINDOWS31J, CUBA, CWI-2, CWI, CYRILLIC, DE, DEC-MCS, DEC,
  DECMCS, DIN_66003, DK, DS2089, DS_2089, E13B, EBCDIC-AT-DE-A, EBCDIC-AT-DE,
  EBCDIC-BE, EBCDIC-BR, EBCDIC-CA-FR, EBCDIC-CP-AR1, EBCDIC-CP-AR2,
  EBCDIC-CP-BE, EBCDIC-CP-CA, EBCDIC-CP-CH, EBCDIC-CP-DK, EBCDIC-CP-ES,
  EBCDIC-CP-FI, EBCDIC-CP-FR, EBCDIC-CP-GB, EBCDIC-CP-GR, EBCDIC-CP-HE,
  EBCDIC-CP-IS, EBCDIC-CP-IT, EBCDIC-CP-NL, EBCDIC-CP-NO, EBCDIC-CP-ROECE,
  EBCDIC-CP-SE, EBCDIC-CP-TR, EBCDIC-CP-US, EBCDIC-CP-WT, EBCDIC-CP-YU,
  EBCDIC-CYRILLIC, EBCDIC-DK-NO-A, EBCDIC-DK-NO, EBCDIC-ES-A, EBCDIC-ES-S,
  EBCDIC-ES, EBCDIC-FI-SE-A, EBCDIC-FI-SE, EBCDIC-FR, EBCDIC-GREEK, EBCDIC-INT,
  EBCDIC-INT1, EBCDIC-IS-FRISS, EBCDIC-IT, EBCDIC-JP-E, EBCDIC-JP-KANA,
  EBCDIC-PT, EBCDIC-UK, EBCDIC-US, EBCDICATDE, EBCDICATDEA, EBCDICCAFR,
  EBCDICDKNO, EBCDICDKNOA, EBCDICES, EBCDICESA, EBCDICESS, EBCDICFISE,
  EBCDICFISEA, EBCDICFR, EBCDICISFRISS, EBCDICIT, EBCDICPT, EBCDICUK, EBCDICUS,
  ECMA-114, ECMA-118, ECMA-128, ECMA-CYRILLIC, ECMACYRILLIC, ELOT_928, ES, ES2,
  EUC-CN, EUC-JISX0213, EUC-JP-MS, EUC-JP, EUC-KR, EUC-TW, EUCCN, EUCJP-MS,
  EUCJP-OPEN, EUCJP-WIN, EUCJP, EUCKR, EUCTW, FI, FR, GB, GB2312, GB13000,
  GB18030, GBK, GB_1988-80, GB_198880, GEORGIAN-ACADEMY, GEORGIAN-PS,
  GOST_19768-74, GOST_19768, GOST_1976874, GREEK-CCITT, GREEK, GREEK7-OLD,
  GREEK7, GREEK7OLD, GREEK8, GREEKCCITT, HEBREW, HP-GREEK8, HP-ROMAN8,
  HP-ROMAN9, HP-THAI8, HP-TURKISH8, HPGREEK8, HPROMAN8, HPROMAN9, HPTHAI8,
  HPTURKISH8, HU, IBM-803, IBM-856, IBM-901, IBM-902, IBM-921, IBM-922,
  IBM-930, IBM-932, IBM-933, IBM-935, IBM-937, IBM-939, IBM-943, IBM-1008,
  IBM-1025, IBM-1046, IBM-1047, IBM-1097, IBM-1112, IBM-1122, IBM-1123,
  IBM-1124, IBM-1129, IBM-1130, IBM-1132, IBM-1133, IBM-1137, IBM-1140,
  IBM-1141, IBM-1142, IBM-1143, IBM-1144, IBM-1145, IBM-1146, IBM-1147,
  IBM-1148, IBM-1149, IBM-1153, IBM-1154, IBM-1155, IBM-1156, IBM-1157,
  IBM-1158, IBM-1160, IBM-1161, IBM-1162, IBM-1163, IBM-1164, IBM-1166,
  IBM-1167, IBM-1364, IBM-1371, IBM-1388, IBM-1390, IBM-1399, IBM-4517,
  IBM-4899, IBM-4909, IBM-4971, IBM-5347, IBM-9030, IBM-9066, IBM-9448,
  IBM-12712, IBM-16804, IBM037, IBM038, IBM256, IBM273, IBM274, IBM275, IBM277,
  IBM278, IBM280, IBM281, IBM284, IBM285, IBM290, IBM297, IBM367, IBM420,
  IBM423, IBM424, IBM437, IBM500, IBM775, IBM803, IBM813, IBM819, IBM848,
  IBM850, IBM851, IBM852, IBM855, IBM856, IBM857, IBM860, IBM861, IBM862,
  IBM863, IBM864, IBM865, IBM866, IBM866NAV, IBM868, IBM869, IBM870, IBM871,
  IBM874, IBM875, IBM880, IBM891, IBM901, IBM902, IBM903, IBM904, IBM905,
  IBM912, IBM915, IBM916, IBM918, IBM920, IBM921, IBM922, IBM930, IBM932,
  IBM933, IBM935, IBM937, IBM939, IBM943, IBM1004, IBM1008, IBM1025, IBM1026,
  IBM1046, IBM1047, IBM1089, IBM1097, IBM1112, IBM1122, IBM1123, IBM1124,
  IBM1129, IBM1130, IBM1132, IBM1133, IBM1137, IBM1140, IBM1141, IBM1142,
  IBM1143, IBM1144, IBM1145, IBM1146, IBM1147, IBM1148, IBM1149, IBM1153,
  IBM1154, IBM1155, IBM1156, IBM1157, IBM1158, IBM1160, IBM1161, IBM1162,
  IBM1163, IBM1164, IBM1166, IBM1167, IBM1364, IBM1371, IBM1388, IBM1390,
  IBM1399, IBM4517, IBM4899, IBM4909, IBM4971, IBM5347, IBM9030, IBM9066,
  IBM9448, IBM12712, IBM16804, IEC_P27-1, IEC_P271, INIS-8, INIS-CYRILLIC,
  INIS, INIS8, INISCYRILLIC, ISIRI-3342, ISIRI3342, ISO-2022-CN-EXT,
  ISO-2022-CN, ISO-2022-JP-2, ISO-2022-JP-3, ISO-2022-JP, ISO-2022-KR,
  ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6,
  ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-9E, ISO-8859-10, ISO-8859-11,
  ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16, ISO-10646,
  ISO-10646/UCS2, ISO-10646/UCS4, ISO-10646/UTF-8, ISO-10646/UTF8, ISO-CELTIC,
  ISO-IR-4, ISO-IR-6, ISO-IR-8-1, ISO-IR-9-1, ISO-IR-10, ISO-IR-11, ISO-IR-14,
  ISO-IR-15, ISO-IR-16, ISO-IR-17, ISO-IR-18, ISO-IR-19, ISO-IR-21, ISO-IR-25,
  ISO-IR-27, ISO-IR-37, ISO-IR-49, ISO-IR-50, ISO-IR-51, ISO-IR-54, ISO-IR-55,
  ISO-IR-57, ISO-IR-60, ISO-IR-61, ISO-IR-69, ISO-IR-84, ISO-IR-85, ISO-IR-86,
  ISO-IR-88, ISO-IR-89, ISO-IR-90, ISO-IR-92, ISO-IR-98, ISO-IR-99, ISO-IR-100,
  ISO-IR-101, ISO-IR-103, ISO-IR-109, ISO-IR-110, ISO-IR-111, ISO-IR-121,
  ISO-IR-122, ISO-IR-126, ISO-IR-127, ISO-IR-138, ISO-IR-139, ISO-IR-141,
  ISO-IR-143, ISO-IR-144, ISO-IR-148, ISO-IR-150, ISO-IR-151, ISO-IR-153,
  ISO-IR-155, ISO-IR-156, ISO-IR-157, ISO-IR-166, ISO-IR-179, ISO-IR-193,
  ISO-IR-197, ISO-IR-199, ISO-IR-203, ISO-IR-209, ISO-IR-226, ISO/TR_11548-1,
  ISO646-CA, ISO646-CA2, ISO646-CN, ISO646-CU, ISO646-DE, ISO646-DK, ISO646-ES,
  ISO646-ES2, ISO646-FI, ISO646-FR, ISO646-FR1, ISO646-GB, ISO646-HU,
  ISO646-IT, ISO646-JP-OCR-B, ISO646-JP, ISO646-KR, ISO646-NO, ISO646-NO2,
  ISO646-PT, ISO646-PT2, ISO646-SE, ISO646-SE2, ISO646-US, ISO646-YU,
  ISO2022CN, ISO2022CNEXT, ISO2022JP, ISO2022JP2, ISO2022KR, ISO6937,
  ISO8859-1, ISO8859-2, ISO8859-3, ISO8859-4, ISO8859-5, ISO8859-6, ISO8859-7,
  ISO8859-8, ISO8859-9, ISO8859-9E, ISO8859-10, ISO8859-11, ISO8859-13,
  ISO8859-14, ISO8859-15, ISO8859-16, ISO11548-1, ISO88591, ISO88592, ISO88593,
  ISO88594, ISO88595, ISO88596, ISO88597, ISO88598, ISO88599, ISO88599E,
  ISO885910, ISO885911, ISO885913, ISO885914, ISO885915, ISO885916,
  ISO_646.IRV:1991, ISO_2033-1983, ISO_2033, ISO_5427-EXT, ISO_5427,
  ISO_5427:1981, ISO_5427EXT, ISO_5428, ISO_5428:1980, ISO_6937-2,
  ISO_6937-2:1983, ISO_6937, ISO_6937:1992, ISO_8859-1, ISO_8859-1:1987,
  ISO_8859-2, ISO_8859-2:1987, ISO_8859-3, ISO_8859-3:1988, ISO_8859-4,
  ISO_8859-4:1988, ISO_8859-5, ISO_8859-5:1988, ISO_8859-6, ISO_8859-6:1987,
  ISO_8859-7, ISO_8859-7:1987, ISO_8859-7:2003, ISO_8859-8, ISO_8859-8:1988,
  ISO_8859-9, ISO_8859-9:1989, ISO_8859-9E, ISO_8859-10, ISO_8859-10:1992,
  ISO_8859-14, ISO_8859-14:1998, ISO_8859-15, ISO_8859-15:1998, ISO_8859-16,
  ISO_8859-16:2001, ISO_9036, ISO_10367-BOX, ISO_10367BOX, ISO_11548-1,
  ISO_69372, IT, JIS_C6220-1969-RO, JIS_C6229-1984-B, JIS_C62201969RO,
  JIS_C62291984B, JOHAB, JP-OCR-B, JP, JS, JUS_I.B1.002, KOI-7, KOI-8, KOI8-R,
  KOI8-RU, KOI8-T, KOI8-U, KOI8, KOI8R, KOI8U, KSC5636, L1, L2, L3, L4, L5, L6,
  L7, L8, L10, LATIN-9, LATIN-GREEK-1, LATIN-GREEK, LATIN1, LATIN2, LATIN3,
  LATIN4, LATIN5, LATIN6, LATIN7, LATIN8, LATIN9, LATIN10, LATINGREEK,
  LATINGREEK1, MAC-CENTRALEUROPE, MAC-CYRILLIC, MAC-IS, MAC-SAMI, MAC-UK, MAC,
  MACCYRILLIC, MACINTOSH, MACIS, MACUK, MACUKRAINIAN, MIK, MS-ANSI, MS-ARAB,
  MS-CYRL, MS-EE, MS-GREEK, MS-HEBR, MS-MAC-CYRILLIC, MS-TURK, MS932, MS936,
  MSCP949, MSCP1361, MSMACCYRILLIC, MSZ_7795.3, MS_KANJI, NAPLPS, NATS-DANO,
  NATS-SEFI, NATSDANO, NATSSEFI, NC_NC0010, NC_NC00-10, NC_NC00-10:81,
  NF_Z_62-010, NF_Z_62-010_(1973), NF_Z_62-010_1973, NF_Z_62010,
  NF_Z_62010_1973, NO, NO2, NS_4551-1, NS_4551-2, NS_45511, NS_45512,
  OS2LATIN1, OSF00010001, OSF00010002, OSF00010003, OSF00010004, OSF00010005,
  OSF00010006, OSF00010007, OSF00010008, OSF00010009, OSF0001000A, OSF00010020,
  OSF00010100, OSF00010101, OSF00010102, OSF00010104, OSF00010105, OSF00010106,
  OSF00030010, OSF0004000A, OSF0005000A, OSF05010001, OSF100201A4, OSF100201A8,
  OSF100201B5, OSF100201F4, OSF100203B5, OSF1002011C, OSF1002011D, OSF1002035D,
  OSF1002035E, OSF1002035F, OSF1002036B, OSF1002037B, OSF10010001, OSF10010004,
  OSF10010006, OSF10020025, OSF10020111, OSF10020115, OSF10020116, OSF10020118,
  OSF10020122, OSF10020129, OSF10020352, OSF10020354, OSF10020357, OSF10020359,
  OSF10020360, OSF10020364, OSF10020365, OSF10020366, OSF10020367, OSF10020370,
  OSF10020387, OSF10020388, OSF10020396, OSF10020402, OSF10020417, PT, PT2,
  PT154, R8, R9, RK1048, ROMAN8, ROMAN9, RUSCII, SE, SE2, SEN_850200_B,
  SEN_850200_C, SHIFT-JIS, SHIFT_JIS, SHIFT_JISX0213, SJIS-OPEN, SJIS-WIN,
  SJIS, SS636127, STRK1048-2002, ST_SEV_358-88, T.61-8BIT, T.61, T.618BIT,
  TCVN-5712, TCVN, TCVN5712-1, TCVN5712-1:1993, THAI8, TIS-620, TIS620-0,
  TIS620.2529-1, TIS620.2533-0, TIS620, TS-5881, TSCII, TURKISH8, UCS-2,
  UCS-2BE, UCS-2LE, UCS-4, UCS-4BE, UCS-4LE, UCS2, UCS4, UHC, UJIS, UK,
  UNICODE, UNICODEBIG, UNICODELITTLE, US-ASCII, US, UTF-7, UTF-8, UTF-16,
  UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, UTF7, UTF8, UTF16, UTF16BE,
  UTF16LE, UTF32, UTF32BE, UTF32LE, VISCII, WCHAR_T, WIN-SAMI-2, WINBALTRIM,
  WINDOWS-31J, WINDOWS-874, WINDOWS-936, WINDOWS-1250, WINDOWS-1251,
  WINDOWS-1252, WINDOWS-1253, WINDOWS-1254, WINDOWS-1255, WINDOWS-1256,
  WINDOWS-1257, WINDOWS-1258, WINSAMI2, WS2, YU
wzb@zeus:~$


三、iconv的用法:

wzb@zeus:~$ iconv -?
Usage: iconv [OPTION...] [FILE...]
Convert encoding of given files from one encoding to another.

 Input/Output format specification:
  -f, --from-code=NAME       encoding of original text
  -t, --to-code=NAME         encoding for output

 Information:
  -l, --list                 list all known coded character sets

 Output control:
  -c                         omit invalid characters from output
  -o, --output=FILE          output file
  -s, --silent               suppress warnings
      --verbose              print progress information

  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version

Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.

For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>.
wzb@zeus:~$

 

四、用iconv进行文件进行处理举例和对文件进行批处理shell脚本:

1. 在命令行, 转换文件filename内容由 GBK  到 UTF8保存到newfile:

   命令: iconv  -f gbk -t utf8  filename  -o  newfile

2. 使用shell脚本,批量转换文件内容由 GBK 到 UTF8:

for i in `find . *`

do

if [ -f "$i" ] then

  iconv -f gb2312 -t utf8 $i > "./converted/$i"

fi;

done

 

for i in `find . *`

do 

if [ -f "$i" ] then

  iconv -f gb2312 -t utf8 $i > "./converted/$i" 

fi;

done


五、iconv的编程接口相关

  头文件"inconv.h"。iconv命令可以将一种已知的字符集文件转换成另一种已知的字符集文件。

  它的作用是在多种国际编码格式之间进行文本内码的转换。

       

        iconv是一个计算机程序以及一套应用程序编程接口的名称。

  作为应用程序的iconv采用命令行界面,允许将某种特定编码的文件转换为另一种编码。

  iconv基于GPL公开源代码,是GNU项目的一部分。在各种UNIX操作系统下均可使用,

而在Windows系统,需要特殊的环境如cygwin或者GnuWin32等软件平台下方可使用。现在在SourceForge上也有运行于Windows系统的,需要同时安装gettext程序。

  目前版本为2.3.26,支持的内码包括:Unicode相关编码,如UTF-8、UTF-16等等,各国采用的ANSI编码,其中包括GB2312BIG5等中文编码方式。

  作为编程接口的iconv包括3个函数:

  (1). iconv_open 函数用于初始化用于转换的内部缓冲区,需要指明需要从何种编码方式转换到哪一种。

   (2). iconv 函数进行实际的转换,需要给出两个间接缓冲区指针和剩余字节数指针。该函数需要更新所有相关信息,因此将不可改写的指针传递给iconv是错误的。

    (3). iconv_close 函数释放 iconv_open函数的缓冲区。

 

六、其他支持:perl中 piconv: -- iconv(1), reinvented in perl。

  piconv是流模式的,处理超大文件比较方便. convmv是给文件名重命名的,windows和linux系统间切换后尤其有用。

wzb@zeus:~$ man piconv | cat
PICONV(1)              Perl Programmers Reference Guide              PICONV(1)



NAME
       piconv -- iconv(1), reinvented in perl

SYNOPSIS
         piconv [-f from_encoding] [-t to_encoding] [-s string] [files...]
         piconv -l
         piconv [-C N|-c|-p]
         piconv -S scheme ...
         piconv -r encoding
         piconv -D ...
         piconv -h

DESCRIPTION
       piconv is perl version of iconv, a character encoding converter widely
       available for various Unixen today.  This script was primarily a
       technology demonstrator for Perl 5.8.0, but you can use piconv in the
       place of iconv for virtually any case.

       piconv converts the character encoding of either STDIN or files
       specified in the argument and prints out to STDOUT.

       Here is the list of options.  Each option can be in short format (-f)
       or long (--from).

       -f,--from from_encoding
           Specifies the encoding you are converting from.  Unlike iconv, this
           option can be omitted.  In such cases, the current locale is used.

       -t,--to to_encoding
           Specifies the encoding you are converting to.  Unlike iconv, this
           option can be omitted.  In such cases, the current locale is used.

           Therefore, when both -f and -t are omitted, piconv just acts like
           cat.

       -s,--string string
           uses string instead of file for the source of text.

       -l,--list
           Lists all available encodings, one per line, in case-insensitive
           order.  Note that only the canonical names are listed; many aliases
           exist.  For example, the names are case-insensitive, and many
           standard and common aliases work, such as "latin1" for
           "ISO-8859-1", or "ibm850" instead of "cp850", or "winlatin1" for
           "cp1252".  See Encode::Supported for a full discussion.

       -C,--check N
           Check the validity of the stream if N = 1.  When N = -1, something
           interesting happens when it encounters an invalid character.

       -c  Same as "-C 1".

       -p,--perlqq
       --htmlcref
       --xmlcref
           Applies PERLQQ, HTMLCREF, XMLCREF, respectively.  Try

             piconv -f utf8 -t ascii --perlqq

           To see what it does.

       -h,--help
           Show usage.

       -D,--debug
           Invokes debugging mode.  Primarily for Encode hackers.

       -S,--scheme scheme
           Selects which scheme is to be used for conversion.  Available
           schemes are as follows:

           from_to
               Uses Encode::from_to for conversion.  This is the default.

           decode_encode
               Input strings are decode()d then encode()d.  A straight two-
               step implementation.

           perlio
               The new perlIO layer is used.  NI-S’ favorite.

               You should use this option if you are using UTF-16 and others
               which linefeed is not $/.

           Like the -D option, this is also for Encode hackers.

SEE ALSO
       "1" in iconv "3" in locale Encode Encode::Supported Encode::Alias
       PerlIO



perl v5.10.0                      2009-08-28                         PICONV(1)
wzb@zeus:~$

可见,piconv 比iconv使用起来更方便!

php中的iconv:

unix下安装PHP的module,需要重新编译PHP;

在Windows下安装,只需将php.ini里的配置打开相应的dll就可:

例如,需要加入gb库的支持,需要如下设置:

  extension_dir = “C:/ipaddr/php/extensions/”

(注意,建议写全地址,并且后面加上/,很多时候是因为这里设置不对,才导致无法加载其它模块的dll的)

  再打开

  extension=php_gd2.dll

但如果是安装iconv.dll,按上面方法,打开php_iconv.dll后,还是无法开启iconv模块,

需要如下配置:

  a.  去iconv的官方下载站点 下载Windows版的iconv文件:libiconv-1.9.1.bin.woe32.zip

  将该文件解压,将bin/下面的charset.dll,iconv.dll,iconv.exe拷贝到c:/windows/ (或其它的系统PATH中)

  (这步是必须的,php_iconv.dll也是调用GNU的iconv库的,所以,先要安装GNU的iconv库)

  b. 开启php.ini里面的php_iconv.dll

  c. 重启Apache,再在phpinfo();检测是否开启iconv。

  最近在做一个程序,需要用到iconv函数把抓取来过的utf-8编码的页面转成gb2312,

           发现只有用iconv函数把抓取过来的数据一转码数据就会无缘无故的少一些。

         去网上一查资料才知道这是iconv函数的一个bug。iconv在转换字符”—”到gb2312时会出错

  解决方法很简单,就是在需要转成的编码后加 “//IGNORE” 也就是iconv函数第二个参数后.如下:

  以下为引用的内容:

  iconv(“UTF-8″,”GB2312//IGNORE”,$data)

  ignore的意思是忽略转换时的错误,如果没有ignore参数,所有该字符后面的字符串都无法被保存。

  这个iconv()这个函数,在php5中是内置的.

  举例:

  

<?php 

  echo $str= ‘你好,这里是卖咖啡!’; 

  echo ‘<br />’; 

  echo iconv(‘GB2312′, ‘UTF-8′, $str); //将字符串的编码从GB2312转到UTF-8 

  echo ‘<br />’; 

  echo iconv_substr($str, 1, 1, ‘UTF-8′); //按字符个数截取而非字节 

  print_r(iconv_get_encoding()); //得到当前页面编码信息 

  echo iconv_strlen($str, ‘UTF-8′); //得到设定编码的字符串长度 

  //也有这样用的 

  $content = iconv(“UTF-8″,”gbk//TRANSLIT”,$content); 

  ?> 

 

php版iconv下载地址:http://ftp.gnu.org/pub/gnu/libiconv/

 

七、声明:部分材料来自百度百科。