Do you know of any way I could programmatically, or via a script, convert a set of text files saved in ANSI character encoding to Unicode encoding?
I would like to do the same thing I do when I open the file in Notepad and choose to save it as a Unicode file.
6 solutions
#1
You can use iconv. On Windows you can use it under Cygwin.
iconv -f from_encoding -t to_encoding file
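If iconv is not available, the same decode-then-re-encode step can be sketched in Python. This is only a rough equivalent, not iconv itself: the function name `convert_file` and the defaults (`cp1252` as the "ANSI" code page, `utf-16` as the target) are assumptions chosen for illustration, so substitute whatever code page your files really use.

```python
def convert_file(src, dst, from_encoding="cp1252", to_encoding="utf-16"):
    """Decode src with from_encoding and re-encode it into dst with to_encoding."""
    # newline="" on both sides keeps the original line endings untouched
    with open(src, "r", encoding=from_encoding, newline="") as reader:
        text = reader.read()
    # Python's "utf-16" codec writes a byte order mark before the text
    with open(dst, "w", encoding=to_encoding, newline="") as writer:
        writer.write(text)
```

Calling `convert_file("in.txt", "out.txt")` plays the role of `iconv -f CP1252 -t UTF-16 in.txt > out.txt`. Writing to a separate output file keeps the original intact if the source encoding turns out to be a wrong guess.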
#2
This could work for you, but notice that it'll grab every file in the current folder:
Get-ChildItem | Foreach-Object { $c = (Get-Content $_); `
Set-Content -Encoding UTF8 $c -Path ($_.name + "u") }
Same thing using aliases for brevity:
gci | %{ $c = (gc $_); sc -Encoding UTF8 $c -Path ($_.name + "u") }
Steven Murawski suggests using Out-File instead. The differences between the two cmdlets are the following:
- Out-File will attempt to format the input it receives.
- Out-File's default encoding is Unicode-based, whereas Set-Content uses the system's default. (That is the Windows PowerShell behavior; in PowerShell 6 and later both default to UTF-8 without a BOM.)
Here's an example assuming the file test.txt doesn't exist in either case:
PS> [system.string] | Out-File test.txt
PS> Get-Content test.txt
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     String                                   System.Object
# test.txt encoding is Unicode-based with BOM
PS> [system.string] | Set-Content test.txt
PS> Get-Content test.txt
System.String
# test.txt encoding is "ANSI" (Windows character set)
In fact, if you don't need any specific Unicode encoding, you could just as well do the following to convert a text file to Unicode:
PS> Get-Content sourceASCII.txt > targetUnicode.txt
Out-File is a "redirection operator with optional parameters" of sorts.
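To check which encoding a converted file actually ended up with, you can look at its byte order mark. The following Python sketch does that; `sniff_bom` is a hypothetical helper name, and a file without a BOM (such as the "ANSI" output of Set-Content) simply yields `None`, which says nothing about its code page.

```python
import codecs

# Byte order marks, UTF-32 first so its LE mark is not mistaken for UTF-16 LE
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(path):
    """Return the encoding implied by the file's BOM, or None if it has none."""
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM is four bytes
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None
```

On a file written by Out-File under Windows PowerShell this should report `utf-16-le`, matching the "Unicode-based with BOM" comment in the transcript above.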
#3
The easiest way would be Get-Content 'path/to/text/file' | out-file 'name/of/file'.
Out-File has an -encoding parameter, the default of which is Unicode.
If you wanted to convert a batch of them with a script, you could do something like:
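# Convert every file in the directory in place
$files = Get-ChildItem 'directory/of/text/files'
foreach ($file in $files)
{
    # The parentheses read the whole file first, so it is closed before Out-File rewrites it
    (Get-Content $file.FullName) | Out-File $file.FullName
}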
#4
Use the System.IO.StreamReader class (to read the file contents) together with the System.Text.Encoding base class (to create the encoder object that does the encoding).
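The reader-plus-encoding pattern is not specific to .NET. As a rough analogue rather than the .NET API itself, here is a Python sketch that streams each file through a decoding reader and an encoding writer. The function name `convert_folder`, the `*.txt` pattern and the `cp1252` / `utf-16` defaults are all assumptions made for the example.

```python
from pathlib import Path

def convert_folder(src_dir, dst_dir, pattern="*.txt",
                   from_encoding="cp1252", to_encoding="utf-16"):
    """Re-encode every matching file in src_dir into dst_dir; return the output paths."""
    out_dir = Path(dst_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(src_dir).glob(pattern)):
        dst = out_dir / src.name
        # The reader decodes with the source encoding, the writer encodes with the target one
        with open(src, "r", encoding=from_encoding, newline="") as reader, \
             open(dst, "w", encoding=to_encoding, newline="") as writer:
            for line in reader:  # stream line by line instead of loading the whole file
                writer.write(line)
        written.append(dst)
    return written
```

The converted copies go to a separate folder, which avoids reading and writing the same file at once and leaves the originals available for a second attempt.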
#5
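You could create a new text file and write the bytes from the original file into the new one, pairing each original byte with a '\0' (assuming the original text file was plain ASCII English). Placing the '\0' before each byte gives big-endian UTF-16; placing it after gives the little-endian UTF-16 that Notepad calls "Unicode", which also starts with the byte order mark FF FE.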
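A minimal Python sketch of this byte-padding idea, assuming the input is pure 7-bit ASCII and the target is little-endian UTF-16 with a BOM (the layout Notepad's "Unicode" option produces), in which case the zero byte follows each original byte. The function name `ascii_to_utf16le` is made up for the example.

```python
import codecs

def ascii_to_utf16le(data: bytes) -> bytes:
    """Widen pure-ASCII bytes to UTF-16 LE with a BOM by adding a zero byte after each."""
    if any(b > 0x7F for b in data):
        # Bytes above 0x7F depend on the code page, so zero-padding would give wrong characters
        raise ValueError("input is not pure ASCII; decode it with its real code page instead")
    widened = bytearray()
    for b in data:
        widened.append(b)     # low byte: the ASCII code itself
        widened.append(0x00)  # high byte: always zero for ASCII
    return codecs.BOM_UTF16_LE + bytes(widened)
```

For ASCII input the result is identical to what a real UTF-16 LE encoder produces, but the trick breaks as soon as the file contains an accented letter or any other byte above 0x7F, so a proper decode and re-encode is the safer route.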
#6
Pseudo code (VBScript-style)...
Dim system, file, contents, newFile, oldFile
' I/O modes for OpenAsTextStream
Const ForReading = 1, ForWriting = 2, ForAppending = 8
' Format values: -2 = system default (ANSI), -1 = Unicode
Const AnsiFile = -2, UnicodeFile = -1
Set system = CreateObject("Scripting.FileSystemObject")
' Read the whole file using the system default (ANSI) code page
Set file = system.GetFile("text1.txt")
Set oldFile = file.OpenAsTextStream(ForReading, AnsiFile)
contents = oldFile.ReadAll()
oldFile.Close
' Recreate the file (True = overwrite) and write the text back as Unicode
system.CreateTextFile "text1.txt", True
Set file = system.GetFile("text1.txt")
Set newFile = file.OpenAsTextStream(ForWriting, UnicodeFile)
newFile.Write contents
newFile.Close
Hope this approach works.