
Time: 2022-09-23 07:43:06

Input file:



Goal: output file with reordered columns, say



UPDATED Question: What is a good way of using PowerShell to solve this problem? I am aware of the CSV-related cmdlets, but these have limitations. Note that the order of records does not need to change, so loading the entire input or output file into memory should not be necessary.


5 solutions



Here is the solution suitable for millions of records (assuming that your data do not have embedded ';')


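$reader = [System.IO.File]::OpenText('data1.csv')
$writer = New-Object System.IO.StreamWriter 'data2.csv'
for(;;) {
    $line = $reader.ReadLine()
    # ReadLine returns $null at the end of the file
    if ($null -eq $line) {
        break
    }
    $data = $line.Split(";")
    $writer.WriteLine('{0};{1};{2}', $data[0], $data[2], $data[1])
}
# close both files so the output is flushed to disk
$reader.Close()
$writer.Close()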



Import-CSV C:\Path\To\Original.csv | Select-Object Column1, Column3, Column2 | Export-CSV C:\Path\To\Newfile.csv
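
One caveat is worth sketching: by default Import-Csv and Export-Csv use a comma as the delimiter, so for a semicolon-separated file like the one the other answers assume, the -Delimiter parameter is needed on both ends. The sample below is only a minimal sketch. The temp-file names and the column names A, B and C are made up for illustration, and -NoTypeInformation only matters on Windows PowerShell, where Export-Csv otherwise writes a #TYPE line first.

```powershell
# Hypothetical sample data; the file names and the column names A, B, C are assumptions
$inPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'reorder-in.csv'
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'reorder-out.csv'
Set-Content -Path $inPath -Value 'A;B;C', '1;2;3', '4;5;6'

# Read with ';' as the delimiter, pick the columns in the new order, write with ';' again
Import-Csv -Path $inPath -Delimiter ';' |
    Select-Object A, C, B |
    Export-Csv -Path $outPath -Delimiter ';' -NoTypeInformation

# Inspect the header line and the rows of the reordered file
$header = Get-Content -Path $outPath -TotalCount 1
$rows   = @(Import-Csv -Path $outPath -Delimiter ';')
```

Export-Csv quotes the fields it writes, which most CSV readers accept, but it means the output is not in exactly the same format as the input.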



It's great that people came up with solutions based on pure .NET. However, I would argue for simplicity where possible. That's why I upvoted all of you ;)


Why? I generated 1,000,000 records, stored them in a CSV file, and then reordered the columns. In my case, generating the CSV was much more demanding than the reordering. Look at the results.


It took only 1.8 minutes to reorder the columns. For me that's a pretty decent result. Is it OK for me? -> Yes. I don't need to look for a quicker solution, it's good enough -> that saved me time for other interesting stuff ;)


# generate some csv; objects have several properties
measure-command { 
    1..1mb | 
    % { 
        $date = get-date
        New-Object PsObject -Property @{
            Hour = $date.Hour
            Minute = $date.Minute
            Second = $date.Second
            ReadableTime = $date.ToLongTimeString()
            ReadableDate = $date.ToLongDateString()
        }} | 
    Export-Csv d:\temp\exported.csv
}

TotalMinutes      : 6,100025295

# reorder the columns
measure-command { 
    Import-Csv d:\temp\exported.csv | 
        Select ReadableTime, ReadableDate, Hour, Minute, Second, Column1, Column2, Column3 | 
        Export-Csv d:\temp\exported2.csv
}

TotalMinutes      : 2,33151559833333



Edit: Benchmarking info below.


I would not use the Powershell csv-related cmdlets. I would use either System.IO.StreamReader or Microsoft.VisualBasic.FileIO.TextFieldParser for reading in the file line-by-line to avoid loading the entire thing in memory, and I would use System.IO.StreamWriter to write it back out. The TextFieldParser internally uses a StreamReader, but handles parsing delimited fields so you don't have to, making it very useful if the CSV format is not straightforward (e.g., has delimiter characters in quoted fields).
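
To illustrate the point about quoted fields, here is a small sketch that parses one made-up line with TextFieldParser and, for comparison, with a plain Split. The sample line and variable names are assumptions for illustration only. The parser is fed from a StringReader so that no file is needed, and the `::new` syntax requires PowerShell 5 or later.

```powershell
# TextFieldParser lives in the Microsoft.VisualBasic assembly
Add-Type -AssemblyName Microsoft.VisualBasic

# Hypothetical sample line: the second field contains the delimiter inside quotes
$sample = 'a;"b;still b";c'

$stringReader = New-Object System.IO.StringReader $sample
$parser = [Microsoft.VisualBasic.FileIO.TextFieldParser]::new([System.IO.TextReader]$stringReader)
$parser.SetDelimiters(';')
$parser.HasFieldsEnclosedInQuotes = $true

# The parser keeps the quoted field together and strips the quotes
$fields = $parser.ReadFields()
$parser.Close()

# A naive split cuts the quoted field in two
$naive = $sample.Split(';')

"TextFieldParser: $($fields.Count) fields, Split: $($naive.Count) fields"
```

Because ReadFields strips the surrounding quotes, any field that contains the delimiter has to be quoted again when the line is written back out.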


I would also not do this in Powershell at all, but rather in a .NET application, as it will be much faster than a Powershell script even if they use the same objects.


Here's C# for a simple version, assuming no quoted fields and ASCII encoding:


using System.Text;

static class Program {
    static void Main(){
        string source = @"D:\test.csv";
        string dest = @"D:\test2.csv";

        using ( var reader = new Microsoft.VisualBasic.FileIO.TextFieldParser( source, Encoding.ASCII ) ) {
            using ( var writer = new System.IO.StreamWriter( dest, false, Encoding.ASCII ) ) {
                reader.SetDelimiters( ";" );
                while ( !reader.EndOfData ) {
                    var fields = reader.ReadFields();
                    swap(fields, 1, 2);   // swap the second and third columns
                    writer.WriteLine( string.Join( ";", fields ) );
                }
            }
        }
    }

    static void swap( string[] arr, int a, int b ) {
        string t = arr[ a ];
        arr[ a ] = arr[ b ];
        arr[ b ] = t;
    }
}

Here's the Powershell version:



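$source = 'D:\test.csv'
$dest = 'D:\test2.csv'

# TextFieldParser lives in the Microsoft.VisualBasic assembly, which must be loaded first
Add-Type -AssemblyName Microsoft.VisualBasic
$reader = new-object Microsoft.VisualBasic.FileIO.TextFieldParser $source
$reader.SetDelimiters(';')
$writer = new-object System.IO.StreamWriter $dest

function swap($f,$a,$b){ $t = $f[$a]; $f[$a] = $f[$b]; $f[$b] = $t}

while ( !$reader.EndOfData ) {
    $fields = $reader.ReadFields()
    swap $fields 1 2
    $writer.WriteLine([string]::join(';', $fields))
}

# close both files so the output is flushed to disk
$reader.Close()
$writer.Close()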


I benchmarked both of these against a 3-column csv file with 10,000,000 rows. The C# version took 171.132 seconds (just under 3 minutes). The Powershell version took 2,364.995 seconds (39 minutes, 25 seconds).


Edit: Why mine takes so darn long.


The swap function is a huge bottleneck in my Powershell version. Replacing it with '{0};{1};{2}'-style output, as in Roman Kuzmin's answer, cut the run time to less than 9 minutes. Replacing TextFieldParser as well more than halved the remaining time, to under 4 minutes.
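
For anyone who wants to see the effect on their own machine, here is a rough micro-benchmark sketch. It works on an in-memory array rather than a file, the iteration count and sample values are arbitrary assumptions, and the absolute numbers will differ from the file-based timings above. It only compares a script-function call per row against building the line directly with the -f format operator.

```powershell
# Same helper as in the script above: swaps two elements of an array in place
function Swap($f, $a, $b) { $t = $f[$a]; $f[$a] = $f[$b]; $f[$b] = $t }

# Hypothetical sample row and an arbitrary number of repetitions
$fields = 'x', 'y', 'z'
$iterations = 100000

# Variant 1: call the script function for every row, then join the fields
$withFunction = Measure-Command {
    for ($i = 0; $i -lt $iterations; $i++) {
        Swap $fields 1 2
        $null = [string]::Join(';', $fields)
    }
}

# Variant 2: build the reordered line directly with a format string
$withFormat = Measure-Command {
    for ($i = 0; $i -lt $iterations; $i++) {
        $null = '{0};{1};{2}' -f $fields[0], $fields[2], $fields[1]
    }
}

'Swap function: {0:N0} ms' -f $withFunction.TotalMilliseconds
'Format string: {0:N0} ms' -f $withFormat.TotalMilliseconds
```

Calling a script function is comparatively expensive in PowerShell, so anything that runs once per row is worth keeping inline when the file has millions of rows.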


However, a .NET console app version of Roman Kuzmin's answer took 20 seconds.




I'd do it this way:


$new_csv = new-object system.collections.ArrayList
get-content mycsv.csv |% {
    # split each line, pick the fields in the new order, and re-join them
    $new_csv.add((($_ -split ";")[0,2,1]) -join ";") > $null
}
$new_csv | out-file myreordered.csv


