How do I split a large binary file in PowerShell?

Time: 2022-08-03 21:37:24

I've seen the answer elsewhere for text files, but I need to do this for a compressed file.

I've got a 6 GB binary file which needs to be split into 100 MB chunks. Am I missing the analog of Unix's "head" somewhere?

3 answers

#1


13  

Never mind. Here you go:

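function split($inFile, $outPrefix, [Int32] $bufSize) {
  # .NET resolves relative paths against the process working directory,
  # not the PowerShell location, so pass full paths.
  $stream = [System.IO.File]::OpenRead($inFile)
  $chunkNum = 1
  $barr = New-Object byte[] $bufSize

  try {
    # Read returns 0 at end of file, which ends the loop.
    while ($bytesRead = $stream.Read($barr, 0, $bufSize)) {
      $outFile = "$outPrefix$chunkNum"
      # Create truncates an existing file; OpenWrite would leave stale trailing bytes.
      $ostream = [System.IO.File]::Create($outFile)
      try { $ostream.Write($barr, 0, $bytesRead) }
      finally { $ostream.Close() }
      echo "wrote $outFile"
      $chunkNum += 1
    }
  }
  finally {
    # Release the input file handle.
    $stream.Close()
  }
}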

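Assumption: a buffer of bufSize bytes fits in memory. Each output chunk is at most bufSize bytes, and the [Int32] type caps a chunk at just under 2 GB.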
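If holding a whole 100 MB chunk in memory is a concern, the copy can go through a small fixed buffer instead. The following is a minimal sketch rather than the method from the answer above: the function name Split-BinaryFile and its parameters are made up for illustration, and full paths are assumed because .NET resolves relative paths against the process working directory rather than the PowerShell location.

```powershell
# Hypothetical helper: splits $Path into files named "$OutPrefix1", "$OutPrefix2", ...
# Each piece holds at most $ChunkSize bytes; only $BufferSize bytes are in memory at once.
function Split-BinaryFile {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [Parameter(Mandatory = $true)] [string] $OutPrefix,
        [long] $ChunkSize = 100MB,
        [int]  $BufferSize = 1MB
    )
    if ($ChunkSize -le 0 -or $BufferSize -le 0) { throw 'ChunkSize and BufferSize must be positive.' }

    $buffer = New-Object byte[] $BufferSize
    $in = [System.IO.File]::OpenRead($Path)
    $out = $null
    $chunkNum = 0
    $remaining = [long]0          # bytes still allowed in the current chunk
    try {
        while ($true) {
            if ($remaining -eq 0) {
                # Current chunk is full (or this is the first pass): close it and start counting afresh.
                if ($out) { $out.Dispose(); $out = $null }
                $remaining = $ChunkSize
            }
            # Never read past the end of the current chunk.
            $want = [int][Math]::Min([long]$buffer.Length, $remaining)
            $read = $in.Read($buffer, 0, $want)
            if ($read -le 0) { break }   # end of input
            if (-not $out) {
                # Open the next piece lazily so no empty trailing file is created.
                $chunkNum++
                $out = [System.IO.File]::Create("$OutPrefix$chunkNum")
            }
            $out.Write($buffer, 0, $read)
            $remaining -= $read
        }
    }
    finally {
        if ($out) { $out.Dispose() }
        $in.Dispose()
    }
}
```

A call for the file in the question might look like Split-BinaryFile -Path C:\data\big.bin -OutPrefix C:\data\big.part (both paths are placeholders), relying on the 100MB default chunk size.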

#2


10  

The answer to the corollary question: How do you put them back together?

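function stitch($infilePrefix, $outFile) {
    # Create truncates an existing output file; OpenWrite would overwrite it
    # in place and could leave stale trailing bytes.
    $ostream = [System.IO.File]::Create($outFile)
    $chunkNum = 1
    $infileName = "$infilePrefix$chunkNum"

    try {
        # Chunks are numbered from 1; stop at the first missing number.
        while (Test-Path $infileName) {
            $bytes = [System.IO.File]::ReadAllBytes($infileName)
            $ostream.Write($bytes, 0, $bytes.Length)
            Write-Host "read $infileName"
            $chunkNum += 1
            $infileName = "$infilePrefix$chunkNum"
        }
    }
    finally {
        # Close the output even if a read fails partway through.
        $ostream.Close()
    }
}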
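As with splitting, the chunks can be joined without loading each one fully into memory. This is a rough sketch that swaps ReadAllBytes for Stream.CopyTo (available from .NET Framework 4); the name Join-BinaryFile is hypothetical, and full paths are assumed so that Test-Path and the .NET file calls agree on where the files are.

```powershell
# Hypothetical helper: concatenates "$InPrefix1", "$InPrefix2", ... into $OutFile.
function Join-BinaryFile {
    param(
        [Parameter(Mandatory = $true)] [string] $InPrefix,
        [Parameter(Mandatory = $true)] [string] $OutFile
    )
    # Create truncates any existing output file.
    $out = [System.IO.File]::Create($OutFile)
    try {
        $chunkNum = 1
        # Stop at the first chunk number that does not exist.
        while (Test-Path -LiteralPath "$InPrefix$chunkNum") {
            $in = [System.IO.File]::OpenRead("$InPrefix$chunkNum")
            try {
                # CopyTo streams through an internal buffer instead of one large array.
                $in.CopyTo($out)
            }
            finally {
                $in.Dispose()
            }
            $chunkNum++
        }
    }
    finally {
        $out.Dispose()
    }
}
```

To check the round trip, one option is to compare the original and the rebuilt file with Get-FileHash, assuming PowerShell 4.0 or later where that cmdlet is available.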

#3


0  

I answered the related question that bernd_k mentions in this question's comments, but in this case I would use -ReadCount instead of -TotalCount, e.g.:

Get-Content bigfile.bin -ReadCount 100MB -Encoding byte

This makes Get-Content read the file one chunk at a time, where -ReadCount is counted in lines for text encodings and in bytes for byte encoding. Keep in mind that each chunk is then passed down the pipeline as a single array, not as individual bytes or lines of text.
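To turn that pipeline into actual chunk files, each array can be written out as it arrives. The sketch below is one possible way to do it, with a hypothetical function name and parameters. It also assumes that on PowerShell 6 and later the byte mode is requested with -AsByteStream, since -Encoding Byte is only accepted by Windows PowerShell. Get-Content tends to be much slower than the stream-based functions for large files, so treat this as an illustration of -ReadCount rather than a recommendation.

```powershell
# Hypothetical helper: writes each -ReadCount chunk to "$OutPrefix1", "$OutPrefix2", ...
function Split-WithGetContent {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [Parameter(Mandatory = $true)] [string] $OutPrefix,
        [long] $ChunkSize = 100MB
    )
    # Pick the byte-mode switch that matches the running PowerShell version.
    if ($PSVersionTable.PSVersion.Major -ge 6) {
        $byteArgs = @{ AsByteStream = $true }
    }
    else {
        $byteArgs = @{ Encoding = 'Byte' }
    }

    $i = 0
    Get-Content -LiteralPath $Path -ReadCount $ChunkSize @byteArgs | ForEach-Object {
        # $_ is one whole chunk (an array of bytes), not a single byte.
        $i++
        [System.IO.File]::WriteAllBytes("$OutPrefix$i", [byte[]]$_)
    }
}
```

Full paths are assumed for -OutPrefix, because WriteAllBytes resolves relative paths against the process working directory.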
