Splitting one file into multiple files based on a delimiter

Time: 2021-09-26 20:20:45

I have a file that uses -| as a delimiter after each section. I need to create a separate file for each section using Unix tools.

Example input file:

wertretr
ewretrtret
1212132323
000232
-|
ereteertetet
232434234
erewesdfsfsfs
0234342343
-|
jdhg3875jdfsgfd
sjdhfdbfjds
347674657435
-|

Expected result in file 1:

wertretr
ewretrtret
1212132323
000232
-|

Expected result in file 2:

ereteertetet
232434234
erewesdfsfsfs
0234342343
-|

Expected result in file 3:

jdhg3875jdfsgfd
sjdhfdbfjds
347674657435
-|

11 solutions

#1


64  

A one-liner, no programming (except for the regexp, etc.):

csplit --digits=2  --quiet --prefix=outfile infile "/-|/+1" "{*}"

#2


25  

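This needs an awk that accepts a regular expression as RS, such as gawk or mawk:

awk -v RS='-\\|\n' '{ f = "file" NR; printf "%s-|\n", $0 > f; close(f) }' input-file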

#3


7  

Debian has csplit, but I don't know whether it is common to all, most, or other distributions. If it isn't, it shouldn't be too hard to track down the source and compile it.

#4


5  

I solved a slightly different problem, where the file contains marker lines that name the file into which the following text should go. This Perl code does the trick for me:

#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long;

# Marker that introduces a new output file; override with --mff=MARKER
my $mff = "-#-";
GetOptions('mff=s' => \$mff) or die "usage: ncsplit.pl [--mff=MARKER] filename.txt\n";

if (@ARGV == 0) {
    print STDERR "usage: ncsplit.pl [--mff=MARKER] filename.txt\n\nThe mff is found on a line followed by a filename.  All of the contents of filename.txt are written to that file until another mff is found.\n";
    exit 1;
}
print "using file switch=$mff\n\n";

# Could be more than one file name on the command line,
# but this version only reads the first one.
my ($readfile) = grep { -f } @ARGV;
defined $readfile or die "File not found...\n\n";

open my $source, '<', $readfile or die "Cannot open $readfile: $!\n";

my $out;
while (my $line = <$source>) {
    if ($line =~ /^\Q$mff\E (.*)$/) {
        # Marker line: switch output to the file named after the marker
        close $out if $out;
        open $out, '>', $1 or die "Cannot open $1: $!\n";
        print "opened $1\n";
    }
    elsif ($out) {
        # Ordinary line: copy it to the current output file
        print {$out} $line;
    }
}
close $out if $out;
close $source;

#5


2  

You can also use awk. I'm not very familiar with awk, but the following seemed to work for me. It generated part1.txt, part2.txt, part3.txt, and part4.txt. Note that the last partN.txt file it generates is empty. I'm not sure how to fix that, but I'm sure it could be done with a little tweaking. Any suggestions?

awk_pattern file:

BEGIN{ fn = "part1.txt"; n = 1 }
{
   print > fn
   if (substr($0,1,2) == "-|") {
       close (fn)
       n++
       fn = "part" n ".txt"
   }
}

bash command:

awk -f awk_pattern input.file
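
One way to avoid the empty trailing file is to open each output file only when there is a line to write into it. The following is a minimal Python sketch of that idea, not a drop-in replacement for the awk script. It assumes the same part<n>.txt naming; the function name split_lazily and the out_dir parameter are made up for illustration.

```python
import os


def split_lazily(lines, out_dir=".", delimiter="-|"):
    """Write each section to part<n>.txt and return the paths created."""
    paths = []
    out = None
    n = 0
    for line in lines:
        if out is None:
            # Open the next file only once there is a line to put in it,
            # so nothing is created after the final delimiter.
            n += 1
            path = os.path.join(out_dir, "part%d.txt" % n)
            out = open(path, "w")
            paths.append(path)
        out.write(line)
        if line.startswith(delimiter):
            # The delimiter line closes the current section.
            out.close()
            out = None
    if out is not None:
        out.close()
    return paths
```

It can be fed an open file directly, for example: with open("input.file") as src: split_lazily(src)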


#6


1  

Here's a Python 3 script that splits a file into multiple files, taking each output filename from the start delimiter. Example input file:

# Ignored

######## FILTER BEGIN foo.conf
This goes in foo.conf.
######## FILTER END

# Ignored

######## FILTER BEGIN bar.conf
This goes in bar.conf.
######## FILTER END

Here's the script:

#!/usr/bin/env python3

import os
import argparse

# global settings
start_delimiter = '######## FILTER BEGIN'
end_delimiter = '######## FILTER END'

# parse command line arguments
parser = argparse.ArgumentParser()
parser.add_argument("-i", "--input-file", required=True, help="input filename")
parser.add_argument("-o", "--output-dir", required=True, help="output directory")

args = parser.parse_args()

# read the input file
with open(args.input_file, 'r') as input_file:
    input_data = input_file.read()

# iterate through the input data by line
input_lines = input_data.splitlines()
while input_lines:
    # discard lines until the next start delimiter
    while input_lines and not input_lines[0].startswith(start_delimiter):
        input_lines.pop(0)

    # corner case: no delimiter found and no more lines left
    if not input_lines:
        break

    # extract the output filename from the start delimiter
    output_filename = input_lines.pop(0).replace(start_delimiter, "").strip()
    output_path = os.path.join(args.output_dir, output_filename)

    # open the output file
    print("extracting file: {0}".format(output_path))
    with open(output_path, 'w') as output_file:
        # while we have lines left and they don't match the end delimiter
        while input_lines and not input_lines[0].startswith(end_delimiter):
            output_file.write("{0}\n".format(input_lines.pop(0)))

        # remove end delimiter if present
        if input_lines:
            input_lines.pop(0)

Finally, here's how you run it:

$ python3 script.py -i input-file.txt -o ./output-folder/

#7


0  

I=0; : > file0; while IFS= read -r line; do printf '%s\n' "$line" >> "file$I"; if [ "$line" = '-|' ]; then I=$((I+1)); : > "file$I"; fi; done < file

and the formatted version:

#!/bin/bash
I=0
# start with an empty first output file
: > file0
# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r line
do
  printf '%s\n' "$line" >> "file$I"
  if [ "$line" = '-|' ]
  then
    # delimiter seen: move on to the next output file
    I=$((I+1))
    : > "file$I"
  fi
done < FILE

#8


0  

This is the sort of problem I wrote context-split for: http://stromberg.dnsalias.org/~strombrg/context-split.html

$ ./context-split -h
usage:
./context-split [-s separator] [-n name] [-z length]
        -s specifies what regex should separate output files
        -n specifies how output files are named (default: numeric
        -z specifies how long numbered filenames (if any) should be
        -i include line containing separator in output files
        operations are always performed on stdin

#9


0  

The following command works for me. Hope it helps.

awk 'BEGIN { file = 0; filename = "output_" file ".txt" } { print $0 > filename } /^-\|$/ { close(filename); file++; filename = "output_" file ".txt" }' input

#10


0  

Use csplit if you have it.

If you don't, but you have Python... don't use Perl.

Assuming your sample file is called "samplein":

$ python -c "import sys
for i, c in enumerate(sys.stdin.read().split('-|')):
    open(f'out{i}', 'w').write(c)" < samplein

If you have Python 3.5 or lower, you can't use f-strings:

$ python -c "import sys
for i, c in enumerate(sys.stdin.read().split('-|')):
    open('out' + str(i), 'w').write(c)" < samplein

and now:

$ ls out*
out0  out1  out2  out3
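
With the sample input, split('-|') drops the delimiter itself, and the fourth chunk (out3) holds only the final newline. Below is a hedged sketch of a variant that puts the -| line back at the end of each section and skips a blank leftover chunk. The helper names split_sections and write_sections are hypothetical, and the sketch assumes every delimiter sits on a line of its own.

```python
def split_sections(text, delimiter="-|"):
    """Return the sections of text, each ending with its delimiter line."""

    def drop_leading_newline(chunk):
        # The newline that followed the previous delimiter belongs to
        # that delimiter's line, not to this chunk.
        return chunk[1:] if chunk.startswith("\n") else chunk

    chunks = text.split(delimiter)
    # Every chunk except the last one was followed by a delimiter.
    sections = [drop_leading_newline(c) + delimiter + "\n" for c in chunks[:-1]]
    tail = drop_leading_newline(chunks[-1])
    if tail.strip():
        # Keep trailing text that has no closing delimiter.
        sections.append(tail)
    return sections


def write_sections(sections, prefix="out"):
    """Write section i to <prefix><i>, mirroring the out0, out1, ... names."""
    for i, section in enumerate(sections):
        with open("%s%d" % (prefix, i), "w") as f:
            f.write(section)
```

Calling write_sections(split_sections(sys.stdin.read())) after import sys would then be run the same way, with samplein redirected to standard input.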

#11


-1  

Here is some Perl code that will do the job:

#!/usr/bin/perl
open(FI,"file.txt") or die "Input file not found";
$cur=0;
open(FO,">res.$cur.txt") or die "Cannot open output file $cur";
while(<FI>)
{
    print FO $_;
    if(/^-\|/)
    {
        close(FO);
        $cur++;
        open(FO,">res.$cur.txt") or die "Cannot open output file $cur"
    }
}
close(FO);
