I am running GNUwin32 under Windows 7.
I have many files in a single directory, with file names that look like this:
chem.001.txt
chem.002.b4.txt
chem.003.md6.txt
(more files.txt) ...
In their current form, none of the files contains its own file name.
I need to clean these files for further use, and I want to concatenate them all into a single file. I also need to include each file's name at the beginning of its concatenated content, so that I can later associate the cleaned data with the original file.
For example, the single, concatenated file (new_file.txt) would look like this:
chem.001.txt delimiter (could be a tab or pipe) followed by text from chem.001.txt...
chem.002.b4.txt delimiter followed by text from chem.002.b4.txt ...
chem.003.md6.txt delimiter followed by text from chem.003.md6.txt ...
etc. ...
I will then clean the concatenated file and parse the content as needed.
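Once every line of new_file.txt has the form name|text, the original file name can be recovered later by splitting each line at its first delimiter. The following is only an illustrative sketch under stated assumptions: it assumes a POSIX shell, "|" as the delimiter, and that "|" never appears in a file name (it may appear in the text). text_for is a hypothetical helper name, not something from the original post.

```shell
# Hypothetical helper (not from the original post).
# Usage: text_for NAME FILE
# Prints the text stored for file NAME in a "name|text" file such as new_file.txt.
# Assumes "|" never occurs in file names; only the first "|" on a line is treated
# as the delimiter. Note that awk -v processes backslash escapes in NAME.
text_for() {
    awk -v name="$1" '{
        i = index($0, "|")                       # position of the first delimiter (0 if none)
        if (i > 0 && substr($0, 1, i - 1) == name)
            print substr($0, i + 1)              # everything after the first delimiter
    }' "$2"
}
```

For example, text_for chem.002.b4.txt new_file.txt would print the joined text of that file. With a tab delimiter instead, index($0, "\t") would take the place of index($0, "|").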
I suspect awk (gawk) has a way to associate the file name with $1 and the text of the file with $2, and then print $1, $2 for each file, in sequence, into new_file.txt, but I have not been able to make it work.
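For reference, awk already exposes the name of the file it is currently reading in the built-in variable FILENAME, so the name does not need to be stored in a field such as $1. A minimal sketch, assuming a POSIX shell is available; prefix_lines is a hypothetical helper name. Note that it prefixes every line with the file name rather than joining each file into one line:

```shell
# Hypothetical helper (not from the original post).
# Usage: prefix_lines FILE...
# Prints each input line as "filename|line". FILENAME is a standard awk variable
# holding the name of the file currently being read.
prefix_lines() {
    awk -v OFS='|' '{
        sub(/\r$/, "")         # drop a trailing CR left by CRLF line endings
        print FILENAME, $0     # OFS inserts "|" between the two values
    }' "$@"
}
```

Usage would be along the lines of prefix_lines chem.*.txt > new_file.txt. This may already be enough if the lines of each file do not need to be joined.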
How can I do this?
Answer:
Put this in foo.awk:
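BEGIN{ RS="^$"; OFS="|" }                  # gawk: read each whole file as one record
{ sub(/\r?\n$/, ""); gsub(/\r?\n/, " ")    # drop the final newline, join lines (LF or CRLF) with spaces
  print FILENAME, $0 > "new_file.txt" }    # one "name|text" line per file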
and then execute it as
awk -f foo.awk <files>
where <files> is however you provide a list of file names on Windows. It relies on GNU awk's support for a multi-character (regular-expression) RS: with RS="^$", each whole file is read as a single record.
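If the installed awk lacks multi-character RS support, the same one-line-per-file output can be built by reading line by line with only POSIX awk features (FNR, FILENAME, printf). This is a sketch, not the answerer's code; it assumes a POSIX shell for the wrapper function, and concat_with_names is a hypothetical name:

```shell
# Hypothetical helper (not from the original post).
# Usage: concat_with_names FILE...  (writes one "filename|text" line per file to stdout)
# Uses only POSIX awk features, so no multi-character RS is needed.
concat_with_names() {
    awk '
        FNR == 1 {                                    # first line of a new input file
            if (NR > 1) print ""                      # end the previous output line
            printf "%s|", FILENAME                    # start the new line with the file name
        }
        {
            sub(/\r$/, "")                            # drop a trailing CR from CRLF endings
            printf "%s%s", (FNR > 1 ? " " : ""), $0   # join the lines with single spaces
        }
        END { if (NR > 0) print "" }                  # end the last output line
    ' "$@"
}
```

Usage would be along the lines of concat_with_names chem.*.txt > new_file.txt. Empty input files produce no output line with this sketch. Under cmd.exe, the quoted awk program could instead be saved to a file and run with awk -f, as with foo.awk above.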