Getting values from a tab-delimited file

Time: 2020-12-19 02:06:01

I have a tab-delimited file. If a value contains tabs, it is enclosed in double quotes ("). Sample records look like this:

firstfield  secondfield thirdfield
firstfield  "second field   with    tab"    thirdfield
firstfield  secondfield thirdfield

Is it possible to write a cut/awk one-liner that handles this situation? For example, I would like to get the second and third columns.

2 solutions

#1


As @fedorqui comments, there are better tools than gawk for this task, but the FPAT variable is worth checking anyway.

A quick Perl solution:

perl -F'(\w+|"[^"]+")' -ane 'print $F[3]." ".$F[5]."\n"' file 
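The comment does not say which tools it has in mind. As one possibility (my assumption, and in Python rather than awk or Perl), any library with a real quoted-field parser will do. A minimal sketch with Python's standard csv module, using the sample records from the question; the SAMPLE string and the second_and_third helper are made up for illustration:

```python
import csv
import io

# Sample records from the question; "\t" is a literal tab.
SAMPLE = (
    'firstfield\tsecondfield\tthirdfield\n'
    'firstfield\t"second field\twith\ttab"\tthirdfield\n'
    'firstfield\tsecondfield\tthirdfield\n'
)

def second_and_third(lines):
    """Return (second, third) column pairs from tab-delimited lines."""
    # delimiter="\t" splits on tabs; quotechar='"' keeps tabs that sit
    # inside double quotes as part of the field.
    reader = csv.reader(lines, delimiter="\t", quotechar='"')
    return [(row[1], row[2]) for row in reader]

rows = second_and_third(io.StringIO(SAMPLE))
for second, third in rows:
    print(second, third, sep=" | ")
```

Note that the csv reader strips the surrounding quotes from the field, whereas the Perl one-liner above keeps them.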

#2


With GNU awk, you can use the FPAT feature, as pointed out by klashxx:

script.awk

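# A field is either a run of non-tab characters or a double-quoted string.
BEGIN { FPAT = "([^\t]+)|(\"[^\"]+\")"
        OFS  = "\t" }
      { print $2, $3 }   # second and third fields, tab-separated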

Use it like this: awk -f script.awk yourfile. The script is adapted from the GNU Awk manual, section "Splitting by content".
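To see what "splitting by content" means, here is a rough Python emulation of the same idea. It is only a sketch and not what gawk does internally: the pattern describes what a field looks like instead of what separates fields. The names FIELD and split_by_content are hypothetical. One difference worth noting: awk regular expressions pick the longest match among alternatives, while Python tries alternatives left to right, so the quoted alternative has to come first here.

```python
import re

# A field is a double-quoted string or a run of non-tab characters.
# The quoted alternative comes first because Python tries alternatives
# in order and would otherwise stop at the first tab inside the quotes.
FIELD = re.compile(r'"[^"]+"|[^\t]+')

def split_by_content(line):
    """Return the fields of one record, matched by content."""
    return FIELD.findall(line.rstrip("\n"))

fields = split_by_content('firstfield\t"second field\twith\ttab"\tthirdfield\n')
print(fields[1], fields[2], sep="\t")
```

In this sketch the quotes stay part of the field, and an empty field (two consecutive tabs) is skipped because both alternatives require at least one character, so column positions would shift on such input.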
