I have a tab-delimited file. If a value contains a tab, it is enclosed in double quotes ("). Sample records look like this:
firstfield secondfield thirdfield
firstfield "second field with tab" thirdfield
firstfield secondfield thirdfield
Is it possible to write a cut/awk one-liner that can handle this situation? For example, I would like to get the second and third columns.
2 solutions
#1
As @fedorqui comments, there are better tools than gawk for this task, but check the FPAT variable anyway.
A quick perl solution:
perl -F'(\w+|"[^"]+")' -ane 'print $F[3]." ".$F[5]."\n"' file
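As one example of such a tool (not part of the original answer), Python's standard csv module can read tab-delimited input with quoted fields. This is a minimal sketch, assuming the file follows the usual CSV quoting convention with " as the quote character. The function name second_and_third and the sample data are made up for illustration.

```python
import csv
import io


def second_and_third(lines):
    """Yield (second, third) column pairs from tab-delimited lines.

    Assumes fields that contain tabs are wrapped in double quotes.
    """
    reader = csv.reader(lines, delimiter="\t", quotechar='"')
    for row in reader:
        if len(row) >= 3:  # skip short or empty records
            yield row[1], row[2]


if __name__ == "__main__":
    # Illustrative sample mirroring the records in the question.
    sample = io.StringIO(
        'firstfield\tsecondfield\tthirdfield\n'
        'firstfield\t"second\tfield with tab"\tthirdfield\n'
    )
    for second, third in second_and_third(sample):
        print(second, third, sep="\t")
```

Unlike the perl and awk approaches, csv.reader strips the surrounding quotes from a quoted field, which may or may not be what you want.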
#2
Using GNU awk, you can use the FPAT feature, as pointed out by klashxx:
script.awk
BEGIN { FPAT = "([^\t]+)|(\"[^\"]+\")"
OFS = "\t" }
{ print $2, $3 }
Use it like this: awk -f script.awk yourfile. The script is adapted from the GNU Awk manual, section "Splitting by content".
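To see what that FPAT pattern treats as a field, here is a rough Python emulation (a sketch only, not how gawk is implemented). gawk's regex matching is leftmost-longest, so the quoted alternative wins when it matches more text. Python's re module tries alternatives in order, so the sketch lists the quoted alternative first. The two should agree only for well-formed records in which every quoted field is bounded by tabs or the line ends. The function name fpat_fields is hypothetical.

```python
import re

# Quoted alternative first: Python's re takes the first alternative that
# matches, whereas gawk picks the longest match at each position.
FIELD = re.compile(r'"[^"]+"|[^\t]+')


def fpat_fields(record):
    """Return the substrings that the pattern would treat as fields."""
    return FIELD.findall(record.rstrip("\n"))


if __name__ == "__main__":
    # Illustrative record: the second field contains a literal tab.
    line = 'firstfield\t"second\tfield with tab"\tthirdfield\n'
    fields = fpat_fields(line)
    # Roughly what { print $2, $3 } does with OFS = "\t".
    print(fields[1], fields[2], sep="\t")
```

Note that the quotes stay part of the field, as they do in the awk script.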