Trying to parse values from a TSV file into 2 matching Bash arrays

Time: 2021-07-07 16:07:40
Unit Title      Class Title         File Name

Unit Title1     Title1              Filename1

Unit Title2     Title2              Filename2
                Title3              Filename3
                Title4              Filename4
                Title5              Filename5

Unit Title3     Title6              Filename6
                Title7              Filename7
                Title8              Filename8
                Title9              Filename9

Unit Title4     Title10             Filename10
                Title11             Filename11
                Title12             Filename12

I have a large number of TSV (tab-separated values) files with a structure like this. I'm trying to write a bash script that can parse these files into matching arrays. It's the empty lines and blank first-column cells that are throwing me for a loop. I need to be able to list out a class title while also listing which "Unit Title" it falls under.

I can get each of the groups into their own arrays, but I can't duplicate the entries in "Unit Titles" so that they line up with the Class Titles. Can someone help point me in the right direction? Thanks!

1 Answer

#1


It's unclear to me exactly what you want the arrays to look like, but perhaps pre-processing the input files to have all columns filled in helps:

awk -F'\t' -v OFS='\t' '
  $0 != "" {  # process only non-empty lines
    # If field 1 is empty, set it to the most recent unit title.
    if ($1 != "") ut=$1; else $1=ut;
    # Print the (rebuilt) line.
    print
  }' tsvfile

This produces output like the following (\t represents a literal tab), which should make parsing easier:

Unit Title1\tTitle1\tFilename1
Unit Title2\tTitle2\tFilename2
Unit Title2\tTitle3\tFilename3
Unit Title2\tTitle4\tFilename4
Unit Title2\tTitle5\tFilename5
Unit Title3\tTitle6\tFilename6
Unit Title3\tTitle7\tFilename7
...
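
Once every row carries its unit title, the two matching arrays can be filled in a single loop. The following is only a minimal sketch under a few assumptions that are not in the question: the array names units and classes are made up, the sample data is written to a temporary file so the snippet runs on its own, and the first line of each file is assumed to be a header row that should be skipped (NR > 1).

```bash
#!/usr/bin/env bash
# Hypothetical sample input so the sketch is self-contained;
# in practice, point tsvfile at one of the real TSV files.
tsvfile=$(mktemp)
printf 'Unit Title\tClass Title\tFile Name\n\n'   >  "$tsvfile"
printf 'Unit Title1\tTitle1\tFilename1\n\n'       >> "$tsvfile"
printf 'Unit Title2\tTitle2\tFilename2\n'         >> "$tsvfile"
printf '\tTitle3\tFilename3\n'                    >> "$tsvfile"

units=()    # unit title for each class, same index as classes
classes=()  # class titles

# Fill in the blank unit titles with awk, then split each row on tabs.
# NR > 1 skips the header row (an assumption about the file layout).
while IFS=$'\t' read -r unit class _; do
  units+=("$unit")
  classes+=("$class")
done < <(awk -F'\t' -v OFS='\t' '
  NR > 1 && $0 != "" {
    if ($1 != "") ut = $1
    else $1 = ut
    print
  }' "$tsvfile")

rm -f "$tsvfile"

# Index i in one array matches index i in the other.
for i in "${!classes[@]}"; do
  printf '%s -> %s\n' "${classes[i]}" "${units[i]}"
done
```

With the sample data above, the loop prints Title1 -> Unit Title1, Title2 -> Unit Title2, and Title3 -> Unit Title2. Filling the first column before calling read matters because tab counts as IFS whitespace in bash, so read would otherwise drop an empty leading field and shift the class title into the unit variable.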
