Unit Title     Class Title    File Name
Unit Title1    Title1         Filename1
Unit Title2    Title2         Filename2
               Title3         Filename3
               Title4         Filename4
               Title5         Filename5
Unit Title3    Title6         Filename6
               Title7         Filename7
               Title8         Filename8
               Title9         Filename9
Unit Title4    Title10        Filename10
               Title11        Filename11
               Title12        Filename12
I have a large number of TSV (tab-separated values) files with a structure like this. I'm trying to write a bash script that parses these files into matching arrays. It's the empty entries in the "Unit Title" column that are throwing me for a loop. I need to be able to list each class title along with the "Unit Title" it falls under.
I can get each of the groups into its own array, but I can't duplicate the entries in "Unit Title" so that they line up with the class titles. Can someone point me in the right direction? Thanks!
1 answer
It's unclear to me exactly what you want the arrays to look like, but pre-processing the input files so that every column is filled in may help:
awk -F'\t' -v OFS='\t' '
  $0 != "" {  # process only non-empty lines
    # If field 1 is empty, set it to the most recent unit title.
    if ($1 != "") ut=$1; else $1=ut;
    # Print the (rebuilt) line.
    print
  }' tsvfile
This will result in something like the following (\t represents a literal tab), which should make parsing easier:
Unit Title1\tTitle1\tFilename1
Unit Title2\tTitle2\tFilename2
Unit Title2\tTitle3\tFilename3
Unit Title2\tTitle4\tFilename4
Unit Title2\tTitle5\tFilename5
Unit Title3\tTitle6\tFilename6
Unit Title3\tTitle7\tFilename7
...
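Once every row carries its unit title, loading matching arrays in bash could look roughly like the sketch below. The function name load_tsv and the array names units, titles and files are made up for illustration, and the sketch assumes the first line of each file is the header row and that the title and filename columns are never empty (tab counts as whitespace in IFS, so empty fields would collapse).

```shell
#!/usr/bin/env bash
# Hypothetical helper: fills the global arrays units, titles and files
# from a TSV file whose first column is blank on continuation rows.
load_tsv() {
  local unit title file
  units=() titles=() files=()
  # Read the filled-in rows produced by awk, one tab-separated row at a time.
  while IFS=$'\t' read -r unit title file; do
    units+=("$unit")
    titles+=("$title")
    files+=("$file")
  done < <(awk -F'\t' -v OFS='\t' '
    NR == 1 { next }                       # assumption: line 1 is the header row
    $0 != "" {                             # skip empty lines
      if ($1 != "") ut = $1; else $1 = ut  # carry the last unit title forward
      print
    }' "$1")
}
```

After running load_tsv tsvfile, "${units[i]}", "${titles[i]}" and "${files[i]}" all refer to the same row, so looping over "${!titles[@]}" lists each class title next to its unit title.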