How do I extract quoted fields with awk? [duplicate]

This question already has an answer here:

Escaping separator within double quotes, in awk (3 answers)

I am using

awk '{ printf "%s", $3 }'

to extract a field from a space-delimited line. Of course, I get only part of the field when it is quoted and contains spaces. Can anybody suggest a solution, please?
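To make the problem concrete, here is a minimal sketch. The sample line is made up for illustration; it assumes the third field is wrapped in double quotes and contains one space.

```shell
# Hypothetical sample line: the third field is quoted and contains a space.
line='field1 field2 "field 3" field4'

# Default whitespace splitting cuts the quoted field at the inner space.
third=$(printf '%s\n' "$line" | awk '{ printf "%s", $3 }')
printf '%s\n' "$third"    # prints: "field
```

The result is only the first word of the quoted field, with the opening quote still attached.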

4 solutions

#1

This is actually quite difficult. I came up with the following awk script that splits the line manually and stores all fields in an array.

{
    s = $0
    i = 0
    split("", a)
    while ((m = match(s, /"[^"]*"/)) > 0) {
        # Add all unquoted fields before this field
        n = split(substr(s, 1, m - 1), t)
        for (j = 1; j <= n; j++)
            a[++i] = t[j]
        # Add this quoted field
        a[++i] = substr(s, RSTART + 1, RLENGTH - 2)
        s = substr(s, RSTART + RLENGTH)
        if (i >= 3) # We can stop once we have field 3
            break
    }
    # Process the remaining unquoted fields after the last quoted field
    n = split(s, t)
    for (j = 1; j <= n; j++)
        a[++i] = t[j]
    print a[3]
}
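As a variant of the same manual-splitting idea, the sketch below scans the line from left to right and stops at the requested field. `nth_quoted_field` is a hypothetical helper name, not part of awk. The sketch assumes fields are separated by spaces or tabs and that quotes are never escaped or nested.

```shell
# Hypothetical helper: print field N of each input line, treating "..." as one field.
nth_quoted_field() {
    awk -v n="$1" '
    {
        s = $0; i = 0; out = ""
        while (s != "" && i < n) {
            sub(/^[ \t]+/, "", s)               # drop leading blanks
            if (s == "") break
            if (match(s, /^"[^"]*"/)) {         # quoted field: strip the quotes
                out = substr(s, 2, RLENGTH - 2)
            } else {                            # bare field: runs to the next blank
                match(s, /^[^ \t]+/)
                out = substr(s, 1, RLENGTH)
            }
            s = substr(s, RLENGTH + 1)          # continue after the consumed field
            i++
        }
        print (i == n ? out : "")               # empty line if there is no field N
    }'
}

printf '%s\n' 'field1 field2 "field 3" field4' | nth_quoted_field 3
```

For the sample line this prints `field 3`. Unlike the script above, the quotes are matched only at the start of a field, so a stray quote in the middle of a bare word is left alone.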

#2

Show your input file and desired output next time. To get the quoted fields:

$ cat file
field1 field2 "field 3" field4 "field5"

$ awk -F'"' '{for(i=2;i<=NF;i+=2) print $i}' file
field 3
field5
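The loop steps by two because, with `"` as the field separator, the text inside each pair of quotes always lands in an even-numbered field. As a small sketch of that, the text inside the k-th pair of quotes is field `2 * k`. The variable `k` is an assumption added for illustration, and the sketch assumes the quotes in each line are balanced.

```shell
line='field1 field2 "field 3" field4 "field5"'

# With FS set to a double quote, quoted text sits in fields 2, 4, 6, ...
# k selects which quoted string to print (k=2 means the second one).
second_quoted=$(printf '%s\n' "$line" | awk -F'"' -v k=2 '{ print $(2 * k) }')
printf '%s\n' "$second_quoted"    # prints: field5
```

Note that this counts quoted strings only, so it does not give the position of a field among the unquoted ones.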

#3

Here's a possible alternative solution to this problem. It works by finding the fields that begin or end with quotes, and then joining those together. At the end it updates the fields and NF, so if you put more patterns after the one that does the merging, you can process the (new) fields using all the normal awk features.

I think this uses only features of POSIX awk and doesn't rely on gawk extensions, but I'm not completely sure.

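# This function joins the fields $start to $stop together with OFS, shifting
# subsequent fields down and updating NF. The extra parameters (i, merged,
# offs) are local variables, so the caller's loop counter is not clobbered.
function merge_fields(start, stop,    i, merged, offs) {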
    #printf "Merge fields $%d to $%d\n", start, stop;
    if (start >= stop)
        return;
    merged = "";
    for (i = start; i <= stop; i++) {
        if (merged)
            merged = merged OFS $i;
        else
            merged = $i;
    }
    $start = merged;

    offs = stop - start;
    for (i = start + 1; i <= NF; i++) {
        #printf "$%d = $%d\n", i, i+offs;
        $i = $(i + offs);
    }
    NF -= offs;
}

# Merge quoted fields together.
{
    start = stop = 0;
    for (i = 1; i <= NF; i++) {
        if (match($i, /^"/))
            start = i;
        if (match($i, /"$/))
            stop = i;
        if (start && stop && stop > start) {
            merge_fields(start, stop);
            # Start again from the beginning.
            i = 0;
            start = stop = 0;
        }
    }
}

# This rule executes after the one above. It sees the fields after merging.
{
    for (i = 1; i <= NF; i++) {
        printf "Field %d: >>>%s<<<\n", i, $i;
    }
}

On an input file like:

thing "more things" "thing" "more things and stuff"

it produces:

Field 1: >>>thing<<<
Field 2: >>>"more things"<<<
Field 3: >>>"thing"<<<
Field 4: >>>"more things and stuff"<<<
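The merged fields keep their surrounding quotes, as the output shows. If the quotes are not wanted, one possible follow-up step is to strip a leading and a trailing double quote from a field. This is a minimal sketch on a single made-up value, not part of the script above.

```shell
# Strip one leading and one trailing double quote from a value.
stripped=$(printf '%s\n' '"more things"' | awk '{ s = $0; gsub(/^"|"$/, "", s); print s }')
printf '%s\n' "$stripped"    # prints: more things
```

Inside the script, the same `gsub` call could be applied to a copy of `$i` before printing it.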

#4

If you are just looking for a specific field then

$ cat file
field1 field2 "field 3" field4 "field5"

$ awk -F"\"" '{print $2}' file

works. It splits each line on the double-quote character, so the second field in the example above (field 3) is the one you want.
