How do I use multiple arguments for awk with a shebang (i.e. #!)?

Date: 2021-10-07 22:27:55

I'd like to execute a gawk script with --re-interval using a shebang. The "naive" approach of

#!/usr/bin/gawk --re-interval -f
... awk script goes here

does not work, since gawk is called with the single first argument "--re-interval -f" (not split at the whitespace), which it does not understand. Is there a workaround for that?

Of course, you can avoid calling gawk directly and wrap it in a shell script that splits the first argument, or write a shell script that calls gawk and keep the awk program in a separate file, but I was wondering whether there is some way to do this within one file.

The behaviour of shebang lines differs from system to system; at least in Cygwin, the arguments are not split at whitespace. I only care about how to do it on a system that behaves like that; the script is not meant to be portable.

9 Answers

#1


19  

This seems to work for me with (g)awk.

#!/bin/sh
arbitrary_long_name==0 "exec" "/usr/bin/gawk" "--re-interval" "-f" "$0" "$@"


# The real awk program starts here
{ print $0 }

Note the #! runs /bin/sh, so this script is first interpreted as a shell script.

At first, I simply tried "exec" "/usr/bin/gawk" "--re-interval" "-f" "$0" "$@", but awk treated that as a pattern that is always true and printed every line of input unconditionally. That is why I put in the arbitrary_long_name==0: it is supposed to be false every time. You could replace the name with some gibberish string. Basically, I was looking for a false condition in awk that would not adversely affect the shell script.

In the shell script, the arbitrary_long_name==0 defines a variable called arbitrary_long_name and sets it equal to =0.
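
The two readings of that first line can be checked in isolation. This is only a minimal sketch, assuming a POSIX sh and some awk on the PATH; demo_var, shell_view and awk_view are hypothetical names, with demo_var standing in for arbitrary_long_name:

```shell
#!/bin/sh
# Shell view: "demo_var==0 cmd" is an assignment prefix, so cmd runs with
# demo_var set to the string "=0" in its environment.
shell_view() {
    demo_var==0 sh -c 'printf "%s\n" "$demo_var"'
}

# awk view: the same text is a pattern that compares the unset variable
# demo_var with the concatenated string "0exec". It is never true, so the
# default print action never fires.
awk_view() {
    printf 'line1\nline2\n' | awk 'demo_var==0 "exec"'
}

shell_view   # prints =0
awk_view     # prints nothing
```

The shell never complains about the odd-looking assignment, and awk never prints because of it, which is what lets one file serve both interpreters.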

#2


140  

The shebang line has never been specified as part of POSIX, SUS, LSB or any other specification. AFAIK, it hasn't even been properly documented.

There is a rough consensus about what it does: take everything between the ! and the \n and exec it. The assumption is that everything between the ! and the \n is a full absolute path to the interpreter. There is no consensus about what happens if it contains whitespace.

  1. Some operating systems simply treat the entire thing as the path. After all, in most operating systems, whitespace or dashes are legal in a path.
  2. Some operating systems split at whitespace and treat the first part as the path to the interpreter and the rest as individual arguments.
  3. Some operating systems split at the first whitespace and treat the front part as the path to the interpreter and the rest as a single argument (which is what you are seeing).
  4. Some don't even support shebang lines at all.

Thankfully, 1. and 4. seem to have died out, but 3. is pretty widespread, so you simply cannot rely on being able to pass more than one argument.
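
To make the first three behaviours concrete, here is a small simulation in plain sh. It is a sketch only: split_shebang and its mode labels (whole, all, first) are hypothetical names, and it models the splitting rules described above rather than the code of any real kernel:

```shell
#!/bin/sh
# split_shebang MODE LINE: print each resulting argv element in brackets.
split_shebang() {
    mode=$1
    rest=${2#'#!'}            # drop the leading "#!"
    case $mode in
        whole)                # behaviour 1: the entire remainder is the path
            printf '[%s]\n' "$rest" ;;
        all)                  # behaviour 2: split at every run of whitespace
            # unquoted on purpose so the shell performs word splitting
            set -f; printf '[%s]\n' $rest; set +f ;;
        first)                # behaviour 3: split once, rest is one argument
            case $rest in
                *' '*) printf '[%s]\n' "${rest%% *}" "${rest#* }" ;;
                *)     printf '[%s]\n' "$rest" ;;
            esac ;;
    esac
}

split_shebang first '#!/usr/bin/gawk --re-interval -f'
```

With the mode set to first, the call prints [/usr/bin/gawk] followed by [--re-interval -f], which matches the single combined argument described in the question.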

And since the location of commands is also not specified in POSIX or SUS, you generally use up that single argument by passing the executable's name to env so that it can determine the executable's location; e.g.:

#!/usr/bin/env gawk

[Obviously, this still assumes a particular path for env, but there are only very few systems where it lives in /bin, so this is generally safe. The location of env is a lot more standardized than the location of gawk or even worse something like python or ruby or spidermonkey.]

Which means that you cannot actually use any arguments at all.

#3


11  

I came across the same issue, with no apparent solution because of the way whitespace is handled in a shebang (at least on Linux).

However, you can pass several options in a shebang, as long as they are short options and they can be concatenated (the GNU way).

For example, you cannot have

#!/usr/bin/foo -i -f

but you can have

#!/usr/bin/foo -if

Obviously, that only works when the options have short equivalents and take no arguments.
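
Why the concatenated form survives the shebang can be shown with the shell's own getopts, which follows the same bundling convention. This is a sketch under assumptions: parse_flags is a hypothetical helper, and the letters i and f simply mirror the foo example above:

```shell
#!/bin/sh
# parse_flags prints every option letter that getopts recognises.
parse_flags() {
    OPTIND=1                      # restart option parsing on each call
    while getopts if opt; do
        case $opt in
            i|f) printf 'flag:%s\n' "$opt" ;;
            *)   printf 'unknown\n' ;;
        esac
    done
}

parse_flags -if      # one argument carrying two flags
parse_flags -i -f    # two separate arguments, same result
```

Both calls print flag:i and then flag:f. Since -if is a single word, it arrives intact even on systems that hand everything after the interpreter path over as one argument.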

#4


11  

Under Cygwin and Linux, everything after the interpreter path in the shebang is passed to the program as one argument.

It's possible to hack around this by using another awk script inside the shebang:

#!/usr/bin/gawk {system("/usr/bin/gawk --re-interval -f " FILENAME); exit}

This will execute {system("/usr/bin/gawk --re-interval -f " FILENAME); exit} in awk,
which in turn executes /usr/bin/gawk --re-interval -f path/to/your/script.awk in your system's shell.

#5


5  

#!/bin/sh
''':'
exec YourProg -some_options "$0" "$@"
'''

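The above shell shebang trick is more portable than /usr/bin/env. Note that it relies on the target language treating ''' ... ''' as a string literal (as Python does); awk has no such syntax, so for gawk a false pattern like the one in answer #1 is needed instead.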

#6


3  

In the gawk manual (http://www.gnu.org/manual/gawk/gawk.html), the end of section 1.14 notes that you should use only a single argument when running gawk from a shebang line, because the OS will treat everything after the path to gawk as a single argument. Perhaps there is another way to specify the --re-interval option? Alternatively, your script could name your shell in the shebang line, run gawk as a command, and include the text of your awk program as a "here document".

#7


2  

Why not use bash and gawk itself to skip past the shebang, read the script, and pass it as a file to a second instance of gawk [--with-whatever-number-of-params-you-need]?

#!/bin/bash
gawk --re-interval -f <(gawk 'NR>3' "$0") "$@"
exit
{
  print "Program body goes here"
  print $1
}

(The same could naturally also be accomplished with e.g. sed or tail, but I think there is a kind of beauty in depending only on bash and gawk itself.)

#8


0  

Just for fun: there is the following quite weird solution that reroutes stdin and the program through file descriptors 3 and 4. You could also create a temporary file for the script.

#!/bin/bash
exec 3>&0
exec <<-EOF 4>&0
BEGIN {print "HALLO"}
{print \$1}
EOF
gawk --re-interval -f <(cat 0>&4) 0>&3

One thing is annoying about this: the shell does variable expansion on the script, so you have to quote every $ (as done in the second line of the script) and probably more than that.
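
One way to avoid that escaping is to quote the here-document delimiter, which tells the shell to pass the body through literally (for the script above that would mean <<-'EOF' in place of <<-EOF). A minimal sketch of the difference, where the function names unquoted and quoted are hypothetical:

```shell
#!/bin/sh
# Unquoted delimiter: the shell expands $1 inside the here document.
unquoted() {
    cat <<EOF
{ print $1 }
EOF
}

# Quoted delimiter: the body reaches cat exactly as written.
quoted() {
    cat <<'EOF'
{ print $1 }
EOF
}

unquoted expanded   # prints { print expanded }
quoted expanded     # prints { print $1 }
```

Only the quoted form delivers the awk program unchanged, so awk's own $1 survives without a backslash.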

#9


-1  

For a portable solution, use awk rather than gawk, invoke the standard Bourne shell (/bin/sh) with your shebang, and invoke awk directly, passing the program on the command line by way of a here document rather than via stdin:

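#!/bin/sh
# The quoted 'EOF' keeps the shell from expanding $ inside the awk program.
# $(cat <<...) turns the here document into a command-line argument.
gawk --re-interval "$(cat <<'EOF'
PROGRAM HERE
EOF
)" "$@"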

Note: there is no -f argument to awk. That leaves stdin available for awk to read input from. Assuming you have gawk installed and on your PATH, that achieves everything I think you were trying to do with your original example (assuming you wanted the file content to be the awk script and not the input, which is how I think your shebang approach would have treated it).
