bash: split command output by column

Time: 2022-10-09 15:42:24

I want to do this:

  1. run a command
  2. capture the output
  3. select a line
  4. select a column of that line

Just as an example, let's say I want to get the command name from a $PID (please note this is just an example, I'm not suggesting this is the easiest way to get a command name from a process id - my real problem is with another command whose output format I can't control).

If I run ps I get:


  PID TTY          TIME CMD
11383 pts/1    00:00:00 bash
11771 pts/1    00:00:00 ps

Now I do ps | egrep 11383 and get

11383 pts/1    00:00:00 bash

Next step: ps | egrep 11383 | cut -d" " -f 4. Output is:

<absolutely nothing/>

The problem is that cut splits the output on single spaces, and since ps pads the gap between the 2nd and 3rd columns to keep some semblance of a table, cut picks an empty string. Of course, I could use cut to select the 7th field instead of the 4th, but how can I know which one to pick, especially when the output is variable and not known beforehand?

10 Solutions

#1


121  

One easy way is to add a pass of tr to squeeze any repeated field separators out:

$ ps | egrep 11383 | tr -s ' ' | cut -d ' ' -f 4
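
To see why the squeeze matters, here is a small sketch that runs both pipelines on a fixed copy of the ps line from the question, so it does not depend on a live process. The variable names (line, plain, squeezed) are only for illustration.

```shell
# Sample line copied from the question; four spaces separate columns 2 and 3.
line='11383 pts/1    00:00:00 bash'

# Without tr: every single space is a delimiter, so field 4 is an empty string.
plain=$(printf '%s\n' "$line" | cut -d ' ' -f 4)

# With tr -s: runs of spaces collapse to one, so field 4 is the command name.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 4)

printf 'plain=[%s] squeezed=[%s]\n' "$plain" "$squeezed"
# prints: plain=[] squeezed=[bash]
```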

#2


57  

I think the simplest way is to use awk. Example:

$ echo "11383 pts/1    00:00:00 bash" | awk '{ print $4; }'
bash
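
Part of why awk is convenient here: with its default field separator it splits on runs of spaces and tabs and ignores leading blanks, so no squeezing step is needed. A quick sketch on made-up input (the two lines and the cols variable are assumptions for the demo, not real ps output):

```shell
# First line has a leading space; second line uses a tab between columns.
cols=$(printf ' 1543 root\n19645\troot\n' | awk '{ print $1 ":" $2 }')

printf '%s\n' "$cols"
# prints:
# 1543:root
# 19645:root
```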

#3


7  

Please note that the tr -s ' ' option will not remove any single leading spaces. If your column is right-aligned (as with ps pid)...

$ ps h -o pid,user -C ssh,sshd | tr -s " "
 1543 root
19645 root
19731 root

Then cutting will result in a blank line for some of those fields if it is the first column:

$ <previous command> | cut -d ' ' -f1

19645
19731

Unless you precede each line with a space, obviously:

$ <command> | sed -e "s/.*/ &/" | tr -s " "
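
A sketch of the whole thing on canned input, so the effect is visible without running ps. The sample text and variable names are made up for the demo. One consequence worth spelling out: once every line starts with exactly one space, column N of the data is field N+1 for cut, so the first column is -f 2.

```shell
# Two lines shaped like the right-aligned ps output above.
sample=' 1543 root
19645 root'

# Leading space survives tr -s, so field 1 is empty on the first line.
broken=$(printf '%s\n' "$sample" | tr -s ' ' | cut -d ' ' -f 1)

# Prefix every line with a space, squeeze, then take field 2.
fixed=$(printf '%s\n' "$sample" | sed -e 's/.*/ &/' | tr -s ' ' | cut -d ' ' -f 2)

printf '%s\n' "$fixed"
# prints:
# 1543
# 19645
```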

Now, for this particular case of pid numbers (not names), there is a command called pgrep:

$ pgrep ssh


Shell functions

However, in general it is actually still possible to use shell functions in a concise manner, because there is a neat thing about the read command:

$ <command> | while read a b; do echo $a; done

The first parameter to read, a, selects the first column, and if there is more, everything else will be put in b. As a result, you never need more variables than the number of your column +1.
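
A minimal sketch of that behaviour on a fixed line. split_first is a hypothetical helper name used only for this demo. Note that the leftover text keeps its internal spacing when it lands in the last variable.

```shell
# split_first: read each input line into a first word and "everything else".
split_first() {
    while read -r first rest; do
        printf 'first=[%s] rest=[%s]\n' "$first" "$rest"
    done
}

printf '%s\n' '11383 pts/1    00:00:00 bash' | split_first
# prints: first=[11383] rest=[pts/1    00:00:00 bash]
```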

So,

while read a b c d; do echo $c; done

will then output the 3rd column. As indicated in my comment...

A piped read will be executed in an environment that does not pass variables to the calling script.

out=$(ps whatever | { read a b c d; echo $c; })

arr=($(ps whatever | { read a b c d; echo $c $b; }))
echo ${arr[1]}     # will output the value of b
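
If you want the columns as variables in the current shell, one way around the subshell is to feed read from a here-document instead of a pipe. A sketch, assuming the line has already been captured into a variable (sample stands in for real command output, and the variable names are my own):

```shell
sample='11383 pts/1    00:00:00 bash'

# No pipe here, so read runs in the current shell and its variables survive.
read -r pid tty cputime cmd <<EOF
$sample
EOF

echo "$pid $cmd"
# prints: 11383 bash
```

Only the first line of the here-document is consumed, so this suits the "one selected line" case rather than looping over many lines.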


The Array Solution

So we end up with the answer by @frayser, which is to use the shell variable IFS (which defaults to space, tab, and newline) to split the string into an array. It only works in Bash, though; Dash and Ash do not support arrays. I have had a really hard time splitting a string into components on a Busybox system. It is easy enough to get a single component (e.g. using awk) and then repeat that for every parameter you need, but then you end up repeatedly calling awk on the same line, or repeatedly using a read block with echo on the same line, which is neither efficient nor pretty. So you end up splitting with ${name%% *} and so on. It makes you yearn for some Python, because shell scripting is not a lot of fun anymore once half or more of the features you are accustomed to are gone. But you can assume that even Python would not be installed on such a system, and it wasn't ;-).
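
For what it is worth, here is roughly what the ${name%% *} style of splitting looks like. This is only a sketch: it assumes the fields are separated by single spaces with no leading space (squeeze with tr -s first if they are not), and the variable names are invented for the example. It uses only POSIX parameter expansion, so no arrays are needed.

```shell
line='11383 pts/1 00:00:00 bash'

first=${line%% *}    # drop the longest suffix starting at a space: 11383
rest=${line#* }      # drop the shortest prefix ending at a space
second=${rest%% *}   # first word of what is left: pts/1
last=${line##* }     # drop the longest prefix ending at a space: bash

echo "$first $second $last"
# prints: 11383 pts/1 bash
```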

#4


3  

try

# Pipe ps into the loop; read splits each line into whitespace-separated columns.
ps | while read -r first second third fourth etc; do
    if [[ $first == '11383' ]]; then
        echo "got: $fourth"
    fi
done

#5


2  

Similar to brianegge's awk solution, here is the Perl equivalent:

ps | egrep 11383 | perl -lane 'print $F[3]'

-a enables autosplit mode, which populates the @F array with the column data.
Use -F, (the -F switch followed by a comma) if your data is comma-delimited rather than space-delimited.

Field 3 is printed since Perl starts counting from 0 rather than 1.

#6


1  

Getting the correct line (example for line no. 6) is done with head and tail and the correct word (word no. 4) can be captured with awk:

command|head -n 6|tail -n 1|awk '{print $4}'
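
Since awk already knows the current line number (NR), the head and tail steps can probably be folded into it. A sketch comparing the two on made-up input; nth_line_field is a hypothetical helper name, and the sample text is invented for the demo:

```shell
# nth_line_field LINE FIELD: print the given field of the given input line.
nth_line_field() {
    awk -v n="$1" -v f="$2" 'NR == n { print $f; exit }'
}

sample='one a b c1
two a b c2
three a b c3'

via_head_tail=$(printf '%s\n' "$sample" | head -n 2 | tail -n 1 | awk '{print $4}')
via_awk=$(printf '%s\n' "$sample" | nth_line_field 2 4)

echo "$via_head_tail $via_awk"
# prints: c2 c2
```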

#7


1  

Using array variables

set $(ps | egrep "^11383 "); echo $4

or

A=( $(ps | egrep "^11383 ") ) ; echo ${A[3]}
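
Two things to watch with the set form: it overwrites the script's own positional parameters, and the unquoted expansion is also subject to globbing. A sketch that sidesteps both by doing the split inside a function (fourth_word is a hypothetical name, and it expects the whole line as a single argument). The -- keeps a line that starts with a dash from being read as options to set.

```shell
# fourth_word LINE: print the 4th whitespace-separated word of LINE.
fourth_word() {
    set -f          # turn off globbing so a word like * is not expanded
    set -- $1       # unquoted on purpose: the shell splits the line on IFS
    set +f          # turn globbing back on
    printf '%s\n' "$4"
}

fourth_word '11383 pts/1    00:00:00 bash'
# prints: bash
```

Inside a function, set -- only replaces the function's own arguments, so the caller's $1, $2 and so on are left alone.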

#8


0  

Instead of doing all these greps and stuff, I'd advise you to use ps's own ability to change its output format.

ps -o cmd= -p 12345

You get the command line of the process with the specified pid and nothing else.

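The -o and -p options are specified by POSIX, so this form is reasonably portable. Note, though, that the POSIX names for the format specifiers are comm (command name) and args (full command line); cmd is an alias for args accepted by the procps ps found on Linux.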

#9


0  

Your command

ps | egrep 11383 | cut -d" " -f 4

is missing a tr -s to squeeze spaces, as unwind explains in his answer.

However, you may want to use awk, since it handles all of these actions in a single command:

ps | awk '/11383/ {print $4}'

This prints the 4th column of every line containing 11383. If you want it to match 11383 only when it appears at the beginning of the line, you can say ps | awk '/^11383/ {print $4}'.
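
Even the anchored pattern is a prefix match, so a longer pid such as 113830 would still get through. If that matters, comparing the first field directly is one option. A sketch on invented sample data (the three lines and the variable names are assumptions for the demo):

```shell
sample='11383 pts/1 00:00:00 bash
113830 pts/2 00:00:00 vim
211383 pts/3 00:00:00 top'

loose=$(printf '%s\n' "$sample" | awk '/11383/ { print $4 }')
anchored=$(printf '%s\n' "$sample" | awk '/^11383/ { print $4 }')
exact=$(printf '%s\n' "$sample" | awk '$1 == 11383 { print $4 }')

# Unquoted on purpose so each multi-line result prints on one line.
echo "loose:" $loose        # loose: bash vim top
echo "anchored:" $anchored  # anchored: bash vim
echo "exact:" $exact        # exact: bash
```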

#10


0  

Bash's set will parse all output into positional parameters.

For instance, after running set $(free -h), echo $7 will show "Mem:".
