Split a string into an array in Bash

Time: 2021-07-27 21:31:00

In a Bash script I would like to split a line into pieces and put them into an array.

The line:

Paris, France, Europe

I would like to have them in an array like this:

array[0] = Paris
array[1] = France
array[2] = Europe

I would like to use simple code; the command's speed doesn't matter. How can I do it?

14 solutions

#1


IFS=', ' read -r -a array <<< "$string"

Note that the characters in $IFS are treated individually as separators so that in this case fields may be separated by either a comma or a space rather than the sequence of the two characters. Interestingly though, empty fields aren't created when comma-space appears in the input because the space is treated specially.
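
A quick check of that behaviour, as a minimal sketch (the variable names `demo` and `demo2` and their inputs are illustrative only):

```shell
#!/usr/bin/env bash
# The OP's input: comma-space between fields.
string='Paris, France, Europe'
IFS=', ' read -r -a array <<< "$string"
declare -p array    # declare -a array=([0]="Paris" [1]="France" [2]="Europe")

# IFS=', ' is a *set* of single-character separators: a comma alone or a
# space alone also ends a field.
IFS=', ' read -r -a demo <<< 'a,b c'
declare -p demo     # declare -a demo=([0]="a" [1]="b" [2]="c")

# Comma followed by space yields no empty field: the space is IFS whitespace
# and is absorbed into the comma delimiter.
IFS=', ' read -r -a demo2 <<< 'x, y'
declare -p demo2    # declare -a demo2=([0]="x" [1]="y")
```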

To access an individual element:

echo "${array[0]}"

To iterate over the elements:

for element in "${array[@]}"
do
    echo "$element"
done

To get both the index and the value:

for index in "${!array[@]}"
do
    echo "$index ${array[index]}"
done

The last example is useful because Bash arrays are sparse. In other words, you can delete an element or add an element and then the indices are not contiguous.

unset "array[1]"
array[42]=Earth

To get the number of elements in an array:

echo "${#array[@]}"

As mentioned above, arrays can be sparse, so you shouldn't use the length to get the last element. Here's how you can do it in Bash 4.2 and later:

echo "${array[-1]}"

Or, in any version of Bash (from somewhere after 2.05b):

echo "${array[@]: -1:1}"

Larger negative offsets select farther from the end of the array. Note the space before the minus sign in the older form. It is required.
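
To make the sparseness concrete, here is a small sketch that applies the `unset`/`array[42]` steps from above to a fresh three-element array. It shows why the element count is not a safe way to reach the last element. Note that both the negative subscript and the negative slice offset are counted from one past the highest index, not from the element count.

```shell
#!/usr/bin/env bash
array=(Paris France Europe)
unset "array[1]"          # indices are now 0 and 2
array[42]=Earth           # indices are now 0, 2 and 42

echo "${!array[@]}"       # 0 2 42
echo "${#array[@]}"       # 3 (element count, not highest index + 1)

# Using the count as an index assumes contiguous indices; here it reads
# index 3 - 1 = 2, which is Europe rather than the last element.
echo "${array[${#array[@]}-1]}"   # Europe

# Both forms below are relative to one past the highest index (42).
echo "${array[-1]}"       # Earth (Bash 4.2+)
echo "${array[@]: -1:1}"  # Earth
```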

#2


Here is a way without setting IFS:

string="1:2:3:4:5"
set -f                      # avoid globbing (expansion of *).
array=(${string//:/ })
for i in "${!array[@]}"
do
    echo "$i=>${array[i]}"
done

The idea is using string replacement:

${string//substring/replacement}

to replace all matches of $substring with white space and then using the substituted string to initialize an array:

(element1 element2 ... elementN)
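
A minimal illustration of the pattern-substitution step on its own. The single-slash form is included only for contrast:

```shell
#!/usr/bin/env bash
string="1:2:3:4:5"
all=${string//:/ }      # // replaces every match
first=${string/:/ }     # /  replaces only the first match
echo "$all"             # 1 2 3 4 5
echo "$first"           # 1 2:3:4:5
```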

Note: this answer makes use of the split+glob operator. Thus, to prevent expansion of some characters (such as *) it is a good idea to pause globbing for this script.
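
If you do use this approach, one way to keep `set -f` from leaking into the rest of the script is to confine it to a function that runs `local -` (Bash 4.4 and later). This restores the shell options when the function returns. A sketch; the function name `split_on_colon` and the global `result` array are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split $1 on ':' into the global array `result`.
split_on_colon() {
    local -                  # Bash 4.4+: option changes below are undone on return
    set -f                   # disable globbing so a literal * is not expanded
    result=(${1//:/ })       # unquoted on purpose: word splitting does the split
}

split_on_colon '1:*:3'
declare -p result            # declare -a result=([0]="1" [1]="*" [2]="3")
```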

#3


All of the answers to this question are wrong in one way or another.


Wrong answer #1

IFS=', ' read -r -a array <<< "$string"

1: This is a misuse of $IFS. The value of the $IFS variable is not taken as a single variable-length string separator, rather it is taken as a set of single-character string separators, where each field that read splits off from the input line can be terminated by any character in the set (comma or space, in this example).

Actually, for the real sticklers out there, the full meaning of $IFS is slightly more involved. From the bash manual:

The shell treats each character of IFS as a delimiter, and splits the results of the other expansions into words using these characters as field terminators. If IFS is unset, or its value is exactly <space><tab><newline>, the default, then sequences of <space>, <tab>, and <newline> at the beginning and end of the results of the previous expansions are ignored, and any sequence of IFS characters not at the beginning or end serves to delimit words. If IFS has a value other than the default, then sequences of the whitespace characters <space>, <tab>, and <newline> are ignored at the beginning and end of the word, as long as the whitespace character is in the value of IFS (an IFS whitespace character). Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter. If the value of IFS is null, no word splitting occurs.

Basically, for non-default non-null values of $IFS, fields can be separated with either (1) a sequence of one or more characters that are all from the set of "IFS whitespace characters" (that is, whichever of <space>, <tab>, and <newline> ("newline" meaning line feed (LF)) are present anywhere in $IFS), or (2) any non-"IFS whitespace character" that's present in $IFS along with whatever "IFS whitespace characters" surround it in the input line.
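
Here is a short sketch that shows the two separation modes side by side (the variable names and inputs are illustrative):

```shell
#!/usr/bin/env bash
# Mode (1): a run of IFS whitespace counts as a single separator.
IFS=', ' read -r -a x <<< 'a   b'
declare -p x     # declare -a x=([0]="a" [1]="b")

# Mode (2): a non-whitespace IFS character (the comma) plus any IFS
# whitespace around it is one separator; a second comma right after it
# therefore ends an empty field.
IFS=', ' read -r -a y <<< 'a , ,b'
declare -p y     # declare -a y=([0]="a" [1]="" [2]="b")
```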

For the OP, it's possible that the second separation mode I described in the previous paragraph is exactly what he wants for his input string, but we can be pretty confident that the first separation mode I described is not correct at all. For example, what if his input string was 'Los Angeles, United States, North America'?

IFS=', ' read -ra a <<<'Los Angeles, United States, North America'; declare -p a;
## declare -a a=([0]="Los" [1]="Angeles" [2]="United" [3]="States" [4]="North" [5]="America")

2: Even if you were to use this solution with a single-character separator (such as a comma by itself, that is, with no following space or other baggage), if the value of the $string variable happens to contain any LFs, then read will stop processing once it encounters the first LF. The read builtin only processes one line per invocation. This is true even if you are piping or redirecting input only to the read statement, as we are doing in this example with the here-string mechanism, and thus unprocessed input is guaranteed to be lost. The code that powers the read builtin has no knowledge of the data flow within its containing command structure.

You could argue that this is unlikely to cause a problem, but still, it's a subtle hazard that should be avoided if possible. It is caused by the fact that the read builtin actually does two levels of input splitting: first into lines, then into fields. Since the OP only wants one level of splitting, this usage of the read builtin is not appropriate, and we should avoid it.
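
A minimal demonstration of the lost-input hazard. It uses a single-character separator and a string that happens to contain a newline (the sample data is made up):

```shell
#!/usr/bin/env bash
string=$'Paris,France\nMadrid,Spain'
IFS=',' read -r -a array <<< "$string"
declare -p array    # declare -a array=([0]="Paris" [1]="France")
# Everything after the first newline (Madrid,Spain) was silently dropped.
```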

3: A non-obvious potential issue with this solution is that read always drops the trailing field if it is empty, although it preserves empty fields otherwise. Here's a demo:

string=', , a, , b, c, , , '; IFS=', ' read -ra a <<<"$string"; declare -p a;
## declare -a a=([0]="" [1]="" [2]="a" [3]="" [4]="b" [5]="c" [6]="" [7]="")

Maybe the OP wouldn't care about this, but it's still a limitation worth knowing about. It reduces the robustness and generality of the solution.

This problem can be solved by appending a dummy trailing delimiter to the input string just prior to feeding it to read, as I will demonstrate later.
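
For a single-character delimiter, the idea can be sketched as follows. Appending one extra comma means that the field `read` drops is the dummy one, not real data (the sample input is made up):

```shell
#!/usr/bin/env bash
string='a,,b,'                        # four fields; the last one is empty

IFS=',' read -r -a lost <<< "$string"
declare -p lost     # declare -a lost=([0]="a" [1]="" [2]="b")

IFS=',' read -r -a kept <<< "$string,"   # dummy trailing delimiter
declare -p kept     # declare -a kept=([0]="a" [1]="" [2]="b" [3]="")
```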


Wrong answer #2

string="1:2:3:4:5"
set -f                     # avoid globbing (expansion of *).
array=(${string//:/ })

Similar idea:

t="one,two,three"
a=($(echo $t | tr ',' "\n"))

(Note: I added the missing parentheses around the command substitution which the answerer seems to have omitted.)

Similar idea:

string="1,2,3,4"
array=(`echo $string | sed 's/,/\n/g'`)

These solutions leverage word splitting in an array assignment to split the string into fields. Funnily enough, just like read, general word splitting also uses the $IFS special variable, although in this case it is implied that it is set to its default value of <space><tab><newline>, and therefore any sequence of one or more IFS characters (which are all whitespace characters now) is considered to be a field delimiter.

This solves the problem of two levels of splitting committed by read, since word splitting by itself constitutes only one level of splitting. But just as before, the problem here is that the individual fields in the input string can already contain $IFS characters, and thus they would be improperly split during the word splitting operation. This happens to not be the case for any of the sample input strings provided by these answerers (how convenient...), but of course that doesn't change the fact that any code base that used this idiom would then run the risk of blowing up if this assumption were ever violated at some point down the line. Once again, consider my counterexample of 'Los Angeles, United States, North America' (or 'Los Angeles:United States:North America').
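
Running the counterexample through the `tr` variant shows the overzealous splitting. This sketch quotes the variable inside the command substitution, as the note on these answers recommends:

```shell
#!/usr/bin/env bash
t='Los Angeles,United States,North America'
a=($(echo "$t" | tr ',' '\n'))   # default IFS also splits on the spaces
declare -p a
# declare -a a=([0]="Los" [1]="Angeles" [2]="United" [3]="States" [4]="North" [5]="America")
```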

Also, word splitting is normally followed by filename expansion (aka pathname expansion aka globbing), which, if done, would potentially corrupt words containing the characters *, ?, or [ followed by ] (and, if extglob is set, parenthesized fragments preceded by ?, *, +, @, or !) by matching them against file system objects and expanding the words ("globs") accordingly. The first of these three answerers has cleverly undercut this problem by running set -f beforehand to disable globbing. Technically this works (although you should probably add set +f afterward to reenable globbing for subsequent code which may depend on it), but it's undesirable to have to mess with global shell settings in order to hack a basic string-to-array parsing operation in local code.
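
Here is a self-contained sketch of the corruption. It runs inside a throwaway directory created with `mktemp -d`, so the glob has something predictable to match (the file names are made up):

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d) || exit 1
old_pwd=$PWD
cd "$tmp" || exit 1
touch file1 file2

string='a:*:b'
globbed=(${string//:/ })     # * is matched against file1 and file2
set -f
literal=(${string//:/ })     # globbing off: * is kept as-is
set +f

cd "$old_pwd" || exit 1
rm -r -- "$tmp"
declare -p globbed   # declare -a globbed=([0]="a" [1]="file1" [2]="file2" [3]="b")
declare -p literal   # declare -a literal=([0]="a" [1]="*" [2]="b")
```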

Another issue with this answer is that all empty fields will be lost. This may or may not be a problem, depending on the application.

Note: If you're going to use this solution, it's better to use the ${string//:/ } "pattern substitution" form of parameter expansion, rather than going to the trouble of invoking a command substitution (which forks the shell), starting up a pipeline, and running an external executable (tr or sed), since parameter expansion is purely a shell-internal operation. (Also, for the tr and sed solutions, the input variable should be double-quoted inside the command substitution; otherwise word splitting would take effect in the echo command and potentially mess with the field values. Also, the $(...) form of command substitution is preferable to the old `...` form since it simplifies nesting of command substitutions and allows for better syntax highlighting by text editors.)


Wrong answer #3

str="a, b, c, d"  # assuming there is a space after ',' as in Q
arr=(${str//,/})  # delete all occurrences of ','

This answer is almost the same as #2. The difference is that the answerer has made the assumption that the fields are delimited by two characters, one of which is represented in the default $IFS, and the other not. He has solved this rather specific case by removing the non-IFS-represented character using a pattern substitution expansion and then using word splitting to split the fields on the surviving IFS-represented delimiter character.

This is not a very generic solution. Furthermore, it can be argued that the comma is really the "primary" delimiter character here, and that stripping it and then depending on the space character for field splitting is simply wrong. Once again, consider my counterexample: 'Los Angeles, United States, North America'.

Also, again, filename expansion could corrupt the expanded words, but this can be prevented by temporarily disabling globbing for the assignment with set -f and then set +f.

Also, again, all empty fields will be lost, which may or may not be a problem depending on the application.


Wrong answer #4

string='first line
second line
third line'

oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1} # this is useful to format your code with tabs
lines=( $string )
IFS="$oldIFS"

This is similar to #2 and #3 in that it uses word splitting to get the job done, only now the code explicitly sets $IFS to contain only the single-character field delimiter present in the input string. It should be repeated that this cannot work for multicharacter field delimiters such as the OP's comma-space delimiter. But for a single-character delimiter like the LF used in this example, it actually comes close to being perfect. The fields cannot be unintentionally split in the middle as we saw with previous wrong answers, and there is only one level of splitting, as required.

One problem is that filename expansion will corrupt affected words as described earlier, although once again this can be solved by wrapping the critical statement in set -f and set +f.

Another potential problem is that, since LF qualifies as an "IFS whitespace character" as defined earlier, all empty fields will be lost, just as in #2 and #3. This would of course not be a problem if the delimiter happens to be a non-"IFS whitespace character", and depending on the application it may not matter anyway, but it does vitiate the generality of the solution.

So, to sum up, assuming you have a one-character delimiter, and it is either a non-"IFS whitespace character" or you don't care about empty fields, and you wrap the critical statement in set -f and set +f, then this solution works, but otherwise not.

(Also, for information's sake, assigning a LF to a variable in bash can be done more easily with the $'...' syntax, e.g. IFS=$'\n';.)
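
The empty-field point can be checked with a short sketch that contrasts an LF delimiter (IFS whitespace) with a non-whitespace one. The sketch uses `$'\n'`, saves and restores `$IFS`, and wraps the splitting in `set -f`/`set +f`. The sample string and the `;` substitution are illustrative assumptions, not part of the original answer:

```shell
#!/usr/bin/env bash
string=$'first\n\nthird'       # the middle line is empty

oldIFS=$IFS
set -f
IFS=$'\n'                      # LF is IFS whitespace: adjacent LFs collapse
lines=( $string )
IFS=';'                        # ';' is not IFS whitespace: empty fields survive
fields=( ${string//$'\n'/;} )  # same data, with each LF swapped for ';'
IFS=$oldIFS
set +f

declare -p lines     # declare -a lines=([0]="first" [1]="third")
declare -p fields    # declare -a fields=([0]="first" [1]="" [2]="third")
```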


Wrong answer #5

countries='Paris, France, Europe'
OIFS="$IFS"
IFS=', ' array=($countries)
IFS="$OIFS"

Similar idea:

IFS=', ' eval 'array=($string)'

This solution is effectively a cross between #1 (in that it sets $IFS to comma-space) and #2-4 (in that it uses word splitting to split the string into fields). Because of this, it suffers from most of the problems that afflict all of the above wrong answers, sort of like the worst of all worlds.

Also, regarding the second variant, it may seem like the eval call is completely unnecessary, since its argument is a single-quoted string literal, and therefore is statically known. But there's actually a very non-obvious benefit to using eval in this way. Normally, when you run a simple command which consists of a variable assignment only, meaning without an actual command word following it, the assignment takes effect in the shell environment:

IFS=', '; ## changes $IFS in the shell environment

This is true even if the simple command involves multiple variable assignments; again, as long as there's no command word, all variable assignments affect the shell environment:

IFS=', ' array=($countries); ## changes both $IFS and $array in the shell environment

But, if the variable assignment is attached to a command name (I like to call this a "prefix assignment") then it does not affect the shell environment, and instead only affects the environment of the executed command, regardless of whether it is a builtin or external:

IFS=', ' :; ## : is a builtin command, the $IFS assignment does not outlive it
IFS=', ' env; ## env is an external command, the $IFS assignment does not outlive it

Relevant quote from the bash manual:

If no command name results, the variable assignments affect the current shell environment. Otherwise, the variables are added to the environment of the executed command and do not affect the current shell environment.

It is possible to exploit this feature of variable assignment to change $IFS only temporarily, which allows us to avoid the whole save-and-restore gambit like that which is being done with the $OIFS variable in the first variant. But the challenge we face here is that the command we need to run is itself a mere variable assignment, and hence it would not involve a command word to make the $IFS assignment temporary. You might think to yourself, well why not just add a no-op command word to the statement like the : builtin to make the $IFS assignment temporary? This does not work because it would then make the $array assignment temporary as well:

IFS=', ' array=($countries) :; ## fails; new $array value never escapes the : command

So, we're effectively at an impasse, a bit of a catch-22. But, when eval runs its code, it runs it in the shell environment, as if it was normal, static source code, and therefore we can run the $array assignment inside the eval argument to have it take effect in the shell environment, while the $IFS prefix assignment that is prefixed to the eval command will not outlive the eval command. This is exactly the trick that is being used in the second variant of this solution:

IFS=', ' eval 'array=($string)'; ## $IFS does not outlive the eval command, but $array does

So, as you can see, it's actually quite a clever trick, and accomplishes exactly what is required (at least with respect to assignment effectation) in a rather non-obvious way. I'm actually not against this trick in general, despite the involvement of eval; just be careful to single-quote the argument string to guard against security threats.
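
Here is a quick way to see that the trick behaves as described, as a sketch. It relies on Bash's default mode: in POSIX mode, assignments that precede special builtins such as `eval` persist after the command. The input has no glob characters, so `set -f` is omitted here:

```shell
#!/usr/bin/env bash
string='Paris, France, Europe'
saved_ifs=$IFS

IFS=', ' eval 'array=($string)'   # prefix assignment scoped to eval

[[ $IFS == "$saved_ifs" ]] && echo 'IFS unchanged'
declare -p array    # declare -a array=([0]="Paris" [1]="France" [2]="Europe")
```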

But again, because of the "worst of all worlds" agglomeration of problems, this is still a wrong answer to the OP's requirement.


Wrong answer #6

IFS=', '; array=(Paris, France, Europe)

IFS=' ';declare -a array=(Paris France Europe)

Um... what? The OP has a string variable that needs to be parsed into an array. This "answer" starts with the verbatim contents of the input string pasted into an array literal. I guess that's one way to do it.

It looks like the answerer may have assumed that the $IFS variable affects all bash parsing in all contexts, which is not true. From the bash manual:

IFS    The Internal Field Separator that is used for word splitting after expansion and to split lines into words with the read builtin command. The default value is <space><tab><newline>.

So the $IFS special variable is actually only used in two contexts: (1) word splitting that is performed after expansion (meaning not when parsing bash source code) and (2) for splitting input lines into words by the read builtin.

Let me try to make this clearer. I think it might be good to draw a distinction between parsing and execution. Bash must first parse the source code, which obviously is a parsing event, and then later it executes the code, which is when expansion comes into the picture. Expansion is really an execution event. Furthermore, I take issue with the description of the $IFS variable that I just quoted above; rather than saying that word splitting is performed after expansion, I would say that word splitting is performed during expansion, or, perhaps even more precisely, word splitting is part of the expansion process. The phrase "word splitting" refers only to this step of expansion; it should never be used to refer to the parsing of bash source code, although unfortunately the docs do seem to throw around the words "split" and "words" a lot. Here's a relevant excerpt from the linux.die.net version of the bash manual:

Expansion is performed on the command line after it has been split into words. There are seven kinds of expansion performed: brace expansion, tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, word splitting, and pathname expansion.

The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and pathname expansion.

You could argue the GNU version of the manual does slightly better, since it opts for the word "tokens" instead of "words" in the first sentence of the Expansion section:

Expansion is performed on the command line after it has been split into tokens.

The important point is, $IFS does not change the way bash parses source code. Parsing of bash source code is actually a very complex process that involves recognition of the various elements of shell grammar, such as command sequences, command lists, pipelines, parameter expansions, arithmetic substitutions, and command substitutions. For the most part, the bash parsing process cannot be altered by user-level actions like variable assignments (actually, there are some minor exceptions to this rule; for example, see the various compatxx shell settings, which can change certain aspects of parsing behavior on-the-fly). The upstream "words"/"tokens" that result from this complex parsing process are then expanded according to the general process of "expansion" as broken down in the above documentation excerpts, where word splitting of the expanded (expanding?) text into downstream words is simply one step of that process. Word splitting only touches text that has been spit out of a preceding expansion step; it does not affect literal text that was parsed right off the source bytestream.
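
Here is a small sketch of the parse-versus-expand distinction. It uses the same characters twice under a non-default `$IFS`: once as literal source text and once as the result of a parameter expansion (the variable names are illustrative):

```shell
#!/usr/bin/env bash
IFS=','
literal=(a,b c)      # parsed as source code: IFS is not consulted, so the
                     # space separates words and the comma is just a character
s='a,b c'
expanded=($s)        # word splitting of an expansion: only ',' separates now
unset IFS            # back to default splitting behaviour

declare -p literal   # declare -a literal=([0]="a,b" [1]="c")
declare -p expanded  # declare -a expanded=([0]="a" [1]="b c")
```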


Wrong answer #7

string='first line
        second line
        third line'

while read -r line; do lines+=("$line"); done <<<"$string"

This is one of the best solutions. Notice that we're back to using read. Didn't I say earlier that read is inappropriate because it performs two levels of splitting, when we only need one? The trick here is that you can call read in such a way that it effectively only does one level of splitting, specifically by splitting off only one field per invocation, which necessitates the cost of having to call it repeatedly in a loop. It's a bit of a sleight of hand, but it works.

But there are problems. First: When you provide at least one NAME argument to read, it automatically ignores leading and trailing whitespace in each field that is split off from the input string. This occurs whether $IFS is set to its default value or not, as described earlier in this post. Now, the OP may not care about this for his specific use-case, and in fact, it may be a desirable feature of the parsing behavior. But not everyone who wants to parse a string into fields will want this. There is a solution, however: A somewhat non-obvious usage of read is to pass zero NAME arguments. In this case, read will store the entire input line that it gets from the input stream in a variable named $REPLY, and, as a bonus, it does not strip leading and trailing whitespace from the value. This is a very robust usage of read which I've exploited frequently in my shell programming career. Here's a demonstration of the difference in behavior:

string=$'  a  b  \n  c  d  \n  e  f  '; ## input string

a=(); while read -r line; do a+=("$line"); done <<<"$string"; declare -p a;
## declare -a a=([0]="a  b" [1]="c  d" [2]="e  f") ## read trimmed surrounding whitespace

a=(); while read -r; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="  a  b  " [1]="  c  d  " [2]="  e  f  ") ## no trimming

The second issue with this solution is that it does not actually address the case of a custom field separator, such as the OP's comma-space. As before, multicharacter separators are not supported, which is an unfortunate limitation of this solution. We could try to at least split on comma by specifying the separator to the -d option, but look what happens:

string='Paris, France, Europe';
a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France")

Predictably, the unaccounted surrounding whitespace got pulled into the field values, and hence this would have to be corrected subsequently through trimming operations (this could also be done directly in the while-loop). But there's another obvious error: Europe is missing! What happened to it? The answer is that read returns a failing return code if it hits end-of-file (in this case we can call it end-of-string) without encountering a final field terminator on the final field. This causes the while-loop to break prematurely and we lose the final field.

Technically this same error afflicted the previous examples as well; the difference there is that the field separator was taken to be LF, which is the default when you don't specify the -d option, and the <<< ("here-string") mechanism automatically appends a LF to the string just before it feeds it as input to the command. Hence, in those cases, we sort of accidentally solved the problem of a dropped final field by unwittingly appending an additional dummy terminator to the input. Let's call this solution the "dummy-terminator" solution. We can apply the dummy-terminator solution manually for any custom delimiter by concatenating it against the input string ourselves when instantiating it in the here-string:

a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string,"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe")

There, problem solved. Another solution is to only break the while-loop if both (1) read returned failure and (2) $REPLY is empty, meaning read was not able to read any characters prior to hitting end-of-file. Demo:

a=(); while read -rd,|| [[ -n "$REPLY" ]]; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=$' Europe\n')

This approach also reveals the secretive LF that automatically gets appended to the here-string by the <<< redirection operator. It could of course be stripped off separately through an explicit trimming operation as described a moment ago, but obviously the manual dummy-terminator approach solves it directly, so we could just go with that. The manual dummy-terminator solution is actually quite convenient in that it solves both of these two problems (the dropped-final-field problem and the appended-LF problem) in one go.
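
Another way around the appended LF, not taken from the original answers, is to skip the here-string altogether. Instead, feed the loop from `printf '%s'`, which writes no trailing newline, through a process substitution. A sketch combined with the double-conditional loop:

```shell
#!/usr/bin/env bash
string='Paris, France, Europe'
b=()
# read fails at end of input, but a non-empty $REPLY still holds the last field.
while read -rd, || [[ -n $REPLY ]]; do
    b+=("$REPLY")
done < <(printf '%s' "$string")
declare -p b    # declare -a b=([0]="Paris" [1]=" France" [2]=" Europe")
```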

So, overall, this is quite a powerful solution. Its only remaining weakness is a lack of support for multicharacter delimiters, which I will address later.
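
The surrounding-whitespace problem noted for this loop can also be handled inside the loop itself. Here is a sketch that combines the manual dummy terminator with trimming done purely through parameter expansion (the `field` and `c` names are illustrative):

```shell
#!/usr/bin/env bash
string='Paris, France, Europe'
c=()
while read -rd,; do
    field=$REPLY
    field=${field#"${field%%[![:space:]]*}"}   # strip leading whitespace
    field=${field%"${field##*[![:space:]]}"}   # strip trailing whitespace
    c+=("$field")
done <<<"$string,"                             # dummy trailing delimiter
declare -p c    # declare -a c=([0]="Paris" [1]="France" [2]="Europe")
```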


Wrong answer #8

string='first line
        second line
        third line'

readarray -t lines <<<"$string"

(This is actually from the same post as #7; the answerer provided two solutions in the same post.)

The readarray builtin, which is a synonym for mapfile, is ideal. It's a builtin command which parses a bytestream into an array variable in one shot; no messing with loops, conditionals, substitutions, or anything else. And it doesn't surreptitiously strip any whitespace from the input string. And (if -O is not given) it conveniently clears the target array before assigning to it. But it's still not perfect, hence my criticism of it as a "wrong answer".

First, just to get this out of the way, note that, just like the behavior of read when doing field-parsing, readarray drops the trailing field if it is empty. Again, this is probably not a concern for the OP, but it could be for some use-cases. I'll come back to this in a moment.
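A quick way to see this behavior, assuming bash 4.4 or later (for readarray -d), is to feed the input through printf so that no LF is appended, and then compare a string whose trailing field is empty against one whose trailing field is not:

```bash
#!/usr/bin/env bash
# printf '%s' writes the string without the extra LF that <<< would add.
readarray -td, with_tail < <(printf '%s' 'a,b,')    # trailing field is empty
readarray -td, no_tail   < <(printf '%s' 'a,b,c')   # trailing field is "c"
declare -p with_tail no_tail
## declare -a with_tail=([0]="a" [1]="b")
## declare -a no_tail=([0]="a" [1]="b" [2]="c")
```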

Second, as before, it does not support multicharacter delimiters. I'll give a fix for this in a moment as well.

Third, the solution as written does not parse the OP's input string, and in fact, it cannot be used as-is to parse it. I'll expand on this momentarily as well.

For the above reasons, I still consider this to be a "wrong answer" to the OP's question. Below I'll give what I consider to be the right answer.


Right answer

Here's a naïve attempt to make #8 work by just specifying the -d option:

string='Paris, France, Europe';
readarray -td, a <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=$' Europe\n')

We see the result is identical to the result we got from the double-conditional approach of the looping read solution discussed in #7. We can almost solve this with the manual dummy-terminator trick:

readarray -td, a <<<"$string,"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe" [3]=$'\n')

The problem here is that readarray preserved the trailing field, since the <<< redirection operator appended the LF to the input string, and therefore the trailing field was not empty (otherwise it would've been dropped). We can take care of this by explicitly unsetting the final array element after-the-fact:

readarray -td, a <<<"$string,"; unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe")

The only two problems that remain, which are actually related, are (1) the extraneous whitespace that needs to be trimmed, and (2) the lack of support for multicharacter delimiters.

The whitespace could of course be trimmed afterward (for example, see How to trim whitespace from a Bash variable?). But if we can hack a multicharacter delimiter, then that would solve both problems in one shot.
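Here is a minimal sketch of trimming afterward without any external tools. It assumes the readarray -td, dummy-terminator form shown above and bash 4.4 or later. Each element is trimmed in place with two pattern-removal expansions, which is one common idiom among several:

```bash
#!/usr/bin/env bash
string='Paris, France, Europe'
readarray -td, a <<<"$string,"; unset 'a[-1]'
# Trim each element in place with pattern-removal expansions.
for i in "${!a[@]}"; do
    v=${a[i]}
    v=${v#"${v%%[![:space:]]*}"}   # remove the leading run of whitespace
    v=${v%"${v##*[![:space:]]}"}   # remove the trailing run of whitespace
    a[i]=$v
done
declare -p a
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")
```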

Unfortunately, there's no direct way to get a multicharacter delimiter to work. The best solution I've thought of is to preprocess the input string to replace the multicharacter delimiter with a single-character delimiter that will be guaranteed not to collide with the contents of the input string. The only character that has this guarantee is the NUL byte. This is because, in bash (though not in zsh, incidentally), variables cannot contain the NUL byte. This preprocessing step can be done inline in a process substitution. Here's how to do it using awk:

readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; }' <<<"$string, "); unset 'a[-1]';
declare -p a;
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")

There, finally! This solution will not erroneously split fields in the middle, will not cut out prematurely, will not drop empty fields, will not corrupt itself on filename expansions, will not automatically strip leading and trailing whitespace, will not leave a stowaway LF on the end, does not require loops, and does not settle for a single-character delimiter.
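If you would rather not depend on awk, a pure-bash alternative is sketched below. Note that this is a different technique from the readarray solution. It applies the same dummy-terminator idea and then repeatedly peels off the text before the first occurrence of the delimiter with the %% and # pattern-removal expansions, so it needs no NUL byte. The helper name split_by_string is hypothetical, and the sketch assumes bash 4.3 or later for the nameref. Because every iteration copies the remaining string, it will likely be slow on very large inputs:

```bash
#!/usr/bin/env bash
# split_by_string ARRAY_NAME DELIM STRING   (hypothetical helper; DELIM may be several characters)
split_by_string() {
    local -n _sbs_out=$1
    local _sbs_delim="$2"
    [[ -n $_sbs_delim ]] || return 1        # an empty delimiter would loop forever
    local _sbs_rest="$3$2"                  # dummy terminator: every field now ends with DELIM
    _sbs_out=()
    while [[ -n $_sbs_rest ]]; do
        _sbs_out+=("${_sbs_rest%%"$_sbs_delim"*}")   # keep the text before the first DELIM
        _sbs_rest=${_sbs_rest#*"$_sbs_delim"}         # drop that field and its DELIM
    done
}

split_by_string a ', ' 'Los Angeles, United States, North America'
declare -p a
## declare -a a=([0]="Los Angeles" [1]="United States" [2]="North America")
```

Quoting "$_sbs_delim" inside the patterns makes glob characters in the delimiter match literally, and empty fields (including an empty trailing one) are preserved.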


Trimming solution

Lastly, I wanted to demonstrate my own fairly intricate trimming solution using the obscure -C callback option of readarray. Unfortunately, I've run out of room against Stack Overflow's draconian 30,000 character post limit, so I won't be able to explain it. I'll leave that as an exercise for the reader.

function mfcb { local val="$4"; "$1"; eval "$2[$3]=\$val;"; };
function val_ltrim { if [[ "$val" =~ ^[[:space:]]+ ]]; then val="${val:${#BASH_REMATCH[0]}}"; fi; };
function val_rtrim { if [[ "$val" =~ [[:space:]]+$ ]]; then val="${val:0:${#val}-${#BASH_REMATCH[0]}}"; fi; };
function val_trim { val_ltrim; val_rtrim; };
readarray -c1 -C 'mfcb val_trim a' -td, <<<"$string,"; unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")
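Here is a commented sketch of one reading of how the callback trick works. The helper name trim_into is hypothetical, and the sketch requires bash 4.4 or later for -d. According to the bash manual, readarray evaluates the -C callback every -c lines. It appends the index of the element about to be assigned and the line itself as arguments, and it does this before assigning that element. That explains why no array name is passed to readarray: it fills its default MAPFILE array, while the callback writes the trimmed copies into a:

```bash
#!/usr/bin/env bash
# trim_into TARGET INDEX LINE   (hypothetical callback; INDEX and LINE are appended by readarray)
trim_into() {
    local target="$1" idx="$2" val="$3"
    val=${val#"${val%%[![:space:]]*}"}      # strip leading whitespace
    val=${val%"${val##*[![:space:]]}"}      # strip trailing whitespace
    printf -v "$target[$idx]" '%s' "$val"   # assign the trimmed value to TARGET[INDEX]
}

string='Paris, France, Europe'
a=()
# -c1: run the callback for every field. No array name is given, so readarray itself
# stores the untrimmed fields in MAPFILE, and only the callback writes to a.
readarray -c1 -C 'trim_into a' -td, <<<"$string,"
unset 'a[-1]'   # drop the element produced by the LF that <<< appended
declare -p a
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")
```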

#4


27  

Sometimes the method described in the accepted answer didn't work for me, especially when the separator is a line break (newline).
In those cases I solved it this way:

string='first line
second line
third line'

oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1} # this is useful to format your code with tabs
lines=( $string )
IFS="$oldIFS"

for line in "${lines[@]}"
    do
        echo "--> $line"
done
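The same word-splitting idea can be sketched with the changes to IFS and globbing scoped to a function, so nothing has to be saved and restored by hand. This assumes bash 4.4 or later, because local - makes set -f local to the function. The helper name split_lines is hypothetical. Because LF counts as IFS whitespace, empty lines are still dropped:

```bash
#!/usr/bin/env bash
# split_lines ARRAY_NAME STRING   (hypothetical helper)
split_lines() {
    local -n _sl_out=$1
    local IFS=$'\n'   # split on LF only; the caller's IFS comes back on return
    local -           # make the set -f below local to this function
    set -f            # disable globbing so *, ? and [...] in the data stay literal
    _sl_out=( $2 )    # deliberately unquoted: this is the word-splitting step
}

string='first line
second * line

third line'
split_lines lines "$string"
declare -p lines
## declare -a lines=([0]="first line" [1]="second * line" [2]="third line")
```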

#5


26  

t="one,two,three"
a=($(echo "$t" | tr ',' '\n'))
echo "${a[2]}"

Prints three

#6


23  

The accepted answer works for values in one line.
If the variable has several lines:

string='first line
        second line
        third line'

We need a very different command to get all lines:

while read -r line; do lines+=("$line"); done <<<"$string"

Or the much simpler bash readarray:

readarray -t lines <<<"$string"

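Printing all lines is easy if you take advantage of a printf feature: when there are more arguments than format specifiers, the format string is reused for each remaining argument.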

printf ">[%s]\n" "${lines[@]}"

>[first line]
>[        second line]
>[        third line]
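One difference between the two commands above is easy to miss. while read -r line uses the default IFS, so it strips leading and trailing whitespace from each line. readarray -t keeps that whitespace, as the output above shows. Here is a small sketch comparing them, with IFS= read -r as the usual way to make the loop keep the whitespace:

```bash
#!/usr/bin/env bash
string='first line
        second line'
a=(); while read -r line; do a+=("$line"); done <<<"$string"        # default IFS: trims
b=(); while IFS= read -r line; do b+=("$line"); done <<<"$string"   # empty IFS: keeps it
readarray -t c <<<"$string"                                         # never trims
printf 'read:       >[%s]\n' "${a[@]}"
printf 'IFS= read:  >[%s]\n' "${b[@]}"
printf 'readarray:  >[%s]\n' "${c[@]}"
```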

#7


4  

This is similar to the approach by Jmoney38, but using sed:

string="1,2,3,4"
array=(`echo $string | sed 's/,/\n/g'`)
echo ${array[0]}

Prints 1

#8


1  

Try this

IFS=', '; array=(Paris, France, Europe)
for item in ${array[@]}; do echo $item; done

It's simple. If you want, you can also add a declare (and also remove the commas):

IFS=' ';declare -a array=(Paris France Europe)

The IFS=' ' is there to undo the IFS change above, but the declaration also works without it in a fresh bash instance.

#9


1  

The key to splitting your string into an array is the multi-character delimiter ", ". Any solution using IFS for multi-character delimiters is inherently wrong, since IFS is a set of those characters, not a string.

If you assign IFS=", " then the string will break on EITHER "," OR " " or any combination of them which is not an accurate representation of the two character delimiter of ", ".
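Here is a short demonstration of this set-of-characters behavior, using a counterexample where the fields themselves contain spaces:

```bash
#!/usr/bin/env bash
# Each character of IFS is a separate delimiter, so the spaces inside the fields split too.
IFS=', ' read -ra a <<<'Los Angeles, United States, North America'
declare -p a
## declare -a a=([0]="Los" [1]="Angeles" [2]="United" [3]="States" [4]="North" [5]="America")
```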

You can use awk or sed to split the string, with process substitution:

#!/bin/bash

str="Paris, France, Europe"
array=()
while read -r -d $'\0' each; do   # use a NUL terminated field separator 
    array+=("$each")
done < <(printf "%s" "$str" | awk '{ gsub(/,[ ]+|$/,"\0"); print }')
declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output

It is more efficient to use a regex directly in Bash:

#!/bin/bash

str="Paris, France, Europe"

array=()
while [[ $str =~ ([^,]+)(,[ ]+|$) ]]; do
    array+=("${BASH_REMATCH[1]}")   # capture the field
    i=${#BASH_REMATCH}              # length of field + delimiter
    str=${str:i}                    # advance the string by that length
done                                # the loop deletes $str, so make a copy if needed

declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output...

With the second form, there is no subshell, so it will be inherently faster.


Edit by bgoldst: Here are some benchmarks comparing my readarray solution to dawg's regex solution, and I also included the read solution for the heck of it (note: I slightly modified the regex solution for greater harmony with my solution) (also see my comments below the post):

## competitors
function c_readarray { readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); unset 'a[-1]'; };
function c_read { a=(); local REPLY=''; while read -r -d ''; do a+=("$REPLY"); done < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); };
function c_regex { a=(); local s="$1, "; while [[ $s =~ ([^,]+),\  ]]; do a+=("${BASH_REMATCH[1]}"); s=${s:${#BASH_REMATCH}}; done; };

## helper functions
function rep {
    local -i i=-1;
    for ((i = 0; i<$1; ++i)); do
        printf %s "$2";
    done;
}; ## end rep()

function testAll {
    local funcs=();
    local args=();
    local func='';
    local -i rc=-1;
    while [[ "$1" != ':' ]]; do
        func="$1";
        if [[ ! "$func" =~ ^[_a-zA-Z][_a-zA-Z0-9]*$ ]]; then
            echo "bad function name: $func" >&2;
            return 2;
        fi;
        funcs+=("$func");
        shift;
    done;
    shift;
    args=("$@");
    for func in "${funcs[@]}"; do
        echo -n "$func ";
        { time $func "${args[@]}" >/dev/null 2>&1; } 2>&1| tr '\n' '/';
        rc=${PIPESTATUS[0]}; if [[ $rc -ne 0 ]]; then echo "[$rc]"; else echo; fi;
    done| column -ts/;
}; ## end testAll()

function makeStringToSplit {
    local -i n=$1; ## number of fields
    if [[ $n -lt 0 ]]; then echo "bad field count: $n" >&2; return 2; fi;
    if [[ $n -eq 0 ]]; then
        echo;
    elif [[ $n -eq 1 ]]; then
        echo 'first field';
    elif [[ "$n" -eq 2 ]]; then
        echo 'first field, last field';
    else
        echo "first field, $(rep $(( n - 2 )) 'mid field, ')last field";
    fi;
}; ## end makeStringToSplit()

function testAll_splitIntoArray {
    local -i n=$1; ## number of fields in input string
    local s='';
    echo "===== $n field$(if [[ $n -ne 1 ]]; then echo 's'; fi;) =====";
    s="$(makeStringToSplit "$n")";
    testAll c_readarray c_read c_regex : "$s";
}; ## end testAll_splitIntoArray()

## results
testAll_splitIntoArray 1;
## ===== 1 field =====
## c_readarray   real  0m0.067s   user 0m0.000s   sys  0m0.000s
## c_read        real  0m0.064s   user 0m0.000s   sys  0m0.000s
## c_regex       real  0m0.000s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 10;
## ===== 10 fields =====
## c_readarray   real  0m0.067s   user 0m0.000s   sys  0m0.000s
## c_read        real  0m0.064s   user 0m0.000s   sys  0m0.000s
## c_regex       real  0m0.001s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 100;
## ===== 100 fields =====
## c_readarray   real  0m0.069s   user 0m0.000s   sys  0m0.062s
## c_read        real  0m0.065s   user 0m0.000s   sys  0m0.046s
## c_regex       real  0m0.005s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 1000;
## ===== 1000 fields =====
## c_readarray   real  0m0.084s   user 0m0.031s   sys  0m0.077s
## c_read        real  0m0.092s   user 0m0.031s   sys  0m0.046s
## c_regex       real  0m0.125s   user 0m0.125s   sys  0m0.000s
##
testAll_splitIntoArray 10000;
## ===== 10000 fields =====
## c_readarray   real  0m0.209s   user 0m0.093s   sys  0m0.108s
## c_read        real  0m0.333s   user 0m0.234s   sys  0m0.109s
## c_regex       real  0m9.095s   user 0m9.078s   sys  0m0.000s
##
testAll_splitIntoArray 100000;
## ===== 100000 fields =====
## c_readarray   real  0m1.460s   user 0m0.326s   sys  0m1.124s
## c_read        real  0m2.780s   user 0m1.686s   sys  0m1.092s
## c_regex       real  17m38.208s   user 15m16.359s   sys  2m19.375s
##

#10


0  

Use this:

countries='Paris, France, Europe'
OIFS="$IFS"
IFS=', ' array=($countries)
IFS="$OIFS"

#${array[0]} == Paris
#${array[1]} == France
#${array[2]} == Europe
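One subtlety in this answer is worth spelling out. IFS=', ' array=($countries) is a command made only of assignments, so both assignments happen in the current shell, left to right. The array is therefore split with the new IFS, but IFS also stays changed afterward, which is why the OIFS save and restore is needed. A prefix assignment on a real command such as read, by contrast, applies only to that command. Here is a minimal sketch that checks both cases:

```bash
#!/usr/bin/env bash
countries='Paris, France, Europe'
default_ifs=$' \t\n'
IFS=$default_ifs

# Assignment-only command: IFS is set first, then used to split $countries,
# and it stays changed in the current shell afterward.
IFS=', ' array=($countries)
after_assignment=$IFS
IFS=$default_ifs   # the restore step that the answer performs with OIFS

# Prefix assignment on a builtin command: the new IFS applies to read only.
IFS=', ' read -ra array2 <<<"$countries"
after_read=$IFS

declare -p array array2
## declare -a array=([0]="Paris" [1]="France" [2]="Europe")
## declare -a array2=([0]="Paris" [1]="France" [2]="Europe")
[[ $after_assignment == ', ' ]] && echo "IFS persisted after the assignment-only command"
[[ $after_read == "$default_ifs" ]] && echo "IFS was restored after read"
```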

#11


0  

Here's my hack!

Splitting strings by strings is a pretty tedious thing to do in bash. The available approaches either work only in a few cases (splitting by ";", "/", "." and so on) or produce a variety of side effects in the output.

The approach below required a number of workarounds, but I believe it will work for most of our needs!

#!/bin/bash

# --------------------------------------
# SPLIT FUNCTION
# ----------------

F_SPLIT_R=()
f_split() {
    : 'It does a "split" on a given string and returns an array.

    Args:
        TARGET_P (str): Target string to "split".
        DELIMITER_P (Optional[str]): Delimiter used to "split". If not 
    given, the split will be done on spaces.

    Returns:
        F_SPLIT_R (array): Array with the provided string separated by the 
    given delimiter.
    '

    F_SPLIT_R=()
    TARGET_P=$1
    DELIMITER_P=$2
    if [ -z "$DELIMITER_P" ] ; then
        DELIMITER_P=" "
    fi

    REMOVE_N=1
    if [ "$DELIMITER_P" == "\n" ] ; then
        REMOVE_N=0
    fi

    # NOTE: This was the only parameter that has been a problem so far! 
    # By Questor
    # [Ref.: https://unix.stackexchange.com/a/390732/61742]
    if [ "$DELIMITER_P" == "./" ] ; then
        DELIMITER_P="[.]/"
    fi

    if [ ${REMOVE_N} -eq 1 ] ; then

        # NOTE: Due to bash limitations we have some problems getting the 
        # output of a split by awk inside an array, so we need to use the 
        # "line break" (\n) to succeed. Because of this, we temporarily replace 
        # the line breaks with a marker and restore them afterwards. Otherwise 
        # any line break in the given string would be lost, that is, 
        # erroneously removed from the output! 
        # By Questor
        TARGET_P=$(awk 'BEGIN {RS="dn"} {gsub("\n", "3F2C417D448C46918289218B7337FCAF"); printf "%s", $0}' <<< "${TARGET_P}")

        # NOTE: The replacement above produces one more occurrence of 
        # "3F2C417D448C46918289218B7337FCAF" than there were "\n" in the 
        # original string, at the very end. That extra line break is most 
        # likely the one the <<< here-string appends to its input. The line 
        # below strips that trailing 32-character marker, and it must only run 
        # when the replacement above was actually made. By Questor
        TARGET_P=${TARGET_P%????????????????????????????????}

    fi

    SPLIT_NOW=$(awk -F"$DELIMITER_P" '{for(i=1; i<=NF; i++){printf "%s\n", $i}}' <<< "${TARGET_P}")

    while IFS= read -r LINE_NOW ; do
        if [ ${REMOVE_N} -eq 1 ] ; then

            # NOTE: We wrap the line in "'" so that trailing line breaks are not 
            # erroneously removed: command substitution ($(...)) strips trailing 
            # newlines from its output, so without the quotes a line ending in 
            # restored line breaks would lose them. By Questor
            LN_NOW_WITH_N=$(awk 'BEGIN {RS="dn"} {gsub("3F2C417D448C46918289218B7337FCAF", "\n"); printf "%s", $0}' <<< "'${LINE_NOW}'")

            # NOTE: We use the commands below to revert the intervention made 
            # immediately above! By Questor
            LN_NOW_WITH_N=${LN_NOW_WITH_N%?}
            LN_NOW_WITH_N=${LN_NOW_WITH_N#?}

            F_SPLIT_R+=("$LN_NOW_WITH_N")
        else
            F_SPLIT_R+=("$LINE_NOW")
        fi
    done <<< "$SPLIT_NOW"
}

# --------------------------------------
# HOW TO USE
# ----------------

STRING_TO_SPLIT="
 * How do I list all databases and tables using psql?

\"
sudo -u postgres /usr/pgsql-9.4/bin/psql -c \"\l\"
sudo -u postgres /usr/pgsql-9.4/bin/psql <DB_NAME> -c \"\dt\"
\"

\"
\list or \l: list all databases
\dt: list all tables in the current database
\"

[Ref.: https://dba.stackexchange.com/questions/1285/how-do-i-list-all-databases-and-tables-using-psql]


"

f_split "$STRING_TO_SPLIT" "bin/psql -c"

# --------------------------------------
# OUTPUT AND TEST
# ----------------

ARR_LENGTH=${#F_SPLIT_R[*]}
for (( i=0; i<=$(( $ARR_LENGTH -1 )); i++ )) ; do
    echo " > -----------------------------------------"
    echo "${F_SPLIT_R[$i]}"
    echo " < -----------------------------------------"
done

if [ "$STRING_TO_SPLIT" == "${F_SPLIT_R[0]}bin/psql -c${F_SPLIT_R[1]}" ] ; then
    echo " > -----------------------------------------"
    echo "The strings are the same!"
    echo " < -----------------------------------------"
fi

#12


-1  

Another approach can be:

str="a, b, c, d"  # assuming there is a space after ',' as in Q
arr=(${str//,/})  # delete all occurrences of ','

After this, 'arr' is an array with four strings. This doesn't require dealing with IFS or read or any other special stuff, hence it is much simpler and more direct.

#13


-1  

UPDATE: Don't do this, due to problems with eval.

With slightly less ceremony:

IFS=', ' eval 'array=($string)'

e.g.

string="foo, bar,baz"
IFS=', ' eval 'array=($string)'
echo ${array[1]} # -> bar

#14


-1  

Another way would be:

string="Paris, France, Europe"
IFS=', ' arr=(${string})

Now your elements are stored in the "arr" array. To iterate through the elements:

for i in ${arr[@]}; do echo $i; done

#2


179  

Here is a way without setting IFS:

这里有一种不设置IFS的方法:

string="1:2:3:4:5"
set -f                      # avoid globbing (expansion of *).
array=(${string//:/ })
for i in "${!array[@]}"
do
    echo "$i=>${array[i]}"
done

The idea is using string replacement:

这个想法是用字符串替换:

${string//substring/replacement}

to replace all matches of $substring with white space and then using the substituted string to initialize a array:

将$substring的所有匹配替换为空白,然后使用替换字符串初始化一个数组:

(element1 element2 ... elementN)

Note: this answer makes use of the split+glob operator. Thus, to prevent expansion of some characters (such as *) it is a good idea to pause globbing for this script.

注意:这个答案使用了split+glob运算符。因此,为了防止某些字符(比如*)的扩展,应该暂停对该脚本的globbing。

#3


89  

All of the answers to this question are wrong in one way or another.

这个问题的所有答案在某种程度上都是错误的。


Wrong answer #1

错误的答案# 1

IFS=', ' read -r -a array <<< "$string"

1: This is a misuse of $IFS. The value of the $IFS variable is not taken as a single variable-length string separator, rather it is taken as a set of single-character string separators, where each field that read splits off from the input line can be terminated by any character in the set (comma or space, in this example).

1:这是对$IFS的误用。$ IFS变量的值不是作为一个单一的变长字符串分隔符,而是作为一组单个字符的字符串分隔符,其中每个字段读分裂从输入行可以终止任何字符的集合(逗号或空间,在这个例子中)。

Actually, for the real sticklers out there, the full meaning of $IFS is slightly more involved. From the bash manual:

实际上,对于真正的sticklers来说,$IFS的全部含义稍微复杂一点。从bash手册:

The shell treats each character of IFS as a delimiter, and splits the results of the other expansions into words using these characters as field terminators. If IFS is unset, or its value is exactly <space><tab><newline>, the default, then sequences of <space>, <tab>, and <newline> at the beginning and end of the results of the previous expansions are ignored, and any sequence of IFS characters not at the beginning or end serves to delimit words. If IFS has a value other than the default, then sequences of the whitespace characters <space>, <tab>, and <newline> are ignored at the beginning and end of the word, as long as the whitespace character is in the value of IFS (an IFS whitespace character). Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter. If the value of IFS is null, no word splitting occurs.

shell将IFS的每个字符视为分隔符,并将其他扩展的结果用这些字符作为字段终止符来分隔。如果IFS是未设置的,或者它的值是 < / > ,默认值,那么 >的序列在前一个扩展的开始和结束时都被忽略,并且任何一个IFS字符序列在开始或结束时都不能用来分隔单词。如果IFS除了默认值之外还有一个值,那么空格字符 ,和 >的序列在单词的开头和结尾都被忽略,只要空格字符是IFS(一个IFS空白字符)的值。如果IFS中的任何字符都不是IFS空格,以及任何相邻的IFS空格字符,那么就会出现一个字段。将IFS空白字符序列作为分隔符处理。如果IFS的值为null,则不会出现任何消息分裂。

Basically, for non-default non-null values of $IFS, fields can be separated with either (1) a sequence of one or more characters that are all from the set of "IFS whitespace characters" (that is, whichever of <space>, <tab>, and <newline> ("newline" meaning line feed (LF)) are present anywhere in $IFS), or (2) any non-"IFS whitespace character" that's present in $IFS along with whatever "IFS whitespace characters" surround it in the input line.

基本上,对于美元IFS的非null值,字段可以被分离与(1)的一个或多个字符序列都是设置的“IFS空格字符”(也就是说,无论 <空位> , <选项卡> ,和 <换行符> (“换行”意思换行(低频))出现在美元IFS),或(2)的任何非“IFS空格字符”出现在美元IFS连同任何“IFS空格字符”环绕在输入行。

For the OP, it's possible that the second separation mode I described in the previous paragraph is exactly what he wants for his input string, but we can be pretty confident that the first separation mode I described is not correct at all. For example, what if his input string was 'Los Angeles, United States, North America'?

对于OP,我在前一段中描述的第二种分离模式可能正是他想要的输入字符串,但是我们可以非常确信,我所描述的第一个分离模式是完全不正确的。例如,如果他的输入字符串是“Los Angeles, United States, North America”,那该怎么办?

IFS=', ' read -ra a <<<'Los Angeles, United States, North America'; declare -p a;
## declare -a a=([0]="Los" [1]="Angeles" [2]="United" [3]="States" [4]="North" [5]="America")

2: Even if you were to use this solution with a single-character separator (such as a comma by itself, that is, with no following space or other baggage), if the value of the $string variable happens to contain any LFs, then read will stop processing once it encounters the first LF. The read builtin only processes one line per invocation. This is true even if you are piping or redirecting input only to the read statement, as we are doing in this example with the here-string mechanism, and thus unprocessed input is guaranteed to be lost. The code that powers the read builtin has no knowledge of the data flow within its containing command structure.

2:即使你使用这个解决方案与一个单字符分隔符(如逗号,也就是说,没有空间或其他行李)后,如果该值的字符串变量恰好包含任何LFs,然后阅读将停止处理一旦遇到第一个低频。read builtin只处理每次调用的一行。这是正确的,即使您只对read语句进行管道或重定向输入,就像我们在这个示例中使用here-string机制所做的那样,因此未处理的输入肯定会丢失。读取builtin的代码不知道其包含的命令结构中的数据流。

You could argue that this is unlikely to cause a problem, but still, it's a subtle hazard that should be avoided if possible. It is caused by the fact that the read builtin actually does two levels of input splitting: first into lines, then into fields. Since the OP only wants one level of splitting, this usage of the read builtin is not appropriate, and we should avoid it.

你可能会认为这不太可能造成问题,但是,如果可能的话,这是一种微妙的危险。这是由读取的builtin实际上执行了两层输入拆分的原因造成的:首先是行,然后是字段。由于OP只需要一个层次的分割,所以使用read builtin是不合适的,我们应该避免使用它。

3: A non-obvious potential issue with this solution is that read always drops the trailing field if it is empty, although it preserves empty fields otherwise. Here's a demo:

3:这个解决方案的一个不明显的潜在问题是,如果它是空的,read总是会删除尾随字段,尽管它保留了空字段。这里有一个演示:

string=', , a, , b, c, , , '; IFS=', ' read -ra a <<<"$string"; declare -p a;
## declare -a a=([0]="" [1]="" [2]="a" [3]="" [4]="b" [5]="c" [6]="" [7]="")

Maybe the OP wouldn't care about this, but it's still a limitation worth knowing about. It reduces the robustness and generality of the solution.

也许OP不关心这个,但它仍然是一个值得了解的限制。它降低了解决方案的健壮性和通用性。

This problem can be solved by appending a dummy trailing delimiter to the input string just prior to feeding it to read, as I will demonstrate later.

这个问题可以通过在输入字符串之前添加一个虚拟的拖尾分隔符来解决,就像我稍后将演示的那样。


Wrong answer #2

错误的答案# 2

string="1:2:3:4:5"
set -f                     # avoid globbing (expansion of *).
array=(${string//:/ })

Similar idea:

类似的想法:

t="one,two,three"
a=($(echo $t | tr ',' "\n"))

(Note: I added the missing parentheses around the command substitution which the answerer seems to have omitted.)

(注意:我在命令替换周围添加了缺失的括号,而答案似乎省略了。)

Similar idea:

类似的想法:

string="1,2,3,4"
array=(`echo $string | sed 's/,/\n/g'`)

These solutions leverage word splitting in an array assignment to split the string into fields. Funnily enough, just like read, general word splitting also uses the $IFS special variable, although in this case it is implied that it is set to its default value of <space><tab><newline>, and therefore any sequence of one or more IFS characters (which are all whitespace characters now) is considered to be a field delimiter.

这些解决方案利用数组分配中的单词分割来将字符串分割成字段。可笑的是,就像阅读,一般分词也使用$ IFS特殊变量,尽管在这种情况下这是暗示,它被设置为默认值 <空位> <选项卡> <换行符> ,因此任何序列的一个或多个IFS字符(现在所有空格字符)被认为是一个字段分隔符。

This solves the problem of two levels of splitting committed by read, since word splitting by itself constitutes only one level of splitting. But just as before, the problem here is that the individual fields in the input string can already contain $IFS characters, and thus they would be improperly split during the word splitting operation. This happens to not be the case for any of the sample input strings provided by these answerers (how convenient...), but of course that doesn't change the fact that any code base that used this idiom would then run the risk of blowing up if this assumption were ever violated at some point down the line. Once again, consider my counterexample of 'Los Angeles, United States, North America' (or 'Los Angeles:United States:North America').

这就解决了由read导致的两个层次的分裂问题,因为单词分裂本身只构成了一个层次的分裂。但是和前面一样,这里的问题是输入字符串中的各个字段可能已经包含$IFS字符,因此在拆分操作期间它们会被错误地分割。这种情况不适合任何这些回答者提供的样例输入字符串(方便…),当然这并没有改变这一事实的任何代码库使用这个成语会爆炸的风险如果这种假设违反了在某种程度上。再一次,考虑我的反例:“洛杉矶,美国,北美”(或“洛杉矶:美国:北美”)。

Also, word splitting is normally followed by filename expansion (aka pathname expansion aka globbing), which, if done, would potentially corrupt words containing the characters *, ?, or [ followed by ] (and, if extglob is set, parenthesized fragments preceded by ?, *, +, @, or !) by matching them against file system objects and expanding the words ("globs") accordingly. The first of these three answerers has cleverly undercut this problem by running set -f beforehand to disable globbing. Technically this works (although you should probably add set +f afterward to reenable globbing for subsequent code which may depend on it), but it's undesirable to have to mess with global shell settings in order to hack a basic string-to-array parsing operation in local code.

同时,分词通常是紧随其后的是文件名(即路径名即globbing)扩张,扩张,如果做,可能腐败的单词包含字符*,?,或者(随后)(如果extglob设置,括号之前碎片?,*,+,@,或!)匹配他们对文件系统对象和相应扩大词(“粘稠”)。这三个答案中的第一个巧妙地通过运行set -f来消除这个问题,从而使globbing失效。从技术上讲,这是可行的(尽管您应该在以后添加set +f来重新启用可能依赖于它的后续代码),但是为了破解本地代码中基本的字符串到数组解析操作,必须使用全局shell设置是不可取的。

Another issue with this answer is that all empty fields will be lost. This may or may not be a problem, depending on the application.

这个答案的另一个问题是,所有的空字段都将丢失。根据应用程序的不同,这可能是一个问题,也可能不是问题。

Note: If you're going to use this solution, it's better to use the ${string//:/ } "pattern substitution" form of parameter expansion, rather than going to the trouble of invoking a command substitution (which forks the shell), starting up a pipeline, and running an external executable (tr or sed), since parameter expansion is purely a shell-internal operation. (Also, for the tr and sed solutions, the input variable should be double-quoted inside the command substitution; otherwise word splitting would take effect in the echo command and potentially mess with the field values. Also, the $(...) form of command substitution is preferable to the old `...` form since it simplifies nesting of command substitutions and allows for better syntax highlighting by text editors.)

注意:如果你要使用这个解决方案中,最好使用$ {字符串/ /:/ }“模式替换”形式的参数扩展,而不是将调用命令替换的麻烦(叉shell),启动一个管道,并运行一个外部可执行(tr或sed),由于参数扩展是纯粹shell-internal操作。(同样,对于tr和sed解决方案,输入变量应该在命令替换中被重复引用;否则,单词拆分将在echo命令中生效,并可能会打乱字段值。另外,$(…)命令替换的形式比旧的更好。因为它简化了命令替换的嵌套,并允许文本编辑器更好的语法高亮显示。


Wrong answer #3

错误的答案# 3

str="a, b, c, d"  # assuming there is a space after ',' as in Q
arr=(${str//,/})  # delete all occurrences of ','

This answer is almost the same as #2. The difference is that the answerer has made the assumption that the fields are delimited by two characters, one of which being represented in the default $IFS, and the other not. He has solved this rather specific case by removing the non-IFS-represented character using a pattern substitution expansion and then using word splitting to split the fields on the surviving IFS-represented delimiter character.

这个答案几乎和#2一样。不同之处在于,应答器假设字段被两个字符分隔开,其中一个字符在默认$IFS中表示,另一个不表示。他通过使用模式替换扩展来移除非ifs表示的字符,然后使用单词拆分来拆分幸存的ifs -表示的分隔符字符,从而解决了这个相当具体的问题。

This is not a very generic solution. Furthermore, it can be argued that the comma is really the "primary" delimiter character here, and that stripping it and then depending on the space character for field splitting is simply wrong. Once again, consider my counterexample: 'Los Angeles, United States, North America'.

这不是一个非常通用的解决方案。此外,可以认为,逗号实际上是这里的“主”分隔符,而将其剥离,然后根据字段划分的空间字符,这是完全错误的。再一次,考虑我的反例:“洛杉矶,美国,北美”。

Also, again, filename expansion could corrupt the expanded words, but this can be prevented by temporarily disabling globbing for the assignment with set -f and then set +f.

同样,文件名扩展可能会损坏扩展的单词,但是可以通过设置-f并设置+f来临时禁用“globbing”,从而避免这一点。

Also, again, all empty fields will be lost, which may or may not be a problem depending on the application.

同样,所有的空字段都将丢失,这可能是或可能不是问题,取决于应用程序。


Wrong answer #4

错误的答案# 4

string='first line
second line
third line'

oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1} # this is useful to format your code with tabs
lines=( $string )
IFS="$oldIFS"

This is similar to #2 and #3 in that it uses word splitting to get the job done, only now the code explicitly sets $IFS to contain only the single-character field delimiter present in the input string. It should be repeated that this cannot work for multicharacter field delimiters such as the OP's comma-space delimiter. But for a single-character delimiter like the LF used in this example, it actually comes close to being perfect. The fields cannot be unintentionally split in the middle as we saw with previous wrong answers, and there is only one level of splitting, as required.

One problem is that filename expansion will corrupt affected words as described earlier, although once again this can be solved by wrapping the critical statement in set -f and set +f.

Another potential problem is that, since LF qualifies as an "IFS whitespace character" as defined earlier, all empty fields will be lost, just as in #2 and #3. This would of course not be a problem if the delimiter happens to be a non-"IFS whitespace character", and depending on the application it may not matter anyway, but it does vitiate the generality of the solution.
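
The sketch below makes the empty-field behavior concrete. split_on is a hypothetical helper, not taken from any of the answers. It performs the same single-character word splitting as #4, with $IFS made local to the function and globbing disabled during the split:

```bash
# Hypothetical helper (not from any answer): split "$2" on the single character
# "$1" using word splitting, storing the fields in the global array "parts".
split_on() {
    local IFS=$1        # the IFS change is confined to this function
    set -f              # no filename expansion while splitting
    parts=( $2 )
    set +f              # note: re-enables globbing even if it was off before
}

# LF is an "IFS whitespace character", so the empty middle field disappears:
split_on $'\n' $'a\n\nb'; declare -p parts
## declare -a parts=([0]="a" [1]="b")

# ':' is not IFS whitespace, so the empty middle field survives:
split_on ':' 'a::b'; declare -p parts
## declare -a parts=([0]="a" [1]="" [2]="b")
```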

So, to sum up, assuming you have a one-character delimiter, and it is either a non-"IFS whitespace character" or you don't care about empty fields, and you wrap the critical statement in set -f and set +f, then this solution works, but otherwise not.

(Also, for information's sake, assigning a LF to a variable in bash can be done more easily with the $'...' syntax, e.g. IFS=$'\n';.)
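
Putting those fixes together, a sketch of #4 might look like this. It uses $'\n' and wraps the split in set -f and set +f. The '*' line is only an illustrative input that globbing would otherwise expand into file names:

```bash
string='first line
second line
*'

oldIFS=$IFS
IFS=$'\n'          # split on LF only
set -f             # stop the "*" field from being expanded to file names
lines=( $string )
set +f
IFS=$oldIFS

declare -p lines
## declare -a lines=([0]="first line" [1]="second line" [2]="*")
```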


Wrong answer #5

countries='Paris, France, Europe'
OIFS="$IFS"
IFS=', ' array=($countries)
IFS="$OIFS"

Similar idea:

IFS=', ' eval 'array=($string)'

This solution is effectively a cross between #1 (in that it sets $IFS to comma-space) and #2-4 (in that it uses word splitting to split the string into fields). Because of this, it suffers from most of the problems that afflict all of the above wrong answers, sort of like the worst of all worlds.

Also, regarding the second variant, it may seem like the eval call is completely unnecessary, since its argument is a single-quoted string literal, and therefore is statically known. But there's actually a very non-obvious benefit to using eval in this way. Normally, when you run a simple command which consists of a variable assignment only, meaning without an actual command word following it, the assignment takes effect in the shell environment:

IFS=', '; ## changes $IFS in the shell environment

This is true even if the simple command involves multiple variable assignments; again, as long as there's no command word, all variable assignments affect the shell environment:

IFS=', ' array=($countries); ## changes both $IFS and $array in the shell environment

But if the variable assignment is attached to a command name (I like to call this a "prefix assignment"), then it does not affect the shell environment. Instead it only affects the environment of the executed command, regardless of whether that command is a builtin or external:

IFS=', ' :; ## : is a builtin command, the $IFS assignment does not outlive it
IFS=', ' env; ## env is an external command, the $IFS assignment does not outlive it

Relevant quote from the bash manual:

If no command name results, the variable assignments affect the current shell environment. Otherwise, the variables are added to the environment of the executed command and do not affect the current shell environment.

It is possible to exploit this feature of variable assignment to change $IFS only temporarily. That lets us avoid the whole save-and-restore gambit, like the one done with the $OIFS variable in the first variant. The challenge here is that the command we need to run is itself a mere variable assignment, so there is no command word to make the $IFS assignment temporary. You might think: why not just add a no-op command word, such as the : builtin, to make the $IFS assignment temporary? This does not work, because it would then make the $array assignment temporary as well:

IFS=', ' array=($countries) :; ## fails; new $array value never escapes the : command

So we're effectively at an impasse, a bit of a catch-22. But when eval runs its code, it runs it in the shell environment, as if it were normal, static source code. We can therefore run the $array assignment inside the eval argument so that it takes effect in the shell environment. Meanwhile, the $IFS prefix assignment attached to the eval command will not outlive the eval command. This is exactly the trick used in the second variant of this solution:

IFS=', ' eval 'array=($string)'; ## $IFS does not outlive the eval command, but $array does

So, as you can see, it's actually quite a clever trick. It accomplishes exactly what is required (at least with respect to which assignments persist) in a rather non-obvious way. I'm actually not against this trick in general, despite the involvement of eval; just be careful to single-quote the argument string to guard against security threats.
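
A short sketch shows the difference between a prefix assignment on : and the eval variant. It assumes bash's default mode. Under set -o posix, assignments preceding special builtins such as : and eval persist after the builtin completes, which would defeat the trick.

```bash
string='Paris, France, Europe'
savedIFS=$IFS

IFS=', ' :                          # prefix assignment: lasts only for the : command
[[ $IFS == "$savedIFS" ]] && echo 'IFS unchanged after :'

IFS=', ' eval 'array=($string)'     # $IFS is temporary, but $array persists
[[ $IFS == "$savedIFS" ]] && echo 'IFS unchanged after eval'
declare -p array
## declare -a array=([0]="Paris" [1]="France" [2]="Europe")
```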

But again, because of the "worst of all worlds" agglomeration of problems, this is still a wrong answer to the OP's requirement.


Wrong answer #6

IFS=', '; array=(Paris, France, Europe)

IFS=' ';declare -a array=(Paris France Europe)

Um... what? The OP has a string variable that needs to be parsed into an array. This "answer" starts with the verbatim contents of the input string pasted into an array literal. I guess that's one way to do it.

It looks like the answerer may have assumed that the $IFS variable affects all bash parsing in all contexts, which is not true. From the bash manual:

IFS    The Internal Field Separator that is used for word splitting after expansion and to split lines into words with the read builtin command. The default value is <space><tab><newline>.

So the $IFS special variable is actually only used in two contexts: (1) word splitting that is performed after expansion (meaning not when parsing bash source code) and (2) for splitting input lines into words by the read builtin.
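
A small sketch makes the distinction visible. The same $IFS value has no effect on words written literally in the source, but it does govern the splitting of an unquoted expansion:

```bash
oldIFS=$IFS
IFS=', '

literal=(Paris, France, Europe)   # parsed by the shell grammar; IFS plays no role
declare -p literal
## declare -a literal=([0]="Paris," [1]="France," [2]="Europe")

string='Paris, France, Europe'
expanded=($string)                # unquoted expansion: word splitting uses IFS
declare -p expanded
## declare -a expanded=([0]="Paris" [1]="France" [2]="Europe")

IFS=$oldIFS
```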

Let me try to make this clearer. I think it might be good to draw a distinction between parsing and execution. Bash must first parse the source code, which obviously is a parsing event, and then later it executes the code, which is when expansion comes into the picture. Expansion is really an execution event. Furthermore, I take issue with the description of the $IFS variable that I just quoted above; rather than saying that word splitting is performed after expansion, I would say that word splitting is performed during expansion, or, perhaps even more precisely, word splitting is part of the expansion process. The phrase "word splitting" refers only to this step of expansion; it should never be used to refer to the parsing of bash source code, although unfortunately the docs do seem to throw around the words "split" and "words" a lot. Here's a relevant excerpt from the linux.die.net version of the bash manual:

Expansion is performed on the command line after it has been split into words. There are seven kinds of expansion performed: brace expansion, tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, word splitting, and pathname expansion.

The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and pathname expansion.

You could argue the GNU version of the manual does slightly better, since it opts for the word "tokens" instead of "words" in the first sentence of the Expansion section:

Expansion is performed on the command line after it has been split into tokens.

The important point is, $IFS does not change the way bash parses source code. Parsing of bash source code is actually a very complex process that involves recognition of the various elements of shell grammar, such as command sequences, command lists, pipelines, parameter expansions, arithmetic substitutions, and command substitutions. For the most part, the bash parsing process cannot be altered by user-level actions like variable assignments (actually, there are some minor exceptions to this rule; for example, see the various compatxx shell settings, which can change certain aspects of parsing behavior on-the-fly). The upstream "words"/"tokens" that result from this complex parsing process are then expanded according to the general process of "expansion" as broken down in the above documentation excerpts, where word splitting of the expanded (expanding?) text into downstream words is simply one step of that process. Word splitting only touches text that has been spit out of a preceding expansion step; it does not affect literal text that was parsed right off the source bytestream.


Wrong answer #7

string='first line
        second line
        third line'

while read -r line; do lines+=("$line"); done <<<"$string"

This is one of the best solutions. Notice that we're back to using read. Didn't I say earlier that read is inappropriate because it performs two levels of splitting, when we only need one? The trick here is that you can call read in such a way that it effectively only does one level of splitting, specifically by splitting off only one field per invocation, which necessitates the cost of having to call it repeatedly in a loop. It's a bit of a sleight of hand, but it works.

But there are problems. First: when you provide at least one NAME argument to read, it automatically strips leading and trailing whitespace from each field that is split off from the input string. As described earlier in this post, this occurs whether $IFS is set to its default value or not, provided $IFS contains whitespace characters. Only whitespace characters that appear in $IFS are trimmed, so an empty $IFS disables the trimming. Now, the OP may not care about this for his specific use-case, and in fact, it may be a desirable feature of the parsing behavior. But not everyone who wants to parse a string into fields will want this. There is a solution, however: a somewhat non-obvious usage of read is to pass zero NAME arguments. In this case, read stores the entire input line that it gets from the input stream in a variable named $REPLY. As a bonus, it does not strip leading and trailing whitespace from the value. This is a very robust usage of read which I've exploited frequently in my shell programming career. Here's a demonstration of the difference in behavior:

string=$'  a  b  \n  c  d  \n  e  f  '; ## input string

a=(); while read -r line; do a+=("$line"); done <<<"$string"; declare -p a;
## declare -a a=([0]="a  b" [1]="c  d" [2]="e  f") ## read trimmed surrounding whitespace

a=(); while read -r; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="  a  b  " [1]="  c  d  " [2]="  e  f  ") ## no trimming

The second issue with this solution is that it does not actually address the case of a custom field separator, such as the OP's comma-space. As before, multicharacter separators are not supported, which is an unfortunate limitation of this solution. We could try to at least split on comma by specifying the separator to the -d option, but look what happens:

string='Paris, France, Europe';
a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France")

Predictably, the unaccounted surrounding whitespace got pulled into the field values, and hence this would have to be corrected subsequently through trimming operations (this could also be done directly in the while-loop). But there's another obvious error: Europe is missing! What happened to it? The answer is that read returns a failing return code if it hits end-of-file (in this case we can call it end-of-string) without encountering a final field terminator on the final field. This causes the while-loop to break prematurely and we lose the final field.

Technically this same error afflicted the previous examples as well; the difference there is that the field separator was taken to be LF, which is the default when you don't specify the -d option, and the <<< ("here-string") mechanism automatically appends a LF to the string just before it feeds it as input to the command. Hence, in those cases, we sort of accidentally solved the problem of a dropped final field by unwittingly appending an additional dummy terminator to the input. Let's call this solution the "dummy-terminator" solution. We can apply the dummy-terminator solution manually for any custom delimiter by concatenating it against the input string ourselves when instantiating it in the here-string:

a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string,"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe")

There, problem solved. Another solution is to only break the while-loop if both (1) read returned failure and (2) $REPLY is empty, meaning read was not able to read any characters prior to hitting end-of-file. Demo:

a=(); while read -rd,|| [[ -n "$REPLY" ]]; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=$' Europe\n')

This approach also reveals the secretive LF that automatically gets appended to the here-string by the <<< redirection operator. It could of course be stripped off separately through an explicit trimming operation as described a moment ago, but obviously the manual dummy-terminator approach solves it directly, so we could just go with that. The manual dummy-terminator solution is actually quite convenient in that it solves both of these two problems (the dropped-final-field problem and the appended-LF problem) in one go.

So, overall, this is quite a powerful solution. Its only remaining weakness is a lack of support for multicharacter delimiters, which I will address later.


Wrong answer #8

string='first line
        second line
        third line'

readarray -t lines <<<"$string"

(This is actually from the same post as #7; the answerer provided two solutions in the same post.)

The readarray builtin, which is a synonym for mapfile, is ideal. It's a builtin command which parses a bytestream into an array variable in one shot; no messing with loops, conditionals, substitutions, or anything else. And it doesn't surreptitiously strip any whitespace from the input string. And (if -O is not given) it conveniently clears the target array before assigning to it. But it's still not perfect, hence my criticism of it as a "wrong answer".

First, just to get this out of the way, note that, just like the behavior of read when doing field-parsing, readarray drops the trailing field if it is empty. Again, this is probably not a concern for the OP, but it could be for some use-cases. I'll come back to this in a moment.
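
Here is a quick sketch of that behavior. The input is fed through process substitution with printf rather than a here-string, so no LF is appended. Note that the -d option of readarray needs bash 4.4 or later:

```bash
readarray -td, a < <(printf '%s' 'a,,b,'); declare -p a;
## declare -a a=([0]="a" [1]="" [2]="b") ## inner empty field kept, trailing one dropped
```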

Second, as before, it does not support multicharacter delimiters. I'll give a fix for this in a moment as well.

Third, the solution as written does not parse the OP's input string, and in fact, it cannot be used as-is to parse it. I'll expand on this momentarily as well.

For the above reasons, I still consider this to be a "wrong answer" to the OP's question. Below I'll give what I consider to be the right answer.


Right answer

Here's a naïve attempt to make #8 work by just specifying the -d option:

string='Paris, France, Europe';
readarray -td, a <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=$' Europe\n')

We see the result is identical to the result we got from the double-conditional approach of the looping read solution discussed in #7. We can almost solve this with the manual dummy-terminator trick:

readarray -td, a <<<"$string,"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe" [3]=$'\n')

The problem here is that readarray preserved the trailing field, since the <<< redirection operator appended the LF to the input string, and therefore the trailing field was not empty (otherwise it would've been dropped). We can take care of this by explicitly unsetting the final array element after-the-fact:

readarray -td, a <<<"$string,"; unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe")

The only two problems that remain, which are actually related, are (1) the extraneous whitespace that needs to be trimmed, and (2) the lack of support for multicharacter delimiters.

The whitespace could of course be trimmed afterward (for example, see How to trim whitespace from a Bash variable?). But if we can hack a multicharacter delimiter, then that would solve both problems in one shot.
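
As a sketch of the trim-afterward route, here is one way to do it with parameter expansion alone (one of several trimming techniques), applied to the readarray result from above:

```bash
string='Paris, France, Europe';
readarray -td, a <<<"$string,"; unset 'a[-1]';
for i in "${!a[@]}"; do
    a[i]=${a[i]#"${a[i]%%[![:space:]]*}"}   # drop leading whitespace
    a[i]=${a[i]%"${a[i]##*[![:space:]]}"}   # drop trailing whitespace
done
declare -p a;
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")
```

This still splits on the comma alone and then trims, so a field that legitimately starts or ends with spaces would lose them.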

Unfortunately, there's no direct way to get a multicharacter delimiter to work. The best solution I've thought of is to preprocess the input string to replace the multicharacter delimiter with a single-character delimiter that will be guaranteed not to collide with the contents of the input string. The only character that has this guarantee is the NUL byte. This is because, in bash (though not in zsh, incidentally), variables cannot contain the NUL byte. This preprocessing step can be done inline in a process substitution. Here's how to do it using awk:

readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; }' <<<"$string, "); unset 'a[-1]';
declare -p a;
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")

There, finally! This solution will not erroneously split fields in the middle, will not cut out prematurely, will not drop empty fields, will not corrupt itself on filename expansions, will not automatically strip leading and trailing whitespace, will not leave a stowaway LF on the end, does not require loops, and does not settle for a single-character delimiter.
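
The NUL guarantee can be checked with builtins alone. In this sketch, command substitution cannot carry a NUL byte into a variable (bash 4.4 and later also print a warning, which is silenced here). By contrast, readarray -d '' happily consumes a NUL-delimited stream, even when the fields contain commas, spaces, or nothing at all:

```bash
{ v=$(printf 'a\0b'); } 2>/dev/null      # the NUL byte is dropped on the way in
declare -p v
## declare -- v="ab"

readarray -td '' parts < <(printf '%s\0' 'Los Angeles' 'a, b' '')
declare -p parts
## declare -a parts=([0]="Los Angeles" [1]="a, b" [2]="")
```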


Trimming solution

Lastly, I wanted to demonstrate my own fairly intricate trimming solution using the obscure -C callback option of readarray. Unfortunately, I've run out of room against Stack Overflow's draconian 30,000 character post limit, so I won't be able to explain it. I'll leave that as an exercise for the reader.

function mfcb { local val="$4"; "$1"; eval "$2[$3]=\$val;"; };
function val_ltrim { if [[ "$val" =~ ^[[:space:]]+ ]]; then val="${val:${#BASH_REMATCH[0]}}"; fi; };
function val_rtrim { if [[ "$val" =~ [[:space:]]+$ ]]; then val="${val:0:${#val}-${#BASH_REMATCH[0]}}"; fi; };
function val_trim { val_ltrim; val_rtrim; };
readarray -c1 -C 'mfcb val_trim a' -td, <<<"$string,"; unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")

#4


27  

Sometimes the method described in the accepted answer didn't work for me, especially when the separator is a newline.
In those cases I solved it this way:

string='first line
second line
third line'

oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1} # this is useful to format your code with tabs
lines=( $string )
IFS="$oldIFS"

for line in "${lines[@]}"
do
    echo "--> $line"
done

#5


26  

t="one,two,three"
a=($(echo "$t" | tr ',' '\n'))
echo "${a[2]}"

Prints three

#6


23  

The accepted answer works for values in one line.
If the variable has several lines:

string='first line
        second line
        third line'

We need a very different command to get all lines:

while read -r line; do lines+=("$line"); done <<<"$string"

Or the much simpler bash readarray:

readarray -t lines <<<"$string"

Printing all lines is very easy, taking advantage of a printf feature: the format string is reused for as many arguments as are supplied:

printf ">[%s]\n" "${lines[@]}"

>[first line]
>[        second line]
>[        third line]

#7


4  

This is similar to the approach by Jmoney38, but using sed:

string="1,2,3,4"
array=(`echo $string | sed 's/,/\n/g'`)
echo ${array[0]}

Prints 1

#8


1  

Try this

IFS=', '; array=(Paris, France, Europe)
for item in ${array[@]}; do echo $item; done

It's simple. If you want, you can also add a declare (and also remove the commas):

IFS=' ';declare -a array=(Paris France Europe)

The IFS=' ' is added to undo the IFS setting above, but in a fresh bash instance it works without it.

#9


1  

The key to splitting your string into an array is the multicharacter delimiter ", ". Any solution using IFS for multicharacter delimiters is inherently wrong, since IFS is a set of those characters, not a string.

If you assign IFS=", " then the string will break on EITHER "," OR " " or any combination of them which is not an accurate representation of the two character delimiter of ", ".
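
A one-line sketch shows that set-of-characters behavior. The input below contains no ", " sequence at all, yet IFS=', ' still splits it three ways:

```bash
s='a,b c'
IFS=', ' read -ra parts <<<"$s"
declare -p parts
# declare -a parts=([0]="a" [1]="b" [2]="c")
```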

You can use awk or sed to split the string, with process substitution:

#!/bin/bash

str="Paris, France, Europe"
array=()
while read -r -d $'\0' each; do   # use a NUL terminated field separator 
    array+=("$each")
done < <(printf "%s" "$str" | awk '{ gsub(/,[ ]+|$/,"\0"); print }')
declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output

It is more efficient to use a regex directly in Bash:

#!/bin/bash

str="Paris, France, Europe"

array=()
while [[ $str =~ ([^,]+)(,[ ]+|$) ]]; do
    array+=("${BASH_REMATCH[1]}")   # capture the field
    i=${#BASH_REMATCH}              # length of field + delimiter
    str=${str:i}                    # advance the string by that length
done                                # the loop deletes $str, so make a copy if needed

declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output...

With the second form, there is no subshell, so it will be inherently faster, at least for short strings (the benchmarks below show it falling far behind on very long inputs).
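
Another subshell-free option swaps the regex for plain parameter expansion. It matches the literal two-character delimiter and also keeps empty fields. split_by is a hypothetical helper name. Like the regex loop, it rebuilds the remainder of the string on every pass, so it is best suited to short inputs:

```bash
# Hypothetical helper: split "$2" on the literal string "$1"; result in global "array".
split_by() {
    local delim=$1 rest=$2
    [[ -n $delim ]] || return 1         # an empty delimiter would loop forever
    array=()
    while [[ $rest == *"$delim"* ]]; do
        array+=("${rest%%"$delim"*}")   # text before the first delimiter
        rest=${rest#*"$delim"}          # drop everything up to and including it
    done
    array+=("$rest")                    # the text after the last delimiter
}

split_by ', ' 'Paris, France, , Europe'
declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="" [3]="Europe")
```

Quoting "$delim" inside the patterns makes glob characters in the delimiter match literally.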


Edit by bgoldst: Here are some benchmarks comparing my readarray solution to dawg's regex solution, and I also included the read solution for the heck of it (note: I slightly modified the regex solution for greater harmony with my solution) (also see my comments below the post):

## competitors
function c_readarray { readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); unset 'a[-1]'; };
function c_read { a=(); local REPLY=''; while read -r -d ''; do a+=("$REPLY"); done < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); };
function c_regex { a=(); local s="$1, "; while [[ $s =~ ([^,]+),\  ]]; do a+=("${BASH_REMATCH[1]}"); s=${s:${#BASH_REMATCH}}; done; };

## helper functions
function rep {
    local -i i=-1;
    for ((i = 0; i<$1; ++i)); do
        printf %s "$2";
    done;
}; ## end rep()

function testAll {
    local funcs=();
    local args=();
    local func='';
    local -i rc=-1;
    while [[ "$1" != ':' ]]; do
        func="$1";
        if [[ ! "$func" =~ ^[_a-zA-Z][_a-zA-Z0-9]*$ ]]; then
            echo "bad function name: $func" >&2;
            return 2;
        fi;
        funcs+=("$func");
        shift;
    done;
    shift;
    args=("$@");
    for func in "${funcs[@]}"; do
        echo -n "$func ";
        { time $func "${args[@]}" >/dev/null 2>&1; } 2>&1| tr '\n' '/';
        rc=${PIPESTATUS[0]}; if [[ $rc -ne 0 ]]; then echo "[$rc]"; else echo; fi;
    done| column -ts/;
}; ## end testAll()

function makeStringToSplit {
    local -i n=$1; ## number of fields
    if [[ $n -lt 0 ]]; then echo "bad field count: $n" >&2; return 2; fi;
    if [[ $n -eq 0 ]]; then
        echo;
    elif [[ $n -eq 1 ]]; then
        echo 'first field';
    elif [[ "$n" -eq 2 ]]; then
        echo 'first field, last field';
    else
        echo "first field, $(rep $(($1-2)) 'mid field, ')last field";
    fi;
}; ## end makeStringToSplit()

function testAll_splitIntoArray {
    local -i n=$1; ## number of fields in input string
    local s='';
    echo "===== $n field$(if [[ $n -ne 1 ]]; then echo 's'; fi;) =====";
    s="$(makeStringToSplit "$n")";
    testAll c_readarray c_read c_regex : "$s";
}; ## end testAll_splitIntoArray()

## results
testAll_splitIntoArray 1;
## ===== 1 field =====
## c_readarray   real  0m0.067s   user 0m0.000s   sys  0m0.000s
## c_read        real  0m0.064s   user 0m0.000s   sys  0m0.000s
## c_regex       real  0m0.000s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 10;
## ===== 10 fields =====
## c_readarray   real  0m0.067s   user 0m0.000s   sys  0m0.000s
## c_read        real  0m0.064s   user 0m0.000s   sys  0m0.000s
## c_regex       real  0m0.001s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 100;
## ===== 100 fields =====
## c_readarray   real  0m0.069s   user 0m0.000s   sys  0m0.062s
## c_read        real  0m0.065s   user 0m0.000s   sys  0m0.046s
## c_regex       real  0m0.005s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 1000;
## ===== 1000 fields =====
## c_readarray   real  0m0.084s   user 0m0.031s   sys  0m0.077s
## c_read        real  0m0.092s   user 0m0.031s   sys  0m0.046s
## c_regex       real  0m0.125s   user 0m0.125s   sys  0m0.000s
##
testAll_splitIntoArray 10000;
## ===== 10000 fields =====
## c_readarray   real  0m0.209s   user 0m0.093s   sys  0m0.108s
## c_read        real  0m0.333s   user 0m0.234s   sys  0m0.109s
## c_regex       real  0m9.095s   user 0m9.078s   sys  0m0.000s
##
testAll_splitIntoArray 100000;
## ===== 100000 fields =====
## c_readarray   real  0m1.460s   user 0m0.326s   sys  0m1.124s
## c_read        real  0m2.780s   user 0m1.686s   sys  0m1.092s
## c_regex       real  17m38.208s   user 15m16.359s   sys  2m19.375s
##

#10


0  

Use this:

countries='Paris, France, Europe'
OIFS="$IFS"
IFS=', ' array=($countries)
IFS="$OIFS"

#${array[0]} == Paris
#${array[1]} == France
#${array[2]} == Europe

#11


0  

Here's my hack!

Splitting strings by strings is a pretty boring thing to do using bash. What happens is that we have limited approaches that only work in a few cases (split by ";", "/", "." and so on) or we have a variety of side effects in the outputs.

The approach below has required a number of maneuvers, but I believe it will work for most of our needs!

#!/bin/bash

# --------------------------------------
# SPLIT FUNCTION
# ----------------

F_SPLIT_R=()
f_split() {
    : 'It does a "split" into a given string and returns an array.

    Args:
        TARGET_P (str): Target string to "split".
        DELIMITER_P (Optional[str]): Delimiter used to "split". If not 
    given, the split will be done on spaces.

    Returns:
        F_SPLIT_R (array): Array with the provided string separated by the 
    given delimiter.
    '

    F_SPLIT_R=()
    TARGET_P=$1
    DELIMITER_P=$2
    if [ -z "$DELIMITER_P" ] ; then
        DELIMITER_P=" "
    fi

    REMOVE_N=1
    if [ "$DELIMITER_P" == "\n" ] ; then
        REMOVE_N=0
    fi

    # NOTE: This was the only parameter that has been a problem so far! 
    # By Questor
    # [Ref.: https://unix.stackexchange.com/a/390732/61742]
    if [ "$DELIMITER_P" == "./" ] ; then
        DELIMITER_P="[.]/"
    fi

    if [ ${REMOVE_N} -eq 1 ] ; then

        # NOTE: Due to bash limitations we have some problems getting the 
        # output of a split by awk inside an array, so we need to use 
        # "line break" (\n) to succeed. Given this, we temporarily replace the 
        # line breaks and reinsert them afterwards. The problem is that if 
        # there is a line break in the given "string", this line break will 
        # be lost, that is, it is erroneously removed in the output! 
        # By Questor
        TARGET_P=$(awk 'BEGIN {RS="dn"} {gsub("\n", "3F2C417D448C46918289218B7337FCAF"); printf $0}' <<< "${TARGET_P}")

    fi

    # NOTE: The replace of "\n" by "3F2C417D448C46918289218B7337FCAF" results 
    # in more occurrences of "3F2C417D448C46918289218B7337FCAF" than the 
    # amount of "\n" that there was originally in the string (one more 
    # occurrence at the end of the string)! We can not explain the reason for 
    # this side effect. The line below corrects this problem! By Questor
    TARGET_P=${TARGET_P%????????????????????????????????}

    SPLIT_NOW=$(awk -F"$DELIMITER_P" '{for(i=1; i<=NF; i++){printf "%s\n", $i}}' <<< "${TARGET_P}")

    while IFS= read -r LINE_NOW ; do
        if [ ${REMOVE_N} -eq 1 ] ; then

            # NOTE: Command substitution "$(...)" strips trailing newlines,
            # so a field ending in (or made only of) restored line breaks
            # would lose them. Wrapping the field in "'" characters
            # protects them. By Questor
            LN_NOW_WITH_N=$(awk 'BEGIN {RS="dn"} {gsub("3F2C417D448C46918289218B7337FCAF", "\n"); printf "%s", $0}' <<< "'${LINE_NOW}'")

            # NOTE: The commands below strip the "'" characters added
            # immediately above! By Questor
            LN_NOW_WITH_N=${LN_NOW_WITH_N%?}
            LN_NOW_WITH_N=${LN_NOW_WITH_N#?}

            F_SPLIT_R+=("$LN_NOW_WITH_N")
        else
            F_SPLIT_R+=("$LINE_NOW")
        fi
    done <<< "$SPLIT_NOW"
}

# --------------------------------------
# HOW TO USE
# ----------------

STRING_TO_SPLIT="
 * How do I list all databases and tables using psql?

\"
sudo -u postgres /usr/pgsql-9.4/bin/psql -c \"\l\"
sudo -u postgres /usr/pgsql-9.4/bin/psql <DB_NAME> -c \"\dt\"
\"

\"
\list or \l: list all databases
\dt: list all tables in the current database
\"

[Ref.: https://dba.stackexchange.com/questions/1285/how-do-i-list-all-databases-and-tables-using-psql]


"

f_split "$STRING_TO_SPLIT" "bin/psql -c"

# --------------------------------------
# OUTPUT AND TEST
# ----------------

ARR_LENGTH=${#F_SPLIT_R[@]}
for (( i=0; i<ARR_LENGTH; i++ )) ; do
    echo " > -----------------------------------------"
    echo "${F_SPLIT_R[$i]}"
    echo " < -----------------------------------------"
done

if [ "$STRING_TO_SPLIT" == "${F_SPLIT_R[0]}bin/psql -c${F_SPLIT_R[1]}" ] ; then
    echo " > -----------------------------------------"
    echo "The strings are the same!"
    echo " < -----------------------------------------"
fi
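
For comparison, splitting on a literal multi-character delimiter can also be done in pure Bash with parameter expansion: `${rest%%"$delim"*}` keeps the text before the first delimiter, and `${rest#*"$delim"}` drops everything up to and including it (quoting `$delim` makes it match literally rather than as a glob pattern). This is a minimal sketch, not a drop-in replacement for `f_split`; the name `split_by_string` is hypothetical, and the nameref assumes Bash 4.3 or later.

```shell
#!/usr/bin/env bash

# Hypothetical helper: split_by_string ARRAY_NAME STRING DELIMITER
# Splits STRING on every occurrence of the literal DELIMITER (any length).
# Assumes Bash >= 4.3 (local -n).
split_by_string() {
    local -n _parts=$1
    local rest=$2 delim=$3
    _parts=()
    if [[ -z $delim ]]; then
        _parts=("$rest")                  # nothing to split on
        return 0
    fi
    while [[ $rest == *"$delim"* ]]; do
        _parts+=("${rest%%"$delim"*}")    # text before the first delimiter
        rest=${rest#*"$delim"}            # drop that text and the delimiter
    done
    _parts+=("$rest")                     # text after the last delimiter
}

text=$'first line\nbin/psql -c "\\l"\nbin/psql -c "\\dt"\n'
split_by_string parts "$text" "bin/psql -c"
for i in "${!parts[@]}"; do
    printf '%d: [%s]\n' "$i" "${parts[i]}"
done
```

Because the string never passes through awk, no placeholder token or line-break bookkeeping is needed. It may be slow on very large inputs, since each iteration copies the remaining string.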

#12


-1  

Another approach can be:

str="a, b, c, d"  # assuming there is a space after ',' as in the question
arr=(${str//,/})  # delete all occurrences of ','; the default IFS then splits on the spaces

After this, arr is an array of four strings. This doesn't require setting IFS, using read, or anything else special, so it is simpler and more direct.
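
Note that `${str//,/}` only deletes the commas; the splitting itself is ordinary word splitting on the default IFS, so an element that contains a space is split in two. A sketch of the difference, plus an alternative that turns each ", " into a newline and reads one element per line with `mapfile` (Bash 4+). It assumes the elements themselves contain no newlines; the variable `nl` is just a helper introduced here.

```shell
#!/usr/bin/env bash

str="New York, Paris, Rio de Janeiro"

# The approach above: commas removed, then split on whitespace.
bad=(${str//,/})
echo "${#bad[@]}"            # 6: "New York" and "Rio de Janeiro" are broken up

# Alternative: replace each ", " with a newline and read line by line.
nl=$'\n'
mapfile -t arr <<< "${str//, /$nl}"
echo "${#arr[@]}"            # 3
printf '%s\n' "${arr[@]}"    # New York, Paris, Rio de Janeiro (one per line)
```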

#13


-1  

UPDATE: Don't do this, due to problems with eval.

With slightly less ceremony:

IFS=', ' eval 'array=($string)'

e.g.

string="foo, bar,baz"
IFS=', ' eval 'array=($string)'
echo "${array[1]}" # -> bar

#14


-1  

Another way would be:

string="Paris, France, Europe"
IFS=', ' arr=(${string})

Now the elements are stored in the array arr. To iterate over them:

for i in "${arr[@]}"; do echo "$i"; done
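
Here `IFS=', '` is not a prefix to a command (as in `IFS=', ' read ...`). It is a plain assignment on the same line as another assignment, so IFS stays set to ', ' for the rest of the script and affects every later unquoted expansion. A short sketch of that side effect, using `unset IFS` to return to the default splitting on space, tab and newline (the variables `words`, `w` and `w2` are only for illustration):

```shell
#!/usr/bin/env bash

string="Paris, France, Europe"
IFS=', ' arr=(${string})
printf 'IFS is now [%s]\n' "$IFS"   # IFS is now [, ]

words="a,b c"
w=($words)        # still split on ',' as well: a  b  c
echo "${#w[@]}"   # 3

unset IFS         # default word splitting (space, tab, newline)
w2=($words)       # a,b  c
echo "${#w2[@]}"  # 2
```

Saving and restoring IFS around the assignment, or using the prefix form `IFS=', ' read -r -a arr <<< "$string"`, avoids this.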