I am trying to output a string that contains everything between two words of a string:
input:
"Here is a String"
output:
"is a"
Using:
sed -n '/Here/,/String/p'
includes the endpoints, but I don't want to include them.
10 answers
#1
66
sed -e 's/Here\(.*\)String/\1/'
#2
110
grep can also do this with PCRE lookbehind and lookahead assertions (the -P option of GNU grep). For your case, the command would be:
echo "Here is a string" | grep -o -P '(?<=Here).*(?=string)'
#3
24
You can strip the surrounding text with parameter expansion in Bash alone:
$ foo="Here is a String"
$ foo=${foo##*Here }
$ echo "$foo"
is a String
$ foo=${foo%% String*}
$ echo "$foo"
is a
$
And if you have a GNU grep that includes PCRE, you can use a zero-width assertion:
$ echo "Here is a String" | grep -Po '(?<=(Here )).*(?= String)'
is a
#4
20
The accepted answer does not remove text that could be before Here or after String. This will:
sed -e 's/.*Here\(.*\)String.*/\1/'
The main difference is the addition of .* immediately before Here and after String.
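To make the difference concrete, here is a small sketch that runs both substitutions on a made-up input line with extra text on each side of the markers. The function names and the sample line are illustrative only; the two sed expressions are the ones quoted in this thread.

```shell
# Accepted answer: only the matched span "Here...String" is replaced.
strip_inner() { sed -e 's/Here\(.*\)String/\1/'; }

# This answer: the leading and trailing .* make the match cover the whole line.
strip_all() { sed -e 's/.*Here\(.*\)String.*/\1/'; }

line='prefix Here is a String suffix'   # hypothetical sample input

# Brackets are printed only to make the surrounding spaces visible.
printf '[%s]\n' "$(printf '%s\n' "$line" | strip_inner)"   # [prefix  is a  suffix]
printf '[%s]\n' "$(printf '%s\n' "$line" | strip_all)"     # [ is a ]
```

In both cases the spaces next to the markers are kept, because they are part of what `\(.*\)` captures.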
#5
16
With GNU awk:
$ echo "Here is a string" | awk -v FS="(Here|string)" '{print $2}'
is a
grep with the -P (perl-regexp) option supports \K, which discards the characters matched so far from the reported match. In our case the previously matched string was Here, so it was dropped from the final output.
$ echo "Here is a string" | grep -oP 'Here\K.*(?=string)'
is a
$ echo "Here is a string" | grep -oP 'Here\K(?:(?!string).)*'
is a
If you want the output to be is a, without the surrounding spaces, you could try the following:
$ echo "Here is a string" | grep -oP 'Here\s*\K.*(?=\s+string)'
is a
$ echo "Here is a string" | grep -oP 'Here\s*\K(?:(?!\s+string).)*'
is a
#6
15
If you have a long file with many multi-line occurrences, it is useful to number the lines first:
cat -n file | sed -n '/Here/,/String/p'
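The range form above, like the one in the question, still prints the two marker lines. As a rough sketch for the multi-line case only, the marker lines can be deleted inside the range before printing. The helper name `between_lines` is made up for this example, and the markers are assumed to be basic regular expressions that contain no `/`.

```shell
# between_lines START END: print the lines strictly between a line matching
# START and the next line matching END. Hypothetical helper for illustration.
between_lines() {
    # Inside the range, delete lines matching either marker, print the rest.
    sed -n "/$1/,/$2/{/$1/d;/$2/d;p;}"
}

printf '%s\n' 'intro' 'Here' 'first' 'second' 'String' 'outro' |
    between_lines Here String
# prints:
# first
# second
```

Any line inside the range that itself contains one of the markers is dropped too, and this does not help when both markers sit on the same line, which is the case the question asks about.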
#7
6
This might work for you (GNU sed):
sed '/Here/!d;s//&\n/;s/.*\n//;:a;/String/bb;$!{n;ba};:b;s//\n&/;P;D' file
This presents each occurrence of text between two markers (in this instance Here and String) on a new line and preserves newlines within the text.
#8
3
All of the above solutions fall short when the last search string is repeated elsewhere in the string. I found it best to write a Bash function:
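function str_str {
    # Usage: str_str TEXT START END
    # Prints the text between the first START and the first END after it.
    local str
    str="${1#*"$2"}"      # remove the shortest prefix ending in START
    str="${str%%"$3"*}"   # remove the longest suffix beginning with END
    printf '%s' "$str"    # no trailing newline, like echo -n
}
# test it ...
mystr="this is a string"
str_str "$mystr" "this " " string"   # prints "is a"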
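The repeated-marker problem mentioned above can be seen with a made-up input that contains the end marker twice. The sketch below compares a parameter-expansion helper in the spirit of this answer (here called `between_first`, a name chosen for this example) with the greedy sed substitution quoted earlier in the thread, adapted to the hypothetical markers START and END.

```shell
# between_first TEXT START END: text after the first START and before the
# first END that follows it. Quoting the markers inside the expansions makes
# the shell match them literally rather than as glob patterns.
between_first() {
    local rest
    rest=${1#*"$2"}                 # shortest prefix ending in START
    printf '%s' "${rest%%"$3"*}"    # longest suffix beginning with END
}

# Greedy sed version: \(.*\) runs up to the LAST occurrence of END.
between_greedy() { sed -e 's/.*START \(.*\) END.*/\1/'; }

sample='a START x END y END z'      # hypothetical input, END appears twice

between_first "$sample" 'START ' ' END'; echo      # x
printf '%s\n' "$sample" | between_greedy           # x END y
```

If START does not occur at all, the first expansion leaves the text unchanged, so the helper returns everything before the first END; callers who care would need to check for that.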
#9
1
You can use \1 (refer to http://www.grymoire.com/Unix/Sed.html#uh-4):
echo "Hello is a String" | sed 's/Hello\(.*\)String/\1/g'
The content inside the escaped parentheses, \( and \), is stored as \1.
#10
0
Problem. My stored Claws Mail messages are wrapped as follows, and I am trying to extract the Subject lines:
Subject: [SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular
link in major cell growth pathway: Findings point to new potential
therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is
Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as
a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway
identified [Lysosomal amino acid transporter SLC38A9 signals arginine
sufficiency to mTORC1]]
Message-ID: <20171019190902.18741771@VictoriasJourney.com>
Per A2 in this thread (How to use sed/grep to extract text between two words?), the first expression below "works" as long as the matched text does not contain a newline:
grep -o -P '(?<=Subject: ).*(?=molecular)' corpus/01
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key
However, despite trying numerous variants (.+?; /s; ...), I could not get these to work:
grep -o -P '(?<=Subject: ).*(?=link)' corpus/01
grep -o -P '(?<=Subject: ).*(?=therapeutic)' corpus/01
etc.
Solution 1.
Per "Extract text between two strings on different lines":
sed -n '/Subject: /{:a;N;/Message-ID:/!ba; s/\n/ /g; s/\s\s*/ /g; s/.*Subject: \|Message-ID:.*//g;p}' corpus/01
which gives
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
Solution 2.
Per "How can I replace a newline (\n) using sed?",
sed ':a;N;$!ba;s/\n/ /g' corpus/01
will replace newlines with a space.
Chaining that with A2 in "How to use sed/grep to extract text between two words?", we get:
sed ':a;N;$!ba;s/\n/ /g' corpus/01 | grep -o -P '(?<=Subject: ).*(?=Message-ID:)'
which gives
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
This variant also collapses each run of whitespace into a single space:
sed ':a;N;$!ba;s/\n/ /g; s/\s\s*/ /g' corpus/01 | grep -o -P '(?<=Subject: ).*(?=Message-ID:)'
giving
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
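For comparison, the same unfolding could probably be done in a single awk pass that avoids the GNU-specific \s and \| used in the sed commands above. This is only a sketch: the helper name `unfold_subject` is made up, and it assumes, as the solutions above do, that each Subject header is terminated by a line starting with Message-ID:.

```shell
# unfold_subject [FILE...]: print each Subject header on one line.
# Hypothetical helper; reads standard input when no file is given.
unfold_subject() {
    awk '
        # Start of a header: remember the text after "Subject: ".
        /^Subject: /   { collecting = 1; sub(/^Subject: /, ""); subject = $0; next }
        # End marker: squeeze whitespace runs and print what was collected.
        /^Message-ID:/ { if (collecting) { gsub(/[ \t]+/, " ", subject); print subject }
                         collecting = 0; next }
        # Continuation line: append it, separated by a space.
        collecting     { subject = subject " " $0 }
    ' "$@"
}

printf '%s\n' 'Subject: Key molecular' ' link in major cell' 'growth pathway' \
    'Message-ID: <id@example.invalid>' | unfold_subject
# prints: Key molecular link in major cell growth pathway
```

It would be called as `unfold_subject corpus/01` for the file used in this answer.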