Repeatedly extracting text between two strings with sed or awk?

I have a file called 'plainlinks' that looks like this:

13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz
13081. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94094-2012.gz
13082. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94096-2012.gz
13083. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94097-2012.gz
13084. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94098-2012.gz
13085. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94644-2012.gz
13086. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94645-2012.gz
13087. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94995-2012.gz
13088. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94996-2012.gz
13089. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-96404-2012.gz

I need to produce output that looks like this:

999999-94092
999999-94094
999999-94096
999999-94097
999999-94098
999999-94644
999999-94645
999999-94995
999999-94996
999999-96404

5 answers

#1

Using sed:

sed -E 's/.*\/(.*)-.*/\1/' plainlinks

Output:

999999-94092
999999-94094
999999-94096
999999-94097
999999-94098
999999-94644
999999-94645
999999-94995
999999-94996
999999-96404

To edit the file in place, use the -i option (this spelling is for GNU sed; BSD/macOS sed expects -i '' after -E instead):

sed -Ei 's/.*\/(.*)-.*/\1/' plainlinks

Or, to write the result to a new file, redirect the output:

sed -E 's/.*\/(.*)-.*/\1/' plainlinks > newfile.txt
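
If you want to try the in-place edit safely first, here is a minimal sketch that runs it on a scratch copy holding two lines from the question. It assumes a POSIX shell with mktemp -d available; the scratch directory is just for illustration. The -i.bak spelling (suffix attached directly to the flag) is accepted by both GNU and BSD sed and keeps the original as plainlinks.bak.

```shell
# Scratch directory, removed automatically when the script exits
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Two sample lines copied from the question's plainlinks file
printf '%s\n' \
  '13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz' \
  '13081. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94094-2012.gz' \
  > "$tmpdir/plainlinks"

# Edit in place, keeping the untouched original as plainlinks.bak
sed -E -i.bak 's/.*\/(.*)-.*/\1/' "$tmpdir/plainlinks"

cat "$tmpdir/plainlinks"      # the extracted IDs
cat "$tmpdir/plainlinks.bak"  # the original two lines
```

Once the output looks right, drop the .bak suffix (GNU) or use -i '' (BSD) if you don't want a backup.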

Explanation:

s/    # substitute
.*    # match anything (greedy)...
\/    # ...up to the last forward slash (escaped so sed doesn't mistake it for the s/// delimiter)
(.*)  # everything after the last slash, up to the last hyphen (captured in parentheses)
-     # the last hyphen
.*    # whatever is left on the line
/     # end of pattern; start of replacement
\1    # the text captured by the first (only) group
/     # end of the s command
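
Both .* parts are greedy, which is what makes the expression stop at the last slash and the last hyphen. A quick way to check this is to feed sed one sample line from the question, plus a made-up string (a/b/c-d-e, purely illustrative) where the effect is easy to see:

```shell
line='13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz'

# Same expression as above, applied to a single line
id=$(printf '%s\n' "$line" | sed -E 's/.*\/(.*)-.*/\1/')
echo "$id"     # 999999-94092

# Illustrative string: the capture keeps "c-d", cutting only at the LAST hyphen
demo=$(printf '%s\n' 'a/b/c-d-e' | sed -E 's/.*\/(.*)-.*/\1/')
echo "$demo"   # c-d
```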

#2

Just for fun.

awk -F/ '{print substr($7,1,12)}' plainlinks
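
The 7 comes from counting /-separated fields, and the 12 is the length of an ID such as 999999-94092. The sketch below prints every field of one sample line with its index so the numbering can be checked. Note that awk numbers string positions from 1, so the sketch passes 1 as the start of substr; a start of 0 is not handled the same way by all awk implementations.

```shell
line='13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz'

# List each "/"-separated field with its index ($2 is empty because of "//")
printf '%s\n' "$line" |
  awk -F/ '{ for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }'

# Take 12 characters of field 7, starting at position 1
id=$(printf '%s\n' "$line" | awk -F/ '{ print substr($7, 1, 12) }')
echo "$id"
```

If your lines contain a different number of slashes, the field number changes accordingly.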

Or with grep:

grep -Eo '[0-9]{6}-[0-9]{5}' plainlinks
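
-o prints only the matched part, and prints every match on its own line, which is what makes it suit repeated extraction. -o is supported by GNU and BSD grep but is not part of POSIX. The input line below is made up for illustration and contains two IDs:

```shell
# Made-up line holding two IDs, to show that -o emits each match separately
matches=$(printf '%s\n' 'x 999999-94092 y 999999-94094 z' | grep -Eo '[0-9]{6}-[0-9]{5}')
printf '%s\n' "$matches"
```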

#3

Assuming the format stays consistent as you have described, you can do it with awk:

awk 'BEGIN{FS="[/-]"; OFS="-"} {print $7, $8}' plainlinks > output_file

Output:

999999-94092
999999-94094
999999-94096
999999-94097
999999-94098
999999-94644
999999-94645
999999-94995
999999-94996
999999-96404

Explanation:

awk reads the input file one line at a time and splits each line into "fields".
BEGIN{FS="[/-]"; OFS="-"} runs before any input is read: it sets the input field separator to the regex [/-] (either / or -) and the output field separator to -.
{print $7, $8} prints the 7th and 8th fields of each line (here 999999 and 9xxxx), joined by OFS.
plainlinks is the name of the input file.
> output_file redirects the output to a file named output_file.
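
To see the numbering described above, the sketch below prints every field of one sample line under FS="[/-]". Because both / and - split fields, a hyphen earlier in the line (in a different host name, say) would shift the IDs to other field numbers, which is why this answer assumes a consistent format.

```shell
line='13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz'

# Show each field produced by the regex separator [/-]
# ($7 = 999999, $8 = 94092, $9 = 2012.gz)
printf '%s\n' "$line" |
  awk 'BEGIN { FS = "[/-]" } { for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }'

# The command from the answer: fields 7 and 8, re-joined with OFS="-"
id=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[/-]"; OFS = "-" } { print $7, $8 }')
echo "$id"
```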

#4

Just with the shell's parameter expansion:

while IFS= read -r line; do
    tmp=${line##*noaa/}       # drop everything up to and including "noaa/"
    echo "${tmp%-????.gz}"    # drop the trailing "-????.gz" (e.g. "-2012.gz")
done < plainlinks
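
The two expansions can be traced on a single line. ## and % are POSIX parameter expansions, so this is not bash-specific. Since ? matches exactly one character, -????.gz assumes a four-character year before .gz. The %% line is only there for contrast:

```shell
line='13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz'

# ## removes the LONGEST prefix matching "*noaa/"
tmp=${line##*noaa/}
echo "$tmp"            # 999999-94092-2012.gz

# % removes the SHORTEST suffix matching "-????.gz" (hyphen, four characters, ".gz")
id=${tmp%-????.gz}
echo "$id"             # 999999-94092

# Contrast: %% with "-*" removes the longest suffix starting at a hyphen, which is too much
echo "${tmp%%-*}"      # 999999
```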

#5

If the format stays the same, no need for sed or awk:

cut -d "/" -f 7- plainlinks | cut -d "-" -f 1,2
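
Each cut stage can be checked on its own. The first keeps everything from the 7th /-separated field onwards. The second keeps the first two --separated fields of that result, and cut re-joins them with the same delimiter:

```shell
line='13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz'

# Stage 1: fields 7 to the end, split on "/"
step1=$(printf '%s\n' "$line" | cut -d '/' -f 7-)
echo "$step1"   # 999999-94092-2012.gz

# Stage 2: fields 1 and 2, split on "-"
step2=$(printf '%s\n' "$step1" | cut -d '-' -f 1,2)
echo "$step2"   # 999999-94092
```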
