I have a file called 'plainlinks' that looks like this:
13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz
13081. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94094-2012.gz
13082. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94096-2012.gz
13083. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94097-2012.gz
13084. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94098-2012.gz
13085. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94644-2012.gz
13086. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94645-2012.gz
13087. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94995-2012.gz
13088. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94996-2012.gz
13089. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-96404-2012.gz
I need to produce output that looks like this:
999999-94092
999999-94094
999999-94096
999999-94097
999999-94098
999999-94644
999999-94645
999999-94995
999999-94996
999999-96404
5 Solutions
#1
11
Using sed:
sed -E 's/.*\/(.*)-.*/\1/' plainlinks
Output:
999999-94092
999999-94094
999999-94096
999999-94097
999999-94098
999999-94644
999999-94645
999999-94995
999999-94996
999999-96404
To save the changes to the file, use the -i option:
sed -Ei 's/.*\/(.*)-.*/\1/' plainlinks
Or, to save to a new file, redirect the output:
sed -E 's/.*\/(.*)-.*/\1/' plainlinks > newfile.txt
Explanation:
s/      # substitution
.*      # match anything
\/      # up to the last forward slash (escaped so sed does not read it as the delimiter)
(.*)    # anything after the last forward slash (captured in brackets)
-       # up to the last hyphen
.*      # anything else left on the line
/       # end match; start replacement
\1      # the value captured in the first (only) set of brackets
/       # end
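The pattern works because .* is greedy: the first .* runs to the last slash, and (.*)- then runs to the last hyphen on the line. A minimal sketch to try it without touching plainlinks, feeding one sample line from the question through printf; the function name extract_id is made up for illustration:

```shell
# Hypothetical helper: reads URLs on stdin, prints the part between
# the last slash and the last hyphen.
extract_id() {
    sed -E 's/.*\/(.*)-.*/\1/'
}

# One sample line copied from the question.
printf '%s\n' '13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz' | extract_id
# prints 999999-94092
```

Note that a line with no slash or no hyphen does not match the pattern at all, so sed would print it unchanged.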
#2
7
Just for fun.
awk -F/ '{print substr($7, 1, 12)}' plainlinks
Or with grep:
grep -Eo '[0-9]{6}-[0-9]{5}' plainlinks
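Both one-liners depend on the ID being exactly 12 characters: six digits, a hyphen, five digits. Here is a quick side-by-side check on one sample line from the question. The variable names are only for illustration, and the awk call uses a start index of 1 because awk counts string positions from 1 (a start of 0 is handled differently by different awk implementations):

```shell
sample='13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz'

# awk: 7th slash-separated field, characters 1 through 12.
via_awk=$(printf '%s\n' "$sample" | awk -F/ '{print substr($7, 1, 12)}')

# grep: print only the text matching six digits, a hyphen, five digits.
via_grep=$(printf '%s\n' "$sample" | grep -Eo '[0-9]{6}-[0-9]{5}')

printf '%s\n%s\n' "$via_awk" "$via_grep"
```

The leading "13080." does not confuse grep here because it has only five digits and no hyphen after it.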
#3
4
Assuming the format stays consistent with what you have described, you can do it with awk:
awk 'BEGIN{FS="[/-]"; OFS="-"} {print $7, $8}' plainlinks > output_file
Output:
999999-94092
999999-94094
999999-94096
999999-94097
999999-94098
999999-94644
999999-94645
999999-94995
999999-94996
999999-96404
Explanation:
- awk reads your input file one line at a time, breaking each line into "fields".
- BEGIN{FS="[/-]"; OFS="-"} specifies that the delimiter used on the input lines should be either / or -. It also specifies that the output should be delimited by -.
- {print $7, $8} tells awk to print the 7th and 8th fields of each line, in this case 999999 and 9xxxx.
- plainlinks is where the name of the input file goes.
- > output_file redirects the output to a file named output_file.
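To see why it is fields 7 and 8, it can help to print every field awk produces with that separator. A small sketch, assuming one sample line from the question; the function name show_fields is invented for this example:

```shell
# Hypothetical helper: split each input line on / or - and list the fields.
show_fields() {
    awk 'BEGIN{FS="[/-]"} {for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i}'
}

printf '%s\n' '13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz' | show_fields
```

Field 2 comes out empty because of the two slashes in ftp://, which is why the station ID starts at field 7 rather than 6.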
#4
4
Just with the shell's parameter expansion:
while IFS= read -r line; do
    tmp=${line##*noaa/}
    echo "${tmp%-????.gz}"
done < plainlinks
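For anyone unfamiliar with the two expansions in the loop, here they are applied step by step to a single line. This is only a sketch on one sample line from the question, and the variable id is added for illustration:

```shell
line='13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz'

# '##' removes the longest prefix matching the glob '*noaa/'.
tmp=${line##*noaa/}
printf '%s\n' "$tmp"    # prints 999999-94092-2012.gz

# '%' removes the shortest suffix matching '-????.gz',
# where each ? stands for exactly one character (the year).
id=${tmp%-????.gz}
printf '%s\n' "$id"     # prints 999999-94092
```

The noaa in the host name is not a problem, since the glob needs the literal text noaa/ and the host continues with .gov.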
#5
1
If the format stays the same, no need for sed or awk:
cut -d "/" -f 7- plainlinks | cut -d "-" -f 1,2
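The pipeline does its work in two cuts, which is easier to follow when the intermediate result is shown. A sketch on one sample line from the question; the variable names are assumptions made for this example:

```shell
sample='13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz'

# First cut: everything from the 7th slash-separated field onward (the file name).
name=$(printf '%s\n' "$sample" | cut -d "/" -f 7-)
printf '%s\n' "$name"   # prints 999999-94092-2012.gz

# Second cut: fields 1 and 2 of the hyphen-separated name; cut rejoins them with '-'.
id=$(printf '%s\n' "$name" | cut -d "-" -f 1,2)
printf '%s\n' "$id"     # prints 999999-94092
```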