Extracting email addresses with sed

Time: 2022-03-30 16:51:09

I'm trying to become familiar with sed by extracting email addresses from input in the following form:

something_from.someone:user@email.com

something_from.someone:user.com

That is the input I'm sending to sed. I'm trying to remove everything up to and including the colon (:):

sed 'd/[[:alphanum:]]+[.][[:alphanum:]]+[:]//'

Based on my research, this should do it, but I'm getting this error:

sed: 1: "d/[[:alphanum:]]+[.][[: ...": extra characters at the end of d command

Any ideas as to what I'm doing incorrectly?

1 solution

#1

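Your delete syntax is incorrect. sed read d as the command and treated everything after it as extra characters, hence the error. To delete in sed, the pattern (the address) comes first and the d command comes after it: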

sed '/pattern to delete/d'

For example:

sed -e '/regex/d' infile
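To see why d is the wrong tool here, a minimal sketch that feeds the two sample lines from the question to sed and deletes every line containing an @ (the /@/ pattern is just an illustrative choice, assuming any POSIX sed such as GNU or BSD/macOS sed):

```shell
#!/bin/sh
# Feed the two sample lines from the question to sed.
# /@/d deletes every line matching the regex @, always as a whole line;
# d never removes only part of a line.
printf '%s\n' 'something_from.someone:user@email.com' \
              'something_from.someone:user.com' |
  sed -e '/@/d'
# Prints only: something_from.someone:user.com
```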

This deletes whole matching lines. What you want instead is to keep part of each line, so you need a capture-and-replace:

sed -e 's/regex-to-drop\(regex-to-keep\)/\1/g' input-file
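As a concrete instance of that template, here is a sketch on a made-up key=value line (the input is hypothetical, not from the question): ^[^=]*= is the part to drop and \(.*\)$ is the part to keep.

```shell
#!/bin/sh
# regex-to-drop:  ^[^=]*=   (everything up to and including the first '=')
# regex-to-keep:  \(.*\)$   (the rest of the line, captured as group 1)
# The replacement \1 writes back only the captured part.
echo 'colour=blue' | sed -e 's/^[^=]*=\(.*\)$/\1/'
# Prints: blue
```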

The 's' stands for substitute and the 'g' for global (replace every match on the line, not just the first). The text matched inside \( \) is captured, and \1 marks where the captured text goes in the replacement. If I had a series of captured groups,

\(something\)\(something_else\)

I could output them with other text between them by putting the following in the replacement part of the sed command:

\1 ;; \2

This would produce "something ;; something_else", and the complete command would look like:

sed -e 's/\(something\)\(something_else\)/\1 ;; \2/g' input-file
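Applied to the question's input, two capture groups can split a line at the colon and rejoin the halves with ;; between them. A sketch, assuming the first colon is the separator:

```shell
#!/bin/sh
# Group 1: \([^:]*\)  everything before the first colon
# Group 2: \(.*\)     everything after that colon
# The replacement puts " ;; " between the two captured groups.
echo 'something_from.someone:user@email.com' |
  sed -e 's/^\([^:]*\):\(.*\)$/\1 ;; \2/'
# Prints: something_from.someone ;; user@email.com
```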

In your case, it looks like you want to drop everything up to and including the colon:

sed -e 's/^.*:\(.*\)$/\1/g' input-file
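Running that command on both sample lines from the question keeps the part after the colon. Because .* is greedy, ^.*: consumes everything up to the last colon on the line; [^:]* (anything but a colon) stops at the first one instead. The a:b:c line below is a made-up input, and the -n/p variation at the end is an optional extra, not part of the original answer: -n turns off automatic printing and the p flag prints only lines where the substitution succeeded, so lines without an @ are dropped.

```shell
#!/bin/sh
# The answer's command on the question's two sample lines.
printf '%s\n' 'something_from.someone:user@email.com' \
              'something_from.someone:user.com' |
  sed -e 's/^.*:\(.*\)$/\1/g'
# Prints:
# user@email.com
# user.com

# Greedy .* versus [^:]* on a line with two colons.
echo 'a:b:c' | sed -e 's/^.*:\(.*\)$/\1/'     # up to last colon: c
echo 'a:b:c' | sed -e 's/^[^:]*:\(.*\)$/\1/'  # up to first colon: b:c

# Variation: print only lines whose kept part contains an @.
printf '%s\n' 'something_from.someone:user@email.com' \
              'something_from.someone:user.com' |
  sed -n -e 's/^.*:\(.*@.*\)$/\1/p'
# Prints: user@email.com
```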

A footnote to the above, as suggested by @fedorqui:

sed uses standard regex notation for the beginning and end of a line: "^" matches the start of the line and "$" matches the end. So the full breakdown of the command is:

's/^.*: 

Match everything from the start of the line up to and including the colon (the "s" means we're setting up a 'substitute' command).

Then:

\(.*\)$/ 

Capture everything from there to the end of the line, and

/\1/g'

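Replace the whole line with the captured text. The g flag means "replace every match on the line, not just the first"; because the pattern is anchored with ^ and $, it can match only once per line, so g is harmless but unnecessary here. sed applies the command to every line of the input whether or not g is given.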
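A small sketch of what the g flag actually changes, on made-up lines with several matches: without g only the first match on each line is replaced; with g every match on the line is. Every input line is processed either way.

```shell
#!/bin/sh
# Without g: only the first '-' on the line is replaced.
echo 'a-b-c' | sed -e 's/-/+/'    # prints: a+b-c
# With g: every '-' on the line is replaced.
echo 'a-b-c' | sed -e 's/-/+/g'   # prints: a+b+c
# Both input lines are processed even without g;
# g is about matches within a line, not about lines in the file.
printf '%s\n' 'x-y-z' 'p-q' | sed -e 's/-/+/'
# prints:
# x+y-z
# p+q
```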