How do I get the names from lines like the ones below using a regex?
line #1==>
Elector's Name: Surpam Badurubai Elector's Name: Madavimaru Elector's Name: Madavitannubai
line #2==>
Elector's Name: GEDAM KARNU Elector's Name: GEDAM BHEEM BAI Elector's Name: Surpam Rajeshwar Rav
I've tried
regex = "\s*Elector\'s\sName\:\s([[a-zA-z]*\s[a-zA-z]*\s*[a-zA-z]*]*)\s"
re.findall(regex, line)
It was working for line 1 but is not able to fetch the last name. For line 2, it only fetched 'Surpam Rajeshwar' from the last name but it actually has 3 words in it.
I'd appreciate it if someone could help me with this or suggest a different way to get the names.
4 solutions
#1
4
You may do that without a regex by splitting on "Elector's Name:", stripping whitespace from the resulting items and dropping all empty items:
ss = ["Elector's Name: Surpam Badurubai Elector's Name: Madavimaru Elector's Name: Madavitannubai",
      "Elector's Name: GEDAM KARNU Elector's Name: GEDAM BHEEM BAI Elector's Name: Surpam Rajeshwar Rav"]
for s in ss:
    # list() is needed in Python 3, where filter() returns a lazy iterator
    print(list(filter(None, [x.strip() for x in s.split("Elector's Name:")])))
Output:
['Surpam Badurubai', 'Madavimaru', 'Madavitannubai']
['GEDAM KARNU', 'GEDAM BHEEM BAI', 'Surpam Rajeshwar Rav']
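The same split, strip and filter steps can be wrapped in a small reusable function. This is a minimal sketch: the function name extract_names and its label parameter are hypothetical, and it assumes every name is preceded by the exact text "Elector's Name:".

```python
def extract_names(line, label="Elector's Name:"):
    """Split a line on the label and return the stripped, non-empty pieces."""
    # str.split leaves an empty string before the first label and
    # whitespace around each name, so strip every piece and skip empty ones.
    return [part.strip() for part in line.split(label) if part.strip()]


lines = [
    "Elector's Name: Surpam Badurubai Elector's Name: Madavimaru Elector's Name: Madavitannubai",
    "Elector's Name: GEDAM KARNU Elector's Name: GEDAM BHEEM BAI Elector's Name: Surpam Rajeshwar Rav",
]

for line in lines:
    print(extract_names(line))
```

Testing part.strip() rather than part also drops pieces that consist only of whitespace, which is what filter(None, ...) achieves on the already-stripped items.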
Just in case you want to study regex, here is a possible regex-based solution:
re.findall(r"Elector's Name:\s*(.*?)(?=\s*Elector's Name:|$)", s)
See another Python demo.
Pattern details
- Elector's Name: - a literal substring
- \s* - 0+ whitespaces
- (.*?) - Group 1 (this value is returned by re.findall): any 0+ chars other than line break chars (including them, with re.DOTALL), as few as possible
- (?=\s*Elector's Name:|$) - a positive lookahead that requires 0+ whitespaces followed by Elector's Name:, or the end of string ($), immediately to the right of the current location.
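Here is a runnable version of that regex approach, compiling the pattern once and applying it to both sample lines. A minimal sketch; the names NAME_RE and lines are illustrative only.

```python
import re

# Group 1 lazily captures the text after each label, stopping just before
# optional whitespace plus the next label, or at the end of the string.
NAME_RE = re.compile(r"Elector's Name:\s*(.*?)(?=\s*Elector's Name:|$)")

lines = [
    "Elector's Name: Surpam Badurubai Elector's Name: Madavimaru Elector's Name: Madavitannubai",
    "Elector's Name: GEDAM KARNU Elector's Name: GEDAM BHEEM BAI Elector's Name: Surpam Rajeshwar Rav",
]

for line in lines:
    # With exactly one capturing group, findall returns a list of group 1 values
    print(NAME_RE.findall(line))
```

Because the lookahead does not consume the next "Elector's Name:", that label is still available to start the following match.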
#2
1
This looks more like a job for re.split on the "Elector's Name:" text (with optional whitespace before or after it), combined with a list comprehension to filter out empty fields:
import re
# l1 holds one input line; a raw string keeps \s from being read as a string escape
[x for x in re.split(r"\s*Elector's Name:\s*", l1) if x]
With your examples, I get these outputs:
['GEDAM KARNU', 'GEDAM BHEEM BAI', 'Surpam Rajeshwar Rav']
['Surpam Badurubai', 'Madavimaru', 'Madavitannubai']
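A self-contained sketch of the re.split approach, assuming the two sample lines from the question; the helper name split_names is hypothetical. Since the separator consumes the surrounding whitespace, no strip() is needed.

```python
import re

# The separator swallows the label plus any whitespace around it, so the
# pieces come out already trimmed; only the empty piece produced before the
# leading label has to be filtered out.
SEPARATOR = re.compile(r"\s*Elector's Name:\s*")


def split_names(line):
    return [part for part in SEPARATOR.split(line) if part]


lines = [
    "Elector's Name: Surpam Badurubai Elector's Name: Madavimaru Elector's Name: Madavitannubai",
    "Elector's Name: GEDAM KARNU Elector's Name: GEDAM BHEEM BAI Elector's Name: Surpam Rajeshwar Rav",
]

for line in lines:
    print(split_names(line))
```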
Note that you can achieve this with str.split() chained to str.strip() as well:
[x.strip() for x in l1.split("Elector's Name:") if x]
#3
1
If you only need to get all the names, you could try .split() with "Elector's Name:" as the delimiter, like this:
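names = line.split("Elector's Name:")  # double quotes, since the label contains an apostrophe
for name in names:
    if name.strip():  # skip the empty piece before the first label
        print(name.strip())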
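To see why a plain split needs extra cleanup, it helps to look at what str.split actually returns for the first sample line from the question. A short illustration:

```python
line = "Elector's Name: Surpam Badurubai Elector's Name: Madavimaru Elector's Name: Madavitannubai"

# Splitting alone keeps an empty string before the first label and the
# spaces on either side of each name.
raw = line.split("Elector's Name:")
print(raw)    # ['', ' Surpam Badurubai ', ' Madavimaru ', ' Madavitannubai']

# Stripping each piece and skipping the empty ones leaves just the names.
names = [piece.strip() for piece in raw if piece.strip()]
print(names)  # ['Surpam Badurubai', 'Madavimaru', 'Madavitannubai']
```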
#4
0
Jamie Zawinski:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
So, using plain Python:
line = "Elector's Name: Surpam Badurubai Elector's Name: Madavimaru Elector's Name: Madavitannubai"
[name.strip() for name in line.split("Elector's Name:") if name != '']