This is probably very easy, but I feel I am doing it wrong. Let's say I have the following string:
user: bob status: married age:45
Now I want to break it down into something like:
user = 'bob'
status = 'married'
age = 45
At the moment I am doing a lot of dirty splitting work, but there's gotta be a better, Pythonic way using regex. Here's what I do:
full_text = 'user: bob status: married age:45'
key = 'user'  # avoid shadowing the built-in type()
cut_string = full_text.split(key + ":", 1)[1].split()[0]  # 'bob'
Thanks!
3 answers
#1 (score: 3)
Here's my solution. The regex: (\w+)\s*:\s*((?:\w+\b\s*)+)(?!\s*:)
import re
s = 'user: bob status: married with children age:45'
pat = re.compile(r'(\w+)\s*:\s*((?:\w+\b\s*)+)(?!\s*:)')
print(pat.findall(s))
This prints:
[('user', 'bob '), ('status', 'married with children '), ('age', '45')]
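You can then strip the trailing whitespace from each value and use something like ast.literal_eval to get the types right. Note that literal_eval only understands Python literals: it turns '45' into the integer 45 but raises ValueError on a bare word such as 'bob', so those values have to stay strings.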
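A minimal sketch of that post-processing step, building a dict from the findall results. The helper name to_python_value is hypothetical, and it assumes that any value which is not a valid Python literal should simply stay a string.

```python
import ast
import re

# Same pattern as in the answer; assumes values are runs of word characters.
pat = re.compile(r'(\w+)\s*:\s*((?:\w+\b\s*)+)(?!\s*:)')


def to_python_value(text):
    """Hypothetical helper: '45' -> 45, bare words like 'bob' stay str."""
    text = text.strip()  # the pattern leaves a trailing space on most values
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # literal_eval rejects a bare name ('bob') with ValueError and
        # multi-word text ('married with children') with SyntaxError
        return text


s = 'user: bob status: married with children age:45'
fields = {key: to_python_value(value) for key, value in pat.findall(s)}
print(fields)
```

Catching both exceptions keeps the conversion tolerant of free text; the trade-off is that a value which happens to look like a literal (for example True or [1]) will also be converted.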
#2 (score: 0)
import re
s = 'user: bob status: married age:45'
print(re.findall(r'(?:([0-9a-zA-Z]+): ?([0-9a-zA-Z]+))+', s))
This will give back: [('user', 'bob'), ('status', 'married'), ('age', '45')]
The outer group is a non-capturing group, which means it won't appear in the results of findall; only the two inner capturing groups (key and value) do.
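To see what the non-capturing group changes, here is a small comparison on the same sample string. The second pattern is a hypothetical variant with the outer group made capturing, included only to show the difference in findall's output.

```python
import re

s = 'user: bob status: married age:45'

# The answer's pattern: outer group is non-capturing, so findall
# returns only the two inner groups as (key, value) tuples.
pairs = re.findall(r'(?:([0-9a-zA-Z]+): ?([0-9a-zA-Z]+))+', s)
print(pairs)

# Hypothetical variant with a capturing outer group: findall now returns
# 3-tuples whose first element is the whole "key: value" text.
triples = re.findall(r'(([0-9a-zA-Z]+): ?([0-9a-zA-Z]+))+', s)
print(triples)
```

When a pattern has more than one capturing group, findall returns one tuple per match containing every group, which is why making the outer group capturing adds an extra element.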
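The [0-9a-zA-Z] part is roughly equivalent to \w, except that \w also matches the underscore and, for str patterns in Python 3, other Unicode letters and digits.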
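A quick check of where the two character classes differ. The sample text, with an underscore in the key and a non-ASCII letter in the value, is made up purely for illustration.

```python
import re

text = 'first_name: Zo\u00eb'  # made-up sample: underscore + non-ASCII letter

# Explicit ASCII class: stops at '_' and at the accented letter.
ascii_class = re.findall(r'[0-9a-zA-Z]+', text)

# \w on a str pattern: also matches '_' and Unicode letters.
word_class = re.findall(r'\w+', text)

# With re.ASCII, \w means [a-zA-Z0-9_], so only the underscore differs.
word_ascii = re.findall(r'\w+', text, flags=re.ASCII)

print(ascii_class, word_class, word_ascii)
```

So for plain ASCII keys and values without underscores, the two spellings behave the same.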
#3 (score: 0)
For those of us who avoid regex if we possibly can:
>>> full_text = 'user: bob status: married age:45'
>>> alt_text = full_text.replace(':', ' ').split()
>>> print(alt_text[0], "=", alt_text[1])
user = bob
>>> print(alt_text[2], "=", alt_text[3])
status = married
>>> print(alt_text[4], "=", alt_text[5])
age = 45
If you had a space between age: and 45, you could skip the replace and use just full_text.split(), although each key would then keep its trailing colon ('user:', 'status:', 'age:').
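Instead of printing fixed indices, the same regex-free token list can be paired up into a dict. A minimal sketch, assuming every key is followed by exactly one single-word value (a multi-word value such as 'married with children' would throw the pairing off):

```python
full_text = 'user: bob status: married age:45'

# Turn every ':' into a space, then split on any run of whitespace.
tokens = full_text.replace(':', ' ').split()

# Pair even-index tokens (keys) with odd-index tokens (values).
fields = dict(zip(tokens[0::2], tokens[1::2]))
print(fields)
```

All values stay strings here ('45' rather than 45); convert them afterwards if you need other types.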