reference: Pandas DataFrame: remove unwanted parts from strings in a column
This refers to an answer in the question linked above. I've read up a bit on regular expressions and plan to dig deeper, but in the meantime I could use some help.
My dataframe is something like:
df:
c_contofficeID
0 0109
1 0109
2 3434
3 123434
4 1255N9
5 0109
6 123434
7 55N9
8 5599
9 0109
Pseudocode
If the first two characters are 12, remove them. Alternatively, add a 12 to the values that don't have 12 as their first two characters.
Result would look like:
c_contofficeID
0 0109
1 0109
2 3434
3 3434
4 55N9
5 0109
6 3434
7 55N9
8 5599
9 0109
I'm using the answer from the link above as a starting point:
df['contofficeID'].replace(regex=True,inplace=True,to_replace=r'\D',value=r'')
I've tried the following:
Attempt 1)
df['contofficeID'].replace(regex=True,inplace=True,to_replace=r'[1][2]',value=r'')
Attempt 2)
df['contofficeID'].replace(regex=True,inplace=True,to_replace=r'$[1][2]',value=r'')
Attempt 3)
df['contofficeID'].replace(regex=True,inplace=True,to_replace=r'?[1]?[2]',value=r'')
1 solution
#1
New answer, per a comment from @Addison:
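# '^12(?=.{4}$)' matches a leading 12 only when exactly four more characters follow it
# regex=True is needed on newer pandas versions, where str.replace treats the pattern as a literal by default
df.c_contofficeID.str.replace(r'^12(?=.{4}$)', '', regex=True)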
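For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample column from the question and applies the lookahead pattern. The variable name `cleaned` is just for illustration, and `regex=True` is passed explicitly on the assumption that a recent pandas version is in use.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "c_contofficeID": ["0109", "0109", "3434", "123434", "1255N9",
                       "0109", "123434", "55N9", "5599", "0109"]
})

# Strip a leading "12" only when exactly four characters follow it.
# str.replace returns a new Series; assign it back to keep the change.
cleaned = df["c_contofficeID"].str.replace(r"^12(?=.{4}$)", "", regex=True)

print(cleaned.tolist())
```

Assigning `cleaned` back to `df["c_contofficeID"]` would update the column in place of the `inplace=True` approach used in the question.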
If IDs must always have exactly four characters, it's simpler to keep just the last four:
df.c_contofficeID.str[-4:]
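A quick sketch of the slicing approach on a few values from the question. The last entry, `993434`, is a made-up value (not in the original data) added to show that slicing drops any prefix, not only `12`, so it only fits if that is acceptable.

```python
import pandas as pd

# First four values come from the question; "993434" is hypothetical.
ids = pd.Series(["0109", "123434", "1255N9", "55N9", "993434"],
                name="c_contofficeID")

# .str[-4:] keeps the last four characters of every string.
last_four = ids.str[-4:]

print(last_four.tolist())
```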
Old answer: use str.replace
df.c_contofficeID.str.replace(r'^12', '', regex=True).to_frame()
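The comment from @Addison isn't shown here, so the following is only a guess at why the answer was revised: a bare `^12` also strips a four-character ID that happens to start with 12. The sketch below compares the two patterns using Python's `re.sub` directly rather than pandas; `1234` is a hypothetical ID that does not appear in the question's data.

```python
import re

# "1234" is a hypothetical four-character ID; the rest come from the question.
samples = ["123434", "1255N9", "1234", "0109"]

# Old pattern: removes any leading "12", whatever follows.
stripped_any = [re.sub(r"^12", "", s) for s in samples]

# New pattern: removes a leading "12" only if exactly four characters follow.
stripped_long = [re.sub(r"^12(?=.{4}$)", "", s) for s in samples]

print(stripped_any)
print(stripped_long)
```

With the old pattern `1234` is shortened to `34`, while the lookahead version leaves it untouched.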