Remove selected characters from a string

Time: 2022-09-04 17:07:02

Sorry if this is a duplicate, but the solutions I have seen do not solve my issue.

I have a data frame (df). One of its variables (df$Year) includes a list of years, such as:

 > df$Year

 Year
 2001–                       
 2013–                     
 2016–                      
 2003–                      
 2012–2013                      
 2013–                      
 1993–2007, 2010–

Where an entry contains several years, I just want to keep the last one (i.e. only '2010' rather than '1993–2007, 2010–') and get rid of the dash. I have tried:

unlist(str_extract_all(df$Year, "[[:digit:]]4$"))

but this does not seem to work.

Any hint?

1 Solution

#1


2  

We can use sub() for a one-liner:

df$Year <- sub(".*(\\d{4})\\–?", "\\1", df$Year)
df$Year

[1] "2001" "2013" "2016" "2003" "2013" "2013" "2010"
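
As a side note on why the attempt in the question returns nothing: the pattern "[[:digit:]]4$" matches a single digit followed by a literal 4 (the quantifier needs braces, {4}), and the $ anchor requires the digits to sit at the very end of the string, whereas most values end in a dash. A minimal base R sketch of the extract-then-keep-last idea follows; the vector name years and the helper variable names are just for illustration, and the sample values are copied from the question with the dash written as \u2013 (assuming it is an en dash).

```r
# Sample values from the question; \u2013 is the en dash (assumed)
years <- c("2001\u2013", "2013\u2013", "2016\u2013", "2003\u2013",
           "2012\u20132013", "2013\u2013", "1993\u20132007, 2010\u2013")

# Find every run of four digits in each string
matches <- regmatches(years, gregexpr("[0-9]{4}", years))

# Keep the last match per element, or NA if no year was found
last_year <- vapply(
  matches,
  function(m) if (length(m) > 0) m[length(m)] else NA_character_,
  character(1)
)

last_year
```

This gives the same result as the sub() call, at the cost of a few more lines, but it does not depend on what character follows the year.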

Note that the dashes in your year ranges appear to be en dashes (or possibly em dashes), not the regular ASCII hyphen, so a pattern written with a plain "-" will not match them.
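
If you are unsure which dash your data contains, one option is to inspect its code point, and another is to write the pattern so it does not mention the dash at all. The sketch below is only an illustration under those assumptions: the vector x is made-up test input mixing an en dash, ASCII hyphens and an em dash with trailing spaces, and the pattern simply discards any non-digits after the last four-digit run.

```r
# Hypothetical inputs mixing en dash (\u2013), ASCII hyphen and em dash (\u2014)
x <- c("2012\u20132013", "1993-2007, 2010-", "2001\u2014  ")

# Code point of the en dash; compare with utf8ToInt("-"), which is 45
en_dash_code <- utf8ToInt("\u2013")

# Keep the last four-digit run and drop whatever non-digits follow it
cleaned <- sub(".*([0-9]{4})[^0-9]*$", "\\1", x)

cleaned
```

Because the trailing part is matched as "anything that is not a digit", the same call works whichever dash was typed and also strips trailing whitespace.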
