I'm trying to do something but can't remember or find the answer. I have a list of city names from the Census Bureau, and they put the city's type on the end, which is messing up my match().
I'd like to make this:
Middletown Township
Sunny Valley Borough
Hillside Village
into this:
Middletown
Sunny Valley
Hillside
Any suggestions? Ideally I'd also like to know if there's a lastIndexOf() function in R.
Here's the dput:
> dput(df1)
structure(list(id = c(1, 2, 3), city = structure(c(2L, 3L, 1L
), .Label = c("Hillside Village", "Middletown Township", "Sunny Valley Borough"
), class = "factor")), .Names = c("id", "city"), row.names = c(NA,
-3L), class = "data.frame")
2 Answers
#1 (16 votes)
This will work:
gsub("\\s*\\w*$", "", df1$city)
[1] "Middletown" "Sunny Valley" "Hillside"
It removes any substring consisting of zero or more whitespace characters, followed by any number of "word" characters (letters, digits, or underscores), followed by the end of the string.
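A self-contained sketch of the same approach: it rebuilds df1 with the factor column from the dput() in the question and stores the cleaned names in a new column (city_clean is a name chosen here for illustration). gsub() coerces the factor to character, and since each name has only one suffix to remove, sub() gives the same result.

```r
# Rebuild df1 as in the question's dput(): city is a factor
df1 <- data.frame(id = c(1, 2, 3),
                  city = c("Middletown Township", "Sunny Valley Borough",
                           "Hillside Village"),
                  stringsAsFactors = TRUE)

# gsub() coerces the factor to character, so the result is a plain
# character vector; city_clean is an illustrative column name
df1$city_clean <- gsub("\\s*\\w*$", "", df1$city)
df1$city_clean
# -> "Middletown" "Sunny Valley" "Hillside"

# Only one suffix per string is removed, so sub() is enough
identical(sub("\\s*\\w*$", "", df1$city), df1$city_clean)
# -> TRUE
```

One caveat: because \\s* also allows zero spaces, a name with no type suffix loses its only word, e.g. gsub("\\s*\\w*$", "", "Hillside") returns an empty string.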
#2 (12 votes)
Here's a regexp that does what you need:
sub(df1$city, pattern = " [[:alpha:]]*$", replacement = "")
[1] "Middletown" "Sunny Valley" "Hillside"
It replaces, with an empty string, any substring that starts with a single space and then contains only letters up to the end of the string.
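Both regexes strip whatever the last word happens to be, so a name that arrives without a type suffix (say, a plain "Sunny Valley") would lose a real part of its name. If the set of type words is known, a more conservative option is to remove only those. This is a sketch, not part of either answer: the place_types vector is hypothetical and holds only the three types that appear in the question.

```r
# Hypothetical list of place-type suffixes: only the three from the question.
# Extend it with whatever type words actually occur in your data.
place_types <- c("Township", "Borough", "Village")

# Builds " (Township|Borough|Village)$": remove a known type word only when
# it is the last word, preceded by one space
type_pattern <- paste0(" (", paste(place_types, collapse = "|"), ")$")

cities <- c("Middletown Township", "Sunny Valley Borough", "Hillside Village",
            "Sunny Valley")
sub(type_pattern, "", cities)
# -> "Middletown" "Sunny Valley" "Hillside" "Sunny Valley"

# For comparison, the [[:alpha:]] pattern also strips a last word that is not a type
sub(" [[:alpha:]]*$", "", cities)
# -> "Middletown" "Sunny Valley" "Hillside" "Sunny"
```

The trade-off is maintenance: the list has to cover every type word in the data, whereas the generic patterns need no list but assume every name ends in a type.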
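On the lastIndexOf() part of the question: as far as I know, base R has no function with that name, but regexpr() with a pattern anchored at the end of the string can locate the last occurrence of a character. A minimal sketch, assuming the separator is a single space and that 1-based positions (R's convention) are what you want; like regexpr() itself, it reports -1 where there is no match. If the stringi package is available, stri_locate_last_fixed() is another option.

```r
# Rebuild the example data from the question's dput(), as plain character strings
df1 <- data.frame(id = c(1, 2, 3),
                  city = c("Middletown Township", "Sunny Valley Borough",
                           "Hillside Village"),
                  stringsAsFactors = FALSE)

# The leftmost match of " [^ ]*$" is a space followed only by non-space
# characters up to the end, i.e. the last space. regexpr() returns -1 for
# no match; as.integer() drops the match.length and other attributes.
last_space <- as.integer(regexpr(" [^ ]*$", df1$city))
last_space
# -> 11 13 9

# Keep everything before the last space
substr(df1$city, 1, last_space - 1)
# -> "Middletown" "Sunny Valley" "Hillside"
```

A string with no space gives -1, so substr() then returns an empty string; check for that if some names can be a single word.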