I'm a beginner in R, and I'm using RStudio for a project. My data frame has a column for the zone of the mall, but some zones are actually subzones of a bigger zone, so they have names like Ikea 1, Ikea 2, Ikea 3, etc. I want to create a new column holding the bigger zone for each entry.
The dataframe looks like this:
ID ENTRY ZONE
1 13:39:40 Casual Dinnerware
2 15:28:43 Van Thiel 3
3 10:41:05 Caracole 7
4 16:37:31 Entrance
I want to add a new column that holds the "mother" zone whenever the zone is a subzone. For the example above, I want something like:
ID ENTRY ZONE NEW ZONE
1 13:39:40 Casual Dinnerware Casual Dinnerware
2 15:28:43 Van Thiel 3 Van Thiel
3 10:41:05 Caracole 7 Caracole
4 16:37:31 Entrance Entrance
Note that not every zone is a subzone!
My idea was to check each entry and, if the zone ended with a number, remove the number and write the rest to the new column. I have already read a few questions that I thought would help, about regular expressions and the like (like this one), but I couldn't get it to work.
Thank you for your time, if you have any questions, let me know!
1 Solution
#1
2
As brittenb said:
df$NEW_ZONE = gsub("\\s\\d+$", "", df$ZONE)
will do the trick for you. \\s matches a whitespace character, \\d matches a digit (so \\d+ matches one or more digits), and $ anchors the match at the end of the string, which is important to ensure that numbers that are part of the bigger zone's name aren't removed.
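To see the pattern in action, here is a minimal sketch that rebuilds the sample rows from the question and applies the same gsub() call. The data frame is retyped by hand, and the names in the second call ("Zone 51 Cafe", "Studio 54 2") are hypothetical, made up only to show what the $ anchor protects.

```r
# Sample data retyped from the question; ZONE is kept as character, not factor
df <- data.frame(
  ID = 1:4,
  ENTRY = c("13:39:40", "15:28:43", "10:41:05", "16:37:31"),
  ZONE = c("Casual Dinnerware", "Van Thiel 3", "Caracole 7", "Entrance"),
  stringsAsFactors = FALSE
)

# Remove a trailing " <digits>" suffix; zones without one pass through unchanged
df$NEW_ZONE <- gsub("\\s\\d+$", "", df$ZONE)
print(df$NEW_ZONE)
# [1] "Casual Dinnerware" "Van Thiel"         "Caracole"          "Entrance"

# Hypothetical names (not from the question): only the number at the very end is dropped
extra <- gsub("\\s\\d+$", "", c("Zone 51 Cafe", "Studio 54 2"))
print(extra)
# [1] "Zone 51 Cafe" "Studio 54"
```

Because the pattern is anchored with $, it can match at most once per string, so sub() would give the same result here as gsub().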
This solved my problem, thank you.