I have data like the following:
PLAYSTORE BANGKOK
FLOAT@THE BAY SINGAPORE
YANTRA SINGAPORE
AIRASIA_QS9DQQL SINGAPORE
I want to remove the last word from each string if it appears in a list of cities I am looking for. I am using this:
sub('(?i)^(.*)\\b(singapore|stockholm|singapor|bangkok|kuala lumpur|london|tokyo)$','\\2', merch_desc$desc2 )
But neither \1 nor \2 works; I get the full string back. Is there a way to fix this?
I want two outputs, one with the company names and one with the locations, as two separate vectors.
merch_desc$merch -
PLAYSTORE
FLOAT@THE BAY
YANTRA
AIRASIA_QS9DQQL
merch_desc$loc -
BANGKOK
SINGAPORE
SINGAPORE
SINGAPORE
It seems strange that it works on a single string but not on the data frame column -
test$desc2
[1] "qoo10 singapore " "bill payment via internet banking" "mcdonald's restaurants singapore "
[4] "hdb season parking singapore " "grabtaxi pte ltd singapore "
This does not work -
sub('^.* (singapore|stockholm|singapor|bangkok|kuala lumpur|london|tokyo)$', '\\1', test$desc2 )
[1] "qoo10 singapore " "bill payment via internet banking" "mcdonald's restaurants singapore "
[4] "hdb season parking singapore " "grabtaxi pte ltd singapore "
But this works -
sub('^.* (singapore|stockholm|singapor|bangkok|kuala lumpur|london|tokyo)$', '\\1', 'tigerair y843km singapore' )
[1] "singapore"
Edit 2 -
Use trimws(). Without trimws(), the pattern does not handle the extra spaces in the strings.
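A minimal sketch of why trimming helps, assuming the strings in test$desc2 end with at least one space as printed above (the vector name desc2 and the result names are only for illustration). The `$` anchor requires the city to be the very last thing in the string, so a trailing space makes the match fail and sub() returns the input unchanged:

```r
# Sample values copied from the question; note the trailing space.
desc2 <- c("qoo10 singapore ", "bill payment via internet banking",
           "mcdonald's restaurants singapore ")

pattern <- '^.* (singapore|stockholm|singapor|bangkok|kuala lumpur|london|tokyo)$'

# Untrimmed: the trailing space sits between the city and the end of the
# string, so the pattern never matches and nothing is replaced.
untrimmed <- sub(pattern, '\\1', desc2)

# Trimmed: the city is now the last token, so the capture group is returned.
# Strings without a listed city are still returned unchanged.
trimmed <- sub(pattern, '\\1', trimws(desc2))
```

An alternative that avoids the separate trimming step would be to allow optional whitespace before the anchor, e.g. ending the pattern with `\\s*$`.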
Thanks, Manish
1 solution
#1
3
We can capture the two substrings as groups in the pattern passed to sub, add a delimiter (,) between the capture groups in the replacement, and then use that delimiter as sep in read.table. If there are leading/trailing spaces, remove them with str_trim from stringr by looping through the columns.
library(stringr)
d1 <- read.table(text=sub('(.*)\\s+(\\S+)$', '\\1,\\2', v1),sep=',')
d1[] <- lapply(d1, str_trim)
d1
#               V1        V2
#1       PLAYSTORE   BANGKOK
#2   FLOAT@THE BAY SINGAPORE
#3          YANTRA SINGAPORE
#4 AIRASIA_QS9DQQL SINGAPORE
Or, as suggested by @RichardScriven, a base R option for trimming leading/trailing spaces is trimws.
d1[] <- lapply(d1, trimws)
data
v1 <- c('PLAYSTORE BANGKOK','FLOAT@THE BAY SINGAPORE',
'YANTRA SINGAPORE',
'AIRASIA_QS9DQQL SINGAPORE')
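Note that the pattern above splits on the last whitespace-separated word whatever it is, so a two-word city such as "kuala lumpur" would be cut in the middle. If the split should only happen for the cities listed in the question, something like the following sketch may work. The names cities and has_city are made up for illustration, merch and loc follow the column names asked for in the question, and returning NA for rows without a listed city is an assumption about the desired behaviour:

```r
v1 <- c('PLAYSTORE BANGKOK', 'FLOAT@THE BAY SINGAPORE',
        'YANTRA SINGAPORE', 'AIRASIA_QS9DQQL SINGAPORE')

# City list taken from the question.
cities <- c('singapore', 'stockholm', 'singapor', 'bangkok',
            'kuala lumpur', 'london', 'tokyo')

# Group 1: everything before the city (lazy); group 2: the city itself.
# \\s*$ tolerates trailing spaces after the city.
pattern <- paste0('^(.*?)\\s+(', paste(cities, collapse = '|'), ')\\s*$')

# TRUE where the string ends in one of the listed cities.
has_city <- grepl(pattern, v1, ignore.case = TRUE, perl = TRUE)

# Company name: group 1 where a city was found, otherwise the whole string.
merch <- trimws(ifelse(has_city,
                       sub(pattern, '\\1', v1, ignore.case = TRUE, perl = TRUE),
                       v1))

# Location: group 2 where a city was found, otherwise NA (an assumption).
loc <- ifelse(has_city,
              sub(pattern, '\\2', v1, ignore.case = TRUE, perl = TRUE),
              NA_character_)
```

The two results are plain character vectors, so they can be assigned straight to data frame columns such as merch_desc$merch and merch_desc$loc.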