Remove the last word from a string

Time: 2022-03-15 20:06:24

I have data like below -

PLAYSTORE BANGKOK
FLOAT@THE BAY          SINGAPORE
YANTRA                 SINGAPORE
AIRASIA_QS9DQQL        SINGAPORE

I want to remove the last word from each string if it is in the list of cities that I am looking for, using this -

sub('(?i)^(.*)\\b(singapore|stockholm|singapor|bangkok|kuala lumpur|london|tokyo)$','\\2', merch_desc$desc2 )

But neither \1 nor \2 works, and I get the full string back. Is there a way to correct this?

I want 2 outputs, one with the company names and another with the locations, as 2 separate vectors.

merch_desc$merch -

PLAYSTORE
FLOAT@THE BAY
YANTRA
AIRASIA_QS9DQQL

merch_desc$loc -

BANGKOK
SINGAPORE
SINGAPORE
SINGAPORE

It seems strange that it works on a single string but not on a data frame column -

test$desc2
[1] "qoo10                  singapore    " "bill payment via internet banking"    "mcdonald's restaurants singapore    "
[4] "hdb season parking     singapore    " "grabtaxi pte ltd       singapore    "

This does not work -

sub('^.* (singapore|stockholm|singapor|bangkok|kuala lumpur|london|tokyo)$', '\\1', test$desc2 )
[1] "qoo10                  singapore    " "bill payment via internet banking"    "mcdonald's restaurants singapore    "
[4] "hdb season parking     singapore    " "grabtaxi pte ltd       singapore    "

But this works -

sub('^.* (singapore|stockholm|singapor|bangkok|kuala lumpur|london|tokyo)$', '\\1', 'tigerair y843km singapore' )
[1] "singapore"

Edit 2 -

Use trimws(). Without trimws(), the pattern does not handle the extra spaces around the strings.
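
A minimal sketch of why trimming helps, assuming a sample vector modeled on the test$desc2 output shown earlier (the names desc2, pattern, untrimmed and trimmed are illustrative, not from the original post):

```r
# Hypothetical sample mirroring the first three values of test$desc2
desc2 <- c("qoo10                  singapore    ",
           "bill payment via internet banking",
           "mcdonald's restaurants singapore    ")
pattern <- "^.* (singapore|stockholm|singapor|bangkok|kuala lumpur|london|tokyo)$"

# Trailing spaces sit between the city and the end of the string,
# so the $ anchor never matches and sub() returns the input unchanged
untrimmed <- sub(pattern, "\\1", desc2)

# After trimws() the city is the last token, so the pattern can match
trimmed <- sub(pattern, "\\1", trimws(desc2))
```

Strings without a listed city, such as the second element, come back unchanged in both cases, because sub() only rewrites strings that match.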

Thanks, Manish

1 solution

#1


We can capture the two substrings as groups in the pattern passed to sub, add a delimiter (,) between the capture groups in the replacement, and then use that delimiter as sep in read.table. If there are leading/trailing spaces, remove them by looping through the columns with str_trim from stringr.

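library(stringr)
# Put a comma between the two capture groups, then read with ',' as the field separator
d1 <- read.table(text=sub('(.*)\\s+(\\S+)$', '\\1,\\2', v1),sep=',')
# Trim leading/trailing whitespace in every column
d1[] <- lapply(d1, str_trim)
d1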
#              V1        V2
#1       PLAYSTORE   BANGKOK
#2   FLOAT@THE BAY SINGAPORE
#3          YANTRA SINGAPORE
#4 AIRASIA_QS9DQQL SINGAPORE

Or, as suggested by @RichardScriven, a base R option for trimming leading/trailing spaces is trimws.

d1[] <- lapply(d1, trimws)

data

v1 <- c('PLAYSTORE BANGKOK','FLOAT@THE BAY          SINGAPORE',
        'YANTRA                 SINGAPORE',
        'AIRASIA_QS9DQQL        SINGAPORE')
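
The question also asks for two separate vectors, and only wants the last word removed when it is a known city. A minimal base R sketch of that variant follows. It is not part of the original answer: the cities vector is copied from the pattern in the question, and the names pattern, x, has_city, merch and loc are illustrative assumptions.

```r
# Sample data from the answer
v1 <- c('PLAYSTORE BANGKOK', 'FLOAT@THE BAY          SINGAPORE',
        'YANTRA                 SINGAPORE',
        'AIRASIA_QS9DQQL        SINGAPORE')

# Assumed city list, taken from the pattern in the question
cities <- c('singapore', 'stockholm', 'singapor', 'bangkok',
            'kuala lumpur', 'london', 'tokyo')

# Group 1 is the merchant (lazy match), group 2 is a trailing city name
pattern <- paste0('^(.*?)\\s+(', paste(cities, collapse = '|'), ')$')

x <- trimws(v1)  # drop leading/trailing spaces first
has_city <- grepl(pattern, x, ignore.case = TRUE, perl = TRUE)

# Keep the whole string as the merchant when no listed city is found
merch <- ifelse(has_city,
                sub(pattern, '\\1', x, ignore.case = TRUE, perl = TRUE), x)
loc   <- ifelse(has_city,
                sub(pattern, '\\2', x, ignore.case = TRUE, perl = TRUE),
                NA_character_)
```

The lazy (.*?) combined with a greedy \\s+ leaves the padding out of the first group, so merch needs no further trimming. Strings that do not end in a listed city get NA in loc rather than a wrongly split last word.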
