Split a character field on the first space without dropping fields in R

Time: 2022-02-11 21:38:59

I want to split the field "Fare_class" on the first space without dropping any fields. I know a similar question exists, but when I tried that approach, it dropped every field except "Fare_class".

Travel_class    Fare_class          Avios_awarded
First           Flexible F          300% of miles flown
First           Lowest A            250% of miles flown
Business        Flexible J, C, D    250% of miles flown
Business        Lowest R, I         150% of miles flown

Below is the table I'd like to create, with "Fare_class" split on the first space into two new fields, "Fare" and "Booking".

Travel_class    Fare_class          Fare        Booking    Avios_awarded
First           Flexible F          Flexible    F          300% of miles flown
First           Lowest A            Lowest      A          250% of miles flown
Business        Flexible J, C, D    Flexible    J,C,D      250% of miles flown
Business        Lowest R, I         Lowest      R,I        150% of miles flown

3 solutions

#1


Alternative 1:

library(stringr)
str_split_fixed(Fare_class, " ", 2)

#     [,1]        [,2]     
#[1,] "Flexible"  "F"      
#[2,] "Lowest"    "A"      
#[3,] "Flexible"  "J, C, D"
#[4,] "Lowest"    "R, I" 

Alternative 2:

library(reshape2)
colsplit(Fare_class," ",c("Fare", "Booking"))

#      Fare  Booking
#1 Flexible        F
#2   Lowest        A
#3 Flexible  J, C, D
#4   Lowest     R, I
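
Both alternatives return only the split pieces. A minimal sketch of attaching those pieces to the original table so that no existing column is lost, assuming the data sits in a data frame called df (rebuilt here with plain character columns purely for illustration):

```r
library(stringr)

# Illustrative data frame; the name `df` and the character columns are assumptions
df <- data.frame(
  Travel_class  = c("First", "First", "Business", "Business"),
  Fare_class    = c("Flexible F", "Lowest A", "Flexible J, C, D", "Lowest R, I"),
  Avios_awarded = c("300% of miles flown", "250% of miles flown",
                    "250% of miles flown", "150% of miles flown"),
  stringsAsFactors = FALSE
)

# Split on the first space only: n = 2 caps the result at two pieces
parts <- str_split_fixed(df$Fare_class, " ", 2)

# Add the pieces as new columns; the existing columns are left untouched
df$Fare    <- parts[, 1]
df$Booking <- parts[, 2]
```

The new columns land at the end of the data frame; reorder with df[, c("Travel_class", "Fare_class", "Fare", "Booking", "Avios_awarded")] if the layout from the question matters.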

#2


library(stringr)

Fare_class <- c('Flexible F',
 'Lowest A',
 'Flexible J, C, D',
 'Lowest R, I')

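# str_split() takes `pattern`, not `sep`; n = 2 splits on the first space only
pieces <- str_split(Fare_class, pattern = ' ', n = 2)
fare <- sapply(pieces, '[[', 1)
booking <- sapply(pieces, '[[', 2)  # named `booking` to avoid masking base::class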

str_split splits each string into (n =) 2 pieces. Its output is a list with one 2-element vector per input string. sapply(..., '[[', i) then returns the i-th (first or second) element of each list element.
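
To make the list-then-extract step concrete, here is a small sketch on the same four strings. It uses vapply rather than sapply, which is a substitution on my part: vapply makes the expected return type explicit, and the name `pieces` is only illustrative.

```r
library(stringr)

Fare_class <- c("Flexible F", "Lowest A", "Flexible J, C, D", "Lowest R, I")

# str_split() returns a list with one character vector per input string
pieces <- str_split(Fare_class, pattern = " ", n = 2)

# `[[` with index 1 or 2 picks the first or second piece of every list element;
# character(1) tells vapply that each call must return a single string
fare    <- vapply(pieces, `[[`, character(1), 1)
booking <- vapply(pieces, `[[`, character(1), 2)
```

Because n = 2 stops splitting after the first space, the third element keeps "J, C, D" together as one piece.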

#3


Here's a solution that uses separate from tidyr to split the column with a regex:

library(tidyr)

separate(df, Fare_class, c("Fare", "Booking"), sep = "\\b\\s\\b", remove = FALSE)

Or use extract, which splits by capture groups and suits more complex patterns:

extract(df, Fare_class, c("Fare", "Booking"), regex = "(^\\p{L}+\\b)\\s(.+$)", remove = FALSE)

Result:

  Travel_class       Fare_class     Fare Booking        Avios_awarded
1        First       Flexible F Flexible       F  300% of miles flown
2        First         Lowest A   Lowest       A  250% of miles flown
3     Business Flexible J, C, D Flexible J, C, D  250% of miles flown
4     Business      Lowest R, I   Lowest    R, I  150% of miles flown

Note:

If you don't want to keep the original column Fare_class, just omit the remove = FALSE argument from separate or extract.
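
A short sketch of what the remove argument changes, using a reduced two-column data frame that is assumed here only for illustration (the names `kept` and `dropped` are likewise hypothetical):

```r
library(tidyr)

# Reduced illustrative data; character columns are an assumption
df <- data.frame(
  Travel_class = c("First", "First", "Business", "Business"),
  Fare_class   = c("Flexible F", "Lowest A", "Flexible J, C, D", "Lowest R, I"),
  stringsAsFactors = FALSE
)

# remove = FALSE keeps Fare_class alongside the two new columns
kept <- separate(df, Fare_class, c("Fare", "Booking"),
                 sep = "\\b\\s\\b", remove = FALSE)

# The default (remove = TRUE) replaces Fare_class with the new columns
dropped <- separate(df, Fare_class, c("Fare", "Booking"),
                    sep = "\\b\\s\\b")

names(kept)
names(dropped)
```

Either way, Travel_class and any other columns are carried through unchanged; only the fate of the split column differs.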

Data:

df = structure(list(Travel_class = structure(c(2L, 2L, 1L, 1L), .Label = c("Business", 
"First"), class = "factor"), Fare_class = structure(c(1L, 3L, 
2L, 4L), .Label = c("Flexible F", "Flexible J, C, D", "Lowest A", 
"Lowest R, I"), class = "factor"), Avios_awarded = structure(c(4L, 
1L, 3L, 2L), .Label = c(" 250% of miles flown", "150% of miles flown", 
"250% of miles flown", "300% of miles flown"), class = "factor")), .Names = c("Travel_class", 
"Fare_class", "Avios_awarded"), class = "data.frame", row.names = c(NA, 
-4L))
