I want to split the field "Fare_class" on the first space without dropping any fields. I know a similar question exists, but when I tried that approach, it dropped all fields except for "Fare_class".
Travel_class  Fare_class        Avios_awarded
First         Flexible F        300% of miles flown
First         Lowest A          250% of miles flown
Business      Flexible J, C, D  250% of miles flown
Business      Lowest R, I       150% of miles flown
Below is the table I'd like to create, splitting "Fare_class" on the first space into two new fields, "Fare" and "Booking".
Travel_class  Fare_class        Fare      Booking  Avios_awarded
First         Flexible F        Flexible  F        300% of miles flown
First         Lowest A          Lowest    A        250% of miles flown
Business      Flexible J, C, D  Flexible  J,C,D    250% of miles flown
Business      Lowest R, I       Lowest    R,I      150% of miles flown
3 solutions
#1
2
Alternative 1:
library(stringr)
str_split_fixed(Fare_class, " ", 2)
#      [,1]       [,2]
# [1,] "Flexible" "F"
# [2,] "Lowest"   "A"
# [3,] "Flexible" "J, C, D"
# [4,] "Lowest"   "R, I"
Alternative 2:
library(reshape2)
colsplit(Fare_class," ",c("Fare", "Booking"))
#       Fare Booking
# 1 Flexible       F
# 2   Lowest       A
# 3 Flexible J, C, D
# 4   Lowest    R, I
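Both calls above return only the split pieces, so the other fields have to be kept by assigning the pieces back into the data frame. A minimal sketch of that step, assuming the data frame is called df and holds the question's columns as plain character vectors (the stringsAsFactors = FALSE setting and the column reordering are assumptions for illustration, not part of the answer):

```r
library(stringr)

# Assumed stand-in for the question's table
df <- data.frame(
  Travel_class  = c("First", "First", "Business", "Business"),
  Fare_class    = c("Flexible F", "Lowest A", "Flexible J, C, D", "Lowest R, I"),
  Avios_awarded = c("300% of miles flown", "250% of miles flown",
                    "250% of miles flown", "150% of miles flown"),
  stringsAsFactors = FALSE
)

# Split each value into exactly 2 pieces at the first space (character matrix)
parts <- str_split_fixed(df$Fare_class, " ", 2)

# Add the pieces as new columns; existing columns are left untouched
df$Fare    <- parts[, 1]
df$Booking <- parts[, 2]

# Optional: place the new columns next to Fare_class
df <- df[, c("Travel_class", "Fare_class", "Fare", "Booking", "Avios_awarded")]
```

Assigning with df$... adds columns rather than replacing the data frame, which is why nothing is dropped.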
#2
0
library(stringr)
Fare_class <- c('Flexible F',
'Lowest A',
'Flexible J, C, D',
'Lowest R, I')
# Split each string into at most 2 pieces at the first space
fare <- sapply(str_split(Fare_class, pattern = ' ', n = 2), '[[', 1)
booking <- sapply(str_split(Fare_class, pattern = ' ', n = 2), '[[', 2)
str_split splits each string into at most n = 2 pieces, so only the first space is used as a separator. Its output is a list of 2-element character vectors; sapply(..., '[[', i) then pulls the i-th element (the first or the second) out of each list element.
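The same "list of 2-element vectors, then pick element i" pattern can be sketched in base R without stringr. This swaps in regexpr and regmatches with invert = TRUE, which splits at the first match only; it is an alternative technique, not the one used in this answer, and the names pieces and booking are illustrative:

```r
Fare_class <- c("Flexible F", "Lowest A", "Flexible J, C, D", "Lowest R, I")

# regexpr() locates only the first space in each string;
# invert = TRUE returns the text before and after that match
pieces <- regmatches(Fare_class, regexpr(" ", Fare_class), invert = TRUE)

# Pick the first / second sub-element of each list element
fare    <- sapply(pieces, `[[`, 1)
booking <- sapply(pieces, `[[`, 2)
```

This assumes every value contains at least one space; a value without a space yields a 1-element vector, and extracting element 2 would then fail.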
#3
0
Here's a solution with separate from tidyr to split the column by regex:
library(tidyr)
separate(df, Fare_class, c("Fare", "Booking"), sep = "\\b\\s\\b", remove = FALSE)
or use extract for more complex patterns to split by capture groups:
extract(df, Fare_class, c("Fare", "Booking"), regex = "(^\\p{L}+\\b)\\s(.+$)", remove = FALSE)
Result:
  Travel_class       Fare_class     Fare Booking       Avios_awarded
1        First       Flexible F Flexible       F 300% of miles flown
2        First         Lowest A   Lowest       A 250% of miles flown
3     Business Flexible J, C, D Flexible J, C, D 250% of miles flown
4     Business      Lowest R, I   Lowest    R, I 150% of miles flown
Note:
If you don't want to keep the original column Fare_class, just drop the remove = FALSE argument from the separate or extract call.
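As a possible variation on the separate call, tidyr's extra = "merge" argument limits the number of splits to the number of target columns, so a plain space can serve as the separator. A minimal sketch, assuming a character-column copy of the data (the data.frame construction and the name out are illustrative):

```r
library(tidyr)

# Assumed stand-in for the question's table, with character columns
df <- data.frame(
  Travel_class  = c("First", "First", "Business", "Business"),
  Fare_class    = c("Flexible F", "Lowest A", "Flexible J, C, D", "Lowest R, I"),
  Avios_awarded = c("300% of miles flown", "250% of miles flown",
                    "250% of miles flown", "150% of miles flown"),
  stringsAsFactors = FALSE
)

# extra = "merge" splits at most length(into) - 1 times, i.e. only at the
# first space here; remove = FALSE keeps the original Fare_class column
out <- separate(df, Fare_class, c("Fare", "Booking"),
                sep = " ", extra = "merge", remove = FALSE)
```

Compared with the word-boundary regex, this does not depend on what characters surround the first space.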
Data:
df = structure(list(Travel_class = structure(c(2L, 2L, 1L, 1L), .Label = c("Business",
"First"), class = "factor"), Fare_class = structure(c(1L, 3L,
2L, 4L), .Label = c("Flexible F", "Flexible J, C, D", "Lowest A",
"Lowest R, I"), class = "factor"), Avios_awarded = structure(c(4L,
1L, 3L, 2L), .Label = c(" 250% of miles flown", "150% of miles flown",
"250% of miles flown", "300% of miles flown"), class = "factor")), .Names = c("Travel_class",
"Fare_class", "Avios_awarded"), class = "data.frame", row.names = c(NA,
-4L))