I have a dataset with several columns, one of which holds reaction times. Each cell in that column contains comma-separated reaction times for the different trials of the same participant.
For example, row 1 (i.e. the data from participant 1) has the following in the reaction_times column:
reaction_times
2000,1450,1800,2200
Hence, these are participant 1's reaction times for trials 1, 2, 3 and 4.
I now want to create a new data set in which the reaction times for these trials all form individual columns. This way I can calculate the mean reaction time for each trial.
                trial 1  trial 2  trial 3  trial 4
participant 1:     2000     1450     1800     2200
I tried colsplit() from the reshape2 package, but it doesn't seem to split my data into new columns (perhaps because all of my data is in one cell).
Any suggestions?
4 answers
#1
I think you are looking for the strsplit() function:
a = "2000,1450,1800,2200"
strsplit(a, ",")
[[1]]
[1] "2000" "1450" "1800" "2200"
Notice that strsplit returns a list, in this case with only one element. This is because strsplit is vectorised: it takes a character vector and returns one list element per input string. You can therefore pass the whole column of single-cell strings to the function and get back a list of split vectors, one per participant. In a more relevant example, this looks like:
# Create some example data
dat = data.frame(reaction_time =
apply(matrix(round(runif(100, 1, 2000)),
25, 4), 1, paste, collapse = ","),
stringsAsFactors=FALSE)
splitdat = do.call("rbind", strsplit(dat$reaction_time, ","))
splitdat = data.frame(apply(splitdat, 2, as.numeric))
names(splitdat) = paste("trial", 1:4, sep = "")
head(splitdat)
trial1 trial2 trial3 trial4
1 597 1071 1430 997
2 614 322 1242 1140
3 1522 1679 51 1120
4 225 1988 1938 1068
5 621 623 1174 55
6 1918 1828 136 1816
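The do.call("rbind", ...) step assumes every participant has the same number of trials. If the counts differ, rbind recycles the shorter rows (it only warns when the lengths are not multiples of each other), so missing trials would silently be filled with repeated values. A minimal base-R sketch, assuming you would rather pad missing trials with NA; the helper name split_trials is hypothetical, not part of any package:

```r
# Hypothetical helper: split comma-separated reaction times into a
# numeric data frame, padding participants with fewer trials with NA.
split_trials <- function(x, sep = ",") {
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  n_max <- max(lengths(parts))
  # Indexing past the end of a vector returns NA, which pads short rows
  padded <- lapply(parts, function(p) as.numeric(p[seq_len(n_max)]))
  out <- as.data.frame(do.call(rbind, padded))
  names(out) <- paste0("trial", seq_len(n_max))
  out
}

rt <- c("2000,1450,1800,2200", "1500,1600,1700")
split_trials(rt)
#   trial1 trial2 trial3 trial4
# 1   2000   1450   1800   2200
# 2   1500   1600   1700     NA
```

Padding with NA keeps the columns aligned by trial number; downstream means can then use na.rm = TRUE.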
And finally, to calculate the mean per participant:
apply(splitdat, 1, mean)
[1] 1187.50 361.25 963.75 1017.00 916.25 1409.50 730.00 1310.75 1133.75
[10] 851.25 914.75 881.25 889.00 1014.75 676.75 850.50 805.00 1460.00
[19] 901.00 1443.50 507.25 691.50 1090.00 833.25 669.25
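apply(splitdat, 1, mean) (equivalently rowMeans(splitdat)) averages over each participant's trials. The question asks for the mean per trial, which is a column-wise mean, so colMeans is the matching call. A small self-contained sketch using the example row from the question plus one made-up row for illustration:

```r
# Participant 1 comes from the question; participant 2 is made-up illustration data
dat <- data.frame(reaction_time = c("2000,1450,1800,2200",
                                    "1800,1550,1600,2000"),
                  stringsAsFactors = FALSE)

splitdat <- do.call("rbind", strsplit(dat$reaction_time, ","))
splitdat <- data.frame(apply(splitdat, 2, as.numeric))
names(splitdat) <- paste0("trial", 1:4)

rowMeans(splitdat)  # mean per participant: 1862.5 and 1737.5
colMeans(splitdat)  # mean per trial: trial1 = 1900, trial2 = 1500, trial3 = 1700, trial4 = 2100
```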
#2
A nifty, if rather heavy-handed, way is to use read.csv in conjunction with textConnection. Assuming your data is in a data frame, df:
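# header = FALSE, otherwise the first participant's values become the column names
x <- read.csv(textConnection(as.character(df[["reaction_times"]])), header = FALSE)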
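read.csv (via read.table) also accepts a text argument, which avoids creating the connection by hand. A sketch assuming the column is called reaction_times and every row has the same number of trials (participant 2 is made-up data); col.names labels the trials and the values are parsed as numbers:

```r
# Example data in the question's format (participant 2 is made up)
df <- data.frame(reaction_times = c("2000,1450,1800,2200",
                                    "1800,1550,1600,2000"),
                 stringsAsFactors = FALSE)

# Each string is treated as one line of CSV; header = FALSE keeps row 1 as data
x <- read.csv(text = df$reaction_times, header = FALSE,
              col.names = paste0("trial", 1:4))
x
#   trial1 trial2 trial3 trial4
# 1   2000   1450   1800   2200
# 2   1800   1550   1600   2000
```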
#3
Old question, but I came across it from another recent question (which seems unrelated).
Both existing answers are appropriate, but I wanted to share an answer related to a package I have created called "splitstackshape" that is fast and has straightforward syntax.
Here's some sample data:
set.seed(1)
dat = data.frame(
reaction_time = apply(matrix(round(
runif(24, 1, 2000)), 6, 4), 1, paste, collapse = ","))
This is the splitting:
library(splitstackshape)
cSplit(dat, "reaction_time", ",")
# reaction_time_1 reaction_time_2 reaction_time_3 reaction_time_4
# 1: 532 1889 1374 761
# 2: 745 1322 769 1555
# 3: 1146 1259 1540 1869
# 4: 1817 125 996 425
# 5: 404 413 1436 1304
# 6: 1797 354 1984 252
And, optionally, if you need the mean per participant (rowMeans):
rowMeans(cSplit(dat, "reaction_time", ","))
# [1] 1139.00 1097.75 1453.50 840.75 889.25 1096.75
#4
Another option, using dplyr, tidyr and stringr with Paul Hiemstra's example data:
library(dplyr)
library(tidyr)
library(stringr)
# create example data
data = data.frame(reaction_time =
apply(matrix(round(runif(100, 1, 2000)),
25, 4), 1, paste, collapse = ","),
stringsAsFactors=FALSE)
head(data)
# clean data: one row per trial, numbered within each participant,
# then spread the trials into numeric columns trial1 ... trial4
data2 <- data %>%
  mutate(participant = row_number(),
         split_reaction_time = str_split(reaction_time, ",")) %>%
  unnest(split_reaction_time) %>%
  group_by(participant) %>%
  mutate(col_names = paste0("trial", row_number())) %>%
  ungroup() %>%
  select(-reaction_time) %>%
  spread(key = col_names, value = split_reaction_time, convert = TRUE)
head(data2)
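The unnest() step produces a long table with one row per participant and trial, and for per-trial means you do not strictly need to spread it back to wide format. A base-R sketch of the same long-format idea, swapping the dplyr/tidyr verbs for strsplit and tapply (participant 2 is made-up data):

```r
# Example data: participant 1 from the question, participant 2 made up
data <- data.frame(reaction_time = c("2000,1450,1800,2200",
                                     "1800,1550,1600,2000"),
                   stringsAsFactors = FALSE)

parts <- strsplit(data$reaction_time, ",", fixed = TRUE)

# Long format: one row per participant x trial
long <- data.frame(
  participant = rep(seq_along(parts), lengths(parts)),
  trial       = unlist(lapply(lengths(parts), seq_len)),
  rt          = as.numeric(unlist(parts))
)

# Mean reaction time per trial, computed directly from the long table
tapply(long$rt, long$trial, mean)  # trials 1-4: 1900, 1500, 1700, 2100
```

A side benefit of the long form is that participants with different numbers of trials need no padding: each trial is averaged over whoever has a value for it.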