I have a data frame in R that has many rows (over 3000) with F0 (fundamental frequency) tracks of an utterance in it. The rows have the following information in them: speaker ID, group #, repetition #, accent type, sex, and then 50 columns of F0 points. The data looks like this:
我在R中有一个数据帧,它有很多行(超过3000),其中有一个话语的F0(基频)轨道。行中包含以下信息:说话者ID,组#,重复#,重音类型,性别,然后50列F0点。数据如下所示:
Speaker Sex Group Repetition Accent Word 1 2 3 4
105 M 1 1 N AILMENT 102.31030 102.31030 102.31030 102.31127
105 M 1 1 N COLLEGE 111.80641 111.80313 111.68612 111.36020
105 M 1 1 N FATHER 124.06655 124.06655 124.06655 124.06655
But instead of only going to X4, it has 50 points per row, so I have a 3562x56 data frame. I want to change it so each column of data in the F0 track (so after word, from 1:50) gets its own column, with the associated column number as another row. I want to keep all of the information in the first six columns with each data point as well, so it would look like this:
但是它不是仅仅转到X4,而是每行50个点,所以我有一个3562x56的数据帧。我想改变它,因此F0轨道中的每一列数据(因此,从1:50开始)得到它自己的列,相关的列号作为另一行。我想在每个数据点的前六列中保留所有信息,所以它看起来像这样:
Speaker Sex Group Repetition Accent Word Num F0
105 M 1 1 N AILMENT 1 102.31030
105 M 1 1 N AILMENT 2 102.31030
105 M 1 1 N AILMENT 3 102.31030
105 M 1 1 N AILMENT 4 102.31127
...
105 M 1 1 N COLLEGE 1 111.80641
105 M 1 1 N COLLEGE 1 111.80313
105 M 1 1 N COLLEGE 1 111.68612
105 M 1 1 N COLLEGE 1 111.36020
...
The code I tried to use, while tedious, is as follows:
我尝试使用的代码虽然繁琐,但如下所示:
x = 1
for (i in 1:dim(normrangef0)[1])
{
for (j in 1:50)
{
norm.all$Speaker[x] <- normrangef0$Speaker[i]
norm.all$Sex[x] <- normrangef0$Sex[i]
norm.all$Group[x] <- normrangef0$Group[i]
norm.all$Repetition[x] <- normrangef0$Repetition[i]
norm.all$Word[x] <- normrangef0$Word[i]
norm.all$Accent[x] <- normrangef0$Accent[i]
norm.all$Time[x] <- j
norm.all$F0[x] <- normrangef0[i,j+6]
x = x+1
}
}
However, when I do this with norm.all as a NULL object (just defined by norm.all = c() ), I end up with a list of over 200k items, many of which are NAs. When I define norm.all as a data frame (either an empty one or one of all 0s, in the 178100x8 data frame, I get an error:
但是,当我使用norm.all作为NULL对象(仅由norm.all = c()定义)时,我最终得到了一个超过20万项的列表,其中许多是NA。当我将norm.all定义为数据帧(空数或全0中的一个,在178100x8数据帧中,我得到一个错误:
Error in `$<-.data.frame`(`*tmp*`, "Speaker", value = 105L) : replacement has 1 row, data has 0
Is my code just totally off? Is there another way to do this?
我的代码完全关闭了吗?还有另一种方法吗?
2 个解决方案
#1
6
Use melt
from "reshape2"
使用来自“reshape2”的融化
library(reshape2)
melt(mydf, id.vars=c("Speaker", "Sex", "Group", "Repetition", "Accent", "Word"))
# Speaker Sex Group Repetition Accent Word variable value
# 1 105 M 1 1 N AILMENT 1 102.3103
# 2 105 M 1 1 N COLLEGE 1 111.8064
# 3 105 M 1 1 N FATHER 1 124.0666
# 4 105 M 1 1 N AILMENT 2 102.3103
# 5 105 M 1 1 N COLLEGE 2 111.8031
# 6 105 M 1 1 N FATHER 2 124.0666
# 7 105 M 1 1 N AILMENT 3 102.3103
# 8 105 M 1 1 N COLLEGE 3 111.6861
# 9 105 M 1 1 N FATHER 3 124.0666
# 10 105 M 1 1 N AILMENT 4 102.3113
# 11 105 M 1 1 N COLLEGE 4 111.3602
# 12 105 M 1 1 N FATHER 4 124.0666
In base R, you can also use stack
to stack the columns named 1 through 4, and cbind
that with the first group of columns. Alternatively, unlist
will also do this.
在基础R中,您还可以使用堆栈来堆叠名为1到4的列,并使用第一组列cbind。或者,unlist也会这样做。
You may also want to look into the "data.table" package to get a bit of a speed boost.
您可能还想查看“data.table”包以获得一点速度提升。
#2
2
With reshape
:
重塑:
x <- read.table(header=T, text="Speaker Sex Group Repetition Accent Word 1 2 3 4
105 M 1 1 N AILMENT 102.31030 102.31030 102.31030 102.31127
105 M 1 1 N COLLEGE 111.80641 111.80313 111.68612 111.36020
105 M 1 1 N FATHER 124.06655 124.06655 124.06655 124.06655")
reshape(x, direction="long", sep='', varying=paste0('X', 1:4))
## Speaker Sex Group Repetition Accent Word time X id
## 1.1 105 M 1 1 N AILMENT 1 102.3103 1
## 2.1 105 M 1 1 N COLLEGE 1 111.8064 2
## 3.1 105 M 1 1 N FATHER 1 124.0666 3
## 1.2 105 M 1 1 N AILMENT 2 102.3103 1
## 2.2 105 M 1 1 N COLLEGE 2 111.8031 2
## 3.2 105 M 1 1 N FATHER 2 124.0666 3
## 1.3 105 M 1 1 N AILMENT 3 102.3103 1
## 2.3 105 M 1 1 N COLLEGE 3 111.6861 2
## 3.3 105 M 1 1 N FATHER 3 124.0666 3
## 1.4 105 M 1 1 N AILMENT 4 102.3113 1
## 2.4 105 M 1 1 N COLLEGE 4 111.3602 2
## 3.4 105 M 1 1 N FATHER 4 124.0666 3
#1
6
Use melt
from "reshape2"
使用来自“reshape2”的融化
library(reshape2)
melt(mydf, id.vars=c("Speaker", "Sex", "Group", "Repetition", "Accent", "Word"))
# Speaker Sex Group Repetition Accent Word variable value
# 1 105 M 1 1 N AILMENT 1 102.3103
# 2 105 M 1 1 N COLLEGE 1 111.8064
# 3 105 M 1 1 N FATHER 1 124.0666
# 4 105 M 1 1 N AILMENT 2 102.3103
# 5 105 M 1 1 N COLLEGE 2 111.8031
# 6 105 M 1 1 N FATHER 2 124.0666
# 7 105 M 1 1 N AILMENT 3 102.3103
# 8 105 M 1 1 N COLLEGE 3 111.6861
# 9 105 M 1 1 N FATHER 3 124.0666
# 10 105 M 1 1 N AILMENT 4 102.3113
# 11 105 M 1 1 N COLLEGE 4 111.3602
# 12 105 M 1 1 N FATHER 4 124.0666
In base R, you can also use stack
to stack the columns named 1 through 4, and cbind
that with the first group of columns. Alternatively, unlist
will also do this.
在基础R中,您还可以使用堆栈来堆叠名为1到4的列,并使用第一组列cbind。或者,unlist也会这样做。
You may also want to look into the "data.table" package to get a bit of a speed boost.
您可能还想查看“data.table”包以获得一点速度提升。
#2
2
With reshape
:
重塑:
x <- read.table(header=T, text="Speaker Sex Group Repetition Accent Word 1 2 3 4
105 M 1 1 N AILMENT 102.31030 102.31030 102.31030 102.31127
105 M 1 1 N COLLEGE 111.80641 111.80313 111.68612 111.36020
105 M 1 1 N FATHER 124.06655 124.06655 124.06655 124.06655")
reshape(x, direction="long", sep='', varying=paste0('X', 1:4))
## Speaker Sex Group Repetition Accent Word time X id
## 1.1 105 M 1 1 N AILMENT 1 102.3103 1
## 2.1 105 M 1 1 N COLLEGE 1 111.8064 2
## 3.1 105 M 1 1 N FATHER 1 124.0666 3
## 1.2 105 M 1 1 N AILMENT 2 102.3103 1
## 2.2 105 M 1 1 N COLLEGE 2 111.8031 2
## 3.2 105 M 1 1 N FATHER 2 124.0666 3
## 1.3 105 M 1 1 N AILMENT 3 102.3103 1
## 2.3 105 M 1 1 N COLLEGE 3 111.6861 2
## 3.3 105 M 1 1 N FATHER 3 124.0666 3
## 1.4 105 M 1 1 N AILMENT 4 102.3113 1
## 2.4 105 M 1 1 N COLLEGE 4 111.3602 2
## 3.4 105 M 1 1 N FATHER 4 124.0666 3