Reshaping panel data from wide to long format

Time: 2021-08-09 20:09:54

I am struggling with transformation of a Panel Dataset from wide to long format. The Dataset looks like this:

ID | KP1_430a | KP1_430b | KP1_430c | KP2_430a | KP2_430b | KP2_430c | KP1_1500a | ...  
1     ....
2     ....

KP1, KP2, up to KP7 denote the waves; a, b, up to f denote a specific item (e.g. the left-right placement of party a).

I would like to have this data in long format. Like this:

ID | Party | Wave | 430 | 1500  
 1     1       1     ..    ..
 1     2       1     ..    ..
 .     .       .          
 1     1       2     ..    ..
 .     .       .         
 2     1       1     ..    ..  

I tried to use the reshape function. But I had problems reshaping it over time and over the parties simultaneously.

Here is a small data.frame example.

data <- data.frame(matrix(rnorm(10),2,10))  
data[,1] <- 1:2  
names(data) <- c("ID","KP1_430a" , "KP1_430b" , "KP1_430c" , "KP2_430a" , "KP2_430b ", "KP2_430c ", "KP1_1500a" ,"KP1_1500b", "KP1_1500c")

And this is how far I got.

  data_long <- reshape(data,
                       varying = list(names(data)[2:4], names(data)[5:7], names(data)[8:10]),
                       v.names = c("KP1_430", "KP2_430", "KP1_1500"),
                       direction = "long", timevar = "Party")

The question remains: how can I get the time-varying variables into long format as well? And is there a more elegant way to reshape this data? In the code above I would have to enter the names (names(data)[2:4]) for each wave and variable. With this small data.frame that is OK, but the real dataset is a lot larger.
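One way to avoid typing every name by hand is to derive the column groups from the names themselves. The following is only a sketch, assuming every column except ID follows the pattern KP<wave>_<item><party letter> and that the names carry no stray whitespace; `stems` and `varying` are illustrative names, not part of the original attempt. It reproduces the party-wise reshape above but does not yet stack the waves.

```r
# Example data with clean column names (assumption: no trailing spaces)
set.seed(1)
data <- data.frame(ID = 1:2, matrix(rnorm(18), nrow = 2, ncol = 9))
names(data)[-1] <- c("KP1_430a", "KP1_430b", "KP1_430c",
                     "KP2_430a", "KP2_430b", "KP2_430c",
                     "KP1_1500a", "KP1_1500b", "KP1_1500c")

# Strip the trailing party letter to get one stem per wave/item pair
stems <- unique(sub("[a-z]$", "", names(data)[-1]))

# For each stem, collect the columns that differ only in the party letter
varying <- lapply(stems, function(s) {
  grep(paste0("^", s, "[a-z]$"), names(data), value = TRUE)
})

# Same reshape as above, with the groups built programmatically
data_long <- reshape(data, varying = varying, v.names = stems,
                     timevar = "Party", idvar = "ID", direction = "long")
```

Here Party is numbered 1, 2, 3 in the order of the letters a, b, c within each group.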

EDIT: How this transformation could be done by hand (I have actually done this, which leaves me with a page-long code file):
First, bind KP1_430a and KP1_1500a column-wise with the IDs, Time=1 and Party=1. Second, create the same object for all other parties [b-f], changing the party index accordingly, and append it row-wise. Repeat steps one and two for the remaining waves [2-7], changing the party and time variables accordingly, and append the results row-wise.
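The manual steps above can be sketched as a pair of loops. This is a minimal illustration rather than the original page-long file: it assumes clean column names, uses only the two waves, three parties and two items of the small example, and fills combinations that do not exist in the data (here item 1500 in wave 2) with NA. The names `waves`, `parties`, `items` and `pieces` are hypothetical.

```r
# Example data with clean column names (assumption: no trailing spaces)
set.seed(42)
data <- data.frame(ID = 1:2, matrix(rnorm(18), nrow = 2, ncol = 9))
names(data)[-1] <- c("KP1_430a", "KP1_430b", "KP1_430c",
                     "KP2_430a", "KP2_430b", "KP2_430c",
                     "KP1_1500a", "KP1_1500b", "KP1_1500c")

waves   <- 1:2
parties <- c("a", "b", "c")
items   <- c("430", "1500")

pieces <- list()
for (w in waves) {
  for (p in seq_along(parties)) {
    # Step one: IDs plus the party and wave indices
    block <- data.frame(ID = data$ID, Party = p, Wave = w)
    for (it in items) {
      col <- paste0("KP", w, "_", it, parties[p])
      # Use the column if it exists, otherwise fill with NA
      block[[it]] <- if (col %in% names(data)) data[[col]] else NA_real_
    }
    pieces[[length(pieces) + 1]] <- block
  }
}

# Step two: append all blocks row-wise and sort as in the desired layout
data_long <- do.call(rbind, pieces)
data_long <- data_long[order(data_long$ID, data_long$Wave, data_long$Party), ]
```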

2 solutions

#1


4  

It is usually easier to proceed in two steps: first use melt to put your data into a "tall" format (unless it already is), and then use dcast to convert it to a wider format.

library(reshape2)
library(stringr)

# Tall format
d <- melt(data, id.vars="ID")

# Process the column containing wave and party
d1 <- str_match_all( 
  as.character( d$variable ), 
  "KP([0-9])_([0-9]+)([a-z])" 
)
d1 <- do.call( rbind, d1 )
d1 <- d1[,-1]
colnames(d1) <- c("wave", "number", "party")
d1 <- as.data.frame( d1)
d <- cbind( d, d1 )

# Convert to the desired format
d <- dcast( d, ID + wave + party ~ number )
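
If reshape2 and stringr are not available, the same two-step idea (tall first, then wider) can be sketched in base R. This is an assumption-laden alternative, not the code above: it swaps melt for a manual stack, str_match_all for regexec/regmatches, and dcast for reshape(direction = "wide"), and it assumes clean column names. The value columns come out as value.430 and value.1500 rather than 430 and 1500.

```r
# Example data with clean column names (assumption: no trailing spaces)
set.seed(1)
data <- data.frame(ID = 1:2, matrix(rnorm(18), nrow = 2, ncol = 9))
names(data)[-1] <- c("KP1_430a", "KP1_430b", "KP1_430c",
                     "KP2_430a", "KP2_430b", "KP2_430c",
                     "KP1_1500a", "KP1_1500b", "KP1_1500c")

vars <- names(data)[-1]

# Split each name into wave, item number and party letter
# (columns of parts: full match, wave, item, party)
parts <- do.call(rbind,
                 regmatches(vars, regexec("^KP([0-9])_([0-9]+)([a-z])$", vars)))

# Tall format: one row per ID and original column
tall <- data.frame(
  ID    = rep(data$ID, times = length(vars)),
  wave  = rep(parts[, 2], each = nrow(data)),
  item  = rep(parts[, 3], each = nrow(data)),
  party = rep(parts[, 4], each = nrow(data)),
  value = unlist(data[vars], use.names = FALSE)
)

# Spread the item numbers back into columns
long <- reshape(tall, idvar = c("ID", "wave", "party"),
                timevar = "item", v.names = "value", direction = "wide")
```

Combinations that are absent from the wide data, such as item 1500 in wave 2 here, end up as NA.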

#2


0  

At the moment your Wave data is in your variable names, and you need to extract it with some string processing. I had no trouble with melt:

mdat <- melt(data, id.vars="ID")
mdat$wave=sub("KP", "", sub("_.+$", "", mdat$variable)) # remove the other stuff
mdat

Your description is too sketchy (so far) for me to figure out the rule for deriving a "Party" variable, so perhaps you can edit your question to show how that might be done by a human being ... and then we can show the computer how to do it.

EDIT: If the last lower-case letter in the original column names is Party as Vincent thinks, then you could trim the trailing spaces in those names and extract:

mdat$var <- sub("\\s", "", (as.character(mdat$variable)))
mdat$party=substr( mdat$var, nchar(mdat$var), nchar(mdat$var))
#--------------
> mdat
   ID  variable      value wave party       var
1   1  KP1_430a  0.7220627    1     a  KP1_430a
2   2  KP1_430a  0.9585243    1     a  KP1_430a
3   1  KP1_430b -1.2954671    1     b  KP1_430b
4   2  KP1_430b  0.3393617    1     b  KP1_430b
5   1  KP1_430c -1.1477627    1     c  KP1_430c
6   2  KP1_430c -1.0909179    1     c  KP1_430c
<snipped output>
