I want to create one large dataframe from two smaller dataframes that have their first three columns in common. I also want species shared between the two dataframes to fall into the same columns.
My dataframe (df) 1 has 38 obs. of 40 variables
My dataframe (df) 2 has 30 obs. of 35 variables
I want to retain the headers common to both (LOGID, DECAY, DIAMETER). Some species are common to both dataframes and others are unique to one or the other. I want all species occurrences in a new table.
Do I use cbind with some sort of match function, or create dummy columns? How would I go about this?
e.g. DF1:
LOGID DECAY DIAMETER SP1 SP2 SP3
1 2 20 2 2 3
2 4 22 1 0 7
3 4 12 3 1 2
e.g. DF2
LOGID DECAY DIAMETER SP1 SP5 SP3 SP7
4 2 25 8 0 2 1
5 4 10 0 0 3 1
6 2 11 1 1 1 1
I want them like this:
LOGID DECAY DIAMETER SP1 SP2 SP3 SP5 SP7
1 2 20 2 2 3 0 0
2 4 22 1 0 7 0 0
3 4 12 3 1 2 0 0
4 2 25 8 0 2 0 1
5 4 10 0 0 3 0 1
6 2 11 1 0 1 1 1
I tried the code suggested below and ended up with the following problem, mainly because I didn't specify what I wanted the first time: I want common species to fall into shared columns.
LOGID DECAY DIAMETER SP1x SP2 SP3x SP1y SP5 SP3y SP7
1 2 20 2 2 3 0 0 0 0
2 4 22 1 0 7 0 0 0 0
3 4 12 3 1 2 0 0 0 0
4 2 25 0 0 0 8 0 2 1
5 4 10 0 0 0 0 0 3 1
6 2 11 0 0 0 1 1 1 1
3 solutions
#1
2
Perhaps this (but you are asked to provide a small example in code so that we can test before throwing out code):
merge(df1,df2, by=1:3, all=TRUE)
With your example data, my suggestion produces:
> merge(DF1,DF2, by=1:3, all=TRUE)
LOGID DECAY DIAMETER SP1 SP2 SP3 SP4 SP5 SP6 SP7
1 1 2 20 2 2 3 NA NA NA NA
2 2 4 22 1 0 7 NA NA NA NA
3 3 4 12 3 1 2 NA NA NA NA
4 4 2 25 NA NA NA 8 0 2 1
5 5 4 10 NA NA NA 0 0 3 1
6 6 2 11 NA NA NA 1 1 1 1
If you want to convert the NAs to 0s (which I don't see as really accurate), then just do it:
> DF3 <- merge(DF1,DF2, by=1:3, all=TRUE)
> DF3[is.na(DF3)] <- 0
> DF3
LOGID DECAY DIAMETER SP1 SP2 SP3 SP4 SP5 SP6 SP7
1 1 2 20 2 2 3 0 0 0 0
2 2 4 22 1 0 7 0 0 0 0
3 3 4 12 3 1 2 0 0 0 0
4 4 2 25 0 0 0 8 0 2 1
5 5 4 10 0 0 0 0 0 3 1
6 6 2 11 0 0 0 1 1 1 1
If you really do not have any "overlapping" values in the shared columns and only want to "rbind" the dataframes, then there is an rbind.fill function in pkg:plyr. With the new example:
library( plyr )
rbind.fill(DF1,DF2)
LOGID DECAY DIAMETER SP1 SP2 SP3 SP5 SP7
1 1 2 20 2 2 3 NA NA
2 2 4 22 1 0 7 NA NA
3 3 4 12 3 1 2 NA NA
4 4 2 25 8 NA 2 0 1
5 5 4 10 0 NA 3 0 1
6 6 2 11 1 NA 1 1 1
#2
1
First cbind the extra columns to the two data frames. For example:
df1 <- cbind(df1, numeric(nrow(df1)),numeric(nrow(df1)),numeric(nrow(df1)),numeric(nrow(df1)))
names(df1)[7:10] = c("SP4","SP5","SP6","SP7")
Then do likewise for the second data frame.
Then you can rbind the two data frames.
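Typing out each dummy column by hand gets tedious with many species. Below is a minimal base R sketch of the same idea, using the example data from the question. The helper name add_zero_cols is made up for illustration, and the sketch assumes that a species missing from one data frame should count as 0 there.

```r
# Example data from the question
DF1 <- read.table(header = TRUE, text = "
LOGID DECAY DIAMETER SP1 SP2 SP3
1 2 20 2 2 3
2 4 22 1 0 7
3 4 12 3 1 2")

DF2 <- read.table(header = TRUE, text = "
LOGID DECAY DIAMETER SP1 SP5 SP3 SP7
4 2 25 8 0 2 1
5 4 10 0 0 3 1
6 2 11 1 1 1 1")

# Hypothetical helper: add a zero-filled column for every name in all_cols
# that df lacks, then return the columns in the order given by all_cols.
add_zero_cols <- function(df, all_cols) {
  for (col in setdiff(all_cols, names(df))) {
    df[[col]] <- 0
  }
  df[all_cols]
}

# Every column name that occurs in either data frame
all_cols <- union(names(DF1), names(DF2))

# Both pieces now have identical columns, so rbind stacks them directly
combined <- rbind(add_zero_cols(DF1, all_cols), add_zero_cols(DF2, all_cols))
rownames(combined) <- NULL
combined
```

Because the dummy columns are derived with setdiff, nothing needs to change when the species lists grow.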
If there are one or two variables present in both data frames, consider combining them in the combined data frame like so:
df.combined$SP3 <- df.combined$SP3.x + df.combined$SP3.y
You will want to examine this case quite carefully before dropping SP3.x and SP3.y.
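One reason for care: after merge(..., all=TRUE) one half of each .x/.y pair is usually NA, and in R adding NA to a number gives NA. The sketch below sums each pair while ignoring NAs, using the example data from the question. It assumes no log appears in both data frames, so a sum simply picks up whichever value is present. The variable names are illustrative only.

```r
# Example data from the question
DF1 <- read.table(header = TRUE, text = "
LOGID DECAY DIAMETER SP1 SP2 SP3
1 2 20 2 2 3
2 4 22 1 0 7
3 4 12 3 1 2")

DF2 <- read.table(header = TRUE, text = "
LOGID DECAY DIAMETER SP1 SP5 SP3 SP7
4 2 25 8 0 2 1
5 4 10 0 0 3 1
6 2 11 1 1 1 1")

# Merging on the first three columns only: shared species get .x/.y suffixes
merged <- merge(DF1, DF2, by = 1:3, all = TRUE)

# Species columns that occur in both inputs
shared <- intersect(names(DF1)[-(1:3)], names(DF2)[-(1:3)])

for (sp in shared) {
  pair <- paste0(sp, c(".x", ".y"))
  # rowSums with na.rm = TRUE treats a missing half of the pair as 0
  merged[[sp]] <- rowSums(merged[pair], na.rm = TRUE)
  # Drop the suffixed columns once their values have been combined
  merged[pair] <- NULL
}

# Species unique to one data frame are still NA for the other's rows
merged[is.na(merged)] <- 0

# Put rows in LOGID order and species columns in alphabetical order
id_cols <- names(DF1)[1:3]
merged <- merged[order(merged$LOGID),
                 c(id_cols, sort(setdiff(names(merged), id_cols)))]
rownames(merged) <- NULL
merged
```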
You may also reconsider merge, including the shared variables in the by argument. But only do so if you are certain the values of those variables in the two original data frames will not collide. Otherwise you will have duplicate LOGID and DECAY tuples.
All this raises the question of whether you would be better off trying something like unstack or melt, treating species as a variable. This would be more advantageous if you have several variables present in both data frames. Basically, flatten your two original data frames, row-bind them, then tabulate the species variable back out as columns.
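The flatten, row-bind and tabulate steps can be sketched in base R without melt. This is only one possible reading of the suggestion: the helper to_long is hypothetical, the data are the example from the question, and xtabs sums the counts, so a log listed twice would have its counts added together.

```r
# Example data from the question
DF1 <- read.table(header = TRUE, text = "
LOGID DECAY DIAMETER SP1 SP2 SP3
1 2 20 2 2 3
2 4 22 1 0 7
3 4 12 3 1 2")

DF2 <- read.table(header = TRUE, text = "
LOGID DECAY DIAMETER SP1 SP5 SP3 SP7
4 2 25 8 0 2 1
5 4 10 0 0 3 1
6 2 11 1 1 1 1")

id_cols <- c("LOGID", "DECAY", "DIAMETER")

# Hypothetical helper: one row per (log, species) pair
to_long <- function(df, id_cols) {
  sp_cols <- setdiff(names(df), id_cols)
  data.frame(
    # Repeat the id rows once per species column
    df[rep(seq_len(nrow(df)), times = length(sp_cols)), id_cols, drop = FALSE],
    species = rep(sp_cols, each = nrow(df)),
    count = unlist(df[sp_cols], use.names = FALSE),
    row.names = NULL
  )
}

# Flatten both data frames and stack them
long <- rbind(to_long(DF1, id_cols), to_long(DF2, id_cols))

# Tabulate species back out as columns; absent combinations become 0
counts <- xtabs(count ~ LOGID + species, data = long)

# Reattach DECAY and DIAMETER, matching rows by LOGID
ids <- unique(long[id_cols])
ids <- ids[order(ids$LOGID), ]
wide <- cbind(ids, as.data.frame.matrix(counts)[as.character(ids$LOGID), ])
rownames(wide) <- NULL
wide
```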
#3
0
There are actually many ways to do this, I have found. I appreciate all the comments above!
Here is how I eventually did it:
Because my matrices are very large with many species, the common columns can be found using intersect: common.species <- intersect(colnames(df1), colnames(df2))
Keep this as a character vector: merge expects column names in its by argument, so there is no need to convert it with as.data.frame(common.species)
Merge your two dataframes: Datamerged <- merge(df1, df2, by=common.species, all=TRUE)
Change the NAs to zeros: Datamerged[is.na(Datamerged)] <- 0
Voila!
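Here is a sketch of these steps run on the example data from the question, passing the intersect result to by directly as a character vector. The final reordering is an addition for readability and not part of the steps above. One caveat: the shared species columns become part of the merge key, so a log present in both dataframes with different counts would come out as two rows.

```r
# Example data from the question
DF1 <- read.table(header = TRUE, text = "
LOGID DECAY DIAMETER SP1 SP2 SP3
1 2 20 2 2 3
2 4 22 1 0 7
3 4 12 3 1 2")

DF2 <- read.table(header = TRUE, text = "
LOGID DECAY DIAMETER SP1 SP5 SP3 SP7
4 2 25 8 0 2 1
5 4 10 0 0 3 1
6 2 11 1 1 1 1")

# Column names present in both dataframes (ids plus shared species)
common_cols <- intersect(colnames(DF1), colnames(DF2))

# Full outer merge on every shared column, so shared species are not
# split into .x/.y versions
merged <- merge(DF1, DF2, by = common_cols, all = TRUE)

# Species absent from one dataframe come back as NA; treat them as 0
merged[is.na(merged)] <- 0

# Optional tidy-up: rows by LOGID, ids first, species alphabetically
id_cols <- c("LOGID", "DECAY", "DIAMETER")
merged <- merged[order(merged$LOGID),
                 c(id_cols, sort(setdiff(names(merged), id_cols)))]
rownames(merged) <- NULL
merged
```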