I have a matrix that I would like to subset and eventually use to make a plot. The data is a list of counts for specific blood markers for each patient in a population. It looks like this:
df <- data.frame(MarkerID=c("Class","A123","A124"),
MarkerName=c("","X","Y"),
Patient.1=c(0,1,5),
Patent.2=c(1,2,6),
Patent.3=c(0,3,7),
Patient.4=c(1,4,8))
I would like to make a data frame of all of the patients (columns 3-6) that have a class value of zero (1st row) and a second data frame of all of the patients with a class value of 1.
In the past I have used the subset function to select rows based on the values in a column. Is it possible to select a subset of columns based on the values in a row?
I've tried this:
x <- subset(data, data[1,] == 0)
However, when I do dim(x), the number of columns is the same as in dim(data), but the number of rows is different. Any ideas on how I can make this return just those columns whose value in row 1 is 0?
Roland, yes. Your example df is what the data frame looks like. There are ~30,000 markers and >400 patients in the data frame, so I didn't post the dput(head(data)). Thanks for the reshaping tip, I'll give that a try.
Your example code did work to subset the columns based on the rows. Running
data[,c(TRUE,TRUE,data[1,-(1:2)]==1)]
on the data, I was able to get a data frame with all of the rows and only the columns with the indicated class.
1 Answer
Your data is not arranged in a good way. It would be better to reshape it.
In the absence of input data, this is just a guess:
df <- data.frame(MarkerID=c("Class","A123","A124"),
MarkerName=c("","X","Y"),
Patient.1=c(0,1,5),
Patent.2=c(1,2,6),
Patent.3=c(0,3,7),
Patient.4=c(1,4,8))
# MarkerID MarkerName Patient.1 Patent.2 Patent.3 Patient.4
#1 Class 0 1 0 1
#2 A123 X 1 2 3 4
#3 A124 Y 5 6 7 8
df[,c(TRUE,TRUE,df[1,-(1:2)]==0)]
# MarkerID MarkerName Patient.1 Patent.3
#1 Class 0 0
#2 A123 X 1 3
#3 A124 Y 5 7
Here c(TRUE,TRUE,df[1,-(1:2)]==0) creates a logical vector that is TRUE for the first two columns and for those columns that have a 0 in the first row. Then I subset the columns based on this vector. The same works for class 1:
df[,c(TRUE,TRUE,df[1,-(1:2)]==1)]
# MarkerID MarkerName Patent.2 Patient.4
#1 Class 1 1
#2 A123 X 2 4
#3 A124 Y 6 8
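If you want both data frames at once rather than one call per class value, something along these lines might work. This is only a sketch in base R on the example df; the names id_cols, class_row, patients_by_class and by_class are made up for illustration, and it assumes the class row is always row 1 and the first two columns are always the marker ID and name.

```r
# Same example data as above
df <- data.frame(MarkerID=c("Class","A123","A124"),
                 MarkerName=c("","X","Y"),
                 Patient.1=c(0,1,5),
                 Patent.2=c(1,2,6),
                 Patent.3=c(0,3,7),
                 Patient.4=c(1,4,8))

id_cols <- 1:2                            # assumed: MarkerID and MarkerName come first
class_row <- unlist(df[1, -id_cols])      # named vector: patient column -> class value

# Group the patient column names by their class value
patients_by_class <- split(names(class_row), class_row)

# One data frame per class value, each keeping the two ID columns
by_class <- lapply(patients_by_class,
                   function(p) df[, c(names(df)[id_cols], p)])

by_class[["0"]]   # patients with class 0
by_class[["1"]]   # patients with class 1
```

The list is named by class value, so this should also extend to more than two classes without extra code.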
This would reshape your data into a more common format (for statistical software):
library(reshape2)
df2 <- merge(melt(df[1,],variable.name="Patient",value.name="class")[-(1:2)],
melt(df[-1,],variable.name="Patient"),all=TRUE)
# Patient class MarkerID MarkerName value
#1 Patent.2 1 A123 X 2
#2 Patent.2 1 A124 Y 6
#3 Patent.3 0 A123 X 3
#4 Patent.3 0 A124 Y 7
#5 Patient.1 0 A123 X 1
#6 Patient.1 0 A124 Y 5
#7 Patient.4 1 A123 X 4
#8 Patient.4 1 A124 Y 8
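If reshape2 is not installed, a similar long table can probably be built with base R only. The following is a rough sketch on the same example df, not a drop-in replacement for the merge/melt call: class_row, markers and long are hypothetical names, and the rows come out in patient column order rather than in the sorted order shown above.

```r
# Same example data as above
df <- data.frame(MarkerID=c("Class","A123","A124"),
                 MarkerName=c("","X","Y"),
                 Patient.1=c(0,1,5),
                 Patent.2=c(1,2,6),
                 Patent.3=c(0,3,7),
                 Patient.4=c(1,4,8))

class_row <- unlist(df[1, -(1:2)])   # class value per patient column
markers <- df[-1, ]                  # the actual marker rows

# Build one small data frame per patient and stack them
long <- do.call(rbind, lapply(names(class_row), function(p) {
  data.frame(Patient = p,
             class = class_row[[p]],
             MarkerID = markers$MarkerID,
             MarkerName = markers$MarkerName,
             value = markers[[p]])
}))
```

With ~30,000 markers and >400 patients this creates a few hundred intermediate data frames, so it may be slower than melt; it is meant to show the shape of the result, not to be fast.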
You could then use subset:
subset(df2,class==0)
# Patient class MarkerID MarkerName value
#3 Patent.3 0 A123 X 3
#4 Patent.3 0 A124 Y 7
#5 Patient.1 0 A123 X 1
#6 Patient.1 0 A124 Y 5
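As for why the original subset(data, data[1,] == 0) attempt changed the number of rows: the second argument of subset is a condition on rows, so the comparison was used as a row filter. Columns are chosen through the select argument instead. A minimal sketch on the wide example df, assuming the same layout as above (keep and x are just illustrative names):

```r
# Same example data as above
df <- data.frame(MarkerID=c("Class","A123","A124"),
                 MarkerName=c("","X","Y"),
                 Patient.1=c(0,1,5),
                 Patent.2=c(1,2,6),
                 Patent.3=c(0,3,7),
                 Patient.4=c(1,4,8))

# TRUE for the two ID columns and for patients whose class (row 1) is 0
keep <- c(TRUE, TRUE, df[1, -(1:2)] == 0)

# `select` picks columns; all rows are kept because no row condition is given
x <- subset(df, select = names(df)[keep])
dim(x)
```

This gives the same columns as df[, keep], so it is mostly a matter of taste which form you use.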