Keeping a range in a data.frame (or table)

Time: 2021-06-15 21:05:07

I would like to do this

set.seed(667) 
df <- data.frame(a = sample(c(c(4,7),11,NA),  10, rep = TRUE), 
                 b = sample(c(1, 2, 3, NA, 5, 6),  10, rep=TRUE), 
                 c = sample(c(11, 12, 13, 14, 15, 16),  10, rep=TRUE))

but instead of getting this,

df
    a  b  c
1   4 NA 12
2   7  6 12
3  NA NA 14
4  11  1 16
5  NA  2 14
6  NA  3 13
7  11 NA 13
8  NA  6 15
9   7  3 16
10  7  5 16

I would like to get something like this, where some cells hold a range:

    a  b  c
1  4-7 NA 12
2  4-7  6 12
3  NA  NA 14
4  11   1 16
5  NA   2 14
6  NA   3 13
7  11  NA 13
8  NA   6 15
9  4-7  3 16
10 4-7  5 16

I'm confused and tired and asking for help.

Update after reading SimonO101's comments at 2013-09-09 22:30:14Z

I think my question could also be stated like this: I would like this data frame

data.frame(A = c(4:7, 9),B = c(1,2))

to show up like

  A   B
1 4:7 9
2   2 2

3 Solutions

#1


3  

Maybe you want this?

library(data.table)

d = data.table(A = list(c(4,7), 9),B = c(1,2))
#     A B
#1: 4,7 1
#2:   9 2
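
For readers without data.table, a roughly equivalent base R sketch follows. It assumes that wrapping the list in I() is acceptable; that wrapper keeps data.frame() from splitting the list into separate columns. The object name d2 is illustrative and not part of the original answer.

```r
# Base R: a list column holding either a pair of bounds or a single value.
# I() marks the list "as is" so data.frame() stores it as one column.
d2 <- data.frame(A = I(list(c(4, 7), 9)), B = c(1, 2))

d2$A[[1]]       # the first cell is the numeric vector c(4, 7)
lengths(d2$A)   # 2 for a range, 1 for an exact value
```

Each cell keeps its numeric contents, so the bounds remain available for computation rather than being reduced to a string.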

One more possibility is to store the unevaluated expression (it's really not clear what OP wants, so I'm just shooting in the dark here):

d = data.table(A = list(quote(4:7), 9), B = c(1,2))
#        A B
#1: <call> 1
#2:      9 2
d[,A]
#[[1]]
#4:7
#
#[[2]]
#[1] 9
lapply(d[, A], eval)
#[[1]]
#[1] 4 5 6 7
#
#[[2]]
#[1] 9

#2


1  

You could use cut to convert the values to whatever intervals you like, and also set appropriate labels for each of the intervals like so:

newdf <- sapply( df , cut , breaks = c(1:4,7.01,8:16) , labels = c(1:3,"4-7",8:16) , right = TRUE )
#      a     b     c   
# [1,] "3"   NA    "12"
# [2,] "4-7" "4-7" "12"
# [3,] NA    NA    "14"
# [4,] "11"  NA    "16"
# [5,] NA    "1"   "14"
# [6,] NA    "2"   "13"
# [7,] "11"  NA    "13"
# [8,] NA    "4-7" "15"
# [9,] "4-7" "2"   "16"
#[10,] "4-7" "4-7" "16"
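
A caveat on the call above: with right = TRUE each interval is closed on the right, so the value 4 falls in (3, 4] and receives the label "3", while a 1 lies outside the lowest interval (1, 2] and becomes NA. That is why the output shows "3" where the data has 4, and why column b looks shifted by one. Below is a minimal sketch of one way to shift the breaks so that each label matches its value. It assumes the values are whole numbers from 1 to 16 and that only 4 to 7 should collapse into a range; the helper name to_range and the vector x are illustrative, not from the original answer.

```r
# Hypothetical helper: label whole numbers 1..16, collapsing 4..7 into "4-7".
to_range <- function(x) {
  as.character(cut(x,
                   breaks = c(0:3, 7:16),   # (0,1], (1,2], (2,3], (3,7], (7,8], ...
                   labels = c(1:3, "4-7", 8:16),
                   right  = TRUE))
}

x <- c(4, 7, NA, 11, 1, 3)
to_range(x)
# "4-7" "4-7" NA "11" "1" "3"
```

To change only column a of the question's data frame, one could assign df$a <- to_range(df$a); the column then becomes character rather than numeric.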

#3


0  

What exactly do you want to do with these ranges?

One simple option is to replace each column with two columns, the first holding the minimum and the second the maximum (so you would have a.min, a.max, b.min, etc.). You could represent exact values either by setting the max to NA or by making the min and the max equal.
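
A minimal sketch of the two-column layout, using the convention that an exact value has min equal to max. The column names a.min and a.max follow the suggestion above; the object name df2, the data values, and the label vector are made up for illustration.

```r
# Each value of "a" is stored as a closed interval [a.min, a.max].
# Exact values use a.min == a.max; missing values are NA in both columns.
df2 <- data.frame(a.min = c(4, 4, NA, 11),
                  a.max = c(7, 7, NA, 11),
                  b     = c(NA, 6, NA, 1))

# Build a display label only when needed; the numeric columns stay usable.
label <- ifelse(is.na(df2$a.min), NA_character_,
         ifelse(df2$a.min == df2$a.max,
                as.character(df2$a.min),
                paste(df2$a.min, df2$a.max, sep = "-")))
label
# "4-7" "4-7" NA "11"

# Rows whose interval contains 5
hits <- df2[!is.na(df2$a.min) & df2$a.min <= 5 & df2$a.max >= 5, ]
```

The appeal of this layout is that ordinary comparisons and filters keep working on the bounds, and the "4-7" string is only a presentation detail.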

Another option is to create a new object stored as a list, with each element (one per row) being a vector of either length 1 (an exact value) or length 2 (a range). Write a format method for your object that returns a character vector containing either the single value or the range (e.g. 4-7); when you print the data frame, it calls the format function and ends up printing something like what you show above. You will also need other methods for working with those columns in whatever way you plan to use this data.
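
A rough sketch of this second option, assuming a small S3 class is acceptable. The class name range_col, its constructor, and the object names a and df3 are hypothetical and chosen only for illustration. Only format and a basic subsetting method are shown; any arithmetic or comparison on these columns would need further methods.

```r
# Hypothetical S3 class "range_col": a list whose elements are numeric
# vectors of length 1 (exact value) or length 2 (lower and upper bound).
range_col <- function(...) {
  structure(list(...), class = "range_col")
}

# format() method: one string per element, e.g. "4-7", "11" or NA.
format.range_col <- function(x, ...) {
  vapply(unclass(x), function(el) {
    if (all(is.na(el))) NA_character_
    else paste(el, collapse = "-")
  }, character(1))
}

# Keep the class when taking a subset of elements.
`[.range_col` <- function(x, i) {
  structure(unclass(x)[i], class = "range_col")
}

a <- range_col(c(4, 7), 11, NA, c(4, 7))
format(a)
# "4-7" "11" NA "4-7"

# Attach it as a column with `$<-` (wrapping in I() would route formatting
# through format.AsIs instead). Printing the data frame formats each
# column with format(), which dispatches to format.range_col here.
df3 <- data.frame(b = c(NA, 1, 2, 5))
df3$a <- a
print(df3)
```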
