
Time: 2021-06-15 21:05:07

I would like to do something like this:


df <- data.frame(a = sample(c(c(4,7),11,NA),  10, replace = TRUE), 
                 b = sample(c(1, 2, 3, NA, 5, 6),  10, replace = TRUE), 
                 c = sample(c(11, 12, 13, 14, 15, 16),  10, replace = TRUE))

but instead of getting this,


    a  b  c
1   4 NA 12
2   7  6 12
3  NA NA 14
4  11  1 16
5  NA  2 14
6  NA  3 13
7  11 NA 13
8  NA  6 15
9   7  3 16
10  7  5 16

I would like to get something like this, where some values are a range:


    a  b  c
1  4-7 NA 12
2  4-7  6 12
3  NA  NA 14
4  11   1 16
5  NA   2 14
6  NA   3 13
7  11  NA 13
8  NA   6 15
9  4-7  3 16
10 4-7  5 16

I'm confused and tired and asking for help.


Update after reading SimonO101's comments at 2013-09-09 22:30:14Z

I think my question could also be stated like this: I would like this data frame


data.frame(A = c(4:7, 9),B = c(1,2))

to show up like


    A B
1 4:7 1
2   9 2

3 solutions



Maybe you want this?



library(data.table)

# A list column: each cell of A can hold a vector of any length
d = data.table(A = list(c(4,7), 9), B = c(1,2))
#     A B
#1: 4,7 1
#2:   9 2

One more possibility is to store the unevaluated expression (it's really not clear what OP wants, so I'm just shooting in the dark here):


d = data.table(A = list(quote(4:7), 9), B = c(1,2))
#        A B
#1: <call> 1
#2:      9 2

# Evaluate each stored expression to recover the values
lapply(d[, A], eval)
#[[1]]
#[1] 4 5 6 7
#
#[[2]]
#[1] 9



You could use cut to convert the values to whatever intervals you like, and also set appropriate labels for each of the intervals like so:


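# Intervals are (0,1], (1,2], (2,3], (3,7], (7,8], ..., (15,16],
# so the integers 4 to 7 all receive the label "4-7".
# The result is a character matrix, not a data frame.
newdf <- sapply( df , cut , breaks = c(0:3,7:16) , labels = c(1:3,"4-7",8:16) , right = TRUE )
#      a     b     c   
# [1,] "4-7" NA    "12"
# [2,] "4-7" "4-7" "12"
# [3,] NA    NA    "14"
# [4,] "11"  "1"   "16"
# [5,] NA    "2"   "14"
# [6,] NA    "3"   "13"
# [7,] "11"  NA    "13"
# [8,] NA    "4-7" "15"
# [9,] "4-7" "3"   "16"
#[10,] "4-7" "4-7" "16"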



What exactly do you want to do with these ranges?


One simple option is to replace each column with two columns, one holding the minimum and one holding the maximum (so you would have a.min, a.max, b.min, etc.). You could represent an exact value either by setting the max to NA or by making the min and the max equal.
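
A minimal sketch of the two-column layout, assuming the convention that min equal to max marks an exact value. The data and the helper name a_label are made up for illustration; the label is built only for display, while the numeric columns stay usable for comparisons.

```r
# Hypothetical data: column a is stored as a.min / a.max.
# Convention assumed here: a.min == a.max means an exact value.
df <- data.frame(
  a.min = c(4, 4, NA, 11),
  a.max = c(7, 7, NA, 11),
  b     = c(NA, 6, NA, 1)
)

# Build a display label: NA stays NA, exact values print as-is,
# true ranges print as "min-max".
a_label <- ifelse(
  is.na(df$a.min), NA_character_,
  ifelse(df$a.min == df$a.max,
         as.character(df$a.min),
         paste(df$a.min, df$a.max, sep = "-"))
)
a_label
# [1] "4-7" "4-7" NA    "11"

# The numeric columns still support range queries,
# e.g. rows whose range contains the value 5.
contains_5 <- which(df$a.min <= 5 & df$a.max >= 5)
contains_5
# [1] 1 2
```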


Another option is to create a new object stored as a list, with each element being a vector of either length 1 (an exact value) or length 2 (a range). Write a format method for your object that returns a character vector containing either the single value or the range (e.g. 4-7); printing the data frame calls format and so displays something like what you show above. You will also need other methods for working with those columns in whatever way you plan to use this data.
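
A rough sketch of the list-with-a-format-method idea. The class name ranges and its constructor are hypothetical, and only format and `[` are defined here; a real implementation would likely need more methods (comparison, c, and so on).

```r
# Hypothetical S3 class "ranges": a list whose elements are
# vectors of length 1 (exact value) or length 2 (min and max).
ranges <- function(...) structure(list(...), class = "ranges")

# format() method: "4-7" for a range, the plain value otherwise.
# print.data.frame() formats each column with format(), so this
# controls how the column is displayed.
format.ranges <- function(x, ...) {
  vapply(unclass(x), function(v) paste(v, collapse = "-"), character(1))
}

# Keep the class when elements are subset, e.g. by df[1:2, ].
`[.ranges` <- function(x, i) structure(unclass(x)[i], class = "ranges")

df <- data.frame(b = c(NA, 6, NA, 1), c = c(12, 12, 14, 16))
df$a <- ranges(c(4, 7), c(4, 7), NA, 11)

format(df$a)
# [1] "4-7" "4-7" "NA"  "11"

print(df)   # column a is displayed via format.ranges()
```

Because the underlying values are kept as numeric vectors, the endpoints remain available, for example sapply(unclass(df$a), min) for the lower bounds.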




