I'd like to split a dataframe in 4 equals parts, because I'd like to use the 4 cores of my computer.
我想把一个dataframe分成4个部分,因为我想用我电脑的4个核心。
I did this :
我这样做:
df2 <- split(df, 1:4)
unsplit(df2, f=1:4)
and that
这
df2 <- split(df, 1:4)
unsplit(df2, f=c('1','2','3','4')
But the unsplit function did not work, I have these warnings messages
但是,unsplit函数不起作用,我有这些警告消息。
1: In split.default(seq_along(x), f, drop = drop, ...) :
data length is not a multiple of split variable
...
Do you have an idea of the reason ?
你知道原因吗?
2 个解决方案
#1
7
How many rows in df
? You will get that warning if the number of rows in your table is not divisible by 4. I think you are using the split factor f
incorrectly, unless what you want to do is put each subsequent row into a different split data.frame.
df有多少行?如果表中的行数不能被4整除,您将得到该警告。我认为您是在错误地使用分割因子f,除非您想要做的是将随后的每一行放入不同的分割数据。
If you really want to split your data into 4 dataframes. one row after the other then make your splitting factor the same size as the number of rows in your dataframe using rep_len
like this:
如果你真的想把你的数据分成4个dataframes。一个接一个,然后让你的分裂因子与你的dataframe中使用rep_len的行数相同:
## Split like this:
split(df , f = rep_len(1:4, nrow(df) ) )
## Unsplit like this:
unsplit( split(df , f = rep_len(1:4, nrow(df) ) ) , f = rep_len(1:4,nrow(df) ) )
Hopefully this example illustrates why the error occurs and how to avoid it (i.e. use a proper splitting factor!).
希望这个示例说明了错误发生的原因以及如何避免错误(即使用适当的分解因子!)
## Want to split our data.frame into two halves, but rows not divisible by 2
df <- data.frame( x = runif(5) )
df
## Splitting still works but...
## We get a warning because the split factor 'f' was not recycled as a multiple of it's length
split( df , f = 1:2 )
#$`1`
# x
#1 0.6970968
#3 0.5614762
#5 0.5910995
#$`2`
# x
#2 0.6206521
#4 0.1798006
Warning message:
In split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) :
data length is not a multiple of split variable
## Instead let's use the same split levels (1:2)...
## but make it equal to the length of the rows in the table:
splt <- rep_len( 1:2 , nrow(df) )
splt
#[1] 1 2 1 2 1
## Split works, and f is not recycled because there are
## the same number of values in 'f' as rows in the table
split( df , f = splt )
#$`1`
# x
#1 0.6970968
#3 0.5614762
#5 0.5910995
#$`2`
# x
#2 0.6206521
#4 0.1798006
## And unsplitting then works as expected and reconstructs our original data.frame
unsplit( split( df , f = splt ) , f = splt )
# x
#1 0.6970968
#2 0.6206521
#3 0.5614762
#4 0.1798006
#5 0.5910995
#2
0
In the R language 'split' example . . .
在R语言的“分割”的例子中…
aq <- airquality
g <- aq$Month
l <- split(aq,g)
After the 'scale' function is executed
在“scale”函数执行之后。
l <- lapply(l, transform, Ozone = scale(Ozone))
I am guessing that at one time in R history the function 'scale' did not add extra attributes to the column it is modifying.
我猜想,在R历史上的某个时候,函数的“scale”并没有向它所修改的列添加额外的属性。
..$ Ozone : num ...
.. ..- attr(*, "scaled:center")= num 29.4
.. ..- attr(*, "scaled:scale")= num 18.2
As seen in here . . .
如图所示。
> str(l)
List of 5
$ 5:'data.frame': 31 obs. of 6 variables:
..$ Ozone : num [1:31, 1] 0.782 0.557 -0.523 -0.253 NA ...
.. ..- attr(*, "scaled:center")= num 23.6
.. ..- attr(*, "scaled:scale")= num 22.2
..$ Solar.R: int [1:31] 190 118 149 313 NA NA 299 99 19 194 ...
..$ Wind : num [1:31] 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
..$ Temp : int [1:31] 67 72 74 62 56 66 65 59 61 69 ...
..$ Month : int [1:31] 5 5 5 5 5 5 5 5 5 5 ...
..$ Day : int [1:31] 1 2 3 4 5 6 7 8 9 10 ...
$ 6:'data.frame': 30 obs. of 6 variables:
..$ Ozone : num [1:30, 1] NA NA NA NA NA ...
.. ..- attr(*, "scaled:center")= num 29.4
.. ..- attr(*, "scaled:scale")= num 18.2
..$ Solar.R: int [1:30] 286 287 242 186 220 264 127 273 291 323 ...
..$ Wind : num [1:30] 8.6 9.7 16.1 9.2 8.6 14.3 9.7 6.9 13.8 11.5 ...
..$ Temp : int [1:30] 78 74 67 84 85 79 82 87 90 87 ...
..$ Month : int [1:30] 6 6 6 6 6 6 6 6 6 6 ...
..$ Day : int [1:30] 1 2 3 4 5 6 7 8 9 10 ...
$ 7:'data.frame': 31 obs. of 6 variables:
..$ Ozone : num [1:31, 1] 2.399 -0.32 -0.857 NA 0.154 ...
.. ..- attr(*, "scaled:center")= num 59.1
.. ..- attr(*, "scaled:scale")= num 31.6
..$ Solar.R: int [1:31] 269 248 236 101 175 314 276 267 272 175 ...
..$ Wind : num [1:31] 4.1 9.2 9.2 10.9 4.6 10.9 5.1 6.3 5.7 7.4 ...
..$ Temp : int [1:31] 84 85 81 84 83 83 88 92 92 89 ...
..$ Month : int [1:31] 7 7 7 7 7 7 7 7 7 7 ...
..$ Day : int [1:31] 1 2 3 4 5 6 7 8 9 10 ...
$ 8:'data.frame': 31 obs. of 6 variables:
..$ Ozone : num [1:31, 1] -0.528 -1.284 -1.108 0.455 -0.629 ...
.. ..- attr(*, "scaled:center")= num 60
.. ..- attr(*, "scaled:scale")= num 39.7
..$ Solar.R: int [1:31] 83 24 77 NA NA NA 255 229 207 222 ...
..$ Wind : num [1:31] 6.9 13.8 7.4 6.9 7.4 4.6 4 10.3 8 8.6 ...
..$ Temp : int [1:31] 81 81 82 86 85 87 89 90 90 92 ...
..$ Month : int [1:31] 8 8 8 8 8 8 8 8 8 8 ...
..$ Day : int [1:31] 1 2 3 4 5 6 7 8 9 10 ...
$ 9:'data.frame': 30 obs. of 6 variables:
..$ Ozone : num [1:30, 1] 2.674 1.928 1.721 2.467 0.644 ...
.. ..- attr(*, "scaled:center")= num 31.4
.. ..- attr(*, "scaled:scale")= num 24.1
..$ Solar.R: int [1:30] 167 197 183 189 95 92 252 220 230 259 ...
..$ Wind : num [1:30] 6.9 5.1 2.8 4.6 7.4 15.5 10.9 10.3 10.9 9.7 ...
..$ Temp : int [1:30] 91 92 93 93 87 84 80 78 75 73 ...
..$ Month : int [1:30] 9 9 9 9 9 9 9 9 9 9 ...
..$ Day : int [1:30] 1 2 3 4 5 6 7 8 9 10 ...
But now it does add those attributes
但现在它确实添加了这些属性。
..$ Ozone : num ...
.. ..- attr(*, "scaled:center")= num 29.4
.. ..- attr(*, "scaled:scale")= num 18.2
and the very simple 'unsplit' function is not programmed to handle those attributes.
而非常简单的“unsplit”函数并不是用来处理这些属性的。
> unsplit(l,g)
Error in xj[i, , drop = FALSE] : (subscript) logical subscript too long
The (direct and simple) solution is to get rid of those attributes.
(直接和简单的)解决方案是去掉这些属性。
attributes(l[[1]]$Ozone) <- NULL
attributes(l[[2]]$Ozone) <- NULL
attributes(l[[3]]$Ozone) <- NULL
attributes(l[[4]]$Ozone) <- NULL
attributes(l[[5]]$Ozone) <- NULL
Then try to unsplit again.
然后再试着再次分开。
str( unsplit(l,g) )
> str( unsplit(l,g) )
'data.frame': 153 obs. of 6 variables:
$ Ozone : num 0.782 0.557 -0.523 -0.253 NA ...
$ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
$ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
$ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...
So, now it works.
所以,现在它的工作原理。
Andre Mikulec
安德烈Mikulec
#1
7
How many rows in df
? You will get that warning if the number of rows in your table is not divisible by 4. I think you are using the split factor f
incorrectly, unless what you want to do is put each subsequent row into a different split data.frame.
df有多少行?如果表中的行数不能被4整除,您将得到该警告。我认为您是在错误地使用分割因子f,除非您想要做的是将随后的每一行放入不同的分割数据。
If you really want to split your data into 4 dataframes. one row after the other then make your splitting factor the same size as the number of rows in your dataframe using rep_len
like this:
如果你真的想把你的数据分成4个dataframes。一个接一个,然后让你的分裂因子与你的dataframe中使用rep_len的行数相同:
## Split like this:
split(df , f = rep_len(1:4, nrow(df) ) )
## Unsplit like this:
unsplit( split(df , f = rep_len(1:4, nrow(df) ) ) , f = rep_len(1:4,nrow(df) ) )
Hopefully this example illustrates why the error occurs and how to avoid it (i.e. use a proper splitting factor!).
希望这个示例说明了错误发生的原因以及如何避免错误(即使用适当的分解因子!)
## Want to split our data.frame into two halves, but rows not divisible by 2
df <- data.frame( x = runif(5) )
df
## Splitting still works but...
## We get a warning because the split factor 'f' was not recycled as a multiple of it's length
split( df , f = 1:2 )
#$`1`
# x
#1 0.6970968
#3 0.5614762
#5 0.5910995
#$`2`
# x
#2 0.6206521
#4 0.1798006
Warning message:
In split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) :
data length is not a multiple of split variable
## Instead let's use the same split levels (1:2)...
## but make it equal to the length of the rows in the table:
splt <- rep_len( 1:2 , nrow(df) )
splt
#[1] 1 2 1 2 1
## Split works, and f is not recycled because there are
## the same number of values in 'f' as rows in the table
split( df , f = splt )
#$`1`
# x
#1 0.6970968
#3 0.5614762
#5 0.5910995
#$`2`
# x
#2 0.6206521
#4 0.1798006
## And unsplitting then works as expected and reconstructs our original data.frame
unsplit( split( df , f = splt ) , f = splt )
# x
#1 0.6970968
#2 0.6206521
#3 0.5614762
#4 0.1798006
#5 0.5910995
#2
0
In the R language 'split' example . . .
在R语言的“分割”的例子中…
aq <- airquality
g <- aq$Month
l <- split(aq,g)
After the 'scale' function is executed
在“scale”函数执行之后。
l <- lapply(l, transform, Ozone = scale(Ozone))
I am guessing that at one time in R history the function 'scale' did not add extra attributes to the column it is modifying.
我猜想,在R历史上的某个时候,函数的“scale”并没有向它所修改的列添加额外的属性。
..$ Ozone : num ...
.. ..- attr(*, "scaled:center")= num 29.4
.. ..- attr(*, "scaled:scale")= num 18.2
As seen in here . . .
如图所示。
> str(l)
List of 5
$ 5:'data.frame': 31 obs. of 6 variables:
..$ Ozone : num [1:31, 1] 0.782 0.557 -0.523 -0.253 NA ...
.. ..- attr(*, "scaled:center")= num 23.6
.. ..- attr(*, "scaled:scale")= num 22.2
..$ Solar.R: int [1:31] 190 118 149 313 NA NA 299 99 19 194 ...
..$ Wind : num [1:31] 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
..$ Temp : int [1:31] 67 72 74 62 56 66 65 59 61 69 ...
..$ Month : int [1:31] 5 5 5 5 5 5 5 5 5 5 ...
..$ Day : int [1:31] 1 2 3 4 5 6 7 8 9 10 ...
$ 6:'data.frame': 30 obs. of 6 variables:
..$ Ozone : num [1:30, 1] NA NA NA NA NA ...
.. ..- attr(*, "scaled:center")= num 29.4
.. ..- attr(*, "scaled:scale")= num 18.2
..$ Solar.R: int [1:30] 286 287 242 186 220 264 127 273 291 323 ...
..$ Wind : num [1:30] 8.6 9.7 16.1 9.2 8.6 14.3 9.7 6.9 13.8 11.5 ...
..$ Temp : int [1:30] 78 74 67 84 85 79 82 87 90 87 ...
..$ Month : int [1:30] 6 6 6 6 6 6 6 6 6 6 ...
..$ Day : int [1:30] 1 2 3 4 5 6 7 8 9 10 ...
$ 7:'data.frame': 31 obs. of 6 variables:
..$ Ozone : num [1:31, 1] 2.399 -0.32 -0.857 NA 0.154 ...
.. ..- attr(*, "scaled:center")= num 59.1
.. ..- attr(*, "scaled:scale")= num 31.6
..$ Solar.R: int [1:31] 269 248 236 101 175 314 276 267 272 175 ...
..$ Wind : num [1:31] 4.1 9.2 9.2 10.9 4.6 10.9 5.1 6.3 5.7 7.4 ...
..$ Temp : int [1:31] 84 85 81 84 83 83 88 92 92 89 ...
..$ Month : int [1:31] 7 7 7 7 7 7 7 7 7 7 ...
..$ Day : int [1:31] 1 2 3 4 5 6 7 8 9 10 ...
$ 8:'data.frame': 31 obs. of 6 variables:
..$ Ozone : num [1:31, 1] -0.528 -1.284 -1.108 0.455 -0.629 ...
.. ..- attr(*, "scaled:center")= num 60
.. ..- attr(*, "scaled:scale")= num 39.7
..$ Solar.R: int [1:31] 83 24 77 NA NA NA 255 229 207 222 ...
..$ Wind : num [1:31] 6.9 13.8 7.4 6.9 7.4 4.6 4 10.3 8 8.6 ...
..$ Temp : int [1:31] 81 81 82 86 85 87 89 90 90 92 ...
..$ Month : int [1:31] 8 8 8 8 8 8 8 8 8 8 ...
..$ Day : int [1:31] 1 2 3 4 5 6 7 8 9 10 ...
$ 9:'data.frame': 30 obs. of 6 variables:
..$ Ozone : num [1:30, 1] 2.674 1.928 1.721 2.467 0.644 ...
.. ..- attr(*, "scaled:center")= num 31.4
.. ..- attr(*, "scaled:scale")= num 24.1
..$ Solar.R: int [1:30] 167 197 183 189 95 92 252 220 230 259 ...
..$ Wind : num [1:30] 6.9 5.1 2.8 4.6 7.4 15.5 10.9 10.3 10.9 9.7 ...
..$ Temp : int [1:30] 91 92 93 93 87 84 80 78 75 73 ...
..$ Month : int [1:30] 9 9 9 9 9 9 9 9 9 9 ...
..$ Day : int [1:30] 1 2 3 4 5 6 7 8 9 10 ...
But now it does add those attributes
但现在它确实添加了这些属性。
..$ Ozone : num ...
.. ..- attr(*, "scaled:center")= num 29.4
.. ..- attr(*, "scaled:scale")= num 18.2
and the very simple 'unsplit' function is not programmed to handle those attributes.
而非常简单的“unsplit”函数并不是用来处理这些属性的。
> unsplit(l,g)
Error in xj[i, , drop = FALSE] : (subscript) logical subscript too long
The (direct and simple) solution is to get rid of those attributes.
(直接和简单的)解决方案是去掉这些属性。
attributes(l[[1]]$Ozone) <- NULL
attributes(l[[2]]$Ozone) <- NULL
attributes(l[[3]]$Ozone) <- NULL
attributes(l[[4]]$Ozone) <- NULL
attributes(l[[5]]$Ozone) <- NULL
Then try to unsplit again.
然后再试着再次分开。
str( unsplit(l,g) )
> str( unsplit(l,g) )
'data.frame': 153 obs. of 6 variables:
$ Ozone : num 0.782 0.557 -0.523 -0.253 NA ...
$ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
$ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
$ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...
So, now it works.
所以,现在它的工作原理。
Andre Mikulec
安德烈Mikulec