I have a matrix (first.transactions.data) with two columns, id and date, and 12499 rows.
id date
1 19164958 2001-09-01
2 39244924 2001-11-01
3 39578413 2001-09-01
4 40992265 2001-11-01
5 43061957 2001-09-01
6 47196850 2001-11-01
7 51236987 2001-11-01
8 51326773 2001-09-01
9 54271247 2001-09-01
10 70765025 2001-09-01
11 70781923 2001-09-01
12 70782614 2001-09-01
13 70797166 2001-09-01
14 70992941 2001-09-01
15 70995813 2001-09-01
Now I want to write a function that divides this matrix into n equally long sub-matrices. E.g., with n = 3: a matrix 1/A that contains rows 1 to 5, a second matrix 2/B that contains rows 6 to 10, and a last matrix 3/C that contains rows 11 to 15.
I've tried using split or cut but I encounter several problems with them. E.g.
sub <- split(first.transactions.data, cut(first.transactions.data$id, 10))
Results in:
$`(1.91e+07,2.61e+07]`
id date
1: 19164958 2001-09-01
$`(2.61e+07,3.3e+07]`
Empty data.table (0 rows) of 2 cols: id,date
$`(3.3e+07,4e+07]`
id date
1: 39244924 2001-11-01
2: 39578413 2001-09-01
$`(4e+07,4.7e+07]`
id date
1: 40992265 2001-11-01
2: 43061957 2001-09-01
or sub <- split(first.transactions.data, sample(rep(1:29, 431)))
yields:
$`1`
id date
1: 71189663 2001-09-01
2: 71307343 2001-09-01
3: 71361917 2001-09-01
4: 71410408 2001-09-01
5: 71518508 2001-09-01
---
427: 88698009 2002-01-01
428: 88698658 2002-01-01
429: 88700541 2002-01-01
430: 88700697 2002-01-01
431: 88701106 2002-01-01
$`2`
id date
1: 71172578 2001-09-01
2: 71608016 2001-09-01
3: 71647277 2001-09-01
4: 71834223 2001-09-01
5: 71998882 2001-09-01
---
427: 88702992 2002-01-01
428: 88703276 2002-01-01
429: 88703439 2002-01-01
430: 88704952 2002-01-01
431: 88705136 2002-01-01
The first command doesn't produce equally long parts (I think it's using quantiles rather than the number of observations). The second command seems to fill each subset with randomly chosen rows of the original matrix. Additionally, I have to specify both the number of parts and how long the subsets are going to be. Finally, I don't know how to access the content of each sub-matrix.
I want to create those sub-matrices to use them as cohorts. With the cohorts I later want to check in the full data set how many of the IDs are still alive in later periods to calculate the individual's retention rate by cohort.
Can I use the commands split and cut for this, do I need others or is my approach even infeasible in R?
Thank you very much for your time and help.
Patrik
PS: Sorry for my presentation of the matrix. I can't figure out how to edit it properly.
1 solution
#1
1
You indeed need split:
split(first.transactions.data, rep(1:3, each = 5))
(adjust the numbers to suit your needs, maybe make them nrow-dependent)
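As a rough sketch of what an nrow-dependent version might look like: the snippet below rebuilds the 15 rows shown in the question as a plain data.frame (an assumption for illustration; the original object appears to be a data.table), derives the group labels from nrow() and a chosen n, and then shows how the pieces can be accessed. The names n, nr and grp are made up for this example.

```r
# Toy stand-in: the 15 rows shown in the question, as a plain data.frame.
first.transactions.data <- data.frame(
  id = c(19164958, 39244924, 39578413, 40992265, 43061957,
         47196850, 51236987, 51326773, 54271247, 70765025,
         70781923, 70782614, 70797166, 70992941, 70995813),
  date = as.Date(c("2001-09-01", "2001-11-01", "2001-09-01", "2001-11-01",
                   "2001-09-01", "2001-11-01", "2001-11-01", "2001-09-01",
                   "2001-09-01", "2001-09-01", "2001-09-01", "2001-09-01",
                   "2001-09-01", "2001-09-01", "2001-09-01"))
)

n  <- 3                               # number of chunks wanted (hypothetical name)
nr <- nrow(first.transactions.data)   # total number of rows

# Consecutive group labels 1..n in row order; when nr is not a
# multiple of n, chunk sizes differ by at most one row.
grp <- ceiling(seq_len(nr) * n / nr)

# split() returns a named list with one data.frame per label.
sub <- split(first.transactions.data, grp)

sapply(sub, nrow)   # number of rows in each chunk
sub[[1]]            # first chunk, accessed by position
sub[["2"]]          # second chunk, accessed by name
```

Because the labels follow row order, the chunks are consecutive blocks rather than random samples, so the data should be sorted the way the cohorts are meant to be formed before splitting. For the full data set, setting n <- 29 should give 29 chunks of 431 rows each, since 29 * 431 = 12499.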