How would you do this in R? (This is a data preparation task.) From the adverse events dataset, derive the treatment-emergent adverse events dataset: for each body system and preferred term, one row per patient such that either:
- That adverse event occurred in the post-baseline period but not the baseline period, or
- The event did occur in the baseline period, but it occurred post-baseline at a higher severity than was observed during baseline
Variables:
severity = 1, 2, 3 (integer code for mild, moderate, severe)
patid, visit, bodysys, prefterm
Baseline rows are rows with visit <= 2.
Post-baseline rows are rows with visit > 2.
Here is the data prep in SAS, in 23 lines of code:
data base1_dset ;
set ae_dset ;
if visit<=2 ;
proc sort data=base1_dset ;
by patid bodysys prefterm severity ;
data base2_dset ;
set base1_dset ;
by patid bodysys prefterm severity ;
if last.prefterm ;
data post1_dset ;
set ae_dset ;
if visit> 2 ;
proc sort data=post1_dset ;
by patid bodysys prefterm severity ;
data post2_dset ;
set post1_dset ;
by patid bodysys prefterm severity ;
if last.prefterm ;
rename severity = severity2 ;
data new_ae_dset ;
merge base2_dset post2_dset ;
by patid bodysys prefterm ;
if severity2>severity or severity=. ;
And here is the data prep in Vilno Data Transformation, in 12 lines of code (for more, see http://fivetimesfaster.blogspot.com):
inlist ae_dset ;
if not ( visit<=2 ) deleterow ;
select severity=max(severity) by patid bodysys prefterm ;
sendoff(base2_dset) patid bodysys prefterm severity ;
inlist ae_dset ;
if not ( visit>2 ) deleterow ;
select severity2=max(severity) by patid bodysys prefterm ;
sendoff(post2_dset) patid bodysys prefterm severity2 ;
inlist base2_dset post2_dset ;
mergeby patid bodysys prefterm ;
if not ( severity2>severity or severity is null ) deleterow ;
sendoff(new_ae_dset) patid bodysys prefterm severity2 ;
How would you do this in R?
Thanks, Robert
PS: The formatting of the code examples is horrendous. Why is * ignoring some of my return/newline characters?
1 answer
#1
This seems to do more or less what you are asking (at least if the variables are numeric). There will be better ways.
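# Split the data into baseline (visit <= 2) and post-baseline (visit > 2) rows
smallvisit <- ae_dset[ ae_dset$visit <= 2, ]
bigvisit <- ae_dset[ ae_dset$visit > 2, ]
nams <- c("patid", "bodysys", "prefterm")
# Sort the baseline rows by the key columns, then split them into one group per key combination
smallvisitsorted <- smallvisit[ do.call( order, smallvisit[nams] ), ]
smallvisitsplit <- split( smallvisitsorted, smallvisitsorted[nams], drop=TRUE )
# Keep the last row of each baseline group (the sort is on the keys only,
# so this is not necessarily the row with the highest severity)
last <- function(a){ tail( a, 1 ) }
smallvisitlast <- as.data.frame( t( sapply( smallvisitsplit, last ) ) )
# Left-join the baseline rows onto the post-baseline rows, then keep the rows
# with no baseline match or with a higher severity than the matched baseline row
mergedvisit <- merge( bigvisit, smallvisitlast, by=nams, all.x=TRUE )
new_ae_dset <- mergedvisit[ mergedvisit$severity.x > mergedvisit$severity.y |
                            is.na( mergedvisit$severity.y ), ]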
For example, if ae_dset looks like

   patid bodysys prefterm visit severity
 1     5       9        2     1        3
 2    22       1        5     5        2
 3    11       2        9     3        3
 4    11       2        9     2        2
 5    22       3        3     3        1
 6     3       4        6     1        2
 7    22       3        3     2        2
 8    22       3        3     4        3
 9    11       2        9     1        1
10     3       3        6     5        2
11     4       3        7     7        3
then, using this code, new_ae_dset will look like

  patid bodysys prefterm visit.x severity.x visit.y severity.y
1     3       3        6       5          2      NA         NA
2     4       3        7       7          3      NA         NA
3    11       2        9       3          3       1          1
4    22       1        5       5          2      NA         NA
6    22       3        3       4          3       2          2
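
One possible variant, closer to the SAS and Vilno versions in the question, is sketched below. It is not part of the original answer. The code above sorts the baseline rows on the key columns only, so the "last" baseline row is not guaranteed to carry the highest baseline severity, and it keeps every qualifying post-baseline row rather than one row per patient and term. The sketch instead takes the maximum severity per patient, body system and preferred term in each period with aggregate(), then merges the two summaries. The names keys, base_max, post_max, merged and emergent are made up for illustration, and the example data is retyped from the table above.

```r
# Example data, retyped from the ae_dset table shown above
ae_dset <- data.frame(
  patid    = c(5, 22, 11, 11, 22, 3, 22, 22, 11, 3, 4),
  bodysys  = c(9, 1, 2, 2, 3, 4, 3, 3, 2, 3, 3),
  prefterm = c(2, 5, 9, 9, 3, 6, 3, 3, 9, 6, 7),
  visit    = c(1, 5, 3, 2, 3, 1, 2, 4, 1, 5, 7),
  severity = c(3, 2, 3, 2, 1, 2, 2, 3, 1, 2, 3)
)

# Illustrative name for the merge keys (not from the original answer)
keys <- c("patid", "bodysys", "prefterm")

# Highest severity per patient / body system / preferred term in each period
base_max <- aggregate(severity ~ patid + bodysys + prefterm,
                      data = ae_dset[ae_dset$visit <= 2, ], FUN = max)
post_max <- aggregate(severity ~ patid + bodysys + prefterm,
                      data = ae_dset[ae_dset$visit > 2, ], FUN = max)
names(post_max)[names(post_max) == "severity"] <- "severity2"

# Left join: every post-baseline group is kept; severity is NA when the
# event never occurred at baseline
merged <- merge(post_max, base_max, by = keys, all.x = TRUE)

# Treatment-emergent: no baseline occurrence, or a post-baseline severity
# above the worst baseline severity
emergent <- is.na(merged$severity) | merged$severity2 > merged$severity
new_ae_dset <- merged[emergent, c(keys, "severity2")]
```

This yields at most one row per patient, body system and preferred term, with severity2 holding the worst post-baseline severity. As far as I know, aggregate() stops with an error if one of the two subsets has no rows at all, so real data with an empty baseline or post-baseline period would need a guard around those calls.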