I have a large data frame with about 150 columns.
A range of those columns (DX1, DX2, ... through DX30) holds diagnosis codes. The codes are numbers, but they are categorical variables that correspond to medical diagnoses in the ICD-9 coding system.
I have working code to search a single column, but need to search all 30 columns to see if any of the columns contain a code within the specified range (DXrange).
The core dataframe looks like:
Case DX1 DX2 DX3 DX4...DX30
1 123 345 567 99 12
2 234 345 NA NA NA
3 456 567 789 345 34
Here is the working code:
## Defines a range of codes to search for
DXrange <- factor(41000:41091, levels = levels(core$DX1))
## Search for the DXrange codes in column DX1.
core$IndexEvent <- core$DX1 %in% DXrange & substr(core$DX1, 5, 5) != 2
## What is the frequency of the IndexEvent?
cat("Frequency of IndexEvent : \n"); table(core$IndexEvent)
The working code is adapted from "Calculating Nationwide Readmissions Database (NRD) Variances, Report # 2017-01"
I could run this for each DX column and then sum them for a final IndexEvent total, but this is not very efficient.
2 Solutions
#1
I would first normalize the data (reshape it to long format, one row per case and diagnosis column) before searching the codes, as in the following example:
set.seed(314)
df <- data.frame(id = 1:5,
DX1 = sample(1:10,5),
DX2 = sample(1:10,5),
DX3 = sample(1:10,5))
require(dplyr)
require(tidyr)
df %>%
gather(key,value,-id) %>%
filter(value %in% 1:2)
or with just base R
df.long <- do.call(rbind,lapply(df[,2:4],function(x) data.frame(id = df$id, DX = x)))
df.long[df.long$DX %in% 1:2, ]
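To get back to the one-flag-per-case IndexEvent from the question, the per-column test can also be applied across all DX columns at once and collapsed by row. The following is only a minimal sketch: the toy `core` data and its values are made up for illustration, the names `dx_cols` and `hits` are hypothetical, and it assumes the codes are stored as numbers or strings.

```r
# Toy stand-in for `core`; the values are invented for illustration only.
core <- data.frame(
  Case = 1:3,
  DX1  = c(41001, 234, 456),
  DX2  = c(345, 41071, 567),
  DX3  = c(567, NA, 41002)
)
DXrange <- 41000:41091

# Pick up every diagnosis column by name (DX1 ... DX30 in the real data).
dx_cols <- grep("^DX[0-9]+$", names(core), value = TRUE)

# Same condition as the question's single-column code, applied per column:
# code is in DXrange and its fifth character is not "2".
hits <- sapply(core[dx_cols], function(x) {
  x %in% DXrange & substr(x, 5, 5) != "2"
})

# A case is an index event if any of its DX columns matched.
core$IndexEvent <- rowSums(hits) > 0

cat("Frequency of IndexEvent : \n"); table(core$IndexEvent)
```

Because `%in%` returns FALSE for NA, cases with missing diagnosis slots are simply not matched on those columns.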
#2
We could use filter_at with any_vars:
df %>%
filter_at(vars(matches("DX\\d+")), any_vars(. %in% DXrange))
where
DXrange <- 41000:41091
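In recent dplyr releases the scoped verbs such as filter_at are superseded, and the same idea can be written with if_any. This is a sketch only, assuming dplyr 1.0.4 or later; the toy `df` and the result names `matched` and `flagged` are made up for illustration.

```r
library(dplyr)

# Toy data; the values are invented for illustration only.
df <- data.frame(
  id  = 1:4,
  DX1 = c(41001, 123, 456, 789),
  DX2 = c(345, 41050, NA, 12),
  DX3 = c(567, NA, 99, 34)
)
DXrange <- 41000:41091

# Keep only the rows where any DX column holds a code in DXrange.
matched <- df %>%
  filter(if_any(matches("^DX\\d+$"), ~ .x %in% DXrange))

# Or keep every row and add a logical flag instead of filtering.
flagged <- df %>%
  mutate(IndexEvent = if_any(matches("^DX\\d+$"), ~ .x %in% DXrange))
```

The mutate form is closer to the question's core$IndexEvent column, since it keeps all cases and lets table(flagged$IndexEvent) report the frequency.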