Extracting certain data from multiple Excel files with R

Time: 2022-11-17 19:38:57

I import my data from multiple Excel files into R, and my data looks like this in R (there could be 100+ files each day):

> data
[[1]]
   ST Code Emp          Employee              Pay.Code Hours   Gross
1  AL 7229  65                 S                HOURLY  0.00    0.00
2  AL 7229  65                 S                SALARY  0.00 3060.00
3  AL 7229  65                 S              PER DIEM  0.00  765.00
4  AL 7229  65                 S EXPENSE REIMBURSEMENT  0.00   11.00
5  CA   42   2                 R                HOURLY 60.00  720.00
6  CA   42   2                 R              OVERTIME  3.25   58.50
7  CA   42   3                 A                HOURLY 80.00  800.00
8  CA   42   3                 A              OVERTIME  6.25   93.75
9  CA   42   4                 N                HOURLY 79.25  990.63
10 CA   42   4                 N              OVERTIME  7.00  131.25
11 CA   42   9                 P                HOURLY 32.00  352.00
12 CA   42   9                 P              OVERTIME  1.75   28.88
13 CA   42  10                 E                HOURLY 72.00  864.00
14 CA   42  10                 E              OVERTIME  5.00   90.00

[[2]]
  ST Code Employee Pay.Code    Gross
1 AL 7229       NA       NA  23954.0
2 AL 8380       NA       NA  11092.1
3 GA 7380       NA       NA  98142.0
4 GA 8380       NA       NA  11984.0
5 NC 7380       NA       NA 218129.0
6 NC 8380       NA       NA  27891.0
7 TN 7380       NA       NA  28441.0
8 TN 8380       NA       NA   8348.0

What I'm trying to do now is pull the rows with Code = 7229 from every file into a single data set and export it to a new Excel file, like this:

  > data

   ST Code Emp          Employee              Pay.Code Hours   Gross
1  AL 7229  65                 S                HOURLY  0.00    0.00
2  AL 7229  65                 S                SALARY  0.00 3060.00
3  AL 7229  65                 S              PER DIEM  0.00  765.00
4  AL 7229  65                 S EXPENSE REIMBURSEMENT  0.00   11.00
5  AL 7229  NA                                           NA  23954.0
6  AL 8380  NA                                           NA  11092.1

Is there a better way to do this?

2 Solutions

#1


This should do the trick:

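library(tidyverse)

# Filter each data frame in the list, row-bind the matches, and write them out.
# readr >= 1.4 names the output argument `file`; older versions used `path`.
df_list %>% 
  map_dfr(filter, Code == 7229) %>% 
  write_csv(file = "/INSERT/PATH/HERE/text.csv")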

Here is the code with a reproducible example:

df_1 <- tribble(
  ~ST,  ~Code, ~Emp, ~Employee, ~Pay.Code,               ~Hours, ~Gross,
  "AL", 7229,  65,   "S",       "HOURLY",                0.00,   0.00,
  "AL", 7229,  65,   "S",       "SALARY",                0.00,   3060.00,
  "AL", 7229,  65,   "S",       "PER DIEM",              0.00,   765.00,
  "AL", 7229,  65,   "S",       "EXPENSE REIMBURSEMENT", 0.00,   11.00,
  "CA", 42,    2,    "R",       "HOURLY",                60.00,  720.00,
  "CA", 42,    2,    "R",       "OVERTIME",              3.25,   58.50,
  "CA", 42,    3,    "A",       "HOURLY",                80.00,  800.00,
  "CA", 42,    3,    "A",       "OVERTIME",              6.25,   93.75,
  "CA", 42,    4,    "N",       "HOURLY",                79.25,  990.63,
  "CA", 42,    4,    "N",       "OVERTIME",              7.00,   131.25,
  "CA", 42,    9,    "P",       "HOURLY",                32.00,  352.00,
  "CA", 42,    9,    "P",       "OVERTIME",              1.75,   28.88,
  "CA", 42,    10,   "E",       "HOURLY",                72.00,  864.00,
  "CA", 42,    10,   "E",       "OVERTIME",              5.00,   90.00
)

df_2 <- tribble(
  ~ST, ~Code, ~Employee, ~Pay.Code,    ~Gross,
  "AL", 7229,       NA,       NA,  23954.0,
  "AL", 8380,       NA,       NA,  11092.1,
  "GA", 7380,       NA,       NA,  98142.0,
  "GA", 8380,       NA,       NA,  11984.0,
  "NC", 7380,       NA,       NA, 218129.0,
  "NC", 8380,       NA,       NA,  27891.0,
  "TN", 7380,       NA,       NA,  28441.0,
  "TN", 8380,       NA,       NA,   8348.0
)

df_list <- list(df_1, df_2)

df_list %>% 
  map_dfr(filter, Code == 7229) %>% 
  write_csv(file = "/INSERT/PATH/HERE/text.csv")  # `file` replaces the deprecated `path`

Which gives:

# A tibble: 5 x 7
     ST  Code   Emp Employee              Pay.Code Hours Gross
  <chr> <dbl> <dbl>    <chr>                 <chr> <dbl> <dbl>
1    AL  7229    65        S                HOURLY     0     0
2    AL  7229    65        S                SALARY     0  3060
3    AL  7229    65        S              PER DIEM     0   765
4    AL  7229    65        S EXPENSE REIMBURSEMENT     0    11
5    AL  7229    NA     <NA>                  <NA>    NA 23954
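
The question asks for an Excel file, while the code above writes a CSV. A minimal sketch of an Excel export follows, assuming the writexl package is installed (it is not part of the original answer). The two small data frames and the temporary output path are hypothetical stand-ins for the imported files and for a real destination.

```r
library(dplyr)
library(writexl)  # assumption: installed; not used in the original answer

# Hypothetical stand-ins for the imported files (note the differing columns)
df_list <- list(
  data.frame(ST = c("AL", "CA"), Code = c(7229, 42),
             Hours = c(0, 60), Gross = c(3060, 720)),
  data.frame(ST = c("AL", "AL"), Code = c(7229, 8380),
             Gross = c(23954, 11092.1))
)

# Keep the matching rows of each data frame, then stack them;
# bind_rows() fills columns missing from a data frame with NA
result <- bind_rows(lapply(df_list, function(d) filter(d, Code == 7229)))

# Write to a temporary .xlsx file; swap in a real path as needed
out_file <- tempfile(fileext = ".xlsx")
write_xlsx(result, path = out_file)
```

Filtering before binding keeps the intermediate result small, which may matter with 100+ files per day.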

#2


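Try:

# bind_rows() fills missing columns with NA; base rbind() errors when columns differ, as they do here
df <- dplyr::bind_rows(data)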

Then you'll have all your data in a single dataframe which you can filter on:

df[which(df$Code == 7229),]
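
Base `rbind()` only accepts data frames that share the same columns, which is not the case for the two tables in the question. To stay in base R, one possible sketch is to pad each data frame with its missing columns first. The helper `pad_columns` and the sample `data` list below are hypothetical, made up for illustration.

```r
# Hypothetical stand-ins for the imported list `data`
data <- list(
  data.frame(ST = c("AL", "CA"), Code = c(7229, 42),
             Hours = c(0, 60), Gross = c(3060, 720)),
  data.frame(ST = c("AL", "AL"), Code = c(7229, 8380),
             Gross = c(23954, 11092.1))
)

# Union of the column names found anywhere in the list
all_cols <- unique(unlist(lapply(data, names)))

# Add every missing column as NA, then put the columns in a common order
pad_columns <- function(d) {
  for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
  d[all_cols]
}

# Now every data frame has identical columns, so rbind() succeeds
df <- do.call(rbind, lapply(data, pad_columns))

# Keep only the rows for the code of interest
code_7229 <- df[which(df$Code == 7229), ]
```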
