Reading specific rows of a CSV file into R

Time: 2021-12-27 18:13:06

I have a large CSV file that I need to read into R, but I only need the observations with specific variable values (for example, certain dates). Is there a way to do this from the outset, without reading the entire file and then subsetting it?

1 Answer

#1


2  

Assuming the dates are in the first column of your data set (and you are on a Unix-like machine), you could do something like this:

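# Build an alternation of the wanted dates: "2015-06-01|2015-06-16"
dates <- paste0(c("2015-06-01", "2015-06-16"), collapse = "|")
# grep keeps only the lines whose first field is one of those dates
expr <- paste0("grep -E '^(", dates, "),.+' tmpcsv.csv")
##
R> data.table::fread(cmd = expr)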
           V1         V2
1: 2015-06-16 -1.6866933
2: 2015-06-16  1.3686023
3: 2015-06-01 -0.2257710
4: 2015-06-16 -1.0185754
5: 2015-06-01  0.3035286
6: 2015-06-01  2.0500847
7: 2015-06-01 -0.4910312

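If the dates are not in the first column, you will have to modify the regular expression accordingly. Note that grep also filters out the header line, which is why the columns come back named V1 and V2.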
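As a rough sketch of what such a modification could look like, the helper below builds a pattern for dates in an arbitrary column. The name `date_pattern` and its `col` argument are hypothetical, not part of the original answer, and the approach assumes that no field before the target column contains an embedded comma.

```r
# Hypothetical helper: build an extended regex that matches any of `dates`
# in column `col` (1-based) of a comma-separated line.
# Assumption: no field before `col` contains an embedded comma.
date_pattern <- function(dates, col = 1) {
  alternatives <- paste0(dates, collapse = "|")
  # Skip (col - 1) fields, then require one of the dates as the whole field
  paste0("^([^,]*,){", col - 1, "}(", alternatives, ")(,|$)")
}

pat <- date_pattern(c("2015-06-01", "2015-06-16"), col = 2)

# Check the pattern inside R before handing it to grep
lines <- c("a,2015-06-01,1.5", "b,2015-06-02,0.3", "2015-06-16,c,2.0")
matched <- grepl(pat, lines)
print(matched)  # TRUE FALSE FALSE

# Shell command that could be passed to data.table::fread(cmd = ...);
# the file name is the one used in the answer
cmd <- paste0("grep -E '", pat, "' tmpcsv.csv")
```

The third test line is rejected even though it contains a wanted date, because that date sits in the first column rather than the second. Testing the pattern with `grepl` first is a cheap way to catch mistakes before running it against a large file.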


Data:

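set.seed(123)
##
# Dates fall within 50 days of the day the code is run (Sys.Date()),
# so which dates appear in the file depends on when it is generated
df <- data.frame(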
  Date = Sys.Date() + floor(50*round(runif(50, -1, 1), 1)),
  Value = rnorm(50)
)
write.csv(df, file = "tmpcsv.csv", row.names = FALSE)
##
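
On a machine without grep, one possible fallback is to filter the file in chunks with base R. This is only a sketch and not part of the original answer: the function name `read_matching_rows` and the `chunk_size` argument are hypothetical, and it assumes the dates are in the first column and the file has a header line. The file is still scanned from start to end, but only the matching rows are kept in memory.

```r
# Hypothetical helper: read a CSV in chunks and keep only rows whose
# first column is one of `dates`. Base R only, so no grep is required.
read_matching_rows <- function(path, dates, chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1L)
  kept <- character(0)
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0L) break
    # First field of each line, with any double quotes removed
    first <- gsub('"', "", sub(",.*$", "", chunk), fixed = TRUE)
    kept <- c(kept, chunk[first %in% dates])
  }
  # Parse only the header plus the lines that matched
  read.csv(text = c(header, kept), stringsAsFactors = FALSE)
}

# Small self-contained demonstration on a temporary file
tmp <- tempfile(fileext = ".csv")
demo <- data.frame(
  Date = as.Date(c("2015-06-01", "2015-06-02", "2015-06-16")),
  Value = c(1.5, 0.3, 2.0)
)
write.csv(demo, file = tmp, row.names = FALSE)

res <- read_matching_rows(tmp, c("2015-06-01", "2015-06-16"), chunk_size = 2L)
print(res)
```

Unlike the grep approach, this keeps the header, so the columns retain their original names. It will usually be slower than grep on very large files.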
