I did some research on this and only found information on reading in multiple CSV files.
I'm trying to create a widget where I can read in a CSV file with data sets and print as many graphs as there are data sets.
I'm trying to brainstorm a way to read in a single CSV that has multiple data sets stacked vertically. However, I won't know the length of each data set or how many data sets will be present.
Any ideas or concepts to consider would be appreciated.
2 solutions
#1
2
# Create sample data
unlink("so-data.csv")  # remove it if it exists
set.seed(1492)         # reproducible
# make 3 data frames of different lengths
frames <- lapply(c(3, 10, 5), function(n) {
  data.frame(X = runif(n), Y1 = runif(n), Y2 = runif(n))
})
# write them to a single file, repeating the header before each data set
# (the default quote = TRUE quotes the header names, as in the listing below)
# suppressWarnings() hides the "appending column names to file" warning
suppressWarnings(
  invisible(
    lapply(frames, write.table, file = "so-data.csv", sep = ",",
           append = TRUE, row.names = FALSE)
  )
)
That file looks like:
"X","Y1","Y2"
0.277646409813315,0.110495456494391,0.852662623859942
0.21606229362078,0.0521760624833405,0.510357670951635
0.184417578391731,0.00824321852996945,0.390395383816212
"X","Y1","Y2"
0.769067857181653,0.916519832098857,0.971386880846694
0.6415081594605,0.63678711745888,0.148033464793116
0.638599780155346,0.381162445060909,0.989824152784422
0.194932354846969,0.132614633999765,0.845784503268078
0.522090089507401,0.599085820373148,0.218151196138933
0.521618122234941,0.0903550288639963,0.983936473494396
0.792095972690731,0.932019826257601,0.703315682942048
0.12338977586478,0.584303047973663,0.421113619813696
0.343668724410236,0.561827397439629,0.111441049026325
0.660837838426232,0.345943035557866,0.0270762923173606
"X","Y1","Y2"
0.309987690066919,0.441982284653932,0.133840701542795
0.747786369873211,0.240106994053349,0.62044994905591
0.789473889162764,0.853503877297044,0.150850139558315
0.165826949058101,0.119402598123997,0.318282842403278
0.39083837531507,0.109747459646314,0.876092307968065
Now you can do:
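# read in the data as lines
l <- readLines("so-data.csv")
# figure out where the individual data sets are (each starts with a header line)
starts <- grep("X", l, fixed = TRUE)
# each data set ends just before the next header; the last ends at the final line
# (starts[-1] also handles a file that contains only one data set)
ends <- c(starts[-1] - 1, length(l))
# read them in
new_frames <- mapply(function(start, end) {
  read.csv(text = paste0(l[start:end], collapse = "\n"), header = TRUE)
}, starts, ends, SIMPLIFY = FALSE)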
str(new_frames)
## List of 3
## $ :'data.frame': 3 obs. of 3 variables:
## ..$ X : num [1:3] 0.278 0.216 0.184
## ..$ Y1: num [1:3] 0.1105 0.05218 0.00824
## ..$ Y2: num [1:3] 0.853 0.51 0.39
## $ :'data.frame': 10 obs. of 3 variables:
## ..$ X : num [1:10] 0.769 0.642 0.639 0.195 0.522 ...
## ..$ Y1: num [1:10] 0.917 0.637 0.381 0.133 0.599 ...
## ..$ Y2: num [1:10] 0.971 0.148 0.99 0.846 0.218 ...
## $ :'data.frame': 5 obs. of 3 variables:
## ..$ X : num [1:5] 0.31 0.748 0.789 0.166 0.391
## ..$ Y1: num [1:5] 0.442 0.24 0.854 0.119 0.11
## ..$ Y2: num [1:5] 0.134 0.62 0.151 0.318 0.876
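The line-based approach above could be wrapped into a small reusable helper. This is only a minimal sketch and not part of the original answer: the function name read_stacked_csv and its header_pattern argument are hypothetical, and it assumes that every data set starts with a header line matching the pattern and that no data line matches it.

```r
# Hypothetical helper: split a file of vertically stacked CSV blocks
# into a list of data frames, one per block.
read_stacked_csv <- function(path, header_pattern = "X") {
  l <- readLines(path)
  l <- l[nzchar(trimws(l))]             # drop blank separator lines, if any
  starts <- grep(header_pattern, l)     # lines that begin a new data set
  if (length(starts) == 0) stop("no header lines matched 'header_pattern'")
  ends <- c(starts[-1] - 1, length(l))  # each block ends before the next header
  Map(function(start, end) {
    read.csv(text = paste(l[start:end], collapse = "\n"), header = TRUE)
  }, starts, ends)
}

# Usage with a small temporary file (made-up values)
tmp <- tempfile(fileext = ".csv")
writeLines(c("X,Y1", "1,2", "3,4", "X,Y1", "5,6"), tmp)
stacked <- read_stacked_csv(tmp)
length(stacked)        # number of data sets found
sapply(stacked, nrow)  # rows in each data set
```

Neither the number of data sets nor their lengths has to be known in advance; both fall out of where the header lines are found.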
#2
2
As @Oriol Mirosa mentioned in the comments, this is one way you can do it. You can first read the whole file:
df = read.csv("path", header = TRUE)
Assume the whole CSV file, once read in, is structured like this:
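df = data.frame(X = c(1:10, "X", 1:20, "X", 1:30),
                Y = c(1:10, "Y", 1:20, "Y", 1:30),
                Z = c(1:10, "Z", 1:20, "Z", 1:30))
# flag the repeated header rows, then number each data set with a running count
df$newset = ifelse(df$X == "X", 1, 0)
df$newset = as.factor(cumsum(df$newset))
# one list element per data set
dfs = split(df, df$newset)
# later sets: drop the leading header row and the helper column
dfs[-1] = lapply(dfs[-1], function(x) x[-1, -ncol(x)])
# first set: no header row to drop, only the helper column
dfs[[1]] = dfs[[1]][, -ncol(dfs[[1]])]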
I created a binary variable newset indicating whether a row is a "header", then used cumsum to give each "dataset" a unique number. I then split() on newset to create a list of datasets, one per element. Finally, I removed the repeated header row at the top of each dataset after the first, along with the helper newset column; the column names come from the file's first header line. This should work no matter how long each dataset is or how many there are.
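To see why cumsum produces a usable grouping variable, here is a small sketch on a single made-up vector. The names x, is_header and group are illustrative only and do not appear in the answer's code.

```r
# Made-up values: "X" marks a repeated header row inside the data
x <- c("1", "2", "X", "3", "4", "5", "X", "6")

is_header <- x == "X"       # TRUE on header rows
group <- cumsum(is_header)  # running count of headers seen so far
group

# Split the non-header values by that running count
split(x[!is_header], group[!is_header])
```

Each header row bumps the running count by one, so every row after it, up to the next header, gets the same group number.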
Result:
# $`0`
# X Y Z
# 1 1 1 1
# 2 2 2 2
# 3 3 3 3
# 4 4 4 4
# 5 5 5 5
# 6 6 6 6
# 7 7 7 7
# 8 8 8 8
# 9 9 9 9
# 10 10 10 10
#
# $`1`
# X Y Z
# 12 1 1 1
# 13 2 2 2
# 14 3 3 3
# 15 4 4 4
# 16 5 5 5
# 17 6 6 6
# 18 7 7 7
# 19 8 8 8
# 20 9 9 9
# 21 10 10 10
# 22 11 11 11
# 23 12 12 12
# 24 13 13 13
# 25 14 14 14
# 26 15 15 15
# 27 16 16 16
# 28 17 17 17
# 29 18 18 18
# 30 19 19 19
# 31 20 20 20
#
# $`2`
# X Y Z
# 33 1 1 1
# 34 2 2 2
# 35 3 3 3
# 36 4 4 4
# 37 5 5 5
# 38 6 6 6
# 39 7 7 7
# 40 8 8 8
# 41 9 9 9
# 42 10 10 10
# 43 11 11 11
# 44 12 12 12
# 45 13 13 13
# 46 14 14 14
# 47 15 15 15
# 48 16 16 16
# 49 17 17 17
# 50 18 18 18
# 51 19 19 19
# 52 20 20 20
# 53 21 21 21
# 54 22 22 22
# 55 23 23 23
# 56 24 24 24
# 57 25 25 25
# 58 26 26 26
# 59 27 27 27
# 60 28 28 28
# 61 29 29 29
# 62 30 30 30
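One thing to keep in mind with this approach: because the header rows were read in as data, the columns in each list element are character (or factor in older versions of R), not numeric. The sketch below shows one possible way to convert them and then draw one graph per data set, as the question asks. It uses a small made-up stand-in for dfs and base graphics; the plotting choices are assumptions, not something the answer specifies.

```r
# Stand-in for the split result: character columns, one data frame per set
dfs <- list(
  `0` = data.frame(X = c("1", "2", "3"), Y = c("2", "4", "6"),
                   stringsAsFactors = FALSE),
  `1` = data.frame(X = c("1", "2"), Y = c("5", "7"),
                   stringsAsFactors = FALSE)
)

# Convert every column to the most specific type that fits its values
dfs_num <- lapply(dfs, function(d) {
  d[] <- lapply(d, function(col) type.convert(as.character(col), as.is = TRUE))
  d
})
str(dfs_num)

# Draw one graph per data set, however many there are
for (name in names(dfs_num)) {
  d <- dfs_num[[name]]
  plot(d$X, d$Y, type = "b", xlab = "X", ylab = "Y",
       main = paste("Data set", name))
}
```

Looping over the list means the number of graphs always equals the number of data sets found in the file.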