In total I have 21 CSV files that I would like to load into R. So I did:
list_of_data <- list.files(pattern = "\\.csv$")  ## pattern is a regex, not a glob
tbl_met <- lapply(list_of_data, read.csv)
I can't give you the dput output because there is too much data.
What I want to do is get all the names in the first column of every dataset, combined into one vector/list, but there are 2 problems:
- First of all, the columns in those files are either separated by ";" or have no separation mark at all. Do I have to look inside those files and make them all use the same separator?
- The second problem is that there might be duplicate names, and I'd like to remove them from the list.
Do you have any idea how to do that? Should I provide some more data? If so, let me know how.
2 solutions
#1
1
I found the solution. It's probably not the easiest one, but it works. First of all I had to convert all of the CSV files to the same format, which is an easy task in R.
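The conversion step is not shown in the answer. A minimal sketch of one way it might be done follows, assuming a file uses ";" whenever its header line contains one and "," otherwise. The helper names `read_any_sep` and `normalise_csv_files` and the `normalised` output folder are hypothetical, not part of the original post.

```r
# Hypothetical helper: read one file, guessing the separator from its header line.
# Assumption: a file uses ";" if its first line contains one, otherwise ",".
read_any_sep <- function(path) {
  header <- readLines(path, n = 1)
  sep <- if (grepl(";", header, fixed = TRUE)) ";" else ","
  read.csv(path, sep = sep, stringsAsFactors = FALSE)
}

# Hypothetical helper: rewrite every file into a separate folder,
# all using the same separator (","), leaving the originals untouched.
normalise_csv_files <- function(files, out_dir = "normalised") {
  dir.create(out_dir, showWarnings = FALSE)
  for (f in files) {
    write.csv(read_any_sep(f), file.path(out_dir, basename(f)), row.names = FALSE)
  }
  # Return the paths of the rewritten files invisibly
  invisible(file.path(out_dir, basename(files)))
}
```

It could be called as `normalise_csv_files(list.files(pattern = "\\.csv$"))`. Files that really have no separator at all would come through as a single column, so they would still need a manual look.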
Later:
library(data.table)  ## provides rbindlist()
list_of_data <- list.files(pattern = "\\.csv$")  ## pattern is a regex, not a glob
tbl_met <- lapply(list_of_data, read.csv)
tbl <- rbindlist(tbl_met)  ## bind all of the tables in the list by row
vec_names <- tbl$locus  ## the column holding the names I am interested in
vec <- unique(vec_names)  ## remove the duplicates
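If the first column is not named `locus` in every file, or the files do not all have the same columns, binding the tables may not work as is. A rough alternative is to pull the first column by position from each table and deduplicate afterwards. The function name `first_column_values` is made up for illustration.

```r
# Collect the unique values of the first column across a list of data frames.
# Works by position, so the column does not need the same name in every file.
first_column_values <- function(tables) {
  # as.character() keeps factors and strings comparable across files
  values <- unlist(lapply(tables, function(tbl) as.character(tbl[[1]])))
  unique(values)
}
```

With the list from above, this would be `vec <- first_column_values(tbl_met)`.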
Nicely done!
#2
0
I am a little sceptical about some files having no separation marks. How would you separate the columns then? Are the column names at least the same in all files?
But can you try this and see if it gives you anything useful?
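library(data.table)
list_of_data <- list.files(pattern = "\\.csv$")  ## pattern is a regex, not a glob
tbl_met <- lapply(list_of_data, fread)  ## fread() detects the separator automatically
DT <- rbindlist(tbl_met, use.names = FALSE)  ## bind by column position, not by name
print(unique(DT[, 1, with = FALSE]))  ## unique values of the first column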
Thanks