Reading selected files from a directory based on selection criteria in R

Time: 2022-08-03 19:51:43

I would like to read only selected .txt files in a folder to construct a giant table... I have over 9K files, and would like to import the files with the selected distance and building type, which is indicated in part of the file name.

For example, I want to first select files with name containing "_U0" and "_0_Final.txt":

Type = c(0,1)
D3Test = 1
Distance = c(0,50,150,300,650,800)
D2Test = 1;

files <- list.files(path=data.folder, pattern=paste("*U", Type[D3Test],"*_",Distance[D2Test],"_Final.txt",sep=""))

But the result returned empty... Any problem with my construction?

 filename <- scan(what="")
 "M10_F1_T1_D1_U0_H1_0_Final.txt"   "M10_F1_T1_D1_U0_H1_150_Final.txt" "M10_F1_T1_D1_U0_H1_300_Final.txt"
 "M10_F1_T1_D1_U0_H1_50_Final.txt"  "M10_F1_T1_D1_U0_H1_650_Final.txt" "M10_F1_T1_D1_U0_H1_800_Final.txt"
 "M10_F1_T1_D1_U0_H2_0_Final.txt"   "M10_F1_T1_D1_U0_H2_150_Final.txt" "M10_F1_T1_D1_U0_H2_300_Final.txt"
 "M10_F1_T1_D1_U0_H2_50_Final.txt"  "M10_F1_T1_D1_U0_H2_650_Final.txt" "M10_F1_T1_D1_U0_H2_800_Final.txt"
 "M10_F1_T1_D1_U0_H3_0_Final.txt"   "M10_F1_T1_D1_U0_H3_150_Final.txt" "M10_F1_T1_D1_U0_H3_300_Final.txt"
 "M10_F1_T1_D1_U0_H3_50_Final.txt"  "M10_F1_T1_D1_U0_H3_650_Final.txt" "M10_F1_T1_D1_U0_H3_800_Final.txt"
 "M10_F1_T1_D1_U1_H1_0_Final.txt"   "M10_F1_T1_D1_U1_H1_150_Final.txt" "M10_F1_T1_D1_U1_H1_300_Final.txt"
 "M10_F1_T1_D1_U1_H1_50_Final.txt"  "M10_F1_T1_D1_U1_H1_650_Final.txt" "M10_F1_T1_D1_U1_H1_800_Final.txt"

3 Solutions

#1


2  

Another way would be to use sprintf and grepl.

x <- c("M10_F1_T1_D1_U0_H1_150_Final.txt", "M10_F1_T1_D1_U0_H2_650_Final.txt", "M10_F1_T1_D1_U1_H1_650_Final.txt")

x[grepl(sprintf("U%i_H%i_%i", 1, 1, 650), x)]

[1] "M10_F1_T1_D1_U1_H1_650_Final.txt"
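
The same approach can be loosened to ignore the H index while still pinning the distance between underscores, so that a distance of 50 does not also pick up 150 or 650. This is only a sketch, assuming the file names follow the layout shown in the question; `select_files` is a hypothetical helper name, not something from the original post.

```r
# Hypothetical helper: keep names whose building type (U) and distance match,
# whatever the H index is.
select_files <- function(x, type, distance) {
  # "_U<type>_", then "H" plus digits, then "_<distance>_Final.txt" at the end
  pattern <- sprintf("_U%d_H[0-9]+_%d_Final\\.txt$", type, distance)
  x[grepl(pattern, x)]
}

x <- c("M10_F1_T1_D1_U0_H1_150_Final.txt",
       "M10_F1_T1_D1_U0_H2_650_Final.txt",
       "M10_F1_T1_D1_U1_H1_650_Final.txt",
       "M10_F1_T1_D1_U0_H1_50_Final.txt")

# Selects only the U0 file with distance 50, not the one with distance 150
select_files(x, type = 0, distance = 50)
```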

#2


1  

You should look at the result that you are passing to pattern:

"*U0*_0_Final.txt"

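It is not going to pick up any of those filenames. In a regular expression the asterisk means "zero or more of the preceding item", so "0*" here matches zero or more instances of "0" between "U" and the underscore; it is not a wildcard for arbitrary characters. If Type and Distance are not represented by T and D in the file names, then this delivers the correct pattern: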

grep( pattern=paste0("_U", Type[D3Test],".*_", Distance[D2Test],"_Final\\.txt"), filename)
#-----------
#[1]  1  7 13   So matches 3 filenames

Notice that you need to escape (with two backslashes) the periods that you want to be only periods because periods are special characters. You also need to use ".*" to allow a gap in the pattern.
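
The pattern in the question reads like a shell wildcard, where `*` stands for any run of characters, whereas `list.files()` treats `pattern` as a regular expression. One way to see the difference is `glob2rx()` from the `utils` package, which converts a wildcard pattern into the corresponding regular expression. A small sketch using three file names of the kind shown in the question (the wildcard string itself is an illustration, not the poster's exact pattern):

```r
filename <- c("M10_F1_T1_D1_U0_H1_0_Final.txt",
              "M10_F1_T1_D1_U0_H1_50_Final.txt",
              "M10_F1_T1_D1_U1_H1_0_Final.txt")

# Convert the wildcard form into an anchored regular expression
rx <- glob2rx("*_U0_*_0_Final.txt")
rx

# Only the first name has type U0 and distance 0
matched <- grepl(rx, filename)
filename[matched]
```

The converted expression could be passed to `list.files(pattern = rx)` in the same way.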

#3


0  

files <- list.files(path=data.folder, pattern=paste("*U", Type[D3Test], "....",Distance[D2Test], sep=""))

I revised my code and this one works! Basically the idea is to use a dot to represent each character between Type[D3Test] and Distance[D2Test], since there are always exactly four characters between the two.

Thanks to: http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
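
Once a pattern selects the right files, the remaining step from the question is reading them into one table. Below is a minimal sketch, assuming each file is a whitespace-delimited table with a header row (the question does not show the file contents). The temporary folder, the two-column layout, and the added `source` column are illustrative assumptions only.

```r
# Create a throwaway folder with three small files (illustration only)
data.folder <- file.path(tempdir(), "final_files")
dir.create(data.folder, showWarnings = FALSE)
demo <- c("M10_F1_T1_D1_U0_H1_0_Final.txt",
          "M10_F1_T1_D1_U0_H2_0_Final.txt",
          "M10_F1_T1_D1_U1_H1_0_Final.txt")
for (f in demo) {
  write.table(data.frame(id = 1:2, value = c(0.5, 1.5)),
              file.path(data.folder, f), row.names = FALSE)
}

Type <- c(0, 1); D3Test <- 1
Distance <- c(0, 50, 150, 300, 650, 800); D2Test <- 1

# Regular expression for the chosen type and distance, anchored at the end
pattern <- paste0("_U", Type[D3Test], "_.*_", Distance[D2Test], "_Final\\.txt$")
files <- list.files(path = data.folder, pattern = pattern, full.names = TRUE)

# Read each selected file, tag its rows with the file name, and stack them
tables <- lapply(files, function(f) {
  d <- read.table(f, header = TRUE)
  d$source <- basename(f)
  d
})
big.table <- do.call(rbind, tables)
```

Keeping the file name in a column makes it possible to recover the type and distance of each row after the files have been combined.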
