I hope this is not trivial, but I am wondering the following:
If I have a specific folder with n csv files, how could I iteratively read all of them, one at a time, and perform some calculations on their values?
For a single file, for example, I do something like this and perform some calculations on the x array:
import numpy

directoryPath = input('Directory path for native csv file: ')
csvfile = numpy.genfromtxt(directoryPath, delimiter=",")
x = csvfile[:, 2]  # Creates the array that will undergo a set of calculations
I know that I can check how many csv files there are in a given folder (check here):
import glob
for files in glob.glob("*.csv"):
    print(files)
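The loop above prints the matching names rather than counting them. A minimal sketch that returns the count for a folder passed in explicitly (the helper name `count_csv_files` is hypothetical, and only the folder itself is searched, not its subfolders):

```python
import glob
import os


def count_csv_files(directory):
    """Return how many *.csv files sit directly inside directory (subfolders are not searched)."""
    # glob.escape keeps characters such as [ or * in the folder name from acting as wildcards
    return len(glob.glob(os.path.join(glob.escape(directory), "*.csv")))
```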
But I have not managed to figure out how to nest the numpy.genfromtxt() function in a for loop, so that I can read in all the csv files of a directory that I specify.
EDIT
The folder I have contains only jpg and csv files. The latter are named eventX.csv, where X ranges from 1 to 50. The for loop I am referring to should therefore take the file names as they are.
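Because the files follow the eventX.csv pattern, one option is to select them by name and sort them by the number X: `glob` returns names in an order that depends on the file system, and a plain string sort puts event10.csv before event2.csv. A minimal sketch, assuming every eventX.csv is a purely numeric, comma-separated file with at least three columns and no header; `event_files` and `third_columns` are hypothetical helper names:

```python
import os
import re

import numpy as np

# Matches exactly eventX.csv and captures the number X (e.g. "event12.csv" -> "12")
EVENT_NAME = re.compile(r"^event(\d+)\.csv$")


def event_files(directory):
    """Return the full paths of the eventX.csv files in directory, ordered by X."""
    numbered = []
    for name in os.listdir(directory):
        match = EVENT_NAME.match(name)
        if match:  # .jpg files and any other names are skipped
            numbered.append((int(match.group(1)), os.path.join(directory, name)))
    numbered.sort()  # numeric sort: event2 comes before event10
    return [path for _, path in numbered]


def third_columns(directory):
    """Yield (file name, third column) for each eventX.csv, one file at a time."""
    for path in event_files(directory):
        # atleast_2d keeps [:, 2] valid even for a file with a single row
        data = np.atleast_2d(np.genfromtxt(path, delimiter=","))
        yield os.path.basename(path), data[:, 2]
```

Usage would look like `for name, x in third_columns(directoryPath): ...`, with the calculations on x inside the loop body, so only one file's data is held at a time.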
3 answers
#1
That's how I'd do it:
import os

directory = os.path.join("c:\\", "path")
for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith(".csv"):
            # os.walk yields bare names, so join with root to get the full path
            f = open(os.path.join(root, file), 'r')
            # perform calculation
            f.close()
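Note that `os.walk` yields bare file names, so each one has to be joined with `root` before it can be opened, unless the current working directory happens to be that folder. A minimal sketch that collects the full paths first (the helper name `find_csv_files` is hypothetical), so they can then be passed to `numpy.genfromtxt` one at a time:

```python
import os


def find_csv_files(directory):
    """Return the full paths of all .csv files under directory, including subfolders."""
    paths = []
    for root, dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(".csv"):
                # Join with root so the path works from any working directory
                paths.append(os.path.join(root, name))
    return sorted(paths)
```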
#2
I think you are looking for something like this:
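import glob
import os
import numpy as np
# directoryPath is the folder that contains the csv files
for file_name in glob.glob(os.path.join(directoryPath, '*.csv')):
    x = np.genfromtxt(file_name, delimiter=',')[:, 2]
    # do your calculations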
Edit
If you want to get all csv files from a folder (including subfolders), you could use subprocess instead of glob (note that this code only works on Unix-like systems such as Linux, because it calls the find command):
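import subprocess
import numpy as np
# text=True (Python 3.7+) returns str instead of bytes; splitlines() drops the trailing empty entry
file_list = subprocess.check_output(['find', directoryPath, '-name', '*.csv'], text=True).splitlines()
for i, file_name in enumerate(file_list):
    x = np.genfromtxt(file_name, delimiter=',')[:, 2]
    # do your calculations
    # now you can use i as an index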
It first uses the find command from the shell to collect the names of all csv files in the folder and its subfolders, and then applies your calculations to each file.
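As a portable alternative that does not shell out to find (and so also runs on Windows), `glob.glob` in Python 3.5+ accepts `recursive=True`, where a `**` path component matches any number of nested folders, including none. A minimal sketch (the helper name `find_csv_recursive` is hypothetical); unlike find, glob skips hidden (dot-prefixed) files and folders by default:

```python
import glob
import os


def find_csv_recursive(directory):
    """Return all .csv paths in directory and its subfolders, without calling find."""
    # glob.escape protects wildcard characters in the folder name itself;
    # "**" with recursive=True matches zero or more nested folders
    pattern = os.path.join(glob.escape(directory), "**", "*.csv")
    return sorted(glob.glob(pattern, recursive=True))
```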
#3
According to the documentation of numpy.genfromtxt(), the first argument can be a
File, filename, or generator to read.
That means you could write a generator that yields the lines of all the files, like this:
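import glob
import numpy
def csv_merge_generator(pattern):
    for file_name in glob.glob(pattern):
        # open each file; iterating over the name string would yield its characters
        with open(file_name) as f:
            for line in f:
                yield line

# then use it like this
numpy.genfromtxt(csv_merge_generator('*.csv'), delimiter=',')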
This should work, although I do not have numpy installed, so I cannot test it easily.
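The generator has to open each file and iterate over the file object; iterating over the file name string would only yield its characters. Note also that this approach concatenates the rows of every matching file into one array, so per-file boundaries are lost. A minimal sketch under two assumptions: all files share the same column layout, and each may start with a fixed number of header lines that should be dropped per file (otherwise every header after the first turns into a row of nan). The `skip_header_lines` parameter and the `merged_array` helper are hypothetical additions:

```python
import glob

import numpy as np


def csv_merge_generator(pattern, skip_header_lines=0):
    """Yield the data lines of every file matching pattern, one file after another."""
    for file_name in sorted(glob.glob(pattern)):
        with open(file_name) as f:
            for line_number, line in enumerate(f):
                if line_number >= skip_header_lines:  # drop each file's own header lines
                    yield line


def merged_array(pattern, skip_header_lines=0):
    """Stack the rows of all matching csv files into one 2-D array."""
    lines = csv_merge_generator(pattern, skip_header_lines)
    return np.genfromtxt(lines, delimiter=",")
```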