Uploading and parsing a CSV file with Google App Engine

Time: 2021-06-02 23:17:13

I'm wondering if anyone with a better understanding of Python and GAE can help me with this. I am uploading a CSV file from a form to the GAE datastore.

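import csv
from google.appengine.ext import webapp

class CSVImport(webapp.RequestHandler):
    def post(self):
        # self.request.get() returns the field's value as a plain string
        csv_file = self.request.get('csv_import')
        # csv.reader iterates over its argument; over a string that means characters
        fileReader = csv.reader(csv_file)
        for row in fileReader:
            self.response.out.write(row)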

I'm running into the same problem that someone else mentions here - http://groups.google.com/group/google-appengine/browse_thread/thread/bb2d0b1a80ca7ac2/861c8241308b9717

That is, csv.reader iterates over each character rather than each line. A Google engineer left this explanation:

The call self.request.get('csv') returns a string. When you iterate over a string, you iterate over its characters, not its lines. You can see the difference here:

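import csv
import os

from google.appengine.ext import webapp

class ProcessUpload(webapp.RequestHandler):
    def post(self):
        # The posted form field arrives as a plain string
        self.response.out.write(self.request.get('csv'))
        # A CSV file bundled with the app, opened as a file object
        sample_file = open(os.path.join(os.path.dirname(__file__), 'sample.csv'))
        self.response.out.write(sample_file)

        # Iterating over a file: csv.reader gets one line at a time
        fileReader = csv.reader(sample_file)
        for row in fileReader:
            self.response.out.write(row)
        sample_file.close()

        # Iterating over a string: csv.reader gets one character at a time
        fileReader = csv.reader(self.request.get('csv'))
        for row in fileReader:
            self.response.out.write(row)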

I don't really follow the explanation, and I was unsuccessful in implementing it. Can anyone provide a clearer explanation of this and a proposed fix?

Thanks, August

3 answers

#1 (13 votes)

Short answer, try this:

fileReader = csv.reader(csv_file.split("\n"))
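A quick way to check the short answer outside App Engine, as a plain Python 3 sketch (the sample text is made up). One caveat: if the uploaded text ends with a newline, split("\n") leaves an empty last item, which csv.reader returns as an empty row; str.splitlines() avoids that.

```python
import csv

# Hypothetical upload contents, ending with a trailing newline
csv_text = "name,qty\napple,3\n"

# split("\n") leaves a trailing "" item, which csv.reader returns as an empty row
rows_split = list(csv.reader(csv_text.split("\n")))
# splitlines() drops the trailing empty item (and also splits on "\r\n")
rows_splitlines = list(csv.reader(csv_text.splitlines()))

print(rows_split)       # [['name', 'qty'], ['apple', '3'], []]
print(rows_splitlines)  # [['name', 'qty'], ['apple', '3']]
```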

Long answer, consider the following:

for thing in stuff:
  print thing.strip().split(",")

If stuff is a file object, each thing is a line. If stuff is a list, each thing is an item. If stuff is a string, each thing is a character.

Iterating over the object returned by csv.reader is going to give you behavior similar to iterating over the object passed in, only with each item CSV-parsed. If you iterate over a string, you'll get a CSV-parsed version of each character.
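The same point with csv.reader itself, as a small Python 3 sketch: the text is passed once as a file-like object (io.StringIO stands in for a real file here), once as a list of lines, and once as a bare string.

```python
import csv
import io

csv_text = "a,b\nc,d"

# File-like object: iteration yields lines, so each row is one parsed line
rows_from_file = list(csv.reader(io.StringIO(csv_text)))
# List of strings: iteration yields items, so each row is one parsed item
rows_from_list = list(csv.reader(csv_text.split("\n")))
# Bare string: iteration yields characters, so each "row" is one parsed character
rows_from_string = list(csv.reader("a,b"))

print(rows_from_file)    # [['a', 'b'], ['c', 'd']]
print(rows_from_list)    # [['a', 'b'], ['c', 'd']]
print(rows_from_string)  # [['a'], ['', ''], ['b']]
```

The comma on its own parses as two empty fields, which is why the string case looks so odd.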

#2 (8 votes)

I can't think of a clearer explanation than the one from the Google engineer you mentioned, so let's break it down a bit.

The Python csv module is built around file-like objects, that is, a file or something that behaves like a Python file. More precisely, csv.reader()'s only required parameter can be any iterable that yields one line (a string) at a time, such as a file object or a list of strings.

The webapp.RequestHandler request object provides access to the HTTP parameters that are posted in the form. In HTTP, parameters are posted as key-value pairs, e.g., csv=record_one,record_two. When you invoke self.request.get('csv'), it returns the value associated with the key csv as a Python string. A Python string is not a file-like object, but it is iterable: csv.reader() does not check what it was given and simply iterates over it, and iterating over a string yields one character at a time (e.g., for c in 'Test String': print c prints each character of the string on a separate line).
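To see that csv.reader() just iterates over whatever it receives, here is a minimal Python 3 sketch that feeds it a generator. The generator posted_lines() and its contents are hypothetical, chosen only for illustration.

```python
import csv

def posted_lines():
    # Hypothetical source of lines; any iterable of strings would do
    yield "id,name"
    yield "1,alice"
    yield '2,"smith, bob"'

rows = list(csv.reader(posted_lines()))
print(rows)  # [['id', 'name'], ['1', 'alice'], ['2', 'smith, bob']]
```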

Fortunately, Python provides a StringIO class that allows a string to be treated as a file-like object. So (assuming GAE supports StringIO, and there's no reason it shouldn't) you should be able to do this:

import csv
import StringIO

from google.appengine.ext import webapp

class ProcessUpload(webapp.RequestHandler):
    def post(self):
        self.response.out.write(self.request.get('csv'))

        # Wrap the string in StringIO so csv.reader iterates over lines, not characters
        stringReader = csv.reader(StringIO.StringIO(self.request.get('csv')))
        for row in stringReader:
            self.response.out.write(row)

This will work as you expect.
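The StringIO.StringIO call above is Python 2, which the GAE webapp framework of that era used; in Python 3 the equivalent class is io.StringIO. A minimal sketch of the same idea outside App Engine, where the helper name parse_csv_text is hypothetical:

```python
import csv
import io


def parse_csv_text(text):
    """Parse CSV data held in a string by wrapping it in a file-like object."""
    # newline="" mirrors the csv docs' advice for files: no newline translation,
    # so line endings (including "\r\n") are left for csv.reader to handle
    return list(csv.reader(io.StringIO(text, newline="")))


rows = parse_csv_text('name,qty\r\napple,3\r\n"pear, green",5\r\n')
print(rows)  # [['name', 'qty'], ['apple', '3'], ['pear, green', '5']]
```

Going through a file-like object instead of splitting by hand lets csv.reader deal with quoting and line endings itself.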

Edit: I'm assuming that you are using something like a <textarea/> to collect the CSV data. If you're uploading a file attachment, different handling may be necessary (I'm not all that familiar with Python GAE or how it handles attachments).

#3 (0 votes)

You need to call csv_file = self.request.POST.get("csv_import") and not csv_file = self.request.get("csv_import").

The second one just gives you a string, as you mentioned in your original post. Accessing the field via self.request.POST.get instead gives you a cgi.FieldStorage object.

This means that you can read csv_file.filename to get the uploaded file's name and csv_file.type to get its MIME type. Furthermore, csv_file.file is a StringO object (a read-only object from the StringIO module), not just a string. As ig0774 mentioned in his answer, the StringIO module allows you to treat a string as a file.

Therefore, your code can simply be:

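import csv

from google.appengine.ext import webapp

class CSVImport(webapp.RequestHandler):
    def post(self):
        # For a file-upload field this is a cgi.FieldStorage, not a plain string
        csv_file = self.request.POST.get('csv_import')
        # csv_file.file is file-like, so csv.reader iterates over lines
        fileReader = csv.reader(csv_file.file)
        for row in fileReader:
            # row is now a list containing all the column data in that row
            self.response.out.write(row)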
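The handler above is Python 2, where csv_file.file could be handed straight to csv.reader. On Python 3 an uploaded file's stream typically yields bytes, while csv.reader needs text, so one common approach is to wrap the stream in io.TextIOWrapper. A minimal sketch, assuming the upload is UTF-8 encoded; the helper name read_uploaded_csv is hypothetical, and io.BytesIO stands in for csv_file.file so the example runs without App Engine.

```python
import csv
import io


def read_uploaded_csv(binary_file, encoding="utf-8"):
    """Parse an uploaded CSV whose file object yields bytes (encoding is assumed)."""
    # TextIOWrapper decodes the bytes; newline="" leaves line endings for csv to handle
    text_file = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
    try:
        return list(csv.reader(text_file))
    finally:
        # Detach so the wrapper does not close the caller's file object
        text_file.detach()


# io.BytesIO stands in for csv_file.file here
upload = io.BytesIO("name,city\r\nJosé,Zürich\r\n".encode("utf-8"))
rows = read_uploaded_csv(upload)
print(rows)  # [['name', 'city'], ['José', 'Zürich']]
```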
