
Time: 2021-06-02 23:17:13

I'm wondering if anyone with a better understanding of Python and GAE can help me with this. I am uploading a CSV file from a form to the GAE datastore.


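import csv
from google.appengine.ext import webapp

class CSVImport(webapp.RequestHandler):
    def post(self):
        csv_file = self.request.get('csv_import')
        # Problem: csv_file is a string, so the reader walks it character by character
        fileReader = csv.reader(csv_file)
        for row in fileReader:
            pass  # per-row processing goes here (omitted in the original post)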

I'm running into the same problem that someone else mentions here - http://groups.google.com/group/google-appengine/browse_thread/thread/bb2d0b1a80ca7ac2/861c8241308b9717


That is, csv.reader is iterating over each character instead of each line. A Google engineer left this explanation:


The call self.request.get('csv') returns a String. When you iterate over a string, you iterate over the characters, not the lines. You can see the difference here:


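import csv
import os
from google.appengine.ext import webapp

class ProcessUpload(webapp.RequestHandler):
    def post(self):
        file = open(os.path.join(os.path.dirname(__file__), 'sample.csv'))

        # Iterating over a file: the reader receives one line at a time
        fileReader = csv.reader(file)
        for row in fileReader:
            pass  # row is a list of the fields on one line

        # Iterating over a string: the reader receives one character at a time
        fileReader = csv.reader(self.request.get('csv'))
        for row in fileReader:
            pass  # row is parsed from a single character, not a line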

I really don't follow the explanation, and was unsuccessful implementing it. Can anyone provide a clearer explanation of this and a proposed fix?


Thanks, August


3 Answers



Short answer, try this:


fileReader = csv.reader(csv_file.split("\n"))

Long answer, consider the following:


for thing in stuff:
  print thing.strip().split(",")

If stuff is a file pointer, each thing is a line. If stuff is a list, each thing is an item. If stuff is a string, each thing is a character.


Iterating over the object returned by csv.reader is going to give you behavior similar to iterating over the object passed in, only with each item CSV-parsed. If you iterate over a string, you'll get a CSV-parsed version of each character.
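To make the difference concrete, here is a minimal sketch (written for Python 3, whereas the code in the question targets the Python 2 GAE runtime). The sample data and the variable names `data`, `by_char`, and `by_line` are made up for illustration:

```python
import csv

# Hypothetical CSV content, as it might arrive from a form field.
data = "a,b\nc,d"

# Passing the string itself: csv.reader treats every character as a "line".
by_char = list(csv.reader(data))

# Passing a list of lines: csv.reader parses one line per iteration.
by_line = list(csv.reader(data.splitlines()))

print(by_char[:2])  # the first two "rows" come from the characters 'a' and ','
print(by_line)      # [['a', 'b'], ['c', 'd']]
```

Note that `splitlines()` is used here instead of `split("\n")` so that Windows-style `\r\n` line endings do not leave a stray `\r` on the last field of each row.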




I can't think of a clearer explanation than what the Google engineer you mentioned said. So let's break it down a bit.


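The Python csv module is normally used with file-like objects, that is, a file or something that behaves like a Python file. Strictly speaking, csv.reader() takes as its only required parameter any iterable that yields one string per iteration; a file object qualifies because iterating over it yields lines.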


The webapp.RequestHandler request object provides access to the HTTP parameters that are posted in the form. In HTTP, parameters are posted as key-value pairs, e.g., csv=record_one,record_two. When you invoke self.request.get('csv'), this returns the value associated with the key csv as a Python string. A Python string is not a file-like object. The csv module does not reject it, though; it simply iterates over whatever it is given, and in Python iterating over a string yields its characters (e.g., for c in 'Test String': print c will print each character in the string on a separate line).


Fortunately, Python provides a StringIO class that allows a string to be treated as a file-like object. So (assuming GAE supports StringIO, and there's no reason it shouldn't) you should be able to do this:


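import csv
import StringIO
from google.appengine.ext import webapp

class ProcessUpload(webapp.RequestHandler):
    def post(self):
        # Iterating over a string as a file
        stringReader = csv.reader(StringIO.StringIO(self.request.get('csv')))
        for row in stringReader:
            pass  # row is a list of the fields on one line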

This will work as you expect it to.
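For readers on Python 3, where the StringIO module has moved into io, the same idea might look like the sketch below. The helper name `parse_csv_text` and the sample text are hypothetical, and this runs outside of GAE, so treat it as an illustration of the technique rather than a drop-in handler:

```python
import csv
import io


def parse_csv_text(text):
    """Parse CSV data held in a string and return a list of rows."""
    # io.StringIO wraps the string in a file-like object, so iterating over it
    # yields lines instead of characters. newline="" leaves line endings
    # untouched, which is what the csv module documentation recommends.
    return list(csv.reader(io.StringIO(text, newline="")))


rows = parse_csv_text('name,qty\napple,3\n"pear, green",5\n')
for row in rows:
    print(row)
```

Because the reader sees whole lines, a quoted field that contains a comma (such as "pear, green" above) stays in one column.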


Edit: I'm assuming that you are using something like a <textarea/> to collect the CSV file. If you're uploading an attachment, different handling may be necessary (I'm not all that familiar with Python GAE or how it handles attachments).




You need to call csv_file = self.request.POST.get("csv_import") and not csv_file = self.request.get("csv_import").


The latter just gives you a string, as you mentioned in your original post, whereas accessing the field via self.request.POST.get gives you a cgi.FieldStorage object.


This means that you can read csv_file.filename to get the uploaded file's name and csv_file.type to get its MIME type. Furthermore, csv_file.file is a file-like StringO object (from the cStringIO module), not just a string. As ig0774 mentioned in his answer, the StringIO module allows you to treat a string as a file.


Therefore, your code can simply be:


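import csv
from google.appengine.ext import webapp

class CSVImport(webapp.RequestHandler):
    def post(self):
        # POST.get returns a cgi.FieldStorage for file uploads, not a string
        csv_file = self.request.POST.get('csv_import')
        fileReader = csv.reader(csv_file.file)
        for row in fileReader:
            # row is now a list containing all the column data in that row
            pass  # process the row here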
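If you try the same approach on Python 3, an uploaded file is usually exposed as a binary file object, and csv.reader expects text, so a decoding step is needed. The sketch below is an assumption-laden illustration, not GAE code: `rows_from_upload` is a hypothetical helper, `fake_upload` is an io.BytesIO standing in for the `.file` attribute of an upload, and UTF-8 is an assumed encoding:

```python
import csv
import io


def rows_from_upload(binary_file, encoding="utf-8"):
    """Read CSV rows from a binary file-like object (e.g. an upload's .file)."""
    # TextIOWrapper decodes the bytes on the fly and yields text lines,
    # which is what csv.reader needs on Python 3.
    text_file = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
    return list(csv.reader(text_file))


# Stand-in for the uploaded file; a real handler would pass the upload's
# file object instead.
fake_upload = io.BytesIO(b"id,name\n1,Ada\n2,Grace\n")
upload_rows = rows_from_upload(fake_upload)
print(upload_rows)
```

The point is the same as in the answer above: hand csv.reader something that iterates line by line, not a plain string.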


