Python newb: What's preventing this function from printing?

Background

I haven't worked much with python, but I want to use it to generate some repetitive XML for me. Right now, I just want to parse CSV, then pass those values into the XML stanzas.

There's a catch: I need to rewrite some of the CSV before I write the XML. I have some if statements to take care of this for me, and I decided to reduce clutter by moving it to a separate function.

This is where my problem arises. My writeTypes function appears to work as intended but when I return the re-written csvDict instance, I can no longer print values.

Clearly I am missing something, probably simple - but what? Script with comments below.

Script

import csv

def parseCSV(vals):

    # read the csv

    dictReader = csv.DictReader(open(vals, 'rb'), fieldnames=['name', 'type', 'nullable', 'default', 'description', '#'], delimiter=',', quotechar='"')

    # some repetitive xml; I will finish this portion later...

    stanza = '''
    <var name="{0}" precision="1" scale="None" type="{1}">
        <label>{2}</label>
        <definition><![CDATA[@{3}({4})]]></definition>
    </var>'''

    # a function that simply writes new values to dictionary entries 

    writeTypes(dictReader)

    # I'm confused here - nothing is printed to the console. 
    # If i comment my 'writeTypes function, prints as expected

    for i in dictReader:
        print i
        print i['type']


# function to rewrite 'types' key in dictionary set
def writeTypes(d):

    for i in d:
        if i['type'] == 'text':
            i['type'] = 't'
        elif i['type'] == 'boolean':
            i['type'] = 'l'
        elif i['type'] == 'double precision':
            i['type'] = 'd'
        elif i['type'] == 'integer':
            i['type'] = 'i'
        else:
            i['type'] = i['type']

        # unsurprisingly, this function does seem to print the correct values
        print i

    # it seems as though there's something wrong with this return statement...
    return d

Example CSV

(public data pulled from .gov site)

Name,Type,Nullable,Default,Description,#
control,text,true,,,1,false
flagship,boolean,true,,,1,false
groupid,text,true,,,1,false
hbcu,text,true,,,1,false
hsi,text,true,,,1,false
iclevel,text,true,,,1,false
landgrnt,text,true,,,1,false
matched_n_00_10_11,boolean,true,,,1,false
matched_n_05_10_6,boolean,true,,,1,false
matched_n_87_10_24,boolean,true,,,1,false
name,text,true,,,1,false
name_short,text,true,,,1,false
school,text,true,,,1,false
sector,text,true,,,1,false
sector_revised,text,true,,,1,false
top_50,boolean,true,,,1,false
virginia,boolean,true,,,1,false

2 Answers

#1

@Jefftopia, the problem is that your first use of dictReader as an iterator "consumes" the whole file, so that there's nothing left to read when you try to iterate through it a second time.

When you do this...

# a function that simply writes new values to dictionary entries 

writeTypes(dictReader)

... the writeTypes function iterates through the rows of the CSV file, by way of dictReader:

def writeTypes(d):
    for i in d:
        ...

Then you return from that function and try to iterate through dictReader again. The problem is that dictReader now has no data left to read from the underlying file, since it's gone through the whole thing already!

# I'm confused here - nothing is printed to the console. 
# If i comment my 'writeTypes function, prints as expected

for i in dictReader:
    print i
    print i['type']

When you use a file object or most similar objects as an iterator in Python, the iterator "consumes" the file. As a general rule, there's no way to reliably read a file-like object and then go back to the beginning to read it a second time (consider the case of a network socket which may stream data only once).
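A minimal sketch of that exhaustion, for illustration only. It is written in Python 3 syntax (the script in the question uses Python 2 print statements), and it assumes an in-memory `io.StringIO` with two made-up rows standing in for the real CSV file:

```python
import csv
import io

# Hypothetical two-row sample standing in for the CSV file in the question.
sample = io.StringIO("control,text\nflagship,boolean\n")
reader = csv.DictReader(sample, fieldnames=['name', 'type'])

# The first pass walks the reader to the end of the underlying file.
first_pass = [row['type'] for row in reader]

# The second pass starts where the first one stopped, so it finds no rows.
second_pass = [row['type'] for row in reader]

print(first_pass)
print(second_pass)
```

The second list comes back empty for the same reason the `for` loop in `parseCSV` prints nothing after `writeTypes` has run.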

In this particular case, you could simply re-open the file a second time before the second pass through the data. (There are even more kludge-y solutions, but I won't show 'em.)

# reopen the file in order to read through it a second time
dictReader = csv.DictReader(open(vals, 'rb'), fieldnames=['name', 'type', 'nullable', 'default', 'description', '#'], delimiter=',', quotechar='"')
for i in dictReader:
    print i
    print i['type']

Multi-pass file processing can sometimes substantially simplify code like this, although it can also hurt performance on large files. In this particular case, it'd be straightforward to do everything in one pass; you can simply rewrite the code slightly so as to gather the type fields as you iterate through the rows.
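One way the single-pass version might look, as a rough sketch rather than a drop-in replacement. It uses Python 3 syntax, and the `TYPE_CODES` lookup table, the `types` list, and the in-memory sample are assumptions made for this example; only the field names and the type abbreviations come from the question:

```python
import csv
import io

# Assumed lookup table replacing the if/elif chain from the question.
TYPE_CODES = {'text': 't', 'boolean': 'l', 'double precision': 'd', 'integer': 'i'}

# Hypothetical stand-in for open(vals); two rows copied from the example CSV.
sample = io.StringIO("control,text,true,,,1,false\nflagship,boolean,true,,,1,false\n")
reader = csv.DictReader(
    sample,
    fieldnames=['name', 'type', 'nullable', 'default', 'description', '#'],
)

types = []
for row in reader:
    # Rewrite the type and use the row right away, so one pass is enough.
    row['type'] = TYPE_CODES.get(row['type'], row['type'])
    types.append(row['type'])
    print(row['name'], row['type'])
```

Unknown type names fall through unchanged because `dict.get` returns its second argument when the key is missing, which matches the `else` branch in the original function.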

#2

dictReader is an iterator, and once it's read through the CSV file, it's exhausted: a further iteration will not do anything.

The way to fix this is to create a new list of dicts in writeTypes, so that you assign the values there rather than in the original. You can then return that list, and iterate through that in the main function.
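Something along these lines would do it. This is only a sketch in Python 3 syntax: the `codes` table and the `rewritten` list are names invented here, and a two-column in-memory sample stands in for the real file:

```python
import csv
import io

def writeTypes(d):
    """Return a new list of row dicts with abbreviated 'type' values."""
    # Assumed lookup table; equivalent to the if/elif chain in the question.
    codes = {'text': 't', 'boolean': 'l', 'double precision': 'd', 'integer': 'i'}
    rewritten = []
    for row in d:
        new_row = dict(row)  # copy, so the values are assigned in the new dict
        new_row['type'] = codes.get(new_row['type'], new_row['type'])
        rewritten.append(new_row)
    return rewritten

# Hypothetical stand-in for the CSV file.
sample = io.StringIO("name_short,text\ntop_50,boolean\n")
dictReader = csv.DictReader(sample, fieldnames=['name', 'type'])

rows = writeTypes(dictReader)  # the reader is exhausted after this call

# A list can be iterated as many times as needed.
for i in rows:
    print(i)
    print(i['type'])
```

The key change in the caller is that it keeps the return value (`rows = writeTypes(dictReader)`) and loops over that list instead of over `dictReader`.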
