Parsing a txt file into a dictionary to write to a csv file

Eprime outputs a .txt file like this:

*** Header Start ***
VersionPersist: 1
LevelName: Session
Subject: 7
Session: 1
RandomSeed: -1983293234
Group: 1
Display.RefreshRate: 59.654
*** Header End ***
    Level: 2
    *** LogFrame Start ***
    MeansEffectBias: 7
    Procedure: trialProc
    itemID: 7
    bias1Answer: 1
    *** LogFrame End ***
    Level: 2
    *** LogFrame Start ***
    MeansEffectBias: 2
    Procedure: trialProc
    itemID: 2
    bias1Answer: 0

I want to parse this and write it to a .csv file but with a number of lines deleted.

I tried to create a dictionary that took the text appearing before the colon as the key and the text after as the value:

 {subject: [7, 7], bias1Answer : [1, 0], itemID: [7, 2]}

def load_data(filename):
    data = {}
    eprime = open(filename, 'r')
    for line in eprime:
        rows = re.sub('\s+', ' ', line).strip().split(':')
        try:
            data[rows[0]] += rows[1]
        except KeyError:
            data[rows[0]] = rows[1]
    eprime.close()
    return data

for line in open(fileName, 'r'):
    if ':' in line:
        row = line.strip().split(':')
        fullDict[row[0]] = row[1]
print fullDict

Both of the scripts above produce garbage:

{'\x00\t\x00M\x00e\x00a\x00n\x00s\x00E\x00f\x00f\x00e\x00c\x00t\x00B\x00i\x00a\x00s\x00': '\x00 \x005\x00\r\x00', '\x00\t\x00B\x00i\x00a\x00s\x002\x00Q\x00.\x00D\x00u\x00r\x00a\x00t\x00i\x00o\x00n\x00E\x00r\x00r\x00o\x00r\x00': '\x00 \x00-\x009\x009\x009\x009\x009\x009\x00\r\x00'

If I could set up the dictionary, I could write it to a csv file that would look like this:

 Subject  itemID ... bias1Answer 
  7       7             1
  7       2             0

4 solutions

#1

You don't need to create a dictionary.

import codecs
import csv

with codecs.open('eprime.txt', encoding='utf-16') as f, open('output.csv', 'w') as fout:
    writer = csv.writer(fout, delimiter='\t')
    writer.writerow(['Subject', 'itemID', 'bias1Answer'])
    for line in f:
        if ':' in line:
            value = line.split()[-1]

        if 'Subject:' in line:
            subject = value
        elif 'itemID:' in line:
            itemID = value
        elif 'bias1Answer:' in line:
            bias1Answer = value
            writer.writerow([subject, itemID, bias1Answer])

#2

Your second approach would work, but the value for each dictionary key should be a list. Currently you store only one value per key, so each new value overwrites the previous one and only the last value survives. You can modify your code so that the value for each key is a list. The code below does this:

for line in open(fileName, 'r'):
    if ':' in line:
        row = line.strip().split(':')
        # Use row[0] as a key, initiate its value
        # to be a list and add row[1] to the list. 
        # In case already a key 'row[0]'
        # exists append row[1] to the existing value list
        fullDict.setdefault(row[0],[]).append(row[1])
print fullDict
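For readers on Python 3, the same dict-of-lists idea can be sketched as follows. This is only a sketch under some assumptions: the file is UTF-16 with a byte order mark (suggested by the `\x00` bytes in the question's output), header fields such as Subject occur once and should be repeated on every row, and the helper names `load_data` and `write_columns` are made up for illustration.

```python
import csv
from collections import defaultdict


def load_data(filename, encoding="utf-16"):
    """Collect every 'key: value' line into a dict of lists."""
    data = defaultdict(list)
    # Assumption: the file is UTF-16, so let open() decode it for us.
    with open(filename, encoding=encoding) as eprime:
        for line in eprime:
            if ":" not in line:
                continue  # skips markers such as '*** LogFrame Start ***'
            # Split on the first colon only, then trim tabs and spaces.
            key, value = line.split(":", 1)
            data[key.strip()].append(value.strip())
    return dict(data)


def write_columns(data, columns, out_filename):
    """Write the chosen columns to a csv file, one row per trial."""
    n_rows = max(len(data[c]) for c in columns)
    with open(out_filename, "w", newline="") as fout:
        writer = csv.writer(fout)
        writer.writerow(columns)
        for i in range(n_rows):
            # A column with fewer values than rows (e.g. Subject, which
            # appears once in the header) repeats its last value.
            writer.writerow(
                [data[c][min(i, len(data[c]) - 1)] for c in columns]
            )
```

Usage would be something like `write_columns(load_data('eprime.txt'), ['Subject', 'itemID', 'bias1Answer'], 'output.csv')`. Repeating the last value is a simplification; if a variable is missing from some trials, the columns will drift out of step, and grouping by LogFrame would be safer.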

#3

It seems the Eprime output is encoded as UTF-16.

>>> print '\x00\t\x00M\x00e\x00a\x00n\x00s\x00E\x00f\x00f\x00e\x00c\x00t\x00B\x00i\x00a\x00s\x00'.decode('utf-16-be')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/encodings/utf_16_be.py", line 16, in decode
    return codecs.utf_16_be_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0x00 in position 32: truncated data
>>> print '\x00\t\x00M\x00e\x00a\x00n\x00s\x00E\x00f\x00f\x00e\x00c\x00t\x00B\x00i\x00a\x00s\x00'.decode('utf-16-be', 'ignore')
    MeansEffectBias
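The same check in Python 3 might look like the sketch below. The key from the question has an odd number of bytes, so one `\x00` is a stray half of a neighbouring character; which end is the stray one depends on the byte order, which I am not assuming here. The second half shows that decoding the whole text before splitting avoids the problem altogether. The sample text in `encoded` is made up for illustration.

```python
raw = (b"\x00\t\x00M\x00e\x00a\x00n\x00s\x00E\x00f\x00f"
       b"\x00e\x00c\x00t\x00B\x00i\x00a\x00s\x00")

# 33 bytes: UTF-16 needs two bytes per character here, so one is left over.
print(len(raw))

# Dropping the trailing byte reads as big-endian,
# dropping the leading byte reads as little-endian.
key_be = raw[:-1].decode("utf-16-be").strip()
key_le = raw[1:].decode("utf-16-le").strip()
print(key_be, key_le)

# Slicing bytes by hand is fragile. Decoding first and splitting afterwards
# keeps every character intact.
encoded = "Subject: 7\r\n\tMeansEffectBias: 5\r\n".encode("utf-16")
pairs = {}
for line in encoded.decode("utf-16").splitlines():
    key, value = line.split(":", 1)
    pairs[key.strip()] = value.strip()
print(pairs)
```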

#4

I know this is an older question, so maybe you have long since solved it, but I think you are approaching this in a more complex way than is needed. I figure I'll respond in case someone else has the same problem and finds this.

If you are doing things this way because you do not have a software key, it might help to know that the E-Merge and E-DataAid programs for Eprime don't require a key. You only need the key for editing build files. Whoever provided you with the .txt files should probably have an install disk for these programs. If not, they are available on the PST website (I believe you need a serial code to create an account, but I am not certain).

Eprime generally creates a .edat file that matches the content of the text file you posted. Sometimes, though, if Eprime crashes, you don't get the .edat file and only have the .txt. Luckily, you can generate the .edat file from the .txt file.

Here's how I would approach this issue:

If you do not have the .edat files available, first use E-DataAid to recover them.

Then, presuming you have multiple participants, use E-Merge to merge the .edat files of all participants who completed this task.

Open the merged file. It might look a little chaotic, depending on how much you have in the file. Go to tools->Arrange columns. This will show a list of all your variables.

Adjust so that only the desired variables are in the right-hand box. Hit OK.

You should then have something resembling your end goal, which can be exported as a csv.

If you have many procedures in the program, you might at this point have lines that contain only startup info, with NULL in the columns where your variables of interest are. You can fix this by going to tools->filter and creating a filter to eliminate those lines.
