I'm trying to extract subject-verb-object triplets and then attach an ID. I am using a loop, so my list of extracted triplets keeps an empty result for the rows where no triplet was found. So it looks like:
[]
[trump,carried,energy]
[]
[clinton,doesn't,trust]
When I print mylist it looks as expected.
However, when I try to create a DataFrame from mylist, I get an error caused by the empty rows:
`IndexError: list index out of range`.
I tried adding an if statement to avoid this, but the problem is the same. I also tried using reindex instead, but df2 came out empty.
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import spacy
import textacy
import csv, string, re
import numpy as np
import pandas as pd

# Import csv file with pre-processing already carried out
df = pd.read_csv("pre-processed_file_1.csv", sep=",")

# Prepare dataframe to be relevant columns and unicode
df1 = df[['text_1', 'id']].copy()
import StringIO
s = StringIO.StringIO()
tweets = df1.to_csv(encoding='utf-8')

nlp = spacy.load('en')
count = 0
df2 = pd.DataFrame()
for row in df1.iterrows():
    doc = nlp(unicode(row))
    text_ext = textacy.extract.subject_verb_object_triples(doc)
    tweetID = df['id'].tolist()
    mylist = list(text_ext)
    count = count + 1
    if (mylist):
        df2 = df2.append(mylist, ignore_index=True)
    else:
        df2 = df2.append('0','0','0')
Any help would be very appreciated. Thank you!
1 Solution
You're supposed to pass a DataFrame-shaped object to append. Passing the raw data doesn't work. So: df2 = df2.append([['0','0','0']], ignore_index=True)
You can also wrap your processing in a function process_row, then do df2 = pd.DataFrame([process_row(row) for row in df1.iterrows()]). Note that while append won't work with empty rows, the DataFrame constructor just fills them in with None. If you want empty rows to be ['0','0','0'], you have several options:
- Have your processing function return ['0','0','0'] for empty rows
- Change the list comprehension to [process_row(row) if process_row(row) else ['0','0','0'] for row in df1.iterrows()]
- Do df2 = df2.fillna('0')
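The options above can be sketched roughly as follows. This is only an illustration under stated assumptions: process_row here is a hypothetical stand-in that looks triples up in a small dictionary instead of calling spaCy/textacy, and the sample texts, IDs, and column names are made up. It also builds the frame with the DataFrame constructor rather than append, since DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0.

```python
import pandas as pd

# Placeholder used for rows where no triple was found.
PLACEHOLDER = ['0', '0', '0']


def process_row(text):
    """Hypothetical stand-in for the spaCy/textacy extraction step.

    Returns a flat [subject, verb, object] list, or [] when nothing is found.
    """
    fake_triples = {
        "trump carried energy": ['trump', 'carried', 'energy'],
        "clinton doesn't trust": ['clinton', "doesn't", 'trust'],
    }
    return fake_triples.get(text, [])


# Made-up sample data standing in for df1['text_1'] and df1['id'].
texts = ["no triple here", "trump carried energy",
         "nothing either", "clinton doesn't trust"]
ids = [101, 102, 103, 104]

# Substitute the placeholder while building the rows, calling
# process_row only once per row (`or` falls back on an empty list).
rows = [process_row(t) or PLACEHOLDER for t in texts]
df2 = pd.DataFrame(rows, columns=['subject', 'verb', 'object'])

# Every input row yields exactly one output row, so the IDs line up.
df2['id'] = ids

# Alternative: let the constructor pad the empty rows, then fill them.
df3 = pd.DataFrame([process_row(t) for t in texts]).fillna('0')

print(df2)
print(df3)
```

Because each input row produces exactly one output row, attaching the ID column afterwards is a plain column assignment, which may be simpler than tracking IDs inside the loop.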