How do I save a list to a dataframe while keeping the empty rows?

Time: 2022-06-30 19:31:28

I'm trying to extract subject-verb-object triplets and then attach an ID. I am using a loop, so my list of extracted triplets keeps an empty result for the rows where no triplet was found. So it looks like:

[]
[trump,carried,energy]
[]
[clinton,doesn't,trust]

When I print mylist it looks as expected.

However, when I try to create a dataframe from mylist, I get an error caused by the empty rows:

`IndexError: list index out of range`.

I tried adding an if statement to avoid this, but the problem is the same. I also tried using reindex instead, but df2 came out empty.

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import spacy
import textacy
import csv, string, re
import numpy as np
import pandas as pd

#Import csv file with pre-processing already carried out
import pandas as pd
df = pd.read_csv("pre-processed_file_1.csv", sep=",")

#Prepare dataframe to be relevant columns and unicode
df1 = df[['text_1', 'id']].copy()
import StringIO
s = StringIO.StringIO()
tweets = df1.to_csv(encoding='utf-8');
nlp = spacy.load('en')

count = 0;
df2 = pd.DataFrame();
for row in df1.iterrows():
  doc = nlp(unicode(row));
  text_ext = textacy.extract.subject_verb_object_triples(doc);
  tweetID = df['id'].tolist();
  mylist = list(text_ext)
  count = count + 1;
  if (mylist):
        df2 = df2.append(mylist, ignore_index=True)
  else:
        df2 = df2.append('0','0','0')

Any help would be greatly appreciated. Thank you!

1 Solution

#1


0  

You're supposed to pass a DataFrame-shaped object to append; passing the raw values as separate arguments doesn't work. So use df2 = df2.append([['0','0','0']], ignore_index=True).
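
As a rough sketch of the "DataFrame-shaped" point: in recent pandas, DataFrame.append is no longer available (it was deprecated in 1.4 and removed in 2.0), so the version below uses pd.concat instead, which is a swapped-in technique rather than the call named above. The sample `results` list and the `PLACEHOLDER` name are assumptions made for illustration only.

```python
import pandas as pd

# Stand-in row for tweets where no triple was found (hypothetical name)
PLACEHOLDER = ['0', '0', '0']

# Hypothetical per-tweet results: an empty list means no triple was found
results = [
    [],
    ['trump', 'carried', 'energy'],
    [],
    ['clinton', "doesn't", 'trust'],
]

# Swap each empty result for the placeholder row
rows = [triple if triple else PLACEHOLDER for triple in results]

# Wrapping each flat list in another list makes it a list of rows,
# so every piece is a one-row, three-column DataFrame
frames = [pd.DataFrame([row]) for row in rows]

# Concatenate once at the end instead of growing df2 inside the loop
df2 = pd.concat(frames, ignore_index=True)
print(df2)
```

Building the pieces first and concatenating once also avoids copying the whole frame on every iteration.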

You can also wrap your processing in a function process_row, then do df2 = pd.DataFrame([process_row(row) for row in df1.iterrows()]). Note that while append won't work with empty rows, the DataFrame constructor just fills them in with None. If you want empty rows to be ['0','0','0'], you have several options:

- Have your processing function return ['0','0','0'] for empty rows
- Change the list comprehension to [process_row(row) if process_row(row) else ['0','0','0'] for row in df1.iterrows()]
- Do df2 = df2.fillna('0')
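
The options above can be sketched roughly as follows. Here `process_row` is a hypothetical stand-in that just splits text into words (the real one would call spaCy and textacy), and `texts` is made-up sample data, so treat this as an illustration under those assumptions rather than the original pipeline.

```python
import pandas as pd

def process_row(text):
    """Hypothetical stand-in for the triple extraction step.

    Returns a three-item list when the text has at least three words,
    otherwise an empty list (meaning no triple was found).
    """
    words = text.split()
    return words[:3] if len(words) >= 3 else []

# Made-up sample input; empty strings yield no triple
texts = ['', 'trump carried energy', '', "clinton doesn't trust"]

# The constructor pads the empty rows with missing values
raw = pd.DataFrame([process_row(t) for t in texts])

# Third option: replace the missing values afterwards
filled = raw.fillna('0')

# Second option, calling process_row only once per row:
# an empty list is falsy, so `or` falls back to the placeholder
direct = pd.DataFrame([process_row(t) or ['0', '0', '0'] for t in texts])

print(filled)
print(direct)
```

Using `or` keeps the second option from running the extraction twice per row, which matters when each call invokes an NLP model.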
