Pandas DataFrame created from JSON has an unnamed column: cannot insert into MySQL because of the unnamed column

Time: 2023-02-02 22:55:45

Right now I am working with some JSON data and trying to push it into a MySQL database on the fly. The JSON file is enormous, so I have to go through it carefully line by line using a generator (yield) in Python, convert each JSON line into a small pandas DataFrame, and write it to MySQL. The problem is that when I create the DataFrame from JSON, it adds an index column, and when I write to MySQL it seems to ignore the index=False option. Code below:

import gzip
import pandas as pd
from sqlalchemy import create_engine

# Parse the file one line at a time
def parseJSON(path):
    with open(path, 'r') as g:
        for l in g:
            yield eval(l)

# MySQL engine
engine = create_engine('mysql://login:password@localhost:1234/MyDB', echo=False)
# empty df just to have it
df = {}

for l in parseJSON("MyFile.json"):
    df = pd.DataFrame.from_dict(l, orient='index')
    df.to_sql(name='MyTable', con=engine, if_exists='append', index=False)

And I get an error:

OperationalError: (_mysql_exceptions.OperationalError) (1054, "Unknown column '0' in 'field list'")

Any ideas what I am missing? Or is there a way to get around this stuff?

UPD. I see that the DataFrame has an unnamed column labeled 0 each time I create it inside the loop.

Here is some info about DF:

df
Out[155]: 
                                                                0
reviewerID                                         A1C2VKKDCP5H97
asin                                                   0007327064
reviewerName                                        Donna Polston
helpful                                                    [0, 0]
unixReviewTime                                         1392768000
reviewText      love Oddie ,One of my favorite books are the O...
overall                                                         5
reviewTime                                            02 19, 2014
summary                                                       Wow

print(df.columns)
RangeIndex(start=0, stop=1, step=1)

1 Solution

#1


You currently have a frame with a single column named 0, and your intended column names have become the index of the frame. That is why MySQL complains about an unknown column '0'. Perhaps you can try:

df = pd.DataFrame.from_dict(l)
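
To make the difference concrete, here is a minimal sketch using a shortened version of the sample review from the question. It assumes each parsed line is a flat dict. Wrapping the dict in a list is my own variation rather than the one-liner above: passing the dict directly expands a list value such as helpful into several rows, and raises a ValueError when every value is a scalar.

```python
import pandas as pd

# Sample record, abbreviated from the output shown in the question.
record = {
    "reviewerID": "A1C2VKKDCP5H97",
    "asin": "0007327064",
    "helpful": [0, 0],
    "overall": 5,
}

# orient='index' turns the dict keys into row labels,
# leaving a single data column whose label is the integer 0.
tall = pd.DataFrame.from_dict(record, orient="index")
print(list(tall.columns))
print(list(tall.index))

# Wrapping the dict in a list yields one row, with the keys as column names.
wide = pd.DataFrame([record])
print(wide.shape)
print(list(wide.columns))
```

Note that the helpful cell still holds a Python list in the one-row frame, so it will probably need to be split or serialized before a database driver accepts it.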

NOTE: I think you would get much better performance if you built up a dict (or some other structure), converted all rows to a single DataFrame, and then pushed that to MySQL. Writing one row at a time might be too slow.
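
A rough sketch of that batching idea follows. Several things here are assumptions for illustration only: the helper names batched and load_in_batches are hypothetical, an in-memory SQLite connection stands in for the MySQL engine so the snippet runs on its own, the batch size is arbitrary, and list-valued fields are stored as JSON text. Since the file is too large to hold in memory, the records are grouped into fixed-size chunks instead of one giant frame.

```python
import json
import sqlite3
from itertools import islice

import pandas as pd


def batched(records, size):
    """Yield lists of up to `size` records from any iterable."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_in_batches(records, con, table, size=1000):
    """Append records to `table` one batch at a time; return rows written."""
    total = 0
    for chunk in batched(records, size):
        df = pd.DataFrame(chunk)  # list of dicts: one row per record
        # Assumption: list/dict cells (such as 'helpful') are stored as JSON
        # text so that the database driver only receives scalar values.
        for col in df.columns:
            if df[col].map(lambda v: isinstance(v, (list, dict))).any():
                df[col] = df[col].map(json.dumps)
        df.to_sql(name=table, con=con, if_exists="append", index=False)
        total += len(df)
    return total


# Hypothetical stand-in data; in the question these come from parseJSON(...).
sample = [
    {"reviewerID": f"R{i}", "helpful": [0, i], "overall": 5}
    for i in range(5)
]

con = sqlite3.connect(":memory:")  # stand-in for the MySQL engine
written = load_in_batches(sample, con, "MyTable", size=2)
print(written)
```

With the real data, the generator from the question would be passed in place of sample and the SQLAlchemy engine in place of con.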
