Populating a SQLite3 database from a .txt file using Python

Date: 2021-09-05 05:32:38

I am trying to set up a website in Django which allows users to query a database containing information about their representatives in the European Parliament. I have the data in a comma-separated .txt file with the following format:

Parliament, Name, Country, Party_Group, National_Party, Position

7, Marta Andreasen, United Kingdom, Europe of freedom and democracy Group, United Kingdom Independence Party, Member

etc.

I want to populate a SQLite3 database with this data, but so far all the tutorials I have found only show how to do this by hand. Since the file contains 736 observations, I don't really want to do that.

I suspect this is a simple matter, but I would be very grateful if someone could show me how to do this.

Thomas

5 answers

#1


17  

Assuming your models.py looks something like this:

from django.db import models

class Representative(models.Model):
    parliament = models.CharField(max_length=128)
    name = models.CharField(max_length=128)
    country = models.CharField(max_length=128)
    party_group = models.CharField(max_length=128)
    national_party = models.CharField(max_length=128)
    position = models.CharField(max_length=128)

You can then run python manage.py shell and execute the following:

import csv
from your_app.models import Representative

# If you're using different field names, change this list accordingly.
# The order must also match the column order in the CSV file.
fields = ['parliament', 'name', 'country', 'party_group', 'national_party', 'position']
with open('your_file.csv', newline='') as f:
    reader = csv.reader(f, skipinitialspace=True)  # values are separated by ", "
    next(reader)  # skip the header row
    for row in reader:
        Representative.objects.create(**dict(zip(fields, row)))

And you're done.

Addendum (edit)

Per Thomas's request, here's an explanation of what **dict(zip(fields,row)) does:

So initially, fields contains a list of field names that we defined, and row contains a list of values that represents the current row in the CSV file.

fields = ['parliament', 'name', 'country', ...]
row = ['7', 'Marta Andreasen', 'United Kingdom', ...]

What zip() does is combine two lists into a sequence of pairs, taking one item from each list at a time (like a zipper); e.g. list(zip(['a','b','c'], ['A','B','C'])) returns [('a','A'), ('b','B'), ('c','C')]. (In Python 3, zip() returns an iterator, so list() is needed to see the pairs.) So in our case:

>>> list(zip(fields, row))
[('parliament', '7'), ('name', 'Marta Andreasen'), ('country', 'United Kingdom'), ...]

The dict() function simply converts the list of pairs into a dictionary.

>>> dict(zip(fields, row))
{'parliament': '7', 'name': 'Marta Andreasen', 'country': 'United Kingdom', ...}

The ** is a way of converting a dictionary into a keyword argument list for a function. So function(**{'key': 'value'}) is the equivalent of function(key='value'). So in our example, calling create(**dict(zip(fields, row))) is the equivalent of:

create(parliament='7', name='Marta Andreasen', country='United Kingdom', ...)
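
To try the three steps together without a Django project, here is a small sketch in plain Python. The create function below is a hypothetical stand-in for Representative.objects.create (it only echoes back the keyword arguments it receives), and the row values are taken from the sample line in the question.

```python
# Hypothetical stand-in for Representative.objects.create: it simply
# returns the keyword arguments it was called with.
def create(**kwargs):
    return kwargs

fields = ['parliament', 'name', 'country',
          'party_group', 'national_party', 'position']
row = ['7', 'Marta Andreasen', 'United Kingdom',
       'Europe of freedom and democracy Group',
       'United Kingdom Independence Party', 'Member']

pairs = list(zip(fields, row))   # list of (field, value) tuples
as_dict = dict(pairs)            # maps each field name to its value

# These two calls receive exactly the same keyword arguments.
unpacked = create(**as_dict)
explicit = create(parliament='7', name='Marta Andreasen',
                  country='United Kingdom',
                  party_group='Europe of freedom and democracy Group',
                  national_party='United Kingdom Independence Party',
                  position='Member')
print(unpacked == explicit)  # prints True
```

Note that zip() stops at the shorter of its inputs, so a row with fewer values than fields would silently produce a dictionary with missing keys.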

Hope this clears things up.

#2


4  

As SiggyF says and only slightly differently than Joschua:

Create a text file with your schema, e.g.:

CREATE TABLE politicians (
    Parliament text, 
    Name text, 
    Country text, 
    Party_Group text, 
    National_Party text, 
    Position text
);

Create the table:

>>> import csv, sqlite3
>>> conn = sqlite3.connect('my.db')
>>> c = conn.cursor()
>>> with open('myschema.sql') as f:            # read in schema file 
...   schema = f.read()
... 
>>> c.execute(schema)                          # create table per schema 
<sqlite3.Cursor object at 0x1392f50>
>>> conn.commit()                              # commit table creation

Use the csv module to read the file containing the data to be inserted:

>>> csv_reader = csv.reader(open('myfile.txt', encoding='utf-8', newline=''), skipinitialspace=True)
>>> next(csv_reader)                           # skip the first line in the file
['Parliament', 'Name', 'Country', ...

# put all data in a list of tuples
# edit: the utf-8 file is decoded to str by passing encoding='utf-8' to open()
>>> to_db = [tuple(line) for line in csv_reader]
>>> to_db                                      # this will be inserted into table
[('7', 'Marta Andreasen', 'United Kingdom', ...

Insert data:

>>> c.executemany("INSERT INTO politicians VALUES (?,?,?,?,?,?);", to_db)
<sqlite3.Cursor object at 0x1392f50>
>>> conn.commit()

Verify that all went as expected:

>>> c.execute('SELECT * FROM politicians').fetchall()
[('7', 'Marta Andreasen', 'United Kingdom', ...

Edit:
Since the data was decoded from UTF-8 on input, be sure to encode it as UTF-8 again on output.
For example (in Python 3, the file object does the encoding when opened with encoding='utf-8'):

with open('encoded_output.txt', 'w', encoding='utf-8') as f:
  for row in c.execute('SELECT * FROM politicians').fetchall():
    for col in row:
      f.write(col)          # encoded to utf-8 by the file object
      f.write('\n')

#3


2  

You could read the data using the csv module. Then you can build an INSERT statement and run it with the executemany method:

  cursor.executemany(sql, rows)

or use session.add_all if you use SQLAlchemy.
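
A minimal end-to-end sketch of the csv plus executemany route, assuming the column layout from the question. The table name representatives, the in-memory database, and the inline sample text are placeholders for illustration only; with a real file you would pass the opened file to csv.reader instead of io.StringIO.

```python
import csv
import io
import sqlite3

# Sample text standing in for the .txt file (layout assumed from the question).
sample = io.StringIO(
    "Parliament, Name, Country, Party_Group, National_Party, Position\n"
    "7, Marta Andreasen, United Kingdom, "
    "Europe of freedom and democracy Group, "
    "United Kingdom Independence Party, Member\n"
)

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute(
    "CREATE TABLE representatives (parliament TEXT, name TEXT, country TEXT, "
    "party_group TEXT, national_party TEXT, position TEXT)"
)

reader = csv.reader(sample, skipinitialspace=True)  # drop the space after each comma
next(reader)                                        # skip the header row
rows = [tuple(r) for r in reader if r]              # ignore blank lines

sql = "INSERT INTO representatives VALUES (?, ?, ?, ?, ?, ?)"
conn.executemany(sql, rows)  # one parameterized statement, run once per row
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM representatives").fetchone()[0]
first = conn.execute("SELECT name, country FROM representatives").fetchone()
print(count, first)
```

The ? placeholders let sqlite3 handle the quoting, so names containing apostrophes are inserted safely without building SQL strings by hand.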

#4


2  

You asked what the create(**dict(zip(fields, row))) line did.

I don't know how to reply directly to your comment, so I'll try to answer it here.

zip takes multiple lists as arguments and returns their corresponding elements as tuples (as a list in Python 2, as an iterator in Python 3).

zip(list1, list2) => [(list1[0], list2[0]), (list1[1], list2[1]), ...]

dict takes a list of 2-element tuples and returns a dictionary mapping each tuple's first element (key) to its second element (value).

create is a function that takes keyword arguments. You can use **some_dictionary to pass that dictionary into a function as keyword arguments.

create(**{'name':'john', 'age':5}) => create(name='john', age=5)

#5


0  

Something like the following should work (not tested):

import sqlite3

# Open database (will be created if it does not exist)
conn = sqlite3.connect('/path/to/your_file.db')

c = conn.cursor()

# Create table
c.execute('''create table representatives
(parliament text, name text, country text, party_group text, national_party text, position text)''')

with open("thefile.txt", encoding="utf-8") as f:
    next(f)  # skip the header line
    for line in f:
        # Insert a row of data; execute() takes the values as one sequence,
        # so the list from split() is passed as-is rather than unpacked with *
        c.execute("""insert into representatives
                     values (?,?,?,?,?,?)""", line.rstrip("\n").split(", "))

# Save (commit) the changes
conn.commit()

# We can also close the cursor if we are done with it
c.close()
