How to convert a .csv file to a .db file using Python?

Date: 2021-03-29 23:56:21

I want to convert a CSV file to a .db (database) file using Python. How should I do it?

2 answers

#1 (score: 0)

I don't think this can be done in full generality without out-of-band information (such as a known schema) or simply treating everything as strings/text. A CSV file stores every value as plain text, so in general it does not contain enough information to produce a semantically “satisfying” schema. Inferring the probable column types may be good enough in some cases, but it will be far from bulletproof.

I would use Python's csv and sqlite3 modules, and try to:

  • convert the cells in the first CSV line into SQL column names (stripping “oddball” characters)
  • infer the column types from the cells of the second CSV line (the first line of data): try converting each cell to an int; if that fails, try a float; if that fails too, fall back to a string
  • this gives you a list of names and a list of corresponding probable types, from which you can build a CREATE TABLE statement and execute it
  • INSERT the first and all subsequent data lines from the CSV file
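
The steps above can be sketched with the standard csv and sqlite3 modules. This is a minimal sketch, not the answerer's code: the function names `csv_to_db`, `column_name` and `infer_type`, the default table name `"data"` and the UTF-8 encoding are assumptions, and it assumes the file has a header row, at least one data row, rows of equal length and distinct column names.

```python
import csv
import itertools
import re
import sqlite3


def column_name(cell, index):
    """Turn a header cell into an SQL column name by stripping oddball characters."""
    name = re.sub(r"\W+", "_", cell.strip()).strip("_")
    if not name:
        return f"col{index}"  # header cell had no usable characters
    if name[0].isdigit():
        return f"col{index}_{name}"  # avoid names that start with a digit
    return name


def infer_type(cell):
    """Guess a column type from one cell: INTEGER, then REAL, else TEXT."""
    for convert, sql_type in ((int, "INTEGER"), (float, "REAL")):
        try:
            convert(cell)
            return sql_type
        except ValueError:
            pass
    return "TEXT"


def csv_to_db(csv_path, db_path, table="data"):
    """Create `table` from the CSV header and first data row, then load every row."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)     # first CSV line: column names
        first_row = next(reader)  # second CSV line: used for type inference

        names = [column_name(cell, i) for i, cell in enumerate(header)]
        types = [infer_type(cell) for cell in first_row]
        columns = ", ".join(f'"{n}" {t}' for n, t in zip(names, types))
        placeholders = ", ".join("?" for _ in names)

        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(f'CREATE TABLE "{table}" ({columns})')
                # Re-attach the first data row, then stream the rest of the file.
                conn.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})',
                    itertools.chain([first_row], reader),
                )
        finally:
            conn.close()
```

Usage would look like `csv_to_db("people.csv", "people.db", table="people")` (hypothetical file names). The cells are inserted as the strings the csv module returns; SQLite's type affinity then stores numeric-looking text in INTEGER and REAL columns as numbers, and keeps non-numeric text as TEXT instead of raising.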

There are many things to criticize in such an approach (e.g. no keys or indexes are created, and it fails if a column that generally holds strings happens to contain a value in the first data line that Python can convert to an int or float), but it will probably work passably for the majority of CSV files.
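
To make the first-data-line pitfall concrete, here is a small illustration with a hypothetical postal-code column (the data and the `infer_converter` helper are made up for this example). The first data row looks like an integer, so the guessed converter is `int`, which already drops the leading zero and then raises `ValueError` on a later row.

```python
def infer_converter(cell):
    """Guess a converter from one cell, following the rule above: int, then float, else str."""
    for convert in (int, float):
        try:
            convert(cell)
            return convert
        except ValueError:
            pass
    return str


# Hypothetical data: postal codes are strings in general, but the first
# data row happens to contain one that Python can convert to an int.
rows = [
    ["Alice", "02134"],
    ["Bob", "SW1A 1AA"],
]

converters = [infer_converter(cell) for cell in rows[0]]  # guessed from row 1 only: [str, int]

for row in rows:
    try:
        # Row 1 becomes ['Alice', 2134]: the leading zero is lost.
        print([convert(cell) for convert, cell in zip(converters, row)])
    except ValueError as exc:
        # Row 2 cannot be converted with int() at all.
        print(f"row {row!r} failed: {exc}")
```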

#2 (score: 1)

  1. You need a library that parses the CSV file for you (Python's standard csv module does this), or you can read the file line by line and parse it with plain Python; in the simplest case that means splitting each line on commas, although that breaks on quoted fields that themselves contain commas.

  2. Insert the rows into the SQLite database; the Python documentation on SQLite (the sqlite3 module) covers this. You could also use SQLAlchemy or another ORM.
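
A minimal sketch of these two steps using only the standard library, assuming the first row is a header whose cells can be used directly as column names and storing every column as TEXT (the function name `csv_to_db_as_text`, the default table name and the UTF-8 encoding are hypothetical). It uses `csv.reader` rather than `str.split`, because splitting on commas breaks quoted fields:

```python
import csv
import sqlite3

# Why not str.split: a quoted field may itself contain commas.
line = '1,"Smith, John",NY'
print(line.split(","))           # ['1', '"Smith', ' John"', 'NY']
print(next(csv.reader([line])))  # ['1', 'Smith, John', 'NY']


def csv_to_db_as_text(csv_path, db_path, table="data"):
    """Load a CSV file with a header row into an SQLite table of TEXT columns."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumed to be usable column names
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        placeholders = ", ".join("?" for _ in header)

        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
                # executemany accepts any iterable of rows, so the file is streamed.
                conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
        finally:
            conn.close()
```

Because every value is stored as TEXT, numeric columns need an explicit cast (e.g. `CAST(age AS INTEGER)`) when you query them.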

Another option is to use the sqlite3 command-line shell itself.
