I have a big list of words (over 2 million) in a CSV file (about 35 MB). I wanted to import the CSV file into sqlite3 with an index (primary key), so I imported it using the sqlite command-line tool. The database was created, but the .sqlite file has grown to over 120 MB (about 50% of that is the primary key index).
And here is the problem: if I add this 120 MB .sqlite file to the app resources, the .ipa file is still over 60 MB even after compression. I'd like it to be less than 30 MB (because of the download size limit over EDGE/3G).
Also, because of the size, I cannot have the app download the zipped sqlite file from a web service (45 MB * 1000 downloads = 45 GB, which is my server's half-year limit).
So I thought I could do something like this:
- compress the CSV file with the words to ZIP, which brings it down to only 7 MB.
- add ZIP file to resources.
- in the application I can unzip the file and import data from the unzipped CSV file to sqlite.
But I don't know how to do this. I've tried this:
sqlite3_exec(sqlite3_database, ".import mydata.csv mytable", callback, 0, &errMsg);
but it doesn't work. The reason for the failure is that ".import" is a command of the sqlite3 command-line shell and is not part of the C API.
So I need to know how to import the unzipped CSV file into the SQLite file inside the app (not during development using the command line).
3 solutions
#1
2
If the words that you are inserting are unique, you could make the text the primary key.
If you only want to test whether words exist in a set (say for a spell checker), you could use an alternative data structure such as a Bloom filter, which requires only about 9.6 bits per word for a 1% false-positive rate.
http://en.wikipedia.org/wiki/Bloom_filter
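
To make the size estimate concrete: at about 9.6 bits per word, 2 million words would need roughly 19.2 Mbit, i.e. about 2.4 MB. Below is a minimal sketch of such a filter in C. All names (`bloom_t`, `bloom_new`, `bloom_add`, `bloom_maybe_contains`) are hypothetical, and the choice of FNV-1a with double hashing and 7 probes is an assumption made for illustration, not something taken from the answer above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical Bloom filter type; not from the original post. */
typedef struct {
    uint8_t *bits;      /* bit array of n_bits bits */
    size_t   n_bits;
    unsigned n_hashes;  /* number of probes per word */
} bloom_t;

/* 64-bit FNV-1a over a NUL-terminated string, with a seed mixed in. */
static uint64_t fnv1a(const char *s, uint64_t seed) {
    uint64_t h = 14695981039346656037ULL ^ seed;
    for (; *s; ++s) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* i-th probe position, derived from two hashes (double hashing). */
static size_t probe(const bloom_t *bf, const char *w, unsigned i) {
    uint64_t h1 = fnv1a(w, 0);
    uint64_t h2 = fnv1a(w, 0x9e3779b97f4a7c15ULL) | 1; /* force nonzero step */
    return (size_t)((h1 + i * h2) % bf->n_bits);
}

/* Sizes the filter at ~9.6 bits per expected item with 7 probes,
   which targets roughly 1% false positives. */
bloom_t *bloom_new(size_t expected_items) {
    bloom_t *bf = malloc(sizeof *bf);
    if (!bf) return NULL;
    bf->n_bits = expected_items * 96 / 10 + 1;
    bf->n_hashes = 7;
    bf->bits = calloc((bf->n_bits + 7) / 8, 1);
    if (!bf->bits) { free(bf); return NULL; }
    return bf;
}

void bloom_add(bloom_t *bf, const char *w) {
    assert(bf != NULL && w != NULL);
    for (unsigned i = 0; i < bf->n_hashes; ++i) {
        size_t b = probe(bf, w, i);
        bf->bits[b / 8] |= (uint8_t)(1u << (b % 8));
    }
}

/* Returns 0 if the word was definitely never added,
   1 if it was probably added (false positives are possible). */
int bloom_maybe_contains(const bloom_t *bf, const char *w) {
    assert(bf != NULL && w != NULL);
    for (unsigned i = 0; i < bf->n_hashes; ++i) {
        size_t b = probe(bf, w, i);
        if (!(bf->bits[b / 8] & (1u << (b % 8)))) return 0;
    }
    return 1;
}

void bloom_free(bloom_t *bf) {
    if (bf) { free(bf->bits); free(bf); }
}
```

The bit array could be built once during development and shipped as a raw file in the app bundle. Note that a filter like this can only answer membership queries; it cannot list the words or return anything stored alongside them.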
#2
1
As FlightOfStairs mentioned, depending on the requirements, a Bloom filter is one solution. If you need the full data, another solution is to use a trie or radix tree data structure. You would preprocess your data and build these data structures, and then either put them in sqlite or in some other external data format.
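
As a rough illustration of the trie idea, here is a minimal sketch in C. It assumes the words contain only lowercase ASCII letters a-z, which may not hold for the real word list, and all names (`trie_node`, `trie_insert`, `trie_contains`) are hypothetical. A fixed array of 26 child pointers per node is simple but not compact; a radix tree or a packed on-disk layout would likely be needed to actually save space.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical trie node: one child slot per letter a-z. */
typedef struct trie_node {
    struct trie_node *child[26];
    int is_word;  /* nonzero if a word ends at this node */
} trie_node;

/* Allocates an empty node (all children NULL), or returns NULL on failure. */
trie_node *trie_new(void) {
    return calloc(1, sizeof(trie_node));
}

/* Inserts a word. Returns 0 on success, -1 on allocation failure
   or if the word contains a character outside a-z. */
int trie_insert(trie_node *root, const char *w) {
    assert(root != NULL && w != NULL);
    trie_node *n = root;
    for (; *w; ++w) {
        if (*w < 'a' || *w > 'z') return -1;
        int i = *w - 'a';
        if (!n->child[i] && !(n->child[i] = trie_new())) return -1;
        n = n->child[i];
    }
    n->is_word = 1;
    return 0;
}

/* Returns 1 if the exact word was inserted, 0 otherwise. */
int trie_contains(const trie_node *root, const char *w) {
    const trie_node *n = root;
    for (; *w && n; ++w) {
        if (*w < 'a' || *w > 'z') return 0;
        n = n->child[*w - 'a'];
    }
    return n != NULL && n->is_word;
}

/* Frees a node and everything below it. */
void trie_free(trie_node *n) {
    if (!n) return;
    for (int i = 0; i < 26; ++i) trie_free(n->child[i]);
    free(n);
}
```

Words that share a prefix, such as "car" and "cart", share the nodes for that prefix, which is where the space saving over a flat list comes from.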
#3
0
The simplest solution would be to write a CSV parser using NSScanner and insert the rows into the database one by one. That's actually a fairly easy job—you can find a complete CSV parser here.