I have 2 files, each containing 2 words: "word1" and "word2".
They are
- An XML file
<text>
  <word id="word1">
    <file>File1Name.txt</file>
    <file>File2Name.txt</file>
    <file>File3Name.txt</file>
  </word>
  <word id="word2">
    <file>File1Name.txt</file>
    <file>File4Name.txt</file>
  </word>
</text>
- A CSV file
word1, File1Name.txt, File2Name.txt, File3Name.txt
word2, File1Name.txt, File4Name.txt
Suppose I have 1 million words in both formats and I have to search for one word. Which format would be faster for retrieving the files that contain that word?
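To make the comparison concrete, here is a minimal Python sketch of one lookup in each format (the file names words.csv and words.xml are assumptions for illustration). The CSV can be scanned line by line and the scan stops at the first match, while the XML has to be parsed into a tree before it can be searched:

import csv
import xml.etree.ElementTree as ET

# Assumed file names -- adjust to the actual index files.
CSV_PATH = "words.csv"
XML_PATH = "words.xml"

def lookup_csv(word, path=CSV_PATH):
    """Scan the CSV line by line; column 0 is the word, the rest are files."""
    with open(path, newline="") as f:
        for row in csv.reader(f, skipinitialspace=True):
            if row and row[0] == word:
                return row[1:]
    return []

def lookup_xml(word, path=XML_PATH):
    """Parse the whole XML document, then find the matching <word> element."""
    root = ET.parse(path).getroot()
    for node in root.iter("word"):
        if node.get("id") == word:
            return [f.text for f in node.findall("file")]
    return []

print(lookup_csv("word2"))  # ['File1Name.txt', 'File4Name.txt']
print(lookup_xml("word2"))  # ['File1Name.txt', 'File4Name.txt']

Either way, a flat-file scan is O(n) in the number of words; for repeated lookups over a million words, loading the data once into a keyed structure (for example a dict mapping each word to its file list) avoids re-reading the file on every lookup.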
1 Answer
#1
-1
Hey, I wanted to put in my two cents here. https://github.com/elastic/elasticsearch is something I highly recommend you look into for something like this. As far as performance goes, I would recommend JSON over either XML or CSV. But if you are going to have a million records, a document store in a non-relational DB such as MongoDB would most likely give you the fastest results, especially if your data is flat.
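For example, the word-to-files mapping could be stored as one document per word and queried on an indexed field; a minimal sketch with pymongo, assuming a MongoDB server on localhost (the word_index database and words collection names are just placeholders):

from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")   # assumes a local MongoDB server
words = client["word_index"]["words"]               # placeholder db/collection names

# One document per word, listing the files that contain it.
words.insert_many([
    {"word": "word1", "files": ["File1Name.txt", "File2Name.txt", "File3Name.txt"]},
    {"word": "word2", "files": ["File1Name.txt", "File4Name.txt"]},
])

# An index on "word" keeps single-word lookups fast even with a million documents.
words.create_index([("word", ASCENDING)], unique=True)

doc = words.find_one({"word": "word2"})
print(doc["files"])   # ['File1Name.txt', 'File4Name.txt']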
Alternatively, if this is something you are loading into memory, I would try some type of caching solution; something like Redis might be useful for you: http://redis.io/topics/introduction Let me know if you have more questions.
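If the index is small enough to keep in memory, the same mapping could be cached as Redis sets; a minimal sketch with redis-py, assuming a Redis server on localhost (the "word:" key prefix is just an assumed naming convention):

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)  # assumes a local Redis server

# One set per word, holding the files that contain it.
r.sadd("word:word1", "File1Name.txt", "File2Name.txt", "File3Name.txt")
r.sadd("word:word2", "File1Name.txt", "File4Name.txt")

# A single-word lookup is then one keyed read.
print(r.smembers("word:word2"))   # {'File1Name.txt', 'File4Name.txt'}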