What is the most efficient way to store and traverse this data?

Time: 2021-08-01 23:49:39

I want to check a blog post for occurrences of specific foreign words, and then link those words to sound files so they can be played.

I have an XML file with 2500 words that I have sound files for, and I'm wondering what's the best way to store and traverse this list? The list isn't likely to change, and the function will be run on each blog post when viewed in full (not when excerpts are shown on archive pages etc).

The XML file is 350KB, which I was loading into PHP with simplexml_load_file. I thought this was a bit large, so I converted it into a PHP file containing an indexed (by string) array of the words, which brings the file size down to about 60KB.
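For illustration, a one-off conversion script along these lines would do it; this is a sketch that assumes entries shaped like <word file="bonjour.mp3">bonjour</word>, so the element and attribute names may need adjusting to the real XML:

<?php
// One-off conversion: XML word list -> PHP array file.
$xml = simplexml_load_file('words.xml');

$words = [];
foreach ($xml->word as $entry) {
    // Key by the word itself so later lookups are constant-time.
    $words[(string) $entry] = (string) $entry['file'];
}

// Emit a PHP file that returns the array when included.
file_put_contents('words.php', '<?php return ' . var_export($words, true) . ';');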

Should I be worrying so much about the file size, or more about how much time it will take to search through the data? Is there a better way of doing this or would it be best in a database? Any help would be appreciated!

4 Answers

#1 (3 votes)

If parsing the XML file and matching it against the blog post completes within a reasonable time, then there is no need to optimize. Optimize only when you notice a significant negative impact.

The easiest approach would probably be to simply cache the processed pages. Whenever the blog post or the word list changes, invalidate the cache, so it gets processed anew the next time it's called.
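A minimal sketch of that idea as a file-based cache; the cache location and the process_post() helper are placeholders for whatever does the actual word matching:

<?php
// Return the processed HTML for a post, regenerating it only when
// the post or the word list has changed since the cache was written.
function get_processed_post(int $postId, string $postFile, string $wordListFile): string
{
    $cacheFile = sys_get_temp_dir() . "/post_{$postId}.html";

    $fresh = file_exists($cacheFile)
        && filemtime($cacheFile) >= filemtime($postFile)
        && filemtime($cacheFile) >= filemtime($wordListFile);

    if ($fresh) {
        return file_get_contents($cacheFile);
    }

    // process_post() stands in for the word-matching/linking step.
    $html = process_post(file_get_contents($postFile));
    file_put_contents($cacheFile, $html);
    return $html;
}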

#2 (0 votes)

Converting your file into a PHP array is just great (you can't do better than that performance-wise unless you go into writing your own extension). Not only is the input file smaller, but you have also taken care of a pretty CPU-heavy (in relation to your other operations) XML-parsing step.

One objection might be that an array forces you to read all of the data in at once, but weighing in at 60KB that's no problem.

As for searching the data, since PHP arrays are associative they offer pretty good performance in this kind of scenario.
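As a sketch, assuming words.php returns an array mapping each word to its sound file (the file names and markup here are illustrative), each token in the post costs a single constant-time isset() check:

<?php
// words.php returns something like ['bonjour' => 'bonjour.mp3', ...].
$words = require 'words.php';

function link_words(string $text, array $words): string
{
    // \p{L}+ matches runs of letters, including non-ASCII ones.
    return preg_replace_callback('/\p{L}+/u', function ($m) use ($words) {
        $token = $m[0];
        if (isset($words[$token])) {  // O(1) hash lookup
            return '<a href="/sounds/' . $words[$token] . '">' . $token . '</a>';
        }
        return $token;
    }, $text);
}

A real implementation would also need to avoid matching inside existing HTML tags, but the per-word cost stays constant either way.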

Overall, I'd say your approach is the correct one.

#3 (0 votes)

Indexing based on the array of words stored in a file is more time-consuming than searching in the XML.

#4 (0 votes)

Without doubt, the most extensible solution is to use a database. A database can handle huge amounts of data without a significant performance drop, so if you had more data in future it would be trivial to add. In this case you could use SQLite, which requires very little in terms of installation and configuration, yet is fast and powerful.
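A rough sketch of the SQLite route using PDO; the table and column names are illustrative:

<?php
// One-time setup: import the word list into SQLite.
$db = new PDO('sqlite:words.db');
$db->exec('CREATE TABLE IF NOT EXISTS words (
    word TEXT PRIMARY KEY,
    sound_file TEXT NOT NULL
)');

// Per-request lookup: the PRIMARY KEY index keeps this fast.
$stmt = $db->prepare('SELECT sound_file FROM words WHERE word = ?');
$stmt->execute(['bonjour']);
$file = $stmt->fetchColumn();  // false if the word isn't in the list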

Your solution using a PHP array (presumably loaded with include/require) is a pretty good one, and I wouldn't worry too much about changing it. You are absolutely right, however, to drop the XML file: parsing it on every request would be both excessively labour-intensive and slow.

