I have an RSS feed from Readability that I'm using to keep a record of articles that I've read. I'm grabbing the titles and URLs and inserting them into the database for my own use.
However, my INSERT seems to take the entire feed and try to reinsert it every time, which causes a duplicate error (see here). Now, I know I can suppress that error by using INSERT IGNORE, but is there another way to go about this?
Possibly by doing something like this:
Check DB for last entry => Compare last entry to array data => INSERT what isn't there into DB.
2 Answers
#1
1
You've got the right idea, sure; you could either get the most recent datetime from the database and only insert items newer than that, or (if you want to be really complete) get everything from the database, compare against everything in the feed, and only insert items which don't match something already in the database. But if you really want INSERT to insert only new data, as implied in your question title, then INSERT IGNORE is the way to go, and doubtless the simplest implementation as well. Unless you've got a concern about the amount of traffic on the database, I'd stick with it.
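For comparison, the "get what's stored, compare, insert the rest" route might look roughly like the sketch below. It is only an illustration: it uses Python's built-in sqlite3 module instead of MySQL so that it runs on its own, and the articles table, its columns, the insert_new_items helper and the example.com URLs are all made up, since the question doesn't show its schema.

```python
import sqlite3


def insert_new_items(conn, items):
    """Insert only the (title, url) pairs whose url is not stored yet."""
    # Read every URL already in the table (the "get everything" variant).
    known = {row[0] for row in conn.execute("SELECT url FROM articles")}
    new_items = []
    for title, url in items:
        if url not in known:
            known.add(url)  # also guards against repeats inside one feed
            new_items.append((title, url))
    conn.executemany(
        "INSERT INTO articles (title, url) VALUES (?, ?)", new_items
    )
    conn.commit()
    return len(new_items)


conn = sqlite3.connect(":memory:")
# Hypothetical schema: the URL identifies an article.
conn.execute("CREATE TABLE articles (title TEXT, url TEXT PRIMARY KEY)")

feed = [
    ("First post", "http://example.com/1"),
    ("Second post", "http://example.com/2"),
]
first_run = insert_new_items(conn, feed)

# Later the feed contains the same two items plus a new one.
feed.append(("Third post", "http://example.com/3"))
second_run = insert_new_items(conn, feed)
total_rows = conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
```

That is an extra SELECT and some bookkeeping on every run, which is the traffic and effort trade-off mentioned above.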
#2
1
There is no shame in INSERT IGNORE. Use it and be merry! (Seriously, data integrity logic that you have to handle manually is annoying and more error-prone.)
Most SQL dialects have some concept of merging data, and this just happens to be the way that MySQL handles it. This means that not only will INSERT IGNORE be a fast and easy way of handling data, it will also have the novelty of being good practice.
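A minimal sketch of the idea, under some assumptions: SQLite's INSERT OR IGNORE stands in here for MySQL's INSERT IGNORE (against MySQL the statement would start with INSERT IGNORE INTO), and the table layout and helper names are hypothetical. The part that matters is the UNIQUE constraint on url; without a unique key nothing counts as a duplicate, and every run would insert the whole feed again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint on url is what turns a repeated item into a "duplicate".
conn.execute(
    "CREATE TABLE articles ("
    "id INTEGER PRIMARY KEY, title TEXT, url TEXT UNIQUE)"
)


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]


def store_feed(conn, items):
    """Try to insert every item; rows with an existing url are skipped."""
    before = row_count(conn)
    conn.executemany(
        # SQLite spelling; the MySQL equivalent is INSERT IGNORE INTO ...
        "INSERT OR IGNORE INTO articles (title, url) VALUES (?, ?)",
        items,
    )
    conn.commit()
    return row_count(conn) - before  # number of rows actually added


feed = [
    ("First post", "http://example.com/1"),
    ("Second post", "http://example.com/2"),
]
added_first = store_feed(conn, feed)   # both rows are new
added_again = store_feed(conn, feed)   # same feed again, nothing is added
added_later = store_feed(conn, feed + [("Third post", "http://example.com/3")])
```

The whole feed can be thrown at the table on every run, and the database decides what is new.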
Your other problem is that RSS doesn't really offer any other shortcut. I really like @AaronMiller's suggestion, but the pubDate element is optional, meaning that unless you have complete control over the RSS (and I would guess that you don't, based on the fact that you're worried about storing it), you won't be able to rely on it being there.
For that matter, the RSS 2.0 specification makes every element of an item optional; it only requires that at least one of title or description be present. There is no guarantee that the feed won't change at some future date and drop the title or the link elements. If that worries you, then it might be a good idea to use INSERT IGNORE and pair it with some sort of hash to boot.
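One way the hash idea could be sketched, again with the assumptions spelled out: SQLite's INSERT OR IGNORE stands in for MySQL's INSERT IGNORE, and the item_hash helper, the table layout and the choice of SHA-256 over title, link and description are illustrative, not anything prescribed by RSS or by the feed in question.

```python
import hashlib
import sqlite3

FIELDS = ("title", "link", "description")


def item_hash(item):
    """Build a key from whichever of title/link/description are present."""
    parts = [item.get(field) or "" for field in FIELDS]
    # Join with a separator so ("ab", "c") and ("a", "bc") hash differently.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    "item_hash TEXT PRIMARY KEY, title TEXT, link TEXT, description TEXT)"
)


def store_items(conn, items):
    rows = [
        (item_hash(i), i.get("title"), i.get("link"), i.get("description"))
        for i in items
    ]
    # Rows whose hash is already stored are skipped.
    conn.executemany("INSERT OR IGNORE INTO articles VALUES (?, ?, ?, ?)", rows)
    conn.commit()


sample_items = [
    {"title": "First post", "link": "http://example.com/1", "description": "Intro"},
    {"description": "An item with no title or link"},
]
store_items(conn, sample_items)
store_items(conn, sample_items)  # second run inserts nothing new
stored = conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
```

The catch with this key is that an item whose title or description is edited upstream gets a new hash and is stored as a new row.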