Best way to update a MySQL DB

Date: 2021-03-09 22:50:31

I have read through the solutions to similar problems, but they all seem to involve scripts and extra tools. I'm hoping my problem is simple enough to avoid that.

So the user uploads a csv of next week's data. It gets inserted into the DB, no problem.

BUT

an hour later he gets feedback from everyone, and must make updates accordingly. He updates the csv and goes to upload it to the DB.

Right now, the system I'm using checks whether the data for that week is already there. If it is, it pulls all of that data from the DB, a script finds the differences and sends them out, and after all of this, the old data is deleted and replaced with the new data.

Obviously, it is a lot easier to just wipe it clean and reenter the data, but not the best method, especially if there are lots of changes or tons of data. But I have to know WHAT changes have been made to send out alerts. But I don't want a transaction log, as the alerts only need to be sent out the one time and after that, the old data is useless.

So!

Is there a smart way to compare the new data to the already existing data, get only the rows that are changed/deleted/added, and make those changes? Right now it seems like I could do an update, but then I won't get any response on what has changed...
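
One way to picture the comparison being asked for is a keyed diff done in the upload script. This is only a minimal sketch, not the asker's actual code: the column names (`employee`, `start`, `end`) and the choice of `(employee, start)` as the identifying key are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
Key = Tuple[str, str]  # assumed natural key: (employee, shift start)


def key_of(row: Row) -> Key:
    """Identify a shift by employee and start time (an assumption)."""
    return (row["employee"], row["start"])


def diff_rows(old_rows: List[Row], new_rows: List[Row]):
    """Split an upload into added, deleted, and changed rows."""
    old = {key_of(r): r for r in old_rows}
    new = {key_of(r): r for r in new_rows}
    added = [new[k] for k in new if k not in old]
    deleted = [old[k] for k in old if k not in new]
    changed = [(old[k], new[k]) for k in new if k in old and old[k] != new[k]]
    return added, deleted, changed


# Hypothetical data: what is in the DB versus what the re-uploaded CSV holds.
old_rows = [
    {"employee": "joe", "start": "Mon 19:00", "end": "Mon 20:00"},
    {"employee": "jane", "start": "Mon 13:00", "end": "Mon 15:00"},
]
new_rows = [
    {"employee": "joe", "start": "Mon 19:00", "end": "Mon 20:30"},
    {"employee": "ann", "start": "Tue 09:00", "end": "Tue 12:00"},
]
added, deleted, changed = diff_rows(old_rows, new_rows)
```

The three lists are exactly what an alert needs, and only those rows have to be written back. Rows whose key and values are identical never appear in any list, so they are never touched.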

Thanks!

Quick Edit:

No foreign keys are currently in use. This will soon change, but it shouldn't make a difference, because the foreign keys will only point to who the data affects and thus won't need to be changed. As far as primary keys go, that does present a bit of a dilemma:

The data in question is everyone's work schedule. So it would be nice (for specific applications of this schedule beyond simple output) for each shift to have a key. But here is the problem: let's say that user1 was late on Monday. The tardiness is recorded in a separate table and is tied to the shift using the shift key. If on Tuesday there is a need to change the week already in progress, my fear is that it will become too difficult to ensure that entries in the DB for shifts that have already happened (and thus may have associations that shouldn't be broken) do not get re-keyed in the process. Unfortunately, it is not as simple as only updating events occurring AFTER the current time, as this would add work for the people who do the uploading (and thus make the system less marketable). Basically, they make the schedule in one program, export it to a CSV, and then upload it on a web page for all of the webapps that need that data. So it is simply much easier for them (and less stressful for everyone involved) to follow the same routine every time: export the entire week and upload it.

So my biggest concern is to make the upload script as smart as possible on both ends: it shouldn't get bloated trying to find the changes, it should find the changes no matter the input, AND none of the unchanged data should risk getting re-keyed.
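
A rough sketch of an upload step with those properties follows. It is written against Python's built-in `sqlite3` purely so it runs on its own; the `shifts` table, its columns, and the `(employee, start_time)` unique key are all assumptions, and with a MySQL driver the statements would look much the same apart from the placeholder syntax. For brevity the table holds a single week; a real script would filter by week.

```python
import sqlite3


def apply_upload(conn, new_rows):
    """Apply an uploaded week, touching only rows that differ.

    new_rows: list of (employee, start_time, end_time) tuples.
    Returns the keys that were added, changed, and deleted.
    """
    cur = conn.cursor()
    existing = {
        (emp, start): (shift_id, end)
        for shift_id, emp, start, end in cur.execute(
            "SELECT id, employee, start_time, end_time FROM shifts"
        )
    }
    incoming = {(emp, start): end for emp, start, end in new_rows}
    changes = {"added": [], "changed": [], "deleted": []}

    for key, end in incoming.items():
        if key not in existing:
            cur.execute(
                "INSERT INTO shifts (employee, start_time, end_time) VALUES (?, ?, ?)",
                (key[0], key[1], end),
            )
            changes["added"].append(key)
        elif existing[key][1] != end:
            # UPDATE in place, so the shift keeps its primary key.
            cur.execute(
                "UPDATE shifts SET end_time = ? WHERE id = ?",
                (end, existing[key][0]),
            )
            changes["changed"].append(key)

    for key, (shift_id, _) in existing.items():
        if key not in incoming:
            cur.execute("DELETE FROM shifts WHERE id = ?", (shift_id,))
            changes["deleted"].append(key)

    conn.commit()
    return changes


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shifts ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "employee TEXT, start_time TEXT, end_time TEXT, "
    "UNIQUE (employee, start_time))"
)
first = apply_upload(conn, [("joe", "Mon 19:00", "Mon 20:00"),
                            ("jane", "Mon 13:00", "Mon 15:00")])
second = apply_upload(conn, [("joe", "Mon 19:00", "Mon 20:30"),
                             ("jane", "Mon 13:00", "Mon 15:00"),
                             ("ann", "Tue 09:00", "Tue 12:00")])
```

The returned dictionary doubles as the list of alerts to send. Unchanged rows are never written, and changed rows are updated by `id`, so anything in another table that references a shift key stays valid.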

Here's a related question:

Suppose Joe User was scheduled to wash dishes from 7:00 PM to 8:00 PM, but the new
data has him working 6:45 PM to 8:30 PM.  Has the shift been changed? Or has the old
one been deleted and a new one added?

And another one:

Say Jane was scheduled to work 1:00 PM to 3:00 PM, but now everyone has a mandatory
staff meeting at 2:00 to 3:00. Has she lost one shift and gained two? Or has one
shift changed and she gained one?
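
There is no single right answer to either question; the script has to pick a rule. One possible rule, sketched below as an assumption rather than an established method, is to pair each old shift with the new shift it overlaps the most and call that pair "changed"; whatever is left over is "deleted" or "added". The helper `t` and the example date are made up for illustration.

```python
from datetime import datetime


def t(hour, minute=0):
    # Hypothetical helper: all example shifts fall on one arbitrary day.
    return datetime(2021, 3, 15, hour, minute)


def overlap_minutes(a, b):
    """Minutes shared by two (start, end) shifts; 0 if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return max(0.0, (end - start).total_seconds() / 60)


def classify(old_shifts, new_shifts):
    """Greedily pair each old shift with the unclaimed new shift it overlaps most."""
    unclaimed = list(new_shifts)
    result = {"changed": [], "deleted": [], "added": []}
    for old in old_shifts:
        best = max(unclaimed, key=lambda new: overlap_minutes(old, new), default=None)
        if best is None or overlap_minutes(old, best) == 0:
            result["deleted"].append(old)
            continue
        unclaimed.remove(best)
        if best != old:
            result["changed"].append((old, best))
    result["added"] = unclaimed
    return result


# Joe: 7:00-8:00 PM becomes 6:45-8:30 PM.
joe = classify([(t(19), t(20))], [(t(18, 45), t(20, 30))])
# Jane: 1:00-3:00 PM becomes 1:00-2:00 PM plus a 2:00-3:00 PM meeting.
jane = classify([(t(13), t(15))], [(t(13), t(14)), (t(14), t(15))])
```

Under this rule Joe's shift counts as changed, and Jane has one changed shift plus one added. Ties (Jane's two new shifts each overlap the old one by an hour) go to whichever new shift comes first in the input, which is an arbitrary choice.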

I'm really interested in knowing how this kind of data is typically handled/approached, more than specific answers to the above.

Again, thank you.

2 answers

#1


Right now, the system I'm using checks whether the data for that week is already there. If it is, it pulls all of that data from the DB, a script finds the differences and sends them out, and after all of this, the old data is deleted and replaced with the new data.

So your script knows the differences, right? And you don't want to use any extra tools apart from your script and MySQL, right?

I'm quite convinced that MySQL doesn't offer any 'diff' tool by itself, so the best you can achieve is to make a new CSV file for updates only. I mean, it should contain only the changed rows. Updating would be quicker, and all changed data would be easily available.
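
A minimal sketch of building such an updates-only CSV, assuming the previous upload is still available as CSV text and that rows are identified by the hypothetical columns `employee` and `start`:

```python
import csv
import io


def changes_only_csv(old_csv, new_csv, key_fields):
    """Return CSV text holding only the rows of new_csv that are new or changed."""
    old = {
        tuple(row[f] for f in key_fields): row
        for row in csv.DictReader(io.StringIO(old_csv))
    }
    reader = csv.DictReader(io.StringIO(new_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        key = tuple(row[f] for f in key_fields)
        if old.get(key) != row:  # unseen key, or same key with different values
            writer.writerow(row)
    return out.getvalue()


old_csv = (
    "employee,start,end\n"
    "joe,Mon 19:00,Mon 20:00\n"
    "jane,Mon 13:00,Mon 15:00\n"
)
new_csv = (
    "employee,start,end\n"
    "joe,Mon 19:00,Mon 20:30\n"
    "jane,Mon 13:00,Mon 15:00\n"
    "ann,Tue 09:00,Tue 12:00\n"
)
updates = changes_only_csv(old_csv, new_csv, ["employee", "start"])
```

Note that a file like this cannot express deletions; rows that vanished from the new upload would have to be found and removed separately.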

#2


If you have a unique key on one of the fields, you can use:

LOAD DATA LOCAL INFILE '/path/to/data.csv' REPLACE INTO TABLE table_name
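
One caveat worth checking against the re-keying concern in the question: MySQL documents REPLACE as deleting the conflicting old row before inserting the new one, so a row matched on the unique key may come back with a fresh auto-increment id. The sketch below illustrates that delete-then-insert effect using SQLite's `INSERT OR REPLACE` from Python's built-in `sqlite3`, as a stand-in rather than MySQL itself; the `shifts` and `tardiness` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shifts ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "employee TEXT, start_time TEXT, end_time TEXT, "
    "UNIQUE (employee, start_time))"
)
conn.execute("CREATE TABLE tardiness (shift_id INTEGER, minutes_late INTEGER)")

# user1's Monday shift, plus a tardiness record tied to it by the shift key.
conn.execute(
    "INSERT INTO shifts (employee, start_time, end_time) "
    "VALUES ('user1', 'Mon 09:00', 'Mon 17:00')"
)
old_id = conn.execute("SELECT id FROM shifts").fetchone()[0]
conn.execute("INSERT INTO tardiness VALUES (?, 10)", (old_id,))

# Re-upload the same shift with a new end time using replace semantics.
conn.execute(
    "INSERT OR REPLACE INTO shifts (employee, start_time, end_time) "
    "VALUES ('user1', 'Mon 09:00', 'Mon 16:00')"
)
new_id = conn.execute("SELECT id FROM shifts").fetchone()[0]

# Tardiness rows whose shift_id no longer matches any shift.
orphaned = conn.execute(
    "SELECT COUNT(*) FROM tardiness "
    "WHERE shift_id NOT IN (SELECT id FROM shifts)"
).fetchone()[0]
```

Here the replaced shift ends up with a different `id`, and the tardiness record no longer points at an existing shift. If shift keys are referenced from other tables, an update in place is the safer route.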
