MySQL: How to update 50% of the rows, randomly selected?

Date: 2023-01-31 09:17:20

I want to update 50% of the rows in a table, randomly selected. Is there any way to do that?

Edit: To clarify, it should always update 50% of the records, but those 50% must be randomly selected (not just the top 50%, for instance). In other words, on average, every other record should be updated.

4 Solutions

#1


24  

This should work:

UPDATE table SET x = y WHERE RAND() < 0.5

Yep, tested it, works. But of course, it is only 50% of the rows on average, not exactly 50%.
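
To make the "on average, not exactly" point concrete, here is a minimal sketch in Python. It assumes SQLite as a stand-in for MySQL: SQLite has no RAND(), so the sketch registers one backed by Python's random module. The table t, the column x, and the helper name update_with_rand are hypothetical.

```python
import random
import sqlite3


def update_with_rand(n_rows, seed=None):
    """Run UPDATE ... WHERE RAND() < 0.5 on n_rows rows; return the rows updated."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    # SQLite has no RAND(), so register one. User functions are treated as
    # non-deterministic by default, so it is called again for every row.
    conn.create_function("RAND", 0, rng.random)
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, x INTEGER DEFAULT 0)")
    conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(n_rows)])
    cur = conn.execute("UPDATE t SET x = 1 WHERE RAND() < 0.5")
    updated = cur.rowcount  # number of rows the UPDATE modified
    conn.close()
    return updated


if __name__ == "__main__":
    # Five runs over 1000 rows: each count lands near 500, seldom exactly on it.
    print([update_with_rand(1000) for _ in range(5)])
```

Each row is an independent coin flip, so the number of updated rows varies from run to run around half the table size.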

According to the SQL-92 specification, the WHERE clause must be evaluated for each tuple, so RAND() is reevaluated for every row, which yields the intended result (instead of selecting either all rows or none).

Excerpt from the specification (emphasis mine):

General Rules

1) The <search condition> is applied to each row of T. The result of the <where clause> is a table of those rows of T for which the result of the <search condition> is true.

2) Each <subquery> in the <search condition> is effectively executed for each row of T and the results used in the application of the <search condition> to the given row of T. If any executed <subquery> contains an outer reference to a column of T, then the reference is to the value of that column in the given row of T.

#2


7  

As I said, that's the long way, described in a sort of pseudocode:

$x = SELECT COUNT(*) FROM some_table;
@ids = SELECT id FROM some_table ORDER BY RAND() LIMIT $x / 2;
UPDATE some_table SET col = val WHERE id IN (@ids);
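
The three steps above can be sketched as runnable code. This is only an illustration, assuming SQLite through Python's sqlite3 module as a stand-in for MySQL (so RANDOM() takes the place of RAND()); the updated column and the helper name update_random_half are hypothetical.

```python
import sqlite3


def update_random_half(conn):
    """Update exactly floor(n / 2) randomly chosen rows of some_table."""
    # Step 1: count the rows.
    (n,) = conn.execute("SELECT COUNT(*) FROM some_table").fetchone()
    # Step 2: shuffle the ids and keep the first half.
    rows = conn.execute(
        "SELECT id FROM some_table ORDER BY RANDOM() LIMIT ?", (n // 2,)
    ).fetchall()
    # Step 3: update only the chosen ids (each row is a 1-tuple holding an id).
    conn.executemany("UPDATE some_table SET updated = 1 WHERE id = ?", rows)
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE some_table (id INTEGER PRIMARY KEY, updated INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO some_table (id) VALUES (?)", [(i,) for i in range(1, 12)]
    )
    print(update_random_half(conn))  # 11 rows, so 5 are updated
```

Unlike the RAND() < 0.5 filter, this always touches exactly half of the rows (rounded down), at the cost of sorting the whole table in random order.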

#3


0  

UPDATE table SET columnvalue = x WHERE RAND() <= 0.5 will update very close to 50% of the records.

#4


0  

RAND() is random, so you will not get an exact percentage split.

It would be better to use the modulus operator % to pick every Xth item. This works best with a unique id column such as a primary key.

Try running this query; be sure to specify your table name and id column name:

Selecting every 2nd row (ids divisible by 2):

SELECT * FROM <your_table_name> WHERE <id_column_name> % 2 = 0

Selecting every 6th row (ids divisible by 6):

SELECT * FROM <your_table_name> WHERE <id_column_name> % 6 = 0

Once you are happy that the SELECT results look good, you can switch the query to UPDATE syntax to update the records, using the same WHERE clause.
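
As a rough check of what the modulus filter selects, the following Python sketch runs the same WHERE clause against an in-memory SQLite table (an assumption made only so the example is runnable; the table t and the helper even_id_share are hypothetical).

```python
import sqlite3


def even_id_share(ids):
    """Return (selected ids, share of all rows) for WHERE id % 2 = 0."""
    ids = list(ids)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in ids])
    picked = [
        row[0]
        for row in conn.execute("SELECT id FROM t WHERE id % 2 = 0 ORDER BY id")
    ]
    conn.close()
    return picked, len(picked) / len(ids)


if __name__ == "__main__":
    print(even_id_share(range(1, 11)))        # contiguous ids 1..10
    print(even_id_share([1, 3, 5, 7, 8, 9]))  # ids with gaps
```

With contiguous ids the filter selects exactly half of the rows, and the same rows on every run. With gaps in the id sequence (for example after deletes) the share can drift away from 50%.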
