Disclaimer: first time I've used DBI.
I have a MySQL table with a lot of indexed fields (f1, f2, f3, etc) that are used to generate WHERE clauses by long-running processes that iterate over chunks of the database performing various cleaning and testing operations.
The current version of this code works something like this:
sub get_list_of_ids {
    my ($value1, $value2, $value3...) = @_;
    my $stmt = 'SELECT * FROM files WHERE 1';
    my @args;
    if (defined($value1)) {
        $stmt .= ' AND f1 = ?';
        push(@args, $value1);
    }
    # Repeat for all the different fields and values
    my $select_sth = $dbh->prepare($stmt) or die $dbh->errstr;
    $select_sth->execute(@args) or die $select_sth->errstr;
    my @result;
    while (my $array = $select_sth->fetch) {
        push(@result, $array->[0]);
    }
    return \@result;
}
sub function_A {
    my ($value1, $value2, $value3...) = @_;
    my $id_aref = get_list_of_ids($value1, $value2, $value3...);
    foreach my $id (@$id_aref) {
        # Do something with $id
        # And something else with $id
    }
}
sub function_B {
    my ($value1, $value2, $value3...) = @_;
    my $id_aref = get_list_of_ids($value1, $value2, $value3...);
    foreach my $id (@$id_aref) {
        # Do something different with $id
        # Maybe even delete the row
    }
}
Anyway, I'm about to dump an awful lot more rows into the database, and am well aware that the code above won't scale up. I can think of several ways to fix it based on other languages. What is the best way to handle it in Perl?
Key points to note are that the logic in get_list_of_ids() is too long to replicate in each function, and that the operations on the selected rows are very varied.
Thanks in advance.
1 Answer
#1
6
I presume by "scale up" you mean in maintenance terms rather than performance.
The key change to your code is to pass in your arguments as column/value pairs rather than a list of values with an assumed set of columns. This will allow your code to handle any new columns you might add.
$dbh->selectcol_arrayref is both convenient and a bit faster, being written in C.
If you turn on RaiseError in your connect call, DBI will throw an exception on errors, so you don't have to write "or die ..." all the time. You should do that.
Finally, since we're writing SQL from possibly untrusted user input, I've taken care to escape the column name.
The rest is explained in this Etherpad, where you can watch your code be transformed step by step.
sub get_ids {
    my %search = @_;
    my $sql = 'SELECT id FROM files';
    if( keys %search ) {
        $sql .= " WHERE ";
        $sql .= join " AND ", map { "$_ = ?" }
                              map { $dbh->quote_identifier($_) }
                              keys %search;
    }
    return $dbh->selectcol_arrayref($sql, undef, values %search);
}
my $ids = get_ids( foo => 42, bar => 23 );
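One detail worth spelling out: the placeholders and the bind values line up only because keys and values walk the hash in the same order. Here is a minimal sketch that makes that pairing explicit, skips undefined values the way the original get_list_of_ids() did, and rejects unexpected column names. The names build_query, quote_ident and %ALLOWED are hypothetical, chosen for illustration; quote_ident is only a stand-in for $dbh->quote_identifier (assuming MySQL-style backtick quoting) so that the sketch runs without a database connection.

```perl
use strict;
use warnings;

# Hypothetical whitelist of searchable columns (an assumption, not from the post).
my %ALLOWED = map { $_ => 1 } qw(f1 f2 f3);

# Stand-in for $dbh->quote_identifier so this runs without a database.
# Assumes MySQL-style quoting: wrap in backticks, double any embedded backtick.
sub quote_ident {
    my ($name) = @_;
    $name =~ s/`/``/g;
    return "`$name`";
}

# Returns the SQL string followed by the bind values, in matching order.
sub build_query {
    my %search = @_;

    # Refuse column names we do not know about.
    for my $col (keys %search) {
        die "Unknown column: $col\n" unless $ALLOWED{$col};
    }

    # Skip undefined values; sort so the SQL text is the same on every call.
    my @columns = sort { $a cmp $b } grep { defined $search{$_} } keys %search;

    my $sql = 'SELECT id FROM files';
    if (@columns) {
        $sql .= ' WHERE ' . join ' AND ', map { quote_ident($_) . ' = ?' } @columns;
    }

    # The hash slice returns the values in the same order as @columns.
    return ($sql, @search{@columns});
}

my ($sql, @bind) = build_query( f2 => 23, f1 => 42 );
print "$sql\n";     # SELECT id FROM files WHERE `f1` = ? AND `f2` = ?
print "@bind\n";    # 42 23
```

In real code you would hand the result to DBI, for example $dbh->selectcol_arrayref($sql, undef, @bind). Sorting the columns also means identical searches produce identical SQL text, which helps if you later switch to prepare_cached.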
If you expect get_ids to return a list too large to keep in memory, you can return the statement handle and iterate over that instead of pulling out the whole array.
sub get_ids {
    my %search = @_;
    my $sql = 'SELECT id FROM files';
    if( keys %search ) {
        $sql .= " WHERE ";
        $sql .= join " AND ", map { "$_ = ?" }
                              map { $dbh->quote_identifier($_) }
                              keys %search;
    }
    my $sth = $dbh->prepare($sql);
    $sth->execute(values %search);
    return $sth;
}
my $sth = get_ids( foo => 42, bar => 23 );
# fetch returns an array ref per row; fetchrow_array gives the column value itself.
while( my ($id) = $sth->fetchrow_array ) {
    ...
}
You can combine both approaches by returning a list of IDs in list context, or a statement handle in scalar context.
sub get_ids {
    my %search = @_;
    my $sql = 'SELECT id FROM files';
    if( keys %search ) {
        $sql .= " WHERE ";
        $sql .= join " AND ", map { "$_ = ?" }
                              map { $dbh->quote_identifier($_) }
                              keys %search;
    }
    # Convenient for small lists.
    if( wantarray ) {
        my $ids = $dbh->selectcol_arrayref($sql, undef, values %search);
        return @$ids;
    }
    # Efficient for large ones.
    else {
        my $sth = $dbh->prepare($sql);
        $sth->execute(values %search);
        return $sth;
    }
}
my $sth = get_ids( foo => 42, bar => 23 );
# fetch returns an array ref per row; fetchrow_array gives the column value itself.
while( my ($id) = $sth->fetchrow_array ) {
    ...
}
my @ids = get_ids( baz => 99 );
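Context-sensitive returns can surprise callers, so here is a small sketch of how the calling context picks the branch. To stay runnable without a database it uses a hard-coded list and a closure in place of the table and the statement handle; @ROWS and get_ids_demo are stand-ins invented for this illustration.

```perl
use strict;
use warnings;

# Stand-in for the table contents (assumption for the demo).
my @ROWS = (11, 12, 13);

sub get_ids_demo {
    # List context: hand back every ID at once.
    return @ROWS if wantarray;

    # Scalar context: hand back an iterator, a closure standing in for a
    # statement handle. It yields one ID per call, then an empty list.
    my $i = 0;
    return sub { return $i < @ROWS ? $ROWS[$i++] : () };
}

my @all     = get_ids_demo();       # list context: all IDs
my $iter    = get_ids_demo();       # scalar context: iterator
my ($first) = get_ids_demo();       # the parentheses make this list context
my $count   = () = get_ids_demo();  # list context again, then counted

print "all:   @all\n";
print "first: $first\n";
print "count: $count\n";

# The list assignment is false only when the iterator returns an empty
# list, so an ID of 0 would not end the loop early.
while ( my ($id) = $iter->() ) {
    print "iter:  $id\n";
}
```

The thing to watch is that a call inside an argument list, such as print get_ids_demo(), is in list context too. If that ambiguity bothers you, two separately named functions are a reasonable alternative to wantarray.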
Eventually you will want to stop hand-coding SQL and use an object-relational mapper (ORM) such as DBIx::Class. One of the major advantages of an ORM is that it is very flexible and can do the above for you. DBIx::Class can return a simple list of results or a very powerful iterator. The iterator is lazy: it will not perform the query until you start fetching rows, which lets you change the query as needed without complicating your fetch routine.
# Assuming get_ids now returns a DBIx::Class ResultSet.
my $ids = get_ids( foo => 23, bar => 42 );
$ids->search(undef, { rows => 20 })->all; # equivalent to adding LIMIT 20