Retrieve large batches of rows from mysql
Posted on 2014-10-28 10:27:06
by Geert Vandeweyer
The following snippits are some approaches for rapid high performance fetching of vast amounts of rows from a mysql database. All of them should reduce the load on the mysql server and increase the performance of your scripts. I've gathered them from various places, including perl monks, so credit goes there.
Fetching batches of rows as arrayref.
If you need to process many rows, possible beyond the RAM capacity, fetch the rows in batches, with a maximum number of rows per batch.
# define query
my $query = "SELECT colA, colB, colC FROM `Big_Table`";
# prepare the query (dbh exists already)
my $sth = $dbh->prepare($query);
# see how many rows there are, and set the batch size appropriately.
my $sth = $dbh->prepare($query);
$rows = $sth->execute();
if ($rows > 10000) {
$max = 10000;
}
else {
$max = $rows;
}
my $rowcache;
# fetch the results and loop them.
# result rows are taken from the rowcache, until it's empty, then a new batch is fetched.
while (my $row = shift(@$rowcache) || shift( @{$rowcache = $sth->fetchall_arrayref(undef,$max)|| []})) {
# do something with the result
my $colA = $row->[0];
}
$sth->finish();
Fetching array ref for multiple variants of the same query.
In case you need to run the same query multiple times with different values for the parameters, it might be efficient to use the selectall_arrayref statement with a prepared statment. Here, the result for each query must fit in memory !
# create a query with placeholders.
my $query = "SELECT id, value FROM `table` WHERE attribute = ? AND SecondValue = ?";
my $sth = $dbh->prepare($query);
# in case you want to put all results in one aref
my $AllResults;
my @SecondValues = qw/a b c d/;
# run query multiple times:
for (my $i = 1; $i <= 5; $i++) {
foreach(@SecondValues) {
my $subref = $dbh->selectall_arrayref($sth,undef,($i,$_);
# add to general result
push(@$AllResults,@$subref);
# or process
foreach(@$subref) {
my $id = $_->[0];
my $v = $_->[1];
}
}
}
Fetching results into hashref
In some cass it's more convenient to have the results returned as a array ref of hash references. This allows to access the results in each result row by the column names. This is a bit slower and requires more memory.
# define query
my $query = "SELECT colA, colB, colC FROM `Big_Table`";
# prepare the query (dbh exists already)
my $sth = $dbh->prepare($query);
# see how many rows there are, and set the batch size appropriately.
my $sth = $dbh->prepare($query);
$rows = $sth->execute();
if ($rows > 10000) {
$max = 10000;
}
else {
$max = $rows;
}
my $rowcache;
# fetch the results and loop them.
# result rows are taken from the rowcache, until it's empty, then a new batch is fetched.
# mind the {} in the statement, indicating to return as hashref
while (my $row = shift(@$rowcache) || shift( @{$rowcache = $sth->fetchall_arrayref({},$max)|| {}})) {
# do something with the result
my $colA = $row->{'colA'};
}
$sth->finish();
Batch, mysql, Perl
Comments
Loading Comments