Performing batch data operations on cron run (Drupal 6)

When I called the import functionality directly from within my hook_cron(), weird things started to happen. I then realized that Drupal's batch API isn't supposed to be called from within cron, and the replies to a message I posted on the Drupal development mailing list confirmed this. Luckily, that message received a lot of good replies with tips on how to properly perform batch data operations on a cron run. You can read the whole thread in the mailing list archive, or read on for my personal synopsis below.

Here is an overview of the options to consider when you need to perform heavy operations that should be split up into chunks to avoid server or script timeouts. Roughly four possible solutions were pointed out by the people replying to the thread. If you have other suggestions, I'd love to hear about them in the comments!

Using the drupal_queue module

This is the solution I ended up using to solve my problem. The drupal_queue module is a direct backport of the Queue API that is available in Drupal 7. This is most definitely the best approach if you want to use a queueing system in a module that will be released to the community, as it will be trivial to update your code to Drupal 7 when the time comes. Your own custom modules can benefit just as much from using this standardized API.

At the end of this blogpost I added some example code on how I integrated the drupal_queue module into my own custom module.

Using the job_queue module

The job_queue module is another Drupal queue module that provides functionality similar to the drupal_queue module. Its API is pretty straightforward to work with, but since the Queue API is now in Drupal 7 core, I assume it's safer to use the drupal_queue module, unless you need the job_queue module for specific reasons.

Using a command line (CLI) or drush script

I haven't really explored this option, since the website I was investigating this for runs on a shared hosting account and I don't have shell access.
One important remark about this approach: even though running your import script from the command line on cron might avoid timeouts, you can still run into PHP memory limits. So, as pointed out by Alex Barth, you'd ideally still want to combine this method with one of the others mentioned here and process your operations in batches.
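
To illustrate the memory concern, here is a minimal sketch of a loop that stops processing before it reaches the PHP memory limit. It is plain PHP, not part of any of the modules mentioned here; the function names (mymodule_memory_limit_bytes, mymodule_process_until_memory_full) and the 80% threshold are assumptions made up for this example.

```php
<?php
/**
 * Convert a php.ini shorthand size such as "128M" to bytes.
 * Returns -1 when no limit is set. (Hypothetical helper.)
 */
function mymodule_memory_limit_bytes($value) {
  $value = trim($value);
  if ($value === '' || $value === '-1') {
    return -1;
  }
  $number = (int) $value;
  switch (strtoupper(substr($value, -1))) {
    case 'G':
      $number *= 1024;
      // Fall through.
    case 'M':
      $number *= 1024;
      // Fall through.
    case 'K':
      $number *= 1024;
  }
  return $number;
}

/**
 * Process items until the list is exhausted or memory use passes
 * $threshold (a fraction of the limit). Returns the number of items
 * that were processed. (Hypothetical helper.)
 */
function mymodule_process_until_memory_full(array $items, $callback, $limit_bytes, $threshold = 0.8) {
  $processed = 0;
  foreach ($items as $item) {
    // A limit of -1 means "no limit", so the check is skipped.
    if ($limit_bytes > 0 && memory_get_usage() > $limit_bytes * $threshold) {
      break;
    }
    call_user_func($callback, $item);
    $processed++;
  }
  return $processed;
}
```

In a real script the limit would probably come from mymodule_memory_limit_bytes(ini_get('memory_limit')), and whatever was not processed would be left for the next run.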

Writing your own queueing logic

A last option to process a lot of data would be to write your own script that splits the data to be processed into chunks, and keeps track of what has already been processed. 
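
As a rough idea of what such logic could look like, here is a minimal sketch in plain PHP. The function name mymodule_process_chunk and the offset-based bookkeeping are assumptions for illustration only; nobody on the thread proposed this exact code.

```php
<?php
/**
 * Process at most $limit items, starting at position $offset.
 * Returns the offset at which the next run should resume.
 * (Hypothetical helper, not part of Drupal.)
 */
function mymodule_process_chunk(array $items, $offset, $limit, $callback) {
  // Take only the slice that belongs to this run.
  $chunk = array_slice($items, $offset, $limit);
  foreach ($chunk as $item) {
    call_user_func($callback, $item);
  }
  // When the data is exhausted the chunk is empty and the offset stays put.
  return $offset + count($chunk);
}
```

On each cron run you would load the stored offset, call the function once, and save the returned offset again. In Drupal 6, variable_get() and variable_set() could be used to keep that value between runs.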

Conclusion

I think it's fair to conclude that the drupal_queue module is probably the best module in most cases where you want to process a lot of data automatically upon cron run, since the batch API functionality was only intended to be used from within the UI. The fact that a queue API is now part of Drupal 7 will hopefully standardize queueing mechanisms across modules.

Example code for using the drupal_queue module:

<?php
/**
 * Implementation of hook_cron_queue_info().
 */
function mymodule_cron_queue_info() {
  $queue['myqueuename'] = array(
    'worker callback' => 'mymodule_queue_worker',
  );
  return $queue;
}

/**
 * Implementation of hook_cron().
 */
function mymodule_cron() {
  $items = mymodule_get_data_to_process(); // replace by your own retrieval function
  $queue = drupal_queue_get('myqueuename');
  $queue->createQueue();
  if ($items) {
    foreach ($items as $item) {
      $queue->createItem($item);
    }
  }
}

/**
 * Callback function for queue worker
 */
function mymodule_queue_worker($item) {
  // perform operations on $item object here
}
?>

Finally, you will have to copy the drupal_queue_cron.php file to your Drupal installation root and make sure it is called by cron frequently enough (I set it to be called every 5 minutes); otherwise the jobs won't be processed.

Comments

This was exactly the information I was looking for. I was running into a max memory issue and so I ended up using a queue and then extending the SystemQueue class to stop if it is going to hit the memory wall. We are then running the queue cron every minute using drush.

Is this example code correct? Where does the $data come from in mymodule_cron()?

I assume it is meant to be
if (count($items) > 0) {

or if $items is an array, then the if-statement is not needed at all anyway.

You're right about that :)
I've edited the original code now.

Thanks for this, great read and very useful! :)