Magento Open Source is slow and CPU usage high? This is the reason

Issue overview

We have noticed that scheduled tasks executed by cron are slower and slower on some of our servers and on our local environments with Magento Open Source projects. After our investigation we have found cause of it inside of cron_schedule table. Some of schedules stays in "running" status, those schedules were blocking execution of other pending schedules with same job_code, so they won't be run, they won't be marked as missed and number of these schedules will permanently grow which lead cron:run to work slower and slower. We have also found that there is more Magento users with growing cron_cron schedule table, and there is already created issue on magento github: https://github.com/magento/magento2/issues/11002. We hope that this issue will be fixed in one of next Magento Open Source update, but for this moment we are taking care about it by ourselves.

Lets take a look on some of rows in cron_schedule table:

| 112044 | sales_send_order_creditmemo_emails | running | NULL | 2018-01-27 11:23:02 | 2018-01-27 11:28:00
| 112045 | sales_send_order_creditmemo_emails | pending | NULL | 2018-01-27 11:23:02 | 2018-01-27 11:29:00
| 112046 | sales_send_order_creditmemo_emails | pending | NULL | 2018-01-27 11:23:02 | 2018-01-27 11:30:00
| 112047 | sales_send_order_creditmemo_emails | pending | NULL | 2018-01-27 11:23:02 | 2018-01-27 11:31:00
| 112048 | sales_send_order_creditmemo_emails | pending | NULL | 2018-01-27 11:23:02 | 2018-01-27 11:32:00
| 112049 | sales_send_order_creditmemo_emails | pending | NULL | 2018-01-27 11:23:02 | 2018-01-27 11:33:00
| 112050 | sales_send_order_creditmemo_emails | pending | NULL | 2018-01-27 11:23:02 | 2018-01-27 11:34:00
| 112051 | sales_send_order_creditmemo_emails | pending | NULL | 2018-01-27 11:23:02 | 2018-01-27 11:35:00

Reason

Shedules stays in "running" status when schedule job has not been finished. It can happen that schedule will never end, for example:

something has gone wrong on the website's server
critical application error occured
server or virtual machine has been terminated during schedule execution

In Magento Open Source, some of schedule jobs are executing every 1 minute, so if you are stopping or restarting your server, its very likely that one of schedule jobs are running in same time, and it will stays in "running" status forever.

Now, when we know why schedules can stays in "running" status, lets check why cron_schedule table is growing.

We can find logic of schedule executions in class "Magento\Cron\Observer\ProcessCronQueueObserver", lets look what execute() method does:

collect all pending schedules (in status "pending")
clean old schedules - scheudles in status "success", "missed" or "error" are removed if they are old (depends of schedule lifetimes configuration)
generate new schedules for future executions (with "pending" status)
iterate pending schedules, and try to run their jobs (look at below part of code)

try {
   if ($schedule->tryLockJob()) {
         $this->_runJob($scheduledTime, $currentTime, $jobConfig, $schedule, $groupId);
   }
} catch (\Exception $e) {
   ...
}

Method _runJob(), is running schedule or is changing schedule status to "missed" if its too late to run, but this method is called only if tryLockJob is true.
Method tryLockJob() returns true only if there are no schedules in "running" status with same job_code as $schedule.

Example: We did restart our server. Lets say that we have interrupted execution of "sales_send_order_emails" schedule, it has not been finished and stays in "running" status.
After server restart:

our "running" schedule wont be never deleted because it has "running" status
next "sales_send_order_emails" schedules will be generated for future executions (with "pending" status)
method tryLockJob() will be false for all of "sales_send_order_emails" schedules
method _runJob() wont be called for any of "sales_send_order_emails" schedule, so they wont be executed and wont be set as "missed".

As You can see on this example, schedules with job_code "sales_send_order_emails" will stay in "pending" status forever, but new schedules will be still generated, so table cron_schedule will grow and grow, and cron will be slower and slower.

Solutions

You can check manually or automatically table cron_schedule in Your project databases, and remove or change statuses of old "running" schedules it they exists.

You can also use our plugin. It sets status "error" for all of "running" schedules which were not finished within 3 hours (its enough for our cases).

You can find this solution on github: https://github.com/Alekseon/CleanRunningJobs

If you want to apply it by yourself, below You can find code of plugin. Feel free to use it.

Plugin declaration:

    <type name="Magento\Cron\Observer\ProcessCronQueueObserver">
        <plugin name="FixOldRunningSchedulesPlugin" type="Alekseon\CronFix\Plugin\FixOldRunningSchedulesPlugin" sortOrder="20" />
    </type>

Plugin code:

<?php
/**
 * Copyright © Alekseon sp. z o.o.
 * http://www.alekseon.com/
 */
namespace Alekseon\CronFix\Plugin;

use \Magento\Cron\Observer\ProcessCronQueueObserver;
f
/**
 * @SuppressWarnings(PHPMD.CouplingBetweenObjects)
 */
class FixOldRunningSchedulesPlugin
{
    /**
     * @var \Magento\Cron\Model\ScheduleFactory
     */
    private $scheduleFactory;
    /**
     * @var \Magento\Framework\Stdlib\DateTime\DateTime
     */
    private $dateTime;

    /**
     * FixOldRunningSchedules constructor.
     * @param \Magento\Cron\Model\ScheduleFactory $scheduleFactory
     * @param \Magento\Framework\Stdlib\DateTime\DateTime $dateTime
     */
    public function __construct(
        \Magento\Cron\Model\ScheduleFactory $scheduleFactory,
        \Magento\Framework\Stdlib\DateTime\DateTime $dateTime
    ) {
        $this->dateTime = $dateTime;
        $this->scheduleFactory = $scheduleFactory;
    }

    /**
     * @param $subject
     * @param \Magento\Framework\Event\Observer $observer
     * @return array
     */
    public function beforeExecute($subject, \Magento\Framework\Event\Observer $observer)
    {
        $runningLifetimeInMinutes = 180;

        $runningSchedules = $this->scheduleFactory->create()->getCollection()->addFieldToFilter(
            'status',
            \Magento\Cron\Model\Schedule::STATUS_RUNNING
        );

        $runningTimeLimit = $this->dateTime->gmtTimestamp() - $runningLifetimeInMinutes * ProcessCronQueueObserver::SECONDS_IN_MINUTE;
        foreach($runningSchedules as $schedule) {
            if (strtotime($schedule->getExecutedAt()) < $runningTimeLimit) {
                $schedule->setMessages(__('Schedule not finished after %1 minutes.', $runningLifetimeInMinutes));
                $schedule->setStatus(\Magento\Cron\Model\Schedule::STATUS_ERROR);
                $schedule->save();
            }
        }

        return [$observer];
    }
}