We have noticed that scheduled tasks executed by cron get slower and slower on some of our servers and in our local environments with Magento 2 projects. After investigating, we found the cause in the cron_schedule table. Some schedules stay in "running" status, and they block the execution of other pending schedules with the same job_code: those pending schedules are never run and never marked as missed, so their number grows permanently, which makes cron:run slower and slower. We have also found that other Magento users have a growing cron_schedule table, and there is already an issue for it on Magento's GitHub: https://github.com/magento/magento2/issues/11002. We hope this issue will be fixed in one of the next Magento 2 updates, but for now we are taking care of it ourselves.
Let's take a look at some of the rows in the cron_schedule table.
Schedules stay in "running" status when a schedule job has not finished. A schedule may never finish, for example when:
- something goes wrong on the website's server
- a critical application error occurs
- the server or virtual machine is terminated during schedule execution
In Magento 2, some schedule jobs run every minute. If you stop or restart your server, one of these jobs is very likely to be running at that moment, and it will stay in "running" status forever.
Now that we know why schedules can stay in "running" status, let's look at why the cron_schedule table keeps growing.
The logic of schedule execution lives in the class "Magento\Cron\Observer\ProcessCronQueueObserver". Let's look at what its execute() method does:
- collect all pending schedules (status "pending")
- clean up old schedules: schedules with status "success", "missed" or "error" are removed once they are old enough (this depends on the schedule lifetime configuration)
- generate new schedules for future executions (with "pending" status)
- iterate over the pending schedules and try to run their jobs (see the part of the code below)
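The key detail of the cleanup step is which statuses it deletes. The sketch below is a simplified Python model of the behaviour described above. It is not Magento's code. The single `lifetime` parameter is a hypothetical stand-in for Magento's per-status history lifetime settings. Old finished schedules disappear, but an old "running" or "pending" schedule is never removed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Statuses the cleanup step deletes once they are old enough.
# "running" and "pending" are deliberately absent.
FINISHED_STATUSES = {"success", "missed", "error"}


@dataclass
class Schedule:
    job_code: str
    status: str
    created_at: datetime


def cleanup_old_schedules(schedules, now, lifetime):
    """Return the schedules that survive cleanup.

    `lifetime` is a single hypothetical value standing in for Magento's
    per-status history lifetime settings.
    """
    cutoff = now - lifetime
    return [
        s for s in schedules
        if not (s.status in FINISHED_STATUSES and s.created_at < cutoff)
    ]


# Usage: five two-day-old schedules, one per status.
now = datetime(2017, 10, 1, 12, 0)
two_days_ago = now - timedelta(days=2)
rows = [
    Schedule("sales_send_order_emails", status, two_days_ago)
    for status in ("success", "missed", "error", "running", "pending")
]
kept = cleanup_old_schedules(rows, now, lifetime=timedelta(hours=1))
print([s.status for s in kept])  # ['running', 'pending']
```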
The _runJob() method either runs the schedule or, if it is too late to run it, changes its status to "missed". However, this method is called only if tryLockJob() returns true.
The tryLockJob() method returns true only if no schedule with the same job_code as $schedule is in "running" status.
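The interaction between tryLockJob() and _runJob() can be modelled in a few lines. This is a Python sketch of the behaviour described above, not Magento's implementation. In the model, "running" a job just marks it "success", and `some_other_job` is a hypothetical job code. When the lock fails, the "too late, mark as missed" check is never reached.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Schedule:
    job_code: str
    status: str
    scheduled_at: datetime


def try_lock_job(schedule, all_schedules):
    """Model of tryLockJob(): move a pending schedule to "running" only if
    no other schedule with the same job_code is already running."""
    for other in all_schedules:
        if (other is not schedule and other.job_code == schedule.job_code
                and other.status == "running"):
            return False
    schedule.status = "running"
    return True


def run_job(schedule, now, schedule_lifetime):
    """Model of _runJob(): mark as "missed" if too late, else "execute" it."""
    if schedule.scheduled_at < now - schedule_lifetime:
        schedule.status = "missed"
    else:
        schedule.status = "success"  # stands in for actually running the job


def process_queue(all_schedules, now, schedule_lifetime):
    due = [s for s in all_schedules
           if s.status == "pending" and s.scheduled_at <= now]
    for schedule in due:
        if try_lock_job(schedule, all_schedules):
            run_job(schedule, now, schedule_lifetime)
        # If the lock fails, the schedule is left untouched: still "pending".


now = datetime(2017, 10, 1, 12, 0)
stuck = Schedule("sales_send_order_emails", "running", now - timedelta(days=1))
late = Schedule("sales_send_order_emails", "pending", now - timedelta(hours=2))
unrelated = Schedule("some_other_job", "pending", now - timedelta(minutes=1))
process_queue([stuck, late, unrelated], now, schedule_lifetime=timedelta(minutes=15))
print(late.status, unrelated.status)  # pending success
```

The `late` schedule is two hours overdue, yet it is not marked "missed", because the stuck "running" row prevents it from ever being locked.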
Example: we restart our server. Let's say we have interrupted the execution of a "sales_send_order_emails" schedule, so it has not finished and stays in "running" status.
After the server restart:
- our "running" schedule will never be deleted, because it has "running" status
- the next "sales_send_order_emails" schedules will be generated for future executions (with "pending" status)
- tryLockJob() will return false for every "sales_send_order_emails" schedule
- _runJob() will not be called for any "sales_send_order_emails" schedule, so none of them will be executed or marked as "missed".
As you can see in this example, schedules with job_code "sales_send_order_emails" will stay in "pending" status forever while new ones keep being generated, so the cron_schedule table keeps growing and cron gets slower and slower.
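To make the growth concrete, here is a self-contained Python simulation that combines the cleanup, generation and locking behaviour described above. It is a simplification under stated assumptions. Each tick generates exactly one schedule per job for the current minute, whereas Magento generates schedules ahead in batches. The lifetimes are illustrative values, and `some_other_job` is a hypothetical job code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

FINISHED_STATUSES = {"success", "missed", "error"}


@dataclass
class Schedule:
    job_code: str
    status: str
    scheduled_at: datetime


def cron_tick(rows, now, job_codes, schedule_lifetime, history_lifetime):
    """One simplified cron:run: clean up, generate, then process due schedules."""
    # Cleanup: only old finished schedules are deleted.
    rows[:] = [
        r for r in rows
        if not (r.status in FINISHED_STATUSES
                and r.scheduled_at < now - history_lifetime)
    ]
    # Generation: one new pending schedule per job for this minute.
    for code in job_codes:
        rows.append(Schedule(code, "pending", now))
    # Processing: a job_code that has a "running" row cannot be locked.
    locked_codes = {r.job_code for r in rows if r.status == "running"}
    for r in rows:
        if r.status != "pending" or r.scheduled_at > now:
            continue
        if r.job_code in locked_codes:
            continue  # stays "pending"; never marked "missed"
        too_late = r.scheduled_at < now - schedule_lifetime
        r.status = "missed" if too_late else "success"


start = datetime(2017, 10, 1, 0, 0)
# One schedule left "running" by an interrupted server restart.
rows = [Schedule("sales_send_order_emails", "running", start - timedelta(minutes=1))]
for minute in range(180):  # three hours of once-a-minute cron runs
    cron_tick(
        rows,
        start + timedelta(minutes=minute),
        ["sales_send_order_emails", "some_other_job"],
        schedule_lifetime=timedelta(minutes=15),
        history_lifetime=timedelta(minutes=60),
    )

blocked = sum(r.job_code == "sales_send_order_emails" for r in rows)
unblocked = sum(r.job_code == "some_other_job" for r in rows)
print(blocked, unblocked)  # 181 61
```

The unblocked job levels off at roughly one history lifetime's worth of rows. The blocked job gains one row per minute with no upper bound.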
You can check the cron_schedule table in your project databases, manually or automatically, and remove old "running" schedules or change their status if any exist.
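Checking the table and changing the status of old "running" rows comes down to two SQL statements. The sketch below runs them through Python's built-in sqlite3 module against a cut-down, in-memory cron_schedule table, so it can be tried without a Magento database. It rests on several assumptions. `executed_at` is treated as the time a job started. The 3-hour cutoff is only an illustrative threshold. The `messages` text is hypothetical. `now` must use the same time zone as the stored timestamps. On a real MySQL database the SQL would need adapting and testing first.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # timestamps stored as sortable strings


def find_stuck_running(conn, now, max_runtime):
    """List "running" schedules that started before now - max_runtime."""
    cutoff = (now - max_runtime).strftime(FMT)
    return conn.execute(
        "SELECT schedule_id, job_code, executed_at FROM cron_schedule "
        "WHERE status = 'running' AND executed_at < ? ORDER BY schedule_id",
        (cutoff,),
    ).fetchall()


def mark_stuck_as_error(conn, now, max_runtime):
    """Set status 'error' on stuck schedules; return how many were changed."""
    cutoff = (now - max_runtime).strftime(FMT)
    cur = conn.execute(
        "UPDATE cron_schedule SET status = 'error', "
        "messages = 'Still running after the maximum runtime' "  # hypothetical
        "WHERE status = 'running' AND executed_at < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cron_schedule (schedule_id INTEGER PRIMARY KEY, "
    "job_code TEXT, status TEXT, messages TEXT, executed_at TEXT)"
)
now = datetime(2017, 10, 1, 12, 0, 0)
conn.executemany(
    "INSERT INTO cron_schedule (job_code, status, executed_at) VALUES (?, ?, ?)",
    [
        ("sales_send_order_emails", "running", (now - timedelta(days=1)).strftime(FMT)),
        ("some_other_job", "running", (now - timedelta(minutes=5)).strftime(FMT)),
        ("sales_send_order_emails", "pending", None),
    ],
)
max_runtime = timedelta(hours=3)
stuck_rows = find_stuck_running(conn, now, max_runtime)
print(stuck_rows)  # [(1, 'sales_send_order_emails', '2017-09-30 12:00:00')]
updated = mark_stuck_as_error(conn, now, max_runtime)
print(updated)  # 1
```

Once the stuck row is no longer "running", tryLockJob() can succeed again for that job_code. Its overdue pending schedules are then either run or marked "missed", and cleanup eventually removes them. A threshold that is too short would also mark genuinely long-running jobs as "error", so it should be chosen per project.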
You can also use our plugin. It sets the status "error" on all "running" schedules that have not finished within 3 hours (which is enough for our cases).
You can find this solution on GitHub: https://github.com/Alekseon/CleanRunningJobs
If you want to apply it yourself, you can find the plugin code below. Feel free to use it.