Schedules not running across projects
Incident Report for Dataform
Postmortem

We rolled out a change that broke scheduling. This was not properly tested in our staging environment due to a mis-configuration of the staging environment, hence the faulty roll-out.

We identified the issue at 10:47, 22 minutes after it started.
We pushed a fix for the issue at 11:05.
The fix was applied to production around 11:25.

Detection of the issue and rollback of the issue could have been faster, and we have action items below to address those issues.

Follow up actions items:

  1. Fix the issue with our staging environment not running the pre-production release changes.
  2. Change our alerting configuration to reduce the time it would take for us to get paged from 22 minutes to 10.
  3. Make some changes to our roll-back procedure to reduce the time it takes to fully roll-out a roll-back to production.
Posted Nov 25, 2019 - 17:22 UTC

Resolved
This incident has been resolved.
Posted Nov 25, 2019 - 17:17 UTC
Monitoring
Schedules stopped running at 10:25AM with a recent release. This has now been rolled back and appears to have been resolved. We will continue to monitor.
Posted Nov 25, 2019 - 11:56 UTC
This incident affected: Dataform Web.