Increase in number of runs failing due to timeouts
Incident Report for Dataform
Resolved
This incident has been resolved.
Posted Oct 23, 2019 - 18:48 UTC
Update
Correction: this issue has been around for longer than we initially realised. The first occurrences were on 2019-10-15 at around 5PM GMT.
Posted Oct 23, 2019 - 16:54 UTC
Monitoring
Since around 6PM GMT yesterday (2019-10-22), we have experienced an increased number of run failures that appeared as timeouts. These runs weren't actually timing out; they failed mid-execution because of a crashing server.

A fix for the issue was pushed today (2019-10-23) at around 12:45PM GMT to handle errors thrown from Postgres-based connections in a more orderly way. We are now monitoring the fix.
Posted Oct 23, 2019 - 13:17 UTC
This incident affected: Dataform Web.