PROD POST / PUT Operations Are Timing Out
Resolved
Jun 10, 2026 at 3:04pm UTC
Addressing Recent API Reliability Issues
Status: Resolved.
API traffic has returned to normal. We have monitored the API for the past ~20 hours and believe the issue is fully resolved.
What happened
Late on Friday, 06-05, a small number of API write operations (POST, PUT, and DELETE) began timing out intermittently. Through early Monday afternoon, 06-08, these timeouts affected only a very small percentage of total API traffic and remained below all of our configured alert thresholds.
By early afternoon on Monday, 06-08, the issue had escalated until all POST, PUT, and DELETE operations were timing out.
We traced the problem to our backend Azure API Management (APIM) instance, which routes traffic between the PBD API and our Warehouse Management System (WMS) API.
As a result, calls between PBD and the WMS were failing. Create and update operations could not reach the WMS, and our data sync processes were all failing.
Root cause
Our investigation found that the Azure-managed SSL certificate for the backend APIM instance had expired and failed to auto-renew.
The underlying cause was an overly restrictive rule on the Network Security Group (NSG) applied to the subnet hosting APIM.
Per Azure's documentation, outbound traffic on port 80 must be permitted for certificate auto-renewal to occur. That traffic was being blocked by the NSG applied to the APIM subnet.
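An expired certificate like this one can also be caught by a check that does not depend on auto-renewal succeeding. The following is a minimal sketch, not the monitoring PBD actually runs: the function names, the host argument, and the example dates are hypothetical, and it assumes the endpoint is reachable from wherever the check runs.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the number of days between `now` and a certificate's notAfter time.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jan 15 12:00:00 2030 GMT" (a made-up date).
    """
    expires_at = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires_at - now).total_seconds() / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Fetch the notAfter field of the certificate served at host:port.

    With the default context, an already-expired certificate makes the
    handshake raise ssl.SSLCertVerificationError, which is itself a signal.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A scheduled job could call fetch_not_after for the gateway hostname and raise an alert when days_until_expiry drops below a chosen margin, so that a failed renewal is noticed before the certificate lapses.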
Resolution
We added the required rule to the NSG and reloaded the APIM network configuration to apply the change, after which traffic resumed normally.
Several hours later, however, the intermittent timeouts returned. This culminated in a second partial outage on Tuesday, 06-09. Working with Azure support, we identified a secondary rule that was overriding our new rule. We removed it, re-applied the NSG configuration, and reloaded the APIM network, restoring normal traffic.
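NSG rules are processed in priority order, lowest number first, and processing stops at the first rule that matches. The report does not state exactly how the secondary rule overrode the new one; the sketch below shows one way it can happen, using a simplified model that matches on port only. The rule names and priority values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int  # lower number is evaluated first
    port: int
    access: str    # "Allow" or "Deny"


def evaluate_outbound(rules: List[Rule], port: int) -> Optional[str]:
    """Return the access of the first matching rule in priority order.

    Returns None when no rule matches; a real NSG would then fall
    through to its built-in default rules, which are not modeled here.
    """
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.port == port:
            return rule.access
    return None


# Hypothetical rules: the older deny has the lower priority number, so it wins.
rules = [
    Rule("deny-http-out", 100, 80, "Deny"),
    Rule("allow-cert-renewal", 200, 80, "Allow"),
]
before = evaluate_outbound(rules, 80)

# Removing the conflicting rule lets the allow rule take effect.
after = evaluate_outbound([r for r in rules if r.name != "deny-http-out"], 80)
```

In this model, `before` is "Deny" and `after` is "Allow", which mirrors the sequence described above: the new rule alone was not enough until the conflicting rule was removed.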
Intermittent timeouts then tapered off over the following four to five hours before stopping entirely. We have been assured that this tapering is expected as network configuration changes fully propagate.
Current status
Over the past ~20 hours, we have observed no further timeouts, and the issue appears to be fully resolved.
Going forward
To catch issues like this sooner, we have updated our alerting criteria to trigger on any timed-out call, rather than only when broader thresholds are crossed.
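The difference between the two alerting styles can be sketched as follows. This is an illustration only: the function names and the 1% threshold are hypothetical, and the actual thresholds and alerting system are not described in this report.

```python
def should_alert_threshold(timeouts: int, total: int, max_rate: float) -> bool:
    """Threshold style: alert only when the timeout rate exceeds max_rate."""
    if total == 0:
        return False
    return timeouts / total > max_rate


def should_alert_any_timeout(timeouts: int) -> bool:
    """Any-timeout style: alert as soon as a single call times out."""
    return timeouts > 0


# Hypothetical window: 3 timeouts out of 10,000 calls, with a 1% threshold.
threshold_fires = should_alert_threshold(timeouts=3, total=10_000, max_rate=0.01)
any_timeout_fires = should_alert_any_timeout(timeouts=3)
```

With these made-up numbers the threshold check stays silent while the any-timeout check fires, which is the kind of low-rate, intermittent failure that went unnoticed between Friday and Monday.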
Updated
Jun 9, 2026 at 6:09pm UTC
This has been resolved. We will continue to monitor.
Updated
Jun 9, 2026 at 5:49pm UTC
POST / PUT operations are timing out again.
The problem appears to be intermittent. A fix is underway.
Updated
Jun 8, 2026 at 6:19pm UTC
The new network policy is in place and services have returned to normal.
Updated
Jun 8, 2026 at 5:59pm UTC
The problem has been identified and a fix is in progress.
Problem: our back-end API services rely on Azure API Management (APIM). The SSL certificate built into APIM expired and could not auto-renew because of an overly restrictive Network Security Group (NSG) policy.
The new network configuration is being applied now.
Created
Jun 8, 2026 at 5:50pm UTC
All create / update operations are currently timing out. We are investigating.
GET requests appear to be unaffected.