Incident Summary
On December 9th, we identified an issue where a bulk enrollment operation of big AD groups triggered the User Service and caused performance degradation for other services hosted on the same App Service Plan.
The root cause was an error in our code that allowed the User Service to operate inefficiently under certain conditions. We identified the issue swiftly and rolled out a fix the same day to all regions to prevent further occurrences.
Impact
Performance Degradation Periods:
Resolution
Immediate Fix:
Monitoring:
Mitigation Steps
To reduce the likelihood of similar incidents in the future, we are taking the following steps:
Improved QA Protocols:
Enhance testing for high-load scenarios, particularly for bulk operations.
Monitoring System Enhancements:
Implement additional resource usage alerts to detect and isolate high-impact operations earlier.
We sincerely apologize for the inconvenience caused to our users, particularly in the UK region. We deeply regret any disruption this may have caused and are committed to learning from this incident to serve you better in the future.