Potential Performance Issues
Incident Report for ZENSAI
Postmortem

Incident Summary
On December 9th, we identified an issue in which a bulk enrollment operation on large Active Directory (AD) groups placed a heavy load on the User Service and degraded the performance of other services hosted on the same App Service Plan.

The root cause was a defect in our code that caused the User Service to operate inefficiently under certain conditions. We identified the issue quickly and rolled out a fix to all regions the same day to prevent further occurrences.

Impact

  • Region Affected: UK
  • Performance Degradation Periods:

    • The most significant impact occurred on Monday, December 9, between 10:59 and 11:26 UTC. The system remained accessible during this period but loaded more slowly than normal.

Resolution

  • Immediate Fix:

    • Updated the User Service code to handle enrollment operations more efficiently.
    • Deployed the fix to all regions to mitigate potential impact elsewhere.
  • Monitoring:

    • Our existing monitoring systems quickly pinpointed the source of the problem, enabling us to act swiftly.
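
For readers curious what handling bulk enrollments "more efficiently" can look like in general, the sketch below shows one common pattern: splitting a large group into fixed-size batches and pausing briefly between them, so that a single operation cannot saturate compute shared with other services. It is an illustration only and does not describe the actual change made to the User Service. The names `enroll_in_batches` and `enroll_batch`, and the default values for `batch_size` and `pause_seconds`, are hypothetical.

```python
import time
from typing import Callable, Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def enroll_in_batches(
    user_ids: Sequence[str],
    enroll_batch: Callable[[Sequence[str]], None],  # hypothetical per-batch handler
    batch_size: int = 200,        # assumed value, not taken from the incident
    pause_seconds: float = 0.5,   # assumed value, not taken from the incident
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Enroll users batch by batch, pausing between batches.

    Returns the number of users handed to `enroll_batch`.
    """
    batches = list(chunked(user_ids, batch_size))
    processed = 0
    for index, batch in enumerate(batches):
        enroll_batch(batch)
        processed += len(batch)
        # Pause between batches, but not after the final one.
        if index < len(batches) - 1:
            sleep(pause_seconds)
    return processed
```

The pause trades total enrollment time for headroom on the shared plan; a real service might instead use a queue or a concurrency limit.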

Mitigation Steps
To reduce the likelihood of similar incidents in the future, we are taking the following steps:

  • Improved QA Protocols:

    • Enhance testing for high-load scenarios, particularly for bulk operations.
  • Monitoring System Enhancements:

    • Implement additional resource usage alerts to detect and isolate high-impact operations earlier.
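
As a rough illustration of what a resource usage alert might check, the sketch below fires only when usage stays at or above a threshold for several consecutive samples, which avoids alerting on short spikes. It is a minimal sketch under stated assumptions, not a description of our monitoring configuration: the class name `SustainedUsageAlert`, the 85% threshold, and the three-sample window are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SustainedUsageAlert:
    """Fires when usage stays at or above a threshold for N consecutive samples."""

    threshold_percent: float   # hypothetical threshold, e.g. 85.0
    required_samples: int      # hypothetical number of consecutive samples
    _recent: deque = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Keep only the most recent `required_samples` observations.
        self._recent = deque(maxlen=self.required_samples)

    def observe(self, usage_percent: float) -> bool:
        """Record one sample and return True if the alert should fire."""
        self._recent.append(usage_percent)
        window_full = len(self._recent) == self.required_samples
        return window_full and all(
            sample >= self.threshold_percent for sample in self._recent
        )


# Example: one low sample (70) resets the run of high readings.
alert = SustainedUsageAlert(threshold_percent=85.0, required_samples=3)
fired = [alert.observe(x) for x in [90, 95, 70, 88, 91, 93]]
```

In this example the alert fires only on the final sample, once three consecutive readings (88, 91, 93) are at or above the threshold.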

We sincerely apologize for the inconvenience and disruption this incident caused our users, particularly in the UK region. We are committed to learning from it to serve you better in the future.

Posted Dec 11, 2024 - 14:38 CET

Resolved
This incident has been resolved.
Posted Dec 10, 2024 - 17:20 CET
Monitoring
We have implemented a fix for the issue and are actively monitoring the situation.

We sincerely apologize for any inconvenience caused and remain committed to delivering reliable services.
Posted Dec 10, 2024 - 10:42 CET
Investigating
We are aware of a potential issue that could affect customers in the UK region. We are actively analyzing the situation and will provide a status update as our investigation continues.
Posted Dec 09, 2024 - 16:55 CET
This incident affected: Learn365 United Kingdom.