Invalid notifications are being generated and sent
Incident Report for UpGuard CyberRisk
Postmortem

PIR Date: December 13th, 2022
Incident Date: December 7th, 2022
Incident Time: 03:10 UTC
Incident Number: INCI-024
Severity Level: 2 Critical (Single service affected, partial outage, multiple/all customers potentially affected)
Affected Services: UpGuard CyberRisk Notification Service
Outage Duration: 3 hours 25 minutes

Incident Summary

On Wednesday, December 7th at 03:10 UTC, UpGuard was first alerted to invalid notifications being sent for risks and vulnerabilities. Investigation showed that the notifications were being sent for risks and vulnerabilities associated with orphaned domains and IP addresses.

Notifications were paused during the investigation of the incident, and pending invalid notifications were purged. A full analysis and diagnosis were completed, and a full fix was deployed on December 7th at 09:40 UTC. Analysis shows that fewer than 5% of UpGuard customers received invalid notifications.

Fault

Some UpGuard customers received multiple notifications relating to alerts for risks and vulnerabilities. The contents of these notifications did not relate to their account or to any domains or IPs owned by them. Customers with email notifications set up received many hundreds of emails.

Detection

The UpGuard Support team worked through several support tickets from customers, which were then escalated as an incident to the product team on December 7th at 05:42 UTC.

Impact

  1. Outage: UpGuard CyberRisk Notification Service was halted between 06:15 UTC and 09:40 UTC on December 7th.
  2. Notifications: Invalid notifications were sent to some UpGuard customers for around 3 hours, between 03:10 UTC and 06:15 UTC on December 7th. Analysis shows that fewer than 5% of UpGuard customers received invalid notifications.

Recovery

  1. After an initial investigation, the UpGuard CyberRisk Notification Service was halted at 06:15 UTC
  2. Further investigation found that the issue was a coding error, which was fixed, tested, and released into Production by 09:40 UTC
  3. All pending notifications that were invalid were purged from the system by 09:40 UTC
  4. The UpGuard CyberRisk Notification Service was restarted, and an analysis of the impacted customers was completed
  5. Customer notifications were sent out to affected customers on December 7th

Timeline

December 7th 2022
03:10 UTC - Initial customer ticket logged

04:52 UTC - Second customer ticket was logged

05:18 UTC - Third customer ticket was logged

05:42 UTC - Issues escalated to UpGuard product team from second customer ticket

06:15 UTC - UpGuard CyberRisk Notification Service shut down by Product team during initial investigation

06:28 UTC - Product Incident Meeting Underway

07:04 UTC - Product Incident Meeting Continues

07:36 UTC - INITIAL Status Page Update - Notification of incident and advising that notifications have been halted 

07:37 UTC - Status Page Update - Notification that investigations are still underway

07:47 UTC - Status Page Update - Notification that investigations are still underway

07:56 UTC - Status Page Update - Notification that the issue has been identified and a fix is being prepared, confirmation that notifications are still paused

07:58 UTC - Fix being prepared; Status Page added to the product intercom widget to display updates

09:13 UTC - Status Page Update - Notification of a fix being deployed

09:40 UTC - UpGuard CyberRisk Notification Service restarted

10:41 UTC - FINAL Update on Status Page posted

10:48 UTC - Analysis of impacted customers completed

Root Cause

Due to a coding error, certain fields in the production database were being set incorrectly.  These fields affected the UpGuard CyberRisk Notification Service.

Corrective Actions

  1. Purged any unread invalid notifications or pending notifications within the platform.
  2. Analyzed the invalid notifications that were sent by UpGuard CyberRisk on the morning of December 7th, defined a list of affected customers and individual users and contacted the customers affected with a description of the impact and incident.
  3. Implemented a referential integrity check on the denormalized Org-to-Vendor link, so that it cannot become invalid again.
  4. Raised a ticket to investigate how to ensure that domains or IPs not currently linked to a vendor are never read.
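
Corrective actions 3 and 4 can be illustrated with a minimal sketch. This is not UpGuard's implementation: the report does not describe the actual schema or database, so the table names (vendors, org_vendor_links, assets), the column names, and the helper functions below are all hypothetical. The sketch uses SQLite from the Python standard library to show how a foreign-key constraint could reject an invalid Org-to-Vendor link, and how a join could keep orphaned domains and IPs out of notification queries.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical org/vendor/asset schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE vendors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE org_vendor_links (
            org_id    INTEGER NOT NULL,
            vendor_id INTEGER NOT NULL REFERENCES vendors(id),
            PRIMARY KEY (org_id, vendor_id)
        );
        CREATE TABLE assets (
            id        INTEGER PRIMARY KEY,
            hostname  TEXT NOT NULL,
            vendor_id INTEGER REFERENCES vendors(id)  -- NULL = orphaned asset
        );
        """
    )
    return conn


def link_org_to_vendor(conn: sqlite3.Connection, org_id: int, vendor_id: int) -> bool:
    """Insert a link; return False if the vendor does not exist (assumed check)."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO org_vendor_links (org_id, vendor_id) VALUES (?, ?)",
                (org_id, vendor_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def notifiable_assets(conn: sqlite3.Connection, org_id: int) -> list[str]:
    """Return only assets reachable through a valid org -> vendor link.

    The inner join drops assets whose vendor_id is NULL, so orphaned
    domains and IPs never reach the notification step.
    """
    rows = conn.execute(
        """
        SELECT a.hostname
        FROM assets AS a
        JOIN org_vendor_links AS l ON l.vendor_id = a.vendor_id
        WHERE l.org_id = ?
        ORDER BY a.hostname
        """,
        (org_id,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_db()
    conn.execute("INSERT INTO vendors (id, name) VALUES (1, 'Example Vendor')")
    conn.execute("INSERT INTO assets (hostname, vendor_id) VALUES ('shop.example.com', 1)")
    conn.execute("INSERT INTO assets (hostname, vendor_id) VALUES ('198.51.100.7', NULL)")
    conn.commit()
    print(link_org_to_vendor(conn, 10, 1))    # True: vendor 1 exists
    print(link_org_to_vendor(conn, 10, 999))  # False: no such vendor
    print(notifiable_assets(conn, 10))        # ['shop.example.com']
```

In this sketch the constraint lives in the database rather than in application code, so a coding error that writes a bad link fails at write time instead of surfacing later as invalid notifications. Whether the production system enforces the check in the same place is not stated in this report.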
Posted Dec 21, 2022 - 03:28 UTC

Resolved
A fix has been released to Production. We monitored for any further or additional issues, and the issue is now resolved.
All invalid notifications have been purged from the platform, and the Notification process has been restarted.
Posted Dec 07, 2022 - 10:41 UTC
Update
A fix is being deployed for this issue.
Posted Dec 07, 2022 - 09:13 UTC
Identified
The issue has been identified and a fix is being prepared.
Notifications will remain halted until the fix is implemented and verified.
Posted Dec 07, 2022 - 07:56 UTC
Update
We are continuing to investigate this issue.
Posted Dec 07, 2022 - 07:47 UTC
Update
We are continuing to investigate this issue.
Posted Dec 07, 2022 - 07:37 UTC
Investigating
We are investigating an issue where many notifications are being generated.

These notifications are invalid, and do not contain any information from other customer accounts, as the IPs that are generating the notifications are not linked to any customer or vendor within our platform.

Notification generation has been halted at this point in time.
Posted Dec 07, 2022 - 07:36 UTC
This incident affected: UpGuard CyberRisk (Web App).