Login outage

Incident Report for ChargeOver

Postmortem

Incident details

We received messages from internal and external users that they were not able to log into their instances. After investigation, we confirmed that the login sequence was taking an abnormally long time. This behavior was present on July 18th from 14:00 CDT to 15:14 CDT.

The impact of this incident was limited to the login process for ChargeOver. It did not affect payment processing, hosted pages, or other automated processes.

Root cause

ChargeOver uses a third-party metrics service, PostHog, to track user input to help improve the platform.

Requests to PostHog were determined to be the cause of the slow login sequence. We found that a URL endpoint was no longer being serviced, repointed our load balancer, and rebooted systems as needed for the change to take effect.

Incident timeline

  • 14:00 CDT - Received notice of logins not working.
  • 14:11 CDT - Assembled a team to start investigating root causes and update our status page.
  • 14:44 CDT - Identified the issue with the third-party metrics service. Determined that logins were working, but they were taking significantly longer than expected.
  • 15:07 CDT - Implemented fix for third-party metrics service and deployed to user instances. Began monitoring for any further issues.
  • 15:14 CDT - Verified that logins were working as expected and began drawing up plans for future remediation.

Remediation plan

We understand that it is frustrating to lose access to your ChargeOver instance and we are very sorry that this has happened.

In the future, we will isolate these requests from the rest of ChargeOver so that they do not get in the way of normal use. We will also be implementing a standard operating procedure (SOP) to make sure that any services that may cause disruptions cannot affect the app in this way.
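
For readers curious what keeping tracking requests out of the login path can look like, here is a minimal sketch of one common approach: events go onto a bounded queue and a background worker delivers them, so the caller never waits on the network. It is illustrative only and is not ChargeOver's implementation. It does not use the PostHog client library, and every name in it (BackgroundTracker, unresponsive_send, login) is hypothetical.

```python
import queue
import threading
import time


class BackgroundTracker:
    """Hand analytics events to a worker thread so callers never wait on delivery."""

    def __init__(self, send, max_pending=1000):
        self._send = send  # hypothetical callable that delivers one event
        self._events = queue.Queue(maxsize=max_pending)
        self.dropped = 0
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def track(self, event):
        """Queue an event without blocking; drop it if the queue is full."""
        try:
            self._events.put_nowait(event)
        except queue.Full:
            self.dropped += 1

    def _run(self):
        while True:
            event = self._events.get()
            try:
                self._send(event)
            except Exception:
                pass  # a delivery failure must never reach the login path
            finally:
                self._events.task_done()

    def flush(self):
        """Block until every queued event has been handled (useful in tests)."""
        self._events.join()


def unresponsive_send(event):
    """Stand-in for an endpoint that is slow to answer (assumption for the demo)."""
    time.sleep(0.3)


def login(username, tracker):
    # Authentication itself is elided; only the tracking call matters here.
    tracker.track({"event": "login", "user": username})
    return True
```

With this arrangement, login("alice", BackgroundTracker(unresponsive_send)) returns right away even though the sender is slow. The trade-off is that events can be dropped when the queue fills up, which is usually acceptable for analytics but would not be for anything that must be delivered.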

Posted Jul 23, 2025 - 14:09 CDT

Resolved

ChargeOver had longer than expected wait times from 14:11 CDT to 15:07 CDT.

We're monitoring the solution to the root cause. A postmortem will follow.
Posted Jul 18, 2025 - 15:14 CDT

Monitoring

Our third-party analytics service was not responding, and we've implemented a fix to bypass this issue. Logins should resume as expected. We're monitoring and will update with information as it becomes available.
Posted Jul 18, 2025 - 15:07 CDT

Identified

We believe we have identified an issue with our third-party analytics service that is causing the login process to take longer than expected to give users access to their instance.
Posted Jul 18, 2025 - 14:44 CDT

Investigating

We are currently investigating this issue.
Posted Jul 18, 2025 - 14:11 CDT
This incident affected: Main Application.