Simwood Status
All Systems Operational
Voice: Operational (100.0% uptime over the past 90 days)
API and Portal: Operational (99.86% uptime over the past 90 days)
Availability Zones: Operational (100.0% uptime over the past 90 days)
  London: Operational (100.0% uptime over the past 90 days)
  Slough: Operational (100.0% uptime over the past 90 days)
  Manchester: Operational (100.0% uptime over the past 90 days)
Availability Zones (US): Operational (100.0% uptime over the past 90 days)
  San Jose (US West): Operational (100.0% uptime over the past 90 days)
  New York (US East): Operational (100.0% uptime over the past 90 days)
Operations Desk: Operational (100.0% uptime over the past 90 days)
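
(For scale, 99.86% uptime over 90 days equates to roughly three hours of cumulative downtime: 2,160 hours x 0.14% is about 3 hours.)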
Past Incidents
Oct 15, 2019

No incidents reported today.

Oct 14, 2019

No incidents reported.

Oct 13, 2019

No incidents reported.

Oct 12, 2019

No incidents reported.

Oct 11, 2019

No incidents reported.

Oct 10, 2019

No incidents reported.

Oct 9, 2019
Resolved - Billing has fully caught up. Thanks for your patience.
Oct 9, 00:18 UTC
Monitoring - Failover is largely complete and CDRs are now being processed.
Oct 8, 16:46 UTC
Update - We are about to commence failover to the standby cluster as this query rollback is showing no signs of concluding.

Once the failover is complete, we'll mark this incident as 'monitoring'. There are several million CDRs to catch up on, so we will leave it unresolved until they are processed.
Oct 8, 15:55 UTC
Update - This remains ongoing but we are making progress.

The offending query remains on one node and continues to roll back. Unfortunately, rolling back is less efficient than the operation that caused the problem in the first place. Note this is not an issue with the query per se (a single-row delete) but an internal Galera issue triggered by it. Until this rollback completes, the cluster remains effectively write-locked but serviceable for reads.
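
For the technically curious, a Galera node's state can be inspected through its wsrep status variables. The sketch below is illustrative only (the host and credentials are placeholders, and this is not our production tooling); it checks whether a node reports itself synced and ready for writes:

    import pymysql

    # Placeholder connection details for a single Galera node.
    conn = pymysql.connect(host="db1.example.net", user="monitor", password="secret")
    try:
        with conn.cursor() as cur:
            # Galera exposes its view of the node and cluster via wsrep_* status variables.
            cur.execute("SHOW GLOBAL STATUS LIKE 'wsrep%'")
            wsrep = dict(cur.fetchall())
    finally:
        conn.close()

    # 'Synced' plus wsrep_ready = ON means the node is accepting transactions;
    # a wsrep_flow_control_paused fraction near 1.0 means writes are stalled.
    print("state: ", wsrep.get("wsrep_local_state_comment"))
    print("ready: ", wsrep.get("wsrep_ready"))
    print("paused:", wsrep.get("wsrep_flow_control_paused"))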

We know why this happened and how to prevent it going forward, and we have backup nodes with current data ready to take over should we decide to fail over from the existing cluster. As we have no idea whatsoever how long the triggering query will take to roll back on the final node, we have held off failing over in anger in the hope the rollback concludes soon, but we cannot delay indefinitely.

Call traffic remains unaffected and our ops team have been handling the most urgent customer issues, such as locked balances. We will therefore continue monitoring and update here should anything change.
Oct 8, 12:05 UTC
Identified - Whilst call traffic is not affected, we are presently unable to write to our primary database cluster. This is due to an overnight job triggering a bug. The query will eventually work through, but we presently have no way of determining how long that will take. We are meanwhile investigating more invasive options.

In the interim, this means portal, API and administration functions which would normally update the database (e.g. billing, number allocation and pre-pay top-ups) are delayed or non-functional.

We're sorry for any impact this will have but, to repeat, call traffic is not affected.
Oct 8, 07:17 UTC
Oct 7, 2019

No incidents reported.

Oct 6, 2019

No incidents reported.

Oct 5, 2019

No incidents reported.

Oct 4, 2019

No incidents reported.

Oct 3, 2019
Resolved - The database issues were resolved as of 2154 UK time and CDRs were catching up thereafter. All has been back to normal for some time and we're therefore closing this incident. Thanks for your patience.
Oct 3, 22:52 UTC
Monitoring - The database appears to be recovering and we're continuing to monitor the situation.
Oct 3, 21:02 UTC
Identified - We are monitoring a situation with our primary database cluster. We expect this to remedy itself in the next few hours but have an action plan in place for overnight if it does not. In the interim, whilst there is zero impact on production call traffic, CDRs, number provisioning and other configuration changes will be delayed.

Given the late hour and the lack of impact on call traffic, we do not intend to send notifications for updates until resolution or a dramatic change in circumstances. We will, however, update this page where possible.
Oct 3, 20:35 UTC
Oct 2, 2019

No incidents reported.

Oct 1, 2019

No incidents reported.