One of Clearwater’s biggest strengths is its massive scalability and its performance under load. We have tested Clearwater to above 10 million subscribers and 20 million busy hour call attempts and are confident that it will scale much further – but our credit cards just can’t take the Amazon Web Services fees!
This post explores some of the ways that Clearwater achieves this scalability and performance, and also provides some help and guidance to those of you that are looking to verify this for yourselves. (In the past we’ve seen some cases where users have struggled to get high performance out of Clearwater due to a misunderstanding of our overload controls.)
For those of you who do want to give this a go, we have provided a set of tools that you can use to test your deployment here. (We recognise that these tools aren’t as simple as they could be to set up and use though – we’re working to improve this, so watch this space!)
When a Clearwater node hits its capacity, it does not attempt to service all requests (as doing so would degrade service for all users). Instead, the node tries to provide good service to as many requests as it can, and rejects the rest as cleanly as possible (so that if a rejected request is retried, for example by an external P-CSCF, a node in the deployment that isn’t overloaded can service it).
What does a Clearwater node do?
Clearly any piece of software has a limit to the capacity it can provide, which is likely to be influenced by the physical resources it is running on. A Clearwater node determines its proximity to its capacity limit by comparing its measured latency with a configured target latency – that is, Clearwater’s definition of capacity is the point at which it can no longer service requests within this configured latency target.
It does this by using a token bucket. This controls whether the node is allowed to process a request; every request costs one token, and if the token bucket is empty, the request is rejected by the node. The tokens in the token bucket are replenished over time, and the node learns what an appropriate token replacement rate is. The token replacement rate therefore determines the rate at which requests are accepted.
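The token bucket described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not Clearwater’s actual code: the class name, method names and the explicit `now` argument (passed in so the example is deterministic) are all assumptions made for this post.

```python
class TokenBucket:
    """Illustrative token bucket; all names here are hypothetical."""

    def __init__(self, capacity, refill_rate, now=0.0):
        self.capacity = capacity          # maximum number of tokens held
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)     # start with a full bucket
        self.last_refill = now            # time of the last refill, in seconds

    def _refill(self, now):
        # Add tokens for the time elapsed since the last refill,
        # never exceeding the bucket's capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def try_accept(self, now):
        # Every request costs one token; an empty bucket means rejection.
        self._refill(now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(capacity=20, refill_rate=10.0)
burst = [bucket.try_accept(now=0.0) for _ in range(40)]
print(burst.count(True), "accepted,", burst.count(False), "rejected")
```

With a full bucket of twenty tokens and forty requests arriving at the same instant, only the first twenty can be accepted; the rest have to wait for the bucket to refill.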
Every twenty requests the node compares how long it’s taking to process requests to the maximum acceptable latency. This algorithm is based on the Welsh and Culler “Adaptive Overload Control for Busy Internet Servers” paper, although it uses a smoothed mean latency in its comparison.
If the current latency is lower than the target, the token rate is increased; the size of the increase depends on how far the current average latency is below the target latency. If the current latency is higher, the token rate is decreased – again, the size of the decrease depends on how far the current average latency is above the target latency. The token bucket has a minimum refill rate; this means that even a slow system will still try to service requests.
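As a rough sketch of this feedback loop, the Python below smooths latency samples and then nudges the refill rate up or down depending on how far the smoothed latency is from the target. The function names, the additive-increase/multiplicative-decrease shape and every constant (`alpha`, `increase_step`, `decrease_factor`, `min_rate`) are assumptions chosen for illustration; they are not Clearwater’s real values.

```python
def smooth(previous_mean, sample, alpha=0.1):
    # Exponentially weighted moving average of latency samples
    # (one possible way to get a "smoothed mean"; alpha is hypothetical).
    return previous_mean + alpha * (sample - previous_mean)


def adjust_rate(rate, mean_latency, target_latency,
                increase_step=2.0, decrease_factor=0.5, min_rate=10.0):
    # Relative distance from the target: negative means we have headroom.
    error = (mean_latency - target_latency) / target_latency
    if error < 0:
        # Under target: increase, by more the further below target we are.
        new_rate = rate + increase_step * (-error)
    else:
        # Over target: decrease, by more the further above target we are
        # (the overshoot is capped so the rate is at most halved here).
        new_rate = rate * (1.0 - decrease_factor * min(error, 1.0))
    # Never drop below the minimum refill rate.
    return max(min_rate, new_rate)


print(adjust_rate(10.0, mean_latency=50.0, target_latency=100.0))
print(adjust_rate(40.0, mean_latency=200.0, target_latency=100.0))
```

In this sketch a node running at half its target latency gains one token/sec, while a node running at twice its target latency has its rate halved.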
This throttling algorithm ensures that a Clearwater node rejects any requests beyond a certain rate, to avoid overloading the node. This rate starts small and grows as more and more requests are successfully handled. If requests start taking too long to process, the rate is reduced again. This algorithm has various advantages:
- Its approximation improves significantly as the current load approaches the maximum capacity.
- If a Clearwater node is overloaded, the rate ramps down quickly.
- It converges (rather than oscillating).
However, this does mean that a node has a ‘warm up’ period when it first starts, during which the throttling is very aggressive. This has led to some misleadingly pessimistic load-testing results, with the node rejecting a large number of requests before it’s had a chance to learn an appropriate replenishment rate.
As an example, take a Sprout node that starts up with a token bucket that contains twenty tokens, and has a token refill rate of ten tokens/sec. The Sprout is then immediately sent forty requests to handle.
The Sprout is able to handle the first twenty requests, as it has these tokens available. It will then recalculate the token refill rate and increase it to, say, twelve tokens/sec. It will then try and handle the remaining twenty requests of the original forty, and will reject most of them due to overload, as it won’t have enough tokens. It will then recalculate the token refill rate and increase it again, this time to, say, fourteen tokens/sec.
In the next second, the Sprout is sent forty requests again. It will reject most of these as the Sprout only has fourteen tokens. At the end of the second though, the rate will have increased again to, say, eighteen tokens/sec.
This will continue, with the Sprout rejecting some of the requests, but increasing its token refill rate. After a small amount of time therefore, the token rate will increase to forty tokens/sec, and the Sprout node is now able to process all the requests sent to it.
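The warm-up behaviour in this example can be reproduced with a small simulation. This is a toy model under several assumptions that are not from Clearwater itself: requests arrive evenly spaced at forty per second (rather than in bursts), latency always stays under the target so the rate goes up by a fixed two tokens/sec after every twenty requests, and the function and parameter names are made up for illustration.

```python
def simulate_warm_up(offered_rate=40, seconds=10, capacity=20.0,
                     initial_rate=10.0, rate_step=2.0, adjust_every=20):
    """Return the number of rejected requests in each simulated second."""
    tokens = capacity
    rate = initial_rate
    seen = 0
    rejected_per_second = []
    for _second in range(seconds):
        rejected = 0
        for _request in range(offered_rate):
            # Refill for the gap since the previous request (1/offered_rate s).
            tokens = min(capacity, tokens + rate / offered_rate)
            if tokens >= 1.0:
                tokens -= 1.0      # accept: the request costs one token
            else:
                rejected += 1      # reject: the bucket is empty
            seen += 1
            if seen % adjust_every == 0:
                # Assumed: latency is under target, so the rate keeps growing.
                # A real node would stop increasing once latency caught up.
                rate += rate_step
        rejected_per_second.append(rejected)
    return rejected_per_second


for second, rejected in enumerate(simulate_warm_up()):
    print(f"second {second}: {rejected} of 40 requests rejected")
```

Under these assumptions the node rejects a noticeable share of requests in the first few seconds, and stops rejecting once the refill rate has caught up with the offered load.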
If you’re seeing aggressive throttling on your deployment, then we suggest you give the node a small amount of time just to adjust to the new load. Alternatively, we recommend that you ramp up the load over a few minutes; this gives a more realistic scenario as well.
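If your load generator accepts a list of target rates, a gradual ramp might be planned along these lines. The helper below is purely hypothetical (it is not part of the Clearwater test tools) and simply produces a linear ramp up to the final request rate.

```python
def ramp_schedule(target_rate, ramp_seconds, step_seconds=10):
    """Return the request rate to offer during each step of a linear ramp."""
    steps = max(1, ramp_seconds // step_seconds)
    # The final step always offers the full target rate.
    return [round(target_rate * (i + 1) / steps) for i in range(steps)]


# Ramp to 40 requests/sec over one minute, in 10-second steps.
print(ramp_schedule(target_rate=40, ramp_seconds=60))
```

Each step gives the node’s token rate time to grow before the next increase in load arrives.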
We’ve now also (by popular request) made the initial settings used by the token bucket configurable, so that you can tune the start-of-day behaviour for your deployment. This will be available in the upcoming Zelda release. Check our configuration options page after the release is out to find out how to configure this on your deployment.