Tweaking Timeouts

#webperf #fast #mobile Alternative title: Stop Blaming the Network

I have a weakness for a particular Android game that shall remain nameless (to protect the innocent, of course). It frequently complains that I have "a poor network connection." Even when I'm one hop from a fiber ring. Interestingly enough, it is almost always the case that this notification (and poor responsiveness on the part of the game) occurs during an "episode" in which hundreds of thousands of users are all active. At the same time.

Yes, I found that telling, too.

Needless to say, I am almost certain I do not have "poor network" performance. Unless the game is running latency tests on round trip time under the covers (which I seriously doubt) or doing some mathematical deduction based on TCP window sizes, there's no way an Android game developer can tell that I have "poor network" performance. I will grant that it could be that my network is the problem. But if that's the case, then I must have some sort of magical impact on "the network" wherever I go, because my "network" connection is always at fault, no matter where I may be.

The error is more likely a reaction to poor response time from an overloaded server. A server that's overloaded because the game is constantly exchanging messages with it. Lots and lots of connections from lots and lots of users.

There are many reasons why the server would have poor response times. One is a poorly configured connection timeout value that chews up resources while connections sit idle in ESTABLISHED, or linger in one of the many X_WAIT states (TIME_WAIT, CLOSE_WAIT, FIN_WAIT) on their way to closing. Another is the client application not reusing connections, constantly introducing overhead by requiring a full TCP session setup and teardown for every single call.
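To make the connection-reuse point concrete, here is a minimal sketch using only Python's standard library. The toy server, the `/state` path, and the function names are all assumptions for illustration, not anything the game actually does. It simply counts how many TCP connections the server has to accept for the same number of calls.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Toy handler that counts accepted TCP connections (illustrative only)."""

    protocol_version = "HTTP/1.1"  # lets the server keep connections alive
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        super().setup()
        with CountingHandler.lock:
            CountingHandler.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_with_new_connections(host, port, calls):
    """Full TCP setup and teardown for every single call."""
    for _ in range(calls):
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("GET", "/state")  # "/state" is a made-up path
        conn.getresponse().read()
        conn.close()


def fetch_with_reuse(host, port, calls):
    """One connection, reused for every call."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        for _ in range(calls):
            conn.request("GET", "/state")
            conn.getresponse().read()  # drain the body before the next request
    finally:
        conn.close()


def count_connections(fetch, calls=5):
    """Run `fetch` against a local toy server and report connections accepted."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        fetch("127.0.0.1", server.server_address[1], calls)
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections
```

Calling `count_connections(fetch_with_new_connections)` makes the server accept five connections for five calls, while `count_connections(fetch_with_reuse)` needs only one. Multiply that difference by hundreds of thousands of users and the overhead stops being a rounding error.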

Yet another is reusing the connection while the game is in an active state, but ignoring it (necessarily) when the game is suspended - idle time on the client or the user switched to something else. When the game is reinitiated, that connection is long gone. In the interim, however, the server sat and waited and waited (for its idle timeout to expire) until it could close the unused connection. Unused connections sitting in an idle state consume resources.
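The waiting-and-waiting behavior described above can be sketched with a plain socket and a per-connection idle timeout. This is a simplified illustration, not how any particular game server works: the echo behavior, the function names, and the 0.2 second timeout are assumptions chosen to keep the demo short.

```python
import socket
import threading
import time


def serve_connection(conn, idle_timeout):
    """Echo data back until the peer closes or stays silent for too long."""
    conn.settimeout(idle_timeout)  # applies to each blocking recv() call
    try:
        while True:
            data = conn.recv(4096)
            if not data:
                return "peer_closed"
            conn.sendall(data)
    except socket.timeout:
        # Nothing arrived within idle_timeout: stop holding resources.
        return "idle_timeout"
    finally:
        conn.close()


def demo(idle_timeout=0.2):
    """Send one message, then go idle the way a suspended game would."""
    server_side, client_side = socket.socketpair()
    result = {}
    worker = threading.Thread(
        target=lambda: result.update(
            reason=serve_connection(server_side, idle_timeout)
        )
    )
    worker.start()

    client_side.sendall(b"ping")
    echoed = client_side.recv(4096)

    started = time.monotonic()
    worker.join()  # the client says nothing more; the server waits it out
    held = time.monotonic() - started

    client_side.close()
    return echoed, result["reason"], held
```

In `demo()`, the server side holds the connection for roughly `idle_timeout` seconds after the last message before giving up on it. With a timeout measured in minutes rather than fractions of a second, that is how long every suspended client keeps a slot occupied.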

And I don't have to remind you of the Second Operational Axiom (but of course I will): as load increases, performance decreases.

Load is not just processing, it's also overhead - such as that incurred by maintaining TCP connections. The impact of the load from connection management is certainly not as heavy as that of actual processing but it does add up.

It is important for a variety of reasons that developers of applications - games or otherwise - understand the performance profile and usage pattern of their application before unleashing it on the public (or the employee community). It's important not to blame the user (i.e., their "network" connection) when it really isn't their network, but your server and capacity that's causing the problem. Test, retest, and understand the impact of both connection management and processing on the capacity of the server-side application your app relies on. Overload it, stress it out, and understand how to differentiate between capacity-related performance issues and actual network-related issues. (Hint: it's almost never the network these days, and unless you write code to determine that it is, don't blame the network.)
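As a rough idea of what "write code to determine it is" could look like, the sketch below times the TCP connect separately from the wait for the first byte of a response. The connect is usually completed by the server's kernel and costs about one round trip, so a fast connect followed by a slow response points at the server rather than the network. The function names and thresholds are assumptions, and the heuristic is deliberately crude: the response time still includes a round trip, and a server with a full accept backlog can slow the connect as well.

```python
import socket
import time


def probe(host, port, request, timeout=5.0):
    """Return (connect_seconds, response_seconds) for a single request."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.monotonic()
        sock.sendall(request)
        sock.recv(1)  # block until the first byte of the response arrives
        first_byte = time.monotonic()
    return connected - start, first_byte - connected


def classify(connect_s, response_s, slow_connect_s=0.3, slow_response_s=1.0):
    """Crude guess at where the delay is. The thresholds are arbitrary."""
    if connect_s >= slow_connect_s:
        return "network"  # even the handshake was slow
    if response_s >= slow_response_s:
        return "server"  # network looked fine; the application was slow
    return "ok"
```

A client that saw a 20 ms connect and a 4 second wait for the first byte would get `"server"` back from `classify(0.02, 4.0)`, which is a much more honest thing to act on than a blanket "poor network connection" message.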

While you're testing, tweak the timeout values. Find a setting that's optimal for game play but doesn't sit around and wait forever to close and free up those resources. Recognize that default values on any web or application server with respect to TCP are just that - default. They aren't specific to the performance and capacity you need for your application. If you haven't done some testing and tweaked those values, you're part of the problem.
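A back-of-the-envelope way to see why the idle timeout matters is Little's law: the average number of abandoned connections a server is holding equals the rate at which clients go quiet multiplied by how long the server waits before closing them. The numbers below are hypothetical, picked only to show the shape of the trade-off, and are no substitute for real load testing.

```python
def idle_connections_held(abandons_per_second, idle_timeout_s):
    """Little's law: average idle connections = arrival rate x time held."""
    return abandons_per_second * idle_timeout_s


if __name__ == "__main__":
    # Hypothetical load: 500 clients per second suspend the game mid-session.
    for timeout_s in (30, 120, 300):
        held = idle_connections_held(500, timeout_s)
        print(f"idle timeout {timeout_s:>3}s -> ~{held} idle connections held")
```

Under that assumed load, a 300 second timeout leaves about 150,000 dead connections open at any moment, where a 30 second timeout leaves about 15,000. Whether 30 seconds is too aggressive for actual game play is exactly what the testing is for.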

Test, retest, tweak and re-tweak.

Your users will thank you for it. 


 

Published Jul 08, 2013
Version 1.0