Windows Vista Performance Issue Illustrates Importance of Context
Decisions about routing at every layer require context
A friend forwarded a blog post to me last week mainly because it contained a reference to F5, but upon reading it (a couple of times) I realized that this particular post contained some very interesting information that needed to be examined further. The details of the problems being experienced by the poster (which revolve around a globally load-balanced site that was for some reason not being distributed very equally) point to an interesting conundrum: just how much control over site decisions should a client have?
Given the scenario described, and the conclusion that it is primarily the result of an over-eager client implementation in Windows Vista of a fairly obscure DNS-focused RFC, the answer to how much control a client should have over site decisions seems obvious: none.
The problem (which you can read about in its full detail here) described is that Microsoft Vista, when presented with multiple A records from a DNS query, will select an address “which shares the most prefix bits with the source address is selected, presumably on the basis that it's in some sense "closer" in the network.”
This is not a bad thing. This implementation was obviously intended to aid in the choice of a site closer to the user, which is one of the many ways in which application network architects attempt to improve end-user performance: reducing the impact of physical distance on the transfer of application data. The problem is, however, that despite the best intentions of those who designed IP, it is not guaranteed that having an IP address that is numerically close to yours means the site is physically close to you. Which kind of defeats the purpose of implementing the RFC in the first place.
Now neither solution (choosing random addresses versus one potentially physically closer) is optimal primarily because neither option assures the client that the chosen site is actually (a) available and (b) physically closer. Ostensibly the way this should work is that the DNS resolution process would return a single address (the author’s solution) based on the context in which the request was made. That means the DNS resolver needs to take into consideration the potential (in)accuracy of the physical location when derived from an IP address, the speed of the link over which the client is making the request (which presumably will not change between DNS resolution and application request) and any other information it can glean from the client.
The DNS resolver needs to return the IP address of the site that at the time the request is made appears best able to serve the user’s request quickly. That means the DNS resolver (usually a global load balancer) needs to be contextually aware of not only the client but the sites as well. It needs to know (a) which sites are currently available to serve the request and (b) how well each is performing and (c) where they are physically located. That requires collaboration between the global load balancer and the local application delivery mechanisms that serve as an intermediary between the data center and the clients that interact with it.
Yes, I know. A DNS request doesn’t carry information regarding which service will be accessed. A DNS lookup could be querying for an IP address for Skype, or FTP, or HTTP. Therein lies part of the problem, doesn’t it? DNS is a fairly old, in technical terms, standard. It is service agnostic and unlikely to change. But providing even basic context would help – if the DNS resolver knows a site is unreachable, likely due to routing outages, then it shouldn’t return that IP address to the client if another is available. Given the ability to do so, a DNS resolution solution could infer service based on host name – as long as the site were architected in such a way as to remain consistent with such conventions. For example, ensuring that www.example.com is used only for HTTP, and ftp.example.com is only used for FTP would enable many DNS resolvers to make better decisions. Host-based service mappings, inferred or codified, would aid in adding the context necessary to make better decisions regarding which IP address is returned during a DNS lookup – without changing a core standard and potentially breaking teh Internets.
The problem with giving the client control over which site it accesses when trying to use an application is that it lacks the context necessary to make an intelligent decision. It doesn’t know whether a site is up or down or whether it is performing well or whether it is near or at capacity. It doesn’t know where the site is physically located and it certainly can’t ascertain the performance of those sites because it doesn’t even know where they are yet, that’s why it’s performing a DNS lookup.
A well-performing infrastructure is important to the success of any web-based initiative, whether that’s cloud-based applications or locally hosted web sites. Part of a well-performing infrastructure is having the ability to route requests intelligently, based on the context in which those requests are made. Simply returning IP addresses – and choosing which one to use – in a vacuum based on little or no information about the state of those sites is asking for poor performance and availability problems.
- Lori_MacVittieEmployeeI am talking about client DNS request and interaction with DNS resolver. Some GLB implementations are also DNS resolvers, so may have confused the issue. My apologies.