LineRate: Excessive HTTP 404 Throttling
Fusker thwarting using the LineRate Node.js datapath scripting engine

Fuskering is so fun to say, I couldn't resist writing an article about it. But, aside from just raising eyebrows when you use the term, fuskering is a real problem for some site maintainers. And having been in this position myself, I can verify that it's a difficult problem to solve. A flexible, programmable data path, like the LineRate load balancer, makes light work of solving these kinds of problems.

Background

So, what exactly is fuskering? Simply stated, fuskering is requesting successive URL paths using a known pattern. For example, if you knew example.com had images stored at http://example.com/img01.jpg and http://example.com/img02.jpg, you might venture to guess that there is also an image at http://example.com/img03.jpg. And if you find that img03.jpg is there, you might as well try img04.jpg. Utilities like curl make automating this process extremely easy.

Photo sites are a typical target for fuskering because image filenames are usually pretty predictable. Think about a URL like http://example.com/shard1/user/jane/springbreak14/DSCN5029.jpg and you start to see where this could be a problem. Not only is this a potential privacy concern, but it's also a huge burden on the datacenter assets serving those files.

In some multi-tier architectures, serving a 404 is actually more burdensome than serving an asset that exists. When you request something that doesn't exist, it's possible that all of the following could happen:

- Cache miss on CDN, CDN requests from origin
- Front end load balancer receives request, makes balancing decision, forwards request
- Web tier receives request, processes and sends to caching tier
- Caching tier receives request and consults memory cache and then disk cache. Another cache miss
- Services API tier receives request for URL to file system mapping

At this point, either your well-written object storage API will signal to you that the file really doesn't exist, or you're not using object storage and you have to actually make a request to the local disk or NAS. In either case, that's a lot of work just to find out that something doesn't exist. Assets that do exist end up in one of the caching tiers and are found and served much earlier in this process, typically right from the CDN, and the request never even touches your infrastructure.

If your site is handling a lot of requests and they are spread across many servers - and possibly many data centers - correlating all the data to try to mitigate this problem can be tedious and time consuming. You have to log each request, aggregate it somewhere, perform analytics and then take action. All the while, your infrastructure is suffering.

Options and limitations

Requiring authentication and filename scrambling are two ways to reduce the likelihood that your site will attract fuskers in the first place. Of course, these methods do not actually make fuskering impossible, but by requiring the user to enter identifying information or by making the filenames extremely difficult to guess, the potential consequences and level of effort become too great, and the user will likely move on.

The technique detailed in this article is just one of many ways to combat fuskers. Some other possible solutions are user-agent string checking, using CAPTCHA, using services like CloudFlare, traffic scrubbing facilities, etc. None of these is a silver bullet (or cheap, in some cases), but you could evaluate them all and figure out what works best for your environment and your wallet.

There are also ways for a determined user to subvert a lot of these protective measures: Tor, X-Forwarded-For spoofing, using multiple source IPs, adaptive scripts that minimize the effect of the block time window (i.e., send max_404 requests, wait time_window, repeat), etc.

[One] Solution

Having a programmable data path makes solving (or at least mitigating) this issue easy. We can track each HTTP session, analyze the request and response, and have all the details we need to detect excessive 404's and throttle them. I provide a complete script to do this at the end; I'll describe the solution and how the script works next.

This article uses fuskering as motivation for this solution, but realize that excessive 404's could come from a variety of sources, such as: people that automate data collection from your site (and forget to update request paths when they change), a misbehaving application, resources that moved without a proper 301/302 redirect, or a true 404 DoS attack (which is meant to exploit all the things I mention in the Background section), just to name a few.

Script overview

We're going to use the local LineRate Redis instance for tracking the 404 info. We're storing a key-value pair, where the key is the client's IP address and the value is the number of 404 responses that client has received in a configurable time window. This "time window" is handled by setting an expiration on the key-value pair and then extending it if necessary. If no 404's are detected during the grace period, the entry expires and the client is not subject to any request throttling.

When a new request is received, the client's IP is determined (see next section on Source IP) and checked against the Redis database. If a db entry is found and the corresponding value has reached the allowed number of 404's, we intercept the request and respond directly from the load balancer with an HTTP 403.

On the response side, when we detect a 404 is being returned to a client, we increment the counter for the client IP in the Redis db. If the client's IP doesn't exist, we add it and init the value to '1'. In either case, the time window is also set.
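The bookkeeping described above can be sketched without Redis. This is a minimal in-memory illustration under stated assumptions, not the script from this article: createThrottle, isThrottled and record404 are hypothetical names, a plain object stands in for the Redis key and its expiration, and the caller passes in the current time in milliseconds.

```javascript
// Hypothetical in-memory stand-in for the Redis counter described above.
// maxCount: 404s allowed per window; windowSeconds: window length.
function createThrottle(maxCount, windowSeconds) {
  var entries = {}; // ip -> { count: number, expiresAt: ms timestamp }

  // Return the entry for ip if it has not expired yet, else drop it.
  function live(ip, now) {
    var entry = entries[ip];
    if (entry && entry.expiresAt <= now) {
      delete entries[ip];
      entry = undefined;
    }
    return entry;
  }

  return {
    // Request side: block once the counter has reached the limit.
    isThrottled: function (ip, now) {
      var entry = live(ip, now);
      return !!entry && entry.count >= maxCount;
    },
    // Response side: count a 404 and restart the window,
    // mirroring an increment followed by setting the expiration.
    record404: function (ip, now) {
      var entry = live(ip, now) || { count: 0, expiresAt: 0 };
      entry.count += 1;
      entry.expiresAt = now + windowSeconds * 1000;
      entries[ip] = entry;
      return entry.count;
    }
  };
}
```

Note that every 404 restarts the window in this sketch, so a client that keeps generating 404's stays blocked until it has been quiet for a full window.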
The time window and the maximum number of 404's are configurable via the config object.

Source IP

To accurately analyze the 404's, you need to know the client's true source IP. Remember that any connection coming through a proxy is not going to have the actual client's source IP. Enter the X-Forwarded-For (XFF) header. If present, the XFF header will contain an ordered, comma-separated list of IP addresses. Each IP identifies another proxy, load balancer or forwarding device that the request passed through before it got to you. IPs are appended to this list, so the first IP is that of the actual client. In our script logic, we can check for the XFF header and, if it's present, use the first IP in this list as the client IP. In the absence of an XFF header, we'll simply use the 'remoteAddress' of the connection object.

redis

There are a couple of important things to point out in regard to using the included Redis server. First, the LineRate load balancer runs multiple instances of the Node.js engine, and variables are unique to each instance. If you were to store 404 tracking info in local variables, you might get results that you don't expect. See here for more info. Second, using redis lends itself especially well to this example because you can run this script on all the virtual-servers for your site and get instant aggregated analysis and action.

The Script

If you're not already familiar with Node.js and the LineRate scripting engine, be sure to check out the LineRate Scripting Developer's Guide.

requires and config

Load the required modules and initialize the config object. You might want to tune config.time_window and config.max_404 to your environment. Continue reading to gain a better understanding of the implications of changing these values.

    var vsm = require('lrs/virtualServerModule');
    var async = require('async');
    var redis = require('redis').createClient();

    // Change config as needed.
    var config = {
      vs: 'vs_http',    // name of virtual-server
      time_window: 10,  // window in seconds
      max_404: 10       // max 404's per time window
    };

redis

Pretty basic stuff here, but note that we're loading the module and creating a client object all in one line.

    var redis = require('redis').createClient();

    redis.on('error', function (err) {
      console.log('Error' + err);
    });

    redis.on('ready', function () {
      console.log('Connected to redis');
    });

onRequest() - async waterfall

The async module is used to provide some structure to the code and to ensure that things happen in the proper order. When we receive a new request from a client, we get the client's IP, check it against the database and then handle the rest of the request/response process. Each of the functions is detailed next.

    function onRequest(servReq, servResp, cliReq) {
      async.waterfall([
        function(callback) {
          get_client_ip(servReq, callback);
        },
        function(client_ip, callback) {
          check_client(client_ip, callback);
        },
        function(throttle, client_ip, callback) {
          doRequest(servResp, cliReq, throttle, client_ip, callback);
        },
      ], function (err, result) {
        if (err) {
          throw new Error(err); // blow up
        }
      });
    }

get_client_ip()

Check for the presence of the XFF header. If present, get the client IP from the header value. If not, use remoteAddress from the servReq connection object.

    function get_client_ip(servReq, callback) {
      var client_ip;
      // check xff header for client ip first
      if ('x-forwarded-for' in servReq.headers) {
        client_ip = servReq.headers['x-forwarded-for'].split(',').shift();
      } else {
        client_ip = servReq.connection.remoteAddress;
      }
      return callback(null, client_ip);
    }

check_client()

check_client() is where we determine whether to block the request. If the client's IP is in redis and the corresponding value is at least config.max_404, we set throttle to true. Otherwise, throttle remains false. throttle is used in the next function to either allow or block the request.
    function check_client(client_ip, callback) {
      var throttle = false;
      redis.get(client_ip, function (err, reply) {
        if (err) {
          return callback(err);
        }
        // reply is null when the key does not exist
        if (reply >= config.max_404) {
          throttle = true;
        }
        return callback(null, throttle, client_ip);
      });
    }

doRequest()

In doRequest(), the first thing we do is check to see if throttle is true. If it is, we simply return a 403 and end the response. If you wanted more aggressive throttling, you could also update the expiration time of the redis key associated with this client's IP here.

If there is no throttle, we register a listener for the 'response' event on cliReq and send the request on to the real server. When we receive the response, we check the status code. If it's a 404, we increment the redis 404 counter.

Any client that receives config.max_404 or more 404 responses within a rolling window of config.time_window seconds will start to get blocked. Once the time window passes, the requests will be allowed again.

    function doRequest(servResp, cliReq, throttle, client_ip, callback) {
      if (throttle) {
        servResp.writeHead(403);
        servResp.end('404 throttle. Your IP has been recorded.\n');
        // note you could choose to bump the redis key timeout here
        // and effectively lock out the user completely (even for good requests)
        // until they stop ALL requests for 'time_window'
        return callback(null, 'done');
      } else {
        cliReq.on('response', function(cliResp) {
          var status_code = cliResp.statusCode;
          if (status_code === 404) {
            redis.multi()
              .incr(client_ip)
              .expire(client_ip, config.time_window)
              .exec(function (err, replies) {
                if (err) {
                  return callback(err);
                }
              });
          }
          // Fastpipe response
          cliResp.bindHeaders(servResp);
          cliResp.fastPipe(servResp);
        });
        cliReq();
        return callback(null, 'done');
      }
    }

Testing

This bash one-liner uses curl to send 15 consecutive requests for an image that doesn't exist, each of which results in a 404 response. Note the change from '404' to '403' from request #10 to request #11. This is the throttling in action.

    > for i in $(seq -w 1 15);do echo -n "${i}: `date` :: "; curl -w "%{http_code}\n" -o /dev/null -s http://example.com/noexist.jpg;done
    01: Fri Feb 13 09:59:44 MST 2015 :: 404
    02: Fri Feb 13 09:59:44 MST 2015 :: 404
    03: Fri Feb 13 09:59:44 MST 2015 :: 404
    04: Fri Feb 13 09:59:44 MST 2015 :: 404
    05: Fri Feb 13 09:59:44 MST 2015 :: 404
    06: Fri Feb 13 09:59:44 MST 2015 :: 404
    07: Fri Feb 13 09:59:44 MST 2015 :: 404
    08: Fri Feb 13 09:59:44 MST 2015 :: 404
    09: Fri Feb 13 09:59:44 MST 2015 :: 404
    10: Fri Feb 13 09:59:44 MST 2015 :: 404
    11: Fri Feb 13 09:59:44 MST 2015 :: 403
    12: Fri Feb 13 09:59:44 MST 2015 :: 403
    13: Fri Feb 13 09:59:44 MST 2015 :: 403
    14: Fri Feb 13 09:59:44 MST 2015 :: 403
    15: Fri Feb 13 09:59:44 MST 2015 :: 403

Pulling it all together, here's the full script. Happy cloning!

Please leave a comment or reach out to us with any questions or suggestions, and if you're not a LineRate user yet, remember you can try it out for free.

LineRate HTTP to HTTPS redirect
Here's a quick LineRate proxy code snippet to convert an HTTP request to an HTTPS request using the embedded Node.js engine. The relevant parts of the LineRate proxy config are below, as well.

By modifying the redirect_domain variable, you can redirect HTTP to HTTPS as well as do a non-www to www redirect. For example, you can redirect a request for http://example.com to https://www.example.com. The original URI is simply appended to the redirected request, so a request for http://example.com/page1.html will be redirected to https://www.example.com/page1.html.

This example uses the self-signed SSL certificate that is included in the LineRate distribution. This is fine for testing, but make sure to create a new SSL profile with your site certificate and key when going to production.

As always, the scripting docs can be found here.

redirect.js:

Put this script in the default scripts directory - /home/linerate/data/scripting/proxy/ - and update the redirect_domain and redirect_type variables for your environment.

    "use strict";

    var vsm = require('lrs/virtualServerModule');

    // domain name to which to redirect
    var redirect_domain = 'www.example.com';

    // type of redirect. 301 = permanent, 302 = temporary
    var redirect_type = 302;

    vsm.on('exist', 'vs_example.com', function(vs) {
      console.log('Redirect script installed on Virtual Server: ' + vs.id);
      vs.on('request', function(servReq, servResp, cliReq) {
        servResp.writeHead(redirect_type, {
          'Location': 'https://' + redirect_domain + servReq.url
        });
        servResp.end();
      });
    });

LineRate config:

    real-server rs1
     ip address 10.1.2.100 80
     admin-status online
    !
    virtual-ip vip_example.com
     ip address 192.0.2.1 80
     admin-status online
    !
    virtual-ip vip_example.com_https
     ip address 192.0.2.1 443
     attach ssl profile self-signed
     admin-status online
    !
    virtual-server vs_example.com
     attach virtual-ip vip_example.com default
     attach real-server rs1
    !
    virtual-server vs_example.com_https
     attach virtual-ip vip_example.com_https default
     attach real-server rs1
    !
    script redirect
     source file "proxy/redirect.js"
     admin-status online

Example:

    user@m1:~/ > curl -L -k -D - http://example.com/test
    HTTP/1.1 302 Found
    Location: https://www.example.com/test
    Date: Wed, 03-Sep-2014 16:39:53 GMT
    Transfer-Encoding: chunked

    HTTP/1.1 200 OK
    Content-Type: text/plain
    Date: Wed, 03-Sep-2014 16:39:53 GMT
    Transfer-Encoding: chunked

    hello world

Enforcing HSTS (HTTP Strict Transport Security) in LineRate
HTTP Strict Transport Security is a policy between your customers' browsers and your servers to increase security. It forces the browser to always use HTTPS when connecting to your site. The server or proxy needs to set the Strict-Transport-Security header. If the client connects sometime in the future and isn't offered a valid SSL cert, it should show an error to the user. Also, if the client is somehow directed to a plaintext URL at your site, for instance from the address bar, it should turn it into an HTTPS URL before connecting.

This policy can prevent simple attacks where an attacker is positioned in the network temporarily between a client and your site. If they can connect to the client in plaintext, and the user isn't carefully checking for the green browser lock symbol, they can act as a man in the middle and read all the data flowing between clients and servers. See Moxie Marlinspike's presentation.

HSTS also saves you if you're worried about some piece of infrastructure accidentally emitting a non-secure link. Modern sites are a hybrid of client-side and server-side code; browser-facing content and APIs; core applications and peripheral systems like analytics, support, or advertising. What are the odds that one of these systems will someday use a URL that's not HTTPS? With HSTS, you can prevent attacks that take advantage of this accident. To make HSTS effective in this case, you should place it in a proxy outside of all of these systems.

First I'll show a simple script to enable HSTS on LineRate; next I'll show an enhancement to detect the plaintext URLs that are leaking on clients that don't obey HSTS.
Simple: Add HSTS Header to Responses

To enable HSTS for your site, simply catch responses and add the "Strict-Transport-Security" header:

    var vsm = require('lrs/virtualServerModule');

    // Set this to the amount of time a browser should obey HSTS for your site
    var maxAge = 365*24*3600; // One year, in seconds

    function setHsts(servReq, servRes, cliReq) {
      cliReq.on('response', function (cliRes) {
        cliRes.bindHeaders(servRes);
        servRes.setHeader('Strict-Transport-Security', 'max-age=' + maxAge);
        cliRes.fastPipe(servRes);
      });
      cliReq();
    }

    vsm.on('exist', 'yourVirtualServerName', function (vs) {
      vs.on('request', setHsts);
    });

For any requests that come through a virtual server named yourVirtualServerName, LineRate will add the header to the response. In this example, the maxAge variable means that the browser should enforce HTTPS for your site for the next year from when it saw this response header. As long as users visit your site at least once a year, their browser will enforce that all URLs are HTTPS.

Advanced: Detect plaintext leaks and HSTS issues

In a more advanced script, you can also detect requests to URLs that aren't HTTPS. Note that HSTS requires the browser to enforce the policy; some browsers don't support it (Internet Explorer does not as of this writing; Safari didn't until Mavericks). For those users, you'll still need to detect any plaintext "leaks". Or, maybe you're a belt-and-suspenders kind of person, and you want to tell your servers to add HSTS, but also detect a failure to do so in your proxy. In these cases, the script below will detect the problem, collect information, record it, and work around it by redirecting to the HTTPS URL.
    var vsm = require('lrs/virtualServerModule');
    var util = require('util');

    // Set this to the domain name to redirect to
    var yourDomain = 'www.yoursite.com';

    // Set this to the amount of time a browser should obey HSTS for your site
    var maxAge = 365*24*3600; // One year, in seconds

    var stsValParser = /max-age=([0-9]+);?/;

    function detectAndFixHsts(servReq, servRes, cliReq) {
      cliReq.on('response', function (cliRes) {
        cliRes.bindHeaders(servRes);
        var stsVal = cliRes.headers['strict-transport-security'];
        // RegExp objects provide exec(), not match(); the result is null
        // when the header is missing or has no max-age directive.
        var stsMatch = stsVal ? stsValParser.exec(stsVal) : null;
        if (!stsMatch) {
          // Strict-Transport-Security header not valid.
          console.log('[WARNING] Strict-Transport-Security header not set ' +
                      'properly for URL %s. Value: %s. Request Headers: %s' +
                      ', response headers: %s',
                      servReq.url, stsVal, util.inspect(servReq.headers),
                      util.inspect(cliRes.headers));
          servRes.setHeader('Strict-Transport-Security', 'max-age=' + maxAge);
        }
        cliRes.fastPipe(servRes);
      });
      cliReq();
    }

    function redirectToHttps(servReq, servRes, cliReq) {
      // This is attached to the non-SSL VIP.
      var referer = servReq.headers['referer'];
      if (referer === undefined ||
          (referer.lastIndexOf('http://' + yourDomain, 0) == -1)) {
        // Referred from another site on the net; not a leak in your site.
      } else {
        // Leaked a plaintext URL or user is using a deprecated client
        console.log('[WARNING] Client requested non-HTTPS URL %s. ' +
                    'User-Agent: %s, headers: %s',
                    servReq.url, servReq.headers['user-agent'],
                    util.inspect(servReq.headers));
      }
      var httpsUrl = 'https://' + yourDomain + servReq.url;
      var redirectBody = '<html><head><title> ' + httpsUrl + ' Moved</title>' +
          '</head><body><p>This page has moved to <a href="' + httpsUrl + '">' +
          httpsUrl + '</a></p></body></html>';
      servRes.writeHead(302, {
        'Location': httpsUrl,
        'Content-Type' : 'text/html',
        'Content-Length' : redirectBody.length
      });
      servRes.end(redirectBody);
    }

    vsm.on('exist', 'yourHttpsVirtualServer', function (vs) {
      vs.on('request', detectAndFixHsts);
    });

    vsm.on('exist', 'yourPlainHttpVirtualServer', function (vs) {
      vs.on('request', redirectToHttps);
    });

Note that logging every non-HTTPS request can limit performance and fill up disk. Alternatives include throttled logging (try googling "npm log throttling"), recording URLs to a database, or keeping a cache of URLs that we've already reported. If you're interested, let me know in the comments and I can cover some of these topics in future blog posts.

Introducing LineRate Lightning series (and Snippet #1 - HTTP referer blocking)
We're big fans here of the iRules 20 Lines or Less series for F5's BIG-IP. LineRate proxy uses a Node.js scripting engine embedded into the HTTP data path, so we can't directly use iRules scripts. However, the power and flexibility of the Node.js engine allows us to do some pretty cool things - and a lot of these things don't require much code and can be implemented quickly.

The LineRate Lightning series will contain snippets of code that aim to be quick, powerful and maybe a little bit flashy - kind of like real lightning! If you want to know more about scripting, be sure to check out the official documentation and the LineRate Scripting Guide. And remember that the F5 LineRate staff monitors DevCentral closely, so this is a great place to get all your questions answered.

If you're already a LineRate user, you have a good handle on what we can do with scripting. If you're not, this will be a great way for us to show you. Sit back and hold on while we ride the lightning.

This first snippet is a simple one that does HTTP referrer blocking based on a whitelist of permitted referrers. Simply add the referring domains that you'd like to permit to the domain_whitelist list and change vs_http to match the name of your virtual server.

Here's the script:

Conditional high-res image serving with LineRate
There is a vast array of screen sizes, resolutions and pixel densities out there. As someone who is serving content to users, how do you ensure that you're serving the best quality images to screens where it actually makes a difference?

The problem: If you're serving standard resolution images to high pixel density screens (such as Apple Retina, iPhones, Samsung Galaxy and myriad other devices), user experience may suffer. If you're serving higher resolution images to lower quality screens, you're chewing up unnecessary bandwidth and increasing page load times for users who won't benefit.

So you've decided you want to give your users the best experience you can when it comes to images. How do you go about it? One solution is to maintain multiple copies of the image at various qualities and to conditionally serve the right quality depending on the client.

The basic premise of the solution documented here is to use a cookie to signal the client's device pixel ratio to the server. Once we know the device pixel ratio, we can decide which version of an image to serve. The device pixel ratio isn't something that is normally sent from client to server, so we have to give it a little nudge. There are a couple of ways to accomplish this - this article details two methods well. If you're already using javascript, I'd recommend using the javascript method. If you're not, or you just want to avoid javascript for whatever reason, go with the CSS method. Either way, we'll assume that you've implemented this, that the cookie is named "device-pixel-ratio" and that the value is an integer.

The script below will check if the request is for an image (ex. image1.png). If it is, it will also check for the presence and value of the device-pixel-ratio cookie. If the value is greater than 1, the request will be rewritten to request the high-res image (ex. image1.png@2x).
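The rewrite rule itself can be sketched as a standalone function that does not depend on LineRate. This is an illustration under assumptions rather than part of the original snippet: highResUrl is a hypothetical name, the pixel ratio is passed in directly instead of being read from the cookie, and the suffix and extension list simply mirror the values used in this article.

```javascript
// Suffix appended to image names for the high-res variant.
var HIGHRES_SUFFIX = '@2x';
// Image types that have a high-res variant.
var IMAGE_EXTENSIONS = ['.jpg', '.jpeg', '.gif', '.png', '.webp'];

// Hypothetical helper: return the URL to request from the real server.
// requestUrl is the path plus optional query string; pixelRatio is the
// cookie value (a string or number) or undefined when the cookie is absent.
function highResUrl(requestUrl, pixelRatio) {
  // Leave the URL alone unless the ratio is a number greater than 1.
  if (!(Number(pixelRatio) > 1)) {
    return requestUrl;
  }
  // Split off the query string so the suffix lands on the path.
  var q = requestUrl.indexOf('?');
  var pathname = q === -1 ? requestUrl : requestUrl.slice(0, q);
  var search = q === -1 ? '' : requestUrl.slice(q);
  for (var i = 0; i < IMAGE_EXTENSIONS.length; i++) {
    var ext = IMAGE_EXTENSIONS[i];
    if (pathname.slice(-ext.length) === ext) {
      return pathname + HIGHRES_SUFFIX + search;
    }
  }
  // Not an image request: pass it through unchanged.
  return requestUrl;
}
```

Keeping the decision in a pure function like this makes it easy to exercise the edge cases (missing cookie, query strings, non-image paths) outside the proxy.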
Here's the script:

    'use strict';

    var vsm = require('lrs/virtualServerModule');
    var url = require('url');
    var cookie = require('cookie');

    // append this to image names for highres
    var suffix = '@2x';

    // image types to check for highres
    var ext = [ '.jpg', '.jpeg', '.gif', '.png', '.webp' ];

    var request_callback = function (vs_object) {
      console.log('pixel_ratio script installed on Virtual Server: ' + vs_object.id);
      vs_object.on('request', function (servReq, servResp, cliReq) {
        // check header for device-pixel-ratio
        if ('Cookie' in servReq.headers) {
          var cookies = cookie.parse(servReq.headers.Cookie);
          console.log(cookies);
          if ('device-pixel-ratio' in cookies && cookies['device-pixel-ratio'] > 1) {
            var u = url.parse(servReq.url);
            for (var i = 0; i < ext.length; i++) {
              if (u.pathname.slice(-ext[i].length) === ext[i]) {
                servReq.bindHeaders(cliReq);
                servReq.url = u.pathname + suffix + (u.search || '');
                break;
              }
            }
          }
        }
        cliReq();
      });
    };

    vsm.on('exist', 'vs_http', request_callback);

LineRate logging, JSON and LogStash
Data collection and data mining are big business all over the web. Whether you are trying to monetize the data or simply track data points for statistical analysis, data storage, formatting and display are essential. Most web-facing applications have logging capabilities built in and work pretty well for storing this data on direct- or network-attached storage. But when you have tens, hundreds and even thousands of applications logging data, storing, aggregating, parsing and, ultimately, visualizing this data can get pretty difficult. Some large web properties can generate tens or even hundreds of gigabytes of log data per day.

One way to simplify this process is to log everything to a central point in the network. Think syslog on steroids. Since all your traffic is passing through it anyway, the load balancer is a great place to do this. Not only can you record "normal" HTTP statistics, like request method and URI, but logging from the load balancer also allows you to include virtual server and real server data for each request.

In this article, I'll show you how you can gather various data points for HTTP requests using the LineRate proxy and the embedded Node.js scripting engine. You can then format and ship this data to any number of data/log collection services. In this example, I'm going to send JSON formatted data to logstash - "a tool for managing events and logs". (By default, logstash includes ElasticSearch for its data store and the Kibana web interface for data visualization.) logstash is an open source project and installs easily on Linux. The logstash 10 minute walkthrough should get you started.

We're going to configure logstash to ingest JSON formatted data by listening on a TCP port. On the LineRate side, we'll build a JSON object with the data we're interested in and use a TCP stream to transmit the data. I'll use the TCP input in logstash to ingest the data and then the JSON filter to convert the incoming JSON messages to a logstash event.
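The wire format this implies is one JSON document per line on the TCP stream. The sketch below illustrates that framing in plain Node.js; it is not logstash's implementation, toLogLine and createLineParser are hypothetical names, and it assumes the receiver treats each newline-terminated line as one message.

```javascript
// Sender side: serialize one message as a single line.
// JSON.stringify escapes newlines inside strings, so the only raw
// newline on the wire is the terminator added here.
function toLogLine(message) {
  return JSON.stringify(message) + '\n';
}

// Receiver side (hypothetical): TCP delivers arbitrary chunks, so buffer
// the input and hand each complete line to onMessage as a parsed object.
function createLineParser(onMessage) {
  var buffered = '';
  return function onChunk(chunk) {
    buffered += chunk;
    var idx;
    while ((idx = buffered.indexOf('\n')) !== -1) {
      var line = buffered.slice(0, idx).trim(); // also drops a trailing '\r'
      buffered = buffered.slice(idx + 1);
      if (line) {
        onMessage(JSON.parse(line));
      }
    }
  };
}
```

The point of the line terminator is that a receiver can recover message boundaries even when a single message arrives split across several TCP segments.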
logstash adds a few fields to the data, but essentially leaves all the original JSON in its original structure, so this filter is perfect if you're already working with JSON. Here's a simple logstash config file for this setup:

    input {
      tcp {
        mode => server
        port => 9999
      }
    }
    filter {
      json {
        source => "message"
      }
      geoip {
        source => "[request][source_ip]"
      }
    }
    output {
      elasticsearch {
        host => localhost
      }
      stdout {
        codec => rubydebug
      }
    }

You might notice that I also added the 'geoip' filter. This is another great built-in feature of logstash. I simply enable the filter and tell logstash which of my input data fields contains the IP address that I want geo data for, and it takes care of the rest. It uses its built-in GeoIP database (via the GeoLiteCity database from MaxMind) to get data about that IP and adds the data to the message. By default, this filter adds a lot of geo data to the message. You might want to trim some of the fields if it's more than you need. (For example, you might not want latitude and longitude.)

Here's a sample screenshot of logstash/kibana with data logged from a LineRate proxy:

Here's the Node.js script for LineRate.

    'use strict';

    var vsm = require('lrs/virtualServerModule');
    var net = require('net');
    var os = require('os');
    var cm = require('connman');

    // change these variables to match your env
    var virtual_server = 'vs_http';
    var logging_server = {
      'host': '10.190.5.134',
      'port': 9999
    };

    var hostname = os.hostname();
    var pid = process.pid;
    var sock;

    function onData() {
      // noop
    }

    function onReset() {
      // noop
    }

    function onReconnect() {
      // noop
    }

    function onConnect(socket) {
      console.log("Socket connected, pid = ", process.pid);
      sock = socket;
    }

    function log_message(message) {
      try {
        sock.write(JSON.stringify(message) + '\r\n');
      } catch (ex) {
        // if you want an exception logged
        // when a write failure occurs, uncomment
        // the next line.
        // Note this could have
        // very negative effects if you are logging
        // at a high request rate
        //console.log(ex);
      }
    }

    cm.connect(logging_server.port, logging_server.host, onConnect, onReset, onData);

    var processRequest = function(servReq, servResp, cliReq) {
      // object to store stats
      var message = {};

      // record request start time
      var time_start = process.hrtime();

      message['request'] = {
        'method': servReq.method,
        'url': servReq.url,
        'version': servReq.httpVersion,
        'source_ip': servReq.connection.remoteAddress,
        'source_port': servReq.connection.remotePort,
        'headers': servReq.headers
      };
      message['vip'] = {
        'virtual_ip': servReq.connection.address().address,
        'virtual_port': servReq.connection.address().port,
        'virtual_family': servReq.connection.address().family
      };
      message['lb'] = {
        'virtual_server': virtual_server,
        'hostname': hostname,
        'process_id': pid
      };

      // the real server's response arrives on the client request object
      cliReq.on('response', function onResp(cliResp) {
        message['real'] = {
          'ip': cliResp.connection.remoteAddress,
          'port': cliResp.connection.remotePort
        };
        cliResp.bindHeaders(servResp);
        cliResp.pipe(servResp);

        // get total request time
        var time_diff = process.hrtime(time_start);
        //console.log('setting response time: ' + time_diff[1]);
        message['response'] = {
          'headers': cliResp.headers,
          'status_code': cliResp.statusCode,
          'version': cliResp.httpVersion,
          'time_s': time_diff[0] + (time_diff[1] / 1e9)
        };
        log_message(message);
      });

      /* continue with request processing, send the request to real-server */
      cliReq();
    };

    var createCallback = function(virtualServerObject) {
      console.log('Logging script installed on Virtual Server: ' + virtualServerObject.id);
      virtualServerObject.on('request', processRequest);
    };

    vsm.on('exist', virtual_server, createCallback);

Finally, here's a sample message in raw JSON format retrieved from the elasticsearch datastore:

    {
      "_index": "logstash-2014.11.03",
      "_type": "logs",
      "_id": "PZTeM0KWRfiju9qUifmDhQ",
      "_score": null,
      "_source": {
        "message": "(redacted)",
        "@version": "1",
        "@timestamp":
          "2014-11-03T22:32:22.930Z",
        "host": "192.168.88.155:59625",
        "geoip": {
          "area_code": 614,
          "city_name": "Columbus",
          "continent_code": "NA",
          "country_code2": "US",
          "country_code3": "USA",
          "country_name": "United States",
          "dma_code": 535,
          "ip": "172.16.87.1",
          "latitude": 39.96119999999999,
          "location": [ -82.9988, 39.96119999999999 ],
          "longitude": -82.9988,
          "postal_code": "43218",
          "real_region_name": "Ohio",
          "region_name": "OH",
          "timezone": "America/New_York"
        },
        "request": {
          "method": "GET",
          "url": "/test/page",
          "version": "1.1",
          "source_ip": "172.16.87.1",
          "source_port": 59721,
          "headers": {
            "User-Agent": "curl/7.30.0",
            "Connection": "close",
            "Accept": "*/*",
            "Host": "lrosvm",
            "X-Fake-Header": "fake-header-value"
          }
        },
        "vip": {
          "virtual_ip": "172.16.87.153",
          "virtual_port": 80,
          "virtual_family": "IPv4"
        },
        "lb": {
          "virtual_server": "vs_http",
          "hostname": "lros01",
          "process_id": 8177
        },
        "real": {
          "ip": "192.168.233.1",
          "port": 15000
        },
        "response": {
          "headers": {
            "Content-Length": "36",
            "Date": "Mon, 03 Nov 2014 22:32:22 GMT",
            "Content-Type": "text/plain"
          },
          "status_code": 200,
          "version": "1.1",
          "time_s": 0.00395572
        },
        "_source": {}
      },
      "sort": [ 1415053942930, 1415053942930 ]
    }

Please leave a comment or reach out to us with any questions or suggestions, and if you're not a LineRate user yet, remember you can try it out for free.

Snippet #4: LineRate and GeoIP Node.js module
GeoIP allows you to find the geographical location of a host based on its IP address. The information typically includes the country, city, latitude, and longitude. The types of data you get, and their granularity and accuracy, depend on the database you use. This article demonstrates how to use the geoip-lite Node.js module on LineRate. In this example, a LineRate virtual server injects two headers containing GeoIP information into client requests so that the back-end HTTP servers can use the information in their applications.

To install the geoip-lite Node.js module, execute the following LineRate command.

scripting npm install geoip-lite

If you receive an error message, please refer to the section on LRS-26399 in LineRate Release Notes, Version 2.5.0.

Here is the script. Make sure to change the virtual server ID in the 2nd argument of vsm.on ('vs1' in this script).

'use strict';

var geoip = require('geoip-lite');
var vsm = require('lrs/virtualServerModule');

var proc = function(servReq, servResp, cliReq) {
  var g = geoip.lookup(servReq.connection.remoteAddress);
  if (g) {
    if (g['ll']) {
      servReq.addHeader('Geo-Position', g['ll'].join(';'));
    }
    if (g['country']) {
      servReq.addHeader('Geo-Country', g['country']);
    }
  }
  cliReq();
}

vsm.on('exist', 'vs1', function(vso) {
  vso.on('request', proc);
});

Snippet #6: Converting Internationalized Domain Name with LineRate
Modern browsers convert Internationalized Domain Names (IDN) to the set of ASCII characters permitted in the Domain Name System prior to name resolution. The mechanism employed is called Punycode, and is defined in RFC 3492. For example, a UTF-8 represented “日本語.jp” is converted to “xn--wgv71a119e.jp”, and vice versa. The latter representation is also used in HTTP's Host request header. For example, “Host: xn--wgv71a119e.jp”.

The Punycode Node.js module is bundled in LineRate. The toASCII() method converts a UTF-8 represented domain name to ASCII, and toUnicode() does the reverse. The module is handy when you want to display readable UTF-8 domain names or compare Host header values against UTF-8 names. The snippet below converts Host header values to UTF-8 and writes them to the console.

'use strict';

var fp = require('lrs/forwardProxyModule');
var puny = require('punycode');

var proc = function(servReq, servResp, cliReq) {
  try {
    var host_puny = servReq.headers['Host'];
    var host_utf8 = puny.toUnicode(host_puny);
    console.log(host_puny + ' -> ' + host_utf8);
  } catch(e) {
    // do nothing
  }
  cliReq();
}

fp.on('exist', 'fp', function(fpo) {
  console.log(fpo.id + ' exists.');
  fpo.on('request', proc);
});

Here are some examples:

LROS: xn--wgv71a119e.jp → 日本語.jp
LROS: xn--6krz9fba47sz4d44x8h7asr0c.tw → 國立暨南國際大學.tw
LROS: xn--fhqu4ykwbs65a.cn → 上海大学.cn
LROS: xn--9d0bw1iq6js1kwhq.kr → 전북대학교.kr

LineRate: HTTP session ID persistence in scripting using memcache
Using the LineRate Node.js datapath scripting engine to achieve session-based client/server affinity

LineRate introduced the selectServer() method and the newServerSelected event extensions to the built-in Node.js http module in version 2.4. This opens up a lot of possibilities for dynamically selecting real-servers based on HTTP session information, such as HTTP headers (cookies, user agents), or on pretty much any other metric you can think of, such as server load, time of day, or geolocation. One use-case is particularly useful: obtaining server affinity based on a session ID such as PHPSESSID (PHP), JSESSIONID (Tomcat), ASP.NET_SessionId (ASP), and connect.sid (Express/Connect). Read on to see how.

A quick aside: LineRate added cookie-based real-server affinity way back in version 1.6. This is a powerful feature in its own right and might get you what you want with a minimal amount of effort - I urge you to check it out. But, if you're averse to injecting a new cookie or you just need more fine-grained control of the affinity algorithm, read on.

I'll show you a script and walk through some of the highlights to accomplish session ID persistence via scripting below. For bonus points, I'm going to use memcache to store a key-value pair, which will consist of the session ID (key) and the real-server name (value). (See here for why you don't want to just store this data in a JavaScript data structure.) A Redis server is already pre-installed and ready to go on your LineRate instance, and it works great, but I already wrote an example of using Redis in another article, so I'll demo something different here. (This is in contrast to memcache, where you'll need a separate memcache server/cluster running somewhere on your network.)

The selectServer() method extends the http.clientRequest class and allows you to specify the real-server to receive the request.
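As a rough illustration of the affinity idea, here is a minimal sketch that runs in plain Node.js without LineRate or memcache. It rests on several assumptions that are not part of the article's script: a Map named sessionTable stands in for memcache, makeStubRequest() builds a stub object that only mimics the selectServer(name) call, and the helper names getSessionId and applyAffinity are made up for this example.

```javascript
'use strict';

// Hypothetical in-memory stand-in for memcache: session ID -> real-server name.
var sessionTable = new Map();

// Extract one cookie value from a raw Cookie header string.
// Returns undefined when the header or the cookie is absent.
function getSessionId(cookieHeader, key) {
  if (!cookieHeader) {
    return undefined;
  }
  var pairs = cookieHeader.split(';');
  for (var i = 0; i < pairs.length; i++) {
    var pair = pairs[i].trim();
    var eq = pair.indexOf('=');
    if (eq > 0 && pair.slice(0, eq) === key) {
      return decodeURIComponent(pair.slice(eq + 1));
    }
  }
  return undefined;
}

// Pin the request to the server recorded for this session, if any.
// cliReq only needs a selectServer(name) method here.
function applyAffinity(cookieHeader, key, cliReq) {
  var sessionId = getSessionId(cookieHeader, key);
  var serverName = sessionId ? sessionTable.get(sessionId) : undefined;
  if (serverName) {
    cliReq.selectServer(serverName);
  }
  return serverName;
}

// Stub client request (an assumption for this sketch) that records
// which server was selected.
function makeStubRequest() {
  return {
    selected: null,
    selectServer: function (name) { this.selected = name; }
  };
}

// A session that has been seen before is pinned to its recorded server.
sessionTable.set('abc123', 'srv1');
var known = makeStubRequest();
applyAffinity('theme=dark; connect.sid=abc123', 'connect.sid', known);
console.log(known.selected);   // srv1

// A request without a session cookie is left to the load balancer.
var unknown = makeStubRequest();
applyAffinity('theme=dark', 'connect.sid', unknown);
console.log(unknown.selected); // null
```

When no server is selected, nothing is forced, which corresponds to letting the configured load balancing algorithm pick the real-server.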
The newServerSelected event fires whenever the system forwards the request to a real-server that you did not specify. This is particularly useful (and critical for this example) when you don't care which server handles the session, just that the same server continues to handle that session. Keep reading for a discussion on the various pieces of the script; see the very bottom for the full script. If you're not already familiar with Node.js and the LineRate scripting engine, be sure to check out the LineRate Scripting Developer's Guide. requires and config Load the required modules and update the config object as needed. var vsm = require("lrs/virtualServerModule"); var cookie = require("cookie"); var async = require("async"); var memcache = require('memcache'); // Change config as needed. var config = { session_id_key: "connect.sid", // session-id key being used vs: "vs_http", // name of virtual-server memcache_host: '172.16.87.154' }; memcache memcache is a "high-performance, distributed memory object caching system". It's a really good option for an in-memory (read: fast), distributed, key-value store. Since storing keys and values is exactly what we need to do for this script, we'll take it for a test drive. Here's what the memcache code is doing: We load the module, create a new client object, define some event listeners and then connect to the memcache server. We'll also use the get() and set() methods later. This memcache code could be re-used in any scenario where a caching server is needed. If the client emits a 'close' or 'error' event, we wait one second before trying to reconnect using setTimeout() . 
var memcache = require('memcache'); var memcache_client = new memcache.Client(11211, config.memcache_host); memcache_client.on('connect', function () { console.log('Connected to memcache server at ' + config.memcache_host + ':11211'); }); memcache_client.on('timeout', function () { console.log('Memcache connection timed out; reconnecting...'); memcache_client.connect(); }); memcache_client.on('close', function () { console.log('Memcache connection closed; reconnecting (waiting 1s)...'); // wait 1s before re-connecting setTimeout(function () { memcache_client.connect(); }, 1000); }); memcache_client.on('error', function (e) { console.log('Memcache connection error; reconnecting...'); console.log(e); // wait 1s before re-connecting setTimeout(function () { memcache_client.connect(); }, 1000); }); memcache_client.connect(); async waterfall When we receive a new request from a client, a few things need to happen serially. And each of those sub-processes relies on data from the previous process. This is a classic use-case for the 'waterfall' control flow from the async module. The waterfall flow will run a series of functions and pass the results of each function to the next function. In this case, we're doing three primary things: getting the session ID from the request, selecting the real-server based on the session ID and then making the request. Each of these three functions are detailed next. function onRequest(servReq, servResp, cliReq) { async.waterfall([ function(callback) { getSessionIdCookie(servReq, callback); }, function(sessionId, callback) { selectServer(sessionId, cliReq, callback); }, function (cachedServerName, callback) { doRequest(servReq, servResp, cliReq, cachedServerName, callback); } ], function (err, result) { if (err) { throw new Error(err); } }); } getSessionIdCookie() Just like the name of the function sounds, here we're getting the session ID from the cookie header in the request. The sessionID variable is initialized to undefined . 
If the request doesn't contain a session ID, the sessionID variable will remain undefined , otherwise sessionID gets set to, you guessed it, the session ID. function getSessionIdCookie(servReq, callback) { var sessionId; // check for existence of session-id cookie if (servReq.headers.cookie) { var cookies = cookie.parse(servReq.headers.cookie); sessionId = cookies[config.session_id_key]; } return callback(null, sessionId); } selectServer() Here's where things start to get a little more interesting; this is where we dynamically select a real-server to which to send the request. If the original request did not contain a session ID in the cookie header, the request will just be processed 'normally' - the LineRate system will pick a real-server based on the configured load balancing algorithm for the virtual server. If the request does contain a session ID cookie, we look up the session ID in memcache. If memcache already has an entry with this session ID, the request is part of a previous session and we send the request to the same server to which previous requests were sent using selectServer(). The unassuming cliReq.selectServer(serverName); line is the key to this whole script and is what gives us real-server affinity using the session ID. If the selected server is different than what the system would have chosen using the configured load balancing algorithm, this will cause the 'newServerSelected' event to fire (see next section). function selectServer(sessionId, cliReq, callback) { if (!sessionId) { // no session-id cookie, proceed to next step return callback(null, null); } // lookup session id in memcache memcache_client.get(sessionId, function (err, result) { if (err) { console.log(err); return callback(err); } else { var serverName = result; if (serverName) { cliReq.selectServer(serverName); } return callback(null, serverName); } }); } doRequest() There's some interesting stuff happening in this function. 
We're piping the original request to the real-server we found in the selectServer() function. We're also listening for the real-server's response (cliResp). Why? This is where we snoop for a 'set-cookie' header signalling to our script that this is the first response in a new session. This set-cookie contains the session ID for the session. We record this session ID and real-server name pair in memcache. The newServerSelected event is guaranteed to be emitted before the response event for cliReq . This ensures that we have all the data we need when the memcache set() is called in the response event handler. Also note that the "selected server" will be null for the first request of any new session. This is expected and allowed. The request will default to choosing a real-server based on the configured load balancing algorithm. Lastly, the astute reader will note that we're using the expiration time of the session cookie to configure the expiration time of the memcache entry. This will ensure that the session info is in memcache for the duration it's needed and then cleans up after itself once it's expired. function doRequest(servReq, servResp, cliReq, cachedServerName, callback) { var selectedServerName; // Register "newServerSelected" handler to save the real-server name // which was used to send out the request. cliReq.on("newServerSelected", function(newServerName) { selectedServerName = newServerName; }); // Register response handler to optionally update cache if a new server // selection was made. cliReq.on("response", function(cliResp) { if(!selectedServerName) { selectedServerName = cachedServerName; } // retrieve session-id cookie from "set-cookie" header in response var set_cookie_header = cliResp.headers["set-cookie"]; if (set_cookie_header) { // Note: the "set-cookie" header is always an array containing the // set-cookie string as the first element. 
var set_cookie_object = cookie.parse(set_cookie_header[0]); var sessionId = set_cookie_object[config.session_id_key]; // calculate memcache exptime from session expiration time var expiration = (new Date(set_cookie_object['Expires']).getTime())/1000; if (sessionId) { // set the memcache entry expiration to the expiration time // of the set-cookie memcache_client.set(sessionId, selectedServerName, function(err, result) { // error handling here; depends on your env }, expiration); } } // Fastpipe response cliResp.bindHeaders(servResp); cliResp.fastPipe(servResp); }); // Fastpipe request servReq.bindHeaders(cliReq); servReq.fastPipe(cliReq); return callback(null); } Testing I have a pair of test real-servers running Express with the session module, called 'srv1' and 'srv2'. Each Express server responds with a JSON object with some information about the session. Here's a simple series of commands demonstrating what happens when the script is running. Note the 'real-server' is the same for every request and the session ID is the same in the 2nd and 3rd request and in the cookie.jar file. (I truncated the session ID for readability.) > curl -c cookie.jar http://172.16.87.157 {"real-server":"srv1","name":"connect.sid","session_id":"HQ1xyGhmrAUjZeJCG15JO019Pkb8K4NJ","expires":300000} > curl -b cookie.jar http://172.16.87.157 {"real-server":"srv1","name":"connect.sid","session_id":"HQ1xyGhmrAUjZeJCG15JO019Pkb8K4NJ","expires":300000} > curl -b cookie.jar http://172.16.87.157 {"real-server":"srv1","name":"connect.sid","session_id":"HQ1xyGhmrAUjZeJCG15JO019Pkb8K4NJ","expires":300000} Here's a contrasting series of commands demonstrating what happens when the script is not running. Note the round-robin'ing between real-servers 'srv1' and 'srv2'. 
> curl -c cookie.jar http://172.16.87.157
{"real-server":"srv2","name":"connect.sid","session_id":"OlxEv3KPZHqZ0J0Nby0FVaEO_S8x061D","expires":300000}
> curl -b cookie.jar http://172.16.87.157
{"real-server":"srv1","name":"connect.sid","session_id":"IJ2x2jxnEt5NP6y3pO_7AZalX2sHBfOu","expires":300000}
> curl -b cookie.jar http://172.16.87.157
{"real-server":"srv2","name":"connect.sid","session_id":"OlxEv3KPZHqZ0J0Nby0FVaEO_S8x061D","expires":300000}
> curl -b cookie.jar http://172.16.87.157
{"real-server":"srv1","name":"connect.sid","session_id":"SuIOpe7qSm9hj0bcimAAIqOturyir0-l","expires":300000}
> curl -b cookie.jar http://172.16.87.157
{"real-server":"srv2","name":"connect.sid","session_id":"OlxEv3KPZHqZ0J0Nby0FVaEO_S8x061D","expires":300000}

And, pulling it all together, here's the full script. Happy cloning!

Please leave a comment or reach out to us with any questions or suggestions and if you're not a LineRate user yet, remember you can try it out for free.

A LineRate script with lengthy initialization process
In some cases, your LineRate script needs to gather information before it can process any incoming HTTP request. Examples include user profiles such as customer categories (e.g., paid vs. free, or test vs. live), a GeoIP database, a list of sites that should be blocked, or data translation tables for converting the contents of specific headers. To do this, the script typically needs to access external databases or file servers, which may take some time. For this type of sequential processing, we normally employ the Async NPM module: e.g., async.series([initialization, process]); The async.series() method ensures that the process function (2nd element in the array) kicks off only after the initialization function (1st element) completes.

So what happens when HTTP requests hit LineRate while the script is still initializing? LineRate passes the requests through to the back-end servers without any processing. In an A/B web traffic steering scenario, for example, where test and live customers are directed to the test and production servers respectively, both kinds of customers are forwarded to the same server during that window (where the requests go depends on how you configure LineRate). This might be acceptable in some implementations, but in other cases you might want to block access completely until the system is ready with full information.

This can be achieved by preparing a function that starts up quickly and runs only during the initialization process. Using the async.series() method, it can be coded as below: async.series([process_pre, initialization, process_post]); The process_pre step registers a listener function for the request event. The listener is active from the moment process_pre runs until initialization finishes. Once the initialization completes, you de-register the existing listener for the request event before you register the new listener for the post-initialization processing.
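The hand-off described above can be sketched in plain Node.js. This is only an illustration under assumptions: the vso object here is an ordinary EventEmitter standing in for a LineRate virtual server object, series() is a small hand-written helper playing the role of async.series(), and processPre and processPost merely record which listener ran instead of answering or forwarding a request.

```javascript
'use strict';

var EventEmitter = require('events').EventEmitter;

// Hypothetical stand-in for a virtual server object; on LineRate the
// real object is handed to the script by the virtual server module.
var vso = new EventEmitter();
var handled = [];

function processPre() { handled.push('pre'); }   // would answer 503
function processPost() { handled.push('post'); } // would forward the request

// Run tasks one after another; each task receives a completion callback.
// A simplified stand-in for async.series().
function series(tasks, done) {
  var i = 0;
  (function next(err) {
    if (err || i === tasks.length) { return done(err || null); }
    tasks[i++](next);
  })();
}

series([
  function registerPre(cb) {
    vso.on('request', processPre);
    cb(null);
  },
  function initialize(cb) {
    vso.emit('request');   // a request arriving during initialization
    setTimeout(cb, 10);    // pretend to load data for 10 ms
  },
  function registerPost(cb) {
    // Swap listeners: remove the temporary one, then add the real one.
    vso.removeListener('request', processPre);
    vso.on('request', processPost);
    cb(null);
  }
], function (err) {
  if (err) { throw err; }
  vso.emit('request');     // a request arriving after initialization
  console.log(handled.join(',')); // pre,post
});
```

The request emitted during initialization reaches only the temporary listener, and the one emitted afterwards reaches only the post-initialization listener, because the first listener is removed before the second is added.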
De-registration is necessary because you can only register one listener to the virtual server's request event. This can be done with the EventEmitter removeListener() method as below.

VirtualServerModule.removeListener('request', process_pre);

In the sample Node.js code below, the initialization function (initialize) does nothing but sleep for 10 seconds. While it is initializing, the 1st listener (process_pre) returns the 503 "Service Unavailable" error message to clients. It also returns the Retry-After header to indicate the maximum time for the service to become fully available. You can replace the body of initialize with your own initialization code (e.g., database population or remote file access). As this is just a sample, the post-initialization function (process_post) does not do much: it just inserts a proprietary header for debugging purposes.

'use strict';

var vsm = require('lrs/virtualServerModule');
var async = require('async');

var vso;
var sleep = 10; // sleep time (s)
var message = 'Service Unavailable\nTemporarily down for maintenance';

// Script initialization required to do any request processing
// - assuming it takes a long time to complete.
var initialize = function(callback) {
  setTimeout(function() {
    callback(null, 'initialization completed');
  }, sleep*1000);
};

// Request handling before initialization
// Send back 503 "Service Unavailable"
var process_pre = function(servReq, servResp, cliReq) {
  servResp.writeHead(503, {
    'Content-Type': 'text/plain',
    'Content-Length': message.length,
    'Retry-After': sleep});
  servResp.end(message);
};

var register_pre = function(callback) {
  vso.on('request', process_pre);
  callback(null, 'pre-initialization processing');
};

// Request handling (after initialization)
var process_post = function(servReq, servResp, cliReq) {
  // do whatever you need to do to the requests with the info obtained
  // through the initialization process
  servReq.addHeader('X-LR', 'LineRate is ready.');
  cliReq();
};

var register_post = function(callback) {
  vso.removeListener('request', process_pre);
  vso.on('request', process_post);
  callback(null, 'post-initialization processing');
};

vsm.on('exist', 'vs40', function(v) {
  vso = v;
  async.series([register_pre, initialize, register_post], function(e, stat) {
    if (e) console.error(e);
    else console.log(stat);
  });
});

if (! vso) {
  console.log('vs40 not found. Script is not running properly.');
}

LineRate may spawn multiple HTTP processing engines (called lb_http) depending on the number of vCPUs (cores/hyper-threads). In the code above, all the engines perform the same initialization process. However, if the result of an initialization can be shared amongst the engines, you might want to run it just once on a designated process on behalf of all the processes: e.g., data population onto a shared database. While you can achieve this by running the initialization only on a master process using the Process.isMaster() method (a LineRate extension), you then need to come up with a mechanism to share information amongst the processes.
You can find data sharing methods in "A Variable Is Unset That I Know Set" in our Product Documentation or "LineRate and Redis pub/sub" DevCentral article.

Please leave a comment or reach out to us with any questions or suggestions and if you're not a LineRate user yet, remember you can try it out for free.