nodejs
Extracting X-Forwarded-For in Node.js
In a perfect world, developers would never need to know how a request got to their app. But we live on the Internet, where end-to-end protocols are used to transport messages across networks that are point-to-point, not end-to-end. Each point is represented by an intermediate device of diverse behavior that interacts with requests (and responses) in different ways. Some are transparent and do not modify data developers need. Some are not, by design and necessity, and inadvertently introduce challenges for developers in retrieving that data.

The most common example of this reality is associated with load balancers / proxies. Load balancers and proxies can (and by default do in modern architectures) make modifications to network characteristics that can frustrate developers. Many apps and services need to know the original client IP address from which a request came. Fraud detection, bot identification, tracking, identity and other needs drive this requirement. The problem is that proxies/load balancers often replace that IP address with their own.

This is an architectural by-product of load balancing and full proxies as well as some security-related application services. This configuration serves the need for optimization and isolation, and supports emerging architectures like those based on containers by enabling intra-container networking to operate without the constraints placed on public-facing networking. That means that services and applications inside the application infrastructure are not receiving the original client IP address, but rather the IP address of the proxy performing ingress routing or load balancing.

This is not a new problem; it has been an issue for developers for many years. The resolution remains the same: proxies can be configured to pass on the original client IP address via HTTP headers.
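On the Node.js side, reading such a header takes only a few lines. The following is a minimal sketch, assuming the proxy uses the common X-Forwarded-For header; `getClientIp` is a hypothetical helper name, and the plain objects stand in for a real `http.IncomingMessage`.

```javascript
'use strict';

// Hypothetical helper (not from the original post): returns the left-most
// address in X-Forwarded-For, or the socket's remote address as a fallback.
function getClientIp(req) {
  // Node.js lower-cases incoming header names, so the key is 'x-forwarded-for'.
  var xff = req.headers['x-forwarded-for'];
  if (typeof xff === 'string' && xff.trim() !== '') {
    // The header may carry a comma-separated chain; the first entry is the client.
    return xff.split(',')[0].trim();
  }
  var socket = req.socket || req.connection;
  return socket ? socket.remoteAddress : undefined;
}

// Plain objects standing in for incoming requests:
var proxied = {
  headers: { 'x-forwarded-for': '203.0.113.7, 10.0.0.1' },
  socket: { remoteAddress: '10.0.0.1' }
};
var direct = { headers: {}, socket: { remoteAddress: '192.0.2.10' } };

console.log(getClientIp(proxied));
console.log(getClientIp(direct));
```

Because the header is supplied by whoever sends the request, the value is only as trustworthy as the proxy that sets it.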
Many proxies do this by default, and though it is not an official standard, using the custom X-Forwarded-For HTTP header is best practice and a de facto standard. What remains, then, is for developers to extract that value in the event they need the original client IP address. Which brings me to the focus of this post, which is getting the value of X-Forwarded-For from within node.js.

It turns out that this is fairly straightforward. Like most languages for which HTTP is a staple protocol, node.js parses HTTP and provides access to its headers as an object keyed by header name. Unlike those old skool structured languages (I'm looking at you C/C++), grabbing an HTTP header is as simple as knowing what it's called. The request object created by a successful parse of an incoming HTTP request contains an object referenced as "headers". To extract a specific header, simply index it with the name of that header. Note that node.js lower-cases incoming header names, so the lookup is request.headers['x-forwarded-for'] rather than request.headers['X-Forwarded-For'].

That said, the use of "X-Forwarded-For" is not mandated by any specification or standard. It's best practice and of course it's polite to use it, but many proxies and frameworks use other custom headers in which to forward the same information. This is a great opportunity to talk to the networking team and find out what HTTP header to use. Or, if the thought makes you queasy, I encourage you to check out this node.js package called "forwarded-for" on github that seeks out the original client IP address in known locations based on a variety of proxies. As an added benefit, the module gracefully falls back to returning the client address. Because it's open source, if your specific custom HTTP header isn't included, you can always add it in, and don't forget to contribute back!

Enforcing HSTS (HTTP Strict Transport Security) in LineRate
HTTP Strict Transport Security is a policy between your customers' browsers and your servers to increase security. It forces the browser to always use HTTPS when connecting to your site. The server or proxy needs to set the Strict-Transport-Security header. If the client connects sometime in the future and isn't offered a valid SSL cert, it should show an error to the user. Also, if the client is somehow directed to a plaintext URL at your site, for instance from the address bar, it should turn it into an HTTPS URL before connecting.

This policy can prevent simple attacks where an attacker is positioned in the network temporarily between a client and your site. If they can connect to the client in plaintext, and the user isn't carefully checking for the green browser lock symbol, they can act as a man in the middle and read all the data flowing between clients and servers. See Moxie Marlinspike's presentation.

HSTS also saves you if you're worried about some piece of infrastructure accidentally emitting a non-secure link. Modern sites are a hybrid of client-side and server-side code; browser-facing content and APIs; core applications and peripheral systems like analytics, support, or advertising. What are the odds that one of these systems will someday use a URL that's not HTTPS? With HSTS, you can prevent attacks that take advantage of this accident. To make HSTS effective in this case, you should place it in a proxy outside of all of these systems.

First I'll show a simple script to enable HSTS on LineRate; next I'll show an enhancement to detect the plaintext URLs that are leaking on clients that don't obey HSTS.
Simple: Add HSTS Header to Responses

To enable HSTS for your site, simply catch responses and add the "Strict-Transport-Security" header:

    var vsm = require('lrs/virtualServerModule');

    // Set this to the amount of time a browser should obey HSTS for your site
    var maxAge = 365*24*3600; // One year, in seconds

    function setHsts(servReq, servRes, cliReq) {
      cliReq.on('response', function (cliRes) {
        cliRes.bindHeaders(servRes);
        servRes.setHeader('Strict-Transport-Security', 'max-age=' + maxAge);
        cliRes.fastPipe(servRes);
      });
      cliReq();
    }

    vsm.on('exist', 'yourVirtualServerName', function (vs) {
      vs.on('request', setHsts);
    });

For any requests that come through a virtual server named yourVirtualServerName, LineRate will add the header to the response. In this example, the maxAge variable means that the browser should enforce HTTPS for your site for the next year from when it saw this response header. As long as users visit your site at least once a year, their browser will enforce that all URLs are HTTPS.

Advanced: Detect plaintext leaks and HSTS issues

In a more advanced script, you can also detect requests to URLs that aren't HTTPS. Note that HSTS requires the browser to enforce the policy; some browsers don't support it (Internet Explorer does not as of this writing; Safari didn't until Mavericks). For those users, you'll still need to detect any plaintext "leaks". Or, maybe you're a belt-and-suspenders kind of person, and you want to tell your servers to add HSTS, but also detect a failure to do so in your proxy. In these cases, the script below will detect the problem, collect information, record it, and work around it by redirecting to the HTTPS URL.
    var vsm = require('lrs/virtualServerModule');
    var util = require('util');

    // Set this to the domain name to redirect to
    var yourDomain = 'www.yoursite.com';

    // Set this to the amount of time a browser should obey HSTS for your site
    var maxAge = 365*24*3600; // One year, in seconds

    var stsValParser = /max-age=([0-9]+);?/;

    function detectAndFixHsts(servReq, servRes, cliReq) {
      cliReq.on('response', function (cliRes) {
        cliRes.bindHeaders(servRes);
        var stsVal = cliRes.headers['strict-transport-security'];
        // RegExp#exec returns null when the header is missing a max-age value
        var stsMatch = stsVal ? stsValParser.exec(stsVal) : null;
        if (stsMatch === null) {
          // Strict-Transport-Security header not valid.
          console.log('[WARNING] Strict-Transport-Security header not set ' +
                      'properly for URL %s. Value: %s. Request Headers: %s' +
                      ', response headers: %s',
                      servReq.url, stsVal,
                      util.inspect(servReq.headers),
                      util.inspect(cliRes.headers));
          servRes.setHeader('Strict-Transport-Security', 'max-age=' + maxAge);
        }
        cliRes.fastPipe(servRes);
      });
      cliReq();
    }

    function redirectToHttps(servReq, servRes, cliReq) {
      // This is attached to the non-SSL VIP.
      var referer = servReq.headers['referer'];
      // A referer that starts with our own plaintext origin means the link
      // came from our site; anything else was referred from elsewhere on the
      // net and is not a leak in your site.
      if (referer !== undefined &&
          referer.lastIndexOf('http://' + yourDomain, 0) === 0) {
        // Leaked a plaintext URL or user is using a deprecated client
        console.log('[WARNING] Client requested non-HTTPS URL %s. ' +
                    'User-Agent: %s, headers: %s',
                    servReq.url, servReq.headers['user-agent'],
                    util.inspect(servReq.headers));
      }
      var httpsUrl = 'https://' + yourDomain + servReq.url;
      var redirectBody = '<html><head><title> ' + httpsUrl + ' Moved</title>' +
          '</head><body><p>This page has moved to <a href="' + httpsUrl +
          '">' + httpsUrl + '</a></p></body></html>';
      servRes.writeHead(302, {
        'Location': httpsUrl,
        'Content-Type' : 'text/html',
        // Content-Length is counted in bytes, not characters
        'Content-Length' : Buffer.byteLength(redirectBody)
      });
      servRes.end(redirectBody);
    }

    vsm.on('exist', 'yourHttpsVirtualServer', function (vs) {
      vs.on('request', detectAndFixHsts);
    });

    vsm.on('exist', 'yourPlainHttpVirtualServer', function (vs) {
      vs.on('request', redirectToHttps);
    });

Note that logging every non-HTTPS request can limit performance and fill up disk. Alternatives include throttled logging (try googling "npm log throttling"), recording URLs to a database, or keeping a cache of URLs that we've already reported. If you're interested, let me know in the comments and I can cover some of these topics in future blog posts.

Working with Node.js variable type casts, raw binary data
I recently started writing an application in Node.js that read raw data from a file, performed some action on it, and then sent the data over an HTTP connection in the HTTP body as multipart/binary. Until now I had always dealt with text and strings. Through data validation at the bit level I learned the hard way what type mismatches do to variables and how they compromise data integrity. Almost all examples I've come across online assume one is dealing with strings, and none deal with raw binary data. This article is a product of my mishaps and of thorough analysis by colleagues who provided much valuable insight into Node.js rules.

Let's start with a simple example, where we define several variables of different types, then use the += operator to concatenate string data to these variables.

    "use strict";
    var fs = require('fs');

    var mydata = 'somedata';
    var tmpvar1;
    var tmpvar2 = '';
    var tmpvar3 = null;
    var tmpvar4 = [];
    var tmpvar5 = {};
    var tmpvar6 = null;
    var tmpvar7 = 0;

    tmpvar1 += mydata;
    tmpvar2 += mydata;
    tmpvar3 += mydata;
    tmpvar4 += mydata;
    tmpvar5 += mydata;
    tmpvar6 = mydata;   // plain assignment, not an append
    tmpvar7 += mydata;

    console.log('length of mydata is: ',mydata.length,' , length of tmpvar1 is: ',tmpvar1.length, ' , tmpvar1 contents are: '+tmpvar1);
    console.log('length of mydata is: ',mydata.length,' , length of tmpvar2 as \'\' is: ',tmpvar2.length, ' , tmpvar2 contents are: '+tmpvar2);
    console.log('length of mydata is: ',mydata.length,' , length of tmpvar3 as null is: ',tmpvar3.length, ' , tmpvar3 contents are: '+tmpvar3);
    console.log('length of mydata is: ',mydata.length,' , length of tmpvar4 as \[\] is: ',tmpvar4.length, ' , tmpvar4 contents are: '+tmpvar4);
    console.log('length of mydata is: ',mydata.length,' , length of tmpvar5 as \{\} is: ',tmpvar5.length, ' , tmpvar5 contents are: '+tmpvar5);
    console.log('length of mydata is: ',mydata.length,' , length of tmpvar6 as assigned (not appended) is: ',tmpvar6.length, ' , tmpvar6 contents are: '+tmpvar6);
    console.log('length of mydata is: ',mydata.length,' , length of tmpvar7 as 0 is: ',tmpvar7.length, ' , tmpvar7 contents are: '+tmpvar7);

When run, this is the output it produces. By comparing the length and contents of each variable with the type each tmpvar was initialized to, we can explain what is occurring. Comments on why things behave this way are inline.

    length of mydata is: 8 , length of tmpvar1 is: 17 , tmpvar1 contents are:undefinedsomedata

This first case is normal JavaScript behavior (undefined variable += string). JavaScript tries to be smart about type casting to do what the script asks, in this case adding a string to an undefined variable. The undefined value is converted to the string "undefined" and concatenated with mydata, which is why the length is 9 + 8 = 17. In an actual program, it would probably make sense to either 1) initialize the variable to an empty string or 2) do a type check for undefined, if (typeof tmpvar1 === 'undefined'), and assign the first string value directly, { tmpvar1 = mydata; }.

    length of mydata is: 8 , length of tmpvar2 as '' is: 8 , tmpvar2 contents are:somedata
    length of mydata is: 8 , length of tmpvar3 as null is: 12 , tmpvar3 contents are:nullsomedata
    length of mydata is: 8 , length of tmpvar4 as [] is: 8 , tmpvar4 contents are:somedata
    length of mydata is: 8 , length of tmpvar5 as {} is: 23 , tmpvar5 contents are:[object Object]somedata
    length of mydata is: 8 , length of tmpvar6 as assigned (not appended) is: 8 ,tmpvar6 contents are: somedata

Assigning a string causes the resulting variable type to be a string.
    length of mydata is: 8 , length of tmpvar7 as 0 is: 9 , tmpvar7 contents are: 0somedata

0 is converted to a string and concatenated.

Now our example is changed to read raw binary data from a file:

    var fs = require('fs');

    var t1 = 1;
    var tmpvar21;
    var tmpvar22 = '';
    var tmpvar23 = null;
    var tmpvar24 = [];
    var tmpvar25 = {};
    var tmpvar26 = null;
    var tmpvar27 = 0;

    // file.sample is any binary file larger than 64KB.
    var fsread = fs.createReadStream("file.sample");

    fsread.on('error',function(e){
      console.log('debug -- got file read error: ',e);
    }).on('readable', function() {
      // First pass reads a chunk of the default size, second pass reads 20 bytes
      var chunk = (t1 == 1) ? fsread.read() : fsread.read(20);
      if (chunk === null) { return; } // nothing buffered yet
      t1 = (t1 == 1) ? 0 : 1;

      tmpvar21 += chunk;
      tmpvar22 += chunk;
      tmpvar23 += chunk;
      tmpvar24 += chunk;
      tmpvar25 += chunk;
      tmpvar26 = chunk;   // plain assignment, not an append
      tmpvar27 += chunk;

      console.log('length of chunk is: ',chunk.length,' , length of tmpvar21 is: ',tmpvar21.length);
      console.log('length of chunk is: ',chunk.length,' , length of tmpvar22 as \'\' is: ',tmpvar22.length);
      console.log('length of chunk is: ',chunk.length,' , length of tmpvar23 as null is: ',tmpvar23.length);
      console.log('length of chunk is: ',chunk.length,' , length of tmpvar24 as \[\] is: ',tmpvar24.length);
      console.log('length of chunk is: ',chunk.length,' , length of tmpvar25 as \{\} is: ',tmpvar25.length);
      console.log('length of chunk is: ',chunk.length,' , length of tmpvar26 as assigned (not appended) is: ',tmpvar26.length);
      console.log('length of chunk is: ',chunk.length,' , length of tmpvar27 as 0 is: ',tmpvar27.length);

      if(t1) { process.exit(0); }
    }).on('end', function() {
      process.exit(1);
    })

The output I get running node v0.12 is:

    length of chunk is: 65536 , length of tmpvar21 is: 65544
    length of chunk is: 65536 , length of tmpvar22 as '' is: 65535

Since we have not called fsread.setEncoding(), fsread.read() is returning a buffer. Hence this is a string + buffer operation, which node interprets as string + buffer.toString(). The output indicates that toString() on the buffer returns 65535 characters from 65536 bytes. Since the data read in is raw binary, my guess is that it contains bytes that do not map one-to-one to characters when decoded as UTF-8, so the resulting string no longer matches the original bytes.

    length of chunk is: 65536 , length of tmpvar23 as null is: 65539
    length of chunk is: 65536 , length of tmpvar24 as [] is: 65535
    length of chunk is: 65536 , length of tmpvar25 as {} is: 65550
    length of chunk is: 65536 , length of tmpvar26 as assigned (not appended) is: 65536
    length of chunk is: 65536 , length of tmpvar27 as 0 is: 65536

The last line is number + buffer. It looks like both are converted to strings, so the length should be one more than tmpvar22, which it is.

    length of chunk is: 20 , length of tmpvar21 is: 65564
    length of chunk is: 20 , length of tmpvar22 as '' is: 65555
    length of chunk is: 20 , length of tmpvar23 as null is: 65559
    length of chunk is: 20 , length of tmpvar24 as [] is: 65555
    length of chunk is: 20 , length of tmpvar25 as {} is: 65570
    length of chunk is: 20 , length of tmpvar26 as assigned (not appended) is: 20
    length of chunk is: 20 , length of tmpvar27 as 0 is: 65556

The lesson here is: do not mix variables of different types, and if you do, make sure you are getting the result you want!

So how do we deal with raw data if there is no raw data variable type? Node.js uses the Buffer class for this. If you plan to append data to a buffer variable, you need to initialize it with new Buffer(0) (current Node.js releases deprecate that constructor in favor of Buffer.alloc(0)). Also note that using the += operator to append a Buffer containing raw binary data does not work. We need to use Buffer.concat() for this.
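To see why += is lossy for binary data, here is a small sketch that needs no input file. It assumes a current Node.js release (it uses Buffer.from() and Buffer.alloc(), which did not exist in node v0.12), and the four sample bytes are made up for illustration.

```javascript
'use strict';

// Four raw bytes; 0xff and 0xfe are not valid UTF-8 on their own.
var raw = Buffer.from([0xff, 0xfe, 0x00, 0x41]);

// String concatenation implicitly calls raw.toString('utf8').
var asString = '' + raw;
var roundTrip = Buffer.from(asString, 'utf8');

// Buffer.concat() keeps the bytes untouched.
var joined = Buffer.concat([Buffer.alloc(0), raw]);

console.log('round trip through a string preserved the bytes:', raw.equals(roundTrip));
console.log('Buffer.concat preserved the bytes:', raw.equals(joined));
```

The invalid bytes are replaced with the Unicode replacement character during decoding, so converting the string back to a buffer does not give the original bytes.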
Here is sample code that uses Buffer.concat():

    var fs = require('fs');

    var mybuff = new Buffer(0);
    var fsread = fs.createReadStream("file.sample");

    fsread.on('error',function(e){
      console.log('Error reading file: ',e);
    }).on('data', function(chunk) {
      // Append the new chunk without converting it to a string
      mybuff = Buffer.concat([mybuff,chunk]);
    }).on('end', function() {
      process.exit(0); // finished reading
    });

If you have a large amount of raw data that you want to read in and then act on, my suggestion is not to use Buffer.concat() to create one large buffer. Instead, for better performance, push the data into an array and iterate through the array elements at the end. If at all possible, deal with the data on the spot to avoid having to cache it, making your app more dynamic and less dependent on memory resources. Certainly, if you are just reading and writing raw data between streams (filesystem to HTTP, or vice versa), using Node.js stream.pipe() is the way to do it.

    var fs = require('fs');

    var myarray = [];
    var fsread = fs.createReadStream("file.sample");

    fsread.on('error',function(e){
      console.log('Error reading file: ',e);
    }).on('data', function(chunk) {
      // Keep each chunk as its own Buffer
      myarray.push(chunk);
    }).on('end', function() {
      process.exit(0); // finished reading
    });

LineRate logging, JSON and LogStash
Data collection and data mining are big business all over the web. Whether you are trying to monetize the data or simply track data points for statistical analysis, data storage, formatting and display are essential. Most web-facing applications have logging capabilities built in and work pretty well for storing this data on direct- or network-attached storage. But when you have tens, hundreds and even thousands of applications logging data, storing, aggregating, parsing and, ultimately, visualizing this data can get pretty difficult. Some large web properties can generate tens or even hundreds of GB of log data per day.

One way to simplify this process is to log everything to a central point in the network. Think syslog on steroids. Since all your traffic is passing through it anyway, the load balancer is a great place to do this. Not only can you record "normal" HTTP statistics, like request method and URI, but logging from the load balancer also allows you to include virtual server and real server data for each request.

In this article, I'll show you how you can gather various data points for HTTP requests using the LineRate proxy and the embedded Node.js scripting engine. You can then format and ship this data to any number of data/log collection services. In this example, I'm going to send JSON formatted data to logstash - "a tool for managing events and logs". (By default, logstash includes ElasticSearch for its data store and the Kibana web interface for data visualization.) logstash is an open source project and installs easily on Linux. The logstash 10 minute walkthrough should get you started.

We're going to configure logstash to ingest JSON formatted data by listening on a TCP port. On the LineRate side, we'll build a JSON object with the data we're interested in and use a TCP stream to transmit the data. I'll use the TCP input in logstash to ingest the data and then the JSON filter to convert the incoming JSON messages to a logstash event.
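On the sending side, the idea boils down to writing one JSON document per line to a TCP socket. Here is a minimal plain Node.js sketch of that idea, independent of LineRate. It assumes the receiver treats each newline-terminated line as one event, and `toLogLine` and `sendEvent` are hypothetical helper names.

```javascript
'use strict';
var net = require('net');

// Serialize one event as a single line of JSON (newline-delimited JSON).
function toLogLine(event) {
  return JSON.stringify(event) + '\n';
}

// Hypothetical one-shot sender: connect, write one event, close.
// done(err) is called with an Error on failure, or null on success.
function sendEvent(host, port, event, done) {
  var socket = net.connect(port, host, function () {
    socket.end(toLogLine(event));
  });
  socket.on('error', done);
  socket.on('close', function (hadError) {
    if (!hadError) { done(null); }
  });
}

// Example line for a made-up request record:
console.log(toLogLine({ request: { method: 'GET', url: '/test/page' } }));
```

A long-lived connection avoids a TCP handshake per event, which matters at high request rates.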
logstash adds a few fields to the data, but the JSON filter essentially leaves all the original JSON in its original structure, so this filter is perfect if you're already working with JSON. Here's a simple logstash config file for this setup:

    input {
      tcp {
        mode => server
        port => 9999
      }
    }
    filter {
      json {
        source => "message"
      }
      geoip {
        source => "[request][source_ip]"
      }
    }
    output {
      elasticsearch {
        host => localhost
      }
      stdout {
        codec => rubydebug
      }
    }

You might notice that I also added the 'geoip' filter. This is another great built-in feature of logstash. I simply enable the filter and tell logstash which of my input data fields contains the IP address that I want geo data for, and it takes care of the rest. It uses its built-in GeoIP database (via the GeoLiteCity database from MaxMind) to get data about that IP and adds the data to the message. By default, this filter adds a lot of geo data to the message. You might want to trim some of the fields if it's more than you need. (For example, you might not want latitude and longitude.)

Here's a sample screenshot of logstash/kibana with data logged from a LineRate proxy.

Here's the Node.js script for LineRate.

    'use strict';

    var vsm = require('lrs/virtualServerModule');
    var net = require('net');
    var os = require('os');
    var cm = require('connman');

    // change these variables to match your env
    var virtual_server = 'vs_http';
    var logging_server = { 'host': '10.190.5.134', 'port': 9999 };

    var hostname = os.hostname();
    var pid = process.pid;
    var sock;

    function onData() {
      // noop
    }

    function onReset() {
      // noop
    }

    function onReconnect() {
      // noop
    }

    function onConnect(socket) {
      console.log("Socket connected, pid = ", process.pid);
      sock = socket;
    }

    function log_message(message) {
      try {
        sock.write(JSON.stringify(message) + '\r\n');
      } catch (ex) {
        // if you want an exception logged
        // when a write failure occurs, uncomment
        // the next line. Note this could have
        // very negative effects if you are logging
        // at a high request rate
        //console.log(ex);
      }
    }

    cm.connect(logging_server.port, logging_server.host, onConnect, onReset, onData);

    var processRequest = function(servReq, servResp, cliReq) {
      // object to store stats
      var message = {};

      // record request start time
      var time_start = process.hrtime();

      message['request'] = {
        'method': servReq.method,
        'url': servReq.url,
        'version': servReq.httpVersion,
        'source_ip': servReq.connection.remoteAddress,
        'source_port': servReq.connection.remotePort,
        'headers': servReq.headers
      };

      message['vip'] = {
        'virtual_ip': servReq.connection.address().address,
        'virtual_port': servReq.connection.address().port,
        'virtual_family': servReq.connection.address().family
      };

      message['lb'] = {
        'virtual_server': virtual_server,
        'hostname': hostname,
        'process_id': pid
      };

      // The real server's response arrives on cliReq, as in the other
      // LineRate scripts on this page
      cliReq.on('response', function onResp(cliResp) {
        message['real'] = {
          'ip': cliResp.connection.remoteAddress,
          'port': cliResp.connection.remotePort
        };

        cliResp.bindHeaders(servResp);
        cliResp.pipe(servResp);

        // get total request time
        var time_diff = process.hrtime(time_start);
        //console.log('setting response time: ' + time_diff[1]);

        message['response'] = {
          'headers': cliResp.headers,
          'status_code': cliResp.statusCode,
          'version': cliResp.httpVersion,
          'time_s': time_diff[0] + (time_diff[1] / 1e9)
        };

        log_message(message);
      });

      /* continue with request processing, send the request to real-server */
      cliReq();
    };

    var createCallback = function(virtualServerObject) {
      console.log('Logging script installed on Virtual Server: ' + virtualServerObject.id);
      virtualServerObject.on('request',processRequest);
    };

    vsm.on('exist', virtual_server, createCallback);

Finally, here's a sample message in raw JSON format retrieved from the elasticsearch datastore:

    {
      "_index": "logstash-2014.11.03",
      "_type": "logs",
      "_id": "PZTeM0KWRfiju9qUifmDhQ",
      "_score": null,
      "_source": {
        "message": "(redacted)",
        "@version": "1",
        "@timestamp": "2014-11-03T22:32:22.930Z",
        "host": "192.168.88.155:59625",
        "geoip": {
          "area_code": 614,
          "city_name": "Columbus",
          "continent_code": "NA",
          "country_code2": "US",
          "country_code3": "USA",
          "country_name": "United States",
          "dma_code": 535,
          "ip": "172.16.87.1",
          "latitude": 39.96119999999999,
          "location": [ -82.9988, 39.96119999999999 ],
          "longitude": -82.9988,
          "postal_code": "43218",
          "real_region_name": "Ohio",
          "region_name": "OH",
          "timezone": "America/New_York"
        },
        "request": {
          "method": "GET",
          "url": "/test/page",
          "version": "1.1",
          "source_ip": "172.16.87.1",
          "source_port": 59721,
          "headers": {
            "User-Agent": "curl/7.30.0",
            "Connection": "close",
            "Accept": "*/*",
            "Host": "lrosvm",
            "X-Fake-Header": "fake-header-value"
          }
        },
        "vip": {
          "virtual_ip": "172.16.87.153",
          "virtual_port": 80,
          "virtual_family": "IPv4"
        },
        "lb": {
          "virtual_server": "vs_http",
          "hostname": "lros01",
          "process_id": 8177
        },
        "real": {
          "ip": "192.168.233.1",
          "port": 15000
        },
        "response": {
          "headers": {
            "Content-Length": "36",
            "Date": "Mon, 03 Nov 2014 22:32:22 GMT",
            "Content-Type": "text/plain"
          },
          "status_code": 200,
          "version": "1.1",
          "time_s": 0.00395572
        },
        "_source": {}
      },
      "sort": [ 1415053942930, 1415053942930 ]
    }

Please leave a comment or reach out to us with any questions or suggestions, and if you're not a LineRate user yet, remember you can try it out for free.

LineRate: Range header attack mitigation
Using the LineRate Node.js engine to mitigate HTTP Range header attacks on backend systems

The latest details are emerging about a Range header vulnerability in Microsoft IIS (see MS15-034 and CVE-2015-1635). There have been other previous exploits in the byte range header, as well. F5 has several products available that can protect your backend servers from these exploits, including BIG-IP iRules, ASM (see Mitigating Remote Code Execution in "HTTP.sys" (CVE-2015-1635)), and LineRate.

The LineRate Node.js script below will protect your backend servers by preventing requests with malformed and/or malicious range headers from ever reaching them in the first place. It will check the Range header and return a 416 status code ("Requested Range Not Satisfiable") in any of the following situations:

- A malformed header value
- Too many ranges requested (configurable value, default to 10)
- Range value too large (configurable value, default to 1GB)

(Some might feel that using status code 416 is a little too liberal in this scenario. If that's you, feel free to swap 416 with 403, 400, 404 or whatever else suits you.)

The Script

If you're not already familiar with Node.js and the LineRate scripting engine, be sure to check out the LineRate Scripting Developer's Guide.
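The three checks listed above can be sketched in plain JavaScript. This is an illustrative sketch only, not the LineRate script from the post. `checkRangeHeader`, `DEFAULTS` and the option names are hypothetical, and the limits simply mirror the defaults stated above.

```javascript
'use strict';

// Hypothetical defaults mirroring the limits described in the text.
var DEFAULTS = { maxRanges: 10, maxRangeValue: 1024 * 1024 * 1024 };

// Returns null when the Range header value is acceptable,
// otherwise a short reason string suitable for a 416 response body.
function checkRangeHeader(value, opts) {
  opts = opts || DEFAULTS;
  var m = /^bytes=(.+)$/.exec(String(value).trim());
  if (!m) { return 'Malformed header or invalid range'; }

  var ranges = m[1].split(',');
  if (ranges.length > opts.maxRanges) { return 'Too many ranges'; }

  for (var i = 0; i < ranges.length; i++) {
    // Accept the "first-last", "first-" and "-suffix" forms.
    var r = /^\s*(\d*)-(\d*)\s*$/.exec(ranges[i]);
    if (!r || (r[1] === '' && r[2] === '')) {
      return 'Malformed header or invalid range';
    }
    var first = r[1] === '' ? null : Number(r[1]);
    var last = r[2] === '' ? null : Number(r[2]);
    if (first !== null && last !== null && first > last) {
      return 'Malformed header or invalid range';
    }
    if ((first !== null && first > opts.maxRangeValue) ||
        (last !== null && last > opts.maxRangeValue)) {
      return 'Range value exceeds allowed maximum';
    }
  }
  return null;
}

console.log(checkRangeHeader('bytes=1-100'));
console.log(checkRangeHeader('malformed'));
```

In a proxy, a function like this would be called only when the request actually carries a Range header, and a non-null result would be sent back with the 416 status instead of forwarding the request.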
Testing

Malformed header:

    > curl -w " (%{http_code})\n" -s -H 'Range: malformed' http://172.16.87.157
    Malformed header or invalid range (416)

Too many ranges:

    > curl -w " (%{http_code})\n" -s -H 'Range: bytes=1-2,2-3,3-4,4-5,5-6,6-7,7-8,9-10' http://172.16.87.157
    Too many ranges (416)

Range value too large:

    > curl -w " (%{http_code})\n" -s -H 'Range: bytes=1-10000000000' http://172.16.87.157
    Range value exceeds allowed maximum (416)

Safe request:

    > curl -w "%{http_code}\n" -s -H 'Range: bytes=1-100' http://172.16.87.157
    200

Please leave a comment or reach out to us with any questions or suggestions, and if you're not a LineRate user yet, remember you can try it out for free.

LineRate: Excessive HTTP 404 Throttling
Fusker thwarting using the LineRate Node.js datapath scripting engine

Fuskering is so fun to say, I couldn't resist writing an article about it. But, aside from just raising eyebrows when you use the term, fuskering is a real problem for some site maintainers. And having been in this position myself, I can verify that it's a difficult problem to solve. A flexible, programmable data path, like the LineRate load balancer, makes light work of solving these kinds of problems.

Background

So, what exactly is fuskering? Simply stated, fuskering is requesting successive URL paths using a known pattern. For example, if you knew example.com had images stored at http://example.com/img01.jpg and http://example.com/img02.jpg, you might venture to guess that there is also an image at http://example.com/img03.jpg. And if you found that img03.jpg was there, you might as well try img04.jpg. Utilities like curl make automating this process extremely easy.

Photo sites are a typical target for fuskering because image filenames are usually pretty predictable. Think about a URL like http://example.com/shard1/user/jane/springbreak14/DSCN5029.jpg and you start to see where this could be a problem.

Not only is this a potential privacy concern, but it's also a huge burden on the datacenter assets serving those files. In some multi-tier architectures, serving a 404 is actually more burdensome than serving an asset that exists. When you request something that doesn't exist, it's possible that all of the following could happen:

- Cache miss on CDN, CDN requests from origin
- Front end load balancer receives request, makes balancing decision, forwards request
- Web tier receives request, processes and sends to caching tier
- Caching tier receives request and consults memory cache and then disk cache. Another cache miss
- Services API tier receives request for URL to file system mapping

At this point, either your well written object storage API will signal to you that the file really doesn't exist, or you're not using object storage and you have to actually make a request to the local disk or NAS. In either case, that's a lot of work just to find out that something doesn't exist. Assets that do exist end up in one of the caching tiers and are found and served much earlier in this process, typically right from the CDN, and the request never even touches your infrastructure.

If your site is handling a lot of requests and they are spread across many servers - and possibly many data centers - correlating all the data to try and mitigate this problem can be tedious and time consuming. You have to log each request, aggregate it somewhere, perform analytics and then take action. All the while, your infrastructure is suffering.

Options and limitations

Requiring authentication and filename scrambling are two ways to reduce the likelihood that your site will attract fuskers in the first place. Of course, these methods do not actually make fuskering impossible, but by requiring the user to enter identifying information or by making the filenames extremely difficult to guess, the potential consequences and level of effort become too great, and the user will likely move on.

The technique detailed in this article is just one of many ways to combat fuskers. Some other possible solutions are user-agent string checking, using CAPTCHA, using services like CloudFlare, traffic scrubbing facilities, etc. None of these is a silver bullet (or cheap, in some cases), but you could evaluate them all and figure out what works best for your environment and your wallet.

There are also ways for a determined user to subvert a lot of these protective measures: Tor, X-Forwarded-For spoofing, using multiple source IPs, and adaptive scripts to minimize the effect of the block time window (i.e.
send max_404 requests, wait time_window, repeat), etc.

[One] Solution

Having a programmable data path makes solving (or at least mitigating) this issue easy. We can track each HTTP session, analyze the request and response, and have all the details we need to detect excessive 404's and throttle them. I provide a complete script to do this at the end; I'll describe the solution and how the script works next.

This article uses fuskering as motivation for this solution, but realize that excessive 404's could come from a variety of sources, such as: people that automate data collection from your site (and forget to update request paths when they change), a misbehaving application, resources that moved without a proper 301/302 redirect, or a true 404 DoS attack (which is meant to exploit all the things I mention in the Background section), just to name a few.

Script overview

We're going to use the local LineRate Redis instance for tracking the 404 info. We're storing a key-value pair, where the key is the client's IP address and the value is the number of 404 responses that client has received in a configurable time window. This "time window" is handled by setting an expiration on the key-value pair and then extending it if necessary. If no 404's are detected during the grace period, the entry expires and the client is not subject to any request throttling.

When a new request is received, the client's IP is determined (see next section on Source IP) and checked against the Redis database. If a db entry is found and the corresponding value exceeds the allowed number of 404's, we intercept the request and respond directly from the load balancer with an HTTP 403.

On the response side, when we detect a 404 is being returned to a client, we increment the counter for the client IP in the Redis db. If the client's IP doesn't exist, we add it and init the value to '1'. In either case, the time window is also set.
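The counting logic described above can be illustrated without Redis. The sketch below is a hypothetical in-memory stand-in (`create404Tracker` is not part of the original script), with the clock injected so the expiry behavior is easy to follow. It only illustrates the bookkeeping, since per-process state like this would not be shared across LineRate's multiple Node.js instances.

```javascript
'use strict';

// Hypothetical in-memory stand-in for the Redis counters described above.
// config uses the same field names as the article: time_window (seconds)
// and max_404 (allowed 404's per window). now() returns milliseconds.
function create404Tracker(config, now) {
  now = now || Date.now;
  var entries = new Map(); // client IP -> { count, expiresAt }

  // Return the entry for ip, dropping it first if its window has expired.
  function live(ip) {
    var e = entries.get(ip);
    if (e && e.expiresAt <= now()) {
      entries.delete(ip);
      e = undefined;
    }
    return e;
  }

  return {
    // Called for each new request: true means answer with a 403.
    shouldThrottle: function (ip) {
      var e = live(ip);
      return !!e && e.count > config.max_404;
    },
    // Called when a 404 is being returned to the client.
    record404: function (ip) {
      var e = live(ip) || { count: 0, expiresAt: 0 };
      e.count += 1;
      e.expiresAt = now() + config.time_window * 1000; // (re)start the window
      entries.set(ip, e);
      return e.count;
    }
  };
}

// Usage with a fake clock:
var clock = 0;
var tracker = create404Tracker({ time_window: 10, max_404: 2 }, function () { return clock; });
tracker.record404('203.0.113.7');
tracker.record404('203.0.113.7');
tracker.record404('203.0.113.7');
console.log(tracker.shouldThrottle('203.0.113.7')); // third 404 exceeds max_404
clock = 10000; // the window has elapsed with no further 404's
console.log(tracker.shouldThrottle('203.0.113.7'));
```

Extending the expiry on every 404 means a client that keeps generating 404's stays blocked until it has been quiet for a full window.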
The time window and the maximum number of 404's are configurable via the config object.

Source IP

To accurately analyze the 404's, you need to know the client's true source IP. Remember that any connection coming through a proxy is not going to have the actual client's source IP. Enter the X-Forwarded-For (XFF) header. If present, the XFF header will contain an ordered, comma-separated list of IP addresses. Each IP identifies another proxy, load balancer or forwarding device that the request passed through before it got to you. IPs are appended to this list, so the first IP is that of the actual client. In our script logic, we can check for the XFF header and, if it's present, use the first IP in this list as the client IP. In the absence of an XFF header, we'll simply use the 'remoteAddress' of the connection object.

redis

There are a couple of important things to point out with regard to using the included Redis server. First, the LineRate load balancer runs multiple instances of the Node.js engine and variables are unique to each instance. If you were to store 404 tracking info in local variables, you might get results that you don't expect. See here for more info. Second, using redis lends itself especially well to this example because you can run this script on all the virtual-servers for your site and get instant aggregated analysis and action.

The Script

If you're not already familiar with Node.js and the LineRate scripting engine, be sure to check out the LineRate Scripting Developer's Guide.

requires and config

Load the required modules and initialize the config object. You might want to tune config.time_window and config.max_404 to your environment. Continue reading to gain a better understanding of the implications of changing these values.

    var vsm = require('lrs/virtualServerModule');
    var async = require('async');
    var redis = require('redis').createClient();

    // Change config as needed.
    var config = {
      vs: 'vs_http',     // name of virtual-server
      time_window: 10,   // window in seconds
      max_404: 10        // max 404's per time window
    };

redis

Pretty basic stuff here, but note that we're loading the module and creating a client object all in one line.

    var redis = require('redis').createClient();

    redis.on('error', function (err) {
      console.log('Error: ' + err);
    });

    redis.on('ready', function () {
      console.log('Connected to redis');
    });

onRequest() - async waterfall

The async module is used to provide some structure to the code and to ensure that things happen in the proper order. When we receive a new request from a client, we get the client's IP, check it against the database and then handle the rest of the request/response process. Each of the functions is detailed next.

    function onRequest(servReq, servResp, cliReq) {
      async.waterfall([
        function (callback) {
          get_client_ip(servReq, callback);
        },
        function (client_ip, callback) {
          check_client(client_ip, callback);
        },
        function (throttle, client_ip, callback) {
          doRequest(servResp, cliReq, throttle, client_ip, callback);
        }
      ], function (err, result) {
        if (err) {
          throw new Error(err); // blow up
        }
      });
    }

get_client_ip()

Check for the presence of the XFF header. If present, get the client IP from the header value. If not, use remoteAddress from the servReq connection object.

    function get_client_ip(servReq, callback) {
      var client_ip;
      // check xff header for client ip first
      if ('x-forwarded-for' in servReq.headers) {
        // first entry in the comma-separated list is the original client
        client_ip = servReq.headers['x-forwarded-for'].split(',').shift().trim();
      } else {
        client_ip = servReq.connection.remoteAddress;
      }
      return callback(null, client_ip);
    }

check_client()

check_client() is where we determine whether to block the request. If the client's IP is in redis and the corresponding value has reached config.max_404, we set throttle to true. Otherwise, throttle remains false. throttle is used in the next function to either allow or block the request.
    function check_client(client_ip, callback) {
      var throttle = false;
      redis.get(client_ip, function (err, reply) {
        if (err) {
          return callback(err);
        }
        // reply is a string counter, or null if the key does not exist
        if (parseInt(reply, 10) >= config.max_404) {
          throttle = true;
        }
        return callback(null, throttle, client_ip);
      });
    }

doRequest()

In doRequest(), the first thing we do is check to see if throttle is true. If it is, we simply return a 403 and close the connection. If you wanted more aggressive throttling, you could also update the expiration time of the redis key associated with this client's IP here.

If there is no throttle, we register a listener for the 'response' event on cliReq and send the request on to the real-server. When we receive the response, we check the status code. If it's a 404, we increment the redis 404 counter. Any client that receives config.max_404 or more 404's within a rolling window of config.time_window seconds will start to get blocked. Once the time window passes, the requests will be allowed again.

    function doRequest(servResp, cliReq, throttle, client_ip, callback) {
      if (throttle) {
        servResp.writeHead(403);
        servResp.end('404 throttle. Your IP has been recorded.\n');
        // note you could choose to bump the redis key timeout here
        // and effectively lock out the user completely (even for good requests)
        // until they stop ALL requests for 'time_window'
        return callback(null, 'done');
      } else {
        cliReq.on('response', function (cliResp) {
          var status_code = cliResp.statusCode;
          if (status_code === 404) {
            // increment the counter and restart the time window atomically
            redis.multi()
              .incr(client_ip)
              .expire(client_ip, config.time_window)
              .exec(function (err, replies) {
                if (err) {
                  // the waterfall callback has already been called below,
                  // so log the error here instead of calling it a second time
                  console.log('Error updating 404 counter: ' + err);
                }
              });
          }
          // Fastpipe response
          cliResp.bindHeaders(servResp);
          cliResp.fastPipe(servResp);
        });
        cliReq();
        return callback(null, 'done');
      }
    }

Testing

This bash one-liner will use curl to send 15 consecutive requests for an image that doesn't exist and results in a 404 response. Note the change from '404' to '403' from request #10 to request #11. This is the throttling in action.
    > for i in $(seq -w 1 15);do echo -n "${i}: `date` :: "; curl -w "%{http_code}\n" -o /dev/null -s http://example.com/noexist.jpg;done
    01: Fri Feb 13 09:59:44 MST 2015 :: 404
    02: Fri Feb 13 09:59:44 MST 2015 :: 404
    03: Fri Feb 13 09:59:44 MST 2015 :: 404
    04: Fri Feb 13 09:59:44 MST 2015 :: 404
    05: Fri Feb 13 09:59:44 MST 2015 :: 404
    06: Fri Feb 13 09:59:44 MST 2015 :: 404
    07: Fri Feb 13 09:59:44 MST 2015 :: 404
    08: Fri Feb 13 09:59:44 MST 2015 :: 404
    09: Fri Feb 13 09:59:44 MST 2015 :: 404
    10: Fri Feb 13 09:59:44 MST 2015 :: 404
    11: Fri Feb 13 09:59:44 MST 2015 :: 403
    12: Fri Feb 13 09:59:44 MST 2015 :: 403
    13: Fri Feb 13 09:59:44 MST 2015 :: 403
    14: Fri Feb 13 09:59:44 MST 2015 :: 403
    15: Fri Feb 13 09:59:44 MST 2015 :: 403

Pulling it all together, here's the full script. Happy cloning!

Please leave a comment or reach out to us with any questions or suggestions and if you're not a LineRate user yet, remember you can try it out for free.

LineRate: HTTP session ID persistence in scripting using memcache
Using the LineRate Node.js datapath scripting engine to achieve session-based client/server affinity

LineRate introduced the selectServer() method and the newServerSelected event extensions to the built-in Node.js http module in version 2.4. This opens up a lot of possibilities for dynamically selecting real-servers based on HTTP session information, such as HTTP headers (cookies, user agents), or on other metrics such as server load, time of day, geolocation, or pretty much anything else you can think of. One use-case is particularly useful: obtaining server affinity based on a session ID such as PHPSESSID (PHP), JSESSIONID (Tomcat), ASP.NET_SessionId (ASP), and connect.sid (Express/Connect). Read on to see how.

A quick aside: LineRate added cookie-based real-server affinity way back in version 1.6. This is a powerful feature in its own right and might get you what you want with a minimal amount of effort - I urge you to check it out. But, if you're averse to injecting a new cookie or you just need more fine-grained control of the affinity algorithm, read on.

I'll show you a script and walk through some of the highlights to accomplish session ID persistence via scripting below. For bonus points, I'm going to use memcache to store a key-value pair, which will consist of the session ID (key) and the real-server name (value). (See here for why you don't want to just store this data in a javascript data structure.) A Redis server is already pre-installed and ready to go on your LineRate instance, and it works great, but I already wrote an example of using Redis in another article, so I'll demo something different here. (This is in contrast to memcache, where you'll need a separate memcache server/cluster running somewhere on your network.)

The selectServer() method extends the http.clientRequest class and allows you to specify the real-server to receive the request. The newServerSelected event fires whenever the system forwards the request to a real-server that you did not specify. This is particularly useful (and critical for this example) when you don't care which server handles the session, just that the same server continues to handle that session.

Keep reading for a discussion on the various pieces of the script; see the very bottom for the full script. If you're not already familiar with Node.js and the LineRate scripting engine, be sure to check out the LineRate Scripting Developer's Guide.

requires and config

Load the required modules and update the config object as needed.

    var vsm = require("lrs/virtualServerModule");
    var cookie = require("cookie");
    var async = require("async");
    var memcache = require('memcache');

    // Change config as needed.
    var config = {
      session_id_key: "connect.sid",   // session-id key being used
      vs: "vs_http",                   // name of virtual-server
      memcache_host: '172.16.87.154'
    };

memcache

memcache is a "high-performance, distributed memory object caching system". It's a really good option for an in-memory (read: fast), distributed, key-value store. Since storing keys and values is exactly what we need to do for this script, we'll take it for a test drive.

Here's what the memcache code is doing: we load the module, create a new client object, define some event listeners and then connect to the memcache server. We'll also use the get() and set() methods later. This memcache code could be re-used in any scenario where a caching server is needed. If the client emits a 'close' or 'error' event, we wait one second before trying to reconnect using setTimeout().
    var memcache = require('memcache');
    var memcache_client = new memcache.Client(11211, config.memcache_host);

    memcache_client.on('connect', function () {
      console.log('Connected to memcache server at ' + config.memcache_host + ':11211');
    });

    memcache_client.on('timeout', function () {
      console.log('Memcache connection timed out; reconnecting...');
      memcache_client.connect();
    });

    memcache_client.on('close', function () {
      console.log('Memcache connection closed; reconnecting (waiting 1s)...');
      // wait 1s before re-connecting
      setTimeout(function () {
        memcache_client.connect();
      }, 1000);
    });

    memcache_client.on('error', function (e) {
      console.log('Memcache connection error; reconnecting...');
      console.log(e);
      // wait 1s before re-connecting
      setTimeout(function () {
        memcache_client.connect();
      }, 1000);
    });

    memcache_client.connect();

async waterfall

When we receive a new request from a client, a few things need to happen serially, and each of those sub-processes relies on data from the previous one. This is a classic use-case for the 'waterfall' control flow from the async module. The waterfall flow will run a series of functions and pass the results of each function to the next function. In this case, we're doing three primary things: getting the session ID from the request, selecting the real-server based on the session ID and then making the request. Each of these three functions is detailed next.

    function onRequest(servReq, servResp, cliReq) {
      async.waterfall([
        function (callback) {
          getSessionIdCookie(servReq, callback);
        },
        function (sessionId, callback) {
          selectServer(sessionId, cliReq, callback);
        },
        function (cachedServerName, callback) {
          doRequest(servReq, servResp, cliReq, cachedServerName, callback);
        }
      ], function (err, result) {
        if (err) {
          throw new Error(err);
        }
      });
    }

getSessionIdCookie()

Just like the name of the function sounds, here we're getting the session ID from the cookie header in the request. The sessionId variable is initialized to undefined. If the request doesn't contain a session ID, the sessionId variable will remain undefined; otherwise sessionId gets set to, you guessed it, the session ID.

    function getSessionIdCookie(servReq, callback) {
      var sessionId;
      // check for existence of session-id cookie
      if (servReq.headers.cookie) {
        var cookies = cookie.parse(servReq.headers.cookie);
        sessionId = cookies[config.session_id_key];
      }
      return callback(null, sessionId);
    }

selectServer()

Here's where things start to get a little more interesting; this is where we dynamically select a real-server to which to send the request. If the original request did not contain a session ID in the cookie header, the request will just be processed 'normally' - the LineRate system will pick a real-server based on the configured load balancing algorithm for the virtual server.

If the request does contain a session ID cookie, we look up the session ID in memcache. If memcache already has an entry with this session ID, the request is part of a previous session and we use selectServer() to send the request to the same server to which previous requests were sent. The unassuming cliReq.selectServer(serverName); line is the key to this whole script and is what gives us real-server affinity using the session ID.

If the selected server is different from what the system would have chosen using the configured load balancing algorithm, this will cause the 'newServerSelected' event to fire (see next section).

    function selectServer(sessionId, cliReq, callback) {
      if (!sessionId) {
        // no session-id cookie, proceed to next step
        return callback(null, null);
      }
      // lookup session id in memcache
      memcache_client.get(sessionId, function (err, result) {
        if (err) {
          console.log(err);
          return callback(err);
        } else {
          var serverName = result;
          if (serverName) {
            cliReq.selectServer(serverName);
          }
          return callback(null, serverName);
        }
      });
    }

doRequest()

There's some interesting stuff happening in this function.
We're piping the original request to the real-server we found in the selectServer() function. We're also listening for the real-server's response (cliResp). Why? This is where we snoop for a 'set-cookie' header signalling to our script that this is the first response in a new session. This set-cookie contains the session ID for the session. We record this session ID and real-server name pair in memcache.

The newServerSelected event is guaranteed to be emitted before the response event for cliReq. This ensures that we have all the data we need when the memcache set() is called in the response event handler. Also note that the "selected server" will be null for the first request of any new session. This is expected and allowed. The request will default to choosing a real-server based on the configured load balancing algorithm.

Lastly, the astute reader will note that we're using the expiration time of the session cookie to configure the expiration time of the memcache entry. This ensures that the session info stays in memcache for as long as it's needed and is cleaned up once it has expired.

    function doRequest(servReq, servResp, cliReq, cachedServerName, callback) {
      var selectedServerName;

      // Register "newServerSelected" handler to save the real-server name
      // which was used to send out the request.
      cliReq.on("newServerSelected", function (newServerName) {
        selectedServerName = newServerName;
      });

      // Register response handler to optionally update cache if a new server
      // selection was made.
      cliReq.on("response", function (cliResp) {
        if (!selectedServerName) {
          selectedServerName = cachedServerName;
        }
        // retrieve session-id cookie from "set-cookie" header in response
        var set_cookie_header = cliResp.headers["set-cookie"];
        if (set_cookie_header) {
          // Note: the "set-cookie" header is always an array containing the
          // set-cookie string as the first element.
          var set_cookie_object = cookie.parse(set_cookie_header[0]);
          var sessionId = set_cookie_object[config.session_id_key];
          // calculate memcache exptime (seconds since the epoch) from the
          // session expiration time
          var expiration = (new Date(set_cookie_object['Expires']).getTime()) / 1000;
          if (sessionId) {
            // set the memcache entry expiration to the expiration time
            // of the set-cookie
            memcache_client.set(sessionId, selectedServerName, function (err, result) {
              // error handling here; depends on your env
            }, expiration);
          }
        }
        // Fastpipe response
        cliResp.bindHeaders(servResp);
        cliResp.fastPipe(servResp);
      });

      // Fastpipe request
      servReq.bindHeaders(cliReq);
      servReq.fastPipe(cliReq);
      return callback(null);
    }

Testing

I have a pair of test real-servers running Express with the session module, called 'srv1' and 'srv2'. Each Express server responds with a JSON object with some information about the session.

Here's a simple series of commands demonstrating what happens when the script is running. Note the 'real-server' is the same for every request and the session ID is the same in the 2nd and 3rd request and in the cookie.jar file. (I truncated the session ID for readability.)

    > curl -c cookie.jar http://172.16.87.157
    {"real-server":"srv1","name":"connect.sid","session_id":"HQ1xyGhmrAUjZeJCG15JO019Pkb8K4NJ","expires":300000}
    > curl -b cookie.jar http://172.16.87.157
    {"real-server":"srv1","name":"connect.sid","session_id":"HQ1xyGhmrAUjZeJCG15JO019Pkb8K4NJ","expires":300000}
    > curl -b cookie.jar http://172.16.87.157
    {"real-server":"srv1","name":"connect.sid","session_id":"HQ1xyGhmrAUjZeJCG15JO019Pkb8K4NJ","expires":300000}

Here's a contrasting series of commands demonstrating what happens when the script is not running. Note the round-robin'ing between real-servers 'srv1' and 'srv2'.
    > curl -c cookie.jar http://172.16.87.157
    {"real-server":"srv2","name":"connect.sid","session_id":"OlxEv3KPZHqZ0J0Nby0FVaEO_S8x061D","expires":300000}
    > curl -b cookie.jar http://172.16.87.157
    {"real-server":"srv1","name":"connect.sid","session_id":"IJ2x2jxnEt5NP6y3pO_7AZalX2sHBfOu","expires":300000}
    > curl -b cookie.jar http://172.16.87.157
    {"real-server":"srv2","name":"connect.sid","session_id":"OlxEv3KPZHqZ0J0Nby0FVaEO_S8x061D","expires":300000}
    > curl -b cookie.jar http://172.16.87.157
    {"real-server":"srv1","name":"connect.sid","session_id":"SuIOpe7qSm9hj0bcimAAIqOturyir0-l","expires":300000}
    > curl -b cookie.jar http://172.16.87.157
    {"real-server":"srv2","name":"connect.sid","session_id":"OlxEv3KPZHqZ0J0Nby0FVaEO_S8x061D","expires":300000}

And, pulling it all together, here's the full script. Happy cloning!

Please leave a comment or reach out to us with any questions or suggestions and if you're not a LineRate user yet, remember you can try it out for free.

Introducing LineRate Lightning series (and Snippet #1 - HTTP referer blocking)
We're big fans here of the iRules 20 Lines or Less series for F5's BIG-IP. LineRate proxy uses a Node.js scripting engine embedded into the HTTP data path, so we can't directly use iRules scripts. However, the power and flexibility of the Node.js engine allows us to do some pretty cool things - and a lot of these things don't require much code and can be implemented quickly.

The LineRate Lightning series will contain snippets of code that aim to be quick, powerful and maybe a little bit flashy - kind of like real lightning! If you want to know more about scripting, be sure to check out the official documentation and the LineRate Scripting Guide. And remember that the F5 LineRate staff monitors DevCentral closely, so this is a great place to get all your questions answered.

If you're already a LineRate user, you have a good handle on what we can do with scripting. If you're not, this will be a great way for us to show you. Sit back and hold on while we ride the lightning.

This first snippet is a simple one that does HTTP referrer blocking based on a whitelist of permitted referrers. Simply add the referring domains that you'd like to permit in the domain_whitelist list and change vs_http to match the name of your virtual server.

Here's the script:

A LineRate script with lengthy initialization process
In some cases, your LineRate script needs to gather necessary information before processing any incoming HTTP request. It could be user profiles such as customer categories (e.g., paid vs. free, or test vs. live), a GeoIP database, a list of sites that should be blocked, or data translation tables for converting the contents of specific headers. For this, typically, the script needs to access external database or file servers, which may take some time. For this type of synchronous (or sequential) processing, we normally employ the Async NPM module: e.g.,

    async.series([initialization, process]);

The async.series() method ensures that the process function (2nd element in the array) kicks off after completion of the initialization function (1st element).

Well, what happens when HTTP requests hit the LineRate while the script is still initializing? The LineRate will pass the requests through to backend servers without any processing. This means that, in an A/B Web traffic steering scenario where test and live customers are directed to the test and production servers respectively, for example, both kinds of customers are forwarded to the same server (where the requests go depends on how you configure LineRate). This might be acceptable behavior in some implementations, but in some cases, you might want to block access completely until the system becomes ready with full information.

This can be achieved by preparing a function that starts up quickly and runs only during the initialization process. Using the async.series() method, it can be coded as below:

    async.series([process_pre, initialization, process_post]);

The process_pre function registers a listener function to the request event. The listener is active once process_pre kicks in and remains active during initialization. Once the initialization completes, you de-register the existing listener for the request event before you register the new listener for the post-initialization processing.
De-registration is necessary because you can only register one listener to virtualServer's request event. This can be done by the Event.removeListener() method as below.

    VirtualServerModule.removeListener('request', process_pre);

In the sample Node.js code below, the initialization function (initialize) does nothing but sleep for 10s. While initializing, the 1st listener (process_pre) returns the 503 "Service Unavailable" error message to clients. It also returns the Retry-After header to indicate the maximum time for the service to become fully available. You can replace it with your initialization code (e.g., database population or remote file access). As this is just a sample, the post-initialization function (process_post) does not do much: it just inserts a proprietary header for debugging purposes.

    'use strict';
    var vsm = require('lrs/virtualServerModule');
    var async = require('async');

    var vso;
    var sleep = 10; // sleep time (s)
    var message = 'Service Unavailable\nTemporarily down for maintenance';

    // Script initialization required to do any request processing
    // - assuming it takes a long time to complete.
    var initialize = function (callback) {
      setTimeout(function () {
        callback(null, 'initialization completed');
      }, sleep * 1000);
    };

    // Request handling before initialization
    // Send back 503 "Service Unavailable"
    var process_pre = function (servReq, servResp, cliReq) {
      servResp.writeHead(503, {
        'Content-Type': 'text/plain',
        'Content-Length': message.length,
        'Retry-After': sleep
      });
      servResp.end(message);
    };
    var register_pre = function (callback) {
      vso.on('request', process_pre);
      callback(null, 'pre-initialization processing');
    };

    // Request handling (after initialization)
    var process_post = function (servReq, servResp, cliReq) {
      // do whatever you need to do to the requests with the info obtained
      // through the initialization process
      servReq.addHeader('X-LR', 'LineRate is ready.');
      cliReq();
    };
    var register_post = function (callback) {
      // only one 'request' listener is allowed, so remove the old one first
      vso.removeListener('request', process_pre);
      vso.on('request', process_post);
      callback(null, 'post-initialization processing');
    };

    vsm.on('exist', 'vs40', function (v) {
      vso = v;
      async.series([register_pre, initialize, register_post], function (e, stat) {
        if (e) console.error(e);
        else console.log(stat);
      });
    });

    if (!vso) {
      console.log('vs40 not found. Script is not running properly.');
    }

LineRate may spawn multiple HTTP processing engines (called lb_http) depending on the number of vCPUs (cores/hyper-threads). In the code above, all the engines perform the same initialization process. However, if the result from an initialization can be shared amongst the engines, you might want to run it just once on a designated process for all the processes: e.g., data population onto a shared database. While you can achieve this by running the initialization only on a master process using the Process.isMaster() method (a LineRate extension), you now need to come up with a mechanism to share information amongst the processes.
You can find data sharing methods in "A Variable Is Unset That I Know Set" in our Product Documentation or in the "LineRate and Redis pub/sub" DevCentral article.

Please leave a comment or reach out to us with any questions or suggestions and if you're not a LineRate user yet, remember you can try it out for free.

Enforcing CORS With LineRate
As the web became more popular, web applications became more complex. When scripting functionality within the web browser was first conceived, the security model assumed certain things about what "secure" interaction between the browser and the web server looked like; this led to the "Same Origin Policy" (SOP). The SOP allowed browsers to decide when a script within a browser was trying to do something the policy did not allow, namely using browser scripting in the context of one page (aka the "origin") to interact with resources that the browser perceived as being from a different "origin". The goal is to prevent exploitation of "Cross Site Scripting" (XSS) vulnerabilities, allowing a browser to detect when script code is doing something that violates the SOP, such as making XMLHttpRequest() calls within a page originating from origin "A" to a resource originating from origin "B".

As the web continued to grow in popularity, web content became more abundant, "mashups" began to appear, and some web sites even derived 100% of their content from other sites. In order to get around SOP restrictions, mashup sites could fetch content on their own servers and serve the mashup content to the clients from one single origin they controlled. This method put additional bandwidth and storage demands on the mashup site operators, and the same limitations were faced by any other web application developers wanting to use this technique to include external data sources in their applications. An updated browser security policy would be needed for efficient mixing of content from different remote sites within the same application.
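To make the notion of "origin" concrete: two URLs share an origin only when their scheme, host and port all match. The sketch below illustrates that comparison using the WHATWG URL class available as a global in current Node.js releases (an assumption; it is not part of the LineRate scripting environment described in this article), and sameOrigin is a hypothetical helper name.

```javascript
// Two URLs are same-origin only if scheme, host and port are all identical.
// sameOrigin is a hypothetical helper for illustration.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

console.log(sameOrigin('http://example.com/a', 'http://example.com/b'));     // true: only the path differs
console.log(sameOrigin('http://example.com/', 'https://example.com/'));      // false: different scheme
console.log(sameOrigin('http://example.com/', 'http://example.com:8080/'));  // false: different port
console.log(sameOrigin('http://example.com/', 'http://api.example.com/'));   // false: different host
```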
Browsers enforce the SOP, and although web servers can enforce restrictions on requests based on their header values (such as allowing or blocking requests based on the HTTP "Referer" header), web servers didn't originally have any way to indicate to a browser whether or not the information they provided was expected to be used in other applications. In response to the demands of web application developers, the W3C developed CORS: Cross Origin Resource Sharing. CORS provides the means by which browsers can communicate their application origin to servers and request a resource's origin access policy information from servers, and gives servers a way to indicate access policy to browsers and a framework to determine the validity of a cross-origin request.

The following browsers have CORS support built in: Chrome 3+, Firefox 3.5+, Opera 12+, Safari 4+, Internet Explorer 8+. More details on CORS support can be found here.

Scripting a Solution

The strategy we are going to employ for scripting CORS enforcement in the LineRate scripting environment is to create a "CorsService" object prototype as part of a module, and then require the module in a script where we can apply the CORS enforcement on a per-virtual server basis. Since not all applications will be identical, one of the parameters to the CorsService object's constructor is a configuration object with the following convention:

    {
      < Absolute URI where CORS is enforced >: {
        "origins": [ < List of origins allowed to query the parent URI > ],
        "allowCredentials": < Boolean indicating whether Cookies are allowed to this URI >,
        "allowHeaders": [ < List of headers to include as "allowed" > ],
        "methods": [ < List of allowed HTTP request methods > ]
      },
      < Another URI to enforce if we want... >: {
        ...
      },
      "maxage": < integer value indicating the validity of information received in a "preflight response" >
    }

When we create the new "inline script" at the LineRate CLI, we use the above convention to create the configuration of our CorsService object. The following command creates the inline script which will protect the URI "/api/v1.0/json", allowing access from two different origins, accessible by the GET and HEAD methods, allowing a special header, and disallowing the Cookie header:

Script Logic

We take advantage of the NPM async module's waterfall functionality when implementing our CORS enforcement. We chose the "waterfall" method over other methods (e.g. series, compose, etc.) because of the waterfall method's default function, which gives us a kind of error handling mechanism that simplifies how we need to write our functions implementing the CORS protocol. The high level logic is that we compose small functions that accomplish each discrete step in the CORS protocol handling process, and any time we encounter a condition that prevents us from providing a valid CORS response, we fall through to the default function. Our default function is a no-op when we provide a valid CORS response; otherwise we either bypass CORS or reject the request. In CORS, rejecting a request amounts to a 200 OK response with no CORS headers or body, resulting in the browser emitting an error event to the script function requesting the resource (e.g. XMLHttpRequest() would need to register a listener for the error event to detect this condition).

At a more detailed level, the async waterfall function list actions are:

1. Check the request against the CorsService configuration: verify whether we enforce the URI and, if we do, whether we have a valid Origin.
2. Check to see if this is a "preflight" request according to the CORS standard.
3. If this is a "preflight", process the response as such; otherwise proxy the request to the real-server and add the necessary response headers.

DEFAULT ACTION: proxy the request and pipe back the response when the request URI is not enforced, else reject the request when the request is deemed to be an invalid CORS request.

And here's the final script (NOTE: this example may require modification to work properly in your environment. Feel free to ask any questions about it on F5's Dev Central boards!)

Remember: you can always try this code out with a free (as in forever) tier license for LineRate by visiting linerate.f5.com/try.
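The decision steps listed under Script Logic can be sketched as one pure function. This is a hedged illustration, not the article's CorsService module: evaluateCors, its return shape, the example origins and the header name X-Custom-Header are all assumptions, the configuration merely follows the convention shown earlier, and the URI match is a plain string comparison on the request path (query strings are not handled).

```javascript
// Hypothetical helper: decide how to treat a request under a configuration
// that follows the article's convention. Not the article's CorsService.
function evaluateCors(config, req) {
  var policy = config[req.url];
  var origin = req.headers['origin'];

  // URI not enforced: bypass CORS and proxy the request as usual.
  if (!policy || typeof policy !== 'object') {
    return { action: 'bypass', headers: {} };
  }
  // Enforced URI with a missing or unlisted Origin: reject (no CORS headers).
  if (!origin || policy.origins.indexOf(origin) === -1) {
    return { action: 'reject', headers: {} };
  }

  var headers = { 'Access-Control-Allow-Origin': origin };
  if (policy.allowCredentials) {
    headers['Access-Control-Allow-Credentials'] = 'true';
  }

  // A preflight is an OPTIONS request carrying Access-Control-Request-Method.
  var requested = req.headers['access-control-request-method'];
  if (req.method === 'OPTIONS' && requested) {
    if (policy.methods.indexOf(requested) === -1) {
      return { action: 'reject', headers: {} };
    }
    headers['Access-Control-Allow-Methods'] = policy.methods.join(', ');
    headers['Access-Control-Allow-Headers'] = policy.allowHeaders.join(', ');
    headers['Access-Control-Max-Age'] = String(config.maxage);
    return { action: 'preflight', headers: headers };
  }

  // Actual request: proxy it and add these headers to the response.
  return { action: 'proxy', headers: headers };
}

// Example configuration (all values are illustrative).
var corsConfig = {
  '/api/v1.0/json': {
    origins: ['http://a.example', 'http://b.example'],
    allowCredentials: false,
    allowHeaders: ['X-Custom-Header'],
    methods: ['GET', 'HEAD']
  },
  maxage: 600
};

console.log(evaluateCors(corsConfig, {
  method: 'OPTIONS',
  url: '/api/v1.0/json',
  headers: { 'origin': 'http://a.example', 'access-control-request-method': 'GET' }
}).action); // preflight
```

Keeping the decision separate from the proxying makes each outcome (bypass, reject, preflight, proxy) easy to exercise on its own before wiring it into a virtual server's request handler.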